Performance monitoring best practices

When you design a server performance monitoring system there are several things you will have to consider. Best practices when implementing such a system are:

  • Set up a monitoring configuration
  • Keep monitoring overhead low
  • Centralized place for monitoring
  • Analyze performance results and establish a performance baseline
  • Set alerts
  • Tune performance
  • Plan ahead

When setting up a monitoring system you have to consider what kind of system is “good enough” for you. You will have to decide whether to go with an open-source monitoring system or a commercial one. Since I’m not a fan of commercial closed-source systems, I will focus on open-source solutions:

  1. Nagios – a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes.
  2. Cacti – a complete network graphing solution designed to harness the power of RRDTool's data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box.
  3. Munin – surveys all your computers and remembers what it saw. It presents all the information in graphs through a web interface. Its emphasis is on plug-and-play capabilities.
  4. You may also want to take a look at http://compari.tech/bandwidthmonitoring for some other useful bandwidth monitoring tools.

Nagios has the advantage that it can be set up to send SMS alerts to predefined groups of users when problems occur. Cacti has the advantage that you can evaluate how your systems perform over time and get a good picture of the trends. Munin can monitor certain aspects better than Cacti but is more invasive on the systems you install it on.
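
To give a flavor of how Nagios is configured, here is a minimal service definition. This is only a sketch: web01 and sms-admins are hypothetical names, and check_local_load comes from the sample commands shipped with Nagios; adapt them to your own objects.

define service {
    use                  generic-service     ; template from the Nagios sample config
    host_name            web01               ; hypothetical host
    service_description  CPU Load
    check_command        check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
    contact_groups       sms-admins          ; the group that receives the SMS alerts
}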

The next thing is to keep the monitoring overhead low. This can be done in several ways (a concrete sketch follows the list):

  1. Don’t query the servers too often.
  2. Run the monitoring system on a standalone server that does monitoring and nothing else.
  3. Archive unneeded data.
  4. Use asynchronous requests when possible.
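
For the first point, the polling interval is the main knob. As a sketch using the sysstat package (the sa1 path varies by distribution; /usr/lib/sa/sa1 is the Red Hat location), sampling every 10 minutes instead of every minute keeps the collector cheap:

# /etc/cron.d/sysstat -- take one sample every 10 minutes to keep overhead low
*/10 * * * * root /usr/lib/sa/sa1 1 1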

In the previous list I said that the monitoring system should run on a standalone server. That is exactly what Centralized place for monitoring means.

Ideally, all logs from the different areas of monitoring should be stored in a centralized place where one UI can be used to analyze the data. Based on your user scenarios, consider identifying which teams to partner with, so log data can be viewed as a coherent whole. The reasons behind centralization are listed below (a small log-forwarding sketch follows the list):

  1. You can easily implement strict user control / user policies / procedures (you will need this if you have to be Sarbanes-Oxley compliant).
  2. It minimizes admin time. Imagine having 20 servers, each with its own monitoring system.
  3. Giving certain users access to the relevant graphs/logs is easy.
  4. You get an overview of the whole system.
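
A classic way to get all logs into one place is syslog forwarding. As a sketch, this rsyslog rule on each monitored server ships everything to a central host (loghost.example.com is a hypothetical name; the @@ prefix selects TCP):

# /etc/rsyslog.conf on every monitored server -- forward everything to the central log host
*.* @@loghost.example.com:514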

After you have implemented the system and the data starts to pile up, you can do an analysis of performance results. This should be done as often as possible in order to identify trends and also to catch “exceptions”. For example, at the end of each month the servers that run accounting will have a higher load than on a normal day. If you do not pay attention you might find yourself in a pretty delicate position when users request more capacity or more processing power that, according to the trend, wasn’t actually necessary.
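
With sysstat collecting data (see the cron sketch above), comparing a month-end day against a normal one takes a command per day; the saNN files under /var/log/sa are named after the day of the month:

# CPU utilization on the 31st vs. the 15th of the month
sar -u -f /var/log/sa/sa31
sar -u -f /var/log/sa/sa15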

After establishing a baseline for the performance you can Set alerts for moments when systems behave out of the ordinary or for problems with the system. For example, if a server uses 15G out of its 16G of RAM you might want to be notified, so you can schedule a downtime to add more RAM or look into what is going on with the applications running on that server.
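
If you don’t want a full Nagios notification chain, even a small cron-driven script covers the RAM example above. This is a sketch: the threshold and mail address are made up, and the awk columns assume the classic free(1) output with separate buffers and cached columns.

#!/bin/sh
# alert when free + buffers + cached drops below 1024 MB
AVAIL=$(free -m | awk '/^Mem:/ {print $4 + $6 + $7}')
if [ "$AVAIL" -lt 1024 ]; then
    echo "Low memory on $(hostname): ${AVAIL}MB available" | mail -s "RAM alert: $(hostname)" admin@example.com
fi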

Performance tuning is a delicate job and takes an awful lot of time, because a system can only be optimized for a given scenario. If the data doesn’t fit that scenario you might need to adjust server parameters in order to adapt to the new one. Databases, Apache servers and kernel parameters can all be tuned to suit your needs.
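
Kernel parameters, for instance, are adjusted through sysctl. The values below are purely illustrative, not recommendations; measure your own workload first, then tune:

# lower the kernel's tendency to swap and raise the TCP listen backlog (illustrative values)
sysctl -w vm.swappiness=10
sysctl -w net.core.somaxconn=1024
# add the same lines to /etc/sysctl.conf to persist them across reboots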

The baseline and the performance graphs also allow you to Plan ahead for the evolution of your systems. For example, you can predict with good accuracy when, or if, you will need to purchase new hardware or upgrade your existing systems.

Adding Oracle support to PHP

If you want to connect to an Oracle database from PHP you will need the PECL module named oci8.

First, in order to compile it, you will need the Oracle Instant Client (both basic & sdk). You can download them from here:

http://www.oracle.com/technology/software/tech/oci/instantclient/htdocs/linuxsoft.html

If you don’t have an Oracle account you will need to create one.

At the time of this post the following files are available: instantclient-basic-linux32-11.2.0.1.zip and instantclient-sdk-linux32-11.2.0.1.zip.

Create a directory in /opt (mkdir /opt/oracle/instantclient) and copy those files there, then unzip them. You will also need to create a symbolic link: ln -s libclntsh.so.11.1 libclntsh.so

Then you will need to install libaio if you don’t have it already. Don’t forget to add /opt/oracle/instantclient/instantclient_11_2 to /etc/ld.so.conf and to run ldconfig afterwards.
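
Put together, the whole preparation sequence looks roughly like this (paths and file names are the ones used in this post):

mkdir -p /opt/oracle/instantclient
cd /opt/oracle/instantclient
# copy the two downloaded zip files here, then:
unzip instantclient-basic-linux32-11.2.0.1.zip
unzip instantclient-sdk-linux32-11.2.0.1.zip
cd instantclient_11_2
ln -s libclntsh.so.11.1 libclntsh.so
echo "/opt/oracle/instantclient/instantclient_11_2" >> /etc/ld.so.conf
ldconfig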

At this point you are ready to install the PECL extension for PHP. Create a temporary directory (/tmp/1) and cd there:

mkdir /tmp/1

cd /tmp/1

pecl download oci8

tar xf oci8-1.3.5.tar

cd oci8-1.3.5

phpize

./configure --with-oci8=shared,instantclient,/opt/oracle/instantclient/instantclient_11_2

make

make install

edit /etc/php.ini and add

extension=oci8.so

And enjoy the Oracle extension for PHP. Note: if you have Apache running, restart it.
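
To verify that the extension is actually loaded, list PHP’s modules:

php -m | grep oci8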


Working with svn

When you manage a project with svn there are a lot of things to consider: how you create the repository, which external resources will be imported into the repository, and so on. Basically it’s a constant job of organizing things better to keep developers happy.

After lots of try/fail cycles I came to the conclusion that for a Linux distribution the best approach would be a tree like this:

server
  -> bzip2
       * trunk
       * tags
       * branches
  -> atk
       * trunk
       * tags
       * branches
  -> tfm-filesystem32
       * trunk
       * tags
       * branches
server64
  -> bzip2
       * trunk
       * tags
       * branches
  -> atk
       * trunk
       * tags
       * branches
  -> tfm-filesystem64
       * trunk
       * tags
       * branches

What is wrong with this structure? In time a developer will update bzip2 in the server (32-bit) tree but will forget, or won’t have the time, to update the server64 tree. So the trees will not stay in sync, and this is bad because the two projects might end up building from different sources.

What can be done? Use svn:externals declarations. The first idea was to create a common tree and pull externals from both server and server64. But that way we would have three trees, a lot of work to do, and it would become hard to manage. Another approach was needed. We decided that server is the main tree; in the server64 tree we declared bzip2 to be an external, pulled from the 32-bit tree.

This way, when a developer commits a change it goes into both trees simultaneously.

How do you use svn:externals? Let’s take bzip2 from server64 as an example:

svn del bzip2; svn commit; svn propedit svn:externals .; svn up; svn commit

and when editing the svn:externals property you should add a line like this:

bzip2 https://svn.tfm.ro/tfm/server/bzip2
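
Once the property change is committed, svn propget should print that line back, confirming the external is in place:

svn propget svn:externals .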