Error logs are every developer's saviour when it comes to finding out what has gone wrong with a system, but on systems that can generate thousands of log events every second, finding the number of errors in a given time frame can be a complete nightmare.

The most powerful tool for searching log data is, of course, the Unix grep command, which offers a way of quickly filtering out the lines you care about; used in combination with sed and awk, it will give you exactly the information you need.

To find out how many unique events happen in a given time frame, we can chain these tools together and use them to filter the log down to an exact count of the issues raised.

Let's say, for example, I have a system that has been repeatedly failing due to a connection error with a third-party system. I know that in this instance the failure is logged with the identifier ‘Soap Fault’. Feeding this string into grep will return every line that contains it, and piping the result through wc -l will then tell me how many such events have been generated in the log.
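
As a quick sketch, assuming the PHP error log lives at /var/log/php5/error_log (the same path used in the example at the end of this post):

    grep 'Soap Fault' /var/log/php5/error_log | wc -l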

Now this is handy if only a quick count of errors is required, but what if you want to find out exactly how many events have been generated in each hour of each day?

To do this we need to start chaining commands together, and because log files are timestamped to the second, these chains can quickly become very difficult to read.

The following sed command takes the time element of the log and parses it to zero out the minutes and seconds of each hour in a standard PHP log, where the timestamp is given in the format [11-Feb-2013 23:23:48]. The version below is a sketch matched to that exact layout; adjust the patterns if your log format differs.
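
    sed 's/\(..\):..:..\]/\1:00:00]/; s/\[\(..\)-\(...\)-[0-9]\{4\} \(..:..:..\)\]/\1 \2 \3/'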

The first part of this (up to the semi-colon) parses the timestamp, keeps only the hours and throws the rest away by replacing it with 0s, so the example given above becomes [11-Feb-2013 23:00:00]. The second part then strips out the year (we don’t care about this), drops the brackets and replaces the hyphens (-) with spaces, leaving us with 11 Feb 23:00:00.

The next step is to use the awk command to reformat this date slightly so that it is in the right order for our gnuplot script. This we achieve with the command awk '{ print $2"-"$1"-"$3; }', which formats the date as Feb-11-23:00:00.

If we chain this onto the end of the above two commands, we are left with only the date part of the log for all Soap Fault messages. By further piping this through uniq -c we can pull out the exact number of events logged in each hour.
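
Chained together, the pipeline so far looks like this (uniq -c does the counting, and because the log is written in time order, identical hour stamps are already adjacent, which is exactly what uniq relies on):

    grep 'Soap Fault' /var/log/php5/error_log \
      | sed 's/\(..\):..:..\]/\1:00:00]/; s/\[\(..\)-\(...\)-[0-9]\{4\} \(..:..:..\)\]/\1 \2 \3/' \
      | awk '{ print $2"-"$1"-"$3; }' \
      | uniq -c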

The final stage is to swap the order of the columns and comma-separate them using a second awk command, given as awk '{ print $2",",$1; }'.
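
Appended to the pipeline, this turns each line of the uniq -c output into a comma-separated pair ready for plotting; a purely illustrative count of 42 events in the hour above would come out as:

    ... | uniq -c | awk '{ print $2",",$1; }'

    Feb-11-23:00:00, 42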

In order to feed this into gnuplot we need to write it out to a temporary file first and then run the plot script given below.
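
A minimal version of that plot script might look like the following, assuming the output of the pipeline has been redirected to a temporary file at /tmp/events.dat (the filenames, image size and styling here are illustrative rather than the exact values from the downloadable script):

    gnuplot <<'EOF'
    set terminal png size 1024,768
    set output 'events.png'
    set datafile separator ","
    set style data histogram
    set style fill solid border -1
    set xtics rotate by -45
    set ylabel "Events per hour"
    plot '/tmp/events.dat' using 2:xtic(1) title 'Soap Fault'
    EOF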

This gives us a nicely plotted bar graph of all unique Soap Fault events, grouped by hour.

Tools such as grep and sed are fantastic for quickly searching the logs, but presentation is everything, and visualising what is happening in the logs over time helps you understand a given system with a greater degree of accuracy.

You can download this as a bash script at create_chart.

To use it, save the downloaded file and then run “chmod +x create_chart.bash” to make it executable.

The script takes two arguments: the logfile and the string to search for.

Example: $ ./create_chart.bash /var/log/php5/error_log 'Fatal' will give you a chart of all fatal errors found in the log.
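
If you would rather type it in than download it, a minimal sketch of the whole script, assembled from the commands above, might look like this (the downloadable version may differ in detail, and names such as chart.png are illustrative):

    #!/bin/bash
    # Usage: ./create_chart.bash <logfile> <search string>
    # Buckets matching log events by hour and plots the counts with gnuplot.

    create_chart() {
        local logfile=$1
        local search=$2
        local datafile
        datafile=$(mktemp)

        # Count events per hour and emit "date,count" lines for gnuplot.
        grep "$search" "$logfile" \
            | sed 's/\(..\):..:..\]/\1:00:00]/; s/\[\(..\)-\(...\)-[0-9]\{4\} \(..:..:..\)\]/\1 \2 \3/' \
            | awk '{ print $2"-"$1"-"$3; }' \
            | uniq -c \
            | awk '{ print $2",",$1; }' > "$datafile"

        gnuplot <<EOF
    set terminal png size 1024,768
    set output 'chart.png'
    set datafile separator ","
    set style data histogram
    set style fill solid border -1
    set xtics rotate by -45
    plot '$datafile' using 2:xtic(1) title "$search events per hour"
    EOF

        rm -f "$datafile"

        # ImageMagick's display is only used to show the finished chart;
        # delete this line if you don't want it to open automatically.
        display chart.png
    }

    create_chart "$1" "$2"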

The script requires that both gnuplot (http://www.gnuplot.info/) and ImageMagick (http://www.imagemagick.org/) are installed. ImageMagick is only used for display purposes; if you don’t want the image to open automatically, remove the call to the display command from the create_chart function.
