Monday, April 03, 2006

Using awk to calculate total times by parsing a log file

I came across a problem that I imagine would be very common to anyone trying to run an experiment with a software product: 'Analysing a log file and calculating the total time a user has spent performing a certain task'. A number of solutions exist, one of which is to export the whole log to a spreadsheet app like Excel and calculate the time differences using a simple formula. However, using spreadsheets can be very tedious. The Unix command-line tool awk allows the creation of small scripts that completely automate the whole process. For Windows users like me, Cygwin is freely available (www.cygwin.com).

Here's the script that I used:

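BEGIN { total = 0 }
# Login: convert the date ($1) and time ($2) to seconds since the epoch.
/.*system_login/ {
    cmd = sprintf("date +%%s -d '%s %s'", $1, $2)
    cmd | getline tmstamp1
    close(cmd)  # close the pipe so an identical command can run again
}
# Logout: convert likewise, then print and accumulate the session length.
/.*system_logout/ {
    cmd = sprintf("date +%%s -d '%s %s'", $1, $2)
    cmd | getline tmstamp2
    close(cmd)
    print user " " $1 " " (tmstamp2 - tmstamp1)
    total += tmstamp2 - tmstamp1
}
END { print user " total " total }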

The awk language is a bit cryptic but nothing to be afraid of. The main idea of awk is that it searches for a pattern and performs an action on every line that matches. Each rule is formatted /search pattern/ { action }. More information is available at http://www.vectorsite.net/tsawk.html. So the login rule of the script reads: search for lines containing "system_login" (the leading .* allows anything in front, although awk would find the text anywhere on the line even without it) and run the date conversion command that follows.
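As a small illustration of the pattern/action idea, here is a sketch run against three made-up log lines. The log format is an assumption (date in the first field, time in the second, event name somewhere after that); the real log is not shown in this post.

```shell
# Hypothetical sample lines; the real log format is an assumption.
sample='2006-04-03 09:00:00 system_login
2006-04-03 09:05:00 open_document
2006-04-03 09:30:00 system_logout'

# /pattern/ { action }: the action runs only for lines matching the pattern.
matches=$(printf '%s\n' "$sample" | awk '/system_login/ { print "login at " $2 }')
echo "$matches"
```

Only the first line matches, so only its time field ($2) is printed.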

An awk script has a BEGIN part, a body part and an END part. The BEGIN part is executed before any input is read, so initialisation goes there, and the END part runs after the last line has been processed, just before the script quits. The body, which runs once for every input line, is where the actual logic is performed.
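The three parts can be seen in a tiny line counter. This is only a sketch to show the order of execution, not part of the timing script; the variable names are made up for the example.

```shell
counts=$(printf 'a\nb\nc\n' | awk '
BEGIN { n = 0; print "start" }   # runs once, before any input is read
      { n++ }                    # body: runs for every input line
END   { print "lines: " n }      # runs once, after the last line
')
echo "$counts"
```

The BEGIN message appears first, the body counts the three input lines silently, and the END part prints the count.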

I've used the shell "date" command to convert a date to a value in seconds. However, you can't simply write result = system("date ...") (system can be used to execute a shell command within awk). It only returns the exit status of the command (0 on success), not its output. So the command string has to be built with sprintf and piped to getline, which reads the output into a variable.
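The difference between the two approaches can be sketched as follows. To keep the result predictable, the example uses "true" and "echo 42" as stand-ins for the date command; both are assumptions chosen for illustration.

```shell
result=$(awk 'BEGIN {
    # system() runs the command and returns its exit status, not its output.
    status = system("true")
    # cmd | getline var reads the first line of the command output into var.
    cmd = "echo 42"
    cmd | getline value
    close(cmd)   # close the pipe once the output has been read
    print "status=" status " value=" value
}')
echo "$result"
```

Here status holds 0 (the command succeeded) while value holds the text the command printed, which is what the timing script needs from date.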

Since each user's log file was stored as log.dat in a directory named after their login name, I had to go through all the directories using a for loop:

for i in *; do awk -f process-times.awk user="$i" "$i/log.dat"; done
