watch command and load averages

You can use the ps command to find the top CPU consumers on a UNIX/Linux server. The pipeline below lists the Oracle processes using the most CPU, highest first, with the process ID in the second column:

$ ps -e -o pcpu,pid,user,tty,args |grep -i oracle|sort -n -k 1 -r|head

You can also rerun a ps pipeline at a regular interval (every two seconds by default) with the watch command, by enclosing the whole pipeline in double quotes:

$ watch "ps -ef | awk -F' ' '{print \$2}'"
$ watch "ps -e -o pcpu,pid,user,args | sort -k 1 -n -r | head -10"
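As a minimal sketch of the same idea, the pipeline can be wrapped in a small shell function. The name top_cpu is hypothetical and not part of any standard tool. The trailing = on each -o column name asks ps to suppress the header line, so the header is not sorted in with the process rows.

```shell
# top_cpu [N]: print the N processes using the most CPU, highest first.
# (top_cpu is a made-up helper name for this example.)
top_cpu() {
    n="${1:-10}"   # default to 10 rows when no argument is given
    # "pcpu=" etc. give every column an empty header, which removes the
    # header line, so only process rows reach sort.
    ps -e -o pcpu= -o pid= -o user= -o args= | sort -k 1 -n -r | head -n "$n"
}

top_cpu 5
```

Note that watch runs its command through a separate shell, so a function defined in your current session is not visible to it. Pass watch the quoted pipeline itself, and use its -n option to change the refresh interval in seconds.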

If you've spent much time working in a UNIX environment, you've probably seen the load averages more than a few times.

load averages: 2.43, 2.96, 3.41

In his blog entry from late last year, Zach sums it up quite nicely:

In short it is the average sum of the number of processes waiting in the run-queue plus the number currently executing over 1, 5, and 15 minute time periods.

The formula is a bit more complicated than that, but this serves well as a functional definition. Zach provides more detail in his article and also points to Dr. Neil Gunther's article, which covers the topic in as much depth as anyone could ask for.

So what does this mean about your system?

For a quick example, let's consider the output below. The load average of a system can typically be found by running top or uptime, and users typically don't need any special privileges to run these commands.

load averages: 2.43, 2.96, 3.41

Here we see that the one-minute load average is 2.43, the five-minute load average is 2.96, and the fifteen-minute load average is 3.41.
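To pull those three numbers out in a script, something like the following sketch may help. The function name parse_load is hypothetical. It assumes the line contains either "load average:" or "load averages:" (the wording differs between systems) and that the locale uses a period as the decimal separator, which is why uptime is run with LC_ALL=C here.

```shell
# parse_load: read a line of uptime-style output on stdin and print the
# 1, 5, and 15 minute load averages separated by spaces.
# (parse_load is a made-up helper name for this example.)
parse_load() {
    # First expression: drop everything through "load average:" or
    # "load averages:". Second expression: strip the separating commas.
    sed -e 's/.*load averages\{0,1\}: *//' -e 's/,//g'
}

# The sample line from this article:
echo "load averages: 2.43, 2.96, 3.41" | parse_load   # prints "2.43 2.96 3.41"

# The live values on this machine:
LC_ALL=C uptime | parse_load
```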

Here are some conclusions we can draw from this.

  • On average, over the past minute there have been 2.43 processes running or waiting for a resource.
  • Overall the load is trending down, since the one-minute average (2.43) is lower than both the five-minute average (2.96) and the fifteen-minute average (3.41).
  • This system is busy, but we cannot conclude how busy solely from load averages.
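The trend comparison in the list above can be sketched in shell. The function name load_trend and its three labels are hypothetical; awk does the comparison because the shell's own test command only handles integers.

```shell
# load_trend ONE FIFTEEN: compare the 1 minute and 15 minute load averages
# and print "falling", "rising", or "steady".
# (load_trend is a made-up helper name for this example.)
load_trend() {
    awk -v one="$1" -v fifteen="$2" 'BEGIN {
        # Adding 0 forces a numeric rather than string comparison.
        if (one + 0 < fifteen + 0)      print "falling"
        else if (one + 0 > fifteen + 0) print "rising"
        else                            print "steady"
    }'
}

load_trend 2.43 3.41   # prints "falling"
```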

It is important to mention here that the load average does not take into account the number of CPUs in the system, so the same figure can mean very different things on different machines. Another critical detail is that the processes being counted could be waiting for any number of things, including CPU, disk, or network.

What we do know is that a system with a load average significantly higher than its number of CPUs is probably quite busy or bogged down by some bottleneck. Conversely, a system with a load average significantly lower than its number of CPUs is probably doing just fine.
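One rough way to apply this rule of thumb is to divide the load average by the CPU count. The sketch below uses a hypothetical function name, load_per_cpu, and assumes getconf _NPROCESSORS_ONLN is available to report the number of online CPUs (it is on Linux and macOS, but not everywhere). If that call fails, the script falls back to 1, which is only a placeholder.

```shell
# load_per_cpu LOAD NCPU: print LOAD divided by NCPU to two decimal places.
# (load_per_cpu is a made-up helper name for this example.)
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# Number of online CPUs; fall back to 1 if getconf cannot report it
# (assumption: a single CPU is a safe placeholder, not a real measurement).
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Using the one-minute figure from the sample output in this article.
load_per_cpu 2.43 "$ncpu"
```

A result well above 1 suggests more runnable or waiting work than there are CPUs, while a result well below 1 suggests spare capacity. As noted above, this says nothing about what the processes are actually waiting on.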