Sunday, November 13, 2011

Monitoring your OS Resources with the top, vmstat, iostat, ps and lsof commands

On any operating system, developers and end users commonly ask questions such as "Why is my system slow?", "Which processes are currently running?", "Which processes are taking the most CPU or memory?", "Is there a network, disk I/O, memory or CPU usage issue?" and "Who is currently running which program?". As a system administrator you should monitor your system usage regularly and track down any performance issue that occurs at a particular time. In this post I will describe several of the most important OS commands with which you can monitor your system.

1) top command:
top is the most commonly used command for finding out how many users are logged in, which processes are running, which processes are taking the most CPU or memory, and much more. Simply run "top" from the shell to see its output.
By default top lists processes in order of CPU usage and shows a continually updating report of system resource usage. Sample top output:
$ top
top - 13:43:14 up 231 days, 12:16,  2 users,  load average: 2.25, 2.85, 3.00
Tasks: 679 total,   2 running, 677 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.4%us,  0.5%sy,  0.0%ni, 90.3%id,  1.6%wa,  0.1%hi,  0.1%si,  0.0%st
Mem:  132099408k total, 131273300k used,   826108k free,  2509704k buffers
Swap: 133122612k total,   139832k used, 132982780k free, 115727764k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
23555 oracle    16   0 20.5g 883m 879m R   99  0.7   3:02.56 oracle
29158 oracle    15   0 20.5g 113m 108m S   38  0.1   0:38.20 oracle
 1271 oracle    15   0 13112 1436  712 R    4  0.0   0:00.02 top
 8617 oracle    16   0 10.1g 1.3g 1.3g S    2  1.1 139:19.24 oracle
14894 oracle    15   0 1243m  35m  25m S    2  0.0  36:45.03 oracle
14896 oracle    15   0 1246m  45m  36m S    2  0.0  58:51.09 oracle
15985 oracle    15   0 1245m 282m 274m S    2  0.2  19:09.18 oracle
19349 grid      RT   0  482m 288m  51m S    2  0.2   1693:51 ocssd.bin
20320 oracle    18   0 1647m  39m  15m S    2  0.0   2155:21 oraagent.bin
31010 oracle    15   0 20.5g  37m  34m S    2  0.0   0:00.73 oracle
32298 oracle    15   0 20.5g  44m  40m S    2  0.0   0:00.44 oracle
    1 root      15   0 10328  688  580 S    0  0.0  20:24.76 init
    2 root      RT  -5     0    0    0 S    0  0.0   0:22.95 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.63 ksoftirqd/0
    4 root      RT  -5     0    0    0 S    0  0.0   0:00.19 watchdog/0
    5 root      RT  -5     0    0    0 S    0  0.0   1:25.47 migration/1
    6 root      34  19     0    0    0 S    0  0.0   0:00.82 ksoftirqd/1
    7 root      RT  -5     0    0    0 S    0  0.0   0:00.29 watchdog/1
    8 root      RT  -5     0    0    0 S    0  0.0   0:23.55 migration/2
    9 root      34  19     0    0    0 S    0  0.0   0:00.80 ksoftirqd/2
   10 root      RT  -5     0    0    0 S    0  0.0   0:00.33 watchdog/2
   11 root      RT  -5     0    0    0 S    0  0.0   0:32.52 migration/3
   12 root      34  19     0    0    0 S    0  0.0   0:00.76 ksoftirqd/3
   13 root      RT  -5     0    0    0 S    0  0.0   0:00.31 watchdog/3
   14 root      RT  -5     0    0    0 S    0  0.0   1:10.23 migration/4
   15 root      34  19     0    0    0 S    0  0.0   0:08.80 ksoftirqd/4
   16 root      RT  -5     0    0    0 S    0  0.0   0:00.21 watchdog/4
   17 root      RT  -5     0    0    0 S    0  0.0   1:32.78 migration/5
   18 root      34  19     0    0    0 S    0  0.0   0:06.73 ksoftirqd/5
   19 root      RT  -5     0    0    0 S    0  0.0   0:00.27 watchdog/5
   20 root      RT  -5     0    0    0 S    0  0.0   1:06.50 migration/6
   21 root      34  19     0    0    0 S    0  0.0   0:06.41 ksoftirqd/6
   22 root      RT  -5     0    0    0 S    0  0.0   0:00.34 watchdog/6
   23 root      RT  -5     0    0    0 S    0  0.0   1:20.30 migration/7
   24 root      34  19     0    0    0 S    0  0.0   0:06.87 ksoftirqd/7
   25 root      RT  -5     0    0    0 S    0  0.0   0:00.28 watchdog/7
   26 root      RT  -5     0    0    0 S    0  0.0   0:20.32 migration/8
   27 root      34  19     0    0    0 S    0  0.0   0:01.32 ksoftirqd/8
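The summary lines are regular enough to read from a script. The following is a minimal sketch, assuming the older procps header layout shown above ("Cpu(s): ... 90.3%id"); the helper name cpu_idle is made up for illustration and is not part of top.

```shell
#!/bin/sh
# Hypothetical helper (not part of top): read top output on stdin and
# print the idle CPU percentage from the "Cpu(s):" summary line.
# Assumes the header layout shown in the sample output of this post.
cpu_idle() {
    awk '/^Cpu\(s\):/ {
        # Find the number directly in front of "%id".
        if (match($0, /[0-9.]+%id/)) {
            print substr($0, RSTART, RLENGTH - 3)
            exit
        }
    }'
}

# Demo with the summary line from the sample output.
echo 'Cpu(s):  7.4%us,  0.5%sy,  0.0%ni, 90.3%id,  1.6%wa,  0.1%hi,  0.1%si,  0.0%st' | cpu_idle
```

On a live system the input would come from a single batch-mode snapshot, for example "top -b -n 1 | cpu_idle". Other top versions may print the CPU line differently, in which case the pattern needs adjusting.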

We can modify the output of top command while it is running.

A)Sort top output
You can choose the sort field by pressing O (uppercase O) while top is running. top then displays a field list like the one below.
Current Sort Field:  K  for window 1:Def
Select sort field via field letter, type any other key to return

  a: PID        = Process Id
  b: PPID       = Parent Process Pid
  c: RUSER      = Real user name
  d: UID        = User Id
  e: USER       = User Name
  f: GROUP      = Group Name
  g: TTY        = Controlling Tty
  h: PR         = Priority
  i: NI         = Nice value
  j: P          = Last used cpu (SMP)
* K: %CPU       = CPU usage
  l: TIME       = CPU Time
  m: TIME+      = CPU Time, hundredths
  n: %MEM       = Memory usage (RES)
  o: VIRT       = Virtual Image (kb)
  p: SWAP       = Swapped size (kb)
  q: RES        = Resident size (kb)
  r: CODE       = Code size (kb)
  s: DATA       = Data+Stack size (kb)
  t: SHR        = Shared Mem size (kb)
  u: nFLT       = Page Fault count
  v: nDRT       = Dirty Pages count
  w: S          = Process Status
  x: COMMAND    = Command name/line
  y: WCHAN      = Sleeping in Function
  z: Flags      = Task Flags 

Note1:
  If a selected sort field can't be
  shown due to screen width or your
  field order, the '<' and '>' keys
  will be unavailable until a field
  within viewable range is chosen.

Note2:
  Field sorting uses internal values,
  not those in column display.  Thus,
  the TTY & WCHAN fields will violate
  strict ASCII collating sequence.
  (shame on you if WCHAN is chosen)

Pressing M (capital M) sorts the output by memory usage. Sample output after pressing M:
[oracle@DC-DB-01 ~]$ top
top - 14:24:24 up 231 days, 12:57,  2 users,  load average: 4.57, 3.27, 2.96
Tasks: 673 total,   4 running, 669 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.4%us,  0.7%sy,  0.0%ni, 87.0%id,  1.3%wa,  0.1%hi,  0.4%si,  0.0%st
Mem:  132099408k total, 131249840k used,   849568k free,  2509836k buffers
Swap: 133122612k total,   139832k used, 132982780k free, 115721144k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3503 oracle    15   0 10.1g 3.7g 3.7g S    0  3.0 162:48.84 oracle
 3505 oracle    15   0 10.1g 3.6g 3.6g S    0  2.9  40:14.52 oracle
 3501 oracle    15   0 10.1g 3.6g 3.6g S    0  2.9  51:14.55 oracle
 8598 oracle    15   0 10.1g 3.4g 3.4g S    0  2.7  37:51.43 oracle
 1901 oracle    15   0 20.5g 3.1g 3.1g S    0  2.5   0:03.33 oracle
 9310 oracle    15   0 10.1g 2.9g 2.9g S    0  2.3 129:22.91 oracle
 8590 oracle    15   0 10.1g 2.8g 2.8g S    0  2.2  36:25.79 oracle
 8566 oracle    -2   0 10.1g 2.6g 2.6g S    0  2.1  52:32.13 oracle
 8570 oracle    -2   0 10.1g 2.6g 2.6g S    0  2.1  48:43.81 oracle
 8580 oracle    15   0 10.1g 2.6g 2.6g S    0  2.0  15:02.37 oracle
18507 root      15   0 3118m 2.6g  20m S    0  2.0 155:36.15 ohasd.bin
 8582 oracle    15   0 10.1g 2.6g 2.6g S    0  2.0  14:58.25 oracle
 8584 oracle    15   0 10.1g 2.6g 2.5g S    0  2.0  14:51.78 oracle
 3499 oracle    15   0 10.1g 2.3g 2.3g S    0  1.8  43:55.73 oracle
 

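If you want to sort by accumulated CPU time (the TIME+ column) then press T (capital T). Capital S does not sort; it toggles cumulative time mode, in which each process is shown with the CPU time of its dead children included.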

If you want to sort by CPU usage then press P (capital P).
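
The sort keys are interactive, so they are not available when top output is captured by a script. One possible workaround is to sort a snapshot with the standard sort utility. This is a sketch under the assumption that %MEM is the 10th column, as in the samples in this post; by_mem is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read one top snapshot on stdin and print the
# process rows ordered by %MEM (column 10), highest first.
by_mem() {
    # Print only the non-empty rows that follow the "PID USER ..." header,
    # then sort them numerically (descending) on the 10th column.
    awk 'seen && NF { print } /^ *PID +USER/ { seen = 1 }' |
        LC_ALL=C sort -k10,10nr
}

# Demo with three rows taken from the first sample output.
by_mem <<'EOF'
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
23555 oracle    16   0 20.5g 883m 879m R   99  0.7   3:02.56 oracle
29158 oracle    15   0 20.5g 113m 108m S   38  0.1   0:38.20 oracle
 8617 oracle    16   0 10.1g 1.3g 1.3g S    2  1.1 139:19.24 oracle
EOF
```

Changing the key to -k9,9nr would order the rows by %CPU instead.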

B)Sort in reverse order
To reverse the current sort order, press R (capital R).

C)Hide idle processes
If you no longer want to display idle processes then press i (small i). Press i again to show them again.

D)Kill a process while running top
If you want to kill a process, first note down its PID and then press "k" while top is running. top prompts for the PID and then for the signal to send (the default is 15, SIGTERM). If you have the privilege to signal that PID, the process is killed.
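
Outside of top, the same thing can be done with the shell's kill command. The sketch below uses a throwaway "sleep" process as the target, which is purely an assumption for the demo; on a real system you would use the PID you noted in top.

```shell
#!/bin/sh
# Start a throwaway background process to act as the target.
sleep 300 &
pid=$!

# Send SIGTERM (signal 15), the polite "please exit" signal.
kill -TERM "$pid"

# Collect the exit status of the background job. A status of
# 128 + 15 = 143 means the process was terminated by SIGTERM.
status=0
wait "$pid" || status=$?
echo "process $pid exited with status $status"
```

If a process ignores SIGTERM, "kill -KILL" (signal 9) forces it to stop, but it gets no chance to clean up.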

E)Change priority of processes while running top
If you want to change the priority of a process without killing it, simply press "r" (renice). top asks for the PID and then for the new nice value, which can range from -20 (most favorable scheduling) to 19 (least favorable). Unprivileged users can normally only raise the nice value of their own processes.

F)Display selected user in top output
By using "top -u" you can display only the processes of a specific user.
For example:
$ top -u grid
top - 14:56:30 up 231 days, 13:29,  2 users,  load average: 3.11, 3.20, 2.99
Tasks: 676 total,   2 running, 674 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.3%us,  0.6%sy,  0.0%ni, 90.9%id,  1.2%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  132099408k total, 131272684k used,   826724k free,  2509928k buffers
Swap: 133122612k total,   139832k used, 132982780k free, 115732404k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20159 grid      17   0 1795m 195m  20m S    1  0.2   4473:22 java
19349 grid      RT   0  482m 288m  51m S    1  0.2   1694:26 ocssd.bin
19095 grid      18   0  825m  33m  14m S    0  0.0   1062:20 oraagent.bin
 1926 grid      16   0  466m  20m  18m S    0  0.0   0:12.13 oracle
 4378 grid      17   0  469m  20m  17m S    0  0.0   0:00.02 oracle
 4413 grid      18   0  469m  18m  16m S    0  0.0   0:00.02 oracle
 6748 grid      16   0  470m  31m  28m S    0  0.0   0:02.05 oracle
 8600 grid      16   0  466m  23m  21m S    0  0.0  13:40.14 oracle
15046 grid      16   0  466m  19m  17m S    0  0.0   2:13.50 oracle
17145 grid      16   0 81228  12m 9264 S    0  0.0   6:33.89 tnslsnr
17617 grid      16   0  470m  31m  28m S    0  0.0   0:02.03 oracle
17645 grid      16   0 81216  12m 9208 S    0  0.0   0:33.69 tnslsnr
19120 grid      15   0  249m  53m 8636 S    0  0.0   6:47.27 gipcd.bin
19132 grid      15   0  261m  56m 7240 S    0  0.0   6:46.97 mdnsd.bin
19146 grid      15   0  320m  65m  11m S    0  0.1  14:16.95 gpnpd.bin
19409 grid      18   0  192m  15m 8216 S    0  0.0   5:05.33 diskmon.bin
19600 grid      15   0  403m  67m  12m S    0  0.1  17:01.10 evmd.bin
19687 grid      15   0  471m  18m  16m S    0  0.0  10:00.15 oracle

You can display only specific processes by passing their PIDs to top -p as a comma-separated list without spaces:
$ top -p 1926,17645,19349
top - 15:01:51 up 231 days, 13:34,  2 users,  load average: 2.46, 2.76, 2.85
Tasks:   3 total,   0 running,   3 sleeping,   0 stopped,   0 zombie
Cpu(s):  6.3%us,  0.6%sy,  0.0%ni, 92.8%id,  0.2%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  132099408k total, 131216440k used,   882968k free,  2509936k buffers
Swap: 133122612k total,   139832k used, 132982780k free, 115734276k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19349 grid      RT   0  482m 288m  51m S    1  0.2   1694:29 ocssd.bin
 1926 grid      15   0  466m  20m  18m S    0  0.0   0:12.14 oracle
17645 grid      16   0 81216  12m 9208 S    0  0.0   0:33.69 tnslsnr
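
When the PIDs come from a script, they have to be joined with commas before being handed to top -p. A minimal sketch in plain POSIX shell follows; pid_list is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas.
pid_list() {
    old_ifs=$IFS
    IFS=,
    list="$*"        # "$*" joins the arguments with the first character of IFS
    IFS=$old_ifs
    printf '%s\n' "$list"
}

# Demo with the PIDs used in the example above.
pid_list 1926 17645 19349
```

The result can be used directly, for example: top -p "$(pid_list 1926 17645 19349)".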
G)Display all CPUs/Cores in the top output
While top is running, pressing 1 (one) breaks the CPU summary down into one line for each individual CPU on the system.

$ top
top - 15:03:17 up 231 days, 13:36,  2 users,  load average: 2.23, 2.60, 2.79
Tasks: 671 total,   5 running, 666 sleeping,   0 stopped,   0 zombie
Cpu0  : 26.4%us,  0.0%sy,  0.0%ni, 73.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 10.7%us,  0.7%sy,  0.0%ni, 88.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 42.0%us,  0.3%sy,  0.0%ni, 57.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  1.3%us,  0.0%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  8.7%us,  0.7%sy,  0.0%ni, 90.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.3%us,  1.0%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  : 35.6%us,  0.0%sy,  0.0%ni, 64.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  0.7%us,  0.3%sy,  0.0%ni, 98.3%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu10 :  1.0%us,  0.3%sy,  0.0%ni, 97.7%id,  0.7%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu11 :  2.0%us,  0.0%sy,  0.0%ni, 98.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  3.3%us,  0.3%sy,  0.0%ni, 96.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 : 16.1%us,  0.7%sy,  0.0%ni, 79.6%id,  3.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu15 :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st 
Cpu16 :  9.4%us,  0.0%sy,  0.0%ni, 90.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 :  9.0%us,  0.3%sy,  0.0%ni, 90.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu18 :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu19 : 18.4%us,  5.4%sy,  0.0%ni, 76.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 : 17.1%us,  0.0%sy,  0.0%ni, 82.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu21 : 27.2%us,  0.7%sy,  0.0%ni, 69.8%id,  1.3%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu22 :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 :  3.6%us,  1.0%sy,  0.0%ni, 94.4%id,  0.3%wa,  0.0%hi,  0.7%si,  0.0%st
Mem:  132099408k total, 131368388k used,   731020k free,  2509944k buffers
Swap: 133122612k total,   139832k used, 132982780k free, 115734812k cached
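
With this many CPUs it can be easier to let a script pick out the busiest one. The following is a sketch that assumes the "CpuN : x%us, ..." line format shown above; busiest_cpu is a hypothetical helper, and other top versions may format these lines differently.

```shell
#!/bin/sh
# Hypothetical helper: read per-CPU lines on stdin and print the CPU
# with the highest user (%us) share, followed by that percentage.
busiest_cpu() {
    awk '/^Cpu[0-9]+ *:/ {
        name = $1                       # e.g. "Cpu3"
        line = $0
        sub(/^[^:]*: */, "", line)      # strip the "Cpu3  : " prefix
        us = line + 0                   # leading number is the %us value
        if (best == "" || us > max) { max = us; best = name }
    }
    END { if (best != "") printf "%s %.1f\n", best, max }'
}

# Demo with four of the per-CPU lines from the sample output.
busiest_cpu <<'EOF'
Cpu0  : 26.4%us,  0.0%sy,  0.0%ni, 73.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 42.0%us,  0.3%sy,  0.0%ni, 57.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  : 35.6%us,  0.0%sy,  0.0%ni, 64.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
EOF
```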

H)Change Refresh Interval in top output
By default top refreshes its output every 3 seconds. You can change the refresh interval by pressing d (small d), which displays an interactive prompt. Enter the delay in seconds and press Enter; top then refreshes at that interval. If you want to refresh the output on demand, press the space bar.

I)Highlight running processes
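Press z to switch between color and monochrome output, and b to switch the highlighting between bold and reverse video; the y key toggles the highlighting of running processes. Try these keys while top is running and check the output.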

J)Display absolute path and arguments of the command running
While top is running, press c (small c) to toggle between the bare command name and the full command line, including its path and arguments.

K)Quit top after specific number of iterations
top keeps refreshing its output every 3 seconds until you press q. If you would like top to show only a certain number of iterations and then exit automatically, use the -n option.

For example, to display 3 iterations of top output and then exit, run:

$ top -n 3

L)Executing top Command in batch mode

If you want to run top in batch mode, use the -b option, for example:

$ top -b -n 3

This is very useful if you want to save the top command output into a file.
For example the following command

$ top -b -n 3 >a.txt
will create a file named a.txt which contains 3 iterations of top command output.
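
Once the snapshots are in a file, they can be summarized with ordinary text tools. The sketch below prints the time and the 1-minute load average of every iteration. It assumes each iteration starts with a "top - HH:MM:SS ... load average: ..." line as in the samples in this post; load_trend is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read saved batch-mode output on stdin and print
# "time 1-minute-load" for every snapshot it contains.
load_trend() {
    awk '/^top - / {
        stamp = $3                      # third field is the wall-clock time
        sub(/.*load average: /, "")     # keep only "a, b, c"
        sub(/,.*/, "")                  # keep only the 1-minute value
        print stamp, $0
    }'
}

# Demo with two header lines taken from the samples in this post.
load_trend <<'EOF'
top - 13:43:14 up 231 days, 12:16,  2 users,  load average: 2.25, 2.85, 3.00
Tasks: 679 total,   2 running, 677 sleeping,   0 stopped,   0 zombie
top - 14:24:24 up 231 days, 12:57,  2 users,  load average: 4.57, 3.27, 2.96
EOF
```

With the file created above, the call would be "load_trend < a.txt".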

M)Split top output into multiple panels
While top is running, pressing A (capital A) splits the display into multiple panels. You can cycle through these windows using 'a'.

N)Decrease number of processes to display
By default the limit is 0, which means that all tasks are displayed. You can limit the number of tasks shown: for example, to display 4 tasks at a time, press "n" and type 4 at the prompt.
Maximum tasks = 0, change to (0 is unlimited): 4

O)Toggle top header while displaying top command output
Hiding header lines leaves room for more processes in your window. While top is running:

Press "l" – to hide / show the load average. 1st header line.
Press "t" – to hide / show the CPU states. 2nd and 3rd header line.
Press "m" – to hide / show the memory information. 4th and 5th line.

P)Save top configuration settings
You can save the current top configuration to a file by pressing W (capital W) while top is running. After you press W, top displays a message like:

Wrote configuration to '/home/oracle/.toprc'
You can check the content of the file with:

$ less /home/oracle/.toprc

In another post I will show you details about vmstat, iostat, ps and lsof commands.