Wednesday, August 22, 2012

Linux diagnostic utilities

Here's a list of commands and utilities that provide useful diagnostic information about your Linux system. This is just a starting point; see the man page for each utility for the full story.
Not all of these utilities are installed by default on most distributions, so you may need to install them with your package manager first. For RHEL / CentOS, this often also means adding the EPEL, RPMFusion, and/or RepoForge repositories.
A lot of this is also provided by the gkrellm GUI utility, which is a great real-time monitor for system performance. You might want to enhance gkrellm by configuring it to launch some of the commands below to provide additional detail.

System Resources

uname -a
# The currently running Linux kernel version, architecture, build date
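free -mto
# Memory usage in megabytes: used, free, buffers, and disk cache, plus a totals line. Newer procps-ng releases no longer accept "-o"; use "free -mt" there.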
cat /proc/meminfo
# more memory usage details
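echo 3 > /proc/sys/vm/drop_caches
# clear the disk cache (useful before benchmarking your disk read performance). Run as root, and run "sync" first so dirty pages can be dropped too. Writing 1 drops only the page cache, 2 drops only dentries and inodes, and 3 drops both.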
dstat -a --top-io --top-bio
# Displays CPU, disk, network, paging, and system statistics, along with the names of the most I/O-intensive and most block-I/O-intensive processes.
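
If you'd rather have the /proc/meminfo numbers in a script than read them by eye, something like the following might do. It's only a sketch: mem_summary is a made-up helper name, and it assumes the MemTotal, MemFree, Buffers, and Cached fields are present and reported in kB.

```shell
# mem_summary (hypothetical helper): read /proc/meminfo-style text on stdin
# and print total, free, and buffer+cache memory in whole megabytes.
# Assumes values are in kB, as /proc/meminfo normally reports them.
mem_summary() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }
        END {
            printf "total_mb=%d free_mb=%d buffers_cache_mb=%d\n",
                   total / 1024, free / 1024, (buffers + cached) / 1024
        }
    '
}

# Typical use on a live system:
#   mem_summary < /proc/meminfo
```

Reading from stdin keeps the helper easy to try against a saved copy of /proc/meminfo from another machine.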

Process Info

top
# Table of processes, bring up the help ("?") and spend some time playing with the setup... particularly "1", "M/P". Also try making it super colorful with: "A", "Z", "aaa", "Enter", "B". Use at your own peril, but once you have it set up the way you like, use "W" to save that configuration as default.
atop
# Advanced table of processes with historical logging, available once you enable the atop logging service. Access historical logs using `atop -r`, then jump forwards/backwards in time with t/T. Very useful for debugging system and process issues after the fact, and it's very good at highlighting which system resources are constrained (even relatively obscure ones like context switches per sec).
iftop
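# Show network bandwidth usage per connection (host pair) on an interface. It does not break traffic down by process; try "nethogs" for that.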
iotop
# Identify processes using disk / block device I/O
w
# Who is logged in and what command each user is currently running
lsof
# List open files by process. Particularly useful for figuring out which process is preventing you from unmounting your USB disk, so you can terminate it.
pmap -x <PID>
# Show the memory map of a process, including which shared libraries it's using
strace -f <command>
# Run a command and display all of its system calls. Useful for finding out what a process is doing or trying to do when it isn't responding, what files it's using (filter for the "open" and "openat" calls), etc. You can also attach to an existing process with "-p <pid>". Beware that strace makes the process run slower and can occasionally make it unstable.
ltrace -f <command>
# Like strace, but shows all library function calls.
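
To pull just the file names out of strace output, a small filter along these lines could help. It's a rough sketch: opened_files is a hypothetical helper, and it assumes strace's usual one-call-per-line output, with the path as the first quoted argument and failed calls reported as "= -1".

```shell
# opened_files (hypothetical helper): read strace output on stdin and print
# the path of every open()/openat() call that succeeded, one per line.
# Paths containing a double quote are not handled.
opened_files() {
    grep -E '(^|[[:space:]])open(at)?\(' |
        grep -v -E '= -1 ' |
        sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' |
        sort -u
}

# Typical use (strace writes to stderr, hence the 2>&1):
#   strace -f -e trace=file <command> 2>&1 | opened_files
```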

Hardware Info

dmesg
# View kernel ring buffer of debug messages, which is mostly about hardware and driver initialization activity. Usually also logged to /var/log/messages
dmidecode
# motherboard / BIOS version information
lspci
# PCI bus devices, add "-v" for more detail
lsusb
# USB devices, add "-v" for more detail
lsmod
# show device driver kernel modules loaded
cat /proc/interrupts
# show number of interrupts triggered by devices on each CPU
cat /proc/ioports
# show I/O ports reserved by devices
cat /proc/iomem
# show the physical memory address map, including the ranges claimed by devices and by the kernel
sensors
# environmental sensor data for temperatures, voltages, fan speeds. Might need to install lm-sensors package and perform initial setup using "sensors-detect" script.
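
The per-CPU columns in /proc/interrupts get hard to read on machines with many cores. Here is one possible way to add them up per interrupt line; irq_totals is an invented name, and the sketch assumes the usual layout where the first row lists one column per CPU.

```shell
# irq_totals (hypothetical helper): read /proc/interrupts-style text on stdin
# and print "<total> <irq>" for each row, busiest interrupt first.
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }      # header row: one column per CPU
        {
            total = 0
            # Sum only the per-CPU counters; trailing fields are descriptions.
            for (i = 2; i <= ncpu + 1 && i <= NF; i++)
                if ($i ~ /^[0-9]+$/) total += $i
            name = $1
            sub(/:$/, "", name)
            print total, name
        }
    ' | sort -k1,1nr
}

# Typical use on a live system:
#   irq_totals < /proc/interrupts | head
```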

Disk & Filesystem Info

sync
# flush all buffered / pending writes to disk
df
# Show disk free / utilization of each mounted partition
mount
# show which partitions are mounted
cat /proc/mounts
# show which partitions are mounted with options
sfdisk -l
# list all visible disk partitions
smartctl -a /dev/sda
# display hard disk SMART monitoring information, including runtime and number of detected bad blocks ("reallocated sector count")
sg_reassign
# list and mark bad blocks on a device. Part of RHEL/CentOS RPMForge sg3_utils package
hdparm -I /dev/sda
# display disk information, including model and serial number. Also check out "sdparm" for SCSI disks
hdparm -Tt /dev/sda
# perform sequential read throughput test
dumpe2fs /dev/sda1
# show ext2/3/4 filesystem information, such as journal size, block size, and the used/free block map. Use "tune2fs" to change tunable options and features.
du | sort -n
# report disk usage of current directory tree. Also try the GUI utilities such as "filelight"
iostat
# Number of blocks read from and written to each disk device
mdadm --detail /dev/md0
# Software RAID configuration of one array. Use "mdadm --detail --scan" to list all arrays.
cat /proc/mdstat
# Software RAID status
blktrace -d /dev/sda -o - | blkparse -i -
# record and display disk head movement (well, sector number) as it reads/writes from a block device. It's also possible to plot this over time using the included bno_plot.py script to give a pretty good idea of how many seeks your disk load is generating, as well as how you might optimize it using "readahead" techniques and the like.
fio
# Block device benchmarking tool supporting more load patterns
bonnie++
# File system benchmarking tool
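
For a quick "which partitions are nearly full?" check, the df output can be filtered with a few lines of awk. This is just a sketch: full_filesystems is a hypothetical helper, it expects the POSIX layout produced by "df -P", and it won't cope with mount points that contain spaces.

```shell
# full_filesystems (hypothetical helper): read "df -P" output on stdin and
# print the mount point and usage of every filesystem at or above a
# percentage threshold (first argument, default 90).
full_filesystems() {
    threshold=${1:-90}
    awk -v limit="$threshold" '
        NR > 1 {                          # skip the header row
            use = $5
            sub(/%$/, "", use)
            if (use + 0 >= limit + 0) print $6, use "%"
        }
    '
}

# Typical use on a live system:
#   df -P | full_filesystems 80
```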

Network Info

ifconfig
# display all configured network interfaces
ethtool -i eth0
# show NIC driver and firmware version
ethtool -S eth0
# show NIC statistics
sar -n DEV
# display transmit/receive statistics over time
mii-tool eth0
# show NIC link status
netstat -ap
# state of open network ports, connections, sockets, and associated processes
route -n
# network routing table
iptables -L
# IP filtering tables
tcpdump -i eth0
# sniff network activity on a NIC. Better yet, use the "wireshark" GUI to get lots of useful filtering, reporting, and deep packet inspection
nmap <hostname>
# probe for open ports on a server (clear this with your IT/security department first!)
ping <hostname>
# measure round trip time to another server over the network
traceroute / tracepath
# discover network path to another server on the network and ping times to all network devices in between
NPtcp
# start listener server on remote host
NPtcp -h <remotehost>
# netpipe performance benchmark. Adapt the "geplot" script to plot the "np.out" file to view network throughput vs. packet size.
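
If you want to log round trip times rather than watch them scroll by, the average can be picked out of ping's summary line. A minimal sketch, assuming the Linux iputils summary format ("rtt min/avg/max/mdev = ..."); ping_avg_ms is a made-up helper name.

```shell
# ping_avg_ms (hypothetical helper): read ping output on stdin and print the
# average round trip time in milliseconds, taken from the final
# "rtt min/avg/max/mdev = a/b/c/d ms" line. Prints nothing if that line
# is missing (for example, when every packet was lost).
ping_avg_ms() {
    sed -n 's|^rtt [^=]*= [0-9.]*/\([0-9.]*\)/.*|\1|p'
}

# Typical use on a live system:
#   ping -c 5 <hostname> | ping_avg_ms
```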

Video Driver Info

less /var/log/Xorg.0.log
# Xorg log file. Also reference config file in /etc/X11/xorg.conf
xdpyinfo
# X Window System configuration, extensions, and colorspaces
glxinfo
# X Window System OpenGL capabilities
glxgears
# quick OpenGL test / benchmark
nvidia-settings
# NVidia driver configuration GUI
xrandr
# X Rotation and Resolution settings

Audio Driver Info

aplay -l
# list audio hardware devices. It can also play .wav files.
alsamixer
# Volume control settings

Input Device Utilities

xev
# X input events received by a selected window
xwininfo
# X window client properties of a selected window
xvkbd
# virtual X keyboard, can be used to send keyboard events with mouse or touchscreen, or even scripts. Works much better than gok (GNOME Onscreen Keyboard)
xset
# various properties for mouse acceleration, keyboard LED control, sleep timeout, etc. May have to deconflict / disable gnome-screensaver-preferences first if you actually want to use some of these, though.
