
Harddrive failure and monitoring

I have two Ubuntu servers running at home, both with large RAID volumes set up via mdadm. This summer I had a total disk failure in one of my RAID5 arrays, which luckily didn’t result in data loss. Thank you, RAID5!

In any case, it prompted me to write a script that logs SMART data to a MySQL database. I also wrote an admin webpage that displays this data in an easy-to-follow way. Both the monitoring script and the admin pages are written in php5. I used php5 because it communicates easily with MySQL and has the string manipulation functions I needed. It could probably be done just as easily in Python, though. The script runs as a cron job every 2 hours on the servers, and every hour on the desktops when they’re running. Examples of the code I’m using are attached below, including the cron-ed script and the code that generates the log output and plot.

What to look for?

Well, that is the big question. How do you know if a drive is about to fail? Google looked into this topic back in 2007 in the paper «Failure Trends in a Large Disk Drive Population». It is an interesting read if you are at all concerned about harddrive failure in servers. The results on how temperature affects the lifetime and failure rate of harddrives are especially interesting. It turns out, at least in their data, that low temperature isn’t such a good thing for the drives, contrary to what many people seem to assume. Up until now I have been concerned that my drives get too hot, but in fact they seem to be almost overcooled the way I have things set up now.

When I wrote these scripts this summer I decided to log primarily temperature and reallocated sector count, and those are what the log display scripts emphasize. After reading that paper, it seems I should be logging scan errors as well. The colour coding I use in the temperature plot below is loosely based on Figure 4 in the paper and reflects what seems to be the optimal operating temperature for harddrives.
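To illustrate the idea of colour coding by temperature band, here is a minimal Python sketch. It is not the PHP from my scripts, and the band edges and colour names are placeholders chosen for illustration only. They are not numbers read off Figure 4, so adjust them to whatever you take from the paper.

```python
# Band edges in degrees Celsius, checked in ascending order.
# ASSUMPTION: these edges and colours are illustrative placeholders,
# not values from the paper or from the original PHP script.
BANDS = [
    (30, "blue"),    # below 30: cooler than needed
    (45, "green"),   # 30 up to (not including) 45: treated as comfortable
    (50, "orange"),  # 45 up to (not including) 50: getting warm
]
HOT_COLOUR = "red"   # 50 and above


def temperature_colour(celsius):
    """Return the colour name of the first band whose upper edge exceeds celsius."""
    for upper_edge, colour in BANDS:
        if celsius < upper_edge:
            return colour
    return HOT_COLOUR
```

Keeping the bands in a list means the plot colours can be changed without touching the lookup logic.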

Screenshots
Screenshot 1: The overview page.
Screenshot 2: Details of one of the RAID-drives.
Screenshot 3: Details of one of the drives with reallocated sectors.

Code

The php source code I wrote is available in this file: hd-mon.tar.gz

It is designed specifically for my setup and hardware and probably isn’t universal, but it gives an idea of how I set things up. There are probably better ways of doing this. I simply call shell commands from php, run simple searches on the returned text to pick out the values I want, and insert those values into a MySQL database.
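The same call, parse and store approach can be sketched in Python. This is a rough illustration under stated assumptions, not a port of the PHP in the archive. It assumes the attribute table layout that `smartctl -A` prints, uses Python's built-in sqlite3 as a stand-in for MySQL so the example is self-contained, and the table name `smart_log` and the sample values are made up.

```python
import sqlite3
import subprocess

# Made-up sample in the layout of the attribute table from `smartctl -A`.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (Min/Max 21/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

# The two attributes this post focuses on.
WANTED = {"Reallocated_Sector_Ct", "Temperature_Celsius"}


def read_smart(device):
    """Run smartctl on a device and return its text output (usually needs root)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return result.stdout


def parse_smart(text):
    """Return {attribute_name: raw_value} for the attributes in WANTED."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column (extra text after it is ignored).
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WANTED:
            values[fields[1]] = int(fields[9])
    return values


def log_values(conn, device, values):
    """Store one row per attribute. sqlite3 stands in for MySQL here."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS smart_log ("
        "logged_at TEXT DEFAULT CURRENT_TIMESTAMP, "
        "device TEXT, attribute TEXT, raw_value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO smart_log (device, attribute, raw_value) VALUES (?, ?, ?)",
        [(device, name, raw) for name, raw in values.items()],
    )
    conn.commit()
```

On a real machine you would pass `read_smart("/dev/sda")` to `parse_smart` instead of the sample text. Not every drive reports the same attribute names, so expect to adjust `WANTED` for your hardware.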

I also included php snippets showing how the admin page is generated. These are not standalone php files; they need to be wrapped in a template. They do, however, reproduce what is seen in the screenshots above.

Packages needed for these scripts to run:

  • php5-cli for the php5 command line.
  • mysql-client, php5-mysql for the database connection.
  • smartmontools, which provides smartctl, to access the SMART-data.
  • mdadm to access RAID-data (assuming you use mdadm for RAIDs in the first place)

All are available in the Ubuntu repository.
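For the RAID side, a similar sketch in Python: read the text from `mdadm --detail` and look for a degraded state. Again this is an illustration and not the code in the archive. The sample output is abbreviated and made up, and the exact layout can differ between mdadm versions, so treat the field names as assumptions to check against your own output.

```python
import subprocess

# Abbreviated, made-up sample in the "key : value" layout of `mdadm --detail`.
SAMPLE_DETAIL = """\
/dev/md0:
        Version : 1.2
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 3
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
"""


def read_detail(array):
    """Run mdadm on an array such as /dev/md0 and return its text output (needs root)."""
    result = subprocess.run(["mdadm", "--detail", array],
                            capture_output=True, text=True)
    return result.stdout


def parse_detail(text):
    """Collect every 'key : value' line into a dict; other lines are skipped."""
    info = {}
    for line in text.splitlines():
        # Split on the first " : " only, so values containing colons survive.
        key, separator, value = line.partition(" : ")
        if separator:
            info[key.strip()] = value.strip()
    return info


def is_degraded(info):
    """True if the State field lists 'degraded' among its comma-separated flags."""
    states = [flag.strip() for flag in info.get("State", "").split(",")]
    return "degraded" in states
```

A cron-ed script could call `is_degraded(parse_detail(read_detail("/dev/md0")))` and log or mail the result alongside the SMART values.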

Ubuntu 11.04

So, a new version of Ubuntu is out, named Natty Narwhal, or just Natty for short. I skipped the last 2010 version, 10.10 (Maverick), as I had problems running Kile on it. I have upgraded my desktop and my laptop, and left my server and media centre running 10.04 (Lucid). My desktop is a custom-built computer from last year with an ASUS M4A89TD Pro mainboard, an AMD Phenom II six-core CPU and an XFX Radeon HD5770 graphics card. I had no issues at all installing 11.04 on it, nor did I have any problems with 10.04; only Windows 7 gave me problems. On my laptop (HP ProBook 4515s) I do have some funky issues with the LCD backlight. I also needed to add the option “nomodeset” to the grub loader; otherwise there were no problems there either.

11.04 comes with Firefox 4, which wasn’t in the Lucid repository when I upgraded, and LibreOffice has replaced OpenOffice.org. Otherwise I found all my regular software packages in working order, including Kile. I do, however, have audio sync issues in VLC player on my laptop that I haven’t resolved yet; it was fine on Lucid.

Natty comes with a new desktop as well, which I did not like at all, but you can get the old one back by choosing “ubuntu-classic” at login.

Harddrive Encryption

One thing I did on my laptop was install Ubuntu on a completely encrypted harddrive (completely except for /boot, that is). I have not tried this before, so I am not sure when this feature was implemented, but you need to download the alternate install CD to get these extended partitioning options unless you want to do this manually. There is a guided option, and what the installer does is create a small boot partition (I made mine 1 GB, but the wizard made it 256 MB) with an ext2 file system, and a logical partition spanning the rest of the space. That partition is then encrypted and handed over to LVM, where I split it into a swap and a root volume. With this method you only need to type the decryption passphrase once during boot.

Now the whole harddrive is encrypted with 256 bit AES encryption, which is good enough for the US government. I do however notice the demand on the CPU, most notably when playing an HD video file, when the data needs to be decrypted, decoded and played all at once. I don’t use my laptop for anything demanding, but it is worth noting that 256 bit encryption on the entire harddrive, including the system, is a strain on the CPU. You of course do not have to set the encryption to 256 bit.