Tuesday, 29 January 2013

dpkg MD5 checksums

My OpenOffice installation stopped working a few days ago after I Changed Nothing (tm) [1], so one avenue of investigation was to check for unexplained changes to installed files. That had happened to me once before [2], back when I worked at Prism, so "obviously" I felt I should check out that possibility again:

$ cd / && md5sum -c --quiet /var/lib/dpkg/info/*.md5sums

After much disk grinding (sometimes I'm sure I'm about to see a puff of hard disk powder come out of the fan exhaust), what seems to be a smoking gun:

usr/bin/gnuplot: FAILED
md5sum: WARNING: 1 of 45 computed checksums did NOT match

This is interesting! So I downloaded the deb for gnuplot-x11, unpacked it manually (with binutils' ar), and found the same "wrong" checksum. A friend repeated the procedure and got the same "wrong" checksum, so I stopped suspecting a fancy worm/virus that infects new gnuplot binaries as they appear on the filesystem.
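Incidentally, globbing over every list at once makes it hard to tell which package a FAILED line came from, and the paths in the lists are relative to the filesystem root. Here is a minimal sketch that checks each list separately and prints only the lists with at least one problem. The function name check_md5sums and its ROOT argument are my own inventions for illustration, not part of dpkg.

```shell
# check_md5sums ROOT LIST...
# Hypothetical helper: verify each md5sums list against the tree under ROOT
# and print the name of every list that has a mismatch or a missing file.
# LIST paths must be absolute, because the check runs after cd'ing to ROOT.
check_md5sums() {
    root=$1
    shift
    for list in "$@"; do
        # Run in a subshell so the cd does not leak into the caller.
        ( cd "$root" && md5sum -c --quiet "$list" ) >/dev/null 2>&1 \
            || printf '%s\n' "$list"
    done
}
```

On a real system this would be called as check_md5sums / /var/lib/dpkg/info/*.md5sums, and the basename of each printed list names the package it belongs to.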

It turns out that these mismatches involve packages whose preinst scripts "divert" files (via dpkg-divert): a diverted file is installed under a different name, which invalidates the naive checksum. The diverted files are still around, but their names no longer match the paths in the lists of MD5 checksums.
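To make the renaming concrete, here is a rough sketch of how a checker could follow diversions. It assumes the line format that dpkg-divert --list prints ("diversion of FROM to TO by PACKAGE", or "local diversion of FROM to TO"), and that paths contain no spaces. The function name resolve_diverted is made up, and the gnuplot diversion in the usage note is only illustrative data; this is not how debsums itself is implemented.

```shell
# resolve_diverted PKG PATH
# Hypothetical helper: read `dpkg-divert --list`-style lines on stdin and
# print where PKG's copy of PATH lives on disk. A diversion does not apply
# to the package that registered it, so that package keeps the original name.
resolve_diverted() {
    awk -v pkg="$1" -v path="$2" '
        # Package diversion: "diversion of FROM to TO by PACKAGE"
        $1 == "diversion" && $3 == path && $NF != pkg {
            print $5; found = 1; exit
        }
        # Local diversion: "local diversion of FROM to TO"
        $1 == "local" && $4 == path {
            print $6; found = 1; exit
        }
        # No matching diversion: the file is where the list says it is.
        END { if (!found) print path }
    '
}
```

Usage would look like dpkg-divert --list | resolve_diverted gnuplot-nox /usr/bin/gnuplot. Note that the md5sums lists omit the leading slash, so a real checker would have to add it before the lookup and strip it again afterwards.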

And that's where laziness bit me in the behind: by the time I started on my wild goose chase I already knew that debsums(1) verifies checksums, but since I didn't have it installed and felt too lazy to install it, I decided to just run the checksum files through md5sum(1). After all that effort to get an explanation for these mismatched checksums, I installed debsums(1) anyway and discovered that it knows how to follow diversions!
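For the record, a small sketch of what I should have run in the first place. The wrapper name check_installed is hypothetical; the only debsums feature it relies on is the --silent option, which limits the report to errors.

```shell
# check_installed [PACKAGE...]
# Hypothetical wrapper: verify installed files with debsums if it is
# available. With no arguments debsums checks every installed package.
check_installed() {
    if ! command -v debsums >/dev/null 2>&1; then
        echo "debsums is not installed (on Debian: apt-get install debsums)" >&2
        return 127
    fi
    # --silent: report only errors instead of one line per file
    debsums --silent "$@"
}
```

Calling check_installed gnuplot-x11 checks one package, and calling it with no arguments checks everything, which grinds the disk about as much as the md5sum run did.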

So now I'm back to wanting to know why OpenOffice stopped working.

[1] I upgraded google-chrome, but that update involved only its own package. ooffice seemed to stop working after I tried to open some document that caused it to crash, but I no longer recall the exact sequence of events.

[2] It was almost ten years ago when gethostbyname(3) or some nearby interface seemed to stop working. Suddenly no programs could connect to anything on the Internet anymore. After a bit of bug-chasing I noticed that libc's contents had changed. I don't remember what led me to check that with rpm, but I did. I must have suspected cosmic rays, because I made a copy of libc before rebooting, in order to freeze the corrupted memory contents onto stable storage. Sure enough, after the reboot libc was fine (clearly having been reloaded from the uncorrupted copy on disk), and a diff of a hexdump showed some six bytes that differed, right inside gethostbyname(3). To this day I don't know how I might have forced the kernel to re-read what must have been a very frequently-accessed page.
