Saturday, April 21, 2007

Nerd Food: A Feisty Fawn Indeed!

Like many nerds, I've been looking forward to Feisty Fawn, the latest incarnation of my favourite distro Ubuntu. Unfortunately, due to some incredibly bad internet access, was unable to take the plunge for more than two days after the release. You can imagine the pain. This morning I decided that enough was enough and searched for options. Just as luck would have it, I found out that there are no download limitations in the local friendly internet cafe and laptops are welcome. Perfect.

Like any good ubuntero, I started by reading posts. Although I keep to the manly maxim of "never read manuals", I've learned a long time ago that it doesn't apply to Linux. I think it was after a mega-botched Red Hat 6.0 upgrade. Quick skimming through the release notes and the millions of Fawn related posts revealed that:

1. You must make sure Edgy is up to date.
2. There are some potential problems with the new version of the ATA driver.

The first one was a major bummer. After four months in Africa, all my machines were massively out of date. I had 400 meg worth of packages to download and install. But since there wasn't an option, I went for it. Whilst waiting for the update, I investigated 2. It appears to be well documented:

#84603: Hard disk I/O randomly freezes when hald is running and optical drive is empty
#107271: Acer 9410 ATA hangs, system freezes

A quick perusal of the bug reports showed that there were known good workarounds, which I saved on a local file. Nothing else showed up on my quick review so, once the machine was up to date, it was time to get Fawned. Or not.

The update manager dutifully informed me that my machine was up to date, showed a "Upgrade to Fawn" little button (as per documentation) and told me I needed to reboot the box in order to complete the updates. It seemed sensible to reboot the box first. After rebooting, and moaning about the fact that my framebuffer theme is still black and gray instead of the shinny normal one (don't know what happened there, but has been that way since Edgy has been released. Tried manually changing the theme and everything but to no avail), I logged into my Gnome desktop.

But when I restarted update-manager, I was greeted by a surprise: the "Upgrade to Fawn" button was gone! Totally, inexplicably, inextricably gone. In addition, there was a new version of the update-manager, 0.45.3. Posts I had read previously indicated that 0.45.2 was the minimum required one, but latest is always the greatest - and the button was missing! - so I upgraded. I then restarted update-manager but the button was still missing.

Remembering Dapper - when one had to use some command line switches to get the distro level updates going - I went down to the command line. I tried all sorts of combinations:

$ update-manager -c
$ update-manager -c -d
...

But the button just didn't want to return. Then I remembered I read somewhere that update-manager was written in python. Being a C++ sort of chap, I know nothing about python and avoid high-level like the plague, but these were desperate times. Source code is always source code right? A bit of browsing revealed the "binary" was located in /usr/bin/update-manager. A bit of finding revealed the "packages" (not sure how pythonians refer to them) lived under /usr/lib/python2.4/site-packages/UpdateManager.

After lots of debug statements (which in python are, apparently, print followed by a printf like syntax), I eventually located the culprit in MetaRelease.py (in the core directory). In the download def (function?), we were trying to GET but never got (apologies for the lack of identation from here onwards, but I'm not a blogger/HTML wizard):


try:
print "here"
uri=urllib2.urlopen(req)
print "after"

Alas, there was never an after. I checked the revision control. There had been a couple of changes since the last release, and they were connected to fixing a bug.

# make sure that we always get the latest file (#107716)
req.add_header("Cache-Control", "No-Cache")
req.add_header("Pragma", "no-cache")

Thinking that this perhaps was damaging the HTTP envelope, I first removed the lines. No luck. I then made a simple test program just to see if the code would work in isolation:

import urllib2
import os
import time

print "test"

req = urllib2.Request("http://changelogs.ubuntu.com/meta-release")

# make sure that we always get the latest file (#107716)
req.add_header("Cache-Control", "No-Cache")
req.add_header("Pragma", "no-cache")

if os.access("/var/lib/update-manager/meta-release", os.R_OK):
print "access ok"
lastmodified = os.stat("/var/lib/update-manager/meta-release").st_mtime
if lastmodified > 0:
req.add_header("If-Modified-Since", time.asctime(time.gmtime(lastmodified)))
try:
print lastmodified
uri=urllib2.urlopen(req)
print "here"
f=open("/home/phase/test","w+")
for line in uri.readlines():
f.write(line)
f.flush()
f.seek(0,0)
uri.close()
print "here"
except:
print "Unexpected error:", sys.exc_info()[0]
raise

When I ran this little program, it appeared to grind to a halt; several seconds of wait produced no "here". Clearly something was wrong. But loading the page in Epiphany worked fine! What was going on here?!

I installed tshark, the latest iteration of tethereal (after all, you don't really want people to see you using a packet sniffer on a web cafe, right? a text mode one is easier to conceal...), and configured it listen to my traffic:

tshark -V -i eth0 host 82.211.81.132 > out.txt

I obtained the IP address by pinging changelogs.ubuntu.com. Although the site does not return pings, all I was interested in was the address resolution so that's all good. What I didn't notice at the time, bogged down as I was with my multitasking, was that the address resolution took a long time. And so had the epiphany page loading. The penny would have dropped then, had I only paid attention.

Onwards. A few seconds of wait and nothing got written to my file, so I killed and restarted the process. Eventually, I decided to download the CD ISO, which is apparently another way of upgrading. But without noticing, I had left my code running. A few minutes later I went back to emacs and surprise, surprise, lots of packets filled my out.txt. For some random reason, the GET was being delayed quite a lot. Quick confirmation was done using time and curl:

$ time curl http://changelogs.ubuntu.com/meta-release

real 0m24.322s
user 0m0.024s
sys 0m0.016s

By George! Slow network? But my ISO download was flying at over 100 kbytes! But nevertheless, things are actually working right? So if I leave update-manager running for a bit, it should eventually figure out that the HTTP data has been delivered and present me with a lovely button I can click, right? After all, the check() def is being run on a timer (Performance Federico would have loved this, after the outcry with poll()...):

gobject.timeout_add(1000,self.check)

A few prints later demonstrated that, whilst I could get my simple python script to work by waiting long enough, update-manager just didn't want to know. This is probably because the check() is done on a separate thread, but at this point I had more than exhausted my non-existent python knowledge. So the trick was to find out what was causing the network delays.

I save you the suspense: DNS resolution is the culprit. Repeating the same exercise with the IP address improved things a little bit:

$ time curl http://82.211.81.132/meta-release

real 0m0.133s
user 0m0.016s
sys 0m0.004s

I then reinstalled update-manager to make sure all my changes didn't break it beyond recovery, and did a dirty, dirty hack in MetaRelease.py, replacing

METARELEASE_URI = "http://changelogs.ubuntu.com/meta-release"

with

METARELEASE_URI = "http://82.211.81.132/meta-release"

Lo and behold, the button was back. But will the upgrade work? Only time will tell. At any rate, the ISO has finished downloading.

Bugs will be filled, but that will have to wait for my next session at the internet cafe. After all, today's bill will be a hefty one...

Update: Bug had already been reported (#108541).