So, let's assume that you are given the task of downloading a file with Python, given the URL and the name of the file on disk. You may want to write the following code and hope that you don't have to add any error handling because (as you think) all errors that can happen are either network errors or file write errors, and both kinds already raise exceptions for you.
#!/usr/bin/python
import urllib2
import sys
import socket

def download(url, fname):
    net = urllib2.urlopen(url)
    f = open(fname, "wb")
    # Copy the response body to disk in 4 KiB chunks until EOF
    while True:
        data = net.read(4096)
        if not data:
            break
        f.write(data)
    net.close()
    f.close()

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print "Usage: download.py URL filename"
        sys.exit(1)
    url = sys.argv[1]
    fname = sys.argv[2]
    # Give up on sockets that stay silent for 30 seconds
    socket.setdefaulttimeout(30)
    download(url, fname)
Indeed, this code downloads existing files via HTTP just fine. It also produces sensible tracebacks for non-existent hosts, 404 errors, full-disk situations, and socket timeouts. So, it looks like the result of calling the download() function is either a successfully downloaded file or an exception that the rest of the application will likely be able to deal with.
But actually, it only looks like this. Consider a situation where the HTTP server closes the connection gracefully at the TCP level, but prematurely. You can test this by starting your own Apache web server, putting a large file there, and calling "apache2ctl restart" while the client is downloading the file. Result: an incompletely downloaded file, and no exception.
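If restarting Apache is inconvenient, the same situation can probably be imitated with a tiny throwaway server. The following is only a sketch, written for Python 3 rather than the Python 2 used in the rest of this post; the function name serve_truncated_once and its parameters are made up for illustration. It answers a single request, advertises a Content-Length that is larger than the body it actually sends, and then closes the connection normally.

```python
import socket
import threading


def serve_truncated_once(advertised=1000, sent=100):
    """Start a one-shot HTTP server on a free local port.

    It claims `advertised` bytes in Content-Length but sends only `sent`
    bytes before closing the connection. Returns (port, thread).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def handle():
        conn, _ = srv.accept()
        with conn:
            conn.recv(65536)  # read (and ignore) the request
            head = ("HTTP/1.0 200 OK\r\n"
                    "Content-Length: %d\r\n"
                    "\r\n" % advertised).encode("ascii")
            # Send the headers plus a body that is shorter than promised
            conn.sendall(head + b"x" * sent)
        # Leaving the "with" block closes the connection gracefully
        srv.close()

    t = threading.Thread(target=handle, daemon=True)
    t.start()
    return port, t
```

Pointing a downloader at http://127.0.0.1:PORT/ (with the port returned above) should then leave a 100-byte file on disk even though 1000 bytes were announced. Whether the client library complains about this on its own may depend on the library and its version.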
I don't know whether this should be considered a bug in urllib2 or in the example download() function above. After all, urllib2 could have noticed the mismatch between the number of bytes received before EOF and the value of the Content-Length HTTP header.
Here is a version of the download() function that detects incomplete downloads based on the Content-Length header:
def download(url, fname):
    net = urllib2.urlopen(url)
    # Remember what the server promised; "" if the header is absent
    contentlen = net.info().get("Content-Length", "")
    f = open(fname, "wb")
    datalen = 0
    while True:
        data = net.read(4096)
        if not data:
            break
        f.write(data)
        datalen += len(data)
    net.close()
    f.close()
    # A missing or malformed header means there is nothing to compare with
    try:
        contentlen = int(contentlen)
    except ValueError:
        contentlen = None
    if contentlen is not None and contentlen != datalen:
        raise urllib2.URLError("Incomplete download")
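For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the same check might look roughly like the sketch below. This is an adaptation, not the code from this post: the helper expected_length and the opener parameter are hypothetical additions, the latter only so that the function can be exercised without a network. The idea is unchanged: count the bytes actually written and compare the total with the Content-Length header when one is present.

```python
import urllib.error
import urllib.request


def expected_length(header_value):
    """Parse a Content-Length value; return None if absent or malformed."""
    try:
        return int(header_value)
    except (TypeError, ValueError):
        return None


def download(url, fname, opener=urllib.request.urlopen):
    """Save url to fname; raise URLError if the size differs from Content-Length."""
    datalen = 0
    # Both the response and the file are closed even if an error occurs
    with opener(url) as net, open(fname, "wb") as f:
        contentlen = expected_length(net.headers.get("Content-Length"))
        while True:
            data = net.read(4096)
            if not data:
                break
            f.write(data)
            datalen += len(data)
    # Without a usable header there is nothing to compare against
    if contentlen is not None and contentlen != datalen:
        raise urllib.error.URLError("Incomplete download")
```

Note that the partial file is left on disk when the check fails, just as in the Python 2 version; a caller that cares would have to remove it. Responses without a Content-Length header (for example, chunked ones) pass through unchecked.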