How to zip an entire directory with Python
I’m finally done with the Python project I’m working on and now just working on the build script. I want my build script to zip the entire build directory at the end so at first I got lazy and I figured I’d just call an external zip application (7-Zip) and let it do all the work. This would of course require the person doing the build to have 7-Zip installed, so I decided it’s probably better in the long run to write it completely in Python.
Python has a built-in module called zipfile that can do this. Zipping just one file is pretty straightforward, but zipping an entire directory is actually trickier than I thought. I figured at first I could probably just use os.listdir()and have a for loop to add each item in the os.listdir() output to the zip file. But it turned out this doesn’t return the full paths and doesn’t go through the subfolders. I then tried glob.glob(), which returns the full paths but likeos.listdir() it doesn’t do subfolders.
After a little bit of searching, I found out that there’s a function called os.walk() that traverses the entire folder structure and returns the paths of the root, subfolders, and files which is exactly what I needed. I had to play with it a bit to be able to do what I needed. For example, when I zip the folder, I want the first item in the zip archive to be the folder I specified to zip. In order to do this, I had to retrieve the paths relative to that folder. I also wanted empty subfolders to be included in the archive as I have empty folders in my project such as the logs and reports folders.
Below is what I was able to come up with. Again, I’m a Python newbie so probably not the most elegant implementation, so I’m open to any suggestions :D.
import zipfile import sys import os def zip_folder(folder_path, output_path): """Zip the contents of an entire folder (with that folder included in the archive). Empty subfolders will be included in the archive as well. """ parent_folder = os.path.dirname(folder_path) # Retrieve the paths of the folder contents. contents = os.walk(folder_path) try: zip_file = zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) for root, folders, files in contents: # Include all subfolders, including empty ones. for folder_name in folders: absolute_path = os.path.join(root, folder_name) relative_path = absolute_path.replace(parent_folder + '\\', '') print "Adding '%s' to archive." % absolute_path zip_file.write(absolute_path, relative_path) for file_name in files: absolute_path = os.path.join(root, file_name) relative_path = absolute_path.replace(parent_folder + '\\', '') print "Adding '%s' to archive." % absolute_path zip_file.write(absolute_path, relative_path) print "'%s' created successfully." % output_path except IOError, message: print message sys.exit(1) except OSError, message: print message sys.exit(1) except zipfile.BadZipfile, message: print message sys.exit(1) finally: zip_file.close() ## TEST ## if __name__ == '__main__': zip_folder(r'D:\[STORAGE]\Software\TrueCrypt', r'D:\[STORAGE]\Software\TrueCrypt.zip')<br>
Just to make sure that I zipped everything properly and nothing got corrupted, I extracted the zipped file generated by Python and compared the extracted files to the original files by computing the files’ MD5 checksums using Jacksum.