How to zip an entire directory with Python
I’m finally done with the Python project I’m working on and now just working on the build script. I want my build script to zip the entire build directory at the end so at first I got lazy and I figured I’d just call an external zip application (7-Zip) and let it do all the work. This would of course require the person doing the build to have 7-Zip installed, so I decided it’s probably better in the long run to write it completely in Python.
Python has a built-in module called zipfile that can do this. Zipping just one file is pretty straightforward, but zipping an entire directory is actually trickier than I thought. I figured at first I could probably just use os.listdir()and have a for loop to add each item in the os.listdir() output to the zip file. But it turned out this doesn’t return the full paths and doesn’t go through the subfolders. I then tried glob.glob(), which returns the full paths but likeos.listdir() it doesn’t do subfolders.
After a little bit of searching, I found out that there’s a function called os.walk() that traverses the entire folder structure and returns the paths of the root, subfolders, and files which is exactly what I needed. I had to play with it a bit to be able to do what I needed. For example, when I zip the folder, I want the first item in the zip archive to be the folder I specified to zip. In order to do this, I had to retrieve the paths relative to that folder. I also wanted empty subfolders to be included in the archive as I have empty folders in my project such as the logs and reports folders.
Below is what I was able to come up with. Again, I’m a Python newbie so probably not the most elegant implementation, so I’m open to any suggestions :D.
import zipfile
import sys
import os
def zip_folder(folder_path, output_path):
"""Zip the contents of an entire folder (with that folder included
in the archive). Empty subfolders will be included in the archive
as well.
"""
parent_folder = os.path.dirname(folder_path)
# Retrieve the paths of the folder contents.
contents = os.walk(folder_path)
try:
zip_file = zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED)
for root, folders, files in contents:
# Include all subfolders, including empty ones.
for folder_name in folders:
absolute_path = os.path.join(root, folder_name)
relative_path = absolute_path.replace(parent_folder + '\\',
'')
print "Adding '%s' to archive." % absolute_path
zip_file.write(absolute_path, relative_path)
for file_name in files:
absolute_path = os.path.join(root, file_name)
relative_path = absolute_path.replace(parent_folder + '\\',
'')
print "Adding '%s' to archive." % absolute_path
zip_file.write(absolute_path, relative_path)
print "'%s' created successfully." % output_path
except IOError, message:
print message
sys.exit(1)
except OSError, message:
print message
sys.exit(1)
except zipfile.BadZipfile, message:
print message
sys.exit(1)
finally:
zip_file.close()
## TEST ##
if __name__ == '__main__':
zip_folder(r'D:\[STORAGE]\Software\TrueCrypt',
r'D:\[STORAGE]\Software\TrueCrypt.zip')<br>Just to make sure that I zipped everything properly and nothing got corrupted, I extracted the zipped file generated by Python and compared the extracted files to the original files by computing the files’ MD5 checksums using Jacksum.