-
How to zip an entire directory with Python
Posted on January 22nd, 2011 4 comments
Print
I’m finally done with the Python project I’m working on and now just working on the build script. I want my build script to zip the entire build directory at the end so at first I got lazy and I figured I’d just call an external zip application (7-Zip) and let it do all the work. This would of course require the person doing the build to have 7-Zip installed, so I decided it’s probably better in the long run to write it completely in Python.Python has a built-in module called zipfile that can do this. Zipping just one file is pretty straightforward, but zipping an entire directory is actually trickier than I thought. I figured at first I could probably just use os.listdir() and have a for loop to add each item in the os.listdir() output to the zip file. But it turned out this doesn’t return the full paths and doesn’t go through the subfolders. I then tried glob.glob(), which returns the full paths but like os.listdir() it doesn’t do subfolders.
After a little bit of searching, I found out that there’s a function called os.walk() that traverses the entire folder structure and returns the paths of the root, subfolders, and files which is exactly what I needed. I had to play with it a bit to be able to do what I needed. For example, when I zip the folder, I want the first item in the zip archive to be the folder I specified to zip. In order to do this, I had to retrieve the paths relative to that folder. I also wanted empty subfolders to be included in the archive as I have empty folders in my project such as the logs and reports folders.
Below is what I was able to come up with. Again, I’m a Python newbie so probably not the most elegant implementation, so I’m open to any suggestions
.import zipfile import sys import os def zip_folder(folder_path, output_path): """Zip the contents of an entire folder (with that folder included in the archive). Empty subfolders will be included in the archive as well. """ parent_folder = os.path.dirname(folder_path) # Retrieve the paths of the folder contents. contents = os.walk(folder_path) try: zip_file = zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) for root, folders, files in contents: # Include all subfolders, including empty ones. for folder_name in folders: absolute_path = os.path.join(root, folder_name) relative_path = absolute_path.replace(parent_folder + '\\', '') print "Adding '%s' to archive." % absolute_path zip_file.write(absolute_path, relative_path) for file_name in files: absolute_path = os.path.join(root, file_name) relative_path = absolute_path.replace(parent_folder + '\\', '') print "Adding '%s' to archive." % absolute_path zip_file.write(absolute_path, relative_path) print "'%s' created successfully." % output_path except IOError, message: print message sys.exit(1) except OSError, message: print message sys.exit(1) except zipfile.BadZipfile, message: print message sys.exit(1) finally: zip_file.close() ## TEST ## if __name__ == '__main__': zip_folder(r'D:\[STORAGE]\Software\TrueCrypt', r'D:\[STORAGE]\Software\TrueCrypt.zip')Just to make sure that I zipped everything properly and nothing got corrupted, I extracted the zipped file generated by Python and compared the extracted files to the original files by computing the files’ MD5 checksums using Jacksum.
Original Files:
f0d4cd3efafcbb97f049fb3a425e34e1 D:\[STORAGE]\Software\TrueCrypt\Readme.txt 9216204150d0e043de519f6cec98436b D:\[STORAGE]\Software\TrueCrypt\TrueCrypt Setup.exe a8420e0e8705e8d9885b31e269b78973 D:\[STORAGE]\Software\TrueCrypt\truecrypt-4.2.zip cebde8cb7a7a1d36b6e21e7eb318e1d3 D:\[STORAGE]\Software\TrueCrypt\truecrypt-4.2.zip.sig 1120bf03bb58f5ccf70ee9aeea5e39c0 D:\[STORAGE]\Software\TrueCrypt\Setup Files\License.txt 4b09b87ec5dfd24f5170df7f3a1bc6b1 D:\[STORAGE]\Software\TrueCrypt\Setup Files\TrueCrypt Format.exe da8b0a2c15a318ee5b53d18db14932ee D:\[STORAGE]\Software\TrueCrypt\Setup Files\TrueCrypt User Guide.pdf 55a47e64aae6fd3989d32e5fc3a0e7dc D:\[STORAGE]\Software\TrueCrypt\Setup Files\truecrypt-x64.sys bba1d2723fc413865d70bab41281cacd D:\[STORAGE]\Software\TrueCrypt\Setup Files\TrueCrypt.exe 3c659b2973e6fb81f50eb0a79b8f5c0f D:\[STORAGE]\Software\TrueCrypt\Setup Files\truecrypt.sys --- Created with Jacksum 1.7.0, algorithm=md5
Extracted files from the zip file created by the Python code:
f0d4cd3efafcbb97f049fb3a425e34e1 D:\[STORAGE]\Software\from_python\TrueCrypt\Readme.txt 9216204150d0e043de519f6cec98436b D:\[STORAGE]\Software\from_python\TrueCrypt\TrueCrypt Setup.exe a8420e0e8705e8d9885b31e269b78973 D:\[STORAGE]\Software\from_python\TrueCrypt\truecrypt-4.2.zip cebde8cb7a7a1d36b6e21e7eb318e1d3 D:\[STORAGE]\Software\from_python\TrueCrypt\truecrypt-4.2.zip.sig 1120bf03bb58f5ccf70ee9aeea5e39c0 D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\License.txt 4b09b87ec5dfd24f5170df7f3a1bc6b1 D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\TrueCrypt Format.exe da8b0a2c15a318ee5b53d18db14932ee D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\TrueCrypt User Guide.pdf 55a47e64aae6fd3989d32e5fc3a0e7dc D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\truecrypt-x64.sys bba1d2723fc413865d70bab41281cacd D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\TrueCrypt.exe 3c659b2973e6fb81f50eb0a79b8f5c0f D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\truecrypt.sys --- Created with Jacksum 1.7.0, algorithm=md5
As you can see the MD5 checksums are identical
.4 responses to “How to zip an entire directory with Python”

-
Very nice post, ZipFile should have something like this included in the core tbh. Thank you!
-
@webmaster I’ve extended on this a little bit..
http://djangosnippets.org/snippets/2507/
# Inspired by:
# http://www.calazan.com/how-to-zip-an-entire-directory-with-python/comment-page-1/#comment-45304
#
# Heavily modified to support:
# – automatic choice between .pyc and .py
# – partially integrated with ZipFile class
#
# TODO:
# – Allow auto compile to /tmp when .pyc is not present.# usage
# create zip object
_zip_obj = _ZipFile(“filenamehere.txt”, ‘w’)# enable debug mode
_zip_obj.debug = debug# attempt to zip the dir
_zip_obj.writedir(
pathname = dirpath
)
Leave a reply
-

Cal Leeming [Simplicity Media Ltd] July 31st, 2011 at 04:32