Share the Knowledge
RSS icon Home icon
  • How to zip an entire directory with Python

    Posted on January 22nd, 2011 webmaster 4 comments         Print Print

    I’m finally done with the Python project I’m working on and now just working on the build script.  I want my build script to zip the entire build directory at the end so at first I got lazy and I figured I’d just call an external zip application (7-Zip) and let it do all the work.  This would of course require the person doing the build to have 7-Zip installed, so I decided it’s probably better in the long run to write it completely in Python.

    Python has a built-in module called zipfile that can do this.  Zipping just one file is pretty straightforward, but zipping an entire directory is actually trickier than I thought.  I figured at first I could probably just use os.listdir() and have a for loop to add each item in the os.listdir() output to the zip file.  But it turned out this doesn’t return the full paths and doesn’t go through the subfolders.  I then tried glob.glob(), which returns the full paths but like os.listdir() it doesn’t do subfolders.

    After a little bit of searching, I found out that there’s a function called os.walk() that traverses the entire folder structure and returns the paths of the root, subfolders, and files which is exactly what I needed.  I had to play with it a bit to be able to do what I needed.  For example, when I zip the folder, I want the first item in the zip archive to be the folder I specified to zip.  In order to do this, I had to retrieve the paths relative to that folder.  I also wanted empty subfolders to be included in the archive as I have empty folders in my project such as the logs and reports folders.

    Below is what I was able to come up with.  Again, I’m a Python newbie so probably not the most elegant implementation, so I’m open to any suggestions :D .

    import zipfile
    import sys
    import os
    
    def zip_folder(folder_path, output_path):
        """Zip the contents of an entire folder (with that folder included
        in the archive). Empty subfolders will be included in the archive
        as well.
        """
        parent_folder = os.path.dirname(folder_path)
        # Retrieve the paths of the folder contents.
        contents = os.walk(folder_path)
        try:
            zip_file = zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED)
            for root, folders, files in contents:
                # Include all subfolders, including empty ones.
                for folder_name in folders:
                    absolute_path = os.path.join(root, folder_name)
                    relative_path = absolute_path.replace(parent_folder + '\\',
                                                          '')
                    print "Adding '%s' to archive." % absolute_path
                    zip_file.write(absolute_path, relative_path)
                for file_name in files:
                    absolute_path = os.path.join(root, file_name)
                    relative_path = absolute_path.replace(parent_folder + '\\',
                                                          '')
                    print "Adding '%s' to archive." % absolute_path
                    zip_file.write(absolute_path, relative_path)
            print "'%s' created successfully." % output_path
        except IOError, message:
            print message
            sys.exit(1)
        except OSError, message:
            print message
            sys.exit(1)
        except zipfile.BadZipfile, message:
            print message
            sys.exit(1)
        finally:
            zip_file.close()
    
    ## TEST ##
    if __name__ == '__main__':
        zip_folder(r'D:\[STORAGE]\Software\TrueCrypt',
                   r'D:\[STORAGE]\Software\TrueCrypt.zip')
    

    Just to make sure that I zipped everything properly and nothing got corrupted, I extracted the zipped file generated by Python and compared the extracted files to the original files by computing the files’ MD5 checksums using Jacksum.

    Original Files:

    f0d4cd3efafcbb97f049fb3a425e34e1 D:\[STORAGE]\Software\TrueCrypt\Readme.txt
    9216204150d0e043de519f6cec98436b D:\[STORAGE]\Software\TrueCrypt\TrueCrypt Setup.exe
    a8420e0e8705e8d9885b31e269b78973 D:\[STORAGE]\Software\TrueCrypt\truecrypt-4.2.zip
    cebde8cb7a7a1d36b6e21e7eb318e1d3 D:\[STORAGE]\Software\TrueCrypt\truecrypt-4.2.zip.sig
    1120bf03bb58f5ccf70ee9aeea5e39c0 D:\[STORAGE]\Software\TrueCrypt\Setup Files\License.txt
    4b09b87ec5dfd24f5170df7f3a1bc6b1 D:\[STORAGE]\Software\TrueCrypt\Setup Files\TrueCrypt Format.exe
    da8b0a2c15a318ee5b53d18db14932ee D:\[STORAGE]\Software\TrueCrypt\Setup Files\TrueCrypt User Guide.pdf
    55a47e64aae6fd3989d32e5fc3a0e7dc D:\[STORAGE]\Software\TrueCrypt\Setup Files\truecrypt-x64.sys
    bba1d2723fc413865d70bab41281cacd D:\[STORAGE]\Software\TrueCrypt\Setup Files\TrueCrypt.exe
    3c659b2973e6fb81f50eb0a79b8f5c0f D:\[STORAGE]\Software\TrueCrypt\Setup Files\truecrypt.sys
    
    ---  
    Created with Jacksum 1.7.0, algorithm=md5

    Extracted files from the zip file created by the Python code:

    f0d4cd3efafcbb97f049fb3a425e34e1 D:\[STORAGE]\Software\from_python\TrueCrypt\Readme.txt
    9216204150d0e043de519f6cec98436b D:\[STORAGE]\Software\from_python\TrueCrypt\TrueCrypt Setup.exe
    a8420e0e8705e8d9885b31e269b78973 D:\[STORAGE]\Software\from_python\TrueCrypt\truecrypt-4.2.zip
    cebde8cb7a7a1d36b6e21e7eb318e1d3 D:\[STORAGE]\Software\from_python\TrueCrypt\truecrypt-4.2.zip.sig
    1120bf03bb58f5ccf70ee9aeea5e39c0 D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\License.txt
    4b09b87ec5dfd24f5170df7f3a1bc6b1 D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\TrueCrypt Format.exe
    da8b0a2c15a318ee5b53d18db14932ee D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\TrueCrypt User Guide.pdf
    55a47e64aae6fd3989d32e5fc3a0e7dc D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\truecrypt-x64.sys
    bba1d2723fc413865d70bab41281cacd D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\TrueCrypt.exe
    3c659b2973e6fb81f50eb0a79b8f5c0f D:\[STORAGE]\Software\from_python\TrueCrypt\Setup Files\truecrypt.sys
    
    ---  
    Created with Jacksum 1.7.0, algorithm=md5

    As you can see the MD5 checksums are identical :D .

     

    4 responses to “How to zip an entire directory with Python” RSS icon


    Leave a reply