Generating a signed URL for an Amazon S3 file using boto
Last week I was refactoring some code for EZ Exporter, a data exporter app for Shopify. Our customer base has been growing pretty steadily over the last few months, so I figured it was time to do some optimization to make sure the app is ready for future growth.
One of the things we needed to optimize is how we handle manual downloads. Initially, I kept it very simple: the view generated the data as a CSV file and streamed it directly to the client as part of the response. That works fine with a smaller user base, but we know it will eventually become a problem as we get more users. For example, if a user has a big report to generate, the request ties up a web worker while the report is being generated and again while the data is being downloaded.
To optimize this process, we now have a Celery task that generates the report in the background and a view that checks the status of that task. As soon as the user clicks the "Download" button, the task starts running asynchronously and the response is returned right away with the Celery task ID. The client then uses that ID to check the task status via periodic AJAX calls. Once the report is done, we write the file directly to S3 and generate a signed URL, which is returned to the user to start the download. From this point on, the user downloads directly from S3 via the signed private URL, so our web workers are no longer involved.
Here's sample code showing how we handle the S3 upload and generate the private download URL using boto (the code is written in Python 3 with boto 2):
import uuid
from io import BytesIO

from django.conf import settings

import boto
from boto.s3.key import Key


def download_file(data, output_filename):
    conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID,
                           settings.AWS_SECRET_ACCESS_KEY)
    bucket = conn.get_bucket(settings.AWS_BUCKET_NAME)

    # Store the file under a random UUID to avoid name conflicts in S3.
    k = Key(bucket)
    k.key = 'temp-downloads/{}'.format(uuid.uuid4().hex)
    k.set_contents_from_file(BytesIO(data.encode('utf-8')))

    # Signed URL that is valid for 60 seconds and overrides the response
    # headers so the browser saves the file as CSV under the given name.
    download_url = k.generate_url(
        expires_in=60,
        response_headers={
            'response-content-type': 'text/csv',
            'response-content-disposition': 'attachment; filename={}'.format(
                output_filename),
        }
    )
    ...
The main thing to look at here is the k.generate_url() method. As you can see, there's an expires_in parameter that lets you set how long the URL stays valid, in seconds. In this case, the signed URL is only valid for 60 seconds. This is intentional for additional security, in case someone gets hold of the signed URL and makes it publicly available. If the URL is accessed after that minute has passed, the requester gets an "Access Denied" message from AWS.
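To see the expiration at work, you can inspect the signed URL itself. The sketch below assumes the URL was signed with the older query-string scheme that boto 2 uses for S3 by default, where the expiration is carried in an `Expires` query parameter as a Unix timestamp. URLs signed with Signature Version 4 use different parameters and are not handled here. The helper name `seconds_until_expiry` is hypothetical.

```python
import time
from urllib.parse import parse_qs, urlparse


def seconds_until_expiry(signed_url, now=None):
    """Return the seconds left before the URL expires (negative if expired).

    Assumes an `Expires` query parameter holding a Unix timestamp.
    """
    query = parse_qs(urlparse(signed_url).query)
    if "Expires" not in query:
        raise ValueError("no Expires parameter found in URL")
    expires_at = int(query["Expires"][0])
    if now is None:
        now = time.time()
    return expires_at - now
```

A check like this is only a convenience for debugging or logging. S3 enforces the expiration on its side regardless of what the client does.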
You can also override the default response headers with your own. S3 supports this for Content-Type, Content-Language, Expires, Cache-Control, Content-Disposition, and Content-Encoding, each passed with a "response-" prefix. In the code above, we set the content type to "text/csv" and use a different filename from the one actually stored in S3. We store the files in S3 using UUIDs for the filenames so we don't have to worry about name conflicts.
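One thing to watch with the filename override is that user-supplied names can contain spaces, quotes, or non-ASCII characters. Here is a minimal sketch of a helper that builds the header dictionary with a quoted ASCII fallback plus an RFC 5987 style `filename*` parameter. The helper `build_response_headers` is hypothetical and not part of boto; the assumption is that its return value is passed as `response_headers` to `k.generate_url()`.

```python
from urllib.parse import quote


def build_response_headers(output_filename, content_type="text/csv"):
    """Build the header overrides for a signed download URL."""
    # ASCII fallback: non-ASCII characters become "?", and quotes and
    # backslashes are dropped so the quoted string stays well-formed.
    ascii_name = output_filename.encode("ascii", "replace").decode("ascii")
    ascii_name = ascii_name.replace('"', "").replace("\\", "")
    # Percent-encoded UTF-8 name for clients that understand filename*.
    encoded_name = quote(output_filename, safe="")
    disposition = "attachment; filename=\"{}\"; filename*=UTF-8''{}".format(
        ascii_name, encoded_name)
    return {
        "response-content-type": content_type,
        "response-content-disposition": disposition,
    }
```

For a plain name such as `orders.csv` the result behaves the same as the simpler format string in the original code, so this only matters if your filenames are not fully under your control.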