Django and AWS: Handling thousands of user uploaded files (MUN Store)
With the launch of our new position paper system, we're expecting thousands of documents to be uploaded by delegates from all over the world. Our systems need to be able to handle the last-minute rush of delegates uploading files, store all of them and ensure they are always available throughout the conference.
Based on this, we decided that it was time to migrate our media files, and static files whilst we're at it, to separate storage. The key benefits of storing media files separately are:
- Page loads are faster, as our web servers no longer have to serve media files.
- Our media storage can scale independently of our web servers.
- We can introduce multiple web servers as they can all connect to the same media storage.
There were numerous ways of going about this. Based on cost, elasticity and ease-of-integration, I decided to go with Amazon Web Services' S3 to handle our media files.
For Django, the django-storages plugin is exactly what I needed to integrate our static and media files. To install:
```shell
pip install django-storages
```
Whilst there are many backends available, the boto3 backend is the one I needed, as it integrates with AWS S3.
Before configuring the plugin, the S3 bucket had to be set up on AWS. For the most part, it was relatively straightforward. A few things that would have been useful to know from the start:
- If you are planning to use the bucket with your own domain, ensure to set the name of the bucket to the domain you will be using (e.g. media.modelun.co).
- Make sure to set up your CORS configuration; the default config should do.
- Do not set all of the files to public; instead, set a bucket policy that allows public read access. Doing this ensures that directories are not listable.
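For reference, a minimal public-read bucket policy looks something like this (assuming the bucket is named after the domain as above; substitute your own bucket name):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::media.modelun.co/*"
    }
  ]
}
```

Because the policy only grants `s3:GetObject`, listing the bucket's contents is still denied, which is what keeps directories from being enumerable.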
I did not have to sort out any internal AWS permissions because we use a different provider for our web servers, but you may have to.
One issue that came up was that the bucket would not serve files over HTTPS on our own domain. This was unexpected, but easily resolved by setting up CloudFront to serve the files over HTTPS. I then pointed the CNAME record at the CloudFront distribution, rather than at the S3 bucket, which sorted out our HTTPS requirement.
With the bucket set up, the next step was to configure Django to use and link back to the new media storage. This is all handled by setting variables in the settings.py files. The django-storages documentation has everything you need to know about the variables, but I'll highlight two:
`AWS_LOCATION` is useful if you want to easily split up your bucket, as it prepends the specified directory to every file path.

`DEFAULT_FILE_STORAGE` and `STATICFILES_STORAGE` can both be set to `'storages.backends.s3boto3.S3Boto3Storage'` to hook everything up to S3. However, media and static files will not be separated this way; they will all be stored in the root of the bucket, which, for obvious reasons, is a very bad idea. Instead, custom storage classes can be created to ensure the media and static files are stored in separate directories.
Dan Poirier handles this well by creating a `custom_storages.py` file in the base directory of the project with this:
```python
from django.conf import settings
from storages.backends.s3boto3 import S3Boto3Storage


class StaticStorage(S3Boto3Storage):
    location = settings.STATICFILES_LOCATION


class MediaStorage(S3Boto3Storage):
    location = settings.MEDIAFILES_LOCATION
```
These can then be used by setting `STATICFILES_STORAGE` to `'custom_storages.StaticStorage'` and `DEFAULT_FILE_STORAGE` to `'custom_storages.MediaStorage'`.
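Putting it together, the relevant section of `settings.py` might look something like the following sketch. The keys, bucket name and domain are placeholder values; in production, the keys should come from environment variables rather than being hard-coded:

```python
# settings.py (sketch; keys, bucket name and domain are placeholder values)
AWS_ACCESS_KEY_ID = 'YOUR-ACCESS-KEY-ID'
AWS_SECRET_ACCESS_KEY = 'YOUR-SECRET-ACCESS-KEY'
AWS_STORAGE_BUCKET_NAME = 'media.modelun.co'
AWS_S3_CUSTOM_DOMAIN = 'media.modelun.co'  # the CloudFront/CNAME domain

# Directories within the bucket, consumed by the custom storage classes
STATICFILES_LOCATION = 'static'
MEDIAFILES_LOCATION = 'media'

# Hook static and media files up to the custom storage classes
STATICFILES_STORAGE = 'custom_storages.StaticStorage'
DEFAULT_FILE_STORAGE = 'custom_storages.MediaStorage'

# URLs Django will emit for static and media files
STATIC_URL = 'https://%s/%s/' % (AWS_S3_CUSTOM_DOMAIN, STATICFILES_LOCATION)
MEDIA_URL = 'https://%s/%s/' % (AWS_S3_CUSTOM_DOMAIN, MEDIAFILES_LOCATION)
```

With `AWS_S3_CUSTOM_DOMAIN` set, the URLs Django generates point at CloudFront rather than the default S3 endpoint.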
With everything set up, I then had to migrate all of our existing files to our S3 bucket. The AWS CLI (`pip install awscli`, then `aws configure`) came in handy here, more specifically the following command:
```shell
aws s3 sync ./media s3://{bucket name here}/media
```
Static files can be migrated, and updated in the future, by running `python manage.py collectstatic`, exactly the same as before.
It is definitely worth testing this on a staging server before deploying to production. It turns out our WYSIWYG text editor broke during the migration and I had to replace it as it was incompatible with our new media storage.
And that's it! I'm very happy with the outcome of migrating our media storage to AWS S3. We're now in a much better position to handle large numbers of files for our conferences.