Django Files — A Short Talk


Page 1: Django Files — A Short Talk

Files @jaylett

Files sound like a dry subject. This is going to be a struggle to keep everyone’s attention after Amber’s talk. You’re all busy thinking about things you can build, and I have to get you to think about files for an hour.

Page 2: Django Files — A Short Talk

Files

in an exciting adventure with dinosaurs

Page 3: Django Files — A Short Talk

Files

in an exciting adventure with dinosaurs

Page 4: Django Files — A Short Talk

Files

a brief talk

Files make up less than 1% of the Django codebase, and perhaps 2% of the tests. How big can it be? This is a talk in 8 parts.

Page 5: Django Files — A Short Talk

TZ=CET ls -ltr talk/
-rwxr--r-- 1 jaylett django  6 6 Nov 14:01 Files
-rwxr--r-- 1 jaylett django  5 6 Nov 14:02 Files and HTTP
-rwxr--r-- 1 jaylett django 15 6 Nov 14:04 Files in the ORM
-rwxr--r-- 1 jaylett django 13 6 Nov 14:08 Storage backends
-rwxr--r-- 1 jaylett django 25 6 Nov 14:20 Static files
-rwxr--r-- 1 jaylett django  8 6 Nov 14:35 Form media
-rwxr--r-- 1 jaylett django 29 6 Nov 14:40 Asset pipelines
-rwxr--r-- 1 jaylett django  6 6 Nov 14:55 What next?

It’s quite a lot, and I haven’t managed it in under an hour so far, so get ready, strap in, and grab your favourite dinosaur.

Page 6: Django Files — A Short Talk

Files

The programme says this is about static files, but there’s some other stuff I want to cover quickly first. Although if anyone wants an hour long talk on the rest of Django’s file support, let me know. Or a two hour one. Or five.

Page 7: Django Files — A Short Talk

Python files

Python files are an abstraction built out of abstract base classes.
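To make that concrete with nothing but the standard library: the file objects you get from `open()`, and the in-memory streams, all register against the `io` module's abstract base classes. A small sketch (the `io_abcs` helper is just for illustration):

```python
import io
import os

IO_ABCS = (io.IOBase, io.RawIOBase, io.BufferedIOBase, io.TextIOBase)


def io_abcs(f):
    """Names of the io abstract base classes that a file-like object satisfies."""
    return [abc.__name__ for abc in IO_ABCS if isinstance(f, abc)]


# Unbuffered binary, buffered binary and text files, plus in-memory streams.
with open(os.devnull, "rb", buffering=0) as raw, \
        open(os.devnull, "rb") as buffered, \
        open(os.devnull, "r") as text:
    for f in (raw, buffered, text, io.BytesIO(b"bytes"), io.StringIO("text")):
        print(type(f).__name__, io_abcs(f))
```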

Page 8: Django Files — A Short Talk

Django files

Django returns the favour. Hang on; this is the small version.

Page 9: Django Files — A Short Talk

Django files

Awesome!

Django adds a few utilities to Python files, like chunked reading for HTTP, line iteration, stuff like that. It sounds awesome. As in inspiring awe, which comes from the Old English word for “terror”.

Page 10: Django Files — A Short Talk

The File family

• File — or ImageFile, if it might be an image

• ContentFile / SimpleUploadedFile in tests

• which have a different parameter order

You have to guess in advance if something might be an image. Testing code isn’t particularly pleasant.

But, you know…it works.

Page 11: Django Files — A Short Talk

• #10541 cannot save file from a pipe

This is Deinonychus antirrhopus. He’ll be introducing bugs as we come across them.

This isn’t a terribly interesting one. Generally I won’t bother mentioning bugs unless they’re interesting or appalling.

Page 12: Django Files — A Short Talk

Files and HTTP

HTTP handling of uploads is managed by a File specialisation called UploadedFile.

Page 13: Django Files — A Short Talk

UploadedFile

• “behaves somewhat like a file object”

• temporary file and memory variants

• custom upload handlers

The HTTP layer doesn’t know how to deal with memory buffering or file buffering of inbound entities…the File layer does. Choosing whether to handle HTTP uploads in memory or not is done by asking a series of handlers in turn if they want to try. And you can add your own handlers, for which you’d need to do the same dance if you want to support both in-memory and on-disc.

Page 14: Django Files — A Short Talk

forms.FileField
# forms-filefield.py

from django import forms
from django.http import HttpResponseRedirect
from django.shortcuts import render

class FileForm(forms.Form):
    uploaded = forms.FileField()

def upload(request):
    if request.method == 'POST':
        form = FileForm(request.POST, request.FILES)
        if form.is_valid():
            request.FILES['uploaded']  # do something!
            return HttpResponseRedirect('/next/')
    else:
        form = FileForm()
    return render(
        request,
        'upload.html',
        {'form': form},
    )

This is taken from the documentation. Note that we create a form field for handling the file, then we ignore it and use the HTTP layer directly, so that was totally worth it.

Page 15: Django Files — A Short Talk

Again, it works

Never underestimate the value of functional. But maybe we could make this a bit saner while we’re looking at the HTTP handling layer.

Page 16: Django Files — A Short Talk

• #15879 multipart/form-data filename="" not handled as file

• #17955 Uploading a file without using django forms

• #18150 Uploading a file ending with a backslash fails

• #20034 Upload handlers provide no way to retrieve previously parsed POST variables

• #21588 "Modifying upload handlers on the fly" documentation doesn't replicate internal magic

None of these is particularly worth talking about today.

Page 17: Django Files — A Short Talk

Files in the ORM

Page 18: Django Files — A Short Talk

Files in the ORM
# orm-file.py

class Wub(models.Model):
    infosheet = models.FileField()

>>> w = Wub(infosheet="relative/to/media/root.pdf")
>>> print(w.infosheet.url)
https://media.root/relative/to/media/root.pdf
>>> w.infosheet = ContentFile("A boring bit of text", "file.txt")
>>> print(w.infosheet.url)
https://media.root/file.txt

What’s stored in the database is the path relative to the media root, and the file itself gets stored outside the database.

Page 19: Django Files — A Short Talk

• #5619 FileField and ImageField return the wrong path/url before calling save_FOO_file()

• #10244 FileFields can't be set to NULL in the db

• #13809 FileField open method is only accepting 'rb' modes

• #14039 FileField special-casing breaks MultiValueField including a FileField

• #13327 FileField/ImageField accessor methods throw unnecessary exceptions when they are blank or null.

• #17224 determine and document the use of default option in context of FileField

• #25547 refresh_from_db leaves FieldFile with reference to db_instance

Page 20: Django Files — A Short Talk

Files in the ORM
# orm-file.py

class Wub(models.Model):
    infosheet = models.FileField()

>>> w = Wub(infosheet="relative/to/media/root.pdf")
>>> print(w.infosheet.url)
http://media.root/relative/to/media/root.pdf
>>> w.infosheet = ContentFile("A boring bit of text", "file.txt")
>>> print(w.infosheet.url)
http://media.root/file.txt
>>> w.infosheet = None
>>> print(w.infosheet.url)

We can “unset” a file field by setting it to None, which is what you’d expect in Python. But you shouldn’t.

Page 21: Django Files — A Short Talk

Files in the ORM
# orm-file.py

class Wub(models.Model):
    infosheet = models.FileField()

>>> w = Wub(infosheet="relative/to/media/root.pdf")
>>> print(w.infosheet.url)
http://media.root/relative/to/media/root.pdf
>>> w.infosheet = ContentFile("A boring bit of text", "file.txt")
>>> print(w.infosheet.url)
http://media.root/file.txt
>>> w.infosheet = None
>>> print(w.infosheet.url)
>>> print(type(w.infosheet))
<class 'django.db.models.fields.files.FieldFile'>

Wait, what?

You set it to None, but you get back a FieldFile instance.

Page 22: Django Files — A Short Talk

FieldFile

• magical autoconversion for anything (within reason)

• this happens using FileDescriptor classes which, well, let’s just ignore that

Page 23: Django Files — A Short Talk

FileField
# orm-fieldfile.py

class Wub(models.Model):
    infosheet = models.FileField()

>>> w = Wub(infosheet="relative/to/media/root.pdf")
>>> w.infosheet = None
>>> w.infosheet == None
True
>>> w.infosheet is None
False

This just gets weirder. If you want to unset a FileField, set it to the empty string. You can’t save None into the database anyway.

Page 24: Django Files — A Short Talk

• #18283 FileField should not reuse FieldFiles

Don’t keep references to FieldFiles around; take their name or path instead.

Page 25: Django Files — A Short Talk

In ModelForms

# modelforms-filefield.py

class Wub(models.Model):
    infosheet = models.FileField()

class WubCreate(CreateView):
    model = Wub
    fields = ['infosheet']

And this all just works in the expected way with model forms.

Let’s add an image!

Page 26: Django Files — A Short Talk

ImageField
# orm-imagefile.py

class Wub(models.Model):
    infosheet = models.FileField()
    photo = models.ImageField()

>>> w = Wub(infosheet="relative/to/media/root.pdf", photo="relative/to/media/root.png")
>>> w.photo.width, w.photo.height
(480, 200)

Neat.

Page 27: Django Files — A Short Talk

• #15817 ImageField having [width|height]_field set sytematically compute the image dimensions in ModelForm validation process

• #18543 Non image file can be saved to ImageField

• #19215 ImageField's “Currently” and “Clear” Sometimes Don't Appear

• #21548 Add the ability to limit file extensions for ImageField and FileField

It’s worth looking at the cooperating classes here…

Page 28: Django Files — A Short Talk

So many classes

What do we say about non-planar graphs of class collaboration? We cut bits out until it makes more sense.

Page 29: Django Files — A Short Talk

So many classes

If you wanted to make another specialisation of File, like ImageFile, you’d need to write four classes. Image handling isn’t just part of the file layer, it’s also part of the ORM. This sort of thing is why OOP gets a bad name, incidentally.

All this is JUST to get width and height conveniently from an ImageFile, and by inference from an ImageField. Except you almost certainly don’t want to do this, because it has to read the header, and sometimes the entire content, of the file to do so. On that note:

Page 30: Django Files — A Short Talk

ImageField
# orm-imagefile-proxies.py

class Wub(models.Model):
    infosheet = models.FileField()
    photo = models.ImageField(width_field='photo_width', height_field='photo_height')
    photo_width = models.PositiveIntegerField(null=True, blank=True)
    photo_height = models.PositiveIntegerField(null=True, blank=True)

>>> w = Wub(infosheet="relative/to/media/root.pdf", photo="relative/to/media/root.png")
>>> w.photo.width, w.photo.height
(480, 200)
>>> w.photo_width, w.photo_height
(480, 200)

On setting the photo, width and height will be populated through to the relevant attributes, which will roundtrip to the database. So you don’t have to read your file data off slow slow slow storage to know what size it’ll render at.

This is denormalising for performance. But it’s tricksy. You can still hit `wub.photo.width`, and it’ll still load the file header.

Page 31: Django Files — A Short Talk

• #8307 ImageFile use of width_field and height_field is slow with remote storage backends

• #13750 ImageField accessing height or width and then data results in "I/O operation on closed file”

We could have it defer to the denormalised attributes if configured and populated. There’s some discussion in the tickets about this being a little too magic, though.

Page 32: Django Files — A Short Talk

Storage backends

So we’ve saved some files via the ORM…but where did they go?

Page 33: Django Files — A Short Talk

Storing files

• Stored in MEDIA_ROOT

• Served from MEDIA_URL

• Uses FileSystemStorage…by default

And how did Django know what URL to use for them? This is what storage backends are for. The default one is FileSystemStorage, which just stores things on the file system, at MEDIA_ROOT. URLs are constructed by tacking MEDIA_URL on the front, which assumes you’re serving the filesystem via HTTP in a naive style.

But what is a storage backend?

Page 34: Django Files — A Short Talk

Another abstraction

It’s okay, this is (mostly) a good abstraction.

Page 35: Django Files — A Short Talk

What can Storage do?

• oriented around files, rather than a general FS API

• open / save / delete / exists / size / path / url

• make an available name, ie one that doesn’t clash

• modified, created, accessed times

• … also listdir

Broadly speaking: you can create, delete and manage files, and then turn them into URLs to use them.

Most of these are actually optional, ie a given storage backend may not support them all.

Page 36: Django Files — A Short Talk

Choosing storage
# configuring-storage-settings.py
DEFAULT_FILE_STORAGE = 'dotted.path.Class'
MEDIA_ROOT = '/my-root'
MEDIA_URL = 'https://media.root/'

class MyStorage(FileSystemStorage):
    def __init__(self, **kwargs):
        kwargs.setdefault('location', '/my-root')
        kwargs.setdefault('base_url', 'https://media.root/')
        super(MyStorage, self).__init__(**kwargs)

The default for the ORM is in DEFAULT_FILE_STORAGE, which is a dotted path to a Python class. Django imports the module and instantiates the class with no arguments, so you must accept the class’s defaults. This is why FileSystemStorage uses MEDIA_ROOT and MEDIA_URL (and some others) to customise it. Alternatively, you can subclass it as a utility class and point to that, like you might with AppConfig.

Page 37: Django Files — A Short Talk

Choosing storage

# models.py
from mystorage import BetterStorage

storage_instance = BetterStorage()

class MyModel(models.Model):
    upload = models.FileField(
        storage=storage_instance
    )

You can override the ORM default on a field-by-field basis; in this case you use an instance not a dotted path. (The asymmetry is mildly annoying.)

You can also instantiate and use Storage objects directly, so you can keep decisions about where things go in the hands of your users.

http://tartarus.org/james/diary/2013/07/18/fun-with-django-storage-backends

Page 38: Django Files — A Short Talk

• #9586 Shall upload_to return an urlencoded string or not?

• #12157 FileSystemStorage does file I/O inefficiently, despite providing options to permit larger blocksizes

• #15799 Document what exception should be raised when trying to open non-existent file

• #21602 FileSystemStorage._save() Should Save to a Temporary Filename and Rename to Attempt to be Atomic

• #23759 Storage.get_available_name should preserve all file extensions, not just the first one

• #23832 Storage API should provide a timezone aware approach

The first is actually a bug with FileSystemStorage.url() not constructing URLs properly, so it doesn’t cope with non-ASCII. Because no one uses non-ASCII characters…

Page 39: Django Files — A Short Talk

Reasons to override

• Model fields with different options

• Different storage engine entirely

For example, you might have a model field that contains members-only content that you don’t serve directly.

Page 40: Django Files — A Short Talk

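Protected storage

from django.contrib.auth.decorators import login_required
from django.core.files.storage import FileSystemStorage
from django.db import models
from django.http import StreamingHttpResponse
from django.urls import re_path

FSS = FileSystemStorage

pstore = FSS(
    location='/protected',
    base_url='/p/',
)

class Profile(models.Model):
    resume = models.FileField(
        null=True,
        blank=True,
        storage=pstore,
    )

@login_required
def protected(request, path):
    f = pstore.open(path)
    return StreamingHttpResponse(f.chunks())

urlpatterns += [
    re_path(r'^p/(?P<path>.*)$', protected),
]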

You store the files in a separate part of the filesystem, and make them available at a different URL, served via Django (you’d probably actually use django-sendfile or similar).

Or you might want a totally different storage engine, such as one that supports Amazon S3.

Page 41: Django Files — A Short Talk

S3BotoStorage

• One of many in django-storages-redux

• Millions of options, somewhat undocumented

• configure with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_STORAGE_BUCKET_NAME

• defaults for the rest mostly sane

Amazon S3 is a popular way of storing files on the internet without having to worry too hard about where they are. You pay some money, and it stores files.

S3BotoStorage is a Django storage backend, part of django-storages-redux, that stores files on S3. It has *lots* of configuration options; some you will always need to set.

Page 42: Django Files — A Short Talk

Fun with S3Boto

AWS_S3_CUSTOM_DOMAIN = 'cdn.eg.com'
AWS_HEADERS = {
    'Cache-Control': 'max-age=31536000',  # a year or so
}
AWS_PRELOAD_METADATA = True

There are some particularly useful ones: you can override the Amazon domain name when constructing URLs (useful for CDNs), and you can set default HTTP headers to be returned with each object (eg Cache-Control).

You can also tell it to preload the metadata for the bucket. That may make subsequent operations faster, or (if you have lots of objects in the bucket) it may just eat all your memory and cause your app to fall over. YMMV.

Page 43: Django Files — A Short Talk

Protected S3 storage

protected_storage = S3BotoStorage(
    acl='private',
    querystring_auth=True,
    querystring_expire=600,  # 10 minutes, to try to ensure people
                             # won't (or can't) share the URL
)

model.field.url  # works as expected

And because S3 has a permissions system, and a way of generating short-lived URLs with access credentials in the query string, we can implement the protected storage example from earlier without using a custom view. You just take the URL of any model field using that storage in the normal way and it’ll all Just Work(tm).

Page 44: Django Files — A Short Talk

Code that uses storage

• Please test on both Windows and Linux/Unix

• Please test with something remote like S3Boto

• Please write your own tests to work with different storage backends

If you write an extension — a third party app (3PA) — that uses files, you are probably affected. (Wait, it’ll get worse.)

Avoid problems by writing tests and then running them against various backends. That means that not only your code, but your tests, need to be storage agnostic.

Page 45: Django Files — A Short Talk

Static files

Helpful assets

Okay, so that’s “media”, ie uploaded/ugc and so forth. What about assets, things in the codebase?

In the beginning there was only MEDIA. This has made a lot of people very angry and been widely regarded as a bad move.

Then 1.3 integrated the 3PA staticfiles, which separated MEDIA from STATIC.

Page 46: Django Files — A Short Talk

staticfiles in 1.3

Apps could have assets, in `static`. The project could have assets, in anything you call them (you should probably call them `static`). The world was a better place.

(Assets can be in sub-directories, there just aren’t any here.)

You can add more complex mechanisms but let’s not go there.

Page 47: Django Files — A Short Talk

How it works

<!-- source.html -->
{% load static %}
<img src='{% static "asset/path.png" %}' width='200' height='100' alt=''>

<!-- out.html -->
<img src='https://static.root/asset/path.png' width='200' height='100' alt=''>

It introduced a template library and tag, both called static, which expand the asset path to a full URL by tacking STATIC_URL on the front. STATIC_URL is the static counterpart of MEDIA_URL.

Page 48: Django Files — A Short Talk

In development

In development, staticfiles takes care of finding assets to serve, when they’re requested under STATIC_URL.

Page 49: Django Files — A Short Talk

collectstatic

In production they get swept up into one place so a webserver can serve them rather than Django/wsgi.

It says “copied into STATIC_ROOT”; that’s actually done by the configured static files storage backend, which by default (STATICFILES_STORAGE) is a variant of FileSystemStorage that uses STATIC_ROOT. STATIC_ROOT is the target of a build process, so don’t call it “static”; call it something like “collected_static”.

Page 50: Django Files — A Short Talk

• #24336 static server should skip for protocol-relative STATIC_URL

• #25022 collectstatic create self-referential symlink

Anyway, as of 1.3 media and static assets were kept distinct, and reusable apps could bundle their own assets *but* it was still a bad time because it assumed that static files should be served from STATIC_URL — but storage backends can generate URLs themselves!

Page 51: Django Files — A Short Talk

staticfiles in 1.4

1.4 made a new template library, staticfiles, which can use storage backends.

Page 52: Django Files — A Short Talk

staticfiles in 1.4

But it’s still a bad time because we still have the static template library from 1.3 — it’s pretty easy to get the wrong one, a bug that you may not notice until production.

Page 53: Django Files — A Short Talk

• #23563 Make `staticfiles_storage` a public API

• #25484 static template tag outputs invalid HTML if storage class's url() returns a URI with '&' characters.

Page 54: Django Files — A Short Talk

I make it sound bad

Really, it’s not

Let’s not overstate things though because since 1.4 you can set a few settings variables, run collectstatic, and your static files are ready to serve, alongside but separate from your media, with suitably different options.

Page 55: Django Files — A Short Talk

Still a moderately bad time

staticfiles isn’t in core, so not everyone uses it. That puts 3PAs in particular in an awkward position. Do they use staticfiles? Ignore it? Try to support it automatically if it’s enabled?

Page 56: Django Files — A Short Talk

Still a moderately bad time

Admin has done the last of these since, I believe, 1.6. It has a template tag which also acts as a helper, so it can get it right either from view code or from within templates. Anyone can use it as a helper, because it ships with core, but you can’t use a taglib from an app that isn’t enabled (in INSTALLED_APPS), so everyone would have to write their own. Unsurprisingly, no one does this, so 3PAs generally either only work if staticfiles is installed, or just ignore it.

This wouldn’t be an enormous problem if it weren’t for CachedStaticFilesStorage.

Page 57: Django Files — A Short Talk

Serving assets best practice

We’ve had a best practice for serving assets for a while (perlbal supported this pattern in 2007, and it was old then). Because assets only change when you release, you can make each asset’s URL depend on its version, usually by including a hash of its contents (byte by byte), and add a far-future expires header; then future requests for the asset

Page 58: Django Files — A Short Talk

Serving assets best practice

Can be served out of cache, so you don’t pay any network cost. But when you release a new version…

Page 59: Django Files — A Short Talk

Serving assets best practice

The updated asset will be fetched immediately because it has a different URL. For bonus points, you put all your assets on a CDN so it’s edge served nice and close to your users.
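A simplified sketch of the naming scheme, using only the standard library: hash the file’s bytes and put the digest in the filename, so any change produces a new URL. It mirrors the shape of the names Django’s hashed staticfiles storages produce (the first 12 hex characters of an MD5, before the extension, as in `collapse.min.c1a27df1b997.js` later on), but `hashed_name()` is a hypothetical helper, not Django’s code.

```python
import hashlib
import os


def hashed_name(filename, content):
    """Hypothetical helper: insert a short content hash before the extension.

    Simplified illustration only: no CSS url() rewriting, no manifest.
    """
    digest = hashlib.md5(content, usedforsecurity=False).hexdigest()[:12]
    root, ext = os.path.splitext(filename)
    return f"{root}.{digest}{ext}"


old_url = hashed_name("admin/js/collapse.min.js", b"/* release 1301 */")
new_url = hashed_name("admin/js/collapse.min.js", b"/* release 1302 */")
# Different bytes give a different name, so a far-future expiry on the old
# URL never stops browsers fetching the new file.
```

Serve those names with something like `Cache-Control: max-age=31536000` and browsers only come back when the name changes.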

Page 60: Django Files — A Short Talk

Cached (1.4) / Manifest (1.7)

• Cache uses the main cache, or a dedicated one

• Manifest writes a JSON manifest file to the configured storage backend

To support this in Django, CachedStaticFilesStorage was introduced in 1.4 and keeps the mapping from file name to URL in cache (so it doesn’t have to be recomputed every time, although it will regenerate missing entries), and ManifestStaticFilesStorage came along in 1.7 so you don’t have to hit a cache for every asset reference; it uses a JSON file which is loaded when the storage is instantiated, probably when Django starts up.

So remember that we had a problem with not all 3PAs supporting staticfiles? This is what it looks like.

Page 61: Django Files — A Short Talk

The problem

<!-- hashed-assets.html -->
<script src="/static/admin/js/collapse.min.c1a27df1b997.js"></script>
<script src="/admin/editorial/article/2/autosave_variables.js"></script>
<script src="/static/autosave/js/autosave.js"></script>

Look, the first one’s great. Hashed URL, lovely. This comes from Django admin itself. The second and third are from a 3PA that creates some JS variables dynamically (second line) then uses them in a JavaScript library (third line). It doesn’t support staticfiles.

I can’t use a far-future expires (FFE) header here, or I won’t be able to upgrade files like that.

Page 62: Django Files — A Short Talk

The problem

<!-- hashed-assets.html -->
<script src="/static/admin/js/collapse.min.c1a27df1b997.js"></script>
<script src="/admin/editorial/article/2/autosave_variables.js"></script>
<script src="/static/autosave/js/autosave.js?v=2"></script>

What the actual 3PA did is this. See the “?v=2” there? That’s someone working round this problem. Now you have to remember to bump the version in the querystring. It’s a symptom.

Page 63: Django Files — A Short Talk

That’s not all

We also have problems which come about when we deploy new versions.

Say you’re using the cache variant.

Page 64: Django Files — A Short Talk

That’s not all

When you run collectstatic for deploy v1302, it’ll update the mapping from site.css to its new hashed URL.

Page 65: Django Files — A Short Talk

That’s not all

But that’s *in the cache that v1301 is still using* until you finish running collectstatic and any other prep tasks and then switch over to v1302. If you’ve changed your HTML layout and then updated your CSS or JS to match, you are now serving old HTML with new CSS, and your site is broken.

It’s almost impossible to make this work properly with Cache.

Page 66: Django Files — A Short Talk

That’s not all

Manifest is a little better, providing you are writing your static files to a different place on each deploy. But *the manifest file is stored via the storage engine*. So if you want to share that space between different versions, or you put them on S3, you’re in trouble. You *might* get away with it on upgrading, because the manifest gets loaded when the storage backend is instantiated; but you can’t predict when your worker gets reloaded, which will load the new one. Too many variables to get right.

From the Django documentation:

> The purpose of [ManifestStaticFilesStorage] is to keep serving the old files in case some pages still refer to those files, e.g. because they are cached by you or a 3rd party proxy server.

With Heroku, and also things like containers (Docker, Kubernetes &c) we’re getting to the idea of “self-contained artefacts” for deploying our applications. This is a great idea, just as great as it was when operating systems adopted this idea called packages back sometime in the late Triassic. Or Java and war files, for that matter.

We really want to put the manifest in the deployment artefact. That means reliably storing it locally, no matter what.

Page 67: Django Files — A Short Talk

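Manifests for artefacts

import json

from django.contrib.staticfiles.storage import ManifestFilesMixin
from django.core.files.base import ContentFile
from django.core.files.storage import FileSystemStorage

class LocalManifestMixin(ManifestFilesMixin):
    # Hypothetical attribute: a directory inside the deployment artefact.
    # None makes FileSystemStorage fall back to MEDIA_ROOT.
    manifest_location = None
    _local_storage = None

    def __init__(self, *args, **kwargs):
        # Set this up first: ManifestFilesMixin.__init__ calls
        # load_manifest(), which calls read_manifest() below.
        self._local_storage = FileSystemStorage(location=self.manifest_location)
        super(LocalManifestMixin, self).__init__(*args, **kwargs)

    def read_manifest(self):
        try:
            with self._local_storage.open(self.manifest_name) as manifest:
                return manifest.read().decode('utf-8')
        except IOError:
            return None

    def save_manifest(self):
        payload = {
            'paths': self.hashed_files,
            'version': self.manifest_version,
        }
        if self._local_storage.exists(self.manifest_name):
            self._local_storage.delete(self.manifest_name)
        contents = json.dumps(payload).encode('utf-8')
        self._local_storage._save(
            self.manifest_name, ContentFile(contents)
        )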

It’d probably look something like that. Caution: utterly untested.

Page 68: Django Files — A Short Talk

• #18929 CachedFilesMixin is not compatible with S3BotoStorage

• #19528 CachedFilesMixin does not rewrite rules for css selector with path

• #19670 CachedFilesMixin Doesn't Limit Substitutions to Extension Matches

• #20620 CachedFileMixin.post_process breaks when cache size is exceeded

• #21080 collectstatic post-processing fails for references inside comments

• #22353 CachedStaticFilesMixin lags in updating hashed names of other static files referenced in CSS

• #22972 HashedFilesMixin.patterns should limit URL matches to their respective filetypes

• #24243 Allow HashedFilesMixin to handle file name fragments

• #24452 Staticfiles backends using HashedFilesMixin don't update CSS files' hash when referenced media changes

• #25283 ManifestStaticFilesStorage does not works in edge cases while importing url font-face with IE hack

The first highlighted bug…hashes are applied in both Cache & Manifest by post-processing the assets, which by default only supports CSS’s url() construct, not Javascript asset references or weirder CSS constructs or anything else. Also there are some bugs if you want to extend it yourself.

And if you change an asset, its hash will change — but the hash of a CSS file that refers to it won’t, so you can’t eg update a logo without also making a change to any file that uses it. Some of these we can fix. Some of the cache-based ones…not so much.

Page 69: Django Files — A Short Talk

Some options for 3PA

• Fiat: All 3PAs use staticfiles

• Shim: all 3PAs use the admin approach, wrapping it in their own taglib. Duplication!

• Bless: move staticfiles into core and make the simple `load static` the same as `load staticfiles`

• Weakly bless staticfiles so `load static` behaves like the admin, and the admin static stuff goes away, and everyone just uses `load static`

Just looking at the 3PA problem… People don’t like the first option because staticfiles isn’t in core. Fair enough.

The second isn’t really practical, because there’s no way on earth any 3PA author is going to know to look for documentation on how to do this, even if we write the best possible documentation.

The third is probably a little contentious. The fourth is actually practical, and hopefully less contentious.

Page 70: Django Files — A Short Talk

Form media

Hurtful assets

If you don’t know about form “media” (called this for historical reasons), then imagine you wanted to enable anything to do with forms — widgets, forms, model admin — to be able to specify CSS & JS files that have to be served with them, only the whole thing was implemented using sentient mould from space that wants to eat us.

It started off reasonably enough.

Page 71: Django Files — A Short Talk

Widgets, Forms, Admin

• Widgets might have specific assets to render properly: typically CSS & JS

• Forms might have specific assets to render properly, too. They’re made out of widgets and some other bits, so they use the same system

• Then individual admin screens (ModelAdmin) might have specific assets as well; they have Forms which have Widgets

• Amazingly this hasn’t escaped into View

Page 72: Django Files — A Short Talk

This is a good thing

from django.contrib import admin
from django.contrib.staticfiles.templatetags.staticfiles import static

class ArticleAdmin(admin.ModelAdmin):
    # ...

    class Media:
        js = [
            '//tinymce.cachefly.net/4.1/tinymce.min.js',
            static('js/tinymce_setup.js'),
        ]

Voilà, TinyMCE for your admin editing interface.

Oh, hang on. That `static` call.

Page 73: Django Files — A Short Talk

This isn’t a good thing

from django.contrib.staticfiles.templatetags.staticfiles import static
from django.contrib.admin.templatetags.admin_static import static

That can’t be right.

This is exactly what we were just talking about with template tags, only it’s infecting our admins, our forms, our widgets, our everything. Form media just shoves STATIC_URL on the front of anything that is a relative path, and leaves anything absolute alone. It’s completely ignorant of staticfiles and storage backends, so we have to do things explicitly.

And as before, there’s no good solution. Admin (as of 1.6) uses its helper, and you could too, but it isn’t obvious that you should. And, as with templates, even if we wrote documentation, it’s unlikely anyone would guess they have to go looking for advice on this.

Get it wrong, and it won’t work properly with the hashing backends. It might not work properly with other backends.

Page 74: Django Files — A Short Talk

• #9357 Unable to subclass form Media class

• #12264 calendar.js depends on jsi18n but date widgets using it do not specify as required media

• #12265 Media (js/css) collection strategy in Forms has no order dependence concept

• #13978 Allow inline js/css in forms.Media

• #18455 Added hooks to Media for staticfiles app

• #21221 Widgets and Admin's Media should use the configured staticfiles storage to create the right path to a file

• #21318 Clarify the ordering of the various Media classes

• #21987 Allow Media objects to have their own MEDIA_TYPES

• #22298 Rename Media to Static

Page 75: Django Files — A Short Talk

Some options

• Fiat: all 3PAs use staticfiles explicitly

• Shim: all 3PAs use the admin approach explicitly

• Bless: move staticfiles into core; Media.absolute_path uses its storage backend

• Weakly bless staticfiles, ie Media.absolute_path uses the admin trick

Going back to options for fixing 3PAs: again, there are reasons to dislike basically all of these solutions. We should pick whatever we solve the last problem with; weakly blessing staticfiles seems like the safest solution to me.
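A minimal sketch of the weak-blessing option follows. It assumes a hypothetical `FakeHashingStorage` standing in for a real staticfiles storage backend, and passes `INSTALLED_APPS` in explicitly rather than reading settings. The check mirrors the admin helper's trick: go through staticfiles only if it's installed, otherwise fall back to plain `STATIC_URL` prefixing.

```python
from urllib.parse import urljoin


class FakeHashingStorage:
    """Hypothetical stand-in for a hashing staticfiles storage backend."""

    def __init__(self, base_url, manifest):
        self.base_url = base_url
        self.manifest = manifest  # e.g. {'js/app.js': 'js/app.3f2a1bc4d5e6.js'}

    def url(self, name):
        # Unknown names are served unhashed, as a fallback.
        return urljoin(self.base_url, self.manifest.get(name, name))


def absolute_path(path, installed_apps, storage, static_url='/static/'):
    """'Weakly blessed' form media path resolution (hypothetical sketch).

    Absolute paths are left alone as before; relative ones are resolved by
    the staticfiles storage backend, but only when staticfiles is installed.
    """
    if path.startswith(('http://', 'https://', '/')):
        return path
    if 'django.contrib.staticfiles' in installed_apps:
        return storage.url(path)
    return urljoin(static_url, path)
```

With something like this behind `Media`, the TinyMCE example could list `'js/tinymce_setup.js'` as a plain string and still pick up the hashed URL, without every widget author needing to know about the helper.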

However, form media are troublesome for other reasons.

Page 76: Django Files — A Short Talk

This world’s a mess anyway

• no convenient API to get (CSS, JS) media

• can’t dedupe between forms if you have many on one page

We can’t integrate form media with any kind of asset pipeline (spoilers!). We also can’t render out CSS links in the <head> and JS at the end of the <body>, which some of us still think is a pretty neat idea. (Why not? Apparently because the accessor methods return itertools.chain not lists. I haven’t looked into this at all.)

If you have multiple forms (say a subscription form and a search form), both with custom widgets, then any required assets will be deduplicated within each form, but not between them. So you might load jQuery twice. We could make View a MediaDefiningClass, but that doesn’t sound like a good idea even if you say it quickly while drunk.
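Page-level deduplication could look something like this minimal sketch. It assumes each form's media has been reduced to plain `css` and `js` lists of paths (real `Media` objects key CSS by medium, which is ignored here), and `merge_media` is a hypothetical helper, not a Django API. Keeping the two lists separate is what would let a template put stylesheets in the `<head>` and scripts at the end of the `<body>`.

```python
def _append_unique(target, seen, paths):
    # Keep the first occurrence of each path, preserving order.
    for path in paths:
        if path not in seen:
            seen.add(path)
            target.append(path)


def merge_media(*media_list):
    """Merge several forms' media into one deduplicated (css, js) pair.

    Each argument is a simplified, hypothetical media dict with 'css' and
    'js' lists of paths.
    """
    css, js = [], []
    seen_css, seen_js = set(), set()
    for media in media_list:
        _append_unique(css, seen_css, media.get('css', []))
        _append_unique(js, seen_js, media.get('js', []))
    return css, js


subscribe = {'css': ['css/widgets.css'], 'js': ['js/jquery.js', 'js/subscribe.js']}
search = {'css': ['css/widgets.css', 'css/search.css'], 'js': ['js/jquery.js', 'js/search.js']}
head_css, body_js = merge_media(subscribe, search)
# head_css -> ['css/widgets.css', 'css/search.css']
# body_js  -> ['js/jquery.js', 'js/subscribe.js', 'js/search.js']
```

First-occurrence-wins keeps each form's internal ordering intact, although it says nothing about ordering dependencies between forms (see #12265).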

Page 77: Django Files — A Short Talk

This world’s a mess anyway

• no convenient API to get (CSS, JS) media

• can’t dedupe between forms if you have many on one page

• some things are global, eg jQuery, and can’t easily dedupe between a widget and a site-wide library

• to say nothing of different versions

Then some things are global; you just pull them into every page on your site. I’m going to say “jQuery” again, even though jQuery isn’t cool any more. If that upsets you, just imagine I’m saying something else. It doesn’t matter. We’d have to dedupe a global set of assets against everything bubbling out of widgets & forms.

And somewhere in the middle of this we remember that there are DIFFERENT VERSIONS of jQuery between Django admin, your custom widget, and some 3PA. It’s okay, they’re only 90 kilobytes each. That’s not going to cause you any problems on mobile browsers.
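Deduplicating against a global set is mechanical enough to sketch; deciding what to do about clashing versions isn't. This sketch assumes a hypothetical naming convention (`name-X.Y.Z[.min].js`) from which the library and version can be read off the filename, which real-world assets often don't follow. It drops widget scripts the site already provides and only *reports* version clashes.

```python
import re

# Hypothetical naming convention: 'js/jquery-1.11.1.min.js' -> ('jquery', '1.11.1').
VERSIONED = re.compile(
    r'^(?:.*/)?(?P<name>[a-z0-9_.]+?)-(?P<version>\d+(?:\.\d+)*)(?:\.min)?\.js$'
)


def library_of(path):
    """Return (name, version) for a versioned library path, else (None, None)."""
    match = VERSIONED.match(path)
    if match is None:
        return None, None
    return match.group('name'), match.group('version')


def dedupe_against_global(global_js, widget_js):
    """Drop widget scripts the site already loads globally.

    Returns (extra, conflicts): the scripts the widget still needs, and
    (library, global_version, widget_version) tuples wherever a widget asks
    for a different version of a library the site already provides.
    Conflicts are only reported; picking a version is left to a human.
    """
    provided = {}
    for path in global_js:
        name, version = library_of(path)
        if name is not None:
            provided[name] = version
    extra, conflicts = [], []
    for path in widget_js:
        if path in global_js:
            continue
        name, version = library_of(path)
        if name in provided:
            if version != provided[name]:
                conflicts.append((name, provided[name], version))
            continue
        extra.append(path)
    return extra, conflicts


extra, conflicts = dedupe_against_global(
    ['js/jquery-1.11.1.min.js', 'js/site.js'],
    ['js/jquery-2.1.4.min.js', 'js/site.js', 'js/datepicker.js'],
)
# extra -> ['js/datepicker.js']
# conflicts -> [('jquery', '1.11.1', '2.1.4')]
```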

Page 78: Django Files — A Short Talk

Asset pipelines

What are these, and why do we need them? There are a number of interlocking concerns.

Page 79: Django Files — A Short Talk

What? Why?

• compilation

Compilation means you don’t have to write CSS or JS. You could write something like SCSS, which is CSS with extra features. You could write in ES6 and then compile it down to ES3, which works on more browsers. You could even use something based on YAML.

Compilers can output “source maps” so browsers can report errors against the source files rather than in the compiled versions.

Page 80: Django Files — A Short Talk

What? Why?

• compilation

• concatenation

Concatenation of CSS and JS means fewer HTTP requests. HTTP 1 is terrible at using the network, so this is a good thing. It’s less important with SPDY/HTTP 2, but that’s probably the least important reason for pipelines these days anyway.
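At its simplest, concatenation is just joining files end to end. Here is a minimal sketch for Javascript that works on strings (reading the files is left out). The `;` plus newline between files is a common defensive habit, an assumption of this sketch rather than anything specific pipelines are known to do: without it, `var a = 1` followed by a file starting `(function () {...})()` would be parsed as a call to `1`.

```python
def concatenate_js(sources):
    """Join several Javascript sources into one bundle (minimal sketch).

    A ';' and a newline go between files so a file that omits its final
    semicolon can't run into the start of the next one; a doubled ';;' is
    just a harmless empty statement.
    """
    return ';\n'.join(source.rstrip() for source in sources) + '\n'


bundle = concatenate_js(['var a = 1', '(function () {})()'])
# bundle -> 'var a = 1;\n(function () {})()\n'
```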

Page 81: Django Files — A Short Talk

What? Why?

• compilation

• concatenation or linking/encapsulation

Encapsulation means that you write lots of Javascript, but very little escapes into the global namespace. There are tools, like Webpack and Browserify, that allow you to write Javascript libraries that include each other, and then “link” the ones you need together to make a single file, encapsulated so only the entry points you need are exposed.

Generally, the link/encapsulation step means you don’t need a concatenation step for Javascript.

Page 82: Django Files — A Short Talk

What? Why?

• compilation

• concatenation or linking/encapsulation

• minification

Minification means you transfer less. This is obviously also good.

Page 83: Django Files — A Short Talk

What? Why?

• compilation

• concatenation or linking/encapsulation

• minification

• hashing and caching

And finally, hashing your URLs allows you to improve HTTP-level caching, which we’ve already talked about.

Sourcemaps either have to match the hash so the browser can find them automatically, or be explicitly hooked together.
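The hashing step itself is small. This sketch is loosely modelled on staticfiles' approach (an MD5 of the file contents, first 12 hex digits, inserted before the extension); `hashed_name` is a hypothetical helper, not a Django function. Any change to the content gives a new name, so each URL can be cached indefinitely.

```python
import hashlib
import posixpath


def hashed_name(name, content, length=12):
    """Insert a content hash before the extension.

    'css/site.css' -> 'css/site.<hash>.css', and 'site.min.js' ->
    'site.min.<hash>.js', where <hash> is the first `length` hex digits
    of the MD5 of `content` (bytes).
    """
    digest = hashlib.md5(content).hexdigest()[:length]
    directory, filename = posixpath.split(name)
    root, ext = posixpath.splitext(filename)
    return posixpath.join(directory, '%s.%s%s' % (root, digest, ext))
```

A sourcemap generated alongside would need either a name derived from the hashed file or an explicit reference to it; the hash function alone doesn't handle that.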

Page 84: Django Files — A Short Talk

Writing the HTML

<!-- rendered.html -->
<script type='text/javascript' src='site.min.48fb66c7.js'></script>
<link rel='stylesheet' type='text/css' href='site.min.29557b4f.css'>

This is what we’re serving in production. You’re unlikely to want to write that into your HTML templates.

In this case we have two pipeline “targets”, site.js and site.css. They might come from three Javascript files and two CSS files respectively. (Or maybe Coffeescript and SCSS. It doesn’t hugely matter.)

Page 85: Django Files — A Short Talk

Focus on targets

<!-- external-syntax.html -->
<script type='text/javascript' src='{% static "site.js" %}'></script>
<link rel='stylesheet' type='text/css' href='{% static "site.css" %}'>

Either you want to give each target a name, and just use that…

Page 86: Django Files — A Short Talk

Focus on sources

<!-- internal-syntax-1.html -->
<script type='text/javascript' src='menu.js'></script>
<script type='text/javascript' src='index.js'></script>
<link rel='stylesheet' type='text/css' href='nav.css'>
<link rel='stylesheet' type='text/css' href='footer.css'>
<link rel='stylesheet' type='text/css' href='index.css'>

<!-- internal-syntax-2.html -->
{% asset js 'menu.js' %}
{% asset js 'index.js' %}
{% asset css 'nav.css' %}
{% asset css 'footer.css' %}
{% asset css 'index.css' %}

Or you could list the source files in your template directly. Then you’ll need something that either parses your HTML (at the top) or understands a custom syntax (at the bottom). This can result in each page on your site getting a different set of assets, making caching less helpful.
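The "parse your HTML" half can be sketched with the standard library's `html.parser`. This is a minimal illustration of the idea, not how django-compressor is implemented; a real tool also has to cope with inline code, `media` attributes, multi-token `rel` values and so on.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect script and stylesheet references from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.js = []
        self.css = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'script' and attrs.get('src'):
            self.js.append(attrs['src'])
        elif tag == 'link' and attrs.get('rel') == 'stylesheet' and attrs.get('href'):
            self.css.append(attrs['href'])


def collect_assets(html):
    """Return (js, css) lists of referenced files, in document order."""
    collector = AssetCollector()
    collector.feed(html)
    collector.close()
    return collector.js, collector.css


js, css = collect_assets(
    "<script type='text/javascript' src='menu.js'></script>"
    "<script type='text/javascript' src='index.js'></script>"
    "<link rel='stylesheet' type='text/css' href='nav.css'>"
)
# js -> ['menu.js', 'index.js'], css -> ['nav.css']
```

Each distinct list found this way becomes its own output bundle, which is exactly why per-page source lists can defeat caching.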

Page 87: Django Files — A Short Talk

Some asset pipelines

Let’s look at some real pipelines.

Page 88: Django Files — A Short Talk

Rails / Sprockets

<%= stylesheet_link_tag "application", media: "all" %>
<%= javascript_include_tag "application" %>

Rails has an asset pipeline called Sprockets (it used to be part of Rails, now it’s available for all Rack apps). You might reference CSS stylesheets and Javascript assets like this.

Page 89: Django Files — A Short Talk

Rails / Sprockets

//= require home
//= require moovinator
//= require slider
//= require phonebox

Then you’d have a “manifest file” that looks like this, which would pull in home.js, moovinator.js and so forth, taking care of concatenation. There are different paths that Sprockets will search to find them, so you can group application code, your libraries and vendored libraries separately.
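To make the directive idea concrete, here is a minimal sketch that expands `//= require name` lines into one concatenated string. It handles only plain `require` (not Sprockets' other directives), scans every line rather than just the header comment, and takes a hypothetical `load(name)` callable that does the search-path lookup.

```python
import re

REQUIRE = re.compile(r'^//=\s*require\s+(\S+)\s*$')


def expand_manifest(manifest_text, load, seen=None):
    """Expand '//= require name' directives into one concatenated string.

    `load(name)` returns the source for `name`; required files may contain
    directives of their own, and each file is included only once.
    """
    if seen is None:
        seen = set()
    output = []
    for line in manifest_text.splitlines():
        match = REQUIRE.match(line.strip())
        if match is None:
            output.append(line)
            continue
        name = match.group(1)
        if name in seen:
            continue
        seen.add(name)
        output.append(expand_manifest(load(name), load, seen))
    return '\n'.join(output)


sources = {'home': 'var home = 1;', 'slider': '//= require home\nvar slider = 2;'}
bundle = expand_manifest('//= require home\n//= require slider', sources.__getitem__)
# bundle -> 'var home = 1;\nvar slider = 2;'
```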

Coffeescript gets compiled to Javascript, and LESS & SASS/SCSS to CSS. Other languages can be supported by plugging in extra engines. It also has support for ERB and some other template languages, which allows you to do things like this:

Page 90: Django Files — A Short Talk

Rails / Sprockets

.class {
  background-image: url(<%= asset_path 'image.png' %>);
}

This is because in production, the Sprockets pipeline takes care of adding a hash to the URL. This way any asset in the pipeline can refer to any other asset and it will all work.
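Sprockets does this by templating the CSS before it's hashed. Staticfiles' hashing backends take the other route and rewrite `url(...)` references in CSS after the fact. The sketch below follows that second approach, much simplified: `manifest` is an assumed mapping from original to hashed names, and it skips resolving paths relative to the stylesheet's own directory, which a real implementation must do.

```python
import re

URL_PATTERN = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""")


def rewrite_css_urls(css, manifest, prefix='/static/'):
    """Point url(...) references in CSS at hashed filenames.

    Absolute URLs, data: URIs and names missing from `manifest` are left
    exactly as they were.
    """
    def replace(match):
        quote, target = match.group(1), match.group(2).strip()
        if target.startswith(('data:', 'http://', 'https://', '/')) or target not in manifest:
            return match.group(0)
        return 'url(%s%s%s%s)' % (quote, prefix, manifest[target], quote)

    return URL_PATTERN.sub(replace, css)


manifest = {'image.png': 'image.3f2a1bc4d5e6.png'}
rewritten = rewrite_css_urls(".class { background-image: url('image.png'); }", manifest)
# rewritten -> ".class { background-image: url('/static/image.3f2a1bc4d5e6.png'); }"
```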

The last step in the pipeline is compression / minification. You can write your own wrappers if it doesn’t support the tool you want.

Page 91: Django Files — A Short Talk

Rails / Sprockets

rake assets:precompile

For production, you generally want to precompile your assets, which will run the pipeline over anything it finds in the right directories, outputting them to the public serving directory; it applies hashes at the same time. It’s kind of like `collectstatic`, but with a pipeline in front (and no support for Storage backends, so no automatic upload to S3, for instance).

Page 92: Django Files — A Short Talk

Rails / Sprockets

rake assets:clean

There’s also a neat command, assets:clean, which deletes old versions of assets, keeping the last three. This prevents an ever-increasing collection of versioned assets, while giving some confidence that cached HTML referring to older assets isn’t pointing at things that no longer exist. In a world with multiple deploys a day, you probably actually want to clean up by date rather than count, but in general this is functionality we should consider adding to staticfiles.
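Staticfiles has no such command. As a hypothetical sketch of what one might compute, this works on `(name, mtime)` pairs, so it can be tested without a filesystem. It assumes hashed names look like `site.min.48fb66c7.js` (8 or more hex digits before the extension), keeps the newest `keep` versions of each asset, and optionally spares anything newer than a `cutoff` timestamp, covering both the count-based and date-based policies.

```python
import re
from collections import defaultdict

# Assumed shape of hashed names: '<root>.<hex hash>.<ext>'.
HASHED = re.compile(r'^(?P<root>.+)\.(?P<hash>[0-9a-f]{8,})(?P<ext>\.[^.]+)$')


def stale_assets(files, keep=3, cutoff=None):
    """Return hashed files that could be deleted, sorted by name.

    `files` is an iterable of (name, mtime) pairs. Versions are grouped by
    their unhashed name; the newest `keep` of each group are kept, and if
    `cutoff` is given, anything modified at or after it is kept as well.
    Unhashed files are never returned.
    """
    versions = defaultdict(list)
    for name, mtime in files:
        match = HASHED.match(name)
        if match:
            versions[match.group('root') + match.group('ext')].append((mtime, name))
    stale = []
    for entries in versions.values():
        entries.sort(reverse=True)  # newest first
        for index, (mtime, name) in enumerate(entries):
            if index >= keep and (cutoff is None or mtime < cutoff):
                stale.append(name)
    return sorted(stale)


files = [
    ('site.min.00000001.js', 100), ('site.min.00000002.js', 200),
    ('site.min.00000003.js', 300), ('site.min.00000004.js', 400),
    ('site.min.js', 500), ('app.aaaaaaaa.css', 50),
]
# stale_assets(files) -> ['site.min.00000001.js']
```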

Page 93: Django Files — A Short Talk

Sprockets clones

• asset-pipeline (Express/node.js)

• sails (node.js)

• grails asset pipeline (Groovy)

• Pipe (PHP)

There are also some that aren’t direct Sprockets clones, like Symfony’s pipeline and sbt-web for Play, but there isn’t much to learn from them, at least compared to…

Page 94: Django Files — A Short Talk

node.js

• express-cdn

• Broccoli

• Sigh

• gulp

• Webpack

In the node.js world, it’s increasingly common for the framework to have no pipeline, just asset location that the pipeline feeds into. express-cdn does that for the Express framework; Broccoli, Sigh & gulp are asset pipelines that tend to have plugins for Express and/or common template languages to take care of the asset locating. Webpack is kind of a Javascript compiler & linker which has grown into an asset pipeline — although really for JS only.

Page 95: Django Files — A Short Talk

gulp (node.js)

var gulp = require('gulp');
var sass = require('gulp-sass');
var sourcemaps = require('gulp-sourcemaps');
var rev = require('gulp-rev');

gulp.task('default', ['compile-scss']);

gulp.task('compile-scss', function() {
  // Return the stream so gulp knows when the task has finished.
  return gulp.src('source/stylesheets/**/*.scss')
    .pipe(sourcemaps.init())
    .pipe(sass({ indentedSyntax: false, errLogToConsole: true }))
    .pipe(sourcemaps.write())
    .pipe(rev())
    .pipe(gulp.dest('static'));
});

Pipelines can be built on a per-project basis, using gulp (or similar) to glue different tools together.

In this case, we’re using separate tools to manage sourcemaps, SCSS compilation, and hashing (the `rev` plugin).

Page 96: Django Files — A Short Talk

gulp (node.js)

var gulp = require('gulp');
var sass = require('gulp-sass');
var sourcemaps = require('gulp-sourcemaps');
var rev = require('gulp-rev');

gulp.task('default', ['compile-scss']);

gulp.task('compile-scss', function() {
  // Return the stream so gulp knows when the task has finished.
  return gulp.src('source/stylesheets/**/*.scss')
    .pipe(sourcemaps.init())
    .pipe(sass({ indentedSyntax: false, errLogToConsole: true }))
    .pipe(sourcemaps.write())
    .pipe(rev())
    .pipe(gulp.dest('static'))
    // Also write a manifest mapping original names to hashed ones.
    .pipe(rev.manifest())
    .pipe(gulp.dest('static'));
});

If you also want a manifest file, similar to how ManifestStaticFilesStorage works, then you just chain a bit more to the end of the pipeline. You aren’t constrained by having just a compiler followed by a compressor.

This is pretty Unixy: small tools that don’t try to do too much. (Webpack does try to do too much, but its plugins are small.)

The node.js world is one place where a lot of interesting activity in managing frontend development is happening, so it’s worth paying attention to. Its tools are starting to be used outside node too, eg by Rails folks:

http://blog.arkency.com/2015/03/gulp-modern-approach-to-asset-pipeline-for-rails-developers/

Page 97: Django Files — A Short Talk

Django options

• Plain django (external pipeline + staticfiles)

• django-compressor

• django-pipeline

I’ve skipped the other options, because they tend to fall into one of three camps:

1. Abandoned
2. Technology specific (eg RequireJS, LESS)
3. No py3 support

Page 98: Django Files — A Short Talk

Plain Django

• Pipeline external to Django (use what you want)

• Hashes computed by staticfiles

• Sourcemap support is fiddly if you want hashes

You can use any tools with this, and you don’t have to change Django or its configuration. If you want hashes, you have to do that in staticfiles.

For sourcemaps, you have to add a `//# sourceMappingURL=/path/to/file.js.map` comment that bypasses the hash. This is fine provided you know to do it.
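Doing that by hand is easy to forget, so a build step might do it for you. This is a minimal sketch; `add_source_map_comment` is hypothetical, and it assumes the map file is served at a fixed, unhashed URL that you pass in. It replaces any `sourceMappingURL` comment the compiler wrote with one pointing at that URL, so the reference survives staticfiles renaming the Javascript file.

```python
def add_source_map_comment(js_source, map_url):
    """Append a sourceMappingURL comment, replacing any existing one.

    `map_url` should be the unhashed URL of the map file (for example
    STATIC_URL plus 'js/site.js.map'), so it still resolves after the
    Javascript file itself has been hashed.
    """
    lines = [line for line in js_source.rstrip('\n').split('\n')
             if not line.startswith('//# sourceMappingURL=')]
    lines.append('//# sourceMappingURL=%s' % map_url)
    return '\n'.join(lines) + '\n'


fixed = add_source_map_comment(
    'var a = 1;\n//# sourceMappingURL=site.js.map\n', '/static/js/site.js.map')
# fixed -> 'var a = 1;\n//# sourceMappingURL=/static/js/site.js.map\n'
```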

Page 99: Django Files — A Short Talk

django-compressor

• Integrated pipeline, supports precompilers &c

• Source files listed in templates

• Integrated hashing

• Can be used with staticfiles, but feels awkward

• Can support sourcemaps, via a plugin

Very popular, so in common configurations there’s lots of help for you. The pipeline runs on demand, and can be configured to use different precompilers and compressors, much like Sprockets.

It parses your HTML, which just feels plain wrong to me, although it makes it simpler to explain to people who mostly care about HTML+CSS+JS and don’t want to worry too much about production/deployment. It’s also fast entering abandonware territory.

Page 100: Django Files — A Short Talk

django-pipeline

• Internal pipeline, supports precompilers &c

• Source to output mapping in Django settings

• Integrates with staticfiles better than compressor

• Hashing via staticfiles

• Doesn’t support sourcemaps directly

A lot of people seem to dislike the config. It sits within Django settings, which makes it feel worse; if it put them in separate files for each output asset, it’d feel a lot like Sprockets.
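For a flavour of that configuration, here is a settings fragment in the dict-style `PIPELINE` setting used by django-pipeline 1.6 and later (earlier releases used separate `PIPELINE_CSS` and `PIPELINE_JS` settings with similar contents, so check the documentation for the version you use). The bundle names and file paths are made up, reusing the files from the earlier template examples.

```python
# settings.py (fragment): each named output bundle lists its source files.
PIPELINE = {
    'STYLESHEETS': {
        'site': {
            'source_filenames': (
                'css/nav.css',
                'css/footer.css',
                'css/index.css',
            ),
            'output_filename': 'css/site.css',
        },
    },
    'JAVASCRIPT': {
        'site': {
            'source_filenames': (
                'js/menu.js',
                'js/index.js',
            ),
            'output_filename': 'js/site.js',
        },
    },
}
```

Templates then refer to the bundle name rather than the source files, via the package's template tags (in recent versions, `{% load pipeline %}` followed by `{% stylesheet 'site' %}` and `{% javascript 'site' %}`).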

Page 101: Django Files — A Short Talk

Django + webpack

• webpack-bundle-tracker + django-webpack-loader (Owais Lone 2015)

• Pipeline run by Webpack, emits a mapping file

• Template tag to resolve the bundle name to a URL relative to STATIC_ROOT

Wait. This wasn’t on the list! It’s technology specific, but I think points in an interesting direction, because it allows hashing to be done outside Django. We could maybe abstract it, so there’s a manifest (in the deployment artefact) that maps abstract bundle names to relative paths (for all languages, not just JS).

We can actually almost do this already with ManifestStaticFilesStorage, but we need to be able to disable its postprocessing step during collectstatic, and it would have to be able to run in DEBUG mode.

Page 102: Django Files — A Short Talk

Django options

• django-compressor: fixed pipeline

• django-pipeline: fixed pipeline (+ config woes)

• staticfiles: doesn’t get hashes right

• webpack-loader: isn’t generic

The first two have pretty inflexible pipelines (also compressor is effectively abandoned). Staticfiles on its own is quite limited, at least at the moment, by the inflexibility of its hashing approach. The webpack integration isn’t generic, and isn’t flexible: it’s only for Javascript managed by Webpack. But it may show the way to a better future, where we combine staticfiles without hashing with a separate pipeline that does hashes and anything else you might want.

Page 103: Django Files — A Short Talk

The future?

• pipeline builds named bundles into output files

• pipeline writes manifest.json: a mapping of bundle name to output filename

• staticfiles storage reads in manifest.json on boot

• templates refer to the bundle name

• useful for staticfiles to be able to list static directories (eg for node pipeline search paths)

The last bullet is something that Marc mentioned to me last night, and would be pretty easy to implement. All of it’s pretty easy to implement, really.

Although we still have a problem with 3PAs.
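The Django side of the manifest proposal on the previous slide can be sketched in a few lines. `BundleManifest` is hypothetical: it assumes the pipeline writes a flat JSON object mapping bundle names to output filenames (the shape gulp-rev's `rev.manifest()` produces), reads it once at startup, and resolves names to URLs. Its `url()` is roughly what a `{% static %}`-style tag would call.

```python
import json
from urllib.parse import urljoin


class BundleManifest:
    """Resolve bundle names via a manifest written by an external pipeline.

    The pipeline owns building and hashing; Django only maps names like
    'site.js' to whatever file the pipeline produced.
    """

    def __init__(self, manifest_json, base_url='/static/'):
        self.mapping = json.loads(manifest_json)  # read once, "on boot"
        self.base_url = base_url

    def url(self, bundle_name):
        try:
            return urljoin(self.base_url, self.mapping[bundle_name])
        except KeyError:
            raise ValueError('Unknown bundle %r; has the pipeline run?' % bundle_name) from None


manifest = BundleManifest('{"site.js": "site.min.48fb66c7.js", "site.css": "site.min.29557b4f.css"}')
# manifest.url('site.js') -> '/static/site.min.48fb66c7.js'
```

Failing loudly on an unknown name is a deliberate choice here: a missing bundle usually means the pipeline didn't run, which is better caught at render time than served as a 404.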

Page 104: Django Files — A Short Talk

Third-party apps

• can’t cooperate with your project’s pipeline

• don’t want to force a dependency on a pipeline

• so must precompile into files in your sdist

• possibly for staticfiles to sweep up (but we’ve discussed this bit before)

3PAs cannot participate in your project’s pipeline, because there are too many options and it may be completely external to Django. If they depend on a specific pipeline, that may not play well with your deployment strategy (more work!) and you have more dependencies.

So they HAVE to run any pipeline they want when building the package sdist. At the very least this means lots of small files served over HTTP (HTTP 2 does make this easier). It may also mean, eg, that you can’t just run doiuse over your own pipeline and catch every potential browser compatibility problem.

Page 105: Django Files — A Short Talk

What next?

1% of the code has really complex implications. Abstractions everywhere, many of them hiding bugs: over 4% of Django’s open bugs are related to files. Some bits, like form media, aren’t remotely integrated with other bits like staticfiles. While we can adapt to completely new ideas (like gulp pipelines), you have to know how to build things yourself to meet best practice.

Page 106: Django Files — A Short Talk

What next?

• bless or semi-bless staticfiles?

• deprecate CachedStaticFilesStorage?

• document the boundaries of our hashing?

• rename? kill? expand? form.Media

• asset management / document external pipelines

• fix some bugs ;-)

On the third point: staticfiles probably warrants some carefully stated boundaries. For simple use, it’s fine that it does hashing. However, we’re possibly doing it the wrong way, and we should definitely accept its limits and encourage people to move on to other things when they hit them.

Page 107: Django Files — A Short Talk

• #9433 File locking broken on AFP mounts

• #17686 file.save crashes on unicode filename

• #18233 file_move_safe overwrites destination file

• #18655 Media files should be served using file storage API

• #22961 StaticFilesHandler should not run middleware on 404

Page 108: Django Files — A Short Talk

Thanks, Deinonychus Antirrhopus.

Page 109: Django Files — A Short Talk

😨🐉

• Durbed (durbed.deviantart.com) under CC By-SA 3.0

• “Happy New Year from Hell Creek”

• “Primal feathers”

Page 110: Django Files — A Short Talk

James Aylett

@jaylett