
Django: dealing with Unicode in CSV

Sometimes you need to export model data from a Django app into plain old CSV. No problem! The standard csv library does the trick easily:

...
with open("accounts.csv", "wb") as f:  # Python 2: the csv module expects a binary-mode file
    writer = csv.DictWriter(f, ACCOUNT_FIELDS_TO_EXPORT)
    for account in models.Account.objects.all():
        # Copy only the exported fields into a plain dict
        acc = {}
        for field in ACCOUNT_FIELDS_TO_EXPORT:
            acc[field] = getattr(account, field)
        writer.writerow(acc)
...

However, should even a single non-ASCII character appear in your table, you’ll be presented with a UnicodeEncodeError:

File "C:\Python26\lib\csv.py" in writerow
    144.         return self.writer.writerow(self._dict_to_list(rowdict))
Exception Type: UnicodeEncodeError at /account/csv/
Exception Value: ('ascii', u'Hello There Cars\u2014 Real Deal!', 15, 16, 'ordinal not in range(128)')

That’s right: the csv library in the Python 2.x branch has no official Unicode support. Third-party solutions either monkey-patch the existing library or reinvent the wheel with a stand-alone CSV library. If your application has a lot of CSV output code, it makes sense to modify both your development and deployment environments accordingly; sometimes, however, the benefit isn’t worth the time spent.
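To make the failure concrete, here is a minimal sketch of the do-it-yourself alternative: encode every text value to UTF-8 bytes before it reaches the Python 2 csv writer. The helper name encode_row and the sample row are hypothetical, made up for illustration; the snippet is written so that it runs on both Python 2.6+ and Python 3.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


def encode_row(row, encoding="utf-8"):
    """Return a copy of `row` with every text value encoded to bytes.

    Hypothetical helper for illustration; not part of the csv module.
    """
    encoded = {}
    for key, value in row.items():
        if isinstance(value, text_type):
            value = value.encode(encoding)
        encoded[key] = value
    return encoded


# Sample data (an assumption, modelled on the traceback above)
row = {"name": u"Hello There Cars\u2014 Real Deal!", "id": 7}

# An ASCII conversion is what fails for the em dash...
try:
    row["name"].encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# ...while an explicit UTF-8 encoding succeeds.
safe_row = encode_row(row)
```

On Python 2 you would then call writer.writerow(encode_row(acc)) instead of writer.writerow(acc). On Python 3 the csv module accepts Unicode text directly, so this step is not needed there.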

The Django documentation has a brilliant suggestion for working around the Unicode problem in its Outputting CSV how-to. Django was designed with Unicode support in mind, so of course its template system supports Unicode.

So the idea is to replace the DictWriter calls with a custom template-based function:

from django.template import Context, Template
...
def get_csv(field_list, data):
    # One quoted, slash-escaped template variable per exported field
    csv_line = ", ".join(['"{{ row.%s|addslashes }}"' % field for field in field_list])
    return Template(csv_line).render(Context({"row": data}))
...
with open("account.csv", "wb") as f:
    for account in models.Account.objects.all():
        acc = {}
        for field in ACCOUNT_FIELDS_TO_EXPORT:
            acc[field] = getattr(account, field)
        # render() returns Unicode text; encode it explicitly before writing (UTF-8 assumed)
        f.write(get_csv(ACCOUNT_FIELDS_TO_EXPORT, acc).encode("utf-8") + "\n")
...
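If you are curious what such a template line produces, the following plain-Python stand-in gives a rough idea. It is only a sketch: add_slashes and render_csv_line are hypothetical names, the sample row is invented, and the escaping merely imitates what the addslashes filter does (backslash-escaping backslashes and quotes). A real Django template with autoescaping left on (the default) would additionally HTML-escape characters such as & and quotes, which this sketch leaves out.

```python
def add_slashes(value):
    """Backslash-escape backslashes and quotes, roughly like Django's addslashes filter."""
    return value.replace(u"\\", u"\\\\").replace(u'"', u'\\"').replace(u"'", u"\\'")


def render_csv_line(field_list, row):
    """Hypothetical plain-Python stand-in for the template line built in get_csv()."""
    return u", ".join(u'"%s"' % add_slashes(u"%s" % row[field]) for field in field_list)


# Sample data (an assumption for illustration)
fields = ["name", "city"]
row = {"name": u'Cars\u2014 "Real" Deal', "city": u"Kyiv"}

line = render_csv_line(fields, row)
# line == u'"Cars\u2014 \\"Real\\" Deal", "Kyiv"'
```

Note that backslash-escaped quotes are not what most CSV readers expect (the usual convention is to double the quote character), so check how the consumer of your file parses quotes before relying on this format.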

The template-based approach works as expected, though you can reduce the template rendering overhead by rendering the whole file at once:

...
def get_csv_from_dict_list(field_list, data):
    csv_line = ", ".join(['"{{ row.%s|addslashes }}"' % field for field in field_list])
    # Plain concatenation instead of %-formatting, so the "%" in the template tags needs no escaping
    template = "{% for row in data %}" + csv_line + "\n{% endfor %}"
    return Template(template).render(Context({"data": data}))
...
accounts = []
for account in models.Account.objects.all():
    acc = {}
    for field in ACCOUNT_FIELDS_TO_EXPORT:
        acc[field] = getattr(account, field)
    accounts.append(acc)

with open("account.csv", "wb") as f:
    # render() returns Unicode text; encode it explicitly before writing (UTF-8 assumed)
    f.write(get_csv_from_dict_list(ACCOUNT_FIELDS_TO_EXPORT, accounts).encode("utf-8"))
...
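One detail worth spelling out: the rendered template is Unicode text, and on Python 2 a file opened with plain open() will try to convert it with the ASCII codec on write. A minimal sketch of writing the text with an explicit encoding through io.open, which exists on Python 2.6+ and Python 3, could look like this. The helper name write_text, the sample content, and the temporary path are assumptions for illustration, as is the choice of UTF-8.

```python
import io
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    """Write Unicode text to `path` using an explicit encoding (hypothetical helper)."""
    # newline="" keeps "\n" as-is instead of translating it per platform
    with io.open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)


# Sample content standing in for the rendered template output
content = u'"Hello There Cars\u2014 Real Deal!", "Kyiv"\n'

path = os.path.join(tempfile.mkdtemp(), "account.csv")
write_text(path, content)

# Read the raw bytes back to confirm what ended up on disk
with io.open(path, "rb") as f:
    raw = f.read()
```

Whichever encoding you pick, whoever opens the file has to use the same one, so it is worth stating it wherever the export is documented.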

3 Responses to “Django: dealing with Unicode in CSV”

  1. Lisa Van says:

    Thanks for your article. I am new at python and this will be a big help.

  2. [...] colleague Anatoly Ivanov has also spotted the problem and presents three code snippets here: his approach logically uses the Unicode-capable template system, but he writes to a [...]

  3. André Duarte says:

    I'm using the unicode writer from http://docs.python.org/library/csv.html#csv-examples.

    It worked for me. Hope this helps someone.
