CrowData is a tool to collaborate on the verification or release of data that otherwise would be hard or impossible to get via automatic tools. This is the software we used to create VozData.

In 2014, La Nacion in Argentina launched VozData, a website that crowdsourced Senate spending data by asking people to transcribe information from 6500 scanned PDF documents from the Senate. This is the code behind that website, and it can be used with any document set and any data you need to extract from it.

VozData: collaborating to free data from PDFs: A really nice article about the process of creating VozData from La Nacion.

Install Locally

  1. Python 2.7.5

  2. We recommend using virtualenv. Install it if you don't have it already.

  3. Create a virtual environment and activate it:

    virtualenv ~/.python-envs/crowdata
    . ~/.python-envs/crowdata/bin/activate
  4. Get the source code:

    git clone crowdata
    cd crowdata
  5. Install dependencies:

    Ubuntu users: before you can move forward, please make sure you have the following packages installed: python-dev, postgresql-9.3, postgresql-server-dev-9.3, postgresql-contrib, and libgeos-dev

    pip install -r requirements.txt
  6. Create PostgreSQL database

    $ createuser -s -h localhost crow_user
    $ createdb -O crow_user -h localhost crowdata_development
  7. Create extensions for doing trigram matching and removing accents in PostgreSQL

    $ psql -Ucrow_user crowdata_development
    crowdata_development=# CREATE EXTENSION pg_trgm;
  8. We keep local settings out of Git. You will need to copy the example local settings file and edit the database settings in your copy.

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql_psycopg2', # Add 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
            'NAME': 'crowdata_development',                      # Or path to database file if using sqlite3.
            'USER': 'crow_user',
            'PASSWORD': '',
            'HOST': '',
            'PORT': '',
        }
    }
  9. Install the GEOS library if you don't have it installed already.

  10. Initialize the database:

    python manage.py syncdb
    python manage.py migrate --all
  11. Ask a team member for a database backup and load it.

    pg_restore --dbname=crowdata_development --verbose ~/my_backup.backup --clean
  12. Create superuser

    python manage.py createsuperuser

    and follow the prompts.

  13. Start the development server

    python manage.py runserver_plus
  14. Navigate to http://localhost:8000/admin/ and log in with your superuser credentials.
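
As an alternative to hard-coding the database settings from step 8, the same values could be read from environment variables. The following is only a sketch: the `CROWDATA_DB_*` variable names and the `database_settings` helper are hypothetical and are not part of CrowData. The fallback values are the ones shown in step 8.

```python
import os


def database_settings(env=os.environ):
    """Build a Django-style DATABASES dict.

    The CROWDATA_DB_* names are hypothetical; any variable that is not
    set falls back to the value used in the install steps above.
    """
    return {
        'default': {
            'ENGINE': 'django.db.backends.postgresql_psycopg2',
            'NAME': env.get('CROWDATA_DB_NAME', 'crowdata_development'),
            'USER': env.get('CROWDATA_DB_USER', 'crow_user'),
            'PASSWORD': env.get('CROWDATA_DB_PASSWORD', ''),
            'HOST': env.get('CROWDATA_DB_HOST', ''),
            'PORT': env.get('CROWDATA_DB_PORT', ''),
        }
    }


# In a local settings file this would be assigned to DATABASES.
DATABASES = database_settings()
```

With no variables set, the result matches the settings from step 8, so the sketch can be dropped in without changing behavior.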

Installing via Docker

  1. Set your environment variables

There are 6 required environment variables.

Set each of them with:

export [var name]=[value you want] (e.g. export crowdata_USER="beyonce")

  2. Build your image with

cat Dockerfile | envsubst | sudo docker build -t lanacion/crowdata -

  3. Once it's built, run the server with

sudo docker run -i -t -d lanacion/crowdata python /crowdata/ runserver_plus && tail -f /dev/null

When creating a document set

If you are going to use DocumentCloud to load and view the PDF documents, you will have to set the 'head html' field in the admin when creating the document set:

<script src=""></script>

and the template function:

// JavaScript function to insert the document into the DOM.
// Receives the URL of the document as its only parameter.
// Must be called insertDocument.
// jQuery is available.
// The resulting element should be inserted into div#document-viewer-container.

function insertDocument(document_url) {
  // Strip the trailing ".html" so the viewer can load the matching ".js" file.
  var url = document_url.match(/(.+)\.html$/)[1];
  DV.load(url + '.js', {
    container : 'div#document-viewer-container', width:650,height:835,sidebar:false});
}
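
The template function above expects every document URL to end in `.html` and loads the matching `.js` file. If a URL has a different ending, `match()` returns null and the function throws. A small Python sketch like the one below could be used to check URLs before adding them to a document set. The `viewer_script_url` helper and the example URLs are hypothetical and only mirror the regular expression used above.

```python
import re


def viewer_script_url(document_url):
    """Swap a trailing .html for .js, as insertDocument does.

    Returns None when the URL does not end in .html, which is the case
    where the JavaScript version would fail.
    """
    match = re.match(r'(.+)\.html\Z', document_url)
    if match is None:
        return None
    return match.group(1) + '.js'


# Hypothetical URLs, for illustration only.
for candidate in ['https://example.org/documents/123-sample.html',
                  'https://example.org/documents/123-sample.pdf']:
    print(candidate, '->', viewer_script_url(candidate))
```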

When importing documents to a 'document set' via CSV upload

There is an option 'Add Documents to this document set' in the admin for the document set. You can upload a CSV with the columns document_title and document_url. This will create one document in the document set per row, with that title and a link to that URL.
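
A CSV in that shape can be produced with a few lines of Python. This is a minimal sketch written for Python 3, so it is a separate helper script and not part of the Python 2.7 application. It assumes the importer expects a header row with the two column names. The `build_documents_csv` helper, the titles, and the URLs are made up for illustration.

```python
import csv
import io


def build_documents_csv(documents):
    """Return CSV text with document_title and document_url columns.

    `documents` is an iterable of (title, url) pairs.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['document_title', 'document_url'])
    for title, url in documents:
        writer.writerow([title, url])
    return buffer.getvalue()


# Hypothetical documents, for illustration only.
csv_text = build_documents_csv([
    ('Expense report 1', 'https://example.org/documents/1-report.html'),
    ('Expense report 2', 'https://example.org/documents/2-report.html'),
])
print(csv_text)
```

Using the `csv` module instead of joining strings by hand keeps titles that contain commas or quotes correctly escaped.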

CrowData's copyright is © 2013 Manuel Aristarán. CrowData was developed with Open News and La Nacion Argentina.

CrowData is an open source project that was born when Manuel Aristarán was an Open News fellow at La Nacion in 2013. It was finally released as free software when Gabriela Rodriguez continued its development for VozData in 2014. Thanks to Cristian Bertelegni and La Nacion for contributing to the code.

It now relies on contributions from people and organizations. Please use it, comment on it, and contribute improvements via pull requests on GitHub.