OpenNews Code and projects

OpenNews supports developers inside and outside of newsrooms in creating code that helps journalism thrive on the open web. We believe that the code being written in news transforms not only the industry, but the web itself.

Projects by the Knight-Mozilla Fellows

Our Fellows spend ten months hacking in some of the best newsrooms in the world, following their passions and creating compelling open-source projects. Here are just a few of the many things they've developed:

  • Tabula
    Tabula is a tool to extract tabular data from PDFs. The project was created by 2013 Knight-Mozilla Fellow Manuel Aristarán, with the help of 2013 Knight-Mozilla Fellow Mike Tigas and his colleague at ProPublica Jeremy B. Merrill.
    On the web | On GitHub
  • Hyperaudio
    Mark Boas' Hyperaudio is a tool for assembling audio and video programs from their underlying transcripts. Its ongoing aims are to create something usable that works with both audio and video, to allow transitions and overlays to be specified via in-pad natural language instructions, to build up a library of material, and to integrate with third parties such as Amara (formerly Universal Subtitles). Hyperaudio is currently a project with Mozilla's WebFWD. In addition to Mark Boas, 2012 Knight-Mozilla Fellow Daniel Schultz and Matteo Spinelli are on the development team.
    On the web | On GitHub
  • Dataset
    Dataset is a tool from 2013 Knight-Mozilla Fellow Friedrich Lindenberg and Gregor Aisch that makes it easier to manage databases in Python, including importing and exporting them: "databases for lazy people."
    On the web | On GitHub
  • Learning Lunches
    2013 Knight-Mozilla Fellow Noah Veltman began organizing informal discussions with colleagues about technical topics. He's shared the materials from these discussions on GitHub. Topics have included databases, maps, and web scraping.
    On GitHub

Code Convenings

OpenNews also brings together groups of newsroom developers and other open-source contributors to collaborate on shared codebases and libraries. Whether that means open-sourcing work already built in the newsroom so that others can benefit from it or working on new projects together, we believe that collaborating on shared codebases helps move newsroom development, and the web, forward.

Projects from our first code convening:

  • Pym.js: An NPR library enabling responsive iframes for embedded graphics
  • PourOver and Tamper: A New York Times library and protocol pair that lets you quickly filter datasets with thousands of records, right in the browser
  • Landline + Stateline: A ProPublica tool for creating easy SVG maps that work across all browsers
  • FourScore: A WNYC graphic template for capturing reader sentiments in an elegant 2D chart

Code from Hack Days

We’ve sponsored more than 40 hack days around the world where journalists and developers have worked with data from censuses, elections, campaign finance, and more.

Some projects that got their start at hack days include:

  • CivOmega: this project got its start at the 2013 Knight-MIT-Mozilla hack day. It allows people to ask questions of legislative data and was recently awarded a Sunlight Foundation OpenGov Grant.
  • HackDash: originally developed at the 2012 Hacks/Hackers Buenos Aires Media Party, this tool for organizing hackathon projects went on to power the hack day at the 2013 event a year later.
  • NewsDiffs: began at the 2012 Knight-MIT-Mozilla hack day as a way to track changes to articles and headlines. It now tracks and archives changes to articles on five news sites.
  • Treasury.io: began as a project called FMS parser at the Bicoastal Datafest. It was developed by the CSV Soundsystem hacker team, which includes 2013 Knight-Mozilla Fellow Brian Abelson, to help track the US government's virtual checkbook.

Code from Code Sprints

We developed Code Sprints to help create some of the small, simple tools that can have a big impact in newsrooms.

Our Code Sprint projects include:

  • Sheetsee.js: Easy data visualizations using a simple spreadsheet backend.
  • Dedupe: A library for deduplication, entity resolution, record linkage, and author disambiguation of big datasets.
  • Treasury.io: A parser and API for the daily cash balance updates from the US Treasury.
  • California Election Parser: A parser for election data used by over 200 California news sites in 2012.

We’d love to develop more Code Sprints. Learn more about the program and apply.