“Data can be the source of data journalism, or it can be the tool with which the story is told — or it can be both.”— Paul Bradshaw
As the volume of digitized information continues to expand, so do opportunities for journalists. You may already know how to tell a story with data. The ability to craft a story from raw data, though, is becoming a valuable skill in communicating science. To help you sort through the opportunities and challenges in this emerging field, KSJ has assembled a collection of resources and tools for finding, analyzing, and presenting data, along with materials for further immersion in data journalism.
Introduction to Data Journalism
- “What Is Data Journalism?” a clear summary in the introduction to the Data Journalism Handbook, which is available for free under a Creative Commons license.
- “Data-driven journalism, explained,” by the Knight Center at the University of Texas.
- “5 tips for getting started in data journalism,” an article by Associated Press editor Troy Thibodeaux.
- “Essential data journalism skills,” a conversation with three data journalists, published by MulinBlog.
- “Understanding Data Journalism,” an overview of sources, tools, and topics, published by Harvard University’s Shorenstein Center.
- “9 Must-read Books for Beginners,” by Adrian Blanco.
- “10 tools for the data journalist tool belt,” a guide by Troy Thibodeaux covering various aspects of data journalism.
- Storybench, a website maintained by Northeastern University’s School of Journalism that explores the use of data in storytelling and visualization, with discussions of tools, designs, and innovations.
- “Journalism in the Age of Data,” a video report on data visualization as a storytelling medium.
- Health Datapalooza, an annual conference focused on “turning information into innovation by supporting a healthy exchange of ideas.”
- Investigative Reporters & Editors (IRE), a nonprofit organization that hosts annual conferences, including one devoted to data journalism.
Toolkits and Programs
Tools for working with data
- Data acquisition tools: DocumentCloud, OpenRefine
- Analysis tools: Excel, Google Spreadsheets, SQL, CSVKit, R
- Presentation tools: Excel, Overview, R, Google Fusion Tables, Tableau Public, QGIS
- National Institute for Computer-Assisted Reporting (NICAR), a site that provides tools and tutorials useful for getting and analyzing electronic information.
- “Dollars for Doctors,” a series of tools and technical guides related to ProPublica’s Dollars for Docs search tool.
- Journalist’s Toolbox’s “Digital Tools,” an assortment of storytelling tools, data-scraping tools, and other handy information from the Society of Professional Journalists.
Tools and databases for visualization
- Vizzuality: Cool maps and graphics on a range of topics
- Flowing Data
- Andy Kirk: Visualizing Data
- Alberto Cairo: The Functional Art
- D3: A JavaScript library for building interactive, web-based visualizations
- Inkscape: A free, open-source alternative to Adobe Illustrator (vector graphics software)
Programming and Computer Science
- Hacks/Hackers: A journalism organization that seeks to bridge information seeking and storytelling. Journalists, developers, and designers around the world can meet in local chapters.
- Open: A New York Times blog about code and development
- Stack Overflow: A question-and-answer site for programmers, from beginning to professional skill levels
- Codecademy: Learn a variety of programming skills and languages, for free
- Carto: Web GIS (Geographic Information Systems) with a “free” plan
- Mapbox: Another web GIS platform
- UC Berkeley Geospatial Innovation Facility: Programs and tutorials for geospatial data
- GeoJournalism: Tools for producing multimedia stories, simple maps, and data visualizations that help create context for complex environmental issues
- UNData: A tool for searching country-level statistics on a wide range of topics from across all UN agencies (e.g., UN Statistics Division, FAO, UNESCO, etc.)
- NICAR: This database library from the National Institute for Computer-Assisted Reporting (NICAR) covers a variety of topics, including health and the environment.
Medicine and Health
- HealthData: Access to over 1000 health datasets.
- The World Health Organization (WHO) has numerous databases, including the Global Health Observatory and health statistics and information systems.
- Dollars for Docs: Search for payments that pharmaceutical and medical device companies made to U.S. doctors in 2013-2014 (ProPublica database)
- Centers for Medicare & Medicaid Services (CMS) Data Navigator
- Agency for Healthcare Research and Quality (AHRQ): Database covering U.S. medical topics including health care, cost of care, trends in hospital care, health insurance coverage, out-of-pocket spending, and patient satisfaction.
- Global Trigger Tool: Measures Adverse Events (AEs) in healthcare organizations
- Leapfrog Hospital Survey: Reports on safety and quality performance of U.S. hospitals. Categories include maternity care, high-risk surgeries, and hospital-acquired conditions.
- ICD10Data: Index of diagnosis and medical procedure billing codes.
- NIH Health Statistics: A subject guide from the U.S. National Library of Medicine that provides a diverse list of references for health data around the world.
- Human Connectome Project: A project funded by the National Institutes of Health (NIH) to map neural pathways.
Environment and Sustainability
- U.S. Geological Survey Science data catalog
- National Centers for Environmental Information, maintained by the National Oceanic and Atmospheric Administration.
- Climate Central, a hub for information on the impacts of climate change, run by an independent organization of scientists and journalists.
- Climate Data, a site that offers data related to climate change for America’s communities, businesses, and citizens.
- “The Measurement of Sustainable Competitiveness,” a guide to the World Economic Forum’s sustainability-adjusted Global Competitiveness Index, which considers the environmental side of economic competitiveness.
- Environmental Performance Index (EPI), a measure that ranks countries based on 24 performance indicators covering aspects of environmental health and ecosystem vitality.
- Global Forest Watch, a site featuring satellite-derived maps of deforestation data that allows users to create custom maps, analyze forest trends, subscribe to alerts, or download data for their local area or the entire world.
- NASA EarthData, which includes a Worldview tool that allows visualization of near-real-time imagery from NASA satellites related to fires, dust, ash clouds, air quality, drought, floods, and other events. The site also has other mapping and visualization tools, such as FIRMS for fires, and offers access to more than a dozen NASA data centers and associated satellite data products.
- UNEP Live, a hub for environmental indicators on multiple issues, at multiple scales, and in multiple formats, maintained by the United Nations Environment Programme.
- World Resources Institute (WRI), a non-profit global research organization providing a variety of data and information on the environment.
Population and Socioeconomic Data
- Population Estimation Service: Hosted by SEDAC (the Socioeconomic Data and Applications Center at Columbia University). Draw a polygon around an area (e.g., a cyclone track or toxic release) and get the population in that polygon in real time; iOS app under development
- Population Reference Bureau (PRB): 2014 World Population Data Sheet including interactive map, population clock, etc.
- Human Development Report Data: Human Development Index; Public Data Explorer; Multidimensional Poverty Index
- Socioeconomic Data and Applications Center (SEDAC): Gridded population data, poverty maps, infrastructure data (dams, nuclear plants, roads), PM2.5 maps, national estimates of population and land area by climate/biome/elevation (PLACEv3); map gallery available under Creative Commons licenses; country treaty participation data
- World Bank Data: Includes the World Development Indicators and free and open access to data about development in countries around the globe; thematic portals for agriculture and rural development, climate change, environment and urban development
- OECD QWIDS: International development statistics from the France-based Organisation for Economic Co-operation and Development (OECD).
- TIGER data: Geographic data (boundaries, roads, and other map features) from the U.S. Census Bureau.
- Ekuatorial: Indonesia-specific datasets on marine areas, forests, and natural disasters
- InfoAmazonia: An aggregation of environmental datasets for the 9 countries of the Amazon basin including forests, watersheds, industries and indigenous lands
- Open Development Mekong: Development tracker focused on the Mekong
- Third Pole Data Network: An open-source geospatial database that offers a simple, searchable catalog of water-related datasets sourced from leading organizations monitoring water in Asia
References and Further Reading
- “Computational Journalism,” by Sarah Cohen, James T. Hamilton, and Fred Turner, published in Communications of the Association for Computing Machinery.
- A comprehensive list of resources for data journalism; includes blogs, books, and conferences
- Facts Are Sacred: A blog about data journalism and data visualization, later published as a book.
- “Data Journalism at the Guardian”: A “10-point guide” by Simon Rogers, data editor.
- “Frontiers in Massive Data Analysis”: Report in The National Academies Press that examines tools, skills, and approaches toward analyzing massive amounts of data.
- “Data Journalism in Action”: ProPublica’s Surgeon Scorecard, hosted by WNYC.