“Data can be the source of data journalism, or it can be the tool with which the story is told — or it can be both.”— Paul Bradshaw
As the volume of digitized information continues to expand, so do opportunities for journalists. You may already be familiar with how to tell a story with data. The ability to craft a story from raw data, though, is becoming a valuable skill in communicating science. To help you sort through some of the opportunities and challenges in this emerging field, KSJ has provided a collection of resources and tools for finding, analyzing, and presenting data, as well as for further immersion in the field of data journalism.
Introduction to Data Journalism
- What Is Data Journalism? A clear summary in the introduction of the Data Journalism Handbook, which is available for free under a Creative Commons license.
- Here is another explanation by the Knight Center at the University of Texas.
- 5 tips for getting started in data journalism: an article by Associated Press editor Troy Thibodeaux
- MulinBlog published an interview with three data journalists on essential skills.
- Understanding Data Journalism: Overview of sources, tools, topics.
- 9 Must-read Books for Beginners
- 10 tools for the data journalist tool belt: Tools for various aspects of data journalism. Another piece by Troy Thibodeaux.
- Storybench: A website maintained by Northeastern University’s School of Journalism that explores the use of data in storytelling and visualization, with discussions of tools, designs, and innovations.
- Journalism in the Age of Data: A video report on data visualization as a storytelling medium.
- Data journalism for every scale and skill level: A panel discussion from ScienceWriters 2015
- Health Datapalooza: “Focused on liberating health data, and bringing together…the newest and most innovative and effective uses of health data to improve patient outcomes.”
- Investigative Reporters & Editors (IRE): Annual conference devoted to data journalism
Toolkits and Programs
Tools for working with data
- Data acquisition: DocumentCloud, OpenRefine
- Analysis: Excel, Google Spreadsheets, SQL, CSVKit, R
- Presentation: Excel, Overview, R, Google Fusion Tables, Tableau Public, QGIS
- National Institute for Computer-Assisted Reporting (NICAR): Tools aiding in the reporting and publishing process
- QuickCode: Useful for acquiring, analyzing, and presenting data
- “Dollars for Docs”: A series of programming and technical guides on how ProPublica collected data for its Dollars for Docs news application, including how to extract information from websites.
- Journalists’ Toolbox/Digital Tools: An assortment of storytelling tools, data scraping tools, and other handy information from the Society of Professional Journalists.
Tools and databases for visualization
- Vizzuality: Cool maps and graphics on a range of topics
- Flowing Data
- Andy Kirk: Visualizing Data
- Alberto Cairo: The Functional Art
- D3: interactive / web-based
- Inkscape: free version of Illustrator (graphic design software program)
Programming and Computer Science
- Hacks/Hackers: A journalism organization that seeks to bridge information seeking and storytelling. Journalists, developers, and designers around the world can meet in local chapters.
- Open: A New York Times blog about code and development
- Stack Overflow: A question-and-answer site for programmers, from beginning to professional skill levels
- Codecademy: Learn a variety of programming skills and languages, for free
- Carto: Web GIS (Geographic Information Systems) with a “free” plan
- Mapbox: Another web GIS
- UC Berkeley Geospatial Innovation Facility: Programs and tutorials for geospatial data
- GeoJournalism: Tools for producing multimedia stories or simple maps and data visualizations that help create context for complex environmental issues
- UNData: This tool allows you to search for country level statistics on a wide range of topics from across all UN agencies (e.g., UN Statistics Division, FAO, UNESCO, etc.)
- NICAR: This database library from the National Institute for Computer-Assisted Reporting (NICAR) covers a variety of topics, including health and the environment.
Medicine and Health
- HealthData: Access to over 1000 health datasets.
- The World Health Organization (WHO) has numerous databases, including the Global Health Observatory and health statistics and information systems.
- Dollars for Docs: Search for general payments that pharmaceutical and medical device companies made to U.S. doctors in 2013-2014 (ProPublica database)
- Centers for Medicare & Medicaid Services (CMS) Data Navigator
- Agency for Healthcare Research and Quality (AHRQ): Database covering U.S. medical topics including health care, cost of care, trends in hospital care, health insurance coverage, out-of-pocket spending, and patient satisfaction.
- Global Trigger Tool: Measures Adverse Events (AEs) in healthcare organizations
- Leapfrog Hospital Survey: Reports on safety and quality performance of U.S. hospitals. Categories include maternity care, high-risk surgeries, and hospital-acquired conditions.
- ICD10Data: Index of diagnosis and medical procedure billing codes.
- NIH Health Statistics: A subject guide from the U.S. National Library of Medicine that provides a diverse list of references for health data around the world.
- Human Connectome Project: A project funded by the National Institutes of Health (NIH) to map neural pathways.
Environment and Sustainability
- U.S. Geological Survey Science data catalog
- National Climatic Data Center: Information provided by the National Oceanic and Atmospheric Administration
- Climate Central: Find information on the impacts of climate change; independent organization of scientists and journalists focused on telling the story of climate change.
- Climate Data: Data related to climate change for America’s communities, businesses, and citizens
- Sustainability Competitiveness Index: World Economic Forum (WEF) index focused on the environmental side of economic competitiveness
- Environmental Performance Index (EPI): 2014 EPI country rankings and associated data; data explorer; case studies on indicators in practice
- Global Forest Watch: Satellite-derived maps of deforestation data; create custom maps, analyze forest trends, subscribe to alerts, or download data for your local area or the entire world
- NASA EarthData: WorldView tool allows visualization of near-real-time imagery from NASA satellites related to fires, dust, ash clouds, air quality, drought, floods, etc.; other mapping and visualization tools such as FIRMS for fires; access to more than a dozen NASA data centers and associated satellite data products
- UNEP Live: Environmental indicators on multiple issues, at multiple scales, in multiple formats from the United Nations Environment Program.
- World Resources Institute (WRI): A variety of data and information on the environment. WRI is a non-profit global research organization founded in 1982 with members in more than 50 countries.
Population and Socioeconomic Data
- Population Estimation Service: Hosted by SEDAC (the Socioeconomic Data and Applications Center at Columbia University). Draw a polygon around an area (e.g., a cyclone track or toxic release) and get the population in that polygon in real time; iOS app under development
- Population Reference Bureau (PRB): 2014 World Population Data Sheet including interactive map, population clock, etc.
- Human Development Report Data: Human Development Index; Public Data Explorer; Multidimensional Poverty Index
- Socioeconomic Data and Applications Center (SEDAC): Gridded population data, poverty maps, infrastructure data (dams, nuclear plants, roads), PM2.5 maps, national estimates of population and land area by climate/biome/elevation (PLACEv3); map gallery available under Creative Commons licenses; country treaty participation data
- World Bank Data: Includes the World Development Indicators and free and open access to data about development in countries around the globe; thematic portals for agriculture and rural development, climate change, environment and urban development
- OECD QWIDS: International development statistics; OECD is the France-based Organization for Economic Cooperation and Development.
- TIGER data: From the U.S. Census of population.
- Ekuatorial: Indonesia-specific datasets on marine areas, forests, and natural disasters
- InfoAmazonia: An aggregation of environmental datasets for the 9 countries of the Amazon basin, including forests, watersheds, industries, and indigenous lands
- Open Development Mekong: Development tracker focused on the Mekong
- Third Pole Data Network: An open-source geospatial database offering a simple, searchable catalog of water-related datasets sourced from leading organizations monitoring water in Asia
References and Further Reading
- “Computational Journalism,” by Sarah Cohen, James T. Hamilton, and Fred Turner.
- Communications of the Association for Computing Machinery
- A comprehensive list of resources for data journalism; includes blogs, books, and conferences
- Facts Are Sacred: A blog about data journalism and data visualization, later published as a book.
- Data Journalism at the Guardian: A “10-point guide” by Simon Rogers, data editor.
- Frontiers in Massive Data Analysis: A report from the National Academies Press that examines tools, skills, and approaches for analyzing massive amounts of data.
- Data Journalism in Action: ProPublica’s Surgeon Scorecard. Hosted by WNYC.