Over the years, data journalists have told some truly remarkable stories using datasets that were never meant to see the light of day. Ask any layperson to name a famous piece of data journalism, and there’s a good chance they’ll point to a story involving a major data breach, like the Guardian’s reporting on the Edward Snowden NSA leak in 2013 or any of the numerous revelations that have come out of organizations like WikiLeaks. However, that doesn’t mean you have to make friends with a government whistleblower before you can start telling stories with data. In fact, most great data journalism is the result of insights unearthed from the vast troves of government and other public data that are freely available online. Here, we’ve compiled a selection of the most useful and fascinating databases available today.
- UNdata. Maintained by the United Nations, this web-based data service allows users to access the UN’s extensive collection of international statistical databases through a single, convenient entry point.
- Data.gov. Built with the aim of making the U.S. government more open and accountable, Data.gov is an open data site that compiles data from federal, state, city, and county-level agencies in standardized, machine-readable formats.
- Eurostat. Founded with the mission of providing high-quality statistics and data on Europe, this website offers open access to datasets covering EU economics, business and trade, social statistics, and more.
- openAFRICA. Created as a volunteer-driven, grassroots initiative, openAFRICA is perhaps the largest independent repository of open data on the African continent.
- ASEANstats. Created by the Association of Southeast Asian Nations (ASEAN), this website compiles, consolidates, and disseminates key statistical information on the region and its member states.
- CEPALSTAT. This data repository delivers access to key statistics and indicators produced by various divisions of the Economic Commission for Latin America and the Caribbean (ECLAC).
- Pew Research Center Datasets. The Pew Research Center is a nonpartisan American think tank that conducts a wide array of analytical studies and research efforts aimed at capturing the issues and trends that are shaping the country.
- OCCRP Aleph. This investigative data platform was built specifically with journalists in mind, providing public access to an enormous archive of government records and databases—particularly those related to finances and expenditures.
- Big Local News. Created by Stanford University’s Journalism and Democracy Initiative, this website collects hard-to-find datasets with a special focus on local data.
- Google Dataset Search. This dataset search engine created by Google aims to make datasets universally accessible and useful, allowing users to discover datasets hosted in thousands of repositories across the internet.
Medicine and Health
- HealthData. This one-stop shop for public U.S. health data provides easy access to datasets and statistics from a variety of federal and state agencies.
- World Health Organization Global Health Observatory. The WHO maintains a large data repository of health-related statistics for its 194 member states in the form of its Global Health Observatory, as well as a variety of useful tools for collecting, managing, and analyzing health data.
- Global Trigger Tool. For those reviewing hospital records specifically, this tool aims to teach users how to identify and measure the overall level of harm in a healthcare organization.
- Leapfrog Hospital Survey. Launched in 2001, this annual survey uses national performance measures to evaluate individual hospitals in terms of safety, quality, and efficiency.
- ICD10Data. This resource is a useful reference website that enables quick lookup of all current American medical billing codes for various diagnoses and procedures—which often feature prominently in hospital datasets.
Earth and Environment
- Climate Central. This organization partners with local journalists to bring a greater emphasis on data and science to region-specific climate and weather reporting.
- Climate Data. An extension of the broader data.gov initiative, this website makes it easy to access public datasets on environmental issues from nearly every federal and state-level government agency.
- Global Forest Watch. This online platform provides data and tools for tracking where and how forests are changing all over the world, often delivering near real-time information.
- UNEP Live. This global platform developed by the UN Environment Programme is a big data initiative that offers key statistics and data related to sustainable development worldwide.
- World Resources Institute (WRI). Spanning more than 60 countries, this global research organization offers a suite of data platforms that collect data from across the Institute’s multiple research initiatives, including its work on climate, energy, food, forests, water, cities, and the ocean.
Population and Socioeconomic Data
- Population Reference Bureau (PRB). PRB is dedicated to providing U.S. and international population data that is objective, accurate, current, and delivered in accessible formats.
- Human Development Report Data. This resource collects human development data sourced from international agencies that assemble datasets on a wide range of statistical indicators.
- Socioeconomic Data and Applications Center (SEDAC). Part of NASA, this organization collects data on human interactions in the environment, with the aim of serving as an “information gateway” between earth and social sciences.
- World Bank Data. Managed by the World Bank’s Development Data Group, this website offers a number of useful datasets covering various macroeconomic statistics, financial data, and sector-specific databases.
- TIGER data. Short for “Topologically Integrated Geographic Encoding and Referencing,” TIGER is a format used by the United States Census Bureau that describes geographical attributes such as roads, buildings, rivers, and lakes.