Data is everywhere. It is also the most valuable commodity on earth, surpassing even oil, according to Brittany Kaiser, a former Cambridge Analytica employee who appeared in the Netflix documentary The Great Hack.
It is the analysis that makes data valuable. Without analysis, data is mostly worthless for humans.
While companies have committed to examining and interpreting their own data, several organizations and government entities have made hundreds of datasets freely available for anyone willing to analyze them.
Companies that may not have the resources to begin accumulating their own data can access publicly available data, start asking the right questions, and get answers right away.
Below is a list of 30 data sources that are freely available to anyone. Many more are available for various niches. Chances are, if someone collects it, follows it, or makes money from it, there is a free dataset about it somewhere.
The majority of the sources below were found in a 2016 Forbes article about free datasets.
- Data.gov (http://data.gov) In 2015 the US Government promised to make all government data freely available online. This site serves as a portal to all kinds of astonishing information on everything from crime to climate.
- US Census Bureau (http://www.census.gov/data.html) A plethora of information about the lives of US citizens including geographic and population data, as well as education.
- Data Catalogs (http://datacatalogs.org/) offers open government data from the US, Canada, the EU, CKAN, and more.
- Socrata (http://www.tylertech.com/products/socrata) is another interesting government data source to explore with some built-in visualization tools.
- Data.gov.uk (http://data.gov.uk/) This data is from the UK Government. It includes the British National Bibliography, which contains metadata about all UK books and publications since 1950.
- European Union Open Data Portal (http://open-data.europa.eu/en/data/) Like the above sources, but it includes data from institutions in the European Union.
- Canada Open Data (https://open.canada.ca/en/open-data/) is a project with numerous geospatial and government datasets.
- Healthdata.gov (healthdata.gov) This data covers 125 years of US healthcare featuring claim-level Medicare data, as well as epidemiology and population statistics.
- NHS Health and Social Care Information Center (https://digital.nhs.uk) The UK National Health Service provides health datasets.
- The CIA World Factbook (https://www.cia.gov/library/publications/the-world-factbook/) holds information about the history, economy, population, government, military, and infrastructure of 267 world entities (countries, territories, and other regions).
- World Health Organization (https://www.who.int/en/) offers statistics about world hunger, health, and diseases.
- National Climatic Data Center (http://www.ncdc.noaa.gov/data-access/quick-links#loc-clim) provides a vast collection of environmental, meteorological, and climate datasets. It is the world’s largest archive of weather data.
- The BROAD Institute provides several cancer-related datasets.
- UNICEF grants access to statistics about the situation of women and children around the world.
- Amazon Web Services public datasets (http://www.registry.opendata.aws) provides a gigantic resource of public data that includes the 1000 Genomes Project, which attempted to construct the most comprehensive database of human genetic information.
- Facebook Graph (https://developers.facebook.com/docs/graph-api) Although much of the information in users’ Facebook profiles is private, much of it is not. Facebook offers the Graph API, which can query the immense amount of information that its users readily share with the world (or can’t hide because they haven’t figured out how to use the privacy settings).
- IBM Watson Data and AI Learning Center (https://developer.ibm.com/clouddataservices/docs/ibm-data-science-experience/get-started/load-analyze-public-data-sets/) lets users load and analyze public datasets, collaborate on projects, and publish notebooks to GitHub.
- Google Public Data Explorer (http://www.google.com/publicdata/) incorporates data from world development indicators, OECD, and human development indicators, primarily covering economic data of the world.
- New York Times (http://developer.nytimes.com/docs) offers a searchable, indexed archive of news articles going back to 1851.
- Junar (http://www.junar.com) is a data scraping service that also produces data feeds.
- Buzzdata (http://www.buzzdata.com/content/) is a social data sharing service that lets you upload your own data and connect with others who are also uploading their data.
- Gapminder (http://www.gapminder.org/data/) grants access to a compilation of data from sources including the World Health Organization and the World Bank, incorporating economic, medical, and social statistics from around the world.
- Google Trends (http://www.google.com/trends/explore) features statistics on search volume (as a proportion of total searches) for any given term since 2004.
- Google Finance (https://www.google.com/finance) offers access to 40 years of stock market data, updated in real time.
- Google Books Ngrams (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html) lets you search and analyze the full text of any of the millions of books digitized via the Google Books project.
- DBPedia (http://wiki.dbpedia.org) Wikipedia is composed of millions of pieces of data, structured and unstructured, on a vast range of topics. DBPedia is an ambitious project intended to catalog this data and create a public, freely distributable database that lets anyone analyze it.
- Freebase (http://www.freebase.com/) is a community-compiled database of structured data about people, places and things, with over 45 million records.
- UCI Machine Learning Repository is a collection of datasets prepared specifically for machine learning.
- Pew Research Center provides raw data from its interesting research into American life.
- Sports Reference (https://www.sports-reference.com/) offers links to databases and datasets from college and professional basketball and American football, professional hockey, and soccer (what other countries call football).
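Several of the portals above, such as Data.gov, offer downloads in CSV format. As a rough illustration of what "getting answers right away" can look like, here is a minimal Python sketch that groups rows and averages a numeric column using only the standard library. The sample data, the column names, and the `average_by_group` helper are all invented for illustration and do not come from any of the sources listed.

```python
import csv
import io
from statistics import mean

# Hypothetical sample standing in for a downloaded CSV file;
# the column names and values are invented for illustration.
SAMPLE_CSV = """city,year,population
Springfield,2010,116250
Springfield,2020,114394
Shelbyville,2010,42000
Shelbyville,2020,45500
"""


def average_by_group(csv_text, group_col, value_col):
    """Return the mean of value_col for each distinct value of group_col."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Collect the numeric values under their group label.
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return {name: mean(values) for name, values in groups.items()}


averages = average_by_group(SAMPLE_CSV, "city", "population")
for city, avg in averages.items():
    print(f"{city}: {avg:.0f}")
```

With a real download, the in-memory sample would be replaced by the file's contents, for example by passing a file opened with `open(path, newline="")` directly to `csv.DictReader`.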
Reference
Marr, Bernard. Big Data: 33 Brilliant And Free Data Sources Anyone Can Use, Forbes, February 12, 2016.
https://www.forbes.com/sites/bernardmarr/2016/02/12/big-data-35-brilliant-and-free-data-sources-for-2016/#53fa5337b54d