9 Publicly Available Datasets That You Can Analyze

Tarun Manrai
Apr 27, 2020

Datasets come in myriad formats and can be difficult to use, so considerable work has gone into curating them and standardizing their formats to make them easier to use for machine learning research.

If you’re looking to learn how to analyze data, create data visualizations, or just boost your data literacy, public datasets are a perfect place to start. In simple terms, public data is data that anyone can access, modify, reuse, and share.

A common problem in data science is gathering data from various sources in a reasonably clean (semi-structured) format and then combining metrics from those sources to support higher-level analysis.
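Combining metrics usually comes down to aligning records from different sources on a shared key and then deriving new measures from the joined rows. The Python sketch below assumes both sources have already been cleaned into dictionaries keyed by (country code, year); the function name `combine_metrics`, the countries "AAA" and "BBB", and all figures are hypothetical placeholders, not data from any of the sources listed later.

```python
from typing import Dict, List, Tuple

# Records from every source are aligned on the same key: (country code, year).
Key = Tuple[str, int]


def combine_metrics(
    left: Dict[Key, float], right: Dict[Key, float]
) -> List[Tuple[str, int, float, float]]:
    """Inner-join two metric tables on (country, year).

    Only keys present in both sources are kept, so no value is paired
    with a missing counterpart from the other source.
    """
    shared = sorted(left.keys() & right.keys())
    return [(c, y, left[(c, y)], right[(c, y)]) for c, y in shared]


if __name__ == "__main__":
    # Placeholder numbers for hypothetical countries "AAA" and "BBB";
    # they are not real statistics.
    population = {("AAA", 2019): 1_020_000.0, ("BBB", 2019): 500_000.0,
                  ("BBB", 2020): 510_000.0}
    clinics = {("AAA", 2019): 40.0, ("AAA", 2020): 41.0, ("BBB", 2019): 10.0}
    for country, year, pop, count in combine_metrics(population, clinics):
        # A derived, higher-level metric: clinics per 100,000 people.
        print(country, year, round(count / pop * 100_000, 2))
```

An inner join is used here so that a ratio is never computed from a missing value; an outer join would be the better choice if you wanted to see where the two sources' coverage differs.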

9 publicly available datasets

World Bank Open Data

World Bank Open Data is one of the world’s most comprehensive repositories of data on what is happening in countries around the globe, which makes it a vital source of open data. It also provides access to other datasets listed in its data catalog.
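World Bank Open Data can also be queried programmatically through the World Bank Indicators API. The sketch below assumes the v2 JSON response shape (a two-element list: paging metadata, then records) and an annual indicator whose `date` field is a plain year; `SP.POP.TOTL` (total population) is used only as an example code, and the helper names are hypothetical. It reads just the first page of results, so treat it as a starting point rather than a complete client.

```python
import json
import urllib.request
from typing import List, Optional, Tuple

# Base URL of the World Bank Indicators API, version 2.
API = "https://api.worldbank.org/v2"


def indicator_url(country: str, indicator: str, per_page: int = 100) -> str:
    """Build a request URL for one indicator and one country (or "all")."""
    return (f"{API}/country/{country}/indicator/{indicator}"
            f"?format=json&per_page={per_page}")


def parse_indicator(payload: list) -> List[Tuple[str, int, Optional[float]]]:
    """Turn the JSON reply into (country name, year, value) rows.

    Assumes [paging metadata, records]; an error reply has only one element
    and a query with no data has None as its second element. Missing
    observations keep value None. int(date) assumes an annual indicator.
    """
    if len(payload) < 2 or payload[1] is None:
        return []
    return [(rec["country"]["value"], int(rec["date"]), rec["value"])
            for rec in payload[1]]


def fetch(country: str, indicator: str) -> list:
    """Download the first page of results (requires network access)."""
    with urllib.request.urlopen(indicator_url(country, indicator)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for row in parse_indicator(fetch("IND", "SP.POP.TOTL"))[:5]:
        print(row)
```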

UNICEF Dataset

Because UNICEF works on a wide variety of critical issues, it has compiled data on education, child labor, child disability, child mortality, maternal mortality, water and sanitation, low birth weight, antenatal care, pneumonia, malaria, iodine deficiency disorders, female genital mutilation/cutting, and adolescents. These datasets are also updated regularly: the data is refreshed every month to make it more comprehensive, reliable, and accurate.

Data.gov

Data.gov is the home of the US government’s open data. You can use it to conduct research, develop web and mobile applications, or design data visualizations. Enter keywords in the search box, then narrow the results by type, tag, format, group, organization type, organization, and category to find the datasets you need.
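The same keyword search can be scripted. This sketch assumes the Data.gov catalog exposes CKAN’s Action API (the `package_search` action with its standard `q` and `rows` parameters) at `catalog.data.gov`; check the current Data.gov developer documentation before relying on that endpoint. The helper names `search_url` and `summarize` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Assumption: the Data.gov catalog serves CKAN's Action API at this URL.
SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def search_url(keywords: str, rows: int = 10) -> str:
    """Build a keyword search URL using CKAN's q and rows parameters."""
    return SEARCH_URL + "?" + urllib.parse.urlencode({"q": keywords, "rows": rows})


def summarize(response: dict) -> List[Tuple[str, List[str]]]:
    """List each dataset's title with the distinct formats of its resources."""
    if not response.get("success"):
        return []
    out = []
    for pkg in response["result"]["results"]:
        formats = {res.get("format", "") for res in pkg.get("resources", [])}
        out.append((pkg.get("title", ""), sorted(formats - {""})))
    return out


if __name__ == "__main__":
    # Requires network access.
    with urllib.request.urlopen(search_url("air quality")) as resp:
        for title, formats in summarize(json.load(resp)):
            print(title, formats)
```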

freeCodeCamp Open Data

freeCodeCamp is an open source community, and it matters because it lets you learn to code and build pro bono projects. Its open data repository offers a variety of material: datasets, analyses of those datasets, and even demos of projects built on freeCodeCamp data, along with links to external projects that use it. Whether your interest is web analytics, social media analytics, social network analysis, education analysis, data visualization, data-driven web development, or bots, the data offered by this community can be extremely useful.

Google Trends

This is one of the broadest and most interesting public datasets to analyze. Google tracks search-term data to show what people are searching for and when, and you can explore relative search interest (indexed from 0 to 100) for almost any term going back to 2004. Enter a search term, or a handful of terms, and click the download button to analyze the data outside the Trends website.

A variety of filters let you narrow trends by location (worldwide or by country), time range, category, or search type (for example, web, image, or YouTube search).
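Once you have downloaded a CSV from Trends, a few lines of standard-library Python are enough to load it. The parser below assumes the export has a short comma-free preamble (such as `Category: All categories`), then a header row whose first column is the time period, and one column per search term, with `<1` marking very low interest (read here as 0, which is a simplification). Export layouts may vary, so adjust the preamble handling if yours differs; the function names and the sample text are hypothetical.

```python
import csv
from typing import Dict, List, Tuple


def parse_trends_csv(text: str) -> Tuple[List[str], Dict[str, List[int]]]:
    """Parse a Google Trends CSV export into (periods, {term column: values}).

    Assumes comma-free preamble lines, then a header row whose first column
    is the time period. "<1" (very low interest) is read as 0.
    """
    lines = text.splitlines()
    start = next((i for i, line in enumerate(lines) if "," in line), None)
    if start is None:
        raise ValueError("no header row found")
    rows = [row for row in csv.reader(lines[start:]) if row]  # drop blank rows
    header, body = rows[0], rows[1:]
    periods = [row[0] for row in body]
    series = {
        name: [0 if row[col] == "<1" else int(row[col]) for row in body]
        for col, name in enumerate(header[1:], start=1)
    }
    return periods, series


def peak_period(periods: List[str], values: List[int]) -> str:
    """Return the period in which a term's interest was highest."""
    return periods[max(range(len(values)), key=values.__getitem__)]


if __name__ == "__main__":
    # Invented sample in the assumed export layout (not real Trends data).
    sample = (
        "Category: All categories\n"
        "\n"
        "Week,python: (Worldwide),java: (Worldwide)\n"
        "2020-01-05,80,<1\n"
        "2020-01-12,100,3\n"
    )
    periods, series = parse_trends_csv(sample)
    for term, values in series.items():
        print(term, peak_period(periods, values))
```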

National Climatic Data Center

Here you can find an archive of US climate and weather datasets, held in the largest archive of environmental data in the world. It is a huge resource for all kinds of weather-related data, including meteorological, oceanic, climate, atmospheric, and geophysical data.

Global Health Observatory data

As part of its core goal of better health information worldwide, the World Health Organization makes its global health data publicly available through the Global Health Observatory (GHO). The GHO acts as a portal for accessing and analyzing health situations and important health themes.

The datasets are organized by theme, such as mortality, health systems, communicable and non-communicable diseases, medicines and vaccines, and health risks.
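GHO indicators can also be retrieved as JSON. This sketch assumes WHO’s GHO OData API at `ghoapi.azureedge.net`, where each indicator lives at `/api/<indicator code>` and records carry `SpatialDim` (location), `TimeDim` (year), `Dim1` (a breakdown such as sex) and `NumericValue` fields; `WHOSIS_000001` is assumed to be life expectancy at birth. Verify these against WHO’s current API documentation; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List, Tuple

# Assumed base URL of WHO's GHO OData API; each indicator is at /api/<code>.
GHO_API = "https://ghoapi.azureedge.net/api"


def indicator_url(code: str) -> str:
    """Build the URL for one GHO indicator code."""
    return f"{GHO_API}/{code}"


def values_for_year(payload: dict, year: int) -> List[Tuple[str, str, float]]:
    """Return (location, Dim1, value) rows for one year, sorted by location.

    Dim1 often holds a breakdown such as sex, so one location can appear
    several times. Missing fields become "" so sorting never compares None
    with a string; records without a numeric value are skipped.
    """
    return sorted(
        (rec.get("SpatialDim") or "", rec.get("Dim1") or "", rec["NumericValue"])
        for rec in payload.get("value", [])
        if rec.get("TimeDim") == year and rec.get("NumericValue") is not None
    )


if __name__ == "__main__":
    # Requires network access; WHOSIS_000001 is an assumed indicator code.
    with urllib.request.urlopen(indicator_url("WHOSIS_000001")) as resp:
        for row in values_for_year(json.load(resp), 2019)[:10]:
            print(row)
```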

Earthdata

Earthdata is part of NASA’s Earth Science Data Systems Program, specifically the Earth Observing System Data and Information System (EOSDIS). EOSDIS processes and distributes Earth science data from Earth observation satellites, aircraft, and field measurements.

Through Earthdata, the public can access NASA’s data, news, and event information. It covers data on Earth’s atmosphere, solar radiance, the cryosphere (frozen regions such as the Arctic), the ocean, the land surface and solid Earth (gravity, geomagnetism, tectonics), and human environments.

Amazon Web Services Open Data Registry

As more organizations make their data available for public access, Amazon has created a registry for finding and sharing these datasets. At the time of writing, over 50 public datasets were available through Amazon’s registry, ranging from IRS filings to NASA satellite imagery to DNA sequencing to web crawls. Registry entries also include usage examples showing what other organizations and groups have done with the data.
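Many datasets in the registry are stored in public Amazon S3 buckets, which can often be listed anonymously over HTTPS with the S3 ListObjectsV2 request (`list-type=2`). The sketch below builds that request and parses the XML reply with the standard library; the bucket name is hypothetical (take real names from a registry entry), and it assumes a bucket that allows anonymous listing and has no dots in its name. Requester-pays buckets will not work this way.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# XML namespace used in S3 ListBucketResult responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(bucket: str, prefix: str = "", max_keys: int = 20) -> str:
    """Build an anonymous ListObjectsV2 request URL for a public bucket."""
    query = urllib.parse.urlencode(
        {"list-type": 2, "prefix": prefix, "max-keys": max_keys})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_doc: Union[str, bytes]) -> List[Tuple[str, int]]:
    """Extract (object key, size in bytes) pairs from a ListBucketResult."""
    root = ET.fromstring(xml_doc)
    return [
        (c.findtext("s3:Key", default="", namespaces=S3_NS),
         int(c.findtext("s3:Size", default="0", namespaces=S3_NS)))
        for c in root.findall("s3:Contents", S3_NS)
    ]


if __name__ == "__main__":
    # Hypothetical bucket name: replace it with one listed in a registry entry.
    with urllib.request.urlopen(list_url("example-open-data-bucket")) as resp:
        for key, size in parse_listing(resp.read()):
            print(key, size)
```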

Wrapping up

The world is gradually moving toward open systems, and open data fits naturally with that shift. Businesses and organizations that leverage open data stand to gain a competitive edge and will be better positioned to lead in the future.

Read more like this at http://entradasoft.com/blogs/9-publicly-available-datasets-that-you-can-analyze
