Public Data Sets

For your data analysis pleasure, here is a giant list of super cool publicly available data. If you’re looking at the data sets and wondering “now what?”, you can find this list and tutorials on how to use H2O for analysis on the H2O docs page (http://docs.0xdata.com).

You can also get detailed hands-on experience analyzing any of this data, random numbers you might have lying around, stuff you made up, or whatever you want by coming to any of our upcoming meetups and hanging out with the 0xdata math team (http://www.meetup.com/H2Omeetup/).
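
If you'd rather poke at a file yourself before loading it anywhere, here is a minimal sketch of a first look at a CSV, using only the Python standard library. It is not part of H2O. The function name `summarize_csv`, the `max_rows` parameter, and the tiny `SAMPLE` text are all made up for illustration; swap in the contents of whatever file you downloaded from the list below.

```python
import csv
import io
from collections import Counter


def summarize_csv(text, max_rows=1000):
    """Report column names, rows inspected, and empty cells per column.

    Hypothetical helper for illustration; reads at most max_rows data rows.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = Counter()
    rows = 0
    for row in reader:
        if rows >= max_rows:
            break
        rows += 1
        for name, value in row.items():
            if name is None:
                # Extra cells beyond the header; skipped in this sketch.
                continue
            if value is None or value.strip() == "":
                missing[name] += 1
    return {
        "columns": reader.fieldnames or [],
        "rows": rows,
        "missing": dict(missing),
    }


# Made-up sample standing in for a downloaded file.
SAMPLE = """year,carrier,dep_delay
2007,AA,12
2007,UA,
2007,WN,3
"""

if __name__ == "__main__":
    print(summarize_csv(SAMPLE))
```

Knowing the column names and where the blanks are is usually enough to decide what to clean up before doing any real modeling.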

Open City Datasets

Palo Alto Open Data
http://www.cityofpaloalto.org/gov/depts/it/open_data/default.asp

Chicago
https://data.cityofchicago.org/

Chicago crime data, 2001 to present
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2

NYC
https://nycopendata.socrata.com/

Rents & Neighborhoods
http://www.huduser.org/portal/datasets/HUD_data_matrix.html

Transportation and Travel

Airlines Dataset
http://stat-computing.org/dataexpo/2009/the-data.html (so far it covers the years 1987-2007, according to http://www.stat.purdue.edu/~sguha/rhipe/doc/html/airline.html)

Data source: http://www.transtats.bts.gov/Fields.asp?Table_ID=236

Open Flights Database
http://openflights.org/data.html

Capital Bikeshare Data
https://www.capitalbikeshare.com/trip-history-data

Sciences and Engineering

NASA Open Data
http://data.nasa.gov/

Seismic Data
http://sioseis.ucsd.edu/segy.header.html

Weather Public Data
http://OpenWeatherMap.org
http://OpenMeteoData.org

Diverse Data Sets

Many Eyes Community Datasets
http://www-958.ibm.com/software/analytics/manyeyes/

Kaggle Competitions
http://www.kaggle.com/

UCI Machine Learning Repository
http://archive.ics.uci.edu/ml/datasets.html

Human Activity Recognition Using Smartphones
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

MLData repository
http://mldata.org/

GitHub Data Challenge
https://github.com/blog/1450-the-github-data-challenge-ii

Yelp Dataset Challenge
https://www.yelp.com/dataset_challenge

Netflix Prize
http://stackoverflow.com/questions/1407957/netflix-prize-dataset

Infochimps

Stanford Large Network Dataset Collection (SNAP)
http://snap.stanford.edu/data/index.html

Million Song Dataset
http://labrosa.ee.columbia.edu/millionsong/pages/getting-dataset

Caret
http://caret.r-forge.r-project.org/datasets.html

Public Policy Data

European Open Data
http://open-data.europa.eu/en/

US Open Data

opendatasites

WorldBank Data
http://data.worldbank.org/data-catalog

Guardian Data
http://www.guardian.co.uk/news/datablog/interactive/2013/jan/14/all-our-datasets-index

Statistics Netherlands
http://www.cbs.nl/en-GB/menu/home/default.htm?Languageswitch=on

Quandl: 6M Financial, Economic, and Social Datasets
http://www.quandl.com/