Feedback should be sent to goran.milovanovic@datakolektiv.com.


R + RStudio Desktop Installations

NOTE. Please take care to install the latest available versions. At the time of this writing, those were:

Please follow the instructions provided here:

Earth Data Analytics Online Certificate, Lesson 1. Install & Set Up R and RStudio on Your Computer

Essentially, there are two installation steps:

  • install R (the programming language)
  • install RStudio (your IDE, i.e. your working environment, where you write code, inspect data, etc.)
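Once both steps are done, a quick check from the R console (for example, the Console pane in RStudio) can confirm which version of R is running. This is only a minimal sketch; the minimum version `"4.0.0"` used in the comparison is an arbitrary placeholder, not a requirement stated by this course.

```r
# Report the version of R that this session is running
print(R.version.string)

# Compare the running version against a placeholder minimum version.
# NOTE: "4.0.0" is an assumption made for illustration only.
is_recent <- getRversion() >= "4.0.0"
print(is_recent)
```

If `print(R.version.string)` shows an older version than the one you just installed, RStudio may be pointing at a previous R installation.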

For Windows users: Video Instructions

For Mac users: Video Instructions

For Linux users:


Data Sets

Inside Airbnb

This is a collection of frequently updated public Airbnb data sets that are well suited for practicing basic data visualization and Exploratory Data Analysis (EDA).

Wikimedia Foundation Product Analytics/Comparison datasets

Data collected by Wikimedia Foundation’s Product Analytics team on the development of different language versions of Wikipedia, the free encyclopedia.

UCLA Statistical Methods and Data Analysis - LOGIT REGRESSION data set

A classic binary classification problem: predict a binary response variable admit from gre, gpa, and rank.
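To give a rough idea of what such a model looks like in R, here is a minimal sketch of a binomial logistic regression fitted with `glm()`. It does not use the UCLA data: the data frame `sim` is simulated, and its coefficients, ranges, and sample size are assumptions made purely for illustration. Only the variable names (`admit`, `gre`, `gpa`, `rank`) follow the description above.

```r
set.seed(42)

# Simulated stand-in for the admissions data (NOT the real UCLA data set);
# all numeric settings below are hypothetical.
n <- 400
sim <- data.frame(
  gre  = round(rnorm(n, mean = 580, sd = 115)),
  gpa  = round(runif(n, min = 2.2, max = 4.0), 2),
  rank = factor(sample(1:4, n, replace = TRUE))
)

# Hypothetical "true" model used only to generate the binary response
lin_pred <- -4 + 0.002 * sim$gre + 0.8 * sim$gpa -
  0.5 * (as.integer(sim$rank) - 1)
sim$admit <- rbinom(n, size = 1, prob = plogis(lin_pred))

# Binomial logistic regression: admit predicted from gre, gpa, and rank
fit <- glm(admit ~ gre + gpa + rank,
           data = sim,
           family = binomial(link = "logit"))
print(summary(fit))

# Predicted probabilities, then class labels at a 0.5 threshold
pred_prob  <- predict(fit, type = "response")
pred_class <- as.integer(pred_prob > 0.5)

# In-sample accuracy (optimistic; a hold-out set would be more honest)
accuracy <- mean(pred_class == sim$admit)
print(accuracy)
```

Treating `rank` as a factor gives each rank its own coefficient relative to rank 1; whether that is the right choice for the real data is a modelling decision, not something this sketch settles.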

Household Size in the Philippines case study data set from Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R, Paul Roback and Julie Legler

An excellent data set for practicing Poisson Regression, taken from a classic book on GLMs in R.

Kaggle: The Boston Housing Dataset

The Boston Housing Dataset is derived from information collected by the U.S. Census Service concerning housing in the area of Boston, MA. We will use it to practice Random Forest models for regression problems.

Kaggle: AirQualityUCI

Predict Air Quality from the data recorded by a gas multisensor device deployed in the field.

Kaggle: Fish market

A database of common fish species sold at fish markets: build a predictive model to estimate the weight of a fish.

UCI Machine Learning repository: Wine Quality data set

The goal of the exercise in which we use the Wine Quality dataset is to train a regularized Multinomial Regression model to predict the wine quality class.

Kaggle: Bank Customer Churn Prediction

The task is to predict the Exited variable, which makes this a churn prediction problem.

UCI Machine Learning Repository: Online News Popularity Data Set

The task is to predict the web popularity of a post: the number of shares a post receives once it is published.

Additional Data Resources

Rdatasets: Rdatasets is a collection of 1892 datasets which were originally distributed alongside the statistical software environment R and some of its add-on packages. The goal is to make these data more broadly accessible for teaching and statistical software development.


License: GPLv3

This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.


Contact: goran.milovanovic@datakolektiv.com

  

Impressum
Data Kolektiv, 2004, Belgrade.