Learning Resources
Goran S. Milovanovic, PhD
Feedback should be sent to goran.milovanovic@datakolektiv.com.
R + RStudio Desktop Installations
NOTE. Please take care to install the latest available versions. At the time of this writing, those were:
- Programming language R: R-4.0.3, available here for Windows, here for Mac, and here for Linux.
- RStudio Desktop: RStudio Desktop 1.3.1093, available here.
Please follow the instructions provided here:
Earth Data Analytics Online Certificate, Lesson 1. Install & Set Up R and RStudio on Your Computer
Essentially, there are two installation steps:
- install R (the programming language)
- install RStudio (your IDE, i.e. your working environment, where you write code, inspect data, etc.)
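Once both steps are done, a quick check from the R console can confirm which R version is active. This is a minimal sketch using base R only; the 4.0.3 threshold simply mirrors the version named above and is not a hard requirement.

```r
# Print the full version string of the running R installation
print(R.version.string)

# getRversion() returns a version object that can be compared to a string;
# 4.0.3 is the version mentioned in this document (an illustrative threshold)
r_is_recent <- getRversion() >= "4.0.3"
print(r_is_recent)
```

If the second line prints FALSE, an older R installation is probably still the one RStudio picks up.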
For Windows users: Video Instructions
For Mac users: Video Instructions
For Linux users:
Learn R programming
- R for Data Science, Hadley Wickham & Garrett Grolemund
- R Programming - Dynamic Data Script: especially their R Programming for Beginners | Complete Tutorial | R & RStudio intro course
- Norman Matloff’s The Art of R Programming
- Advanced R, Hadley Wickham. NOTE. Definitely not introductory material.
- Tutorialspoint: Learn R Programming
- R in a Nutshell, 2nd Edition, by Joseph Adler
- An Introduction to R - Notes on R: A Programming Environment for Data Analysis and Graphics, Version 4.1.1 (2021-08-10)
- R Tutorial from Quick-R
Computational Statistics and Machine Learning: General
- All of Statistics: A Concise Course in Statistical Inference, Larry Wasserman
- All of Nonparametric Statistics, Larry Wasserman
- Learning statistics with R: A tutorial for psychology students and other beginners, Danielle Navarro
- An Introduction to Statistical Learning with Applications in R, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
- Modern Statistics with R - From wrangling and exploring data to inference and predictive modelling, Måns Thulin, 2021-08-04 - Version 1.0.0
- Practical Data Science with R, Nina Zumel and John Mount
- R (BGU course), Jonathan D. Rosenblatt, 2019-10-10
- ISTA 321 - Data Mining, Nicholas DiRienzo, 2020-08-24
- Introduction to Econometrics with R, Christoph Hanck, Martin Arnold, Alexander Gerber, and Martin Schmelzer, 2021-10-06
- An Introduction to Machine Learning with R, Laurent Gatto, 2020-02-28
Simple and Multiple Linear Regression Models in R
- Introduction to Econometrics with R, Christoph Hanck, Martin Arnold, Alexander Gerber, and Martin Schmelzer, 2021-10-06, 4 Linear Regression with One Regressor
- Introduction to Econometrics with R, Christoph Hanck, Martin Arnold, Alexander Gerber, and Martin Schmelzer, 2021-10-06, 6 Regression Models with Multiple Regressors
- An R Companion to Applied Regression, Third Edition, John Fox and Sanford Weisberg, 2019
- Linear Regression Using R: An Introduction to Data Modeling, Lilja, David J
- Handbook of Regression Modeling in People Analytics, Keith McNulty
- r-statistics.co, Selva Prabhakaran - Linear-Regression
- Complete Introduction to Linear Regression in R, Selva Prabhakaran
Generalized Linear Models in R
- R (BGU course), Jonathan D. Rosenblatt, 2019-10-10, Chapter 7: Generalized Linear Models
- Generalized Linear Models in R, Social Science Computing Cooperative, University of Wisconsin–Madison
- Generalized Linear Models in R, Nathaniel E. Helwig, January 17, 2021
- Generalized Linear Models With Examples in R (Springer Texts in Statistics), Peter K. Dunn, Gordon K. Smyth
- Introduction to Econometrics with R, Christoph Hanck, Martin Arnold, Alexander Gerber, and Martin Schmelzer, 2021-10-06, 11 Regression with a Binary Dependent Variable
Decision Tree and Random Forest Models in R
- Random Forests with R, Genuer, Robin, Poggi, Jean-Michel
- Machine Learning with R, the tidyverse, and mlr, Hefin I. Rhys, Chapter 7. Classifying with decision trees
- ISTA 321 - Data Mining, Nicholas DiRienzo, 2020-08-24, 13 Decision Trees and Random Forests
- A Complete Guide to Random Forest in R, Listen Data, Deepanshu Bhalla
- Introduction to decision trees and random forests, Ned Horning, American Museum of Natural History’s Center for Biodiversity and Conservation
- A Comprehensive Guide To Random Forest In R, Zulaikha Lateef
Data Visualization in R
- Data Visualization - A practical introduction, Kieran Healy
- Data Visualization with R, Rob Kabacoff, 2020-12-01
- R for Data Science, Hadley Wickham & Garrett Grolemund, 3 Data visualisation
- R for Data Science, Hadley Wickham & Garrett Grolemund, 28 Graphics for communication
- R Graphics Cookbook, 2nd edition, Winston Chang, 2021-09-23
- htmlwidgets for R
Data Sets
Inside Airbnb
This is a collection of frequently updated public Airbnb data sets which are nicely suited to practice basic data visualization and Exploratory Data Analysis (EDA).
Wikimedia Foundation Product Analytics/Comparison datasets
Data collected by Wikimedia Foundation’s Product Analytics team on the development of different language versions of Wikipedia, the free encyclopedia.
UCLA Statistical Methods and Data Analysis - LOGIT REGRESSION data set
A classic binary classification problem: predict a binary response variable admit from gre, gpa, and rank.
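The kind of model this data set is meant for can be sketched with base R's glm(). The example below does not load the UCLA file: it simulates a stand-in data frame with the same variable names (admit, gre, gpa, rank), and the coefficients used to generate it are made up for illustration.

```r
# Simulated stand-in for the admissions data (NOT the real data set):
# same variable names, made-up values
set.seed(42)
n <- 400
gre  <- round(rnorm(n, mean = 580, sd = 115))
gpa  <- round(runif(n, min = 2.2, max = 4.0), 2)
rank <- factor(sample(1:4, n, replace = TRUE))

# Assumed data-generating process, for illustration only
eta   <- -4 + 0.003 * gre + 0.8 * gpa - 0.5 * (as.integer(rank) - 1)
admit <- rbinom(n, size = 1, prob = plogis(eta))
admissions <- data.frame(admit, gre, gpa, rank)

# Binomial logistic regression; rank enters as a categorical predictor
fit <- glm(admit ~ gre + gpa + rank, data = admissions, family = binomial)
print(summary(fit)$coefficients)

# Predicted admission probabilities and a classification at the 0.5 threshold
p_hat    <- predict(fit, type = "response")
pred     <- as.integer(p_hat > 0.5)
accuracy <- mean(pred == admissions$admit)
print(accuracy)
```

With the real data, the simulation lines would be replaced by reading the file into a data frame and converting rank to a factor.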
Household Size in the Philippines case study data set from Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R, Paul Roback and Julie Legler
An excellent data set to practice Poisson Regression from a classic GLM book in R.
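A Poisson regression of a count on a covariate can be sketched as follows. The data are simulated, the variable names age and total are assumptions chosen for this illustration, and the quadratic age term is only one plausible specification.

```r
# Simulated household data (NOT the case study data): age of the head of
# the household and a count of other household members
set.seed(1)
n   <- 500
age <- round(runif(n, min = 18, max = 90))

# Assumed log-linear mean, for illustration only
mu    <- exp(0.5 + 0.02 * age - 0.0002 * age^2)
total <- rpois(n, lambda = mu)
households <- data.frame(age, total)

# Poisson GLM with the default log link
fit <- glm(total ~ age + I(age^2), data = households, family = poisson)
print(coef(summary(fit)))

# Exponentiated coefficients act multiplicatively on the expected count
print(exp(coef(fit)))

# Rough overdispersion check: Pearson chi-square divided by residual df
# (values near 1 are consistent with the Poisson assumption)
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)
```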
Kaggle: The Boston Housing Dataset
The Boston Housing Dataset is derived from information collected by the U.S. Census Service concerning housing in the area of Boston, MA. We will use it to practice Random Forest models for regression problems.
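A Random Forest regression on this kind of data might look like the sketch below. It assumes the randomForest package is installed and uses the Boston data frame from the MASS package as a stand-in for the Kaggle file, whose column names may differ; the 80/20 split and ntree = 500 are arbitrary choices.

```r
# Assumes the randomForest package is installed; MASS ships with R and
# provides a copy of the Boston housing data (response: medv)
if (requireNamespace("randomForest", quietly = TRUE)) {
  data("Boston", package = "MASS")
  set.seed(123)
  train_idx <- sample(nrow(Boston), size = floor(0.8 * nrow(Boston)))
  train <- Boston[train_idx, ]
  test  <- Boston[-train_idx, ]

  # Regression forest: medv is numeric, so randomForest fits a regression
  rf <- randomForest::randomForest(medv ~ ., data = train,
                                   ntree = 500, importance = TRUE)

  # Root mean squared error on the held-out rows
  pred <- predict(rf, newdata = test)
  rmse <- sqrt(mean((test$medv - pred)^2))
  print(rmse)

  # Variable importance (%IncMSE and IncNodePurity)
  print(randomForest::importance(rf))
} else {
  message("Package 'randomForest' is not installed; ",
          "run install.packages('randomForest') first.")
}
```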
Kaggle: AirQualityUCI
Predict Air Quality from the data recorded by a gas multisensor device deployed in the field.
Kaggle: Fish market
Database of common fish species sold at fish markets: build a predictive model and assess how well the weight of a fish can be predicted.
Multiple Linear Regression: House Sales in King County, USA.
Predict the pricing of a property.
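A multiple linear regression for a price-prediction task of this kind can be sketched with lm(). The data below are simulated, and the predictors sqft_living and bedrooms as well as all numeric values are assumptions made for the illustration.

```r
# Simulated house sales (NOT the King County data)
set.seed(7)
n <- 300
sqft_living <- round(runif(n, min = 600, max = 4000))
bedrooms    <- sample(1:5, n, replace = TRUE)

# Assumed linear price model with Gaussian noise, for illustration only
price  <- 50000 + 200 * sqft_living + 10000 * bedrooms + rnorm(n, sd = 40000)
houses <- data.frame(price, sqft_living, bedrooms)

# Multiple linear regression with two regressors
fit <- lm(price ~ sqft_living + bedrooms, data = houses)
print(summary(fit)$coefficients)
print(summary(fit)$r.squared)

# Point prediction with a 95% prediction interval for a hypothetical property
new_house <- data.frame(sqft_living = 2000, bedrooms = 3)
print(predict(fit, newdata = new_house, interval = "prediction"))
```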
UCI Machine Learning Repository: Wine Quality data set
The goal of the exercise in which we use the Wine Quality dataset is to train a regularized Multinomial Regression model to predict the wine quality class.
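One way to fit a penalized multinomial model in R is multinom() from the nnet package, whose decay argument adds an L2-type weight penalty. This is only a sketch of the idea: the exercise itself may rely on a different package or penalty, and the built-in iris data are used here as a stand-in for the wine data.

```r
library(nnet)  # recommended package shipped with standard R distributions

data(iris)     # stand-in data: three classes, four numeric predictors
set.seed(2021)
train_idx <- sample(nrow(iris), size = 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# decay > 0 adds an L2-type (ridge-like) penalty on the model weights;
# the value 0.1 is an arbitrary choice, normally tuned by cross-validation
fit <- multinom(Species ~ ., data = train, decay = 0.1, trace = FALSE)

# Predicted classes on the held-out rows, confusion table, and accuracy
pred <- predict(fit, newdata = test, type = "class")
print(table(observed = test$Species, predicted = pred))
print(mean(pred == test$Species))
```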
Kaggle: Bank Customer Churn Prediction
The task is to predict the Exited variable, making this pretty much a churn prediction problem.
UCI Machine Learning Repository: Online News Popularity Data Set
The task is to predict the web popularity of a post: the number of shares a post receives once it is published.
Additional Data Resources
Rdatasets
Rdatasets is a collection of 1892 datasets which were originally distributed alongside the statistical software environment R and some of its add-on packages. The goal is to make these data more broadly accessible for teaching and statistical software development.
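Data sets that ship with R itself can be inspected without downloading anything. The sketch below lists the contents of the base datasets package and loads one of them; mtcars is just an arbitrary example.

```r
# List the data sets bundled with the base 'datasets' package
available <- data(package = "datasets")$results
print(head(available[, c("Item", "Title")]))

# Load one of them explicitly and take a first look
data("mtcars", package = "datasets")
str(mtcars)
print(summary(mtcars$mpg))
```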
License: GPLv3 This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.