Feedback should be sent to goran.milovanovic@datakolektiv.com.

### 0. Data Types in R
Some datasets come with the R programming language, e.g., the famous iris:
data(iris)
head(iris, 10)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
The function head() shows the first n rows of a data.frame. Similarly, the function tail():
tail(iris, 10)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 141 6.7 3.1 5.6 2.4 virginica
## 142 6.9 3.1 5.1 2.3 virginica
## 143 5.8 2.7 5.1 1.9 virginica
## 144 6.8 3.2 5.9 2.3 virginica
## 145 6.7 3.3 5.7 2.5 virginica
## 146 6.7 3.0 5.2 2.3 virginica
## 147 6.3 2.5 5.0 1.9 virginica
## 148 6.5 3.0 5.2 2.0 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3.0 5.1 1.8 virginica
tail() returns the last n rows of a data.frame. Both functions also work on vectors:
my_vector <- c(1, 7, 9, 10, 14, 22, 3.14, 2.71, 99)
head(my_vector, 5)
## [1] 1 7 9 10 14
tail(my_vector, 5)
## [1] 14.00 22.00 3.14 2.71 99.00
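A short sketch that goes one step beyond the calls above: when n is omitted, head() and tail() return six elements, and a negative n means "everything except" that many elements at the opposite end. The vector is the same my_vector as above; the other variable names are only illustrative.

```r
my_vector <- c(1, 7, 9, 10, 14, 22, 3.14, 2.71, 99)

# With no n given, head() and tail() return six elements
first_six <- head(my_vector)
last_six <- tail(my_vector)

# A negative n drops elements instead of selecting them:
# head(x, -2) drops the last two, tail(x, -2) drops the first two
all_but_last_two <- head(my_vector, -2)
all_but_first_two <- tail(my_vector, -2)

print(all_but_last_two)
print(all_but_first_two)
```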
Lists are very important in R. Let’s create one:
my_list <- list(element_1 = 1,
element_2 = "Belgrade",
element_3 = TRUE)
str(my_list)
## List of 3
## $ element_1: num 1
## $ element_2: chr "Belgrade"
## $ element_3: logi TRUE
The function str() compactly displays the structure of an R object. It is generic, meaning it can be used on objects of different classes. We will discuss this further during the course.
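To make "generic" a bit more concrete, here is a minimal sketch that calls str() on objects of several classes. The objects are made up for illustration; capture.output() is used only to store what str() prints so it can be inspected.

```r
# str() adapts its output to the class of its argument
str(c(1.5, 2.5, 3.5))        # a numeric vector
str("Belgrade")              # a character vector of length one
str(list(a = 1, b = "x"))    # a list
str(head(iris, 3))           # a data.frame

# capture.output() stores the printed lines as a character vector
vector_summary <- capture.output(str(c(1.5, 2.5, 3.5)))
vector_summary
```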
Here’s another list describing a person:
person <- list(name = "Mark",
family_name = "Smith",
phone = "+381661722838383",
email = "mark.smith@rcourses.org",
age = 40,
gender = "M",
employed = TRUE)
person
## $name
## [1] "Mark"
##
## $family_name
## [1] "Smith"
##
## $phone
## [1] "+381661722838383"
##
## $email
## [1] "mark.smith@rcourses.org"
##
## $age
## [1] 40
##
## $gender
## [1] "M"
##
## $employed
## [1] TRUE
Anything can be an element of a list in R, even an entire data.frame:
person <- list(name = "Mark",
family_name = "Smith",
phone = "+381661722838383",
email = "mark.smith@rcourses.org",
age = 40,
gender = "M",
employed = TRUE,
favorite_dataset = "iris",
favorite_dataset_source = iris)
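A brief sketch of what this means in practice: the data.frame stored in the list is still a complete data.frame and can be reached with $ like any other element. The list below is a shortened version of person, trimmed only to keep the example small.

```r
person <- list(name = "Mark",
               favorite_dataset = "iris",
               favorite_dataset_source = iris)

# The stored element is still a complete data.frame
class(person$favorite_dataset_source)
dim(person$favorite_dataset_source)

# Its columns can be reached by chaining $
head(person$favorite_dataset_source$Species, 3)
```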
Lists can be nested:
ll <- list(e1 = 10,
e2 = 20,
e3 = list(
e1 = 20,
e2 = 40,
e3 = 15
),
e4 = 40,
e5 = list(
e1 = 12
))
ll
## $e1
## [1] 10
##
## $e2
## [1] 20
##
## $e3
## $e3$e1
## [1] 20
##
## $e3$e2
## [1] 40
##
## $e3$e3
## [1] 15
##
##
## $e4
## [1] 40
##
## $e5
## $e5$e1
## [1] 12
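Elements of a nested list are reached by walking down one level at a time. The sketch below uses the same ll list; the result variable names are illustrative.

```r
ll <- list(e1 = 10,
           e2 = 20,
           e3 = list(e1 = 20, e2 = 40, e3 = 15),
           e4 = 40,
           e5 = list(e1 = 12))

# Chain $ to walk down the levels by name
inner_by_name <- ll$e3$e2

# The same element, reached by position with [[ ]]
inner_by_position <- ll[[3]][[2]]

# The inner list e3 is itself a list of three elements
inner_length <- length(ll$e3)

inner_by_name
inner_by_position
inner_length
```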
For example, here is a data structure describing people with an R list, where each element is a vector holding one attribute for all persons:
persons <- list(name = c("Mark", "Jane"),
family_name = c("Smith", "Doe"),
phone = c("+381661722838383", "+381661722838384"),
email = c("mark.smith@rcourses.org", "jane.doe@rcourses.org"),
age = c(40, 42),
gender = c("M", "F"),
employed = c(TRUE, FALSE)
)
Accessing list elements by position:
persons[[1]]
## [1] "Mark" "Jane"
Accessing elements of a named list by name, then indexing into the result:
persons$family_name[2]
## [1] "Doe"
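One detail worth a small sketch is the difference between [[ ]] (or $) and single brackets [ ]: the first extracts the element itself, the second returns a shorter list that still wraps it. The list below is a reduced version of persons, used only for illustration.

```r
persons <- list(name = c("Mark", "Jane"),
                age = c(40, 42))

# [[ ]] and $ extract the element itself (here, a character vector)
names_vector <- persons[["name"]]
class(names_vector)

# [ ] keeps the list wrapper: the result is a list of length one
names_sublist <- persons["name"]
class(names_sublist)
```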
The logic of structuring data is key in Data Science. Here’s a better way to describe people, e.g., employees in a company, using lists:
persons <- list(
p1 = list(name = "Mark",
family_name = "Smith",
phone = "+381661722838383",
email = "mark.smith@rcourses.org",
age = 40,
gender = "M",
employed = TRUE
),
p2 = list(name = "Jane",
family_name = "Doe",
phone = "+381661722838385",
email = "jane.doe@rcourses.org",
age = 42,
gender = "F",
employed = FALSE
)
)
Accessing list elements:
persons[[1]]
## $name
## [1] "Mark"
##
## $family_name
## [1] "Smith"
##
## $phone
## [1] "+381661722838383"
##
## $email
## [1] "mark.smith@rcourses.org"
##
## $age
## [1] 40
##
## $gender
## [1] "M"
##
## $employed
## [1] TRUE
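With one inner list per person, a single field can be collected across all persons, and the whole structure can be turned into a table. The following is a sketch under assumptions: the inner lists are shortened, and the sapply() and do.call(rbind, ...) approach is just one common way to do this, not necessarily the one used later in the course.

```r
persons <- list(
  p1 = list(name = "Mark", family_name = "Smith", age = 40, employed = TRUE),
  p2 = list(name = "Jane", family_name = "Doe", age = 42, employed = FALSE)
)

# One field from one person
persons$p2$family_name

# The same field from every person, collected into a named vector
all_ages <- sapply(persons, function(p) p$age)
all_ages

# One row per person: convert each inner list to a one-row data.frame
# and stack the rows
persons_df <- do.call(rbind, lapply(persons, as.data.frame))
persons_df
```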
### The data.frame Class

This class is essentially the central component we work with in the R programming language:
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
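As a minimal sketch of how a data.frame relates to the lists discussed earlier: a data.frame can be built from vectors of equal length, and it is itself a list whose elements are the columns. The column names and values below are made up for illustration.

```r
# Build a small data.frame from vectors of equal length
# (hypothetical columns: id, label, flag)
small_df <- data.frame(id = 1:3,
                       label = c("a", "b", "c"),
                       flag = c(TRUE, FALSE, TRUE))
str(small_df)

# A data.frame is a list whose elements are the columns
is.list(small_df)
is.list(iris)

# So list-style access works on columns
small_df$label
small_df[["flag"]]
```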
The unique elements of the variable (column, field, however you prefer) Species in iris can be listed using the function unique():
unique(iris$Species)
## [1] setosa versicolor virginica
## Levels: setosa versicolor virginica
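Because Species is a factor, two related functions are worth a quick sketch next to unique(): levels() returns the level labels as a character vector, and table() counts how often each value occurs. The variable names are illustrative.

```r
# unique() on a factor returns the distinct values that occur, as a factor
species_seen <- unique(iris$Species)

# levels() returns the level labels as a character vector
species_levels <- levels(iris$Species)

# table() counts how many times each value occurs
species_counts <- table(iris$Species)

species_seen
species_levels
species_counts
```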
We nest function calls in R quite often. For example, “give me the length (length) of a vector obtained by listing the unique elements (unique) of the Sepal.Length column in iris” is written in R as:
length(
unique(
iris$Sepal.Length
)
)
## [1] 35
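The nested call is evaluated from the inside out. As a sketch, the same computation can be written step by step with intermediate variables, or left to right with the native pipe operator |>, which assumes R version 4.1 or newer. The variable names are illustrative.

```r
# Step by step, with intermediate variables (inside out)
sepal_lengths <- iris$Sepal.Length
distinct_lengths <- unique(sepal_lengths)
n_distinct <- length(distinct_lengths)
n_distinct

# With the native pipe (assumes R >= 4.1): read left to right
n_distinct_piped <- iris$Sepal.Length |> unique() |> length()
n_distinct_piped
```

Both forms give the same result as the nested call; which one reads best is largely a matter of taste and of how many steps are involved.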
length() returns the length of a vector or list:
length(iris$Sepal.Length)
## [1] 150
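One consequence is easy to miss: since a data.frame is a list of columns, length() applied to a whole data.frame counts columns, not rows. A small sketch with made-up vectors and lists alongside iris:

```r
# A vector: the number of elements
vector_length <- length(c(10, 20, 30))

# A list: the number of top-level elements, however large each one is
list_length <- length(list(a = 1:100, b = "x"))

# A data.frame is a list of columns, so length() counts columns
df_length <- length(iris)

vector_length
list_length
df_length
```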
dim() gives the dimensions of a data.frame, for example:
dim(iris)
## [1] 150 5
dim() returns a vector (e.g., the number of rows and columns for the data.frame class). Remember that the result of a function in R can also be “subsetted,” i.e., you can extract only the part of the result you need by indexing. For example, how many rows does iris have:
dim(iris)[1]
## [1] 150
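For completeness, a short sketch of the same idea for columns, together with the base R shortcuts nrow() and ncol(), which return the same numbers as indexing the result of dim(). The variable names are illustrative.

```r
# dim() returns an integer vector: rows first, then columns
iris_dim <- dim(iris)

# Index the result directly...
n_rows <- dim(iris)[1]
n_cols <- dim(iris)[2]

# ...or use the shortcuts nrow() and ncol()
nrow(iris)
ncol(iris)
```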
License: GPLv3

This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.
Contact: goran.milovanovic@datakolektiv.com
Impressum
Data Kolektiv, 2004, Belgrade.