Labs 01: A Very Basic Introduction to Data Types and Lists in R

Feedback should be sent to goran.milovanovic@datakolektiv.com.

0. Data Types in R

Some datasets come with the R programming language, e.g., the famous iris:

data(iris)
head(iris, 10)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1           5.1         3.5          1.4         0.2  setosa
## 2           4.9         3.0          1.4         0.2  setosa
## 3           4.7         3.2          1.3         0.2  setosa
## 4           4.6         3.1          1.5         0.2  setosa
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa

The function head() returns the first n rows of a data.frame. Similarly, the function tail():

tail(iris, 10)
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 141          6.7         3.1          5.6         2.4 virginica
## 142          6.9         3.1          5.1         2.3 virginica
## 143          5.8         2.7          5.1         1.9 virginica
## 144          6.8         3.2          5.9         2.3 virginica
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica

returns the last n rows of a data.frame. Both functions also work on vectors, where they return the first or last n elements:

my_vector <- c(1, 7, 9, 10, 14, 22, 3.14, 2.71, 99)
head(my_vector, 5)
## [1]  1  7  9 10 14
tail(my_vector, 5)
## [1] 14.00 22.00  3.14  2.71 99.00
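The n argument has a few behaviors worth knowing, sketched below on the same vector. When n is omitted, head() and tail() return six elements, and a negative n drops elements from the opposite end instead of selecting them. The names first_default, without_last_two, and without_first_two are introduced here only for illustration.

```r
# Same example vector as above
my_vector <- c(1, 7, 9, 10, 14, 22, 3.14, 2.71, 99)

# With no n given, head() returns the first six elements
first_default <- head(my_vector)
length(first_default)

# A negative n means "everything except": drop the last two elements
without_last_two <- head(my_vector, -2)
without_last_two

# Likewise, tail() with a negative n drops the first two elements
without_first_two <- tail(my_vector, -2)
without_first_two
```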

1. Lists

Lists are a central data structure in R: unlike vectors, they can hold elements of different types. Let’s create one:

my_list <- list(element_1 = 1,
                element_2 = "Belgrade", 
                element_3 = TRUE)
str(my_list)
## List of 3
##  $ element_1: num 1
##  $ element_2: chr "Belgrade"
##  $ element_3: logi TRUE

The function str() in R is generic, meaning it can be used on objects of different classes. We will discuss this further during the course.
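To illustrate what "generic" means here, the following minimal sketch calls str() on three objects of different classes. The two small vectors are made up for the example; iris is the built-in dataset used earlier.

```r
# A numeric vector (illustrative values)
num_vec <- c(3.14, 2.71)
str(num_vec)

# A character vector (illustrative values)
chr_vec <- c("a", "b", "c")
str(chr_vec)

# A data.frame: the built-in iris dataset
str(iris)

# Same function, different classes of input
class(num_vec)
class(chr_vec)
class(iris)
```

In each case str() prints a compact summary suited to the kind of object it receives.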

Here’s another list describing a person:

person <- list(name = "Mark",
               family_name = "Smith",
               phone = "+381661722838383", 
               email = "mark.smith@rcourses.org", 
               age = 40,
               gender = "M", 
               employed = TRUE)
person
## $name
## [1] "Mark"
## 
## $family_name
## [1] "Smith"
## 
## $phone
## [1] "+381661722838383"
## 
## $email
## [1] "mark.smith@rcourses.org"
## 
## $age
## [1] 40
## 
## $gender
## [1] "M"
## 
## $employed
## [1] TRUE

Almost any R object can be an element of a list, even an entire data.frame:

person <- list(name = "Mark",
               family_name = "Smith",
               phone = "+381661722838383", 
               email = "mark.smith@rcourses.org", 
               age = 40,
               gender = "M", 
               employed = TRUE, 
               favorite_dataset = "iris", 
               favorite_dataset_source = iris)
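A data.frame stored inside a list stays a fully usable data.frame. The sketch below uses a shortened list, person_short, which is a hypothetical name chosen so that it does not overwrite person.

```r
# A shortened version of the person list, with a data.frame as one element
person_short <- list(name = "Mark",
                     favorite_dataset = "iris",
                     favorite_dataset_source = iris)

# The element is still a complete data.frame
class(person_short$favorite_dataset_source)

# It can be used like any other data.frame, e.g., show its first three rows
head(person_short$favorite_dataset_source, 3)

# Columns of the embedded data.frame are reachable by chaining $
mean(person_short$favorite_dataset_source$Sepal.Length)
```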

Lists can be nested:

ll <- list(e1 = 10, 
           e2 = 20, 
           e3 = list(
             e1 = 20,
             e2 = 40,
             e3 = 15
           ), 
           e4 = 40,
           e5 = list(
             e1 = 12
           ))
ll
## $e1
## [1] 10
## 
## $e2
## [1] 20
## 
## $e3
## $e3$e1
## [1] 20
## 
## $e3$e2
## [1] 40
## 
## $e3$e3
## [1] 15
## 
## 
## $e4
## [1] 40
## 
## $e5
## $e5$e1
## [1] 12
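Elements of a nested list can be reached by chaining the access operators, one level at a time. A minimal sketch using the same ll list (the variable names by_dollar, by_name, and by_position are illustrative):

```r
# The nested list from above, written compactly
ll <- list(e1 = 10,
           e2 = 20,
           e3 = list(e1 = 20, e2 = 40, e3 = 15),
           e4 = 40,
           e5 = list(e1 = 12))

# Chain $ to go one level deeper each time
by_dollar <- ll$e3$e2

# The same element via [[ ]] with names
by_name <- ll[["e3"]][["e2"]]

# The same element via [[ ]] with positions
by_position <- ll[[3]][[2]]

# All three expressions select the same value
c(by_dollar, by_name, by_position)

# str() gives a compact overview of the nesting
str(ll)
```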

Lists are also a natural way to describe people. For example, here is a list in which every element is a vector holding one attribute for all persons:

persons <- list(name = c("Mark", "Jane"),
                family_name = c("Smith", "Doe"),
                phone = c("+381661722838383", "+381661722838384"),
                email = c("mark.smith@rcourses.org", "jane.doe@rcourses.org"), 
                age = c(40, 42),
                gender = c("M", "F"),
                employed = c(TRUE, FALSE)
                )

Accessing list elements by position with [[ ]]:

persons[[1]]
## [1] "Mark" "Jane"

Accessing elements of a named list with $, then indexing into the resulting vector:

persons$family_name[2]
## [1] "Doe"
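Single brackets [ ] and double brackets [[ ]] do different things on a list. The sketch below shows the difference on a shortened version of the list above; persons_short, sub_list, and first_element are hypothetical names used only here.

```r
# A shortened version of the persons list
persons_short <- list(name = c("Mark", "Jane"),
                      family_name = c("Smith", "Doe"))

# Single brackets keep the list wrapper: the result is a list of length 1
sub_list <- persons_short[1]
class(sub_list)

# Double brackets extract the element itself: here, a character vector
first_element <- persons_short[[1]]
class(first_element)

# [[ ]] also accepts a name, which is equivalent to using $
persons_short[["family_name"]][2]
```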

The logic of structuring data is key in Data Science. Here’s a better way to describe people, e.g., employees in a company, using lists:

persons <- list(
  p1 = list(name = "Mark",
            family_name = "Smith",
            phone = "+381661722838383",
            email = "mark.smith@rcourses.org",
            age = 40,
            gender = "M",
            employed = TRUE
            ),
  p2 = list(name = "Jane",
            family_name = "Doe",
            phone = "+381661722838385",
            email = "jane.doe@rcourses.org",
            age = 42,
            gender = "F",
            employed = FALSE
            )
)

With this structure, accessing a list element returns the complete record for one person:

persons[[1]]
## $name
## [1] "Mark"
## 
## $family_name
## [1] "Smith"
## 
## $phone
## [1] "+381661722838383"
## 
## $email
## [1] "mark.smith@rcourses.org"
## 
## $age
## [1] 40
## 
## $gender
## [1] "M"
## 
## $employed
## [1] TRUE
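With one list per person, a single field can be read by chaining $, and the same field can be collected across all persons with sapply(). The sketch below uses a shortened version of the records and also stacks them into a data.frame. The names ages and persons_df are illustrative, and the conversion via do.call(rbind, ...) is only one possible approach, assuming every record has the same fields.

```r
# Shortened person records (fewer fields than above)
persons <- list(
  p1 = list(name = "Mark", family_name = "Smith", age = 40, employed = TRUE),
  p2 = list(name = "Jane", family_name = "Doe", age = 42, employed = FALSE)
)

# One field of one person
persons$p2$family_name

# The same field from every person: a named numeric vector
ages <- sapply(persons, function(p) p$age)
ages

# Turn each record into a one-row data.frame and stack the rows
persons_df <- do.call(rbind, lapply(persons, as.data.frame))
persons_df
```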

2. data.frame Class

The data.frame class is the central data structure we work with in the R programming language:

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
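A data.frame connects directly to the previous section: internally it is a list of equal-length columns, so the list tools shown earlier apply to it as well. A brief sketch on iris (species_by_name is an illustrative name):

```r
# A data.frame is a list under the hood
is.list(iris)
typeof(iris)

# Its length is the number of columns, and its names are the column names
length(iris)
names(iris)

# Columns can be extracted like list elements, with $ or [[ ]]
species_by_name <- iris[["Species"]]
identical(species_by_name, iris$Species)
```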

The function unique() returns the unique values of a variable (column, field, however you prefer), here Species in iris:

unique(iris$Species)
## [1] setosa     versicolor virginica 
## Levels: setosa versicolor virginica
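The "Levels:" line appears because Species is a factor. As a small sketch, the functions below inspect a factor directly; species_counts and species_names are names introduced only for this example.

```r
# Species is stored as a factor
class(iris$Species)

# The set of allowed values (levels) and how many there are
levels(iris$Species)
nlevels(iris$Species)

# Count how many rows fall into each level
species_counts <- table(iris$Species)
species_counts

# Convert the unique values to plain character strings
species_names <- as.character(unique(iris$Species))
species_names
```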

Nesting function calls is something we do quite often in R. For example, “give me the length (length) of a vector obtained by listing the unique elements (unique) of the Sepal.Length column in iris” is written in R as:

length(
  unique(
    iris$Sepal.Length
    )
  )
## [1] 35
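A nested call is evaluated from the inside out. The sketch below unrolls the same computation into separate steps with intermediate variables (sepal_lengths, unique_lengths, and n_unique are illustrative names), which can make the order of evaluation easier to follow.

```r
# Step 1: the innermost expression, a numeric vector of 150 values
sepal_lengths <- iris$Sepal.Length

# Step 2: keep each distinct value only once
unique_lengths <- unique(sepal_lengths)

# Step 3: count the distinct values
n_unique <- length(unique_lengths)
n_unique

# The nested one-liner gives the same result
identical(n_unique, length(unique(iris$Sepal.Length)))
```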

length() returns the length of a vector or list:

length(iris$Sepal.Length)
## [1] 150

dim() gives the dimensions of a data.frame, for example:

dim(iris)
## [1] 150   5

dim() returns a vector (e.g., the number of rows and columns for the data.frame class). Remember that the result of a function in R can also be “subsetted,” i.e., you can extract only the part of the result you need by indexing. For example, how many rows does iris have:

dim(iris)[1]
## [1] 150
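The second element of the result gives the number of columns in the same way, and base R also provides nrow() and ncol() as shortcuts. A minimal sketch (iris_dims is an illustrative name):

```r
# Store the result of dim() and index into it
iris_dims <- dim(iris)
iris_dims[1]  # number of rows
iris_dims[2]  # number of columns

# nrow() and ncol() return the same two numbers directly
nrow(iris)
ncol(iris)
```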

License: GPLv3

This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.


Contact: goran.milovanovic@datakolektiv.com

  

Impressum
Data Kolektiv, 2004, Belgrade.