Chapter 2 Intro to R Part I
2.1 Getting to know your IDE
What’s an IDE? IDE stands for integrated development environment, and its goal is to facilitate coding by integrating a text editor, a console and other tools into one window.
We are using RStudio as our IDE for this workshop. You can either download and install R and RStudio on your computer (for instructions on how to do so, see the “Before we start” section) or create a free account at http://rstudio.cloud and run your R code in the cloud.
In this part of the workshop, we will start an R project and get oriented in our IDE.
Why create an RStudio project? RStudio projects make it easier to keep your work organized, since each project has its own working directory, workspace, history, and source documents.
- Start a new R project
- Create a new R script
- Save that R script as 01-intro_to_r_part_one
Take a moment to look around your IDE. What are the four main panes of the RStudio interface? Can you guess what each one is for?
2.2 Operations and Objects
Let’s start by using R as a calculator. In your console, type 3 + 3 and hit Enter.
## [1] 6
What symbols do we use for all basic operations (addition, subtraction, multiplication, and division)?
What happens if you type 3 + and hit Enter?
Let’s save our calculation into an object by using the assignment operator <-.
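Here is a minimal sketch of that assignment. The original code is not shown, so the object name my_sum is a hypothetical choice. Any valid name works.

```r
# store the result of 3 + 3 in an object (my_sum is a hypothetical name)
my_sum <- 3 + 3

# typing the object's name prints its value
my_sum
## [1] 6
```

After running this, my_sum should appear in the Environment pane.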
Take a moment to look around your IDE once again. What has changed?
Now, let’s use this new object in a calculation.
## [1] 9
Take a moment to look around your IDE once again. Has anything changed?
What else can we do with an object?
## [1] "numeric"
R is primarily a functional programming language, which means most of the work is done by calling functions. Base R comes with many pre-programmed functions, such as class(), and you can also write your own functions (more on that later).
Type ?class in your console and hit Enter to get more information about this function.
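Here is a short sketch of class() in action. The object my_sum is hypothetical and stands in for the object you created above. The other two values show that class() also recognizes text and TRUE/FALSE values. The same help page is also available through help(class).

```r
# my_sum is a hypothetical name for the object created earlier
my_sum <- 3 + 3

# class() reports what type of object R is dealing with
class(my_sum)
## [1] "numeric"
class("Daisy")
## [1] "character"
class(TRUE)
## [1] "logical"
```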
CHALLENGE
- Create an object called daisys_age that holds the number 8.
- Multiply daisys_age by 4 and save the result in another object called daisys_human_age.
Imagine I had multiple pets (unfortunately, that is not true; Daisy is my only pet). I could create a vector to hold multiple numbers, one for the age of each of my pets.
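One way to create such a vector is with c(), which combines values into a single vector. The ages below are the ones this chapter uses for my imaginary pets later on (Daisy, Violet, Lily, Iris, and Poppy, in that order).

```r
# c() combines individual values into one vector
my_pets_ages <- c(8, 2, 6, 3, 1)

# print the vector
my_pets_ages
## [1] 8 2 6 3 1

# length() counts how many elements the vector holds
length(my_pets_ages)
## [1] 5
```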
Take a moment to look around your IDE once again. What has changed?
What is the class of the object my_pets_ages?
Now let’s multiply this vector by 4.
## [1] 32 8 24 12 4
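Here is a sketch of that multiplication. Arithmetic on a vector is applied to every element, so there is no need to multiply each age separately. Two vectors of the same length are combined element by element.

```r
my_pets_ages <- c(8, 2, 6, 3, 1)

# each element is multiplied by 4
ages_times_four <- my_pets_ages * 4
ages_times_four
## [1] 32  8 24 12  4

# two vectors of the same length are added position by position
my_pets_ages + c(1, 1, 1, 1, 1)
## [1] 9 3 7 4 2
```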
Errors are pretty common when writing code in any programming language, so be ready to read error messages and debug your code. Let’s insert a typing error in our previous code:
CHALLENGE
Try to multiply my_pets_ages by 4. What happens? How can we debug our code to find out what is causing the error?
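We don’t know exactly which typo you introduced, so here is a hypothetical one to illustrate a general debugging approach: inspect the object with class() before looking at the line that fails. In this sketch, one age was accidentally typed inside quotes, which turns the whole vector into text. The vector broken_ages is made up for illustration. tryCatch() is used only so the script keeps running and the error message can be printed.

```r
# hypothetical typo: the last age was typed inside quotes
broken_ages <- c(8, 2, 6, 3, "1")

# one string makes c() convert every element to character
class(broken_ages)
## [1] "character"

# multiplying text fails; tryCatch() captures the error message instead of stopping
error_message <- tryCatch(
  broken_ages * 4,
  error = function(e) conditionMessage(e)
)
error_message
## [1] "non-numeric argument to binary operator"
```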
2.3 Dataframes
You will rarely work with individual numeric values, or even individual numeric vectors. More often, information is organized in dataframes, which are R’s version of a spreadsheet: each column is a vector, and all columns have the same length.
Let’s go back to my imaginary pets’ ages (make sure you have the correct vector in your global environment).
We will now create a vector of strings (also called characters) that holds my imaginary pets’ names. We have to be careful to keep the names in the same order as the my_pets_ages vector.
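Here is a sketch of the names vector, based on the names shown in the dataframe output below. Text values go inside quotes. A quick length() check confirms both vectors describe the same number of pets. It cannot check that the order matches, so that part is still up to you.

```r
# text values (strings) go inside quotes
my_pets_names <- c("Daisy", "Violet", "Lily", "Iris", "Poppy")
my_pets_ages  <- c(8, 2, 6, 3, 1)

# element 1 of each vector must describe the same pet, element 2 the next, and so on
length(my_pets_names) == length(my_pets_ages)
## [1] TRUE
```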
Let’s now create a dataframe that contains info about my pets.
# create dataframe
my_pets <- data.frame(name = my_pets_names, age = my_pets_ages)
# print out dataframe
my_pets
## name age
## 1 Daisy 8
## 2 Violet 2
## 3 Lily 6
## 4 Iris 3
## 5 Poppy 1
CHALLENGE
There are a number of functions you can run on dataframes. Try running the following functions on my_pets:
- summary()
- nrow()
- ncol()
- dim()
What other functions do you know of, or can you think of?
2.4 Slicing your dataframe
There are different ways you can slice or subset your dataframe.
You can use indices for rows and columns.
## name age
## 1 Daisy 8
## [1] "Daisy" "Violet" "Lily" "Iris" "Poppy"
## [1] "Daisy"
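The code behind the outputs above is not shown. Assuming the my_pets dataframe from the previous section, index calls like these would produce the same results. This assumes R 4.0 or later, where data.frame() keeps text columns as characters rather than converting them to factors.

```r
my_pets <- data.frame(name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
                      age  = c(8, 2, 6, 3, 1))

# my_pets[row, column]: leave a position empty to keep all rows or all columns
my_pets[1, ]   # first row, all columns
my_pets[, 1]   # all rows, first column (the names)
my_pets[1, 1]  # first row, first column
```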
You can use a column name or a row name instead of an index.
## [1] 8 2 6 3 1
## name age
## 1 Daisy 8
## [1] 8
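Calls like these would reproduce the outputs above, though the original code may differ. Row names default to "1", "2", and so on, which is why "1" works as a row name here.

```r
my_pets <- data.frame(name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
                      age  = c(8, 2, 6, 3, 1))

my_pets[, "age"]    # the whole age column, selected by name
my_pets["1", ]      # the row whose row name is "1"
my_pets[1, "age"]   # an index and a name can be mixed
```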
Or you can use $ to retrieve values from a column.
## [1] 8 2 6 3 1
## [1] 8
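Calls like these produce the two outputs above. my_pets[["age"]] is an equivalent way to pull out a column by name. It is handy when the column name is stored in an object.

```r
my_pets <- data.frame(name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
                      age  = c(8, 2, 6, 3, 1))

# $ returns the column as a plain vector
my_pets$age
## [1] 8 2 6 3 1

# index that vector like any other
my_pets$age[1]
## [1] 8

# [[ ]] with a column name gives the same vector as $
my_pets[["age"]]
## [1] 8 2 6 3 1
```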
You can also use comparisons to filter your dataframe. Combined with which(), a comparison returns the row numbers that meet a condition.
## [1] 1
# use which() inside dataframe indexing my_pets[row_number, column_number]
my_pets[which(my_pets$age == 8),]
## name age
## 1 Daisy 8
## [1] "Daisy"
## [1] "Daisy"
## [1] "Daisy"
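Here is a sketch of how comparisons filter rows, assuming the same my_pets dataframe. The three "Daisy" outputs above came from code that is not shown. The last three lines below are one plausible set of calls that each return that result.

```r
my_pets <- data.frame(name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
                      age  = c(8, 2, 6, 3, 1))

# a comparison gives one TRUE/FALSE per row
my_pets$age == 8
## [1]  TRUE FALSE FALSE FALSE FALSE

# which() converts the TRUE positions into row numbers
which(my_pets$age == 8)
## [1] 1

# three ways to get the name of the pet that is 8 years old
my_pets[which(my_pets$age == 8), "name"]
my_pets$name[which(my_pets$age == 8)]
my_pets$name[my_pets$age == 8]   # a logical vector can also index directly
```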
CHALLENGE
Print out a list of pet names that are older than 3.
2.5 Adding new variables (i.e., columns) to your dataframe
So far the my_pets dataframe has two columns: name and age.
Let’s add a third column with the pets’ ages in human years. To do that, we use $ with a variable (or column) name that does not exist in our dataframe yet. We then assign to this new variable the values in the age column multiplied by 4.
# create new column called human_years
my_pets$human_years <- my_pets$age * 4
# print dataframe
my_pets
## name age human_years
## 1 Daisy 8 32
## 2 Violet 2 8
## 3 Lily 6 24
## 4 Iris 3 12
## 5 Poppy 1 4
Inspect the new my_pets dataframe. What dimensions does it have now? How could you get a list of just the human years values in the dataframe?
2.6 Descriptive stats on dataframes
Let’s explore some functions for descriptive statistics.
CHALLENGE
Try running the following functions on my_pets$age and my_pets$human_years:
- mean()
- sd()
- median()
- max()
- min()
- range()
What other functions do you know of, or can you think of?
2.7 For loops
Besides applying operations to an entire column (e.g., my_pets$age * 4 multiplies each value in the age column of the my_pets dataframe by 4), you can loop through each element in a dataframe column using a for loop.
Here are two ways of writing a for loop in R; both print each pet’s name.
## [1] "Daisy"
## [1] "Violet"
## [1] "Lily"
## [1] "Iris"
## [1] "Poppy"
## [1] "Daisy"
## [1] "Violet"
## [1] "Lily"
## [1] "Iris"
## [1] "Poppy"
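The code for the two loops above is not shown. Here is a sketch of two common forms that both print each pet’s name. The first loops over the values themselves, and the second loops over row positions. seq_len(nrow(my_pets)) is used instead of a hard-coded 1:5 so the loop still works if pets are added or removed.

```r
my_pets <- data.frame(name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
                      age  = c(8, 2, 6, 3, 1))

# 1) loop directly over the values in the name column
for (pet_name in my_pets$name) {
  print(pet_name)
}

# 2) loop over row positions and index into the column
for (i in seq_len(nrow(my_pets))) {
  print(my_pets$name[i])
}
```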
CHALLENGE
Write a for loop to print each pet’s name and age. You can use the function paste() to combine the two variables into one line.
Remember you can enter ?paste in your console to get information on how to use this function.
Take a moment to look around your IDE once again. What objects do you have in your environment?
2.8 If blocks
Maybe calculating a pet’s age in human years is more complex than just multiplying it by 4 (or 7?). The American Kennel Club gives the following guidelines for converting dog years to human years:
- 15 human years equals the first year of a medium-sized dog’s life.
- Year two for a dog equals about nine years for a human.
- And after that, each human year would be approximately five years for a dog.
Let’s first work out the if statements we need, using the first pet (Daisy) as an example:
this_dogs_age <- my_pets$age[1]
this_dogs_human_age <- 15
if (this_dogs_age >= 2) {
  this_dogs_human_age <- this_dogs_human_age + 9
}
if (this_dogs_age >= 3) {
  this_dogs_human_age <- this_dogs_human_age + ((this_dogs_age - 2) * 5)
}
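The two separate if blocks can also be written as a single if / else. The sketch below uses Daisy’s age (8) directly instead of my_pets$age[1]. It follows the same logic as the for loop below: an age of 2 or more gets 15 + 9, plus 5 for every year after the second.

```r
this_dogs_age <- 8  # Daisy's age, typed in directly for this sketch

if (this_dogs_age < 2) {
  # first year of a medium-sized dog's life
  this_dogs_human_age <- 15
} else {
  # first year (15) + second year (9) + 5 for each year after that
  this_dogs_human_age <- 15 + 9 + (this_dogs_age - 2) * 5
}

this_dogs_human_age
## [1] 54
```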
Now, let’s create a for loop to calculate human age for every pet.
for (i in c(1:5)) {
  # store the age of pet i in an object
  this_dogs_age <- my_pets$age[i]
  # 15 human years equals the first year of a medium-sized dog's life.
  this_dogs_human_years <- 15
  # if the pet is two years or older
  if (this_dogs_age >= 2) {
    # Year two for a dog equals about nine years for a human.
    this_dogs_human_years <- this_dogs_human_years + 9
    # And after that, each human year would be approximately five years for a dog.
    partial_dog_age <- (this_dogs_age - 2) * 5
    # sum up both parts
    this_dogs_human_years <- this_dogs_human_years + partial_dog_age
  }
  # add the final calculation to the dataframe
  my_pets$human_years2[i] <- this_dogs_human_years
}
# print dataframe
my_pets
## name age human_years human_years2
## 1 Daisy 8 32 54
## 2 Violet 2 8 24
## 3 Lily 6 24 44
## 4 Iris 3 12 29
## 5 Poppy 1 4 15
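Because R works on whole vectors, the same calculation can also be done without a loop. ifelse() evaluates the condition for every element and picks the matching value for each one. This sketch reproduces the human_years2 column above.

```r
my_pets <- data.frame(name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
                      age  = c(8, 2, 6, 3, 1))

# for each pet: 15 + 9 + 5 per extra year if 2 or older, otherwise 15
my_pets$human_years2 <- ifelse(my_pets$age >= 2,
                               15 + 9 + (my_pets$age - 2) * 5,
                               15)
my_pets$human_years2
## [1] 54 24 44 29 15
```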
2.9 Functions
Functions are extremely useful to make your R code more organized and reusable.
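The basic structure of a function is function_name <- function(arguments) { code_here }. Here’s an example of a simple function that converts a pet’s age into human years using the rules from the previous section.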
human_years <- function(pets_age) {
  # 15 human years equals the first year of a medium-sized dog's life.
  human_years <- 15
  # if the pet is two years or older
  if (pets_age >= 2) {
    # Year two for a dog equals about nine years for a human.
    # And after that, each human year would be approximately five years for a dog.
    human_years <- human_years + 9 + (pets_age - 2) * 5
  }
  return(human_years)
}
human_years(3)
## [1] 29
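human_years() handles one age at a time, because if() expects a single TRUE or FALSE. Passing a whole vector is an error in recent versions of R. Here is a sketch of applying the function to every pet with sapply(), which calls the function once per element and collects the results into a vector.

```r
# the function from above
human_years <- function(pets_age) {
  human_years <- 15
  if (pets_age >= 2) {
    human_years <- human_years + 9 + (pets_age - 2) * 5
  }
  return(human_years)
}

ages <- c(8, 2, 6, 3, 1)

# call human_years() once for each age
ages_in_human_years <- sapply(ages, human_years)
ages_in_human_years
## [1] 54 24 44 29 15
```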
Read more on writing your own functions: Nice R Code - Functions
2.10 Putting it all together
Read and run the code below, which provides some information about our my_pets dataframe.
# get number of rows for the for loop
how_many_pets <- nrow(my_pets)
# a for loop to print info on each pet
for (i in c(1:how_many_pets)) {
  # paste info with some prose
  info_to_print <- paste(my_pets$name[i], 'is',
                         my_pets$age[i],
                         'years old in pet years, which is equivalent to',
                         my_pets$human_years[i], 'human years.')
  # print out the info for pet i
  print(info_to_print)
} # end of for loop to print info on each pet
## [1] "Daisy is 8 years old in pet years, which is equivalent to 32 human years."
## [1] "Violet is 2 years old in pet years, which is equivalent to 8 human years."
## [1] "Lily is 6 years old in pet years, which is equivalent to 24 human years."
## [1] "Iris is 3 years old in pet years, which is equivalent to 12 human years."
## [1] "Poppy is 1 years old in pet years, which is equivalent to 4 human years."
## [1] 8
## [1] 1
## [1] "Daisy"
## [1] "Daisy is the oldest pet"
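The last four outputs above come from code that is not shown. Here is a sketch that produces them, assuming the my_pets dataframe from this chapter. which.max(my_pets$age) is a shortcut that returns the position of the (first) maximum directly.

```r
my_pets <- data.frame(name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
                      age  = c(8, 2, 6, 3, 1))

# highest age in the dataframe
oldest_age <- max(my_pets$age)
oldest_age
## [1] 8

# row number(s) holding that age
oldest_row <- which(my_pets$age == oldest_age)
oldest_row
## [1] 1

# name in that row
oldest_name <- my_pets$name[oldest_row]
oldest_name
## [1] "Daisy"

print(paste(oldest_name, "is the oldest pet"))
## [1] "Daisy is the oldest pet"
```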
CHALLENGE
Add to the code above to print the following information:
- the name of the youngest pet
- the mean pet age
- any other information you find relevant
2.11 Note on coding style
Coding style refers to how you name your objects and functions, how you comment your code, how you use spacing throughout your code, and so on. If your coding style is consistent, your code is easier to read and, as a result, easier to debug. Here are some guides to help you develop your own coding style: