Chapter 2 Intro to R Part I

2.1 Getting to know your IDE

What’s an IDE? IDE stands for integrated development environment, and its goal is to facilitate coding by integrating a text editor, a console and other tools into one window.

We are using RStudio as our IDE for this workshop. You can either download and install R and RStudio on your computer (for instructions on how to do so, see the “Before we start” section) or create a free account at http://rstudio.cloud and run your R code in the cloud.

In this part of the workshop we will start an R project and get oriented in our IDE.

Why create an RStudio project? RStudio projects make it easier to keep your work organized, since each project has its own working directory, workspace, history, and source documents.

  1. Start a new R project
  2. Create a new R script
  3. Save that R script as 01-intro_to_r_part_one

Take a moment to look around your IDE. What are the four main panes of the RStudio interface? Can you guess what each one is for?

2.2 Operations and Objects

Let’s start by using R as a calculator. In your console, type 3 + 3 and press Enter.

## [1] 6

What symbols do we use for all basic operations (addition, subtraction, multiplication, and division)? What happens if you type 3 +?

Let’s save our calculation into an object, by using the assignment symbol <-.
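A minimal sketch of the assignment. The object name my_number is a hypothetical choice for illustration, not a name taken from the workshop materials:

```r
# Store the result of 3 + 3 in an object.
# my_number is a hypothetical name; any valid name would work.
my_number <- 3 + 3

# Typing the object's name prints the value it holds
my_number
```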

Take a moment to look around your IDE once again. What has changed?

Now, let’s use this new object in our calculation.
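For instance, assuming the saved object is called my_number (a hypothetical name) and holds 6, adding 3 to it would give the result shown below:

```r
# my_number is a hypothetical object name holding the earlier calculation
my_number <- 3 + 3

# Reuse the stored value in a new calculation
my_number + 3
```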

## [1] 9

Take a moment to look around your IDE once again. Has anything changed?

What else can we do with an object?
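One option is to ask R what kind of object we have, using class(). A short sketch, again assuming the hypothetical object name my_number:

```r
# my_number is a hypothetical object name
my_number <- 3 + 3

# class() reports the type of the object
class(my_number)
```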

## [1] "numeric"

R is primarily a functional programming language. That means there are pre-programmed functions in base R, such as class(), and you can also write your own functions (more on that later).

Type ?class in your console and hit enter to get more information about this function.

CHALLENGE

Create an object called daisys_age that holds the number 8. Multiply daisys_age by 4 and save the result in another object called daisys_human_age.

Imagine I had multiple pets (unfortunately, that is not true, Daisy is my only pet). I can create a vector to hold multiple numbers representing the age of each of my pets.
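Vectors are built with c(), which combines several values into one object. A sketch, using the ages that appear in the outputs later in this chapter:

```r
# Combine five ages into a single numeric vector
my_pets_ages <- c(8, 2, 6, 3, 1)

# Print the vector
my_pets_ages
```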

Take a moment to look around your IDE once again. What has changed?

What is the class of the object my_pets_ages?

Now let’s multiply this vector by 4.
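Arithmetic on a vector is applied element by element, so a single expression multiplies every age. A minimal sketch:

```r
my_pets_ages <- c(8, 2, 6, 3, 1)

# Each element is multiplied by 4; the original vector is not changed
my_pets_ages * 4
```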

## [1] 32  8 24 12  4

Errors are pretty common when writing code in any programming language, so be ready to read error messages and debug your code. Let’s insert a typing error in our previous code:
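One kind of typing error, shown here purely as an assumption about what such a typo might look like, is accidentally wrapping a number in quotes:

```r
# The quotes around 3 are the deliberate typo (an illustrative assumption).
# Note that this overwrites the earlier, correct my_pets_ages vector.
my_pets_ages <- c(8, 2, 6, "3", 1)

# Look closely at how the values are printed
my_pets_ages
```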

CHALLENGE

Try to multiply my_pets_ages by 4. What happens? How can we debug our code to find out what is causing the error?

2.3 Dataframes

You will rarely work with individual numeric values, or even individual numeric vectors. Often, we have information organized in dataframes, which are R’s version of a spreadsheet.

Let’s go back to my imaginary pets’ ages (make sure you have the correct vector in your global environment).

We will now create a vector of strings (characters) that holds my imaginary pets’ names. We have to be careful to keep the same order as in the my_pets_ages vector.
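A sketch of such a character vector. The object name my_pets_names is a hypothetical choice; the names themselves follow the dataframe output shown in this section:

```r
# Strings go in quotes; the order matches the ages 8, 2, 6, 3, 1
# my_pets_names is a hypothetical object name
my_pets_names <- c("Daisy", "Violet", "Lily", "Iris", "Poppy")

class(my_pets_names)
```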

Let’s now create a dataframe that contains info about my pets.
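One way to do this is with data.frame(), giving each column a name. In this sketch, my_pets_names is a hypothetical vector name, while the column names name and age follow the output below:

```r
my_pets_ages <- c(8, 2, 6, 3, 1)
my_pets_names <- c("Daisy", "Violet", "Lily", "Iris", "Poppy")  # hypothetical name

# Each argument becomes a column: column_name = vector
my_pets <- data.frame(name = my_pets_names, age = my_pets_ages)

my_pets
```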

##     name age
## 1  Daisy   8
## 2 Violet   2
## 3   Lily   6
## 4   Iris   3
## 5  Poppy   1

CHALLENGE

There are a number of functions you can run on dataframes. Try running the following functions on my_pets:

  • summary()

  • nrow()

  • ncol()

  • dim()

What other functions can you think of?

2.4 Slicing your dataframe

There are different ways you can slice or subset your dataframe.

You can use indices for rows and columns.
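Square brackets take a row index and a column index, in that order, separated by a comma. Leaving one side empty means "all". A sketch that should reproduce the output below (the dataframe is rebuilt here so the example runs on its own):

```r
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)

my_pets[1, ]   # first row, all columns
my_pets[, 1]   # all rows, first column
my_pets[1, 1]  # first row, first column
```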

##    name age
## 1 Daisy   8
## [1] "Daisy"  "Violet" "Lily"   "Iris"   "Poppy"
## [1] "Daisy"

You can use a column name or a row name instead of an index.
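Names go in quotes inside the brackets. This sketch assumes the default row names, which are "1", "2", and so on; the exact expressions used in the workshop may differ:

```r
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)

my_pets[, "age"]     # all rows of the age column
my_pets["1", ]       # the row named "1", all columns
my_pets["1", "age"]  # the row named "1", age column only
```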

## [1] 8 2 6 3 1
##    name age
## 1 Daisy   8
## [1] 8

Or you can use $ to retrieve values from a column.
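The $ operator returns a column as a vector, which can then be indexed like any other vector. A minimal sketch:

```r
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)

my_pets$age     # the whole age column
my_pets$age[1]  # the first value in the age column
```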

## [1] 8 2 6 3 1
## [1] 8

You can also use comparisons to filter your dataframe.
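A comparison such as my_pets$age == 8 produces a TRUE/FALSE value for each row, and that result can be used to pick rows. The expressions below are one plausible set, chosen as an assumption to match the output that follows; the workshop's own code may differ:

```r
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)

which(my_pets$age == 8)                   # position of the matching row
my_pets[my_pets$age == 8, ]               # the matching row, all columns
my_pets[my_pets$age == 8, "name"]         # name column of the matching row
my_pets$name[my_pets$age == 8]            # same result, starting from the column
my_pets[which(my_pets$age == 8), "name"]  # same result, using the position
```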

## [1] 1
##    name age
## 1 Daisy   8
## [1] "Daisy"
## [1] "Daisy"
## [1] "Daisy"

CHALLENGE

Print the names of the pets that are older than 3.

2.5 Adding new variables (i.e., columns) to your dataframe

So far the my_pets dataframe has two columns: name and age.

Let’s add a third column with the pets’ ages in human years. For that, we are going to use $ with a variable (or column) name that does not exist in our dataframe yet. We will then assign to this variable the values in the age column multiplied by 4.
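A minimal sketch of this step, with the dataframe rebuilt so the example runs on its own:

```r
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)

# Assigning to a column that does not exist yet creates it
my_pets$human_years <- my_pets$age * 4

my_pets
```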

##     name age human_years
## 1  Daisy   8          32
## 2 Violet   2           8
## 3   Lily   6          24
## 4   Iris   3          12
## 5  Poppy   1           4

Inspect the new my_pets dataframe. What dimensions does it have now? How could you get a list of just the human years values in the data frame?

2.6 Descriptive stats on dataframes

Let’s explore some functions for descriptive statistics.

CHALLENGE

Try running the following functions on my_pets$age and my_pets$human_years:

  • mean()

  • sd()

  • median()

  • max()

  • min()

  • range()

What other functions can you think of?

2.7 For loops

Besides implementing operations on an entire column (e.g., my_pets$age * 4 multiplies each value in the age column of my_pets dataframe by 4), you can loop through each element in your dataframe column using a for loop.

There are two ways of writing a for loop in R.
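The two ways are sketched below under the assumption that they are looping over the values directly and looping over positions. The loop variable names pet_name and i are hypothetical:

```r
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)

# Way 1: loop over the values themselves
for (pet_name in my_pets$name) {
  print(pet_name)
}

# Way 2: loop over row positions and index into the column
for (i in seq_len(nrow(my_pets))) {
  print(my_pets$name[i])
}
```

The second form is handy when you need values from more than one column for the same row, since i can index each of them.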

## [1] "Daisy"
## [1] "Violet"
## [1] "Lily"
## [1] "Iris"
## [1] "Poppy"
## [1] "Daisy"
## [1] "Violet"
## [1] "Lily"
## [1] "Iris"
## [1] "Poppy"

CHALLENGE

Write a for loop to print each pet’s name and age. You can use the function paste() to combine the two variables into one line.

Remember you can enter ?paste in your console to get information on how to use this function.


Take a moment to look around your IDE once again. What objects do you have in your environment?

2.8 If blocks

Maybe calculating a pet’s age in human years is more complex than just multiplying it by 4 (or 7?). The American Kennel Club offers the following guidance on converting dog years to human years:

  • 15 human years equals the first year of a medium-sized dog’s life.
  • Year two for a dog equals about nine years for a human.
  • After that, each additional dog year equals approximately five human years.

Let’s first figure out all the if statements we need for this.
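The three rules above can be sketched as an if / else if / else chain. The names pet_age and human_age are hypothetical, and the sketch assumes ages are whole numbers of at least 1:

```r
pet_age <- 8  # hypothetical example value

if (pet_age == 1) {
  # First year counts as 15 human years
  human_age <- 15
} else if (pet_age == 2) {
  # Second year adds about 9 more
  human_age <- 15 + 9
} else {
  # Every year after the second adds about 5
  human_age <- 15 + 9 + (pet_age - 2) * 5
}

human_age
```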

Now, let’s create a for loop to calculate human age for every pet.
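A sketch of such a loop, which applies the three rules to each row and stores the result in a new column. The column name human_years2 follows the output below; pet_age and i are hypothetical names, and ages are assumed to be whole numbers of at least 1:

```r
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)
my_pets$human_years <- my_pets$age * 4

# Start with an empty numeric column, then fill it row by row
my_pets$human_years2 <- NA_real_

for (i in seq_len(nrow(my_pets))) {
  pet_age <- my_pets$age[i]
  if (pet_age == 1) {
    my_pets$human_years2[i] <- 15
  } else if (pet_age == 2) {
    my_pets$human_years2[i] <- 15 + 9
  } else {
    my_pets$human_years2[i] <- 15 + 9 + (pet_age - 2) * 5
  }
}

my_pets
```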

##     name age human_years human_years2
## 1  Daisy   8          32           54
## 2 Violet   2           8           24
## 3   Lily   6          24           44
## 4   Iris   3          12           29
## 5  Poppy   1           4           15

2.10 Putting it all together

Read and run the code below that provides some info on our my_pets dataframe.
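The sketch below is one way to produce the output that follows, combining a for loop, paste(), and a comparison. It is an illustration under assumptions, not necessarily the workshop's exact code; oldest_position and oldest_name are hypothetical object names:

```r
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)
my_pets$human_years <- my_pets$age * 4

# Print one sentence per pet
for (i in seq_len(nrow(my_pets))) {
  print(paste(my_pets$name[i], "is", my_pets$age[i],
              "years old in pet years, which is equivalent to",
              my_pets$human_years[i], "human years."))
}

# Find the oldest pet
max(my_pets$age)
oldest_position <- which(my_pets$age == max(my_pets$age))
oldest_position
oldest_name <- my_pets$name[oldest_position]
oldest_name
print(paste(oldest_name, "is the oldest pet"))
```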

## [1] "Daisy is 8 years old in pet years, which is equivalent to 32 human years."
## [1] "Violet is 2 years old in pet years, which is equivalent to 8 human years."
## [1] "Lily is 6 years old in pet years, which is equivalent to 24 human years."
## [1] "Iris is 3 years old in pet years, which is equivalent to 12 human years."
## [1] "Poppy is 1 years old in pet years, which is equivalent to 4 human years."
## [1] 8
## [1] 1
## [1] "Daisy"
## [1] "Daisy is the oldest pet"

CHALLENGE

Extend the code above to print the following information:

  1. the name of the youngest pet

  2. the mean pet age

  3. any other info you find relevant

2.11 Note on coding style

Coding style refers to how you name your objects and functions, how you comment your code, how you use spacing throughout your code, etc. If your coding style is consistent, your code is easier to read and, as a result, easier to debug. Here are some guides to help you develop your own coding style: