Intro to R Workshop Book
2019-07-20
Chapter 1 Overview and Introduction
The aim of this workshop is to get you up and running with R as quick as possible by teaching you how to connect the most important R constructs through hands-on examples. One of the toughest parts about learning a programming language is feeling overwhelmed by the abundance of material to learn.
By giving you an overview of the most useful functions, in the lens of close-to-real-world artificial datasets, we hope that you can apply these concepts to your own data.
As with everything else, the best way to learn R is through consistent practice with a real world project.
1.1 Fundamental Concepts: Objects
An object is an entity that carries information.
In R, objects are the fundamental foundation to storing data, and manipulating it.
- R organizes the storage of data with datatypes. Which are different types of objects.
- It also organizes the manipulation of data with functions, which perform operations on your data.
As we go over the most important datatypes in R, keep in mind the context- these are the pieces that allow us to store, and manipulate information for our needs.
1.2 Datatypes
You can think of these datatypes like a pyramid of information.
- The bottom of the pyramid, the foundation, are the datatypes:
- Numeric
- Character
- Logical
- The middle of the pyramid consists of
- Vectors (holding multiple values a single datatype in level 1)
- Factors (holding multiple values of a single datatype but with additional attributes in level 1)
- The top of the pyramid consists of
- Lists (flexible storage of both level 1 and 2 of the pyramid)
- Dataframes (rows and columns of vectors)
- Another name for a datatype is a class
We will talk about each datatype in this section.
1.2.1 Numeric, Character, Logical, Factor
1.2.1.1 Numeric
- A numeric is self-explanatory. It could be a number that is either an integer, or a decimal
- In R we can assign a numeric to a number with the assignment operator
<-, or with an=sign.a <- 1
1.2.1.2 Character
- A character is a representation of letters and or numbers and symbols. You would wrap the letters and numbers in either
""or''marks to store as a character - Assign a character value to a variable named a
a <- 'abc'
1.2.1.3 Logical
- A logical value also called a boolean value can only be
TRUEorFALSE. - These are powerful and allows the user to filter their data based on certain conditions.
- Assign a flag to TRUE
flag <- TRUE
1.2.1.3.1 Boolean operators
- Boolean operators are
<(less than)>(greater than)==(equal to) notice two equal signs!NOT operator: Turns a TRUE to FALSE or a FALSE to TRUE&AND operator: Returns TRUE if all values TRUE, else FALSE|OR operator: Returns FALSE if all values FALSE, else TRUE
- A boolean operator returns a logical value or vector (will be explained) or matrix (will be explained)
# quick example of boolean operators
boolVal <- 1 < 5
boolVal## [1] TRUE
boolVal <- 5 == 5
boolVal## [1] TRUE
boolVal <- !(5 == 5)
boolVal## [1] FALSE
1.2.2 Vectors, Factors, Lists
1.2.2.1 Vectors
- A vector is a combination of values that have the same datatype.
- We use a function called
c()to combine the values into a vector - We put various values within the
()separated by a comma
vector1 <- c(1, 2, 3)
vector1## [1] 1 2 3
vector2 <- c('a', 'b', 'c')
vector2## [1] "a" "b" "c"
vector3 <- c(TRUE, FALSE, TRUE)
vector3## [1] TRUE FALSE TRUE
Vectors are the fundamental datatypes that can make up a list, or a dataframe, which is how R represents a piece of data by rows and columns.
We can names vectors by using the
names()
names(vector1) <- c('a','b', 'c')
vector1## a b c
## 1 2 3
1.2.2.2 Factor
- A factor is a special datatype that summarizes your data into a set of levels, where each level is a unique value of your data.
- It can be used to create Statistical Models
- we can create factors with the function:
as.factor()
factor1 <- as.factor(c('a', 'b', 'b', 'c'))
factor1## [1] a b b c
## Levels: a b c
1.2.2.3 Lists
- A list is similar to a vector in that it could hold multiple values. However, it is much more flexible.
- The difference is that it can hold multiple values of different datatypes.
- we can create lists with the function:
list() - it can also have names for each element in the list by doing:
list(nameOfElement = valuesOfElement)
list1 <- list(a = 1,
b = c(TRUE, FALSE),
c = 'hi')
list1## $a
## [1] 1
##
## $b
## [1] TRUE FALSE
##
## $c
## [1] "hi"
1.2.3 Dataframes
Dataframes are useful in that it organizes your data into rows and columns, similar to how data would be represented in a database, and excel sheet, or a simple CSV or TSV file.
- Every column has the same length could be thought of as a vector with one datatype.
It holds additional attributes covered in the next section
we can create dataframes with
data.frame()
dataframe1 <- data.frame(
col1 = c(1, 2, 3),
col2 = c('a', 'b', 'c')
)
dataframe1## col1 col2
## 1 1 a
## 2 2 b
## 3 3 c
1.2.4 Viewing datatypes with class()
- Using the
class()function, we could see what datatype a variable is
class(list1)## [1] "list"
class(vector1)## [1] "numeric"
class(dataframe1)## [1] "data.frame"
1.3 Functions
Remember how objects are entities that hold information?
A function is also an object, that holds information about how to manipulate your data if used.
A function consists of 3 pieces:
- Input
- Body
- Output
In the previous section, we have already made use of functions. For example, the class() function is a function if you see () after a variable name.
- The input is any of the datatypes we’ve talked about
- The body determines which type of datatype it is- a.k.a it’s class
- The output returns the type of class the input is.
Other functions we have already used:
c(),- takes in multiple values of one datatype as input,
- the body converts the values to a vector,
- and outputs a vector
list(),- takes in vectors as input,
- the body converts the vectors to a list,
- and outputs a list
data.frame(),- takes in vectors as input,
- the body converts the vectors to a dataframe,
- and outputs a data.frame
Now that we’ve covered fundamental R objects, we can begin the fun stuff- starting with Data Exploration