Fundamentals of Data Science
Start out with a code cell saying “Hello World”
print("Hello World")
[1] "Hello World"
The cat command is often more useful than print:
cat("hello world")
hello world
In R, the usual assignment operator is <-, not =. (= also works for top-level assignment, but <- is the convention.) This takes some getting used to.
count <- 5
name <- "Jeremy Teitelbaum" # string types are called chr for character
paragraph <- "Far across the misty mountains cold,
to dungeons deep and caverns cold,
we must away,
ere break of day
to seek our long forgotten gold."
pi <- 3.14159 # R doesn't use integer types unless you force it to; numbers are "num"
epsilon <- 1e-6
count <- 5L # this forces an integer
students <- c("Jeremy", "Phillip", "Sara", "Molly")
hot_dog <- TRUE # note all caps unlike Python; false is FALSE
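The chr and num labels mentioned in the comments are the abbreviations that str() prints. As a minimal sketch (the variables are redefined here so the snippet runs on its own), class() and str() show how R stores each value:

```r
# Redefine a few values so this snippet is self-contained.
count <- 5L
epsilon <- 1e-6
name <- "Jeremy Teitelbaum"
hot_dog <- TRUE

class(count)    # "integer" because of the L suffix
class(epsilon)  # "numeric"
class(name)     # "character"
class(hot_dog)  # "logical"

str(epsilon)    # prints: num 1e-06
str(name)       # prints: chr "Jeremy Teitelbaum"
```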
In R, you can give names to the elements of a vector.
print("hello")
[1] "hello"
names(students) <- c("President", "Vice President", "Treasurer", "Secretary")
print(names(students))
[1] "President" "Vice President" "Treasurer" "Secretary"
print(students["President"])
President
"Jeremy"
print(students)
President Vice President Treasurer Secretary
"Jeremy" "Phillip" "Sara" "Molly"
The cat command is a print command that “concatenates” its arguments; it needs an explicit newline.
print(students)
President Vice President Treasurer Secretary
"Jeremy" "Phillip" "Sara" "Molly"
print(count)
[1] 5
cat("Students:", students, "\n")
Students: Jeremy Phillip Sara Molly
print(epsilon)
[1] 1e-06
cat("The value of epsilon is:", epsilon, "\n")
The value of epsilon is: 1e-06
print(paragraph)
[1] "Far across the misty mountains cold,\nto dungeons deep and caverns cold,\nwe must away,\nere break of day\nto seek our long forgotten gold."
cat(paragraph)
Far across the misty mountains cold,
to dungeons deep and caverns cold,
we must away,
ere break of day
to seek our long forgotten gold.
The [1] at the start of each output line reflects the fact that in R everything is a vector, even a single value. It tells you that the first item printed on that line is element 1 of the vector.
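One way to see this for yourself is to ask R for the length of a single string. This is only an illustrative sketch; the variable name greeting is made up for the example.

```r
greeting <- "Hello World"

length(greeting)     # 1: a single string is a vector of length one
is.vector(greeting)  # TRUE
greeting[1]          # "Hello World": element 1 is the whole string
nchar(greeting)      # 11: the number of characters, which is not the vector length
```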
The c() command makes a vector of its arguments. It forces everything to be of the same type.
str_list <- c("Jeremy", 25, 1.34, FALSE) # everything becomes a string
int_list <- c(1, 2, 3, 4, 5) # stored as num unless written 1L, 2L, ...
float_list <- c(1, 2, 3.5, 4)
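When the arguments to c() have different types, R promotes them to the most general type present (logical, then integer, then numeric, then character). A short sketch of that behavior:

```r
class(c(TRUE, 2L))   # "integer": TRUE is promoted to 1L
class(c(2L, 3.5))    # "numeric": the integer is promoted to a double
class(c(3.5, "a"))   # "character": the number becomes the string "3.5"

mixed <- c("Jeremy", 25, 1.34, FALSE)
mixed                # "Jeremy" "25" "1.34" "FALSE"
```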
R does arithmetic elementwise on vectors. If one vector is shorter than the other, R repeats ("recycles") the shorter one. If the length of the longer is not a multiple of the length of the shorter, R still computes a result but issues a warning.
a <- 1
b <- 2
a + b
[1] 3
a <- c(1, 2, 3, 4, 5)
b <- 4
a + b
[1] 5 6 7 8 9
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 11)
a + b
[1] 11 13 13 15 15 17
a <- c(1, 2, 3, 4, 5)
b <- c(1, 2)
a + b
Warning in a + b: longer object length is not a multiple of shorter object
length
[1] 2 4 4 6 6
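To make the repetition explicit, the sketch below builds the repeated vector by hand with rep_len() and checks that it gives the same sum. The names long_vec and short_vec are chosen only for this illustration.

```r
long_vec <- c(1, 2, 3, 4, 5, 6)
short_vec <- c(10, 11)

# Repeat the shorter vector until it matches the longer one's length.
recycled <- rep_len(short_vec, length(long_vec))  # 10 11 10 11 10 11

long_vec + recycled                                  # 11 13 13 15 15 17
identical(long_vec + short_vec, long_vec + recycled) # TRUE
```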
a / 5
[1] 0.2 0.4 0.6 0.8 1.0
# integer division (// in python)
a <- 5L
b <- 3
a %/% b
[1] 1
# remainder (% in python)
a <- 5
b <- 3
a %% b
[1] 2
a <- c(1, 2, 3, 4, 5)
a^2
[1] 1 4 9 16 25
print(a^2 == a)
[1] TRUE FALSE FALSE FALSE FALSE
print(a^2 > a)
[1] FALSE TRUE TRUE TRUE TRUE
print(a^2 == 4)
[1] FALSE TRUE FALSE FALSE FALSE
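Comparisons like these return logical vectors, which can then be counted, located, or used as a subscript. A small sketch, where the name bigger is introduced only for illustration:

```r
a <- c(1, 2, 3, 4, 5)
bigger <- a^2 > a   # FALSE TRUE TRUE TRUE TRUE

sum(bigger)         # 4: TRUE counts as 1
which(bigger)       # 2 3 4 5: positions where the test is TRUE
a[bigger]           # 2 3 4 5: keep only the elements that pass the test
any(a^2 == 4)       # TRUE
```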
first_name <- "Jeremy"
last_name <- "Teitelbaum"
nchar(first_name)
[1] 6
paste(first_name, last_name) # spaces by default
[1] "Jeremy Teitelbaum"
paste(first_name, last_name, sep = "") # no space
[1] "JeremyTeitelbaum"
paste(c(1, 2, 3), "Jeremy") # remember functions work across vectors
[1] "1 Jeremy" "2 Jeremy" "3 Jeremy"
In R, indexing always starts at 1 (a big difference from Python).
first_name[1] # another difference from Python
[1] "Jeremy"
a <- substr("Jeremy", 1, 1)
b <- substr("Jeremy", 1, 3)
cat(a, b, paste(a, b, sep = ""))
J Jer JJer
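Because a string is a single vector element, subscripting it does not reach individual characters; substr() does that, with both end positions included. A brief sketch (the variable word is just for this example):

```r
word <- "Jeremy"

word[1]                                 # "Jeremy": element 1 is the whole string
word[2]                                 # NA: there is no second element
substr(word, 2, 4)                      # "ere": positions 2 through 4, inclusive
substr(word, nchar(word), nchar(word))  # "y": the last character
strsplit(word, "")[[1]]                 # "J" "e" "r" "e" "m" "y"
```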
nums <- 0:10 # generates a sequence from 0 to 10 INCLUSIVE (compare python)
print(nums)
[1] 0 1 2 3 4 5 6 7 8 9 10
print(nums[c(1, 3)]) # you can pass a list of indices to a subscript
[1] 0 2
sqrs <- nums^2
sqrs[seq(1, 10, 2)]
[1] 0 4 16 36 64
In R, negative indices mean “omit”, so this drops entries 2 through 5. You can’t mix positive and negative indices in the same subscript.
rev <- nums[seq(-2, -5)]
print(rev)
[1] 0 5 6 7 8 9 10
rev(nums) # reverses the list
[1] 10 9 8 7 6 5 4 3 2 1 0
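A few more cases of negative subscripts, as a sketch. The tryCatch() wrapper is only there so the snippet keeps running when the mixed subscript fails; the name mixed_result is made up for the example.

```r
nums <- 0:10

nums[-1]             # drops the first entry: 1 2 ... 10
nums[-length(nums)]  # drops the last entry: 0 1 ... 9
nums[-(2:5)]         # same as nums[seq(-2, -5)]: 0 5 6 7 8 9 10

# Mixing positive and negative subscripts is an error, so catch it here.
mixed_result <- tryCatch(nums[c(-1, 2)], error = function(e) "error")
mixed_result         # "error"
```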
Use the RStudio package manager to install libraries, but to use them in a session you must load them with the library function. We will use the tidyverse library a lot.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.2 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
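If you are unsure whether a package is installed, you can check before loading it. The following is a minimal sketch using base R's requireNamespace(); the variable names pkg and available are illustrative.

```r
pkg <- "tidyverse"

# TRUE if the package is installed and can be loaded, FALSE otherwise.
available <- requireNamespace(pkg, quietly = TRUE)

if (available) {
  library(pkg, character.only = TRUE)  # character.only lets us pass the name as a string
} else {
  message("Package '", pkg, "' is not installed; install it before calling library().")
}
```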
library(ggplot2)
x <- seq(-10, 10, .1)
y <- x**2
data <- tibble("x" = x, "y" = y)
ggplot(data = data, aes(x = x)) +
geom_point(aes(y = y), color = "red") +
ggtitle("A Parabola") +
scale_x_continuous(breaks = seq(-10, 10, 1)) +
scale_y_continuous(breaks = seq(0, 100, 20))
x <- seq(-10, 10, .1)
y <- cos(x)
data <- tibble("x" = x, "y" = y)
ggplot(data = data, aes(x = x)) +
geom_line(aes(y = y), color = "darkgreen") +
ggtitle("A Cosine Curve") +
scale_x_continuous(breaks = seq(-10, 10, 1)) +
scale_y_continuous(breaks = seq(-1, 1, .5))
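Since the axis breaks are just the output of seq(), it can help to evaluate a seq() call on its own to see which tick positions it produces. A small sketch; note that a step larger than the range yields a single value.

```r
seq(0, 100, 20)            # 0 20 40 60 80 100
seq(-1, 1, 0.5)            # -1.0 -0.5 0.0 0.5 1.0
seq(-1, 1, 5)              # -1: the step overshoots the range, so only the start remains
length(seq(-10, 10, 0.1))  # 201 points, the grid used for the curves above
```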