Fundamentals of Data Science
# A vector holds a single type, so mixing a string and a number coerces both to character
x <- c("Hello", 1)
class(x)
[1] "character"
Notice that x is now all characters, and in fact if you now compute 2*x[2]
you will get an error.
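A short sketch of that failure and one way around it, assuming the second element really is meant to be a number: as.numeric converts the string back to a number before multiplying. The variable names msg and y are only illustrative.

```r
x <- c("Hello", 1)

# Multiplying a character value fails; tryCatch captures the error message
msg <- tryCatch(2 * x[2], error = function(e) conditionMessage(e))
print(msg)

# Convert the element back to a number first, then multiply
y <- 2 * as.numeric(x[2])
print(y)
```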
1:10
[1] 1 2 3 4 5 6 7 8 9 10
R uses TRUE and FALSE instead of Python's True and False.
Indentation does not matter, and you can use ; to string multiple statements together.
x <- 1; y <- 1; z <- 1
length gives the length of a vector; nchar gives the number of characters of a string.
length("Hello")
[1] 1
length(c("Hello", "GoodBye"))
[1] 2
nchar("Hello")
[1] 5
nchar(c("Hello", "GoodBye"))
[1] 5 7
Use substr to extract substrings, not subscripts. A string is a character vector of length one, so subscripting it returns the whole string.
s <- "Hello"
s[1]
[1] "Hello"
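For contrast, here is a minimal sketch of substr itself, which takes a start and a stop position (both 1-based and inclusive). Splitting the string into a vector of letters with strsplit is shown as an alternative when subscripting is really wanted; the variable names are illustrative.

```r
s <- "Hello"

# substr(x, start, stop) extracts the characters from start to stop
first <- substr(s, 1, 1)
middle <- substr(s, 2, 4)
print(first)   # "H"
print(middle)  # "ell"

# To subscript individual letters, split the string into a vector first
letters_of_s <- strsplit(s, split = "")[[1]]
print(letters_of_s[1])  # "H"
```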
s <- paste(c("A", "B", "C"), collapse = "")
t <- paste(c("A", "B", "C"), c("D", "E", "F"), sep = ",", collapse = " ")
print(s)
[1] "ABC"
print(t)
[1] "A,D B,E C,F"
s <- "This is a string of letters"
t <- substr(rep(s, nchar(s) / 2), seq(1, nchar(s), 2), seq(1, nchar(s), 2))
paste(t, collapse = "")
[1] "Ti sasrn flte"
A list can contain objects of different types.
lst <- list("a", 1.5)
In particular, a list can contain vectors and can have named entries.
lst <- list(first = c(1, 2, 3), second = c(4, 5, 6))
print(lst)
$first
[1] 1 2 3
$second
[1] 4 5 6
The presence of [[]] indicates a list.
print(lst[[1]])
[1] 1 2 3
print(lst$first)
[1] 1 2 3
class(lst[1])
[1] "list"
class(lst[[1]])
[1] "numeric"
a <- strsplit("This is a string", split = " ")
b <- strsplit("this is a string split into letters", split = "")
print(a)
[[1]]
[1] "This" "is" "a" "string"
print(b)
[[1]]
[1] "t" "h" "i" "s" " " "i" "s" " " "a" " " "s" "t" "r" "i" "n" "g" " " "s" "p"
[20] "l" "i" "t" " " "i" "n" "t" "o" " " "l" "e" "t" "t" "e" "r" "s"
Functions are constructed like this:
f <- function(n) {
  n**2
}
f(5)
[1] 25
The last evaluated expression is the value of the function, but it is better style to use return explicitly.
f <- function(n) {
  return(n**2)
}
f(10)
[1] 100
Functions built from elementwise operations such as ** are automatically “vectorized.”
f(1:10)
[1] 1 4 9 16 25 36 49 64 81 100
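This works because ** (like ^) operates elementwise. A function that tests its argument with if does not vectorize in the same way, since if expects a single TRUE or FALSE. A minimal sketch, with hypothetical function names, using ifelse as the elementwise alternative:

```r
# Hypothetical example: absolute value written with a scalar if
abs_scalar <- function(v) {
  if (v < 0) {
    return(-v)
  }
  return(v)
}

# Elementwise version: ifelse tests every entry of v
abs_vec <- function(v) {
  return(ifelse(v < 0, -v, v))
}

print(abs_scalar(-3))          # 3
print(abs_vec(c(-3, 2, -1)))   # 3 2 1
```

Only abs_vec is safe to call on a vector; abs_scalar is meant for a single number.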
R automatically “recycles” the shorter vector when things fit, that is, when the longer length is a multiple of the shorter one.
1:3 + 1:6
[1] 2 4 6 5 7 9
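When the lengths do not fit, R still recycles but issues a warning. The sketch below shows both cases; the handler simply reports the warning text and lets the computation continue.

```r
# Lengths fit: 6 is a multiple of 3, so 1:3 is reused twice
a <- 1:3 + 1:6
print(a)

# Lengths do not fit: 4 is not a multiple of 3, so R warns while recycling
b <- withCallingHandlers(
  1:3 + 1:4,
  warning = function(w) {
    message("warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(b)
```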
The principle of scope is essentially the same as discussed in the Python programming notes.
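As a small illustration, assuming the point carried over from those notes is that variables assigned inside a function are local to it (the function names here are hypothetical):

```r
y <- 10

# Assigning to y inside the function creates a local y; the global one is untouched
set_local <- function() {
  y <- 99
  return(y)
}

# A function can still read a global variable it has not assigned to
read_global <- function() {
  return(y + 1)
}

print(set_local())    # 99
print(y)              # 10
print(read_global())  # 11
```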
y <- 0
for (x in c(1, 2, 3, 10)) {
  print(x)
  y <- y + x
}
[1] 1
[1] 2
[1] 3
[1] 10
cat("y=", y)
y= 16
y <- 0
while (y < 10) {
  cat("y = ", y, " ", sep = "")
  y <- y + 1
}
y = 0 y = 1 y = 2 y = 3 y = 4 y = 5 y = 6 y = 7 y = 8 y = 9
Often iteration in R is unnecessary. Suppose you want to compute the sum of the squares of the first n integers.
f <- function(n) {
  s <- 0
  for (i in seq(1, n)) {
    s <- s + i^2
  }
  return(s)
}
f(10)
[1] 385
f <- function(n) {
  return(sum(seq(1, n)^2))
}
f(10)
[1] 385
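Both versions rely on seq(1, n), which counts down when n is 0 (it yields 1 0). A hedged variant using seq_len, which returns an empty vector for n = 0, is checked here against the closed form n(n+1)(2n+1)/6. The names sum_squares and closed_form are only illustrative.

```r
sum_squares <- function(n) {
  # seq_len(0) is empty, so the sum is 0 rather than 1^2 + 0^2
  return(sum(seq_len(n)^2))
}

closed_form <- function(n) {
  return(n * (n + 1) * (2 * n + 1) / 6)
}

print(sum_squares(10))                     # 385
print(sum_squares(0))                      # 0
print(sum_squares(10) == closed_form(10))  # TRUE
```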
if (substr("Hesterday", 1, 1) == "H") {
  print("Yes")
} else {
  print("No")
}
[1] "Yes"
less_than_one <- function(x) {
  if (any(x < 1)) {
    print("Yes")
  } else {
    print("No")
  }
}
Again you can avoid iteration.
x <- rnorm(20)
x[x < 1]
[1] 0.98825215 0.05024871 0.65684013 -2.15985123 -0.64628928 -0.59783264
[7] -1.16356622 -0.10959459 -0.30605511 -1.16282074 -0.26461149 -0.67179247
[13] -0.58978884 0.60680612 -2.63858979 0.04518775 -0.53765815
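The expression x < 1 produces a logical vector, and subscripting with it keeps the entries where it is TRUE. A small sketch with a fixed vector (rnorm output changes from run to run) showing a few related one-liners; the name mask is illustrative.

```r
x <- c(0.5, 2.0, -1.0, 3.5, 0.9)

mask <- x < 1        # logical vector, one entry per element of x
print(mask)          # TRUE FALSE TRUE FALSE TRUE
print(x[mask])       # the elements less than 1: 0.5 -1.0 0.9
print(sum(mask))     # how many elements are less than 1: 3
print(which(mask))   # their positions: 1 3 5
print(any(mask))     # is at least one element less than 1? TRUE
```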
Take a string and make its first character upper case and the rest lower.
f <- function(s) {
  a <- paste(toupper(substr(s, 1, 1)), substr(s, 2, nchar(s)), sep = "")
  return(a)
}
You can assign to substrings.
f <- function(s) {
  substr(s, 1, 1) <- toupper(substr(s, 1, 1))
  return(s)
}
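Neither function above changes the case of the remaining characters. If the rest should also be forced to lower case, as the problem statement asks, one possible variant (the name capitalize is hypothetical) is:

```r
capitalize <- function(s) {
  first <- toupper(substr(s, 1, 1))
  rest <- tolower(substr(s, 2, nchar(s)))
  return(paste(first, rest, sep = ""))
}

print(capitalize("hELLO"))                    # "Hello"
print(capitalize(c("studyName", "REGION")))   # "Studyname" "Region"
```

Because substr, nchar, toupper, tolower and paste all work elementwise, the same function can be applied to a whole vector of strings.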
Hint: convert the string to a vector of letters
penguins_raw is a “tibble”, which is a fancy type of tabular layout. It has named columns that you can extract with $.
library(palmerpenguins)
# view(penguins_raw)
colnames(penguins_raw)
[1] "studyName" "Sample Number" "Species"
[4] "Region" "Island" "Stage"
[7] "Individual ID" "Clutch Completion" "Date Egg"
[10] "Culmen Length (mm)" "Culmen Depth (mm)" "Flipper Length (mm)"
[13] "Body Mass (g)" "Sex" "Delta 15 N (o/oo)"
[16] "Delta 13 C (o/oo)" "Comments"
By assigning to colnames you can change the column names. (In other words, colnames(penguins_raw) <- c(...) replaces the column names with those from the given vector.) Use your function from part (1) to simplify the column names of this tibble.
You can access a column of the tibble using $, so for example penguins_raw$species should give you the vector of species. Replace this column with just the first word of the species name (Gentoo, Adelie, Chinstrap).
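The two mechanisms used in this problem, assigning to colnames and replacing a column through $, can be tried on a small stand-in before touching penguins_raw. The data frame toy and its values below are made up for illustration, and sub with a regular expression is just one way to keep the first word.

```r
# A made-up stand-in for a table with unhelpful column names
toy <- data.frame(
  a = c("Adelie Penguin (Pygoscelis adeliae)", "Gentoo penguin (Pygoscelis papua)"),
  b = c(3750, 5000)
)

# Assigning to colnames replaces all of the column names at once
colnames(toy) <- c("species", "mass")

# Replace the column with everything before the first space
toy$species <- sub(" .*", "", toy$species)

print(colnames(toy))   # "species" "mass"
print(toy$species)     # "Adelie" "Gentoo"
```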
Let \(n\) be a positive real number and let \(x_0\) be 1. The iteration \[ x_{k+1} = x_{k}/2+n/(2x_k) \]
converges to the square root of \(n\). (This is Newton’s method.) Write an R function which computes the square root using this iteration. You should continue to iterate until \(x_{k+1}\) is within \(10^{-6}\) of \(x_{k}\).
#f<-function(n) {
#}
Suppose you want to save the successive values you computed during the iteration for plotting purposes. How could you do that (and return them)?
Suppose you want the tolerance (here \(10^{-6}\)) to be a parameter?
Suppose you want to set a maximum number of iterations, in case something goes wrong, to prevent an infinite loop?
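One possible way to address all three questions at once is sketched below. The function name newton_sqrt, the argument names tol and max_iter, their default values, and the choice to return a list are assumptions made for illustration, not part of the exercise statement.

```r
newton_sqrt <- function(n, tol = 1e-6, max_iter = 100) {
  x <- 1              # x_0
  history <- c(x)     # successive iterates, kept for plotting
  for (k in seq_len(max_iter)) {
    x_new <- x / 2 + n / (2 * x)
    history <- c(history, x_new)
    converged <- abs(x_new - x) < tol
    x <- x_new
    if (converged) {
      break           # stop once successive iterates agree to within tol
    }
  }
  # If max_iter is reached first, the last iterate is returned as is
  return(list(root = x, iterations = length(history) - 1, history = history))
}

result <- newton_sqrt(2)
print(result$root)
print(result$history)
# plot(result$history, type = "b") would show the convergence
```

Returning a list keeps the final estimate and the saved iterates together, so the caller can use result$root for the answer and result$history for plotting.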