Basics of Programming in R

Fundamentals of Data Science

Author

Jeremy Teitelbaum

Some basic characteristics of R

The assignment operator in R is <-
There is no built-in “dictionary” datatype.
The basic datatype in R is the vector, which contains objects of the same type.
Vectors are indexed from 1.

# Mixing types in c() coerces every element to a common type
x <- c("Hello", 1)
class(x)

[1] "character"

Notice that x is now all characters, and in fact if you now compute 2*x[2] you will get an error.
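To see the coercion in action, here is a minimal sketch. It catches the error with tryCatch so the script keeps running, and then converts the element back to a number with as.numeric; the variable name result is only for illustration.

```r
x <- c("Hello", 1)

# x[2] is the string "1", so arithmetic on it fails;
# tryCatch captures the error message instead of stopping
result <- tryCatch(2 * x[2], error = function(e) conditionMessage(e))
print(result)

# Converting explicitly back to a number makes the arithmetic work
doubled <- 2 * as.numeric(x[2])
print(doubled)
```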

Ranges are inclusive

1:10

 [1]  1  2  3  4  5  6  7  8  9 10

The logical constants are TRUE and FALSE, not True and False as in Python.
Indentation does not matter, and you can use ; to put multiple statements on one line.

x <- 1; y <- 1; z <- 1

length gives the length of a vector, nchar gives the number of characters of a string.

length("Hello")

[1] 1

length(c("Hello", "GoodBye"))

[1] 2

nchar("Hello")

[1] 5

nchar(c("Hello", "GoodBye"))

[1] 5 7

You need to use substr to extract substrings, not subscripts.

s <- "Hello"
s[1]

[1] "Hello"
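By contrast, substr takes a start and a stop position (both inclusive, counted from 1). A short sketch of the calls that do what the subscript above does not; the names first and middle are only for illustration:

```r
s <- "Hello"

# First character only
first <- substr(s, 1, 1)
print(first)

# Characters 2 through 4
middle <- substr(s, 2, 4)
print(middle)
```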

Convert a vector to a string

s <- paste(c("A", "B", "C"), collapse = "")
t <- paste(c("A", "B", "C"), c("D", "E", "F"), sep = ",", collapse = " ")
print(s)

[1] "ABC"

print(t)

[1] "A,D B,E C,F"

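s <- "This is a string of letters"
idx <- seq(1, nchar(s), 2)  # odd positions 1, 3, 5, ...
# substr returns one result per element of its first argument,
# so repeat s once for each position we want to extract
t <- substr(rep(s, length(idx)), idx, idx)
paste(t, collapse = "")

[1] "Ti sasrn fltes"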

Lists

A list can contain objects of different types.

lst <- list("a", 1.5)

In particular, a list can contain vectors and can have named entries.

lst <- list(first = c(1, 2, 3), second = c(4, 5, 6))
print(lst)

$first
[1] 1 2 3

$second
[1] 4 5 6

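Double brackets [[ ]] extract a single element of a list, while single brackets [ ] return a sub-list. The $ operator extracts an element by name.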

print(lst[[1]])

[1] 1 2 3

print(lst$first)

[1] 1 2 3

class(lst[1])

[1] "list"

class(lst[[1]])

[1] "numeric"
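Because R has no built-in dictionary type, a named list is often used as a simple key-value store. The following is only a sketch with made-up keys and values, not a full replacement for a dictionary:

```r
# A named list used as a small lookup table (keys and values are made up)
ages <- list(alice = 31, bob = 27)

# Add a new entry by name
ages[["carol"]] <- 45

# Look up a value by key
bob_age <- ages[["bob"]]
print(bob_age)

# List the keys and test whether a key is present
print(names(ages))
print("alice" %in% names(ages))
```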

Split a string into a list

a <- strsplit("This is a string", split = " ")
b <- strsplit("this is a string split into letters", split = "")
print(a)

[[1]]
[1] "This"   "is"     "a"      "string"

print(b)

[[1]]
 [1] "t" "h" "i" "s" " " "i" "s" " " "a" " " "s" "t" "r" "i" "n" "g" " " "s" "p"
[20] "l" "i" "t" " " "i" "n" "t" "o" " " "l" "e" "t" "t" "e" "r" "s"

Extract every other letter
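One way to do this, sketched here as an illustration, is to split the string into single characters with strsplit, take the first element of the resulting list, and index it with a sequence of odd positions. The variable names are arbitrary.

```r
s <- "This is a string of letters"

# strsplit returns a list; [[1]] pulls out the character vector
chars <- strsplit(s, split = "")[[1]]

# Keep positions 1, 3, 5, ... and glue them back together
every_other <- paste(chars[seq(1, length(chars), by = 2)], collapse = "")
print(every_other)
```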

Functions

Functions are constructed like this:

f <- function(n) {
    n^2
}
f(5)

[1] 25

The value of the last evaluated expression is the value of the function, but it is better style to call return() explicitly.

f <- function(n) {
    return(n^2)
}
f(10)

[1] 100

Functions built from vectorized operations, such as arithmetic, are automatically “vectorized.”

f(1:10)

 [1]   1   4   9  16  25  36  49  64  81 100
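Vectorization comes from the operations inside the function body. As a hedged illustration with hypothetical helper names: a function that uses if examines a single condition, so it has to be applied element by element (for example with sapply), whereas the same logic written with ifelse works on a whole vector at once.

```r
# Scalar-only helper (hypothetical): `if` looks at a single condition
label_one <- function(x) {
    if (x >= 0) "non-negative" else "negative"
}

# Apply it to each element in turn
by_element <- sapply(c(-1, 0, 2), label_one)
print(by_element)

# Vectorized version of the same logic using ifelse
label_all <- function(x) {
    ifelse(x >= 0, "non-negative", "negative")
}
all_at_once <- label_all(c(-1, 0, 2))
print(all_at_once)
```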

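R automatically “recycles” the shorter vector to match the longer one. If the longer length is not a multiple of the shorter, R still recycles but issues a warning.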

1:3 + 1:6

[1] 2 4 6 5 7 9
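To make the recycling rule concrete, here is a small sketch. The first sum recycles a length-2 vector three times; the second has lengths 2 and 5, which do not fit evenly, and tryCatch is used only to capture the warning text.

```r
# Length 2 divides length 6: 1:2 is used three times
even_fit <- 1:2 + 1:6
print(even_fit)

# Length 2 does not divide length 5: R warns about the mismatch
msg <- tryCatch(1:2 + 1:5, warning = function(w) conditionMessage(w))
print(msg)
```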

The principle of scope is essentially the same as discussed in the Python programming notes.

Iteration in R

y <- 0
for (x in c(1, 2, 3, 10)) {
    print(x)
    y <- y + x
}

[1] 1
[1] 2
[1] 3
[1] 10

cat("y=", y)

y= 16

y <- 0
while (y < 10) {
    cat("y = ", y, " ", sep = "")
    y <- y + 1
}

y = 0 y = 1 y = 2 y = 3 y = 4 y = 5 y = 6 y = 7 y = 8 y = 9

Often iteration in R is unnecessary. Suppose you want to compute the sum of the squares of the first n integers.

f <- function(n) {
    s <- 0
    for (i in seq(1, n)) {
        s <- s + i^2
    }
    return(s)
}
f(10)

[1] 385

f <- function(n) {
    return(sum(seq(1, n)^2))
}
f(10)

[1] 385

Logical statements

if(substr("Hesterday",1,1)=="H") {
    print("Yes")
} else {
    print("No")
}

[1] "Yes"

less_than_one <- function(x) {
    if (any(x < 1)) {
        print("Yes")
    } else {
        print("No")
    }
}
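The condition in an if statement must be a single TRUE or FALSE, which is why the function above wraps the comparison in any. A short sketch of how a vector comparison collapses to one value, using an arbitrary example vector:

```r
x <- c(0.5, 2, 3)

# The comparison itself is elementwise and returns a logical vector
below <- x < 1
print(below)

# any() is TRUE if at least one element is TRUE
some_below <- any(below)
print(some_below)

# all() is TRUE only if every element is TRUE
all_below <- all(below)
print(all_below)
```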

Again you can avoid iteration.

x <- rnorm(20)
x[x < 1]

 [1]  0.98825215  0.05024871  0.65684013 -2.15985123 -0.64628928 -0.59783264
 [7] -1.16356622 -0.10959459 -0.30605511 -1.16282074 -0.26461149 -0.67179247
[13] -0.58978884  0.60680612 -2.63858979  0.04518775 -0.53765815
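The expression x < 1 is itself a logical vector, and that vector can be counted or turned into positions as well as used as a subscript. A minimal sketch; the seed is arbitrary and is set only so the random draw is repeatable.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- rnorm(20)

small <- x[x < 1]         # the values below 1
n_small <- sum(x < 1)     # TRUE counts as 1, so this counts them
positions <- which(x < 1) # the indices at which they occur

print(n_small)
print(positions)
```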

Example

Take a string and make its first character upper case and the rest lower.

f <- function(s) {
    # upper-case the first character, lower-case the remainder
    a <- paste(toupper(substr(s, 1, 1)), tolower(substr(s, 2, nchar(s))), sep = "")
    return(a)
}

You can assign to substrings.

f <- function(s) {
    s <- tolower(s)  # lower-case everything first
    substr(s, 1, 1) <- toupper(substr(s, 1, 1))  # then overwrite the first character
    return(s)
}
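The functions above are defined but never called. Here is a brief usage sketch under a hypothetical name, capitalize, using substring assignment; because substr and toupper are vectorized, the same function also handles a vector of strings.

```r
# Hypothetical name; lower-cases the input, then upper-cases the first character
capitalize <- function(s) {
    s <- tolower(s)
    substr(s, 1, 1) <- toupper(substr(s, 1, 1))
    return(s)
}

one <- capitalize("hELLO")
print(one)

many <- capitalize(c("gentoo", "ADELIE"))
print(many)
```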

Problems

Write a function which takes a string and standardizes it by:
- removing all characters which are not letters, numbers, or spaces
- making all the letters lower case
- replacing all spaces with an underscore "_"

Hint: convert the string to a vector of letters

The object penguins_raw is a “tibble”, which is a fancy type of tabular layout. It has named columns that you can extract with $.

library(palmerpenguins)
# view(penguins_raw)
colnames(penguins_raw)

 [1] "studyName"           "Sample Number"       "Species"            
 [4] "Region"              "Island"              "Stage"              
 [7] "Individual ID"       "Clutch Completion"   "Date Egg"           
[10] "Culmen Length (mm)"  "Culmen Depth (mm)"   "Flipper Length (mm)"
[13] "Body Mass (g)"       "Sex"                 "Delta 15 N (o/oo)"  
[16] "Delta 13 C (o/oo)"   "Comments"

By assigning to colnames you can change the column names. (In other words, colnames(penguins_raw) <- c(...) replaces the column names with the given vector.) Use your function from the first problem to simplify the column names of this tibble.

You can access a column of the tibble using $, so for example penguins_raw$Species (or penguins_raw$species, once you have simplified the column names) should give you the vector of species. Replace this column with just the first word of the species name (Gentoo, Adelie, Chinstrap).

Let $n$ be a positive real number and let $x_0$ be 1. The iteration \[ x_{k+1} = \frac{x_k}{2} + \frac{n}{2x_k} \]

converges to the square root of $n$. (This is Newton’s method.) Write an R function which computes the square root using this iteration. You should continue to iterate until $x_{k+1}$ is within $10^{-6}$ of $x_{k}$.

#f<-function(n) {

#}

Suppose you want to save the successive values you computed during the iteration for plotting purposes. How could you do that (and return them)?

Suppose you want the tolerance (here $10^{-6}$) to be a parameter?

Suppose you want to set a maximum number of iterations, in case something goes wrong, to prevent an infinite loop?
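For reference after you have tried the problem yourself, here is one possible sketch, not the only reasonable design. The function name newton_sqrt, the argument names tol and max_iter, their defaults, and the shape of the returned list are all assumptions made for illustration.

```r
# One possible sketch; names, defaults, and return shape are illustrative choices
newton_sqrt <- function(n, tol = 1e-6, max_iter = 100) {
    x <- 1          # x_0 = 1, as in the problem statement
    history <- x    # successive values, kept for plotting
    for (k in seq_len(max_iter)) {
        x_new <- x / 2 + n / (2 * x)
        history <- c(history, x_new)
        if (abs(x_new - x) < tol) {
            return(list(root = x_new, iterates = history, converged = TRUE))
        }
        x <- x_new
    }
    # Reached max_iter without meeting the tolerance
    return(list(root = x, iterates = history, converged = FALSE))
}

res <- newton_sqrt(2)
print(res$root)
print(res$iterates)
```

Returning a list keeps the final estimate, the full sequence of iterates, and a convergence flag together, so the caller can plot res$iterates or check res$converged before trusting res$root.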