---
title: "Homework Two - R"
author: [your name]
format: html
---
Homework Two
Fundamentals of Data Science
This homework is due Sunday, September 24th at midnight. Please submit it using HuskyCT.
There are two files to submit, one for R and one for Python. For the R part of the homework (problems 1-3), submit a QMD file that begins with the YAML header shown at the top of this document. For the Python part (problems 4-5), submit an ipynb file.
Problem 1
Let \(X\) be a binomial random variable with \(n=50\) and \(p=0.7\).
- Draw 1000 samples from X. How many of your sampled values are less than 30?
- Based on the probability distribution, how many sampled values would you expect to see that are less than 30?
- Plot a histogram of your sampled values.
Your answer should be an R code chunk in your QMD file, following this template:

```r
Xsamples <- #
less_than_30_observed <- #
less_than_30_predicted <- #
# code to plot histogram of Xsamples
```

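Because \(X\) only takes integer values, "less than 30" means \(X \le 29\). The predicted count is therefore the number of draws times \(P(X \le 29)\); in R that is along the lines of `1000 * pbinom(29, 50, 0.7)`. The minimal Python sketch below uses only the standard library to show the same arithmetic. It draws each binomial sample as a count of Bernoulli successes. The helper names `binom_cdf` and `draw_binomial` are hypothetical, and the seed is arbitrary.

```python
import math
import random

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed directly from the pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def draw_binomial(n, p, rng):
    """One Binomial(n, p) draw: the number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

n, p, num_samples = 50, 0.7, 1000
rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [draw_binomial(n, p, rng) for _ in range(num_samples)]

# For integer-valued X, "less than 30" is the event X <= 29.
observed = sum(x < 30 for x in samples)
predicted = num_samples * binom_cdf(29, n, p)
print(observed, round(predicted, 1))
```

The observed count should land near the predicted one, but it is itself random and will vary from seed to seed.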
Problem 2
The Poisson distribution is a discrete probability distribution that arises in queuing theory (and many other places). For example, imagine that customers arrive at a server at a rate such that, in a typical one-hour period, 20 customers come, but the intervals between customers are random and independent of one another. Then in a randomly chosen hour, the probability of \(k\) customers arriving is `dpois(k, 20)`.

- Sample this distribution 1000 times (hint: use `rpois`). What is the largest number of people who arrive in any one of these random hours? What is the smallest?
- Suppose you want to design your system so that it can handle the number of arriving customers 95% of the time. How many people should you design for? (Hint: use `qpois`.)
- What's the chance that between 18 and 22 people arrive in a given hour? (Hint: use `ppois`.)
- Plot the Poisson distribution probabilities. (Hint: use `dpois`.)

```r
poisson_samples <- #
max_arrivals <- #
min_arrivals <- #
threshold_95 <- #
# code to plot the distribution
```

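Two details are worth getting right here. First, R documents `qpois(p, lambda)` as the smallest integer \(x\) with \(P(X \le x) \ge p\). Second, if "between 18 and 22" is read as inclusive (an assumption), the answer is \(F(22) - F(17)\), for example `ppois(22, 20) - ppois(17, 20)`, not \(F(22) - F(18)\). The sketch below mirrors `dpois`, `ppois`, `qpois`, and `rpois` in standard-library Python. The function names are hypothetical, and the sampler uses Knuth's multiplication method rather than whatever algorithm R uses internally.

```python
import math
import random

def pois_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def pois_cdf(k, lam):
    """P(X <= k), the analogue of ppois(k, lam)."""
    return sum(pois_pmf(i, lam) for i in range(k + 1))

def pois_quantile(q, lam):
    """Smallest k with P(X <= k) >= q, the convention qpois uses; needs 0 < q < 1."""
    k, total = 0, pois_pmf(0, lam)
    while total < q:
        k += 1
        total += pois_pmf(k, lam)
    return k

def draw_poisson(lam, rng):
    """Knuth's method: count how many more uniforms it takes for the product to fall to exp(-lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

lam = 20
rng = random.Random(1)  # fixed seed so the run is reproducible
samples = [draw_poisson(lam, rng) for _ in range(1000)]
print("largest:", max(samples), "smallest:", min(samples))

design_for = pois_quantile(0.95, lam)                 # capacity that suffices 95% of the time
p_18_to_22 = pois_cdf(22, lam) - pois_cdf(17, lam)    # P(18 <= X <= 22)
print(design_for, round(p_18_to_22, 4))
```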
Problem 3
Write an R function that takes a string as input, removes all characters that are not numbers, letters, or spaces, makes all the letters lower case, converts all the spaces to underscores (`_`), and returns the result.
Hints:

- the `gsub` function replaces things in a string.
- the `tolower` function makes things lower case.
- the built-in variable `letters` (resp. `LETTERS`) is a vector of all lower (resp. upper) case letters.
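`gsub` takes a regular expression, so removing every unwanted character can be done in one step with a negated character class. The order matters: the removal must keep spaces, and only afterwards are the spaces turned into underscores. In R this would look something like `gsub("[^A-Za-z0-9 ]", "", s)`, followed by `tolower` and a second `gsub` for the spaces. The hedged Python illustration below has the same structure, with `re.sub` standing in for `gsub`. The name `clean_string_regex` is hypothetical, and "letters" is assumed to mean the ASCII letters in `letters`/`LETTERS`.

```python
import re

def clean_string_regex(s):
    # Drop every character that is not an ASCII letter, a digit, or a space.
    s = re.sub(r"[^A-Za-z0-9 ]", "", s)
    # Lower-case what is left, then turn each space into an underscore.
    return s.lower().replace(" ", "_")

print(clean_string_regex("Hello, World! 42"))  # hello_world_42
```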
Problem 4
Do problem 3 in Python.
Hints:

- the Python string method `replace()` replaces things in a string. So if `x` is a string, `x.replace("a", "b")` replaces all a's with b's.
- the Python string method `lower()` makes a string lower case. So if `x` is a string, `x.lower()` is `x` in lower case.
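The two hints can be combined without regular expressions. The idea is to call `replace(ch, "")` once for each disallowed character that actually occurs in the string, then call `lower()`, then replace spaces. The minimal sketch below assumes that "letters" means ASCII letters (matching R's `letters`/`LETTERS`), so accented characters are removed. The names `ALLOWED` and `clean_string` are hypothetical.

```python
import string

# Characters to keep: ASCII letters, digits, and the space character.
ALLOWED = set(string.ascii_letters + string.digits + " ")

def clean_string(s):
    # Remove each disallowed character that occurs in s, one replace() per character.
    for ch in set(s) - ALLOWED:
        s = s.replace(ch, "")
    # Lower-case the result and turn spaces into underscores.
    return s.lower().replace(" ", "_")

print(clean_string("Hello, World! 42"))  # hello_world_42
```

Iterating over `set(s)` means each distinct unwanted character is handled once, however many times it appears.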
Problem 5
If \(x_0=1\) and \(n\) is a positive real number, the iteration \[ x_{k+1} = x_{k}/2+n/(2x_k) \]
converges to the square root of \(n\). Write a Python function that runs this iteration until the absolute difference \(|x_{k+1} - x_k|\) is less than a tolerance.
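One way to see why this converges to \(\sqrt{n}\) is that it is Newton's method (also known here as the Babylonian or Heron's method) applied to \(f(x) = x^2 - n\), whose positive root is \(\sqrt{n}\). With \(f'(x) = 2x\), the Newton step simplifies to exactly the update above:

\[
\begin{aligned}
x_{k+1} &= x_k - \frac{f(x_k)}{f'(x_k)} = x_k - \frac{x_k^2 - n}{2x_k} \\
        &= x_k - \frac{x_k}{2} + \frac{n}{2x_k} = \frac{x_k}{2} + \frac{n}{2x_k}.
\end{aligned}
\]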
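
```python
import numpy as np

def sq(n):
    x = 1      # starting guess x_0
    tol = 1    # any value above 1e-6, so the loop runs at least once
    while tol > 1e-6:
        xnew = # fill this in
        tol = np.abs(xnew - x)  # size of the last step
        x = xnew
    return x
```
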
Now improve the function above so that:

- one can optionally provide a threshold to replace 1e-6;
- if the code runs for more than `max_iter` iterations of the while loop, it quits while printing "Failed to converge". `max_iter` is 1000 by default but can be modified when the function is called.
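Below is a minimal sketch of the improved function. It assumes the keyword names `threshold` and `max_iter`. It also assumes that on failure the function should still return the last iterate, since the problem only says to print the message and quit. The built-in `abs` replaces `np.abs`, so the sketch needs no imports.

```python
def sq(n, threshold=1e-6, max_iter=1000):
    """Approximate sqrt(n) with x_{k+1} = x_k/2 + n/(2 x_k), starting from x_0 = 1."""
    x = 1.0
    tol = float("inf")  # guarantees at least one pass through the loop
    iterations = 0
    while tol > threshold:
        if iterations >= max_iter:
            # Give up after max_iter updates without meeting the threshold.
            print("Failed to converge")
            break
        xnew = x / 2 + n / (2 * x)
        tol = abs(xnew - x)
        x = xnew
        iterations += 1
    return x

print(sq(2))                 # close to 1.41421356...
print(sq(2, threshold=0.1))  # stops early, at about 17/12
sq(10**6, max_iter=3)        # prints "Failed to converge"
```

Checking `iterations` at the top of the loop means at most `max_iter` updates are performed. A call that converges never prints anything.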