n = 56 # integer
m = 1234.48 # floating point
L = [1, 2, 3, 4] # list
name = "Jeremy" # string
Basics of Programming in Python
Fundamentals of Data Science
Key ingredients of a programming language:
- data types and data structures
- functions
- control flow (iteration and logical branches)
Key data types in Python
- numbers (integers and floating point)
- strings
- lists
- numpy arrays (*)
- dictionaries
- pandas dataframes (*)
Basic examples
From before, remember:
The type function tells you what something is.
print("type of n is {}, type of name is {}".format(type(n), type(name)))
type of n is <class 'int'>, type of name is <class 'str'>
Working with Lists and Strings
Split a string to a list.
L = list("My name is Jeremy")
print(L)
['M', 'y', ' ', 'n', 'a', 'm', 'e', ' ', 'i', 's', ' ', 'J', 'e', 'r', 'e', 'm', 'y']
Join a list to a string.
print(''.join(["A","B","C"]))
print('_'.join(["A","B","C"]))
ABC
A_B_C
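Calling list() on a string splits it into single characters. To split into words instead, the built-in str.split method is one common option. A minimal sketch (the variable names here are only for illustration):

```python
sentence = "My name is Jeremy"

# str.split() with no argument splits on runs of whitespace
words = sentence.split()
print(words)

# " ".join(...) undoes the split when the words were separated by single spaces
rebuilt = " ".join(words)
print(rebuilt == sentence)
```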
Dictionaries
A dictionary (or a HashMap, or an associative array) is like an array with arbitrary subscripts.
D = {"first_name": "Jeremy", "last_name": "Teitelbaum"}
D["middle_name"] = "Thau"
print(D["first_name"])
D["Title"] = "Emperor"
print(D)
# D["Subtitle"]
Jeremy
{'first_name': 'Jeremy', 'last_name': 'Teitelbaum', 'middle_name': 'Thau', 'Title': 'Emperor'}
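Looking up a key that is not in the dictionary (such as the commented-out D["Subtitle"] above) raises a KeyError. The sketch below shows a few standard ways to check for a key or supply a fallback; the fallback value "none" is just an illustrative choice.

```python
D = {"first_name": "Jeremy", "last_name": "Teitelbaum"}

# The in operator tests membership among the keys
has_subtitle = "Subtitle" in D
print(has_subtitle)

# get() returns a default value instead of raising KeyError
subtitle = D.get("Subtitle", "none")
print(subtitle)

# items() yields (key, value) pairs in insertion order
for key, value in D.items():
    print(key, "->", value)
```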
Arrays
import numpy as np
x=np.array([1,2,3,4])
x=np.linspace(-5,5,10)
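What distinguishes a numpy array from a list is that arithmetic applies to every element at once. A short sketch of that behavior (the names y and grid are illustrative only):

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Arithmetic on an array is applied elementwise
y = 2 * x + 1
print(y)

# For a plain list, * means repetition instead
print([1, 2, 3, 4] * 2)

# linspace(start, stop, num) gives num evenly spaced points, endpoints included
grid = np.linspace(-5, 5, 11)
print(grid[0], grid[-1], len(grid))
```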
Booleans
T = True
F = False
print(T or F) # or
print(T and F) # and
3 == 5 # equality
3 > 5 # greater than
3 < 5 # less than
x = (3 <= 5) # less than or equal
print(x)
y = (3 != 5) # not equal
print(y)
True
False
True
True
Functions
import scipy.stats as sps
def my_function(n,mu,s):
    x = sps.norm.rvs(mu,s,size=n)
    return x
Important concepts:
- arguments
- scope
- return values
Scope
Basic rule of scope: Variables created inside a function are completely separate from those outside the function, so assigning to them has no effect outside.
Exception: some operations (such as list append) modify an object in place, and in these cases you may end up modifying something outside the function.
def f(a,b):
    x=a+b
    return x
x=3
print("before executing f, x={}".format(x))
print(f(2,5))
print("after executing f, x={}".format(x))
before executing f, x=3
7
after executing f, x=3
def f(x):
    x=x+["d"]
    return x
L=["a","b","c"]
print("L before is {}".format(L))
print("result of f(L) is {}".format(f(L)))
print("L after is {}".format(L))
L before is ['a', 'b', 'c']
result of f(L) is ['a', 'b', 'c', 'd']
L after is ['a', 'b', 'c']
def f(x):
    x.append("d") # modifies the list in place
    return x
# Warning: after the call, the caller's list x has changed too
x = ["a","b","c"]
print(f(x))
print(x)
['a', 'b', 'c', 'd']
['a', 'b', 'c', 'd']
x = 55

def f(n):
    n = n+x
    return n
f(24)
79
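The last example works because a name that is not assigned inside a function is looked up outside it at the time of the call. The sketch below illustrates this, and also shows one common way to avoid the in-place surprise from append by copying the list first. The helper name append_d is hypothetical and used only for illustration.

```python
x = 55

def f(n):
    # x is not assigned inside f, so Python looks it up in the global scope
    return n + x

first = f(24)

x = 100  # rebinding the global changes what f sees on its next call
second = f(24)
print(first, second)

def append_d(items):
    # Work on a copy so the caller's list is left untouched
    result = list(items)
    result.append("d")
    return result

letters = ["a", "b", "c"]
extended = append_d(letters)
print(letters, extended)
```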
Iteration
for x in range(10):
    print(x,end=',')
print('\n---')
for x in ["a","b","c"]:
    print(x,end=',')
# Also available: while
0,1,2,3,4,5,6,7,8,9,
---
a,b,c,
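A while loop repeats as long as its condition holds, which is useful when the number of iterations is not known in advance. A small sketch (the starting value 100 is arbitrary):

```python
# Count how many times 100 can be halved before it drops below 1
value = 100
steps = 0
while value >= 1:
    value = value / 2
    steps += 1
print(steps, value)
```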
Logic
if 3<5:
    print("ha")
else:
    print("ba")
ha
if 3+5==8 and 3-5==-2:
    print("Yeah!")
else:
    print("Nah!")
Yeah!
if 3+5 in [1,2,3,4,5,6,7]:
    print("Yeah")
else:
    print("nah!")
nah!
List Comprehensions
List comprehensions are one of the most useful features of Python.
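# Here f is the function f(n) = n + x defined under Scope;
# after the loops under Iteration, the global x is "c"
L = ["hello","Hello","HELLO","jeremy","jereMy"]
N = [f(x) for x in L]
M = [f(x) for x in L if x[0]=="H"]
print(N,M)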
['helloc', 'Helloc', 'HELLOc', 'jeremyc', 'jereMyc'] ['Helloc', 'HELLOc']
Another example.
s="Jeremy Teitelbaum"
L=[x for x in list(s) if x not in [" "]]
print(L)
['J', 'e', 'r', 'e', 'm', 'y', 'T', 'e', 'i', 't', 'e', 'l', 'b', 'a', 'u', 'm']
Compare:
S=""
for x in "Jeremy Teitelbaum":
    if x not in [" "]:
        S+=x
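Note that the loop builds a string while the comprehension builds a list. If a string is wanted from the comprehension, one option is to pass it to ''.join, as in this sketch:

```python
s = "Jeremy Teitelbaum"

# Keep every character except the space, then glue the pieces back together
S = "".join([x for x in s if x != " "])
print(S)
```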
A few other tricks
- default arguments
- docstrings
def f(x=0,y=1):
    return x+y
print(f())
print(f(1))
print(f(3,4))
1
2
7
def first_letter_cap(s):
    "Returns s but first letter of string is upper case"
    return s[0].upper()+s[1:]
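A docstring is stored on the function object, so it can be read back later. The sketch below repeats the definition so it runs on its own; in an interactive session, help(first_letter_cap) shows the same text together with the signature.

```python
def first_letter_cap(s):
    "Returns s but first letter of string is upper case"
    return s[0].upper() + s[1:]

# The docstring is available as the __doc__ attribute
print(first_letter_cap.__doc__)

print(first_letter_cap("jeremy"))
```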
Some examples
Take a string and make its first character upper case and the rest lower.
def f(s):
    l = s[0].upper()+s[1:].lower()
    return l
print(f("hello"),f("Hello"),f("HELLO"))
Hello Hello Hello
Now do this for each element of a list.
def h(L):
    N=[]
    for x in L:
        N = N + [f(x)]
    return N
h(["hello","HELLO","jeremy","JEREMY","jerEmy"])
['Hello', 'Hello', 'Jeremy', 'Jeremy', 'Jeremy']
Problems
- Write a function which takes a string and standardizes it by:
  - removing all characters which are not letters, numbers, or spaces
  - making all the letters lower case
  - replacing all spaces by an underscore '_'
Hint: convert the string to a list of characters.
- The penguins_raw.csv file can be loaded into a pandas dataframe, which is a fancy type of tabular layout. It has named columns that you can extract with .columns
import pandas as pd
penguins_raw = pd.read_csv("data/penguins-raw.csv")
penguins_raw.columns
Index(['studyName', 'Sample Number', 'Species', 'Region', 'Island', 'Stage',
'Individual ID', 'Clutch Completion', 'Date Egg', 'Culmen Length (mm)',
'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)', 'Sex',
'Delta 15 N (o/oo)', 'Delta 13 C (o/oo)', 'Comments'],
dtype='object')
You can assign to penguins_raw.columns to change the column names. Use your function from part 1 to standardize and simplify the column names.
You can access a column of the dataframe using ., so for example, once the names are standardized, penguins_raw.species should give you the column species. Replace this column with just the first word of the species name (Gentoo, Adelie, Chinstrap). To do this, you have to use the map method. If f is a function that picks out the first word of a string, then penguins_raw.species.map(f) returns the result of applying f to every element of penguins_raw.species.
- Let \(n\) be a positive real number and let \(x_0\) be 1. The iteration \[ x_{k+1} = x_{k}/2+n/(2x_k) \] converges to the square root of \(n\). (This is Newton's method.) Write a Python function which computes the square root using this iteration. You should continue to iterate until \(x_{k+1}\) is within \(10^{-6}\) of \(x_{k}\).
# def f(n):
#
# ...
Suppose you want to save the successive values you computed during the iteration for plotting purposes. How could you do that (and return them)?
Suppose you want the tolerance (here \(10^{-6}\)) to be a parameter?
Suppose you want to set a maximum number of iterations, in case something goes wrong, to prevent an infinite loop?