Start Jupyter Lab
You can arrange the panels however you like.
pwd
A notebook blends text (Markdown) and code (Python, in our case).
For those who know TeX, the notebook can render math:
$$\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6}$$
Python is an interpreted language, and the cells of the notebook can be used as a calculator (a fancy one!).
name = "Jeremy"
age = 60
probability = .35
print(name, age, probability)
print(name, 'is', age,'years old with probability',probability)
Variables must be created before being used, by having a value assigned to them.
temperature
temperature=50
temperature*2
print("the temperature is:", temperature)
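A minimal sketch of the rule that a variable must be assigned before it is used. The name `humidity` is hypothetical and chosen only for this illustration; the `try`/`except` is there so the cell keeps running after the error.

```python
# Using a name before assigning to it raises NameError.
# `humidity` is a hypothetical variable used only for this illustration.
try:
    print(humidity)
    outcome = "no error"
except NameError as err:
    outcome = "NameError"
    print("caught:", err)

humidity = 40            # assignment creates the variable
doubled = humidity * 2   # now it can be used
print(doubled)
```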
full_name = "Jeremy Teitelbaum"
full_name[3]
In python, indexing starts from zero!!!
full_name[0]
len(full_name)
a=123
b='123'
a[1]
b[1]
L = ['jeremy', 'kendra', 'pariksheet']
L[0]
full_name[1:3]
full_name[5:]
full_name[:5]
full_name[3:5:3]
full_name[:-1]
full_name[-1]
full_name[-1::-1]
L = ['a','b','c','d']
L[1:3]
L[-1]
L[::2]
magic_word = "thanos_must_die"
magic_word
print(magic_word[4:8])
print(magic_word[7:])
print(magic_word[:3])
print(magic_word[:])
print(magic_word[2:-3])
print(magic_word[-3:2:-1])
print(magic_word[0:100])
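The slices above all follow the pattern `sequence[start:stop:step]`. The sketch below restates that pattern on the same string; the variable names (`word`, `middle`, and so on) are introduced here only for illustration.

```python
# General form: sequence[start:stop:step]
# start is included, stop is excluded, and step defaults to 1.
word = "thanos_must_die"   # same string as magic_word above

middle = word[7:11]        # characters at positions 7, 8, 9, 10
every_other = word[::2]    # every second character, starting at position 0
backwards = word[::-1]     # a negative step walks from right to left
clipped = word[10:100]     # a stop past the end is clipped, not an error

print(middle)
print(every_other)
print(backwards)
print(clipped)
```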
In Python, every variable has a type, but the language figures out the appropriate type by itself.
The most important basic types, illustrated below, are strings (str), floating-point numbers (float), and integers (int):
x="this is a string"
a=137.50
b=12e-1
c=133
print(a)
print(b)
print(c)
type(a)
type(b)
type(c)
type(x)
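One way to see that Python works out the type by itself is to rebind a single name to different kinds of values. This is only a sketch; the name `value` is hypothetical.

```python
# The type comes from the value currently assigned to the name,
# so rebinding the name changes what type() reports.
value = 133
first_type = type(value).__name__     # 'int'

value = 12e-1                         # scientific notation gives a float
second_type = type(value).__name__    # 'float'

value = "this is a string"
third_type = type(value).__name__     # 'str'

print(first_type, second_type, third_type)

# isinstance() asks whether a value is of a given type
is_text = isinstance(value, str)
print(is_text)
```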
Arithmetic operations are what you'd expect, with some caveats.
x=3.5
y=x/5
z=x*22.4
w=x*1e-1
print("x=",x,"y=",y,"z=",z,"w=",w)
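The caveats are not spelled out here. Two that commonly surprise newcomers, and which may or may not be the ones intended, are that `/` always produces a float and that floats are only approximations. A short sketch using the standard `math` module:

```python
import math

# Caveat 1: "/" always returns a float, even for two integers.
quotient = 6 / 3          # 2.0, not 2
floor_quotient = 7 // 2   # "//" is floor division and gives the integer 3

# Caveat 2: floats are binary approximations, so exact equality can fail.
total = 0.1 + 0.2
is_exact = (total == 0.3)              # False
is_close = math.isclose(total, 0.3)    # True: compare with a tolerance instead

print(quotient, floor_quotient)
print(total, is_exact, is_close)
```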
String addition is concatenation.
x="Jeremy"+"Teitelbaum"
print(x)
Mixing floats and integers is ok (you get a float) but mixing strings and numbers is a problem.
1+3.5
"Jeremy"+3
You can convert floats or ints to strings and then combine them:
"Jeremy Teitelbaum is "+60+" years old"
"Jeremy Teitelbaum is "+str(60)+" years old"
3*"a"
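As an alternative to calling `str()` before concatenating, an f-string converts the values for you. This is a sketch rather than the notebook's own method; the variable names are introduced here for illustration.

```python
name = "Jeremy Teitelbaum"
age = 60

# Explicit conversion with str(), then concatenation
with_str = name + " is " + str(age) + " years old"

# An f-string formats the values inside the braces automatically
with_fstring = f"{name} is {age} years old"

print(with_str)
print(with_fstring)
```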
type("hello")
type(60)
type(3.6)
len("Jeremy")
len(3.5)
L = ['jeremy','kendra', 'pariksheet','dyanna']
S = ['mouse', 'cat']
L+S
sorted(L)
['a']*3
L = L + ['x']
print(L)
list('abcdefg')
''.join(['a','b','c'])
f = {}
f['jeremy']=15
f['kendra']=33
f['marshmallow']=100
f['french_fries']='hello'
f['jeremy']
f['french_fries']
f[0]
f.keys()
f.values()
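The lookup `f[0]` fails because a dictionary is indexed by key, not by position, and `0` is not one of its keys. The sketch below shows a few safer ways to query a dictionary; `scores` is a hypothetical dictionary used so that `f` is left untouched.

```python
# `scores` is a hypothetical dictionary for this illustration only.
scores = {'jeremy': 15, 'kendra': 33}

# Looking up a key that is not present raises KeyError.
try:
    scores[0]
    lookup = "found"
except KeyError:
    lookup = "KeyError"

# .get() returns a default value instead of raising
missing = scores.get('marshmallow', 0)

# "in" tests whether a key is present
has_kendra = 'kendra' in scores

# .items() yields (key, value) pairs
pairs = sorted(scores.items())

print(lookup, missing, has_kendra, pairs)
```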
a = dict(name=['jeremy','kendra','pariksheet'],gender=['m','f','m'])
a
# this goes in a code cell, but doesn't get executed
round(3.12131,3)
max('hello')
min("hello")
round("hello")
1/2
result = 60
report = "the result is "+str(result)
print(report)
#help(type)
help(round)
round(123,-2)
round(1351,-1)
#help()
report = "Now is the time to flee'
1/0
y = 15-23
x = y+8
z= 3/x
max()
min()
Python reserves the following words; they cannot be used as variable names:
False, class, finally, is, return, None, continue, for, lambda, try, True, def, from, nonlocal, while, and, del, global, not, with, as, elif, if, or, yield, assert, else, import, pass, break, except, in, raise
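The words listed above are reserved keywords. The standard `keyword` module reports the list for whichever Python version is running, which may include a few more entries (for example `async` and `await`). A minimal sketch:

```python
import keyword

# keyword.kwlist holds the reserved words of the running Python version
print(keyword.kwlist)

is_reserved = keyword.iskeyword("lambda")    # True
is_free = not keyword.iskeyword("name")      # True: "name" is an ordinary identifier

# Trying to assign to a reserved word is a syntax error.
# compile() is used so the error can be caught instead of stopping the cell.
try:
    compile("class = 3", "<example>", "exec")
    result = "compiled"
except SyntaxError:
    result = "SyntaxError"

print(is_reserved, is_free, result)
```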
All of the interesting capabilities of the language come from extensions to the basic language; these extensions are called "libraries" or "modules".
The language also includes about 70 built-in functions. The functions int, len, print, str, type, max, and min are built in. The built-in functions are fairly primitive.
To get access to more interesting functions, you need to import the library that defines them. Many interesting capabilities are added by the standard library (for example, math and random) and by third-party libraries such as numpy and pandas:
import numpy
import pandas
print('pi is ',numpy.pi)
print('e is ',numpy.exp(1))
Notice that we refer to the function exp, for example, by first naming the library it came from.
exp(1)
math.exp(1)
You can get help on a library/module using the help function, for example help(numpy).
The import statement also allows you to adopt abbreviations.
import numpy as np
np.exp(1)
from numpy import exp
exp(1)
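The three import forms used above can be compared side by side. This sketch uses the standard-library `math` module instead of numpy so that it runs without any extra installation; the alias `m` is an arbitrary choice.

```python
import math             # refer to names as math.<name>
import math as m        # the same module under a shorter alias
from math import exp    # bring one name in directly

via_module = math.exp(1)
via_alias = m.exp(1)
via_name = exp(1)

# All three calls reach the same function
same = via_module == via_alias == via_name
print(via_module, same)
```

The `from ... import ...` form saves typing, but it hides where a name came from, which can matter when two libraries define functions with the same name.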
import numpy.random
numpy.random.choice([1,2,3,4,5])
np.random.choice(list('ACTTGCTTGAC'))
import math
import random
bases = "ACTTGCTTGAC"
n_bases = len(bases)
idx = np.random.randint(n_bases)
print("random base", bases[idx], "base index", idx)
import numpy.random as rnd
rnd.randint(20)
a=np.arange(0,10,.1)
numpy.cos(a)
3*a
a*a
a+numpy.exp(a)
import pandas as pd
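# a was reassigned to a numpy array above, so rebuild the dictionary under a new name
people = dict(name=['jeremy','kendra','pariksheet'],gender=['m','f','m'])
df = pd.DataFrame.from_dict(people)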
df
data = pd.read_csv('../gapminder_data.csv')
data.columns
data.head()
data = pd.read_csv('../gapminder_data.csv', index_col='country')
data.columns
data.info()
data['pop']
data[['year','pop']]
data.loc['Afghanistan']
data['continent']=='Asia'
data[data['continent']=='Asia']
data[data['pop']>1e8]
data[data['lifeExp']<60]
data.iloc[33]
data.iloc[33,3]
data.loc['Afghanistan','pop']
data.loc['Afghanistan','pop'].describe()
s=data.groupby(['continent']).mean()
s.head()
averages_by_continent=data.groupby(['continent','year']).mean()
averages_by_continent.loc['Asia'].round()
lifeExp_over_time = pd.pivot_table(data,index='country',columns='year',values='lifeExp')
lifeExp_over_time.loc['Afghanistan',:]
stats_over_time = pd.pivot_table(data,index=['continent','country'],columns='year',values=['lifeExp','gdpPercap','pop'])
stats_over_time_by_continent=stats_over_time.groupby('continent').mean()
stats_over_time_by_continent['pop'].round(-3)
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.plot([1,2,3],[1,4,9],c='blue',linewidth=3,linestyle='solid')
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.suptitle('A demo plot')
plt.plot([1,2,3],[1,4,9],c='orange',linewidth=1,linestyle='dashed')
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.suptitle('A demo plot')
plt.scatter([1,2,3],[1,4,9],c=[0,1,2],s=100)
plt.plot([1,2,3],[1,4,9])
plt.xticks([1,2,3])
plt.yticks([1,4,9])
plt.xlim(0,5)
plt.ylim(0,10)
plt.grid(False)
data.head()
lifeExp_over_time.head()
lifeExp_over_time.T['Afghanistan'].plot()
transpose = lifeExp_over_time.T
transpose[['Afghanistan','Germany']].plot(kind='bar')
_=stats_over_time.groupby('continent').mean()['gdpPercap'].T.plot()
stats_over_time.groupby('continent').mean()['lifeExp'].T.plot()
_=plt.suptitle('Mean Life Expectancy over Time')
_=stats_over_time['gdpPercap'].loc[['Africa']].mean().T.plot(kind='bar',legend=None)
data.plot(kind='scatter',y='gdpPercap',x='lifeExp',c='year',logy=True,figsize=(10,10))
plt.savefig('scatter.png')
pwd
_=data[data['year']==2002].groupby('continent')['pop'].sum().plot(kind='pie',figsize=(10,10),label='Population')
_=data[(data['year']==2002) & (data['continent']=='Asia')].groupby('country')['pop'].sum().sort_values().plot(kind='bar',figsize=(10,10))