Data Structures and Functions in R and Python
Fundamentals of Data Science
Basic data structures for analysis
Both R and Python provide spreadsheet-like data structures, similar to an Excel worksheet, that serve as the basic way to organize tabular data.
In R, these tools are packaged together in a family of libraries called the tidyverse.
In Python, they are packaged in two closely related libraries: numpy, which handles numerical arrays and linear algebra, and pandas, which handles tabular data.
In Python, these tabular data structures are called dataframes; in R they are called tibbles. (R also has dataframes, but the tidyverse mainly uses tibbles.)
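As a rough sketch of what such a structure looks like in Python, the block below builds a small pandas dataframe from a dictionary of columns and passes one column to numpy. The column names and values are made up for illustration and do not come from the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: each dictionary key becomes a column.
df = pd.DataFrame(
    {
        "species": ["cat", "dog", "cat", "dog"],
        "weight_kg": [4.0, 20.0, 5.0, 30.0],
    }
)

# Like a spreadsheet, the dataframe has named columns and indexed rows.
n_rows, n_cols = df.shape
column_names = list(df.columns)

# A numeric column can be handed to numpy as a plain array.
weights = df["weight_kg"].to_numpy()
mean_weight = float(np.mean(weights))
```

Here `df.shape` reports the number of rows and columns, and `to_numpy()` shows how the two libraries fit together: pandas holds the labeled table, while numpy does the arithmetic on the underlying array.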
Features
The basic operations that both R and Python offer are:
- mapping a function to a column of data and creating a new column
- selecting a particular column
- filtering to select rows where column entries meet a condition
- grouping rows by keys
- summarizing data by computing sums, counts, averages, variances, and so on.
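The operations listed above can be sketched in pandas as follows. This is one possible way to express them, not the only one, and the dataframe, its column names, and its values are hypothetical.

```python
import pandas as pd

# Hypothetical example data, not taken from the course materials.
df = pd.DataFrame(
    {
        "species": ["cat", "dog", "cat", "dog"],
        "weight_kg": [4.0, 20.0, 5.0, 30.0],
    }
)

# Map a function over a column and store the result as a new column.
df["weight_g"] = df["weight_kg"].map(lambda kg: kg * 1000)

# Select a particular column (the result is a pandas Series).
weights = df["weight_kg"]

# Filter to the rows where a column entry meets a condition.
heavy = df[df["weight_kg"] > 10]

# Group rows by a key, then summarize each group with
# counts, sums, averages, and (sample) variances.
summary = df.groupby("species")["weight_kg"].agg(["count", "sum", "mean", "var"])
```

In the tidyverse, the corresponding verbs from the dplyr package are `mutate`, `select`, `filter`, `group_by`, and `summarize`.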
Visualization
In addition, both R and Python have plotting libraries that take dataframes/tibbles as input, as well as libraries that apply machine learning algorithms to tabular data stored in dataframes/tibbles.
Walkthroughs
- R/basics of R programming
- R/tidyverse walkthrough
- Python/basics of Python programming
- Python/pandas walkthrough
To download the materials including the data so you can work with them locally, follow this link.