MATH/COSC 3570 Introduction to Data Science
Type quit or exit to change the Console back to R.
Math functions in R are built-in.
Use <- to do assignment.
Types: character, double, integer and logical.
c(), short for concatenate or combine, creates a vector.
Use : to create a sequence of integers.
Use seq() to create a sequence of numbers of type double with more options.
Use [] with element indexing to subset a vector.
A factor can be ordered in a meaningful way.
Create a factor with factor(). It is a type of integer, not character. 😲 🙄
Lists are different from (atomic) vectors: Elements can be of any type, including lists.
Construct a list by using list().
Return an element of a list
If list x is a train carrying objects, then x[[5]] is the object in car 5; x[4:6] is a train of cars 4-6.
@RLangTip, https://twitter.com/RLangTip/status/268375867468681216
A matrix has the attribute dim.
Use matrix() to create a matrix.
Use , to separate row and column index.
mat[2, 2] extracts the element of the second row and second column.
cbind() (binding matrices by adding columns)
rbind() (binding matrices by adding rows)
When matrices are combined by columns (rows), they should have the same number of rows (columns).
Use data.frame(), which takes named vectors as input “elements”, to create a data frame.
A data frame has properties of both a matrix and a list.
Can use either list or matrix subsetting methods.
05-R Data Type Summary
In lab.qmd Lab 5,
Create R objects vector v1, factor f2, list l3, matrix m4 and data frame d5.
Check typeof() and class() of those objects, and create a list having the output below.
v1 <- __________
f2 <- __________
l3 <- __________
m4 <- __________
d5 <- __________
v <- c(type = typeof(v1), class = class(v1))
f <- c(type = __________, class = _________)
l <- c(type = __________, class = _________)
m <- c(type = __________, class = _________)
d <- c(type = __________, class = _________)
____(vec = v,
______ = ___,
______ = ___,
______ = ___,
______ = ___)
To create a list, we use [].
0: the 1st element
-1: the last element
What does lst[0:1] return? Is it a list?
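The indexing rules above can be tried out with a small sketch. The list lst and its values are made up for illustration. It compares a single index, which returns the element itself, with a slice, which returns a new list even when it holds only one element.

```python
# Hypothetical example list; the values are for illustration only.
lst = [10, "data", 3.5, True]

first = lst[0]      # index 0 is the 1st element
last = lst[-1]      # index -1 is the last element
piece = lst[0:1]    # a slice stops before index 1, so it holds one element

print(first, type(first))
print(last, type(last))
print(piece, type(piece))   # the slice is still a list
```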
Lists are changed in place!
list.method()
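A minimal sketch of what "changed in place" means, using a made-up list. The methods append() and sort() are standard list methods; they modify the existing list and return None instead of producing a new list.

```python
# Hypothetical list of numbers, for illustration only.
lst = [3, 1, 2]

result = lst.append(4)   # append() modifies lst itself and returns None
lst.sort()               # sort() reorders the same list in place
lst[0] = 100             # item assignment also changes the existing list

print(result)
print(lst)
```

Because the methods return None, writing lst = lst.append(4) would replace the list with None, which is a common slip.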
Tuples work exactly like lists except they are immutable, i.e., they can’t be changed in place.
To create a tuple, we use ().
Note
Lists have more methods than tuples because lists are more flexible.
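The following sketch, with a made-up tuple tup, illustrates both points: item assignment on a tuple raises a TypeError, and a rough count of public methods (names not starting with an underscore) shows that list offers more of them than tuple. Counting via dir() is just one convenient way to compare, not an official measure.

```python
tup = ("a", "b", "c")   # hypothetical tuple, for illustration only

second = tup[1]          # indexing works as it does for lists
try:
    tup[1] = "z"         # item assignment is not allowed on a tuple
    changed = True
except TypeError as err:
    changed = False
    print("TypeError:", err)

# Rough comparison of how many public methods each type offers.
list_methods = [m for m in dir(list) if not m.startswith("_")]
tuple_methods = [m for m in dir(tuple) if not m.startswith("_")]
print(len(list_methods), len(tuple_methods))
```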
A dictionary consists of key-value pairs.
A dictionary is mutable, i.e., the values can be changed in place and more key-value pairs can be added.
To create a dictionary, we use {"key name": value}.
The value can be accessed by the key in the dictionary.
{'Name': 'Ivy', 'Age': 9, 'Class': 'Third'}
dict_items([('Name', 'Ivy'), ('Age', 9), ('Class', 'Third')])
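As a small sketch of these dictionary operations, the code below reuses the dictionary printed above. The variable name dic and the added "School" entry are assumptions made for illustration.

```python
dic = {"Name": "Ivy", "Age": 9, "Class": "Third"}

name = dic["Name"]                     # access a value by its key
dic["Age"] = 10                        # change an existing value in place
dic["School"] = "Example Elementary"   # hypothetical new key-value pair

print(name)
print(dic)
print(dic.items())                     # view of all (key, value) pairs
```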
06-Python Data Structure
In lab.qmd Lab 6,
Remember to create a Python code chunk.
Is there any issue with this Python chunk?
Commit and Push your work once you are done.
Python built-in data structures are not specifically for data science.
To use more data-science-friendly functions and structures, such as arrays or data frames, Python relies on the packages NumPy and pandas.
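A minimal sketch of the two structures mentioned here, assuming NumPy and pandas are installed. The values and column names are made up for illustration. np.array() builds an array and pd.DataFrame() builds a data frame from named columns.

```python
import numpy as np
import pandas as pd

# A NumPy array: a 2 x 3 grid of numbers, similar in spirit to an R matrix.
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)

# A pandas DataFrame built from named columns (hypothetical values).
df = pd.DataFrame({"name": ["Ivy", "Leo"], "age": [9, 10]})
print(df)
print(df["age"].mean())
```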
In your lab-yourusername project, run
Go to Tools > Global Options > Python > Select > Virtual Environments
You may need to restart the R session. Do it, and in the new R session, run
plot()
                    mpg cyl disp  hp
Mazda RX4          21.0   6  160 110
Mazda RX4 Wag      21.0   6  160 110
Datsun 710         22.8   4  108  93
Hornet 4 Drive     21.4   6  258 110
Hornet Sportabout  18.7   8  360 175
Valiant            18.1   6  225 105
Duster 360         14.3   8  360 245
Merc 240D          24.4   4  147  62
Merc 230           22.8   4  141  95
Merc 280           19.2   6  168 123
Merc 280C          17.8   6  168 123
Merc 450SE         16.4   8  276 180
Merc 450SL         17.3   8  276 180
Merc 450SLC        15.2   8  276 180
Cadillac Fleetwood 10.4   8  472 205
matplotlib.pyplot
     mpg  cyl   disp   hp
0   21.0    6  160.0  110
1   21.0    6  160.0  110
2   22.8    4  108.0   93
3   21.4    6  258.0  110
4   18.7    8  360.0  175
5   18.1    6  225.0  105
6   14.3    8  360.0  245
7   24.4    4  146.7   62
8   22.8    4  140.8   95
9   19.2    6  167.6  123
10  17.8    6  167.6  123
11  16.4    8  275.8  180
12  17.3    8  275.8  180
13  15.2    8  275.8  180
14  10.4    8  472.0  205
See plt.subplots for more details.
boxplot()
boxplot()
hist()
hist() decides the class intervals/width based on breaks. If not provided, R chooses one.
hist()
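On the Python side, the counterpart of breaks is the bins argument. The sketch below, assuming NumPy is installed, uses np.histogram() on the 15 mpg values listed in the tables above to show how an integer and an explicit list of edges change the class intervals. plt.hist() accepts a bins argument in the same two forms. The custom edges are an arbitrary choice for illustration.

```python
import numpy as np

# mpg values for the 15 cars listed in the tables above.
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4,
       22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4]

# An integer asks for that many equal-width bins over the data range.
counts5, edges5 = np.histogram(mpg, bins=5)

# A list gives the bin edges explicitly (arbitrary edges, for illustration).
counts_custom, edges_custom = np.histogram(mpg, bins=[10, 15, 20, 25])

print(counts5, edges5)
print(counts_custom, edges_custom)
```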
barplot()
barplot()
pie()
   3    4    5
46.9 37.5 15.6
[1] "3 gears: 46.88%" "4 gears: 37.5%" "5 gears: 15.62%"
pie()
image()
The image() function displays the values in a matrix using color.
In Python,
fields::image.plot()
scatterplot3d()
In Python,
persp()
In Python,
07-Plotting (Bonus question!)
In lab.qmd ## Lab 7, for the mtcars data, use R or Python to
make a scatter plot of miles per gallon vs. weight. Decorate your plot using arguments col, pch, xlab, etc.
create a histogram of 1/4 mile time. Make it beautiful!
Find your mate and work in pairs.
Two volunteer pairs will teach us how to make beautiful plots next Tuesday (Feb 13)!
The presenters will be awarded a hex sticker! 😎
We will talk about data visualization in detail soon!