
MATH/COSC 3570 Introduction to Data Science

The goal of tidyr is to help you tidy your data via
NAs should be treatedmore columns
more rows by pivot_longer()
# A tibble: 6 × 3
  customer_id item_no item        
        <dbl> <chr>   <chr>       
1           1 item_1  bread       
2           1 item_2  milk        
3           1 item_3  banana      
4           2 item_1  milk        
5           2 item_2  toilet paper
6           2 item_3  <NA>        
pivot_longer() and pivot_wider()
To fix these problems, we’ll need pivot_longer() and pivot_wider()
Starts with a data set,
pivot_longer() add more rows and decreases the number of columns.pivot_wider() add more columns and decreases the number of rows.pivot_longer() and pivot_wider()
One variable spreads across multiple columns

One subject is scattered across multiple rows

pivot_longer() and pivot_wider()
more columns
more rows by pivot_longer()
# A tibble: 6 × 3
  customer_id item_no item        
        <dbl> <chr>   <chr>       
1           1 item_1  bread       
2           1 item_2  milk        
3           1 item_3  banana      
4           2 item_1  milk        
5           2 item_2  toilet paper
6           2 item_3  <NA>        
pivot_longer()pivot_longer()pivot_longer()pivot_longer()data: data frame
cols: columns to pivot into longer format
names_to: name of the column where column names of pivoted variables go (character string)
values_to: name of the column where data values in pivoted variables go (character string)
# A tibble: 2 × 4
  customer_id item_1 item_2       item_3
        <dbl> <chr>  <chr>        <chr> 
1           1 bread  milk         banana
2           2 milk   toilet paper <NA>  
# A tibble: 6 × 3
  customer_id item_no item        
        <dbl> <chr>   <chr>       
1           1 item_1  bread       
2           1 item_2  milk        
3           1 item_3  banana      
4           2 item_1  milk        
5           2 item_2  toilet paper
6           2 item_3  <NA>        
In customers data,
Names item_1, item_2, item_3 are values of variable item_no in purchases
Values bread, milk, etc are values of variable item in purchases
purchases data set and the prices data can now be joined together with the common key variable item. 
data: data framenames_from: which column variable in the long format contains the what should be column names in the wide formatvalues_from: which column variable in the long format contains the what should be (cell) values in the new columns in the wide format# A tibble: 6 × 3
  customer_id item_no item        
        <dbl> <chr>   <chr>       
1           1 item_1  bread       
2           1 item_2  milk        
3           1 item_3  banana      
4           2 item_1  milk        
5           2 item_2  toilet paper
6           2 item_3  <NA>        
# A tibble: 2 × 4
  customer_id item_1 item_2       item_3
        <dbl> <chr>  <chr>        <chr> 
1           1 bread  milk         banana
2           2 milk   toilet paper <NA>  

17-tidyr (Present your work!)
In lab.qmd ## Lab 17 section,
Import trump.csv. Call it trump_data as below on the left.
Use pivot_longer() to transform trump_data into the data set trump_longer on the right.
# A tibble: 5,404 × 4
  subgroup date       rating_type rating_value
  <chr>    <date>     <chr>              <dbl>
1 Voters   2020-10-04 approval            44.7
2 Voters   2020-10-04 disapproval         52.2
3 Adults   2020-10-04 approval            43.2
4 Adults   2020-10-04 disapproval         52.6
5 Adults   2020-10-03 approval            43.2
6 Adults   2020-10-03 disapproval         52.6
# ℹ 5,398 more rows
BONUS 💳: Use trump_longer to generate a plot like the one below.

   customer_id  item_no          item
0            1        1         bread
1            2        1          milk
2            1        2          milk
3            2        2  toilet paper
4            1        3        banana
5            2        3           NaN
item_no          1             2       3
customer_id                             
1            bread          milk  banana
2             milk  toilet paper     NaN