Data Wrangling with R
Welcome
1
Introduction
2
Import Data - 1
2.1
Introduction
2.2
Delimiters
2.2.1
Comma Separated Values
2.2.2
Semicolon Separated Values
2.2.3
Space Separated Values
2.2.4
Tab Separated Values
2.3
Read Data
2.4
Column Names
2.5
Skip Lines
2.6
Maximum Lines
2.7
Column Types
2.8
Select Columns
2.9
Summary
3
Import Data - 2
3.1
Introduction
3.2
List Sheets
3.3
Read Sheet
3.3.1
Case 1: Specify the sheet number
3.3.2
Case 2: Specify the sheet name
3.4
Read Specific Cells
3.4.1
Method 1
3.4.2
Method 2
3.4.3
Method 3
3.4.4
Method 1
3.4.5
Method 2
3.4.6
Method 3
3.5
Read Specific Rows
3.6
Read Single Column
3.7
Read Multiple Columns
3.7.1
Summary
3.8
Statistical Software
3.8.1
STATA
3.8.2
SPSS
3.8.3
SAS
3.9
Summary
4
Data Wrangling - 1
4.1
Introduction
4.2
dplyr Verbs
4.3
Data
4.3.1
Data Dictionary
4.4
Case Study
4.5
Average Order Value
4.6
AOV by Devices
4.7
Syntax
4.8
Filter Rows
4.8.1
Case Study
4.9
Select Columns
4.9.1
Case Study
4.10
Grouping Data
4.10.1
Case Study
4.11
Summarise Data
4.11.1
Case Study
4.12
Create Columns
4.12.1
Case Study
4.13
Arrange Data
4.13.1
Case Study
4.14
AOV by Devices
4.15
Your Turn
5
Data Wrangling - 2
5.1
Introduction
5.2
Case Study
5.2.1
Data: Orders
5.2.2
Data: Customers
5.3
Example Data
5.4
Inner Join
5.4.1
Case Study: Details of customers who have placed orders and their order details
5.5
Left Join
5.6
Case Study: Details of customers and their orders irrespective of whether a customer has
5.7
Right Join
5.7.1
Case Study: Customer details for each order
5.8
Semi Join
5.8.1
Case Study: Details of customers who have placed orders
5.9
Anti Join
5.9.1
Case Study: Details of customers who have not placed orders
5.10
Full Join
5.10.1
Case Study: Details of all customers and all orders
6
Data Wrangling - 3
6.1
Introduction
6.2
Case Study
6.2.1
Data
6.2.2
Data Dictionary
6.3
Data Sanitization
6.4
Rename Columns
6.5
Data Tabulation
6.6
Sampling Data
6.7
Data Extraction
6.7.1
Sample Data
6.8
Between
6.9
Case When
7
Pipes
7.1
Introduction
7.2
Pipes
7.3
Data
7.3.1
Data Dictionary
7.4
head()
7.4.1
Using Pipe
7.5
Square Root
7.6
Visualization
7.6.1
Using pipe
7.7
Correlation
7.8
Regression
7.8.1
Using pipe
7.9
String Manipulation
7.10
Data Extraction
7.10.1
Extract Column
7.10.2
Extract List Element
7.11
Arithmetic Operations
7.11.1
Addition
7.11.2
Multiplication
7.11.3
Division
7.11.4
Power
7.12
Logical Operators
7.12.1
Greater Than
7.12.2
Weakly Greater Than
8
tibbles
8.1
Introduction
8.2
Creating tibbles
8.3
tibble features
8.3.1
never changes input’s types
8.3.2
never adjusts variable names
8.3.3
never prints all rows
8.3.4
never recycles vector of length greater than 1
8.4
Membership Testing
8.5
Tribble
8.6
Column Names
8.7
Add Rows
8.8
Add Columns
8.9
Rownames
8.9.1
Remove Rownames
8.9.2
Rownames to Column
8.9.3
Column to Rownames
8.10
Glimpse
8.11
Check Column
8.12
Summary
8.12.1
Creating tibbles
8.12.2
Modifying tibbles
8.12.3
Testing tibbles
9
Handling String Data
9.1
Introduction
9.2
Case Study
9.2.1
Data
9.2.2
Data Dictionary
9.3
Overview
9.4
Extract domain name from email ids
9.4.1
Steps
9.5
Extract Domain Extension
9.6
Extract image type from URL
9.6.1
Steps
9.7
Extract Image Dimension from URL
9.7.1
Steps
9.8
Extract HTTP Protocol from URL
9.8.1
Steps
9.9
Extract file type
9.9.1
Steps
10
Working with Date & Time
10.1
Introduction
10.2
Quick Intro
10.2.1
Origin
10.2.2
Current Date/Time
10.3
Case Study
10.3.1
Data
10.3.2
Data Dictionary
10.4
Extract Date, Month & Year from Due Date
10.5
Compute days to settle invoice
10.6
Compute days overdue
10.7
Is due year a leap year?
10.8
If due day is February 29, is it a leap year?
10.9
Shift Date
10.10
Interval
10.11
Intervals Overlap
10.12
How many invoices were settled within due date?
10.13
Shift Interval
10.14
Within
10.14.1
How many invoices were settled within due date?
10.15
Quarter
11
Categorical Data Analysis
11.1
Introduction
11.2
Case Study
11.2.1
Data
11.3
Tabulate Referrers
11.4
Reorder Referrers
11.5
Plot Referrer Frequency (Descending Order)
11.6
Plot Referrer Frequency (Ascending Order)
11.7
Case Study 2
11.7.1
Data
11.8
Tabulate Referrer
11.9
Collapse Referrer Categories
11.10
Lump Infrequent Referrer Types
11.11
Retain top 3 referrers
11.12
Lump Referrer Types with less than 10% traffic
11.13
Retain 3 Referrer Types with lowest traffic
11.14
Retain 3 Referrer Types with less than 10% traffic
11.15
Replace Levels
11.16
Drop Levels
11.17
Reorder Levels
11.18
Case Study 3
11.18.1
Data
11.19
Shift Levels
References