EDA with R

Intro and Objectives

In this module we begin doing exploratory data analysis (EDA) in R. After completing the activities in this module, you should be able to explore a dataset using:

  • descriptive statistics,

  • simple R scripts including writing your own functions,

  • basic (and not so basic) plots with ggplot2.

We are going to explore a dataset related to New York City condo evaluations for fiscal year 2011-2012. It was obtained from the NYC Open Data initiative (https://data.cityofnewyork.us/).

Readings

  • RforE - Chapters 6, 7, 8

  • r4ds - Chapters 1-2, 10-12

  • PDSwR - Chapters 3, 4

Downloads and other resources

Activities

We will work through two tutorials on EDA, with a short detour on creating user-defined functions in R.
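
As a small preview of the detour on user-defined functions, here is a minimal sketch of what one might look like. The function name summarize_vec and the sample vector are hypothetical, made up for illustration; they are not taken from the tutorials or from the condo dataset.

```r
# Hypothetical helper: a compact numeric summary of one vector.
# Missing values are dropped by default via na.rm = TRUE.
summarize_vec <- function(x, na.rm = TRUE) {
  c(
    n      = sum(!is.na(x)),              # count of non-missing values
    mean   = mean(x, na.rm = na.rm),
    median = median(x, na.rm = na.rm),
    sd     = sd(x, na.rm = na.rm)
  )
}

# Made-up sample data with one missing value
result <- summarize_vec(c(2, 4, 6, NA, 8))
print(result)
```

Wrapping repeated summary steps in a function like this keeps an EDA script short and makes it easy to apply the same summary to several columns.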

Explore (OPTIONAL)

Data visualization

Percentiles

It’s easy to get enamored with averages, but they don’t tell the whole story: a single mean can hide skew and outliers. Look at percentiles, too.
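
To make the point concrete, here is a minimal sketch comparing a mean with percentiles using base R's mean(), median(), and quantile(). The values vector is made up for illustration (it is not the NYC condo data), and the choice of percentiles is just one reasonable assumption.

```r
# Hypothetical right-skewed data: nine modest values and one very large one
values <- c(10, 12, 13, 15, 16, 18, 20, 22, 25, 400)

avg  <- mean(values)     # pulled upward by the single large value
med  <- median(values)   # the 50th percentile
pcts <- quantile(values, probs = c(0.05, 0.25, 0.50, 0.75, 0.95))

print(avg)
print(med)
print(pcts)
```

Here the mean lands above the 75th percentile, so it describes almost none of the observations well, while the percentiles show where most of the values actually sit.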

R Markdown