Chapter 3 Data transformation

The dataset is available as a CSV file on the source website. We downloaded the file and loaded it as a data frame in RStudio. Most of the cleaning is done alongside the analysis in the results section.

In the results section, rows with missing values are removed. The data are also grouped so that observations can be divided according to the variables each analysis needs.
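The two steps above can be sketched on a toy data frame. This is only an illustration under stated assumptions: the column names (`restaurant`, `borough`, `score`) and the values are hypothetical and are not taken from the real dataset.

```r
library(dplyr)

# Toy data; column names and values are hypothetical placeholders.
inspections <- data.frame(
  restaurant = c("A", "B", "C", "D"),
  borough    = c("Queens", "Queens", "Bronx", NA),
  score      = c(12, NA, 7, 20)
)

# Drop rows that have a missing value in the variables of interest.
complete_rows <- inspections %>%
  filter(!is.na(borough), !is.na(score))

# Group the remaining observations by one variable and summarise each group.
by_borough <- complete_rows %>%
  group_by(borough) %>%
  summarise(
    n_restaurants = n(),
    mean_score    = mean(score)
  )

print(by_borough)
```

Filtering on specific columns, rather than dropping every row with any missing value, keeps observations that are incomplete only in variables a given analysis does not use.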

The original dataset is messy: a single restaurant appears in multiple rows, one for each combination of inspection date, violation, and so on. We cleaned and reorganized the data as needed with functions such as group_by(), distinct(), filter() and summarise(). More details are given in the “05-results” part.
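As a rough sketch of how these four functions can work together, the example below collapses a toy table with one row per violation into one row per inspection, and then into one row per restaurant. The column names, dates, and violation codes are hypothetical placeholders, and the actual cleaning in the results part may differ.

```r
library(dplyr)

# Toy data with one row per violation; all names and values are hypothetical.
raw <- data.frame(
  restaurant      = c("A", "A", "A", "B", "B"),
  inspection_date = as.Date(c("2020-01-10", "2020-01-10", "2021-03-05",
                              "2020-06-01", "2020-06-01")),
  violation       = c("V1", "V2", "V3", "V4", "V4"),
  score           = c(20, 20, 9, 13, 13)
)

# distinct(): keep one row per inspection by dropping the violation detail.
per_inspection <- raw %>%
  distinct(restaurant, inspection_date, score)

# group_by() + filter(): keep only the most recent inspection per restaurant.
latest <- per_inspection %>%
  group_by(restaurant) %>%
  filter(inspection_date == max(inspection_date)) %>%
  ungroup()

# group_by() + summarise(): one summary row per restaurant.
per_restaurant <- per_inspection %>%
  group_by(restaurant) %>%
  summarise(
    n_inspections = n(),
    mean_score    = mean(score)
  )

print(latest)
print(per_restaurant)
```

Calling distinct() before summarise() matters here: without it, an inspection with several violations would be counted several times in the per-restaurant averages.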