Tidyverse packages overview (and others)

dplyr

There is a package specifically designed to help you wrangle your data. This package is called dplyr, and it will allow you to easily accomplish many common data wrangling tasks. Like tidyr, dplyr is a core package within the tidyverse, so it was loaded for you when you ran library(tidyverse) earlier. We will cover a number of functions that will help you wrangle data using dplyr:

%>% – pipe operator for chaining a sequence of operations
glimpse() – get an overview of what’s included in a dataset
filter() – filter rows
select() – select, rename, and reorder columns
rename() – rename columns
arrange() – reorder rows
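mutate() – create new columns or modify existing ones
group_by() – group rows by one or more variables
summarize() – compute summary values (e.g., a mean or a total) for a dataset, or for each group if grouped
left_join() – combine two data frames, keeping all rows of the first and adding matching columns from the second
tally() – count the number of rows (per group, if grouped), or sum the values of a column supplied as wt
count() – count the rows for each unique value of the specified column(s) (a shortcut for group_by() plus tally())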
add_count() – add values of count() as a new column
add_tally() – add value(s) of tally() as a new column
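To see how several of these functions fit together, here is a minimal sketch on two small made-up data frames (sales and stores are hypothetical, created only for illustration). It loads only dplyr, which re-exports tibble(), glimpse(), and %>%; if you have already run library(tidyverse), that first line is redundant.

```r
library(dplyr)

# Hypothetical example data, made up purely for illustration
sales <- tibble(
  store = c("A", "A", "B", "B", "C"),
  item  = c("pen", "ink", "pen", "pen", "ink"),
  units = c(3, 1, 5, 2, 4),
  price = c(1.5, 4.0, 1.5, 1.5, 4.0)
)
stores <- tibble(
  store = c("A", "B", "C"),
  city  = c("Boston", "Denver", "Austin")
)

glimpse(sales)  # column names, types, and the first few values

# Chain steps with %>%: keep pen sales, add revenue, keep/rename columns, sort
pen_sales <- sales %>%
  filter(item == "pen") %>%
  mutate(revenue = units * price) %>%
  select(store, units, revenue) %>%
  rename(n_units = units) %>%
  arrange(desc(revenue))

# Per-store totals, then attach each store's city by the shared "store" column
store_totals <- sales %>%
  group_by(store) %>%
  summarize(total_units = sum(units)) %>%
  left_join(stores, by = "store")

sales %>% tally()            # number of rows: n = 5
sales %>% tally(wt = units)  # sum of the units column: n = 15
sales %>% count(item)        # one row per item with its row count
sales %>% add_count(store)   # keeps every row, adds column n (rows per store)
```

Note the difference between the last four calls: tally() and count() collapse the data to summary rows, while add_count() (and add_tally()) keep the original rows and append the counts as a new column.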

tidyr

We will also return to the tidyr package. The same package we used to reshape our data will also be helpful when wrangling it. The main functions we’ll cover from tidyr are:

unite() – combine contents of two or more columns into a single column
separate() – separate contents of a column into two or more columns
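As a minimal sketch of how these two functions mirror each other, the example below splits a hypothetical date column into year, month, and day with separate(), then pastes them back together with unite() using a different separator. The visits data frame is made up for illustration; tidyr re-exports tibble() and %>%, so no other package is loaded here.

```r
library(tidyr)

# Hypothetical data with a combined "date" column, for illustration only
visits <- tibble(
  id   = 1:3,
  date = c("2021-03-14", "2021-07-02", "2022-01-30")
)

# separate(): split one column into several at each "-"
split_dates <- visits %>%
  separate(date, into = c("year", "month", "day"), sep = "-")

# unite(): paste columns back into one, here joined with "/"
# (by default the original year/month/day columns are removed)
rejoined <- split_dates %>%
  unite("date", year, month, day, sep = "/")

split_dates
rejoined
```

By default separate() keeps the new columns as character (so "03" stays "03"); pass convert = TRUE if you want them converted to numbers. Recent tidyr releases (1.3.0 and later) mark separate() as superseded in favor of separate_wider_delim(), but separate() still works.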

janitor

The third package we’ll include here is the janitor package. While not a core tidyverse package, this tidyverse-adjacent package provides tools for cleaning messy data. The main functions we’ll cover from janitor are:

clean_names() – clean the column names of a data frame (for example, converting them to consistent snake_case)
tabyl() – tabulate the counts and percentages of a variable’s values
get_dupes() – identify duplicate observations

If you have not already, you’ll want to be sure this package is installed and loaded.
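A minimal sketch of that setup, followed by the three janitor functions applied to a small made-up data frame (messy is hypothetical, created only for illustration). The install step runs only if janitor is missing; the repos argument points at the main CRAN mirror so the line also works in a non-interactive session.

```r
# Install janitor only if it is not already available, then load it
if (!requireNamespace("janitor", quietly = TRUE)) {
  install.packages("janitor", repos = "https://cloud.r-project.org")
}
library(janitor)

# Hypothetical messy data: spaces and capitals in the column names
messy <- data.frame(
  `First Name` = c("Ana", "Ben", "Ana", "Cy"),
  `Home State` = c("MD", "VA", "MD", "MD"),
  check.names = FALSE  # keep the messy names exactly as written
)

tidy <- clean_names(messy)  # names become first_name and home_state
names(tidy)

state_tab <- tabyl(tidy, home_state)  # n and percent for each state
state_tab

# With no columns named, get_dupes() checks all columns (and prints a note);
# it returns the duplicated rows plus a dupe_count column
dupes <- get_dupes(tidy)
dupes
```

Here the two "Ana" / "MD" rows are exact duplicates, so get_dupes() returns both of them with a dupe_count of 2. You can also pass specific columns, e.g. get_dupes(tidy, home_state), to check for duplicates on those columns only.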