Tidyverse packages overview (and others)


dplyr

There is a package designed specifically to help you wrangle your data: dplyr. It makes many common data-wrangling tasks easy to accomplish. Like tidyr, dplyr is a core package within the tidyverse, so it was loaded for you when you ran library(tidyverse) earlier. We will cover a number of dplyr functions that help you wrangle data:

  • %>% – pipe operator for chaining a sequence of operations
  • glimpse() – get a compact overview of the columns in a dataset (names, types, first values)
  • filter() – filter rows
  • select() – select, rename, and reorder columns
  • rename() – rename columns
  • arrange() – reorder rows
  • mutate() – create new columns or modify existing ones
  • group_by() – group rows by one or more variables so that later steps operate per group
  • summarize() – summarize data, returning one row per group (or one row overall)
  • left_join() – combine two data frames, keeping every row of the left (first) one
  • tally() – count the rows of a tibble (per group, if grouped), or sum a column passed via wt
  • count() – count the unique values of specified column(s) (shortcut for group_by() followed by tally())
  • add_count() – add the counts from count() as a new column
  • add_tally() – add the value(s) from tally() as a new column
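The functions above can be sketched together on a small made-up data set. This is a minimal example assuming dplyr 1.0 or later (needed for the .groups argument of summarize()); the data frames sales and stores and all of their column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration: units sold per store and month
sales <- data.frame(
  store = c("A", "A", "B", "B", "C"),
  month = c(1, 2, 1, 2, 1),
  units = c(10, 15, 7, 3, 12),
  price = c(2.5, 2.5, 4.0, 4.0, 3.0)
)

# Hypothetical lookup table with one row per store
stores <- data.frame(
  store  = c("A", "B", "C"),
  region = c("north", "south", "north")
)

glimpse(sales)  # one line per column: name, type and the first values

# Chain verbs with %>%: each step receives the previous step's result
summary_by_store <- sales %>%
  filter(units > 5) %>%                          # keep rows with more than 5 units
  mutate(revenue = units * price) %>%            # add a new column
  group_by(store) %>%                            # later steps work per store
  summarize(total_revenue = sum(revenue), .groups = "drop") %>%
  left_join(stores, by = "store") %>%            # keep all rows, add region
  arrange(desc(total_revenue))                   # largest revenue first

# select() keeps and reorders columns; rename() uses new_name = old_name
slim <- sales %>%
  select(month, store, units) %>%
  rename(qty = units)

# Counting helpers
per_store   <- sales %>% count(store)                 # n = number of rows per store
per_store2  <- sales %>% group_by(store) %>% tally()  # same counts as count(store)
total_units <- sales %>% tally(wt = units)            # n = sum of units (47)
with_counts <- sales %>% add_count(store)             # every row gets its store's n
with_total  <- sales %>% add_tally()                  # every row gets n = nrow(sales)
```

The counting helpers differ mainly in shape: count() and tally() collapse the data to one row per group, while add_count() and add_tally() keep every original row and append the count as a column named n.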

tidyr

We will also return to tidyr, the same package we used to reshape our data; it is helpful when wrangling data as well. The main functions we’ll cover from tidyr are:

  • unite() – combine contents of two or more columns into a single column
  • separate() – separate contents of a column into two or more columns
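A minimal sketch of both functions, using a hypothetical data frame dates whose year, month and day are stored in separate columns (the data and column names are made up for illustration):

```r
library(tidyr)

# Hypothetical data: a date split across three columns
dates <- data.frame(
  id    = 1:3,
  year  = c(2021, 2022, 2023),
  month = c("01", "06", "12"),
  day   = c("15", "01", "31")
)

# unite(): paste year, month and day into one column named "date", joined by "-";
# the source columns are dropped by default (remove = TRUE)
united <- unite(dates, col = "date", year, month, day, sep = "-")

# separate(): split the combined column back into three character columns at "-"
split_again <- separate(united, col = date, into = c("year", "month", "day"), sep = "-")
```

In recent tidyr versions (1.3.0 and later) separate() is marked as superseded in favour of separate_wider_delim(), but it still works as shown.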

janitor

The third package we’ll include here is janitor. It is not a core tidyverse package (so library(tidyverse) does not load it), but this tidyverse-adjacent package provides tools for cleaning messy data. The main functions we’ll cover from janitor are:

  • clean_names() – clean names of a data frame
  • tabyl() – build a frequency table (counts and percentages) of one or more variables
  • get_dupes() – identify duplicate observations

If you have not already, install the package once with install.packages("janitor") and load it in each session with library(janitor).
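A short sketch of the three janitor functions on a hypothetical data frame with awkward column names and one duplicated row (the data and column names are made up for illustration):

```r
library(janitor)

# Hypothetical messy data: spaces and capitals in the names, and a repeated row
messy <- data.frame(
  `First Name` = c("Ana", "Ben", "Ana", "Cy"),
  `Test Score` = c(90, 75, 90, 60),
  check.names = FALSE  # keep the awkward names as typed
)

clean <- clean_names(messy)  # names become first_name and test_score (snake_case)

# Frequency table: one row per value, with counts (n) and proportions (percent, 0 to 1)
name_counts <- tabyl(clean, first_name)
print(name_counts)

# Rows that occur more than once, with an added dupe_count column
dupes <- get_dupes(clean)
```

With no columns given, get_dupes() checks all columns, as here; passing column names, for example get_dupes(clean, first_name), restricts the duplicate check to those columns.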