Reshaping a Wide Dataframe to Long in R: A Step-by-Step Guide Using pivot_longer and pivot_wider

In this section, we’ll go over reshaping a wide dataframe into long format using the pivot_longer and pivot_wider functions from the tidyr package.

Problem Statement


We have a dataset called landmark with 3 skulls (one per row), each with a set of 3 landmarks recorded as XYZ coordinates. The dataframe is currently in wide format, but we want to reshape it into long format with one column for the landmark name and three columns for X, Y, and Z coordinates.
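
To make the following steps concrete, here is a minimal sketch of what such a dataframe might look like. The skull IDs, the L1_X-style column names, and all coordinate values are assumptions invented for illustration; the real landmark data may be laid out differently.

```r
# Hypothetical stand-in for the landmark data: one row per skull,
# one column per landmark/axis pair, coordinates stored as text.
landmark <- data.frame(
  SNPRC = c("S01", "S02", "S03"),
  L1_X = c("10.1", "10.4", "9.8"),
  L1_Y = c("5.2", "5.0", "5.5"),
  L1_Z = c("1.3", "1.1", "1.6"),
  L2_X = c("20.3", "20.1", "19.9"),
  L2_Y = c("7.7", "7.9", "7.4"),
  L2_Z = c("2.2", "2.5", "2.0"),
  L3_X = c("30.6", "30.2", "30.9"),
  L3_Y = c("9.1", "9.4", "8.8"),
  L3_Z = c("3.3", "3.0", "3.7"),
  stringsAsFactors = FALSE
)

dim(landmark)  # 3 rows (skulls), 10 columns (ID + 9 coordinates)
```

Every coordinate column name contains exactly one underscore, which is what lets the names be split cleanly into a landmark part and an axis part.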

Using Pivot Longer


We’ll start by using the pivot_longer function from the tidyr package to reshape our dataframe. This function gathers a set of columns (in this case, the XYZ coordinate columns) into rows, moving the column names into new columns and the cell values into a single value column.

library(tidyr)

landmark |>
  pivot_longer(-SNPRC, names_sep = "_", names_to = c("landmark", "coord"))

In this code:

  • pivot_longer is the function used to reshape our dataframe.
  • -SNPRC excludes the SNPRC column from the reshaping, so it stays in place as an identifier column. The pipe operator (|>) is what passes landmark to pivot_longer as its first argument.
  • names_sep = "_" tells pivot_longer to split each column name at the underscore (_), separating the landmark name from the coordinate axis.
  • names_to = c("landmark", "coord") names the two columns that receive those pieces: landmark and coord. The cell values go into a column called value, the default for values_to.

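The output of this code is a long dataframe with four columns: SNPRC, landmark, coord, and value. Each row holds a single coordinate, so 3 skulls × 3 landmarks × 3 axes gives 27 rows. The X, Y, and Z values are not yet in separate columns; that is the job of pivot_wider.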
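
The name splitting described in the bullets above can be sketched in base R. This is only an illustration of the idea, not tidyr's internal implementation, and the column names are hypothetical ones following an assumed landmark_axis pattern.

```r
# Hypothetical column names in the assumed <landmark>_<axis> pattern
cols <- c("L1_X", "L1_Y", "L1_Z")

# Split each name at the underscore, as names_sep = "_" would
parts <- do.call(rbind, strsplit(cols, "_", fixed = TRUE))

# Label the two pieces the way names_to = c("landmark", "coord") would
colnames(parts) <- c("landmark", "coord")
parts
```

Each row of parts pairs one original column name's landmark piece with its axis piece, which is the information pivot_longer stores in the landmark and coord columns.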

Using Pivot Wider


After reshaping our dataframe into long format using pivot_longer, we can use the pivot_wider function to spread the coord column back out into separate X, Y, and Z columns.

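# Pivot longer first, then spread coord into X, Y, and Z columns.
# The result is a tibble with 9 rows and 5 columns.
landmark |>
  pivot_longer(-SNPRC, names_sep = "_", names_to = c("landmark", "coord")) |>
  pivot_wider(names_from = coord, values_from = value)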

In this code:

  • pivot_wider is the function used to reshape our dataframe back into wide format.
  • names_from = coord specifies that we want to use the coord column as the names for our new columns in the resulting dataframe.
  • values_from = value tells pivot_wider to fill those new columns with the entries from the value column.

The output of this code will be a dataframe with one row for each skull and landmark combination (9 rows) and five columns: SNPRC, landmark, and separate columns for the X, Y, and Z coordinates.
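
As an alternative sketch, tidyr also accepts the special name ".value" in names_to, which can produce the same layout in a single pivot_longer call. The small dataframe below is hypothetical (invented IDs, column names, and values) and only serves to show the shape of the result.

```r
library(tidyr)

# Hypothetical data: 2 skulls, 2 landmarks, columns named <landmark>_<axis>
landmark <- data.frame(
  SNPRC = c("S01", "S02"),
  L1_X = c(10.1, 10.4), L1_Y = c(5.2, 5.0), L1_Z = c(1.3, 1.1),
  L2_X = c(20.3, 20.1), L2_Y = c(7.7, 7.9), L2_Z = c(2.2, 2.5)
)

# ".value" turns the second piece of each column name (X, Y, Z) into
# output columns instead of storing it in a coord column.
long <- landmark |>
  pivot_longer(-SNPRC, names_sep = "_", names_to = c("landmark", ".value"))

names(long)  # SNPRC, landmark, X, Y, Z
nrow(long)   # 4: one row per skull and landmark
```

Whether to use this shortcut or the explicit two-step version is a matter of taste; the two-step version makes the intermediate long table visible, which can be easier to inspect.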

Additional Tips


One thing worth noting is that our coordinates are currently stored as character strings. Converting them to numeric values avoids type problems in later calculations. The conversion itself is done by base R’s as.numeric() function; mutate() and across() from the dplyr package apply it to the X, Y, and Z columns of the reshaped data.

Here’s an example of how you could do this:

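library(tidyr)
library(dplyr)

# X, Y, and Z only exist after reshaping, so convert at the end of the pipeline
landmark |>
  pivot_longer(-SNPRC, names_sep = "_", names_to = c("landmark", "coord")) |>
  pivot_wider(names_from = coord, values_from = value) |>
  mutate(across(X:Z, as.numeric))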

By doing so, we ensure that our coordinates are stored in a format that can be easily manipulated and worked with in subsequent steps.
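
One caveat worth checking after the conversion: as.numeric() turns any string it cannot parse into NA and emits a warning. A minimal sketch, using made-up values rather than the real landmark data:

```r
# Hypothetical raw values; "n/a" stands in for a malformed entry
coords <- c("10.1", "5.2", "n/a")

# Unparseable strings become NA (the warning is silenced for this demo)
converted <- suppressWarnings(as.numeric(coords))

is.na(converted)  # FALSE FALSE TRUE
```

Counting the NA values before and after conversion is a quick way to spot entries that were lost this way.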

Conclusion

Reshaping between wide and long formats is a routine task for any data analyst or scientist working with R. Using the pivot_longer and pivot_wider functions from the tidyr package, we can take the landmark data from one row per skull to one row per skull and landmark, with X, Y, and Z in their own columns. Converting the coordinates to numeric values at the same stage avoids type problems in later analysis.

Best Practices

  • Use the names_sep argument in pivot_longer to specify where each column name should be split.
  • Use the values_from argument in pivot_wider to specify which column supplies the values for the new columns.
  • Pay attention to data types, especially if your coordinates are stored as character strings.
  • Use mutate and across from the dplyr package, together with as.numeric, to convert your coordinates to numeric values.

Last modified on 2023-05-20