purrr::iwalk(): A Step-by-Step Guide to Deleting Rows in Lists of Data Frames

Understanding the Problem with purrr::iwalk()

Introduction to purrr and iwalk()

purrr is an R package that provides a functional programming toolkit for working with lists and vectors. It offers functions such as map(), map2(), keep(), discard(), and iwalk(). The last of these calls a function on each element of a list together with its name or position, purely for side effects, and returns its input unchanged.
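As a minimal sketch of how iwalk() behaves, the snippet below prints one line per element and then checks the return value. The list scores and its element names are made up for illustration.

```r
library(purrr)

# Hypothetical named list used only for illustration
scores <- list(a = 1:3, b = 4:6)

# iwalk() passes each element as .x and its name (or position) as .y
result <- iwalk(scores, ~ cat(.y, "has", length(.x), "values\n"))

# iwalk() returns its input invisibly, so nothing has been modified
identical(result, scores)
```

Because the return value is simply the input, iwalk() suits logging, plotting, or writing files rather than producing a modified list.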

In this article, we will explore how to delete rows from a list of data frames and where purrr::iwalk() fits into that task. We’ll go through the process step by step, discussing usage and limitations.

Problem Description

The problem arises when dealing with a list of data frames where you want to remove specific rows based on certain conditions. You might be wondering if there’s an efficient way to do this using purrr::iwalk(). Let’s break down the issue:

  • We have a list of data frames, DGE_tables.
  • Each data frame has columns similar in structure but with some variations.
  • The task is to remove rows that contain specific values in a particular column (column1).

Because purrr::iwalk() is called for its side effects and returns its input unchanged, it cannot produce a filtered list on its own. Copying the desired rows out of each data frame by hand would work, but it is verbose and prone to errors.

Using purrr::keep()

A natural first guess is purrr::keep() and purrr::discard(). These functions filter the elements of a list based on a predicate. However, they work differently than one might expect when dealing with data frames.

  • purrr::keep() retains the list elements for which the predicate is TRUE and drops the rest.
  • purrr::discard() drops the list elements for which the predicate is TRUE.

Because a data frame is itself a list of columns, calling either function on a single data frame selects columns, not rows. In our scenario, removing rows where $column1 equals "to_delete" therefore requires subsetting the rows of each data frame, with purrr::map() applying that operation across the list.
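To make the column-versus-row distinction concrete, here is a small sketch; the data frame df is a made-up example. It shows that keep() and discard() choose among the columns of a data frame and leave its rows alone.

```r
library(purrr)

# Hypothetical data frame used only for illustration
df <- data.frame(
    column1 = c("to_delete", "to_keep"),
    column2 = 1:2,
    stringsAsFactors = FALSE
)

# A data frame is a list of columns, so the predicate is applied per column
numeric_cols <- keep(df, is.numeric)
other_cols <- discard(df, is.numeric)

names(numeric_cols)
names(other_cols)
nrow(numeric_cols)
```

Both results still contain every row of df; only the set of columns differs.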

Applying purrr::keep() and purrr::discard()

For list elements, the choice is straightforward: use purrr::discard() to drop whatever matches a condition (purrr::keep() does the opposite).

# Load necessary library
library(purrr)

# Create a data frame with some values in column1 that need to be removed
dataframe <- data.frame(
    column1 = c("to_delete", "to_keep", "to_delete"),
    column2 = 1:3,
    stringsAsFactors = FALSE
)

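# discard() iterates over the columns of a data frame, so a row-wise
# predicate such as ~ .x$column1 == "to_delete" fails on it.
# To remove rows where column1 equals 'to_delete', subset the rows instead:
dataframe[dataframe$column1 != "to_delete", , drop = FALSE]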

keep() and discard() do behave row-wise when each row is stored as its own list element:

# Load necessary library
library(purrr)

l <- list(
    list(col1 = 'to keep', col2 = 1),
    list(col1 = 'to discard', col2 = 2)
)

purrr::keep(l, ~ .x[['col1']] == 'to keep')
#> [[1]]
#> [[1]]$col1
#> [1] "to keep"
#>
#> [[1]]$col2
#> [1] 1

purrr::discard(l, ~ .x[['col1']] == 'to discard')
#> [[1]]
#> [[1]]$col1
#> [1] "to keep"
#>
#> [[1]]$col2
#> [1] 1
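The same element-wise behavior applies to a list of data frames: each element is a whole table, so the predicate decides whether an entire data frame stays in the list. A rough sketch, using a made-up list named tables:

```r
library(purrr)

# Hypothetical list of data frames; the second one has no rows
tables <- list(
    first = data.frame(column1 = "to_keep", column2 = 1L,
                       stringsAsFactors = FALSE),
    empty = data.frame(column1 = character(0), column2 = integer(0),
                       stringsAsFactors = FALSE)
)

# The predicate sees one complete data frame at a time
non_empty <- discard(tables, ~ nrow(.x) == 0)

names(non_empty)
```

This can be handy for dropping tables that end up empty, but it does not touch the rows inside the tables that remain.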

Example Usage: Removing Rows from a List of Data Frames

Now, let’s see how to remove rows from every data frame in the list.

We want to remove the rows where $column1 equals "to_delete". Row subsetting in R is vectorized, so the operation can be written once and applied to each data frame.

# Load necessary library
library(purrr)

DGE_tables <- list(
    # List of data frames
    data.frame(column1 = c("to_keep", "to_delete", "to_keep"),
              column2 = 1:3, stringsAsFactors = FALSE),
    data.frame(column1 = c("to_delete", "to_delete", "to_keep"),
              column2 = 4:6, stringsAsFactors = FALSE)
)

# Use map() with row subsetting to drop rows where column1 equals 'to_delete'
DGE_tables <- map(DGE_tables, ~ .x[.x$column1 != "to_delete", , drop = FALSE])

# Compare against the expected results; check.attributes = FALSE ignores
# the original row names, which subsetting retains
print(all.equal(DGE_tables[[1]], data.frame(
    column1 = c("to_keep", "to_keep"),
    column2 = c(1L, 3L), stringsAsFactors = FALSE),
    check.attributes = FALSE))
print(all.equal(DGE_tables[[2]], data.frame(
    column1 = "to_keep",
    column2 = 6L, stringsAsFactors = FALSE),
    check.attributes = FALSE))

In this updated code:

  • We define a list of data frames (DGE_tables).
  • We use purrr::map() with row subsetting to remove rows from each data frame where $column1 equals "to_delete".
  • Finally, we check each updated data frame against its expected contents.

By applying row subsetting through purrr::map(), we can filter out unwanted rows from every data frame in the list in a single call.
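One way iwalk() can still be useful in this workflow is for reporting on each table once the rows have been removed. The sketch below assumes a named list; the names t1 and t2 and the variable names are invented for illustration. It filters with map() and row subsetting, then uses iwalk() only to print a summary.

```r
library(purrr)

# Hypothetical named list of data frames mirroring DGE_tables
tables <- list(
    t1 = data.frame(column1 = c("to_keep", "to_delete", "to_keep"),
                    column2 = 1:3, stringsAsFactors = FALSE),
    t2 = data.frame(column1 = c("to_delete", "to_delete", "to_keep"),
                    column2 = 4:6, stringsAsFactors = FALSE)
)

# map() returns the filtered list and preserves the element names
filtered <- map(tables, ~ .x[.x$column1 != "to_delete", , drop = FALSE])

# iwalk() is used purely for the side effect of printing a per-table summary
iwalk(filtered, ~ cat(.y, ":", nrow(.x), "rows kept\n"))
```

Keeping the transformation in map() and the reporting in iwalk() separates the step that changes data from the step that only observes it.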

Conclusion

In this article, we explored how to delete specific rows in a list of data frames based on conditions. We looked at purrr::keep() and purrr::discard(), which filter the elements of a list; applied to a single data frame they act on its columns, so removing rows inside each data frame is done with purrr::map() and ordinary row subsetting.

By learning how to properly apply these functions, you can efficiently manage your data while maintaining a clear understanding of what’s happening.


Last modified on 2025-01-08