Let’s say you have hundreds of columns and you need to create a filter with the same condition like ‘greater than 10’ for all the columns or some of the columns.

You don’t want to define the condition for each of the hundred columns, especially when the condition is exactly the same!

This is where filter_all, filter_at, filter_if commands come in rescue. They all can apply the same condition on multiple columns and filter the data, but in slightly different ways.

Filter Basic

First, let’s make sure we are all on the same page when it comes to filtering the data.

Filtering the data in R and Exploratory is super simple. All you need to do is to use filter command like the below.

In R:

filter(FL_DATE > as.Date("2016-09-15"))

In Exploratory:

If you want to add another filtering condition to keep the rows where FL_DATE is not only greater than 2016-09-15, but also less than 2016-09-20, you can simply add another condition.

In R, you want to use ‘&’ to add an extra condition.

filter(FL_DATE >= as.Date("2016-09-15") & FL_DATE <= as.Date("2016-09-20"))

In Exploratory:

You can simply add another condition either from the column header menu or by clicking the plus button.

By using Pivot table we can see the result more clear.

Use BETWEEN function

By the way, the above two conditions can be consolidated into one function with ‘between’ function from lubridate package.

In R:

filter(between(FL_DATE, as.Date("2016-09-15"), as.Date("2016-09-20")))

In Exploratory:

Apply Filter for Multiple Columns Together

Now, we want to apply this condition not just for FL_DATE but also other columns.

To demonstrate, I have artificially created a new column called ‘FL_DATE_Plus15’, which I just added 15 days with Mutate command like below.

In R:

mutate(FL_DATE_Plus15 = FL_DATE + days(15))

In Exploratory:

And here is the Pivot Table showing the two columns and the number of the rows for each pair.

To apply the same filter condition, you can simply add another condition for FL_DATE_Plus15 with ‘&’ like below.

filter(between(FL_DATE, as.Date("2016-09-15"), as.Date("2016-09-20")) & between(FL_DATE_Plus15, as.Date("2016-09-15"), as.Date("2016-09-20")))

This is fine when you have only two columns to care. But imagine you have dozens or hundreds of such columns. That would be annoying.

Now, here’s the good news.

You can use filter_if or filter_at commands to address this problem.

Applying Filter for Multiple Columns with Column Selection Methods with filter_at

First, let’s say we know the column names for which we want to apply the filter condition. Their names all start with “FL_DATE”. Then you can use filter_at command like the below.

filter_at(vars(starts_with("FL_DATE")), all_vars(between(., as.Date("2016-09-03"), as.Date("2016-09-20"))))

The first argument with ‘vars’ function is to select the columns. Inside ‘vars’, you can use all the functions that are supported by SELECT command such as ‘starts_with’, ‘ends_with’, ‘contains’, etc.

You can also just list up all the column names or set the selection range with the starting column and the ending column here as well.

The second argument is the condition. The condition needs to be inside the all_vars function. This can be any_vars function as well.

all_vars will keep only the rows as long as the values of the columns that are selected in the vars function are all satisfying the condition.

any_vars will keep the rows as long as values of at least one of the columns are satisfying the condition.

By the way, inside the between function, the first argument is a dot (.). This is basically a placeholder for all the columns that are returned by the vars function that selects the columns.

In Exploratory, you can use Custom Command input.

And type the following command.

filter_if(is.Date, all_vars(between(., as.Date("2016-09-15"), as.Date("2016-09-20"))))

When you run it you won’t get any result because the condition is too tight. All the columns, in this case, FL_DATE and FL_DATE_Plus15 need to satisfy the condition, but none of the rows can satisfy it. Hence, no data returned.

If I changed the condition to be a bit loose by between 2016-09-03 and 2016-09-20 like the below,

filter_if(is.Date, all_vars(between(., as.Date("2016-09-03"), as.Date("2016-09-20"))))

we would get some data returned.

This is trying to satisfying the condition with all the columns at the same time.

But maybe sometimes you are ok as long as one of the columns is satisfying the condition. Then you can switch all_vars to any_vars.

filter_if(is.Date, any_vars(between(., as.Date("2016-09-03"), as.Date("2016-09-20"))))

Selecting Columns based on Data Type

You might want to select the columns based on the data type and apply the same filtering condition to them. Then, you want to use filter_if command.

Here is an examples. 

filter_if(is.Date, all_vars(between(., as.Date("2016-09-03"), as.Date("2016-09-20"))))

The first argument is evaluating each column and returns a list of the columns that satisfy this condition. In this case, ‘is.Data’ is the column evaluation function, which evaluates whether each column’s data type is Date or not.

And the second argument is basically the same as we have seen above. It is a condition that evaluates each row value of each column.

With this example, there are only two columns - FL_DATE and FL_DATE_Plus15 - whose data types are Date. So this filter_if command would keep only the rows whose FL_DATE values and FL_DATE_Plus15 values are between 2016-09-03 and 2016-09-20 at a same time.

You can use any_vars function here as well.

filter_if(is.Date, any_vars(between(., as.Date("2016-09-03"), as.Date("2016-09-20"))))

This will keep the rows as long as values of one of FL_DATE and FL_DATE_Plus15 columns are between 2016-09-03 and 2016-09-20 at the same time.

Filter command of dplyr works great in general. But having filter_at, filter_if, and filter_all in your pocket can save you a lot of time especially when you have dozens or hundreds of columns that you want to apply the same condition.