Creating a Recipe Function to Combine Multiple Data Wrangling Steps

You might have a set of Data Wrangling operations that you need to run often with many data sets you import into Exploratory.

For example, you might convert the data type from Character to Numeric or Date, remove empty rows, and so on.

Manually adding these steps from the UI one by one can quickly become cumbersome.

In such cases, you can convert these Data Wrangling steps to an R function, and the next time you import similar data you can simply call the function.

This is how you can do it.

1. Convert Data Wrangling Steps to R Script

Let's say I have these two steps that I often use to convert the data type from Character to Numeric and to remove the outlier values of the Price column.

Select the last of the two steps, then select 'Generate R Script' from the 'more' menu.

It will generate an R script that can be run to produce the same result in a standalone R environment.

Now copy the part of the script that is equivalent to the two steps shown previously.

Next, you're going to create an R function to run these two steps.

2. Create an R Function

Click the plus button next to 'Script' to open a new Script Editor.

And paste the previously copied script.

Now, you want to make it an R function that takes a data frame as input and returns the processed data frame as output.

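convert_and_remove_outliers <- function(original_data){

  new_data <- original_data %>%
    # Convert the price-related columns from Character to Numeric
    mutate_at(vars(weekly_price, monthly_price, security_deposit, cleaning_fee, price), funs(parse_number)) %>%
    # Keep only the rows whose price is not an outlier
    filter(detect_outlier(price, "iqr") == "Normal")

  # Return the processed data frame
  new_data

}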

This function 'convert_and_remove_outliers' takes a data frame, passes it to the 'mutate_at' and 'filter' functions, and finally returns the processed data as 'new_data'.
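To try the same pattern outside Exploratory, here is a minimal sketch on a tiny made-up data frame. It makes a few assumptions: 'detect_outlier_iqr' is a hypothetical stand-in for Exploratory's 'detect_outlier(price, "iqr")' using the common 1.5 x IQR rule (the real function may differ), the sample values are invented for illustration, and 'parse_number' is passed to 'mutate_at' directly instead of through 'funs'.

```r
library(dplyr)
library(readr)

# Hypothetical stand-in for Exploratory's detect_outlier(x, "iqr").
# Labels values more than 1.5 * IQR away from the quartiles.
detect_outlier_iqr <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[[2]] - q[[1]])
  case_when(
    x < q[[1]] - fence ~ "Lower",
    x > q[[2]] + fence ~ "Upper",
    TRUE ~ "Normal"
  )
}

# Same shape as the recipe function above: data frame in, data frame out.
convert_and_remove_outliers <- function(original_data) {
  original_data %>%
    mutate_at(vars(price, cleaning_fee), parse_number) %>%
    filter(detect_outlier_iqr(price) == "Normal")
}

# Made-up sample data with prices stored as Character.
listings <- tibble(
  price = c("$100.00", "$120.00", "$110.00", "$130.00", "$5,000.00"),
  cleaning_fee = c("$20.00", "$25.00", "$15.00", "$30.00", "$40.00")
)

cleaned <- convert_and_remove_outliers(listings)
print(cleaned)
```

Because the function only depends on its input argument, the same definition can be reused on any data frame that has these columns.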

3. Call the Function as Custom R Command Step

Now, it's time to call the function.

Go to the new data frame and select 'Custom R Command' from the plus menu at the top of the Data Wrangling Step pane.

And type 'convert_and_remove_outliers()'.

Now you will see the data being updated by the R function.
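The function is called with empty parentheses because, assuming the step is chained onto the previous one with the '%>%' pipe (as in the generated R script), the current data frame is passed in as the first argument. The sketch below illustrates that equivalence in plain R; 'drop_missing_price' and the sample data are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical recipe function, used only to illustrate the pipe.
drop_missing_price <- function(df) {
  filter(df, !is.na(price))
}

# Made-up sample data.
listings <- tibble(price = c(100, NA, 120))

# Piping the data frame into the function ...
piped <- listings %>% drop_missing_price()

# ... gives the same result as passing it explicitly.
direct <- drop_missing_price(listings)

print(identical(piped, direct))
```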