How to Write Data Wrangling Results Back to Amazon S3 with R and Exploratory

Sometimes you want to write your data wrangling or data analysis results back to cloud storage like Amazon S3. This is relatively simple to do with R in Exploratory.

I'm going to demonstrate this using the R custom script feature in Exploratory, in the three steps below.

Create a Custom R Function

Click the plus button next to Script in the left-hand pane to open the R Script editor.

Enter a name in the Create Script dialog.

Then type in the script below. Make sure to replace each <VARIABLE> placeholder with the appropriate value.

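write_back_to_s3 <- function(df, name) {
  library(aws.s3)
  # Credentials and region read by aws.s3; replace the placeholders.
  Sys.setenv("AWS_ACCESS_KEY_ID" = "<ID>",
             "AWS_SECRET_ACCESS_KEY" = "<KEY>",
             "AWS_DEFAULT_REGION" = "<REGION>")
  # Write the data frame to a temporary CSV file.
  file_name <- stringr::str_c(name, ".csv")
  tmp_path <- file.path(tempdir(), file_name)
  readr::write_delim(df, tmp_path, append = FALSE, col_names = TRUE, delim = ",", na = "")
  # Upload the CSV file to the bucket under the same file name.
  put_object(
    file = tmp_path,
    object = file_name,
    bucket = "<BUCKET>"
  )
  # Return the data frame unchanged so the following steps still receive it.
  df
}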

Credentials

AWS_ACCESS_KEY_ID

This is the access key ID of an AWS user that has permission to write to the bucket.

AWS_SECRET_ACCESS_KEY

This is the secret access key that is paired with the access key ID above.

AWS_DEFAULT_REGION

This is the AWS region where the S3 bucket is located (for example, us-east-1).

To get an access key ID and secret access key, go to the AWS Console.
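However you set these three values, an upload with a missing value fails with an error that can be hard to trace back to the cause. Here is a minimal sketch of an early check. check_aws_credentials is a hypothetical helper name, not part of aws.s3 or Exploratory, and it only looks at the environment variables listed above.

```r
# Hypothetical helper (not part of aws.s3 or Exploratory): stop early with a
# clear message if any credential variable used by write_back_to_s3 is empty.
check_aws_credentials <- function(vars = c("AWS_ACCESS_KEY_ID",
                                           "AWS_SECRET_ACCESS_KEY",
                                           "AWS_DEFAULT_REGION")) {
  # Sys.getenv() returns "" for variables that are not set.
  values <- Sys.getenv(vars)
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing AWS settings: ", paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}
```

You could call check_aws_credentials() right before put_object(...) inside the function, so a missing value is reported by name instead of surfacing as an S3 request error.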

Call the Custom Function as a Custom Command

Now it's time to call the R function we just created. To call it, switch to Command Input mode by selecting 'Custom Command' from the Plus menu.

Then type write_back_to_s3(., '<s3_csv_file_name>') in the input field. The first argument (.) represents the data passed from the previous data wrangling step, and the second argument is the name of the CSV file that will be created in S3. Leave off the .csv extension, because the function adds it.
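Because the function appends ".csv" to the name you pass, typing 'sales_summary.csv' would create an object called sales_summary.csv.csv. If you want the command to accept either form, here is a minimal sketch of a name helper. csv_object_name is a hypothetical name of my own; you would use it wherever the function builds the file name with str_c(name, ".csv").

```r
# Hypothetical helper: build the CSV file name / object key from the name
# passed in, dropping a trailing ".csv" (any case) so it is not added twice.
csv_object_name <- function(name) {
  paste0(sub("\\.csv$", "", name, ignore.case = TRUE), ".csv")
}

csv_object_name("sales_summary")      # returns "sales_summary.csv"
csv_object_name("sales_summary.csv")  # returns "sales_summary.csv"
```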

As mentioned above, this function takes the data frame from the previous step and writes it to S3 as a CSV file. Once that's done, it returns the same data frame unchanged, so any steps after it keep working as before.
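To try this write-then-return pattern without touching S3, you can use a local stand-in with the same (df, name) signature. This is a sketch under that assumption: write_back_local is a hypothetical name, and it uses base R's write.csv in place of readr and put_object.

```r
# Hypothetical local stand-in for write_back_to_s3: same (df, name) signature,
# but it writes the CSV into a local folder instead of uploading it to S3.
write_back_local <- function(df, name, dir = tempdir()) {
  path <- file.path(dir, paste0(name, ".csv"))
  # Header row, comma delimiter, NA written as an empty field.
  utils::write.csv(df, path, row.names = FALSE, na = "")
  # Return the input untouched so the next wrangling step still receives it.
  df
}

result <- write_back_local(head(mtcars), "mtcars_head")
identical(result, head(mtcars))  # TRUE
```

Returning df, rather than the result of the upload call, is what lets the step sit in the middle of a chain without changing the data the later steps see.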

Once you click the green Run button, the data will be uploaded to S3.

How to Stop the Write Back

Now, you probably don't want to write the data back to S3 every time you open this data frame or update the data wrangling steps.

To avoid this, you can temporarily disable the 'write back' step.

Make sure you have selected the Write Back step in the data wrangling steps pane on the right-hand side.

Then click the disable icon on the step.

This temporarily disables the step, so it is skipped whenever the data wrangling steps run, until you enable it again.
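If you would rather control this from the command itself, one option is a small wrapper with an on/off argument. This is a minimal sketch, not an Exploratory feature: maybe_write_back and its enabled argument are hypothetical, and writer is whatever (df, name) function you want to run, such as write_back_to_s3.

```r
# Hypothetical wrapper: only perform the write-back when `enabled` is TRUE.
# `writer` is any function with a (df, name) signature, e.g. write_back_to_s3.
maybe_write_back <- function(df, name, writer, enabled = FALSE) {
  if (isTRUE(enabled)) {
    writer(df, name)
  }
  # The data is passed through unchanged either way.
  df
}
```

As a custom command this would look like maybe_write_back(., 'sales_summary', write_back_to_s3, enabled = TRUE), and you would switch enabled back to FALSE when you no longer want each run to upload.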

This is just one example of writing data wrangling or analysis results back to Amazon S3. By using the R custom function feature, you can do a lot more.

Try it for yourself!

If you don't have Exploratory Desktop yet, you can sign up for free. If you are currently a student or teacher, it's free!

Learn Data Science without Programming

If you are interested in learning powerful Data Science methods, including Machine Learning, Statistics, Data Visualization, and Data Wrangling, without programming, visit our Booster Training home page and enroll today!