Sometimes you want to write the result of your data wrangling or even data analysis back to a database like Amazon Redshift, and this is relatively simple to do in R and in Exploratory.
I'm going to demonstrate this by using the R custom script feature in Exploratory, following the three steps below.
First, let's install the R package called redshiftTools. With this R package, you can upload a data wrangling result to Amazon S3 and then copy the data into Amazon Redshift.
To install the R package, click the down arrow icon next to the project name like below and select the Manage R Packages menu.
Then type redshiftTools in the input field and click the Install button.
Once it's installed, you'll see a success message like the one below.
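As a side note, if you prefer working in the R console, the same package can be installed from there as well; here is a minimal sketch (the GitHub repository location for the development version is my assumption):

# Install redshiftTools from CRAN
install.packages("redshiftTools")
# Or the development version from GitHub (assumed repository location):
# devtools::install_github("sicarul/redshiftTools")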
Click the plus button next to Script in the left-hand side pane to open the R script editor.
Then enter a name in the Create Script dialog.
Then type in the script below. Make sure to replace host, port, dbname, user, and password so that dbConnect can connect to your database. The function then calls rs_replace_table, which uploads the data frame to S3 and copies the data into Amazon Redshift.
write_back_to_redshift <- function(df) {
  library(DBI)        # provides dbConnect() and dbDisconnect()
  library(RPostgres)
  library(redshiftTools)

  # Connect to your Redshift cluster. Replace dbname, host, port,
  # user, and password with your own connection details.
  con <- dbConnect(RPostgres::Postgres(),
                   dbname = "exploratory",
                   host = 'xxxxxxxxxxxxxxxxx.amazonaws.com',
                   port = 'xxxx',
                   user = 'xxxx',
                   password = 'xxxxxxxxx',
                   sslmode = 'require')

  # Upload the data frame to S3, then copy it into the Redshift table,
  # replacing the table's current contents.
  b <- rs_replace_table(df,
                        dbcon = con,
                        table_name = 'write_back_test',
                        bucket = "xxxxxxxxxxxx",
                        region = 'xxxxxxxx',
                        access_key = 'xxxxxxxxxxxxxxx',
                        secret_key = 'xxxxxxxxxxxxxxx',
                        split_files = 4)

  dbDisconnect(con)  # close the connection when done

  # Return the data frame so the following wrangling steps can use it.
  df
}
Here is what the last four arguments mean:

- bucket: the name of the temporary S3 bucket used to stage the data. If not specified, the AWS_BUCKET_NAME environment variable is used.
- region: the region of the S3 bucket. If not specified, the AWS_DEFAULT_REGION environment variable is used.
- access_key: an S3 access key with permissions for the bucket. If not specified, the AWS_ACCESS_KEY_ID environment variable is used.
- secret_key: the secret key with permissions for the bucket. If not specified, the AWS_SECRET_ACCESS_KEY environment variable is used.
To get the access key and secret key, go to the AWS Console.
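If you would rather not hard-code these credentials in the script, you can set the environment variables listed above in your R session and omit the corresponding arguments from rs_replace_table; a minimal sketch with placeholder values:

# Placeholder values; replace with your own bucket, region, and keys
Sys.setenv(
  AWS_BUCKET_NAME = "xxxxxxxxxxxx",
  AWS_DEFAULT_REGION = "xxxxxxxx",
  AWS_ACCESS_KEY_ID = "xxxxxxxxxxxxxxx",
  AWS_SECRET_ACCESS_KEY = "xxxxxxxxxxxxxxx"
)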
Now it's time to call the R function we have just created. To call it, get into the command input mode by selecting 'Custom Command' from the plus menu like below.
Then type write_back_to_redshift() in the input field.
As mentioned above, this function takes the data frame from the previous step, writes it back to the specified database (a Redshift database in this case), and then returns the same data frame.
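By the way, if you want to sanity-check the function outside of Exploratory first, you can call it directly from an R console; a quick sketch using a built-in sample data frame:

# Write a built-in sample data frame to Redshift and confirm that
# the same data frame comes back unchanged
result <- write_back_to_redshift(mtcars)
head(result)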
Once you click the green Run button, the data will be saved to your database.
Now, you probably don't want to write the data back to the database every time you open this data frame or update the data wrangling steps.
For this, you can temporarily disable the 'write back' step.
Make sure you have selected the Write Back step in the data wrangling steps on the right-hand side.
Click the disable icon on the step.
This will temporarily disable the step so that it is skipped when the data wrangling steps run, until you enable it again.
This is just one example of writing a data wrangling or analysis result back to an Amazon Redshift database, but by using the R custom function feature you can do a lot more.
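For example, rs_replace_table replaces the entire contents of the target table, but redshiftTools also provides rs_upsert_table, which updates matching rows and appends the rest. Here is a minimal sketch, reusing the same connection and placeholder settings as the function above, with a hypothetical 'id' key column:

# Upsert instead of a full replace: rows whose keys match existing rows
# are updated, and new rows are appended. 'id' is a hypothetical key column.
b <- rs_upsert_table(df,
                     dbcon = con,
                     table_name = 'write_back_test',
                     keys = c('id'),
                     bucket = "xxxxxxxxxxxx",
                     region = 'xxxxxxxx',
                     access_key = 'xxxxxxxxxxxxxxx',
                     secret_key = 'xxxxxxxxxxxxxxx',
                     split_files = 4)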
If you don't have Exploratory Desktop yet, you can sign up from here for free. And if you are currently a student or teacher, it's free!
If you are interested in learning various powerful Data Science methods ranging from Machine Learning and Statistics to Data Visualization and Data Wrangling, all without programming, visit our Booster Training home page and enroll today!