Cleaning Up All the Column Names at Once

R doesn’t like special characters or spaces in column names. You can still use them as long as you surround the column names with ‘back-ticks’, but this makes column references ugly and can cause unexpected errors down the road.
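
To make the problem concrete, here is a minimal base R sketch. The data frame and its column names are made up for illustration, not taken from the real data set.

```r
# Hypothetical data frame; check.names = FALSE keeps the names exactly as typed.
unicorns <- data.frame(
  `Company` = c("A", "B"),
  `Valuation ($B)` = c(10, 2.5),
  check.names = FALSE
)

# A name with spaces or symbols can only be referenced with back-ticks.
total <- sum(unicorns$`Valuation ($B)`)

# Without check.names = FALSE, data.frame() rewrites such names via make.names(),
# so the name you typed is no longer the name stored in the data frame.
renamed <- data.frame(`Valuation ($B)` = c(10, 2.5))
names(renamed)
```

Either way, you end up with awkward back-tick references or with names that differ from what you typed.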

This is where the ‘clean_names’ function from the ‘janitor’ package comes in handy.

‘janitor’ is an R package built by Sam Firke that provides many convenient functions to make wrangling dirty data more efficient.

How to Use It

Import Unicorn Data by Web-Scraping

Let’s import our usual Unicorn data (startups whose valuations are greater than $1 billion) from the CB Insights website.

You can copy and paste the URL and click the ‘Get Data’ button to scrape the table data from the web page.

Here is how it looks once the data is imported.

Use Clean Names Command

As you can see, some of the column names contain spaces and special characters or symbols such as brackets and dollar signs.

We can run the ‘clean_names’ function by selecting ‘Clean Column Names’ under ‘Others’ from the ‘Data Wrangling’ menu.

Now, you can see below that all the spaces are replaced with ‘_’ and the special characters are simply removed.
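
If you prefer to see the equivalent step as R code, the following is a rough sketch of calling ‘clean_names’ directly. It assumes the ‘janitor’ package is installed. The small data frame stands in for the scraped table, and its column names and values are invented for illustration.

```r
library(janitor)

# Hypothetical stand-in for the imported table, with messy column names.
unicorns <- data.frame(
  `Company` = c("A", "B"),
  `Valuation ($B)` = c(10, 2.5),
  `Date Joined` = c("2019-01-01", "2020-06-15"),
  check.names = FALSE
)

# clean_names() returns the same data with tidied column names.
cleaned <- clean_names(unicorns)
names(cleaned)

# The 'case' argument is one of the other options, e.g. camel case instead
# of the default snake case.
camel <- clean_names(unicorns, case = "lower_camel")
names(camel)
```

The data itself is untouched. Only the column names change, so later steps can refer to columns without back-ticks.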

There are other options to clean up the column names. Take a look at this document page for more details.

Reference: