How to Transform Wide Data to Long Data with Gather Command

When working with data, you often come across a situation where "What kind of operation should this be like?" One of the typical ones is operation by Gather command included in the Tidyr package. This is a very convenient operation to know. I would like to explain with some examples.

When to Use

Let's take a look at the data provided by the World Bank on the trends in population for each country for each year.

In this data, the rows represent each country, the columns represent each year, and the intersection of countries represents the country's population for that year.

You may often see data that has column names as part of the data like this. I often see this type of data especially in Excel format. This type of data is called Wide data. The shape of the Wide data will increase horizontally as the amount of data increases (in this case, the amount of annual data increases).

This form of data is convenient when you just view it, but it is inconvenient when further processing and analyzing the data. For example, consider creating a chart that displays the rate of population change from the previous year for each country like the following.

Loading...

It would be hard to make a chart like this when years are at columns. In this case, the ideal form of the data would be to have the year as part of the data, not the column.

This type of data is called Long data, as opposed to the Wide data. The shape of the Long data grows longer in the vertical direction as the amount of data increases.

How to Use

The "Gather" command is a command to convert Wide data to Long data. (There is actually a command called "Spread" that does the opposite way. Please see here for details.) Here is an example of how to convert the data in the above example using the Gather command.

gather(dataframe, Year, Population, `1959`:`2016`) 
  • 1st argument: data frame name.
  • 2nd argument: Column name that the column name (in this case, 1960, etc.) will be entered as data after conversion
  • 3rd argument: the column name that will contain the data (in this case the population data) after conversion.
  • 4th argument: the range of columns to convert. In this case, we are covering all the columns between 1959 and 2016. It is also possible to specify each column individually, rather than a range.

You don't need to memorize the command detail if you use Exploratory. Let's take a look at how to do it on Exploratory.

Go to the table view and click the leftmost column of the range of columns you want to convert. That column is selected.

Shift-click the rightmost column of the range of columns you want to convert. All columns from the left edge to the right edge are selected.

Select the Gather command from the column header menu and select the Selected Range.

Specify "Year" for the Key Column of the new column name and "Population" for the Value Column.

Click the Run button once you are done. Then you will see the data is converted to Long data.

Summary

I introduced the Gather command to convert Wide data to Long data. I have summarized how the Gather command works in the figure following. Hope this helps you to understand the concept.

There is actually a command called "Spread" that does the opposite way. Please see here for details.


Try it for Yourself!

If you want to quickly try it out but you don’t have an Exploratory account yet, sign up from our website for a 30 days free trial without a credit card!

If you happen to be a current student or teacher at schools, it’s free! Sign up for a Community Plan.

And, if you don’t mind sharing what you create publicly you can sign up for the Public edition of Exploratory (FREE)!