When working with data, you often come across a situation where "What kind of operation should this be like?" One of the typical ones is operation by Gather command included in the Tidyr package. This is a very convenient operation to know. I would like to explain with some examples.
Let's take a look at the data provided by the World Bank on the trends in population for each country for each year.
In this data, the rows represent each country, the columns represent each year, and the intersection of countries represents the country's population for that year.
You may often see data that has column names as part of the data like this. I often see this type of data especially in Excel format. This type of data is called Wide data. The shape of the Wide data will increase horizontally as the amount of data increases (in this case, the amount of annual data increases).
This form of data is convenient when you just view it, but it is inconvenient when further processing and analyzing the data. For example, consider creating a chart that displays the rate of population change from the previous year for each country like the following.
It would be hard to make a chart like this when years are at columns. In this case, the ideal form of the data would be to have the year as part of the data, not the column.
This type of data is called Long data, as opposed to the Wide data. The shape of the Long data grows longer in the vertical direction as the amount of data increases.
The "Gather" command is a command to convert Wide data to Long data. (There is actually a command called "Spread" that does the opposite way. Please see here for details.) Here is an example of how to convert the data in the above example using the Gather command.
gather(dataframe, Year, Population, `1959`:`2016`)
You don't need to memorize the command detail if you use Exploratory. Let's take a look at how to do it on Exploratory.
Go to the table view and click the leftmost column of the range of columns you want to convert. That column is selected.
Shift-click the rightmost column of the range of columns you want to convert. All columns from the left edge to the right edge are selected.
Select the Gather command from the column header menu and select the Selected Range.
Specify "Year" for the Key Column of the new column name and "Population" for the Value Column.
Click the Run button once you are done. Then you will see the data is converted to Long data.
I introduced the Gather command to convert Wide data to Long data. I have summarized how the Gather command works in the figure following. Hope this helps you to understand the concept.
There is actually a command called "Spread" that does the opposite way. Please see here for details.
If you want to quickly try it out but you don’t have an Exploratory account yet, sign up from our website for a 30 days free trial without a credit card!
If you happen to be a current student or teacher at schools, it’s free! Sign up for a Community Plan.
And, if you don’t mind sharing what you create publicly you can sign up for the Public edition of Exploratory (FREE)!