How to perform One-Hot Encoding for Categorical Columns

Sometimes you want to transform a single categorical column into a set of numerical columns. This is useful when you use machine learning algorithms that require all variables to be numerical.

For example, assume your data frame has a Segment column with 3 unique values: “Consumer”, “Corporate”, and “Home Office”.

In this case, we want to create ‘dummy’ columns (e.g., “Segment_Consumer”, “Segment_Corporate”, and “Segment_Home Office”), one for each categorical value of the Segment column. Each dummy column holds 1 if the row's Segment value matches that category and 0 otherwise. This technique is commonly called One-Hot encoding.
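To make the definition concrete, here is a minimal sketch in plain Python of how the dummy columns are derived. The sample values and the helper name `one_hot` are hypothetical, chosen only for illustration; this is not how the tool implements the feature.

```python
# Hypothetical sample values for the Segment column.
segments = ["Consumer", "Corporate", "Home Office", "Consumer"]


def one_hot(values, prefix):
    """Return a dict mapping each dummy column name to a list of 0/1 flags."""
    # One dummy column per unique category, in sorted order.
    categories = sorted(set(values))
    return {
        f"{prefix}_{category}": [1 if value == category else 0 for value in values]
        for category in categories
    }


dummies = one_hot(segments, "Segment")
for name, column in dummies.items():
    print(name, column)
```

Each row has exactly one 1 across the dummy columns, which is where the name "one-hot" comes from.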

To do One-Hot Encoding, open the column header menu for the Segment column and select One Hot Encoding.

This opens a dialog. Click the “Run” button.

This creates 3 dummy columns (i.e., “Segment_Consumer”, “Segment_Corporate”, and “Segment_Home Office”). Each dummy column has 1 if the row's original Segment value matches that column's category and 0 if not.
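If you want to reproduce a similar result in code, the sketch below uses the pandas library, which is an assumption on our part and not part of the steps above. The sample data frame is hypothetical. `pd.get_dummies` builds the column names from the prefix and each category value.

```python
import pandas as pd

# Hypothetical data frame with a Segment column.
df = pd.DataFrame(
    {"Segment": ["Consumer", "Corporate", "Home Office", "Consumer"]}
)

# Create one 0/1 column per unique Segment value, named "Segment_<value>".
dummies = pd.get_dummies(df["Segment"], prefix="Segment", dtype=int)

# Keep the original column alongside the new dummy columns.
result = pd.concat([df, dummies], axis=1)
print(result)
```

Here the original Segment column is kept next to the dummy columns so you can compare them. Drop it before modeling if your algorithm needs numeric inputs only.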