How to perform One-Hot Encoding for Categorical Columns

Sometimes, you want to transform a single categorical column into a set of numerical columns. This is useful when you use machine learning algorithms that require all variables to be numerical.

For example, assume your data frame has a Segment column with 3 unique values: "Consumer", "Corporate", and "Home Office".

In this case, we want to create ‘dummy’ columns (e.g., "Segment_Consumer", "Segment_Corporate", and "Segment_Home Office"), one for each categorical value of the Segment column. Each dummy column holds 1 when the row's Segment value matches that category and 0 otherwise. This technique is commonly called One-Hot encoding.
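To make the idea concrete, here is a minimal Python sketch of the transformation described above. It is only an illustration of the technique, not how the tool implements it. The `one_hot_encode` function, the `orders` sample rows, and the "Order ID" column are hypothetical; only the Segment values and the dummy column names come from the example. The sketch also assumes the original Segment column is dropped after encoding.

```python
def one_hot_encode(rows, column):
    """Return copies of `rows` with `column` replaced by 0/1 dummy columns."""
    # Collect the unique categories in sorted order so the column order is stable.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged (assumption: the original column is dropped).
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            # 1 if this row has the category, 0 otherwise.
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample data; only the Segment values come from the example.
orders = [
    {"Order ID": 1, "Segment": "Consumer"},
    {"Order ID": 2, "Segment": "Corporate"},
    {"Order ID": 3, "Segment": "Home Office"},
    {"Order ID": 4, "Segment": "Consumer"},
]

encoded_orders = one_hot_encode(orders, "Segment")
for row in encoded_orders:
    print(row)
```

Each encoded row has exactly one dummy column set to 1, because every row belongs to exactly one Segment value.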

To apply One-Hot encoding, click the Plus (+) button next to "Steps" and select One Hot Encoding under the Others... menu.

A dialog opens. Select the Segment column, then click the "Run" button.

This creates 3 dummy columns (i.e., "Segment_Consumer", "Segment_Corporate", and "Segment_Home Office"). Each dummy column has 1 if the row originally had that value in the Segment column and 0 if not.