
The stepwise method automatically adds or removes variables one at a time, based on a statistical criterion (primarily AIC), to search for a good combination of variables.
It is useful for deciding which variables (features) to use to improve the accuracy of a predictive model. Exploratory’s standard features do not support the stepwise method, but you can perform stepwise variable selection with the R package MASS in the “Note” feature.
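There are three main approaches to the stepwise method. Forward selection starts with no variables and adds them one at a time. Backward elimination starts with all variables and removes them one at a time. Both directions allows a variable to be added or removed at each step. The R script used in this guide uses the “both” direction.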
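The following sketch shows how the three approaches differ. It runs each of them with stepAIC() on R’s built-in mtcars dataset. mtcars and its column names stand in for your own data here and are not part of this guide’s Employee_Data example.

```r
library(MASS)  # provides stepAIC()

# mtcars is a built-in R dataset, used only as a stand-in for your own data
full.model <- lm(mpg ~ ., data = mtcars)   # all other columns as predictors
null.model <- lm(mpg ~ 1, data = mtcars)   # intercept only
scope <- list(lower = null.model, upper = full.model)

# Forward selection: start from the intercept-only model and only add variables
fwd.model <- stepAIC(null.model, scope = scope, direction = "forward", trace = 0)

# Backward elimination: start from the full model and only remove variables
bwd.model <- stepAIC(full.model, scope = scope, direction = "backward", trace = 0)

# Both: start from the intercept-only model; each step may add or remove a variable
both.model <- stepAIC(null.model, scope = scope, direction = "both", trace = 0)

# The three directions can end at different models
print(formula(fwd.model))
print(formula(bwd.model))
print(formula(both.model))
```

Backward elimination starts from full.model, so it only uses the lower bound of scope. Forward selection and the “both” direction start from null.model and need the upper bound to know which variables they may add.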
First, install the package required to perform the stepwise method.
Select “Manage R Packages” from the project menu.

The R Package Management dialog will appear. Click “Install Packages” and then install the “MASS” package.

Select “Create Note” from the report’s plus button.

Once the Note window opens, click the plus button in the upper left and select “R Code”.

A code block for writing R scripts is added.
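Before running the main script, you can confirm that the MASS installation succeeded with a check like the one below. This sketch uses base R’s requireNamespace(), and the message texts are only illustrative.

```r
# requireNamespace() returns TRUE if MASS can be loaded and FALSE otherwise,
# without stopping the script the way library(MASS) would
mass_available <- requireNamespace("MASS", quietly = TRUE)

if (mass_available) {
  message("MASS ", format(packageVersion("MASS")), " is installed")
} else {
  message("MASS is not installed; install it from Manage R Packages")
}
```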

Enter the following R script in the code block.
library(MASS) # For the stepAIC() function
# Employee_Data is the data frame to analyze; replace it with the name of your own data frame
# Define the full model (all other columns as candidate predictors)
full.model <- lm(salary ~ ., data = Employee_Data)
# Define the null model (intercept only)
null.model <- lm(salary ~ 1, data = Employee_Data)
# Stepwise selection between the null model (lower bound) and the full model (upper bound)
# direction = "both" lets each step add or remove a variable (forward selection combined with backward elimination)
# trace = 0 hides the step-by-step log
step.model <- stepAIC(null.model, scope = list(lower = null.model, upper = full.model), direction = "both", trace = 0)
# Summary of the selected model
summary(step.model)
Running this R script displays a summary of the final selected model (coefficients, p-values, R-squared, etc.).
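You can also inspect the returned model programmatically instead of reading summary(). The sketch below uses the built-in mtcars dataset as an assumed stand-in for your data. It lists the selected variables, prints the step-by-step record that stepAIC() stores in the anova component of its result, and extracts the coefficient table.

```r
library(MASS)  # provides stepAIC()

# mtcars stands in for your own data frame
full.model <- lm(mpg ~ ., data = mtcars)
null.model <- lm(mpg ~ 1, data = mtcars)
step.model <- stepAIC(null.model, scope = list(lower = null.model, upper = full.model),
                      direction = "both", trace = 0)

# Variables (columns) kept in the final model
selected <- attr(terms(step.model), "term.labels")
print(selected)

# One row per step: the variable added (+) or removed (-) and the AIC after that step
print(step.model$anova)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(step.model))
print(coef_table)
```

For a factor column such as job_role, term.labels returns the column name once. The coefficient table, by contrast, has one row per level (for example job_rolemanager in the output later in this guide).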

When applying the sample code above to your own data, change the following parts.
Change: the data parameter of the lm() function
# Example: If the data frame name is "Employee_Data"
full.model <- lm(salary ~ ., data = Employee_Data)
null.model <- lm(salary ~ 1, data = Employee_Data)
# Example: If the data frame name is "Sales_Data"
full.model <- lm(sales ~ ., data = Sales_Data)
null.model <- lm(sales ~ 1, data = Sales_Data)
Change: the left side of the formula in the lm() function (the left side of the tilde ~)
# Example: If the target variable is "salary"
full.model <- lm(salary ~ ., data = Employee_Data)
null.model <- lm(salary ~ 1, data = Employee_Data)
# Example: If the target variable is "sales"
full.model <- lm(sales ~ ., data = Sales_Data)
null.model <- lm(sales ~ 1, data = Sales_Data)
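The formula `~ .` uses every column other than the target as a candidate predictor, so remove columns that should not be predictors (such as IDs) before fitting. stepAIC() also stops with an error if candidate models are fit on different numbers of rows, which happens when some predictors contain missing values. Dropping incomplete rows first avoids this. The sketch below shows both steps on a small hypothetical data frame. my_data, id, ads, and price are made-up names used only for illustration.

```r
library(MASS)  # provides stepAIC()

# Hypothetical data frame standing in for your own data:
# an ID column, a target (sales), two candidate predictors, and one missing value
my_data <- data.frame(
  id    = 1:8,
  sales = c(10, 12, 15, 18, 20, 25, 27, 30),
  ads   = c(1, 2, 3, 4, 5, 6, NA, 8),
  price = c(9, 9, 8, 8, 7, 7, 6, 6)
)

# Drop columns that should not be predictors ("id" here), since "~ ." would use them
model_data <- my_data[, setdiff(names(my_data), "id")]
# Keep complete rows only, so every candidate model is fit on the same rows
model_data <- na.omit(model_data)

full.model <- lm(sales ~ ., data = model_data)
null.model <- lm(sales ~ 1, data = model_data)
step.model <- stepAIC(null.model, scope = list(lower = null.model, upper = full.model),
                      direction = "both", trace = 0)
summary(step.model)
```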
Change: the trace parameter of the stepAIC() function
# Display the selection process in detail
trace = 1
# Do not display the selection process (results only)
trace = 0
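The k argument of stepAIC() controls the criterion itself. It sets the penalty per model degree of freedom. The default k = 2 gives AIC. Setting k = log(n), where n is the number of rows, gives a BIC-style criterion that penalizes extra variables more heavily and usually keeps fewer of them. Here is a sketch on the built-in mtcars dataset, used as an assumed stand-in for your data:

```r
library(MASS)  # provides stepAIC()

# mtcars stands in for your own data frame
full.model <- lm(mpg ~ ., data = mtcars)
null.model <- lm(mpg ~ 1, data = mtcars)
scope <- list(lower = null.model, upper = full.model)

# Default k = 2: the penalty used by AIC
aic.model <- stepAIC(null.model, scope = scope, direction = "both", trace = 0)

# k = log(n): a BIC-style penalty based on the number of rows
bic.model <- stepAIC(null.model, scope = scope, direction = "both", trace = 0,
                     k = log(nrow(mtcars)))

print(formula(aic.model))
print(formula(bic.model))
```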
The summary() output after running the stepwise method includes the following information.
Call:
lm(formula = salary ~ job_level + job_role + years_at_company +
    education_field, data = Employee_Data)

Residuals:
    Min      1Q  Median      3Q     Max
-3842.9  -680.4    -1.4   660.5  4244.5

Coefficients:
                                Estimate Std. Error t value  Pr(>|t|)
(Intercept)                      104.032    313.228   0.332  0.739839
job_level                       2770.404     66.897  41.413   < 2e-16 ***
job_rolehuman_resources         -302.437    219.961  -1.375  0.169356
job_rolelaboratory_technician   -589.197    137.949  -4.271  2.07e-05 ***
job_rolemanager                 4120.067    179.364  22.970   < 2e-16 ***
job_rolemanufacturing_director  -109.317    135.826  -0.805  0.421049
job_roleresearch_director       4027.325    179.575  22.427   < 2e-16 ***
job_roleresearch_scientist      -493.202    137.062  -3.598  0.000331 ***
job_rolesales_executive          -85.282    124.543  -0.685  0.493601
job_rolesales_representative    -663.596    177.870  -3.731  0.000198 ***
years_at_company                  43.609      6.261   6.965  4.96e-12 ***
education_fieldlife_sciences     -41.594    266.362  -0.156  0.875932
education_fieldmarketing          -7.758    284.097  -0.027  0.978218
education_fieldmedical           -47.951    267.372  -0.179  0.857693
education_fieldother            -128.264    288.996  -0.444  0.657235
education_fieldtechnical_degree   43.573    279.769   0.156  0.876256
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1125 on 1454 degrees of freedom
Multiple R-squared: 0.9435, Adjusted R-squared: 0.9429
F-statistic: 1618 on 15 and 1454 DF, p-value: < 2.2e-16
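The Call section
Call: lm(formula = salary ~ job_level + job_role + years_at_company + education_field, data = Employee_Data)
shows the variables in the model chosen by the stepwise method. This is the model with the lowest AIC (Akaike Information Criterion) found during the search. The search adds or removes one variable at a time, so this model is the best one along the search path. It is not necessarily the best of every possible combination of variables.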
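To see the AIC values behind that choice, apply extractAIC() to the null, full, and selected models. extractAIC() is the function stepAIC() uses to compare lm fits. The sketch below uses the built-in mtcars dataset as an assumed stand-in for your data. For lm fits, extractAIC() and AIC() differ by an additive constant, so only differences between models fitted to the same data are meaningful.

```r
library(MASS)  # provides stepAIC()

# mtcars stands in for your own data frame
full.model <- lm(mpg ~ ., data = mtcars)
null.model <- lm(mpg ~ 1, data = mtcars)
step.model <- stepAIC(null.model, scope = list(lower = null.model, upper = full.model),
                      direction = "both", trace = 0)

# extractAIC() returns c(equivalent degrees of freedom, AIC);
# element 2 is the value stepAIC() tries to lower at each step
aic_of <- function(fit) extractAIC(fit)[2]

aic_table <- data.frame(
  model = c("null", "full", "stepwise"),
  aic   = c(aic_of(null.model), aic_of(full.model), aic_of(step.model))
)
print(aic_table)
```

The stepwise value can never be higher than that of the starting null model, because every accepted step lowers AIC. Whether it also beats the full model depends on the data.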
If an error says that the MASS package is not installed, install it from “Manage R Packages” in the project menu as described above.
The stepwise method is a technique that automatically selects the variables to include in a predictive model based on a statistical criterion (AIC). Exploratory’s UI has no dedicated stepwise menu, but you can run the method with the R package MASS in the Note feature.
This lets you select variables automatically within Exploratory.