
ML MODEL FOR FORECASTING SALES OF DIFFERENT STORES OF WALMART USING THE M5 DATASET

This document outlines a process for forecasting sales at Walmart stores using the M5 dataset, employing both ARIMA and LightGBM models. It covers data loading, processing, and model training, along with performance metrics such as Mean Absolute Error (MAE). Accurate demand forecasting matters for inventory management and supply-chain efficiency.

STEP 1: IMPORT NECESSARY LIBRARIES


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

STEP 2: LOAD THE DATASET


df_calendar = pd.read_csv('calendar.csv')
df_price = pd.read_csv('sell_prices.csv')
df_sales = pd.read_csv('sales_train_validation.csv')

STEP 3: SINCE WE ARE FORECASTING SALES, INSPECT THE SALES DATA

df_sales
STEP 4: A SAMPLE ROW IS TAKEN FROM df_sales AND PLOTTED TO CHECK THE FLUCTUATIONS IN SALES OVER TIME.

df_sample = df_sales.iloc[3, :]        # one product/store row
series_sample = df_sample.iloc[6:]     # drop the six id columns, keep the daily sales
df_sample
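This step mentions plotting, but the plotting code is not shown. A minimal sketch (the figure size and labels are assumptions):

# Plot the sampled product's daily sales to inspect fluctuations over time
plt.figure(figsize=(12, 4))
plt.plot(series_sample.values.astype(float))
plt.title(df_sample['id'])
plt.xlabel('Day')
plt.ylabel('Units sold')
plt.show()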
STEP 5: SUMMARIZE THE SALES DATA BY GROUPING IT BY store_id AND CALCULATING THE TOTAL SALES FOR EACH STORE ACROSS ALL PRODUCTS AND TIME PERIODS.

df_sales_total_by_store = df_sales.groupby(['store_id']).sum()
df_store_sales = df_sales_total_by_store.iloc[:, 5:]   # keep only the daily sales columns
df_store_sales
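The later steps operate on a single series named series, which the document never defines explicitly. A plausible reconstruction (an assumption) is the total daily sales aggregated across all stores:

# Assumed definition: one value per day, summed over all stores
series = df_store_sales.sum(axis=0).values.astype(float)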
STEP 6: PROCESS THE DATA AND CHECK WHETHER THE SERIES IS STATIONARY OR NON-STATIONARY

Using the Augmented Dickey-Fuller (ADF) test: ADF statistic = -2.035408, p-value = 0.271267.
If the p-value is < 0.05, the null hypothesis (that the series has a unit root, i.e. is non-stationary) is rejected, and the series is considered stationary. Here the initial p-value is > 0.05, so the original series is non-stationary and must be differenced.
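The ADF figures above presumably come from a call like the following (the import and print formatting are reconstructions, not shown in the original):

from statsmodels.tsa.stattools import adfuller

result = adfuller(series)
print('ADF Statistic: %f' % result[0])   # -2.035408
print('p-value: %f' % result[1])         # 0.271267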
The differencing method is used to convert non-stationary data to stationary data:

def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return np.array(diff)

series_d1 = difference(series)
results = adfuller(series_d1)

STEP 7: THE AUTOCORRELATION FUNCTION (ACF) IS PLOTTED TO CHECK FOR SEASONALITY IN THE TIME SERIES DATA.

from statsmodels.graphics.tsaplots import plot_acf

plot_acf(series, lags=730, use_vlines=True)
plt.show()
1) Weekly differencing: difference(series, interval=7)

series_d7 = difference(series, 7)

2) Monthly differencing: difference(series_d7, interval=30)

series_d7_d30 = difference(series_d7, 30)
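After both differencing passes, one would normally re-run the ADF test to confirm that the p-value now falls below 0.05. A short check (a sketch; the resulting values are not given in the document):

results = adfuller(series_d7_d30)
print('p-value after differencing: %f' % results[1])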
STEP 8: USING THIS PROCESSED STATIONARY DATA, AN ARIMA MODEL IS BUILT (TRADITIONAL METHOD)

from statsmodels.tsa.arima.model import ARIMA

# ARIMA(1, 0, 1): AR order 1, MA order 1, d = 0 because the series was already differenced manually
model = ARIMA(series_d7_d30, order=(1, 0, 1))
model_fit = model.fit()

# Forecast the next 28 time steps


forecast_steps = 28
forecast = model_fit.forecast(steps=forecast_steps)
Final Forecasted Values (Original Scale):
[10433.79676477 8465.26387334 7627.39493797 7497.54887916 7726.54061899 9653.54106222 11096.54103843 16311.33780448 14961.80491298 12439.93597762 11561.08991881 11691.08165864
13309.08210186 15816.08207808 22423.87884412 20931.34595262 16924.47701726 15140.63095845 15169.62269828 16946.62314151 19798.62311772 27861.41988377 26885.88699227
21270.01805691 18934.1719981 18892.16373793 20656.16418115 24186.16415737]

THE FORECASTED VALUES ABOVE WERE SCALED BACK TO THE ORIGINAL SCALE BY AN INVERSE-DIFFERENCE METHOD.
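The inversion code itself is not shown. A minimal sketch of one way to undo the two differencing passes, with an inverse_difference helper reconstructed (as an assumption) from the difference() definition above:

# Hypothetical helper: add back the values that difference() subtracted,
# appending each restored value so it can serve as history for later steps
def inverse_difference(history, forecast, interval=1):
    history = list(history)
    restored = []
    for value in forecast:
        restored_value = value + history[-interval]
        restored.append(restored_value)
        history.append(restored_value)
    return np.array(restored)

# Undo the monthly differencing first (it was applied last), then the weekly one
forecast_d7 = inverse_difference(series_d7, forecast, interval=30)
final_forecast = inverse_difference(series, forecast_d7, interval=7)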
final_forecast = np.array(final_forecast)

from sklearn.metrics import mean_absolute_error


test_data = series[-forecast_steps:] # Last 28 days of the original series
mae = mean_absolute_error(test_data, final_forecast)
print(f"Mean Absolute Error (MAE): {mae}")
Mean Absolute Error (MAE): 1156.18687017154

ONE-STEP FORECASTING METHOD: LightGBM

STEP 1: CREATE X, Y VARIABLES
def create_xy(series, window_size, prediction_horizon, shuffle=False):
    x = []
    y = []
    for i in range(0, len(series)):
        if len(series[(i + window_size):(i + window_size + prediction_horizon)]) < prediction_horizon:
            break
        x.append(series[i:(i + window_size)])
        y.append(series[(i + window_size):(i + window_size + prediction_horizon)])
    x = np.array(x)
    y = np.array(y)
    return x, y

HYPERPARAMETERS

window_size = 365          # one year of daily history
prediction_horizon = 1     # one-step forecasting, hence horizon = 1

TRAIN/TEST SPLIT

test_size = 28
split_time = len(series) - test_size

train_series = series[:split_time]
test_series = series[split_time - window_size:]

train_x, train_y = create_xy(train_series, window_size, prediction_horizon)


test_x, test_y = create_xy(test_series, window_size, prediction_horizon)

train_y = train_y.flatten()
test_y = test_y.flatten()

FIT THE MODEL ON (train_x, train_y) AND VALIDATE ON (test_x, test_y)

import lightgbm as lgb

params = {
    'n_estimators': 2000,
    'max_depth': 4,
    'num_leaves': 2**4,
    'learning_rate': 0.1,
    'boosting_type': 'dart'
}

model = lgb.LGBMRegressor(first_metric_only=True, **params)
model.fit(train_x, train_y,
          eval_metric='l1',
          eval_set=[(test_x, test_y)])
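With the model fitted, the one-step predictions on the held-out window can be scored the same way as the ARIMA forecast. A sketch (the reported One-Step MAE of 200.5037 below presumably comes from a computation like this):

# Score one-step-ahead predictions on the 28-day test window
preds = model.predict(test_x)
print(f"One-Step MAE: {mean_absolute_error(test_y, preds):.4f}")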

MULTI-STEP FORECASTING

In recursive forecasting, we first train a one-step model, then generate a multi-step forecast by recursively feeding our predictions back into the model (see the sketch below).
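A minimal sketch of that recursive loop (the recursive_forecast helper name is illustrative, not from the original):

# Roll the one-step model forward, feeding each prediction back into the window
def recursive_forecast(model, history, window_size, steps):
    history = list(history)
    predictions = []
    for _ in range(steps):
        window = np.array(history[-window_size:]).reshape(1, -1)
        yhat = model.predict(window)[0]
        predictions.append(yhat)
        history.append(yhat)
    return np.array(predictions)

recursive_preds = recursive_forecast(model, train_series, window_size, test_size)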

Recursive MAE:   214.8020
Direct MAE:      233.6326
Combination MAE: 217.0313
One-Step MAE:    200.5037
Multi-Step MAE:  214.8020

plt.rcParams['figure.figsize'] = [5, 5]
lgb.plot_importance(model, max_num_features=15, importance_type='split')
plt.show()

LightGBM provides feature importance scores, which help identify the most influential features in
the model.
CONCLUSION OF THIS MODEL (KEY TAKEAWAYS):
 Share accurate demand forecasts with suppliers and distributors to reduce information asymmetry.
 This helps align orders with actual consumer demand, minimizing demand amplification.
 Use forecasts to adjust inventory levels dynamically, avoiding overstocking and stockouts.
 Use multi-step forecasts to plan production and delivery schedules in advance.
 Shorter lead times reduce the need for large safety stocks, which can amplify the bullwhip effect.
