
Demand Prediction Using Machine Learning Methods and Stacked Generalization

Resul Tugay, Şule Gündüz Ögüdücü

Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey

{tugayr, sgunduz}@itu.edu.tr

Keywords: Stacked Generalization, Random Forest, Demand Prediction.

Abstract: Supply and demand are two fundamental concepts for sellers and customers. Predicting demand accurately is critical for organizations in order to be able to make plans. In this paper, we propose a new approach for demand prediction on an e-commerce web site. The proposed model differs from earlier models in several ways. The e-commerce web site for which the model is implemented operates a marketplace business model, in which many sellers sell the same product at the same time at different prices. Demand prediction for such a model should consider the price of the same product offered by competing sellers, along with the features of these sellers. In this study, we first applied different regression algorithms to a specific set of products from one department of a company that is one of the most popular online e-commerce companies in Turkey. We then used stacked generalization, also known as stacking ensemble learning, to predict demand. Finally, all the approaches were evaluated on a real-world data set obtained from the e-commerce company. The experimental results show that some of the machine learning methods produce results almost as good as the stacked generalization method.

1 INTRODUCTION

Demand forecasting is the concept of predicting the quantity of a product that consumers will purchase during a specific time period. Predicting the right demand for a product is an important problem, in terms of space, time and money, for the sellers. Sellers may have limited time or may need to sell their products as soon as possible due to storage and money restrictions. The demand for a product therefore depends on many factors such as price, popularity, time and space. Forecasting demand becomes harder as the number of factors increases. Demand prediction is also closely related to seller revenue. If sellers store much more product than the demand calls for, this may lead to a surplus (Miller et al., 1988). On the other hand, storing less product in order to save inventory costs when the product has high demand will cause less revenue. Because of these and many more reasons, demand forecasting has become an interesting and important topic for researchers in many areas such as water demand prediction (An et al., 1996), data center applications (Gmach et al., 2007) and energy demand prediction (Srinivasan, 2008).

The rest of the paper is organized as follows: Section 2 discusses related work and Section 3 describes the methodology. In Section 4 we describe our experimental results and data definitions. In Section 5 we conclude this work and discuss future work.

[Figure 1: General System Schema. Schematic: customers order products on the web site; seller-side signals (reviews of product quality, shipping, seller response, etc.) and order data form the training data of the stacked generalization system, which outputs the demand prediction.]

2 RELATED WORK

In the literature, the research studies on demand forecasting can be grouped into three main categories: (1) Statistical Methods; (2) Artificial Intelligence Methods; (3) Hybrid Methods.

Statistical Methods: Linear regression, regression trees, moving averages, weighted averages and Bayesian analysis are just some of the statistical methods used for demand forecasting (Liu et al., 2013). Johnson et al. used regression trees to predict demand due to their simplicity and interpretability (Johnson et al., 2014). They applied demand prediction to data provided by an online retailer named Rue La La. The web site consists of several events from different departments, changing within 1-4 day intervals. Each event has multiple products called "styles" and each product has different items. Items are typical products that have different properties such as size and color. Because the price is set at the style level, they aggregate items at the style level and use different regression models for each department.

Ediger and Akar used Autoregressive Integrated Moving Average (ARIMA) and seasonal ARIMA (SARIMA) methods to predict the future energy demand of Turkey from 2005 to 2020 (Ediger and Akar, 2007). Lim and McAleer also used ARIMA for travel demand forecasting (Lim and McAleer, 2002). Despite the basic implementation and simple interpretation of statistical methods, different approaches such as Artificial Intelligence (AI) and hybrid methods have also been applied to demand forecasting.

AI Methods: AI methods are commonly used in the literature for demand forecasting due to their primary advantage of being efficient and accurate (Chang and Wang, 2006), (Gutierrez et al., 2008), (Zhang et al., 1998), (Yoo and Pimmel, 1999). Frank et al. used Artificial Neural Networks (ANN) to predict women's apparel sales, and the ANN outperformed two statistics-based models (Frank et al., 2003). Sun et al. proposed a novel extreme learning machine, a type of neural network, for sales forecasting; the proposed method outperformed traditional neural networks (Sun et al., 2008).

Hybrid Methods: Another way of forecasting sales or demand is hybrid methods, which utilize more than one method and combine the strengths of these methods. Zhang used an ARIMA and ANN hybrid methodology in time series forecasting and proposed a method that achieved higher accuracy than the methods used separately (Zhang, 2003). In addition to hybrid models, there have been some studies where fuzzy logic is used for demand forecasting (Aburto and Weber, 2007), (Thomassey et al., 2002), (Vroman et al., 1998).

Thus far, many researchers have focused on different statistical, AI and hybrid methods for the forecasting problem. Islek and Gunduz Oguducu proposed a state-of-the-art method based on stacked generalization for the problem of forecasting the demand of warehouses, and their model decreased the error rate (Islek, 2016). In contrast, in this paper we use different statistical methods and compare their results with the stacked generalization method, which uses these methods as sub-level learners.

3 METHODOLOGY

This section describes the methodology and techniques used to solve the demand prediction problem.

3.1 Stacked Generalization

Stacked generalization (SG) is one of the ensemble methods applied in machine learning which uses multiple learning algorithms to improve predictive performance. It is based on training a learning algorithm to combine the predictions of the learning algorithms involved, instead of selecting a single learning algorithm. Although there are many different ways to
implement stacked generalization, its primary implementation consists of two stages. In the first stage, all the learning algorithms are trained using the available data. At this stage, we use linear regression, random forest regression, gradient boosting and decision tree regression as the first-level regressors. In the second stage, a combiner algorithm is used to make a final prediction based on all the predictions of the learning algorithms applied in the first stage. At this stage, we again use the same regression algorithms as in the first stage, in order to specify which model is the best combining regressor for this problem. The general schema of the stacked generalization applied in this study can be seen in Figure 2.

[Figure 2: Stacked Generalization. Schematic: the training data feeds four first-level regressors (random forest, gradient boosted trees, decision tree, linear regression); their four predictions are combined by a meta-regressor that produces the final prediction.]
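To make the two-stage scheme concrete, the following sketch shows one way to implement it with scikit-learn. This is our illustration, not the authors' code: the synthetic data, feature count and hyperparameters are placeholders, and LR is used as the meta-regressor (the choice the experiments later favor).

# Minimal sketch of the two-stage stacked generalization described above.
# Assumes scikit-learn; data and hyperparameters are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 17)), rng.normal(size=500)
X_val, y_val = rng.normal(size=(200, 17)), rng.normal(size=200)

# Stage 1: train the four first-level regressors on the training set.
first_level = [
    LinearRegression(),
    DecisionTreeRegressor(),
    RandomForestRegressor(n_estimators=20),
    GradientBoostingRegressor(),
]
for model in first_level:
    model.fit(X_train, y_train)

# Stage 2: the first-level predictions on held-out data become the input
# features of the meta-regressor, which learns how to combine them.
meta_features = np.column_stack([m.predict(X_val) for m in first_level])
meta = LinearRegression().fit(meta_features, y_val)

def predict(X_new):
    # Final prediction: first-level outputs combined by the meta-regressor.
    return meta.predict(np.column_stack([m.predict(X_new) for m in first_level]))

print(predict(X_val[:3]))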
3.2 Linear Regression

A linear regression (LR) is a statistical method in which a dependent variable γ (the target variable) is computed from p independent variables that are assumed to have an influence on the target variable. Given a data set of n data points, the formula for a regression of one data point γ_i (regressand) is as follows:

γ_i = β_1 x_i1 + ... + β_p x_ip + ε_i,  i = 1, 2, ..., n   (1)

where β_j is the regression coefficient that can be calculated using the Least Squares approach, x_ij (regressor) is the value of the jth independent variable, and ε_i is the error term. The best-fitting straight line for the observed data is calculated by minimizing the loss function, which is the sum of the squares of the differences between the value of the point γ_i and the predicted value γ̂_i (the value on the line), as shown in Equation 2:

MSE = (1/n) Σ_{i=1}^{n} (γ̂_i − γ_i)²   (2)

The best values of the regression coefficients and the error terms can be found by minimizing the loss function in Equation 2. While minimizing the loss function, a penalty term is used to control the complexity of the model. For instance, lasso (least absolute shrinkage and selection operator) uses an L1 norm penalty term and ridge regression uses an L2 norm penalty term, as shown in Equations (3) and (4) respectively. λ is the regularization parameter that prevents overfitting, i.e. controls model complexity. In both Equation (3) and (4), the coefficients (β), dependent variables (γ) and independent variables (X) are represented in matrix form.

β = argmin{ (1/n)(γ − βX)² + λ_1 ||β||_1 }   (3)

β = argmin{ (1/n)(γ − βX)² + λ_2 ||β||_2² }   (4)

In this study, we used the elastic net, which is a combination of the L1 and L2 penalty terms, with a weight of 0.8 on the L1 term, a weight of 0.2 on the L2 term, and regularization parameter λ = 0.3, as shown in Equation (5):

β = argmin{ (1/n)(γ − βX)² + λ(0.8||β||_1 + 0.2||β||_2²) }   (5)
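Equation (5) maps naturally onto scikit-learn's ElasticNet, with alpha playing the role of λ and l1_ratio the 0.8/0.2 split between the penalties. Note that scikit-learn's objective differs from Equation (5) by constant scaling factors on the penalty terms, so this sketch is an approximate reading of the setup, not the authors' code.

# Elastic net roughly as in Equation (5), expressed with scikit-learn.
# alpha ~ the regularization parameter lambda = 0.3; l1_ratio = 0.8 gives the
# 0.8/0.2 split between the L1 and L2 penalties (up to scikit-learn's own
# constant factors, which differ slightly from Equation (5)).
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)

model = ElasticNet(alpha=0.3, l1_ratio=0.8)
model.fit(X, y)
print(model.coef_)  # the fitted coefficients beta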
3.3 Decision Tree Regression

Decision trees (DT) can be used in both classification and regression problems. Quinlan proposed the ID3 algorithm as the first decision tree algorithm (Quinlan, 1986). Decision tree algorithms classify both categorical (classification) and numerical (regression) samples in the form of a tree structure with a root node, internal nodes and leaf nodes. Internal nodes contain one of the possible input variables (features) available at that point in the tree. The input variable is selected using information gain or impurity for classification problems, and standard deviation reduction for regression problems. The leaves represent labels/predictions. Random forest and gradient boosting algorithms are both decision tree based algorithms. In this study, the decision tree method is applied to a regression problem, where variance reduction is employed for the selection of variables in the internal nodes. First, the variance of the root node is calculated using Equation 6; then the variance of the features is calculated using Equation 7 to construct the tree:

σ² = Σ_{i=1}^{n} (x_i − µ)² / n   (6)

In Equation 6, n is the total number of samples and µ is the mean of the samples in the training set. After calculating the variance of the root node, the variance of the input variables is calculated as follows:

σ²_X = Σ_{c∈X} P(c) σ²_c   (7)

In Equation 7, X is the input variable and the c's are the distinct values of this feature. For example, X: Brand and c: Samsung, Apple or Nokia. P(c) is the probability of c occurring in the attribute X, and σ²_c is the variance of the value c. The input variable that has the minimum variance, or equivalently the largest variance reduction, is selected as the best node, as shown in Equation 8:

vr_X = σ² − σ²_X   (8)

Finally, the leaves represent the average values of the instances that they include. This process continues recursively until the variance of the leaves is smaller than a threshold or all input variables are used. Once a tree has been constructed, a new instance is tested by asking questions to the nodes in the tree. When a leaf is reached, the value of that leaf is taken as the prediction.
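The variance-reduction criterion of Equations (6)-(8) is easy to state in code. The sketch below is our illustration; the Brand example values follow the text above, and the demand numbers are made up.

# Variance reduction (Equations 6-8) for one candidate categorical feature.
import numpy as np

def variance_reduction(feature_values, targets):
    """vr_X = sigma^2(node) - sum over c of P(c) * sigma^2(targets where feature == c)."""
    targets = np.asarray(targets, dtype=float)
    node_var = targets.var()                       # Equation (6)
    weighted_var = 0.0
    for c in np.unique(feature_values):
        mask = np.asarray(feature_values) == c
        p_c = mask.mean()                          # P(c)
        weighted_var += p_c * targets[mask].var()  # Equation (7)
    return node_var - weighted_var                 # Equation (8)

# Example with the paper's Brand feature: among candidate features, the one
# with the largest variance reduction is chosen as the node.
brands = ["Samsung", "Apple", "Nokia", "Samsung", "Apple", "Nokia"]
demand = [10, 3, 5, 12, 2, 6]
print(variance_reduction(brands, demand))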
3.4 Random Forest

Random forest (RF) is a type of meta learner that uses a number of decision trees for both classification and regression problems (Breiman, 2001). The features and samples are drawn randomly for every tree in the forest, and these trees are trained independently. Each tree is constructed with the bootstrap sampling method. Bootstrapping relies on sampling with replacement. Given a dataset D with N samples, a training data set of size N is created by sampling from D with replacement. The remaining samples in D that are not in the training set are separated as the test set. This kind of sampling is called bootstrap sampling.

The probability of an example not being chosen in one draw from a dataset that has N samples is:

Pr = 1 − 1/N   (9)

The probability of a sample being in the test set is therefore:

Pr = (1 − 1/N)^N ≈ e^(−1) ≈ 0.368   (10)

Every tree thus has a different training set containing roughly 63.2% of the distinct samples; the samples left out form that tree's test set and are called out-of-bag data.
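A quick numerical check of Equation (10), added here purely as an illustration, shows how fast (1 − 1/N)^N approaches its limit:

# Probability that a fixed sample is left out of a bootstrap sample of size N
# (Equation 10), compared against the exp(-1) limit.
import math

for n in (10, 100, 1000, 100000):
    print(n, (1 - 1 / n) ** n)
print("limit:", math.exp(-1))  # ~0.3679, the expected out-of-bag fraction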
Every tree also uses a different set of features, which are selected randomly. While selecting nodes in the tree, only a subset of the features is considered and the best one from this subset is chosen as the separator node. This process continues recursively until a certain error rate is reached. Each tree is grown independently to reach the specified error rate. For instance, in Figure 3 the stock feature is chosen as the best separator node among the randomly selected features for the first tree, and likewise the price feature is chosen as the second best node. This tree is constructed with two nodes, stock and price, whereas TREE N has four nodes, some of which differ from those of TREE 1.

[Figure 3: Random Forest. Schematic of two example trees: TREE 1 splits on STOCK < 50 and PRICE < 500; TREE N splits on SELLER GRADE > 85, PRICE < 750, STOCKOUT and FASTSHIP; leaf values give the predicted demand.]

Due to the bootstrap sampling method, there is no need to use cross-validation or separate datasets for training and testing; this process is done internally. In this project, the minimum root mean squared error was achieved by using a random forest with 20 trees in the first level.

3.5 Gradient Boosted Trees

Gradient Boosted Trees (GBT) are an ensemble learning method built on decision trees (Friedman, 2001). GBT can be described as the combination of gradient descent and boosting algorithms. Boosting methods aim at improving the performance of a classification task by converting weak learners into strong ones. There are multiple boosting algorithms in the literature (Oza and Russell, 2001), (Grabner and Bischof, 2006), (Grabner et al., 2006), (Tutz and Binder, 2006). Adaboost is the first boosting algorithm, proposed by Freund and Schapire (Freund et al., 1999). It works by weighting each sample in the dataset. Initially all samples are weighted equally, and after each training iteration, misclassified samples are re-weighted more heavily. Boosting algorithms consist of many weak learners and use a weighted summation of them. A weak learner can be defined as a learner that performs better than random guessing, and it is used to compensate for the shortcomings of the existing weak learners. Gradient boosted trees use the gradient descent algorithm to address the shortcomings of weak learners, instead of a re-weighting mechanism. This algorithm minimizes the loss function (also called the error function) by moving in the opposite direction of the gradient, and it finds a local minimum. In the literature, there are several different loss functions, such as the Gaussian
L2, Laplace L1 and Binomial loss functions (Natekin and Knoll, 2013). The squared-error loss function, commonly used in many regression problems, is used in this project.

Let L(y_i, F(x_i)) be the loss function, where y_i is the actual output and F(x_i) is the model we want to fit. Our aim is to minimize the function J = Σ_{i=1}^{N} (y_i − F(x_i))². Using the gradient descent algorithm,

F(x_i) = F(x_i) − α ∂J/∂F(x_i)   (11)

In Equation 11, α is the learning rate, which accelerates or decelerates the learning process. If the learning rate is very large, the optimal or minimal point may be skipped; if the learning rate is too small, more iterations are required to find the minimum value of the loss function. While trees are constructed in parallel, i.e. independently, in random forest ensemble learning, they are constructed sequentially in gradient boosting. Once all trees have been trained, they are combined to give the final output.
to give the final output. regressors using the training set. For SG, we applied
10-fold cross validation on the training set to get the
best model of the first level regressors (except random
4 EXPERIMENTAL RESULTS forest ensemble model). After getting the first level
regressor models, we used the validation set to create
second level of the SG model. The single classifiers
4.1 Dataset Definition are trained on the combined training and test sets. The
results of single classifiers and SG are evaluated us-
The data used in the experiments was provided from
ing the test set in terms of RMSE. This process can be
one of the most popular online e-commerce company
seen in Figure 4.
in Turkey. First, standard preprocessing techniques
are applied. Some of these techniques include filling
in the missing values, removal of missing attributes
4.3 Result and Discussion
when a major portion of the attribute values are miss-
In this section, we evaluate the proposed model using
ing and removal of irrelevant attributes. Each prod-
RMSE evaluation method. After calculating RMSE
uct/good has a timestamp which represents the date it
for single classifiers and SG, we applied analysis of
is sold consisting of year, month, week and day infor-
variances (ANOVA) test. It is generalized version of
mation. A product can be sold several times within
t-test to determine whether there are any statistically
the same day from both same and different sellers.
significant differences between the means of two or
The demands or sales of a product are aggregated
more unrelated groups. We use ANOVA test to show
weekly. While the dataset contains 3575 instances
and 17 attributes, only 1925 instances remained af-
4.2 Evaluation Method

We used the Root Mean Squared Error (RMSE) to evaluate model performance. It is the square root of the mean of the squared differences between actual and predicted values. RMSE is frequently used in regression analysis and can be calculated as shown in Equation 12, where the γ̂_i are the predicted and the γ_i the actual values:

RMSE = sqrt( Σ_{i=1}^{n} (γ̂_i − γ_i)² / n )   (12)

We compared the results of the SG method with the results obtained by the single classifiers. These classifiers include DT, GBT, RF and LR. First, we split the data into training, validation and test sets (50% of the data for training, 20% for validation and the remaining part for testing) and trained the first-level regressors using the training set. For SG, we applied 10-fold cross validation on the training set to get the best model of each first-level regressor (except the random forest ensemble model). After obtaining the first-level regressor models, we used the validation set to create the second level of the SG model. The single classifiers were trained on the combined training and validation sets. The results of the single classifiers and SG were evaluated on the test set in terms of RMSE. This process can be seen in Figure 4.

[Figure 4: Stacking Process. Schematic: the dataset is split into a training set (50%), a validation set (20%) and a test set (30%); the training set feeds the first-level regressors (random forest, gradient boosted trees, decision tree, linear regression) and the validation set feeds the second-level regressor.]
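The protocol above can be sketched as follows. This is our simplified reconstruction: synthetic data stands in for the n11 dataset, and the 10-fold model-selection step is omitted.

# Sketch of the 50/20/30 protocol: first level on the training set, the
# meta-regressor on the validation set, RMSE on the test set (Equation 12).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))  # Equation (12)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1925, 17)), rng.normal(size=1925)

# 50% train; the remaining half is split 40/60 into 20% validation, 30% test.
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, train_size=0.5, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.6, random_state=0)

first_level = [LinearRegression(), RandomForestRegressor(n_estimators=20),
               GradientBoostingRegressor()]
for m in first_level:
    m.fit(X_tr, y_tr)

def stack(X_):
    # First-level predictions become the second-level features.
    return np.column_stack([m.predict(X_) for m in first_level])

meta = LinearRegression().fit(stack(X_val), y_val)
print("SG test RMSE:", rmse(y_te, meta.predict(stack(X_te))))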
4.3 Results and Discussion

In this section, we evaluate the proposed model using the RMSE metric. After calculating the RMSE for the single classifiers and SG, we applied the analysis of variance (ANOVA) test. It is a generalized version of the t-test that determines whether there are any statistically significant differences between the means of two or more unrelated groups.

Table 1: The Results of Regressors at Level 2.

Model   RMSE
DT      2.200
GBT     2.299
RF      2.120
LR      1.910

Table 2: The Best Results of Single Classifiers and SG.

Model    RMSE
DT       1.928
GBT      1.918
RF       1.865
LR       2.708
SG(LR)   1.864

Table 3: SG with Binary Combinations of Regressors.

Model    RMSE
GBT+DT   1.955
GBT+LR   1.957
LR+DT    1.962
DT+RF    1.963
GBT+RF   1.870
LR+RF    1.927

Table 4: SG with Triple Combinations of Regressors.

Model        RMSE
DT+RF+GBT    1.894
RF+DT+LR     1.909
GBT+LR+DT    2.011
LR+RF+GBT    1.864

We use the ANOVA test to show that the predictions of the models are statistically different. The training set is divided randomly into 20 different subsets, so that no subset contains the whole training set. Using each of the different subsets and the validation set, the SG model is trained and evaluated on the test set. In the first level of the SG model, various combinations of the four algorithms are used. We also conducted experiments with different machine learning algorithms in the second level of SG. For the single classifiers, the combination of the training and validation sets is divided randomly into 20 subsets, and the same evaluation process is repeated for these classifiers. We also ran the proposed method with different combinations of the first-level regressors.

Table 1 shows the average of the RMSE results of the SG when using different learning methods in the second level. As can be seen from the table, LR outperforms the other learning methods. For this reason, in the remaining experiments, the results of the SG model are given when using LR in the second level. Table 2 shows the best results of the single classifiers and the SG model obtained from 20 runs. The SG model gives the best result when LR, RF and GBT are used in the first level. Table 3 shows the results of binary combinations of the models. We found the minimum RMSE of 1.870 by using GBT and RF together in the first level. After using binary combinations of the models in the first level, we also created triple combinations of them to specify the best combination. Table 4 shows the results of the triple combinations of the models.

In the ANOVA test, the null hypothesis was rejected at the 5% significance level, which shows that the predictions of RF and LR are significantly better than the others in the first and second level, respectively. After concluding that RF and LR are statistically different from the other regressors at levels 1 and 2 respectively, we applied a t-test with α = 0.05 between RF in the first level and LR in the second level. The result of the t-test showed that LR in the second level is not statistically significantly different from RF.

5 CONCLUSION AND FUTURE WORK

In this paper, we examine the problem of demand forecasting on an e-commerce web site. We proposed a stacked generalization method consisting of sub-level regressors. We have also tested the results of the single classifiers separately, together with the general model. Experiments have shown that our approach predicts demand at least as well as the single classifiers do, and even better while using much less training data (only 20% of the dataset). We think that our approach will predict much better than other single classifiers when more data is used. Because the difference between the proposed model and random forest is not statistically significant, the proposed method can be used to forecast demand due to its accuracy with less data. In the future, we will use the output of this project as part of the price optimization problem which we are planning to work on.

ACKNOWLEDGEMENTS

The data used in this study was provided by n11 (www.n11.com). The authors are also thankful for the assistance rendered by İlker Küçükil in the production of the data set.
REFERENCES

Aburto, L. and Weber, R. (2007). Improved supply chain management based on hybrid demand forecasts. Applied Soft Computing, 7(1):136–144.

An, A., Shan, N., Chan, C., Cercone, N., and Ziarko, W. (1996). Discovering rules for water demand prediction: an enhanced rough-set approach. Engineering Applications of Artificial Intelligence, 9(6):645–653.

Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.

Chang, P.-C. and Wang, Y.-W. (2006). Fuzzy delphi and back-propagation model for sales forecasting in PCB industry. Expert Systems with Applications, 30(4):715–726.

Ediger, V. Ş. and Akar, S. (2007). ARIMA forecasting of primary energy demand by fuel in Turkey. Energy Policy, 35(3):1701–1708.

Frank, C., Garg, A., Sztandera, L., and Raheja, A. (2003). Forecasting women's apparel sales using mathematical modeling. International Journal of Clothing Science and Technology, 15(2):107–125.

Freund, Y., Schapire, R., and Abe, N. (1999). A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771-780):1612.

Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232.

Gmach, D., Rolia, J., Cherkasova, L., and Kemper, A. (2007). Workload analysis and demand prediction of enterprise data center applications. In 2007 IEEE 10th International Symposium on Workload Characterization, pages 171–180. IEEE.

Grabner, H. and Bischof, H. (2006). On-line boosting and vision. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 1, pages 260–267. IEEE.

Grabner, H., Grabner, M., and Bischof, H. (2006). Real-time tracking via on-line boosting. In BMVC, volume 1, page 6.

Gutierrez, R. S., Solis, A. O., and Mukhopadhyay, S. (2008). Lumpy demand forecasting using neural networks. International Journal of Production Economics, 111(2):409–420.

Islek, I. (2016). Using ensembles of classifiers for demand forecasting.

Johnson, K., Lee, B. H. A., and Simchi-Levi, D. (2014). Analytics for an online retailer: Demand forecasting and price optimization. Technical report. Cambridge, MA: MIT.

Lim, C. and McAleer, M. (2002). Time series forecasts of international travel demand for Australia. Tourism Management, 23(4):389–396.

Liu, N., Ren, S., Choi, T.-M., Hui, C.-L., and Ng, S.-F. (2013). Sales forecasting for fashion retailing service industry: a review. Mathematical Problems in Engineering, 2013.

Miller, G. Y., Rosenblatt, J. M., and Hushak, L. J. (1988). The effects of supply shifts on producers' surplus. American Journal of Agricultural Economics, 70(4):886–891.

Natekin, A. and Knoll, A. (2013). Gradient boosting machines, a tutorial. Frontiers in Neurorobotics, 7:21.

Oza, N. C. and Russell, S. (2001). Experimental comparisons of online and batch versions of bagging and boosting. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 359–364. ACM.

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1):81–106.

Srinivasan, D. (2008). Energy demand prediction using GMDH networks. Neurocomputing, 72(1):625–629.

Sun, Z.-L., Choi, T.-M., Au, K.-F., and Yu, Y. (2008). Sales forecasting using extreme learning machine with applications in fashion retailing. Decision Support Systems, 46(1):411–419.

Thomassey, S., Happiette, M., Dewaele, N., and Castelain, J. (2002). A short and mean term forecasting system adapted to textile items' sales. Journal of the Textile Institute, 93(3):95–104.

Tutz, G. and Binder, H. (2006). Generalized additive modeling with implicit variable selection by likelihood-based boosting. Biometrics, 62(4):961–971.

Vroman, P., Happiette, M., and Rabenasolo, B. (1998). Fuzzy adaptation of the Holt–Winter model for textile sales-forecasting. Journal of the Textile Institute, 89(1):78–89.

Yoo, H. and Pimmel, R. L. (1999). Short term load forecasting using a self-supervised adaptive neural network. IEEE Transactions on Power Systems, 14(2):779–784.

Zhang, G., Patuwo, B. E., and Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14(1):35–62.

Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50:159–175.