Accurate forecasting of rainfall has long been one of the most important
issues in hydrological research, because early warnings of severe weather,
made possible by timely and accurate forecasts, can help prevent
casualties and damage caused by natural disasters. The intricacy
of the atmospheric processes that generate rainfall makes physical
models of rainfall overly parameterized. A possible solution
is to build the rainfall forecasting system from the data, so that
the sub-processes of rainfall can be captured more accurately.
In this study we propose a hybrid ensemble modeling framework in
which multiple machine learning models are built on inputs selected
by statistical criteria.
The main objectives of this research are two-fold: first, to investigate
the possibilities and different architectures for integrating
conventional statistical techniques with computationally intelligent
models for the purpose of rainfall forecasting; and second, to test
these forecasting models on different case studies. In our approach we
construct the forecasting model following a "hybrid ensemble modeling"
framework. The term hybrid characterizes the heterogeneity in the design
of the constructed ensemble model. The construction of the ensemble
can be generalized in three steps. In a nutshell, we first select
inputs, then construct the sub-models (component models) on the selected
inputs, and finally select sub-models based on their statistical
relevance to the outputs to construct the ensemble. To maximize the
diversity among the sub-models of the ensemble, we use expert models
from different domains: stepwise linear regression (SLR) and
multivariate adaptive regression splines (MARS) from statistics, and
artificial neural networks (ANN) and support vector regression (SVR)
from machine learning. The main reason for using expert models
from different areas is to capture the complex process of rainfall
as accurately as possible.
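The three steps outlined above can be illustrated with a minimal sketch. It is not the thesis implementation: the three toy forecasters (persistence, lag_mean, overall_mean) merely stand in for SLR, MARS, ANN and SVR, Pearson correlation is assumed as the relevance measure in both the first and third steps, simple averaging is assumed as the combination rule, and every function name, threshold and default below is hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences (0.0 if degenerate)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def select_lags(series, max_lag, threshold):
    """Step 1: keep lags whose correlation with the target passes the threshold."""
    return [lag for lag in range(1, max_lag + 1)
            if abs(pearson(series[:-lag], series[lag:])) >= threshold]

# Step 2: toy one-step-ahead sub-models (assumed stand-ins, not the thesis models).
def persistence(history, lags):
    return history[-1]

def lag_mean(history, lags):
    return mean(history[-lag] for lag in lags)

def overall_mean(history, lags):
    return mean(history)

SUB_MODELS = {"persistence": persistence,
              "lag_mean": lag_mean,
              "overall_mean": overall_mean}

def build_ensemble(series, models, max_lag=3, threshold=0.2, keep=2):
    """Run the three steps and return (lags, kept model names, next-step forecast)."""
    lags = select_lags(series, max_lag, threshold) or [1]
    start = max(len(series) // 2, max(lags))   # hold out the second half for scoring
    target = series[start:]
    scores = {}
    for name, model in models.items():
        preds = [model(series[:t], lags) for t in range(start, len(series))]
        scores[name] = pearson(preds, target)  # relevance to the observed outputs
    # Step 3: keep the most relevant sub-models and average their forecasts.
    kept = sorted(scores, key=scores.get, reverse=True)[:keep]
    forecast = mean(models[name](series, lags) for name in kept)
    return lags, kept, forecast
```

Calling build_ensemble(series, SUB_MODELS) on a list of past rainfall values returns the selected lags, the names of the retained sub-models, and a one-step-ahead forecast. Averaging is only one possible combination rule; it is used here because it keeps the sketch short.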
In the first step of the ensemble, an input selection technique is
used to select appropriate inputs (variables). A statistical criterion,
linear correlation analysis (LCA), and an information-theoretic criterion,
average mutual information (AMI), give similar results in selecting
the inputs. In the second step, the sub-models are trained on the
selected inputs. An adaptive training strategy is used, in which past
outcomes are taken into account when training the models. Finally,
the constructed sub-models are ranked and then selected to construct
the ensemble model. For ranking the sub-models, one statistical
variable selection method, least angle regression (LARS), and one
information-theoretic measure, mutual information (MI), are used.
For faster implementation of the MI-based ranking, a projection
technique, independent component analysis (ICA), is used. The accuracy
of the higher-ranked models is then checked on the basis of the L2 loss
function. In this way the ensemble is constructed from sub-models with
higher accuracy and better conformity with the original outputs.
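The ranking-then-checking idea in the final step can be sketched as follows. This is an illustrative simplification under stated assumptions, not the method of the thesis: mutual information is estimated with a plain equal-width histogram, the LARS ranking and the ICA projection are omitted entirely, and the L2 check is assumed to be a mean-squared-error cutoff. The names mutual_information, l2_loss, rank_and_filter and the parameters bins, top_k and max_loss are all hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(x, y, bins=4):
    """Histogram estimate of I(X;Y) in nats, using equal-width bins."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0        # guard against constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]
    bx, by = discretize(x), discretize(y)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # sum of p(i,j) * log(p(i,j) / (p(i) * p(j))) over the occupied cells
    return sum((c / n) * log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

def l2_loss(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

def rank_and_filter(predictions, observed, top_k, max_loss):
    """Rank sub-models by MI with the observed output, then keep those
    among the top_k whose L2 loss does not exceed max_loss (assumed cutoff)."""
    ranked = sorted(predictions,
                    key=lambda name: mutual_information(predictions[name], observed),
                    reverse=True)
    return [name for name in ranked[:top_k]
            if l2_loss(predictions[name], observed) <= max_loss]
```

Here predictions is a dictionary mapping each sub-model name to its list of predictions on a validation period. The two-stage design matters because mutual information alone cannot distinguish a well-calibrated model from one that tracks the output with a large systematic offset, which is what the L2 check catches.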
The hybridized ensembles are applied to two rainfall series, from Japan
and India. The experimental results show the advantage of hybridizing
the combination scheme of models. This thesis contributes to hydrological
rainfall forecasting, and we hope its findings can be used in building
more effective rainfall and flood forecasting systems.
Keywords: Forecast Combination, Ensemble Approach,
One-step-ahead Forecasting, Extreme Rainfall Event.