2006
A common practice in the pre-processing step of hydrological modeling is to ignore observations with any missing variable values at a given time step, even if only one of the independent variables is missing. These rows of data are labeled incomplete and are not used in model building or in the subsequent testing and verification steps. This is not necessarily the best approach, because information is lost when incomplete rows of data are thrown out. Learning algorithms are affected by such problems more than physically-based models, as they rely heavily on the data to learn the underlying input/output relationships. In this study, the extent of damage to the performance of the learning algorithm due to missing data is explored in a field-scale application. We have tested and compared the performance of two well-known learning algorithms, namely Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), for short-term prediction of groundwater levels in a well...
Water Resources Research, 2007
A common practice in preprocessing of data for use in hydrological modeling is to ignore observations with any missing variable values at any given time step, even if it is only one of the independent variables that is missing. In most cases, these rows of data are labeled incomplete and would not be used in either model building or subsequent model testing and verification. We argue that this is not necessarily an optimal approach for dealing with missing data because significant information could be lost when incomplete rows of data are discarded. Learning algorithms are affected by such problems more than physically based models because they rely heavily on data to learn the underlying input/output relationships of the systems being modeled. In this study, the extent of damage to the performance of learning algorithms due to missing data is explored in a field-scale application. To do so, we employed two well-known learning algorithms, namely artificial neural networks (ANNs) and support vector machines (SVMs), for short-term prediction of groundwater levels at a well field. Performance comparison is made by subjecting these algorithms to various levels of missing data. In addition to understanding the relative strengths of these algorithms in dealing with missing data, an approach for filling the data gaps in the form of an imputation methodology is proposed and tested against observed data. The utility of the current approach is further demonstrated by analyzing model runs obtained with and without imputed data. It is shown that as the percentage of missing data increases, the forecasting accuracy of ANNs is compromised more than that of SVMs. However, ANNs also derive the greater benefit from the use of imputed data. Citation: Gill, M. K., T. Asefa, Y. Kaheil, and M. McKee (2007), Effect of missing data on performance of learning algorithms for hydrologic predictions: Implications to an imputation technique, Water Resour. Res., 43, W07416,
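The trade-off described in this abstract, discarding incomplete rows versus filling the gaps, can be illustrated with a minimal sketch. The records, the function names, and the column-mean fill are all assumptions made for illustration; the column mean is a deliberately simple stand-in and is not the imputation methodology proposed in the paper.

```python
from statistics import mean

# Hypothetical records: (rainfall, temperature, groundwater level).
# None marks a missing value; the numbers are invented for illustration.
rows = [
    (1.0, 20.0, 5.0),
    (None, 21.0, 5.2),
    (0.0, None, 5.1),
    (3.0, 19.0, 5.6),
]

def drop_incomplete(records):
    """Listwise deletion: keep only rows with no missing value."""
    return [r for r in records if all(v is not None for v in r)]

def fill_with_column_mean(records):
    """Replace each gap with the mean of the observed values in its column."""
    columns = list(zip(*records))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [tuple(m if v is None else v for v, m in zip(r, means)) for r in records]

complete = drop_incomplete(rows)
filled = fill_with_column_mean(rows)

# Compare how many rows remain available for training under each strategy.
print(len(rows), len(complete), len(filled))
```

In this toy case, deletion leaves half of the rows for training, while the filled set keeps all of them at the cost of introducing estimated values.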
International Journal of Engineering Research and Technology (IJERT), 2021
https://www.ijert.org/imputing-missing-data-in-hydrology-using-machine-learning-models https://www.ijert.org/research/imputing-missing-data-in-hydrology-using-machine-learning-models-IJERTV10IS010011.pdf Missing data are a common problem confronted by many researchers in the field of hydrology. Rainfall and temperature time series often contain gaps, and such missingness has major implications for hydrological modelling, flood frequency analysis, trend analysis and dam operation schemes. Missing data hinder performance analysis and prevent correct inferences from being drawn from the data. In this study, missing rainfall and temperature data were imputed using a kNN model and a tree-based model, and the imputed data were subsequently used as predictors to predict river flow with an Artificial Neural Network (ANN). Uncertainty in the kNN imputation model was estimated with bootstrapping techniques, while the tree-based and ANN models were assessed by Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
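As a rough sketch of the kNN imputation idea and the two error metrics named in this abstract: the code below fills a gap with the average of the k nearest complete rows and then scores the filled values with RMSE and MAE. The data, the function names, the choice of k, and the use of Euclidean distance over observed columns are assumptions for illustration, not the authors' implementation; the bootstrapping step is not shown.

```python
import math

def knn_impute(records, k=2):
    """Fill each None with the mean of that column over the k nearest complete rows."""
    complete = [r for r in records if None not in r]
    filled = []
    for row in records:
        if None not in row:
            filled.append(tuple(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]
        # Euclidean distance computed only over the columns observed in this row.
        nearest = sorted(
            complete,
            key=lambda c: math.sqrt(sum((c[i] - row[i]) ** 2 for i in observed)),
        )[:k]
        filled.append(tuple(
            sum(n[i] for n in nearest) / len(nearest) if v is None else v
            for i, v in enumerate(row)
        ))
    return filled

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def mae(pred, obs):
    """Mean absolute error between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

# Hypothetical (rainfall, temperature) records; two true values are withheld in `gappy`.
truth = [(2.0, 15.0), (4.0, 17.0), (6.0, 19.0), (8.0, 21.0), (5.5, 18.0), (3.0, 16.0)]
gappy = [(2.0, 15.0), (4.0, 17.0), (6.0, 19.0), (8.0, 21.0), (None, 18.0), (3.0, None)]

imputed = knn_impute(gappy, k=2)
pred = [imputed[4][0], imputed[5][1]]
obs = [truth[4][0], truth[5][1]]
print(rmse(pred, obs), mae(pred, obs))
```

Withholding known values and scoring the imputed ones against them, as done here, is one common way to evaluate an imputation scheme before applying it to real gaps.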
Journal of Hydroinformatics, 2011
Missing values are a common problem faced in the analysis of hydrometric data. The need for complete hydrological data, especially hydrometric data for planning, development and designing hydraulic structures, has become increasingly important. Reasonably estimating these missing values is significant for the complete analysis and modeling of the hydrological cycle. The major objective of this paper is to estimate the missing annual maximum hydrometric data by using artificial neural networks (ANN). Sixteen stations, with 28 years of measurements, in the catchment area of the Sefidroud watershed in the north of Iran were selected for this investigation. Comparison between the results of ANN and the nonlinear regression method (NLR) illustrated the efficiency of artificial neural networks and their ability to rebuild the missing data. According to the coefficient of determination (R²) and the root mean square error (RMSE), it was concluded that ANN provides a better estimation of the missing data.
Water
This paper analyzes the potential of a nu-support vector regression (nu-SVR) model for the reconstruction of missing data of hydrological time series from a sensor network. Sensor networks are currently experiencing rapid growth of applications in experimental research and monitoring and provide an opportunity to study the dynamics of hydrological processes in previously ungauged or remote areas. Due to physical vulnerability or limited maintenance, networks are prone to data outages, which can devaluate the unique data sources. This paper analyzes the potential of a nu-SVR model to simulate water levels in a network of sensors in four nested experimental catchments in a mid-latitude montane environment. The model was applied to a range of typical runoff situations, including a single event storm, multi-peak flood event, snowmelt, rain on snow and a low flow period. The simulations based on daily values proved the high efficiency of the nu-SVR modeling approach to simulate the hydrological processes in a network of monitoring stations. The model proved its ability to reliably reconstruct and simulate typical runoff situations, including complex events, such as rain on snow or flooding from recurrent regional rain. The worst model performance was observed at low flow periods and for single peak flows, especially in the high-altitude catchments.
2021
Computational methods based on machine learning have seen extensive development and application in hydrology, especially for modelling systems that lack sufficient data. Such systems often have data series with missing values, which need not be discarded: imputing the missing values yields complete data sets. This research therefore compares machine-learning techniques to identify those best suited to the hydrographic systems of the Pacific of Ecuador. The study used 22 years (1990 to 2012) of hydro-meteorological records from monitoring stations located in the watersheds of the Esmeraldas, Cañar and Jubones Rivers. The imputed variables were precipitation and flow. Machine-learning estimators from the Python scikit-learn library were used; the library integrates a wide range of learning algorithms, such as Linear Regression and Random Forest. Random Forest yielded the lowest mean square error and was therefore the machine-learning imputation method that best fit the systems and data analyzed.
Frontiers in Water
With the growing use of machine learning (ML) techniques in hydrological applications, there is a need to analyze the robustness, performance, and reliability of predictions made with these ML models. In this paper we analyze the accuracy and variability of groundwater level predictions obtained from a Multilayer Perceptron (MLP) model with optimized hyperparameters for different amounts and types of available training data. The MLP model is trained on point observations of features like groundwater levels, temperature, precipitation, and river flow in various combinations, for different periods and temporal resolutions. We analyze the sensitivity of the MLP predictions at three different test locations in California, United States and derive recommendations for training features to obtain accurate predictions. We show that the use of all available features and data for training the MLP does not necessarily ensure the best predictive performance at all locations. More specifically, river flow and precipitation data are important training features for some, but not all locations. However, we find that predictions made with MLPs that are trained solely on temperature and historical groundwater level measurements as features, without additional hydrological information, are unreliable at all locations.
2023
Over two billion individuals worldwide rely on groundwater as their primary source of clean water. Ensuring the sustainable management of this heavily burdened resource necessitates a comprehensive quantitative evaluation of groundwater reserves. This becomes even more critical as water resources face escalating demands resulting from socioeconomic growth, population expansion, and the impacts of climate change. This research paper undertakes an extensive investigation in the context of a special issue dedicated to the utilization of machine learning (ML) algorithms for modeling and predicting groundwater levels (GWL). It offers a concise overview of prevalent ML techniques, encompassing their general architecture, key hyper-parameters, methods for fine-tuning, and strategies for optimal feature selection. Drawing insights from the scrutiny of 170 research papers across three prominent online databases, our findings indicate that well-constructed machine-learning models exhibit a commendable capacity for accurately modeling and predicting groundwater levels. Our review shows that the use of machine learning to model GWLs is quite common. Typically, past groundwater levels are used as input data, and artificial neural networks (ANN) are a popular choice for this purpose. Our review of existing research provides a useful guide for researchers interested in applying machine learning algorithms for groundwater level modeling and forecasting. We also suggest new methods to improve modeling quality and highlight areas for future research in this field.
Fatma Trabelsi, 2022
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY
2016
Accurate prediction of missing hydro-meteorological data is crucial in planning, design, development and management of water resources systems. In the present research, prediction of such data using Artificial Neural Networks (ANN) based on temporal and spatial auto-correlation has been conducted for the upper Tana River basin in Kenya. Different ANN models were formulated using combinations of numerous data delays in the ANN input layer. The findings show that the best models comprise a feed-forward neural network with a single hidden layer, trained with the Levenberg-Marquardt algorithm. Additionally, the best ANN architecture for predicting missing stream flow data was at gauge station 4CC03, with a correlation coefficient and MSE of 0.732 and 0.242 respectively during validation. Temporal auto-correlation of the observed and the predicted stream flow values was evaluated using a correlation coefficient R, which reached its highest value of 0.756 at gauge station 4AB05. The best ANN mode...
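The "data delays in the ANN input layer" mentioned in this abstract can be pictured as lagged copies of the series used as model inputs. The sketch below builds such input/target pairs; the function name, the flow values, and the choice of lags are hypothetical, and the network and its training are not shown.

```python
def make_lagged_samples(series, lags):
    """Return (inputs, targets) where each input holds series[t - lag] for every lag."""
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return inputs, targets

# Hypothetical stream flow series; the values are invented for illustration.
flow = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0]

# Each input row holds the flow 1, 2 and 3 steps before the target value.
X, y = make_lagged_samples(flow, lags=[1, 2, 3])
for features, target in zip(X, y):
    print(features, target)
```

Trying different `lags` lists is one way to generate the kind of alternative input configurations the abstract refers to.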
Water Resources Research, 2005
Sustainability, 2021
Revista Brasileira de Recursos Hídricos, 2023
Water Resources Management, 2007
Water Resources Research, 2020
Agriculture and Forestry, 2022
Expert Systems with Applications, 2009
Journal of the American Water Resources Association, 2007
Journal of The American Water Resources Association, 1998
Algorithms, 2020