I am using an artificial neural network (ANN) to solve a regression problem, but the ANN performs poorly. I have daily data (2000 to 2018) with 6400 rows and 5 variables (including 2 targets). This is what my data looks like:
   Rain Discharge  Level Discharge0 Level0
1  9.27     0.226 791.12    65.4600   1.08
2 17.62     0.433 791.18   209.6000   1.98
3 10.01     0.791 791.25   129.3000   1.50
4  4.61     0.988 791.28    60.3000   1.07
5 10.37     0.822 791.25    42.7400   0.98
6 46.25     9.564 791.78   173.0624   1.67
I am trying to predict Discharge0 and Level0 one day ahead. This is what I have tried:
library(DataCombine)  # slide()
library(neuralnet)    # neuralnet(), compute()
data <- data.frame(Rain, Discharge, Level, Discharge0, Level0)
data1 <- data
# slide() with a positive slideBy creates a lead (the value one day ahead)
data1 <- slide(data1, Var = "Discharge0", NewVar = "Discharge1", slideBy = 1)
data1 <- slide(data1, Var = "Level0", NewVar = "Level1", slideBy = 1)
# The last row of both lead variables is NA (there is no next day), so remove it
data1 <- na.omit(data1)
data1 <- data1[-c(4, 5)] # remove Discharge0 and Level0
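For comparison, the same one-day lead can be built in base R without DataCombine. This is a minimal sketch on a toy data frame holding the first four rows of the sample above; `make_lead()` is a hypothetical helper, not a function from any package, and the series is assumed to be sorted by date with no missing days.

```r
# Hypothetical helper: shift a vector up by k rows (a lead), padding the end with NA
make_lead <- function(x, k = 1) {
  c(x[-seq_len(k)], rep(NA, k))
}

# Toy data: the first four rows of the sample shown above
data <- data.frame(
  Rain       = c(9.27, 17.62, 10.01, 4.61),
  Discharge  = c(0.226, 0.433, 0.791, 0.988),
  Level      = c(791.12, 791.18, 791.25, 791.28),
  Discharge0 = c(65.46, 209.6, 129.3, 60.3),
  Level0     = c(1.08, 1.98, 1.50, 1.07)
)

data1 <- data
data1$Discharge1 <- make_lead(data1$Discharge0)  # next day's Discharge0
data1$Level1     <- make_lead(data1$Level0)      # next day's Level0
data1 <- na.omit(data1)                          # drop the last row (no next day)
data1 <- data1[, c("Rain", "Discharge", "Level", "Discharge1", "Level1")]
print(data1)
```

Each row now pairs today's Rain, Discharge and Level with tomorrow's Discharge0 and Level0, which is the same layout the `slide()` calls produce.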
I have taken a one-day lead of both Discharge0 and Level0. This is what the final data, on which the model is trained, looks like:
   Rain Discharge  Level Discharge1 Level1
1  9.27     0.226 791.12   209.6000   1.98
2 17.62     0.433 791.18   129.3000   1.50
3 10.01     0.791 791.25    60.3000   1.07
4  4.61     0.988 791.28    42.7400   0.98
5 10.37     0.822 791.25   173.0624   1.67
6 46.25     9.564 791.78   352.5326   2.55
The targets are now Discharge1 and Level1 (the one-day leads of Discharge0 and Level0).
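The hard-coded training index 5844 can also be derived from the calendar. A minimal sketch, assuming the series starts on 1 June 2000 and has exactly one row per day with no gaps (`start_date`, `train_end` and `n_train` are illustrative names, not from the original script):

```r
# Assumption: row 1 of data1 is 1 June 2000 and there is one row per day, no gaps
start_date <- as.Date("2000-06-01")
train_end  <- as.Date("2016-05-31")

# Count the days from start_date to train_end, including both ends
n_train <- as.integer(train_end - start_date) + 1L
index   <- seq_len(n_train)

print(n_train)  # 5844
```

If the real series has missing days, matching on an actual date column (for example `which(dates <= train_end)`) would be safer than counting rows.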
# Rows 1:5844 cover 1 June 2000 to 31 May 2016 (training period)
index <- 1:5844
datatrain = data1[index, ]
datatest = data1[-index, ]
# Min-max scaling of every column to [0, 1]
maxs = apply(data1, 2, max)
mins = apply(data1, 2, min)
scaled = as.data.frame(scale(data1, center = mins, scale = maxs - mins))
# Splitting into training and testing
train = scaled[index , ]
test = scaled[-index , ]
# Fit a feed-forward network with hidden layers of 16, 8 and 4 neurons
NN = neuralnet(Discharge1 + Level1 ~ Rain + Discharge + Level, train, hidden = c(16, 8, 4), threshold = 0.05)
plot(NN)
predict_test = neuralnet::compute(NN, test[, c(1, 2, 3)])
# Unscaling: converting back to original range
predict_test1 = (predict_test$net.result[, 1] * (max(data1$Discharge1) - min(data1$Discharge1))) + min(data1$Discharge1)
predict_test2 = (predict_test$net.result[, 2] * (max(data1$Level1) - min(data1$Level1))) + min(data1$Level1)
# Plot observed vs. predicted values for the test period
plot(datatest$Discharge1, predict_test1, col='blue', pch=16, ylab = "Predicted Discharge", xlab = "Real Discharge")
title("Predicted Discharge v/s Real Discharge"); abline(0, 1, col = "black")
plot(datatest$Level1, predict_test2, col='blue', pch=16, ylab = "Predicted Level", xlab = "Real Level")
title("Predicted Level v/s Real Level"); abline(0, 1, col = "black")
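The scaling and unscaling steps can be wrapped in a pair of helpers that look up each column's range by name, which avoids copy-paste mismatches between the Discharge and Level targets. This is a sketch, not the neuralnet package's own API: `minmax_fit()`, `minmax_scale()` and `minmax_unscale()` are hypothetical helpers, and the toy targets are taken from the sample rows above.

```r
# Hypothetical helpers: record per-column min and max, then scale to [0, 1] and back
minmax_fit <- function(df) {
  list(min = sapply(df, min), max = sapply(df, max))
}

minmax_scale <- function(df, p) {
  rng <- p$max[names(df)] - p$min[names(df)]
  as.data.frame(scale(df, center = p$min[names(df)], scale = rng))
}

# Inverse transform for one column, looked up by name so ranges cannot be mixed up
minmax_unscale <- function(x, p, col) {
  x * (p$max[[col]] - p$min[[col]]) + p$min[[col]]
}

# Toy targets taken from the sample rows above
toy <- data.frame(Discharge1 = c(209.6, 129.3, 60.3),
                  Level1     = c(1.98, 1.50, 1.07))
p      <- minmax_fit(toy)
scaled <- minmax_scale(toy, p)
back   <- minmax_unscale(scaled$Level1, p, "Level1")
print(back)
```

In the script above, `p` would come from `minmax_fit(data1)`, and the network outputs would be unscaled with, for example, `minmax_unscale(predict_test$net.result[, 2], p, "Level1")`.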
The model performs poorly. These are the evaluation metrics:
For Discharge:
R-Squared -> 0.36
NSE -> 0.30
RMSE -> 121.96
For Level:
R-Squared -> 0.39
NSE -> 0.29
RMSE -> 0.43
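For reference, the three metrics can be computed as follows. This is a sketch: the question does not say how R-squared was calculated, so it is assumed here to be the squared Pearson correlation, a common choice in hydrology; `obs` and `sim` are toy vectors, not the question's data.

```r
# RMSE: root mean squared error, in the units of the target
rmse <- function(obs, sim) sqrt(mean((obs - sim)^2))

# NSE: Nash-Sutcliffe efficiency, 1 - SSE / (sum of squares of obs around its mean)
nse <- function(obs, sim) 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)

# R-squared, assumed here to be the squared Pearson correlation
r2 <- function(obs, sim) cor(obs, sim)^2

obs <- c(1, 2, 3, 4)
sim <- c(1.1, 1.9, 3.2, 3.8)
print(c(R2 = r2(obs, sim), NSE = nse(obs, sim), RMSE = rmse(obs, sim)))
```

NSE equals 1 for a perfect fit and 0 when the predictions are no better than the mean of the observations. Because it also penalizes bias and mismatched variance, it is typically at or below the squared correlation, which is consistent with the values reported above.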
I can't figure out what I am doing wrong. Any help would be appreciated.