ANN resulted in less than 0.4 R-squared value. Is my methodology correct?

Question

I am using an ANN to solve a regression problem, but it is performing badly. I have daily data (2000 to 2018) with 6400 rows and 5 variables (including 2 targets). This is what my data looks like:

   Rain Discharge  Level Discharge0 Level0
1  9.27     0.226 791.12    65.4600   1.08
2 17.62     0.433 791.18   209.6000   1.98
3 10.01     0.791 791.25   129.3000   1.50
4  4.61     0.988 791.28    60.3000   1.07
5 10.37     0.822 791.25    42.7400   0.98
6 46.25     9.564 791.78   173.0624   1.67

Here, I am trying to predict Discharge0 and Level0 one day ahead. This is what I have tried:

library(DataCombine)  # provides slide()

data <- data.frame(Rain, Discharge, Level, Discharge0, Level0)
data1 <- data

# slide() with a positive slideBy takes leads (here, one day ahead)

data1 <- slide(data1, Var = "Discharge0", NewVar = "Discharge1", slideBy = 1)
data1 <- slide(data1, Var = "Level0", NewVar = "Level1", slideBy = 1)

# The last value of both new variables is now NA (a one-day lead was taken), so that row is removed
data1 <- na.omit(data1)
data1 <- data1[-c(4, 5)] # removing Discharge0 and Level0

I have taken a one-day lead of both Discharge0 and Level0. This is what the final data looks like, on which the model is applied:

   Rain Discharge  Level Discharge1 Level1
1  9.27     0.226 791.12   209.6000   1.98
2 17.62     0.433 791.18   129.3000   1.50
3 10.01     0.791 791.25    60.3000   1.07
4  4.61     0.988 791.28    42.7400   0.98
5 10.37     0.822 791.25   173.0624   1.67
6 46.25     9.564 791.78   352.5326   2.55

Now the targets are Discharge1 and Level1 (each with a one-day lead).
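
As a sanity check on the lead step, here is a minimal base-R sketch using the first six Discharge0 values from the first table. The helper name lead1 is hypothetical (it is not part of any package used here); it only illustrates what a one-day lead is expected to produce.

```r
# Hypothetical helper: shift a vector one step forward (one-day lead),
# padding the end with NA because the last day has no following day.
lead1 <- function(x) c(x[-1], NA)

# First six Discharge0 values from the sample data.
discharge0 <- c(65.46, 209.6, 129.3, 60.3, 42.74, 173.0624)
discharge1 <- lead1(discharge0)

# Row i of discharge1 now holds the value observed on day i + 1.
print(data.frame(Discharge0 = discharge0, Discharge1 = discharge1))
```

If the lead is correct, each row pairs the predictors of one day with the target observed on the next day, and only the final row is lost to NA.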

# Rows 1:5844 cover 1 June 2000 to 31 May 2016

index <- 1:5844
datatrain = data1[index, ]
datatest = data1[-index, ]

# Scaling data between 0 and 1.

max = apply(data1 , 2 , max)
min = apply(data1, 2 , min)
scaled = as.data.frame(scale(data1, center = min, scale = max - min))

# Splitting into training and testing

train = scaled[index , ]
test = scaled[-index , ]

# Applying neuralnet function

library(neuralnet)
NN = neuralnet(Discharge1 + Level1 ~ Rain + Discharge + Level, train, hidden = c(16, 8, 4), threshold = 0.05)
plot(NN)

predict_test = neuralnet::compute(NN, test[, c(1, 2, 3)])
# Unscaling: converting back to original range

predict_test1 = (predict_test$net.result[, 1] * (max(data1$Discharge1) - min(data1$Discharge1))) + min(data1$Discharge1)
predict_test2 = (predict_test$net.result[, 2] * (max(data1$Level1) - min(data1$Level1))) + min(data1$Level1)
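
The scaling and unscaling steps are meant to be exact inverses of each other. A minimal sketch of that round trip, assuming min-max scaling to [0, 1] as above; the helper names rescale01 and unscale01 are hypothetical, and the values are the six Level0 values from the sample data, used for illustration only.

```r
# Hypothetical helpers for min-max scaling to [0, 1] and back.
rescale01 <- function(x, lo, hi) (x - lo) / (hi - lo)
unscale01 <- function(z, lo, hi) z * (hi - lo) + lo

level <- c(1.08, 1.98, 1.50, 1.07, 0.98, 1.67)  # illustrative values
lo <- min(level)
hi <- max(level)

scaled_level <- rescale01(level, lo, hi)
restored <- unscale01(scaled_level, lo, hi)

# The same lo/hi of the same column must be used in both directions.
stopifnot(isTRUE(all.equal(restored, level)))
print(range(scaled_level))
```

The check only passes when the minimum and maximum used for unscaling belong to the same column that was scaled.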

# Plotting actual observations vs. predictions

plot(datatest$Discharge1, predict_test1, col='blue', pch=16, ylab = "Predicted Discharge", xlab = "Real Discharge")
title("Predicted Discharge v/s Real Discharge"); abline(0, 1, col = "black")

plot(datatest$Level1, predict_test2, col='blue', pch=16, ylab = "Predicted Level", xlab = "Real Level")
title("Predicted Level v/s Real Level"); abline(0, 1, col = "black")

The model performs badly. Here are the evaluation metrics:

For Discharge:

R-Squared -> 0.36
NSE       -> 0.30
RMSE      -> 121.96

For Level:

R-Squared -> 0.39
NSE       -> 0.29
RMSE      -> 0.43
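
For reference, these three metrics can be computed in base R roughly as follows. This is a sketch under assumptions: R-squared is taken as the squared Pearson correlation between observed and predicted values, and NSE (Nash-Sutcliffe efficiency) as one minus the ratio of the squared-error sum to the squared deviations of the observations from their mean. The function name eval_metrics and the toy vectors are hypothetical, and this is not necessarily how the numbers above were obtained.

```r
# Hypothetical helper: obs = observed values, pred = model predictions.
eval_metrics <- function(obs, pred) {
  sse <- sum((obs - pred)^2)          # sum of squared errors
  sst <- sum((obs - mean(obs))^2)     # total sum of squares of the observations
  c(
    R2   = cor(obs, pred)^2,          # squared Pearson correlation
    NSE  = 1 - sse / sst,             # Nash-Sutcliffe efficiency
    RMSE = sqrt(mean((obs - pred)^2)) # root mean squared error
  )
}

# Toy vectors, for illustration only.
obs  <- c(1, 2, 3, 4, 5)
pred <- c(1.5, 1.5, 3.5, 3.5, 5.5)
metrics <- eval_metrics(obs, pred)
print(metrics)
```

Under these definitions R-squared is never smaller than NSE, because the correlation ignores any bias or mis-scaling in the predictions while NSE penalizes it; a gap between the two, as in the results above, is consistent with that.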

I cannot work out what I am doing wrong. It would be great if someone could help me.

@Sycorax I want to ask that is my methodology correct? Please have a look at it. Thank you. — Amish Sharma, Jun 27 '20 at 16:47
It's impossible to say. You haven't described your problem in specific terms which would allow anyone to know what methodology would be appropriate or inappropriate. All I can say for sure is that you want a neural network to do better, so the duplicate seems like a good place to start. Perhaps you could edit your question to add more detail about what you're trying to do and why you believe your methodology might be incorrect for that purpose. See also https://stats.stackexchange.com/help/how-to-ask — Sycorax, Jun 27 '20 at 17:37
