I am trying to predict some fluid parameters. The data I use (24 inputs and 3 outputs to predict) is in the Drive link: DATA. First of all I replaced the null values in the data with the median, and as you can see I tried the code below, but it gives me null results.

# Replace the null values (stored as 0) in the numeric columns with the column median
name = input_train.select_dtypes(include=["int64", "float64"]).columns

# Each set uses its own medians, as before
for df in (input_train, input_test):
    for col in name:
        df[col] = df[col].replace(0, df[col].median())

# Create a PCA that will retain 95% of the variance
pca = decomposition.PCA(n_components=0.95, whiten=True)
# Conduct PCA
input_train = pca.fit_transform(input_train)
input_test = pca.fit_transform(input_test)

from sklearn.preprocessing import MinMaxScaler
X_scaler = MinMaxScaler()
Y_scaler = MinMaxScaler()
input_train = X_scaler.fit_transform(input_train)
output_train = Y_scaler.fit_transform(output_train)
input_test = X_scaler.fit_transform(input_test)
output_test = Y_scaler.fit_transform(output_test)

def create_model():
    model = Sequential()
    # Adding the input layer
    model.add(Dense(6, activation='relu', input_shape=(n_cols,)))
    # Adding the hidden layer
    model.add(Dense(6, activation='relu'))
    model.add(Dense(6, activation='relu'))  
    model.add(Dense(1, activation='relu'))
    # Compiling the RNN
    model.compile(optimizer='adam', loss='mean_absolute_percentage_error')
    return model

kf = KFold(n_splits = 10, shuffle = True)
Elasticity = create_model()
scores = []

# K=1
result = next(kf.split(input_train), None)
X_train = input_train[result[0]]
X_test = input_train[result[1]]
Y_train = output_train[result[0]]
Y_test = output_train[result[1]]
# Fitting the RNN to the Training set
Elasticity.fit(X_train, Y_train, epochs=300 ,batch_size=180 ,verbose=2)
predictions = Elasticity.predict(X_test) 
scores.append(Elasticity.evaluate(X_test, Y_test))
print(scores)
>>>54/54 [==============================] - 0s 628us/step
[100.0]

# Visualising Result
plt.figure()
plt.plot(predictions, color='blue', label='Predicted results')
plt.plot(Y_test, color='red', label='Real results')
plt.title('Visualisation')
plt.xlabel('Batch')
plt.ylabel('Elasticity')
plt.legend()
plt.show()

[Figure: predicted results (blue) vs. real results (red)]

SB_help08
  • We don't have your data. Then again, your model may correctly be outputting the functional of the density that minimizes the MAPE in expectation: [What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?](https://stats.stackexchange.com/q/299712/1352) – Stephan Kolassa May 06 '19 at 13:10
  • You can find the data in the link I added. Can you please tell me what I should do? – SB_help08 May 06 '19 at 14:15

1 Answer

A couple of things:

1) Neither the PCA nor the MinMaxScaler should be fit separately on the train and the test data. Fit them on the training data only, then use the fitted transformers to transform both the training and the test data. Otherwise the two sets go through different transformations; with n_components=0.95, the two PCA fits can even keep a different number of components.

2) From the code comments you seem to want an RNN (Recurrent Neural Network), but a Sequential model built only from Dense layers is a feedforward network with densely connected layers, not a recurrent one.
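
Point 1 could look roughly like the following. This is only a minimal sketch on synthetic stand-in data: the random arrays and the `_t` variable names are my own assumptions, not your dataset or your exact pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 24 input features, 1 output column.
rng = np.random.default_rng(0)
input_train = rng.normal(size=(200, 24))
input_test = rng.normal(size=(50, 24))
output_train = rng.uniform(1.0, 5.0, size=(200, 1))
output_test = rng.uniform(1.0, 5.0, size=(50, 1))

pca = PCA(n_components=0.95, whiten=True)
X_scaler = MinMaxScaler()
Y_scaler = MinMaxScaler()

# Fit every transformer on the training data only.
input_train_t = X_scaler.fit_transform(pca.fit_transform(input_train))
output_train_t = Y_scaler.fit_transform(output_train)

# Reuse the fitted transformers on the test data: transform, not fit_transform.
input_test_t = X_scaler.transform(pca.transform(input_test))
output_test_t = Y_scaler.transform(output_test)

# Both sets now have the same number of PCA components.
print(input_train_t.shape, input_test_t.shape)
```

Note that the transformed test values can fall slightly outside [0, 1], because the scaler only knows the minimum and maximum of the training data. That is expected.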

I suppose your network architecture was unable to learn much from the data you fitted it on (judging by the high MAPE score), but it's hard to know for sure without being able to reproduce it.
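
One thing worth checking: a score of exactly 100 is what MAPE gives when every prediction is zero, because each term becomes |y - 0| / |y| = 1. That could happen here if the final `relu` unit outputs zero for every input, but this is a guess I can't confirm without the data. A small illustration with made-up numbers (the `mape` helper is hypothetical and assumes strictly positive targets):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    terms = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(terms) / len(terms)

y_true = [0.2, 0.5, 0.9]    # made-up positive targets
all_zero = [0.0, 0.0, 0.0]  # what an output unit stuck at zero would predict

print(mape(y_true, all_zero))
```

If `predictions` in your code is all zeros (or very close to it), that would explain the score, and the model is not really fitting anything.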

Jesús Ros