I am experimenting with a neural network model I found on Kaggle for the Titanic dataset, where the task is to predict whether a passenger survived. The input I am providing looks like this:
Pclass Age Fare Sex_female Sex_male
789 1 46.000000 79.2000 0 1
543 2 32.000000 26.0000 0 1
109 3 29.699118 24.1500 1 0
111 3 14.500000 14.4542 1 0
726 2 30.000000 21.0000 1 0
... ... ... ... ... ...
559 3 36.000000 17.4000 1 0
648 3 29.699118 7.5500 0 1
556 1 48.000000 39.6000 1 0
302 3 19.000000 0.0000 0 1
786 3 18.000000 7.4958 1 0
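For context, the columns above came from standard preprocessing of the raw Titanic CSV; roughly this (a sketch, assuming pandas — the column names are taken from the table, the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical raw rows; the real data comes from the Kaggle Titanic CSV.
df = pd.DataFrame({
    "Pclass": [1, 3, 2],
    "Sex": ["male", "female", "female"],
    "Age": [46.0, None, 30.0],
    "Fare": [79.20, 24.15, 21.00],
})

df["Age"] = df["Age"].fillna(df["Age"].mean())  # impute missing ages with the mean
df = pd.get_dummies(df, columns=["Sex"])        # one-hot encode -> Sex_female, Sex_male

X = df[["Pclass", "Age", "Fare", "Sex_female", "Sex_male"]]
print(X)
```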
This is the model I found (I changed input_shape, but everything else is the same):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization

model = Sequential()
model.add(Dense(units=32, input_shape=(5,), activation='relu'))
model.add(Dense(units=64, activation='relu', kernel_initializer='he_normal', use_bias=False))
model.add(BatchNormalization())
model.add(Dense(units=128, activation='relu', kernel_initializer='he_normal', use_bias=False))
model.add(Dropout(0.1))
model.add(Dense(units=64, activation='relu', kernel_initializer='he_normal', use_bias=False))
model.add(Dropout(0.1))
model.add(Dense(units=32, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=8, activation='relu', kernel_initializer='he_normal', use_bias=False))
model.add(Dense(units=1, activation='sigmoid'))
I compiled it with the following (note that the `lr` argument is deprecated in recent TensorFlow versions in favor of `learning_rate`):

from tensorflow.keras.optimizers import SGD

model.compile(optimizer=SGD(learning_rate=0.1),
              loss='binary_crossentropy',
              metrics=['binary_accuracy'])
My doubt is this: when I replace the sigmoid with softmax, accuracy drops by about 30%. As far as my understanding goes, isn't softmax the better fit when there are two mutually exclusive classes, such as survived or not in this case, where each row falls into exactly one class and never both?