I am new to the field of machine learning, so this question may sound silly. We usually use $sigmoid$ in the output layer for binary classification. In my experiments, I found that replacing $sigmoid$ with $tanh$ in the output layer gives higher accuracy and lower binary cross-entropy loss. Can someone please explain the possible reason? I am using labels $0$ and $1$.
The code is shown below. I am using Keras with the TensorFlow backend.
from keras.layers import Input, Dense, Activation, Dropout, Dot
from keras.models import Sequential, Model

input_shape = (200,)
left_input = Input(input_shape)
right_input = Input(input_shape)

# Shared encoder applied to both inputs
model = Sequential()
model.add(Dense(200, input_dim=200, kernel_initializer='glorot_uniform', bias_initializer='zeros'))
model.add(Activation('tanh'))
model.add(Dropout(0.1))
model.add(Dense(200, kernel_initializer='glorot_uniform', bias_initializer='zeros'))
model.add(Activation('tanh'))
model.add(Dropout(0.1))

x1 = model(left_input)
x2 = model(right_input)

# Cosine similarity between the two branch outputs
dotted = Dot(axes=1, normalize=True)([x1, x2])
out = Dense(1, activation='sigmoid', kernel_initializer='glorot_uniform', bias_initializer='zeros')(dotted)

siamese = Model(inputs=[left_input, right_input], outputs=out)
siamese.compile(loss='binary_crossentropy', optimizer='Adagrad', metrics=['accuracy'])
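For context on the swap I am asking about: the two activations differ only in their output range, and $tanh$ is an affine rescaling of $sigmoid$, i.e. $tanh(z) = 2\,sigmoid(2z) - 1$. A quick NumPy check (independent of the model above) illustrates this:

```python
import numpy as np

def sigmoid(z):
    # Standard logistic function, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # values in (0, 1); sigmoid(0) = 0.5
print(np.tanh(z))   # values in (-1, 1); tanh(0) = 0.0

# tanh is a shifted, rescaled sigmoid: tanh(z) = 2 * sigmoid(2z) - 1
print(np.allclose(np.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0))
```

So with labels $0$ and $1$, a $tanh$ output can also produce negative values, which is part of why the accuracy difference surprises me.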