I am trying to create a binary classifier that will assess whether a photograph is aesthetically pleasing or not. However, despite various experiments, I am struggling to build anything that even reaches 50% accuracy.
My dataset is downloaded from Flickr; I am using 240x180 color images. For classification, I use a score defined as follows:

```
score = log((stars + 1) / (views + 1), 2)
```

where `stars` is the number of favorites a given photo has received and `views` is the number of times it has been viewed. That gives me a more-or-less normal distribution of the score (see the bottom-right graph of the attached figure).
The idea behind this is to get a metric for roughly what fraction of viewers liked the photo. I then label an image 'good' if its score is above the mean of this metric across the whole dataset, and 'bad' otherwise.
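For clarity, here is a minimal sketch of the scoring and labeling step (the `photos` list and its values are illustrative stand-ins, not my real metadata pipeline):

```python
import math

def score(stars, views):
    # Fraction of viewers who faved the photo, in log2 space;
    # the +1 terms avoid division by zero and log(0).
    return math.log2((stars + 1) / (views + 1))

photos = [(12, 3400), (3, 150), (250, 9000)]  # (stars, views) pairs
scores = [score(s, v) for s, v in photos]
mean_score = sum(scores) / len(scores)

# Label 1 ('good') if the score is above the dataset mean, else 0 ('bad').
labels = [1 if sc > mean_score else 0 for sc in scores]
```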
For learning the classifier, I am using my own TensorFlow implementation of a DCNN. The network architecture is based on configuration A from this paper, which has been shown to work on a similar problem. I am using an MSE loss function and the Adam optimizer. So far I have downloaded over 1 million photos from Flickr. After selecting images with the proper resolution and above a 100-view threshold, I am able to train my classifier on a training set of ~60,000 images.
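For reference, here is a heavily condensed sketch of the training setup, rewritten in tf.keras form for brevity (my actual code is a raw TensorFlow implementation, and the layers shown are illustrative placeholders, not the full configuration-A network):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, activation='relu',
                           input_shape=(180, 240, 3)),  # 240x180 RGB
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    # One sigmoid neuron: the probability that the photo is 'good'.
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='mse',  # MSE on the sigmoid output, as described above
              metrics=[tf.keras.metrics.BinaryAccuracy()])
```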
However, my classifier achieves very bad results. The model is not converging: accuracy on both the training and cross-validation sets oscillates around 55% or 45% (explained below), and the error is not decreasing.
While debugging the code I noticed that on the first iteration my model produces reasonable predictions for the given examples, like 0.9 (the output layer is a single sigmoid neuron that is supposed to give the probability that a photo is aesthetically pleasing). However, after only one iteration, my classifier constantly predicts ones or zeroes for every example. Because my dataset is not split perfectly evenly between the two classes, this results in accuracy like 55% or 45%, depending on which value the output layer gets stuck on.
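This is the check that revealed the collapse; a minimal sketch of it, reusing the `model` from the sketch above and dummy random batches in place of my real input pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
batches = [(rng.random((32, 180, 240, 3)).astype('float32'),
            rng.integers(0, 2, size=(32, 1)).astype('float32'))
           for _ in range(5)]

for step, (x_batch, y_batch) in enumerate(batches):
    model.train_on_batch(x_batch, y_batch)
    preds = model.predict_on_batch(x_batch)
    # Print the spread of the batch predictions after each step;
    # on my data, from step 1 on every value is pinned near 0.0 or 1.0.
    print(step, float(preds.min()), float(preds.mean()), float(preds.max()))
```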
Things I tried to improve the classifier:
- changing GradientDescent optimizer to AdamOptimizer,
- training the model on a bigger training set (I started with 12k images; I now have 60k and am constantly downloading more),
- implementing weight decay (see the sketch after this list),
- trying two different DCNN architectures,
- changing the loss function to cross entropy or mean squared error (also sketched below).
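For reference, minimal sketches of two of those variations, again in tf.keras form (the regularization strength is an illustrative value, and `model` refers to the sketch above):

```python
import tensorflow as tf

# Weight decay, implemented as L2 regularization on a layer's kernel.
dense = tf.keras.layers.Dense(
    256, activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(1e-4))

# Swapping the loss between cross entropy and mean squared error.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='binary_crossentropy',  # or loss='mse'
              metrics=[tf.keras.metrics.BinaryAccuracy()])
```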
Nothing improved the quality of the predictions. From the very beginning I expected quite poor performance due to the nature of the problem, but it seems that something is completely wrong. I think I am missing something obvious, despite investigating the problem for the last two weeks. If anyone could provide some guidance on where to look for flaws, I would be extremely grateful. The code can be viewed here.