Is it scientifically correct to label data using a model built on golden data?

Question

I was looking for a labeled dataset of users' profile pictures paired with their personality trait scores. Unfortunately, I did not find one, so I decided to crawl Twitter for public users' profile pictures along with their tweets. At that point, I already had very good personality models that had been trained on golden data from 100k Facebook users who reported their personality by answering a 100-item personality questionnaire. The final models are accurate and were published in A* IEEE proceedings.

I then fed the crawled Twitter users' tweets into these existing models to predict their personality, which lets me label each profile picture with the predicted personality scores.

Next, I extracted 50 facial features from each crawled Twitter profile picture. I am correlating these features with the predicted personality scores using Pearson correlation, so that I can use the significantly correlated features to build another set of personality prediction models that rely solely on the user's profile picture.

So far, I have discussed this with many experienced researchers; some agree with what I am doing and some disagree. Those who agree say that as long as I find facial features that correlate with personality scores, it should be scientifically correct. So I am kindly asking whether the methodology I am following is scientifically correct or not, as many have avoided answering my question.

A small diagram might help to quickly see what you are doing. You seem to be applying a model for the relation between a person's personality traits and facial features (obtained from Facebook images and interview data, which sounds rigorous) to data from Twitter. But your end result is not entirely clear to me. — Sextus Empiricus, Mar 15 '19 at 17:44
I am not sure what you mean by scientifically correct. If it works on a well designed test set, it is good enough. — lcrmorin, Mar 21 '19 at 17:30

score 3 · Answer 1 · answered Mar 15 '19 at 11:59

This could be a reasonable approach, if it is credible that facial features could correlate with personality traits and if the method for deriving personality traits from Twitter posts is reliable. I'd guess a lot of people would rightly ask a lot of questions on these points.

You would have to be very cautious though that

whatever produces your facial features does not accidentally pick up subtle other photo traits (e.g. vain people have higher quality photos, people in warmer countries are more likely to use outdoors photos, politicians/business executives wear suits, sports people wear sports outfits etc.) that you would mistake for facial properties (see the controversy on identifying sexual orientation from dating portal photos),
the personality model really works (being published/peer-reviewed/whatever at some respected venue of course sounds nice, but is just one signal for a certain degree of credibility),
it really works on Twitter posts (even if it has worked well in the past for Facebook posts, there might be a big question whether something that works for Facebook posts works for a very different format, i.e. Twitter, and whether usage patterns/what the models pick up change over time),
it is legally/ethically okay to grab data & photos from Twitter,
50 facial features "correlated using Pearson" makes sense (you would need a lot of data, which you may have). The type of correlation you get with a Pearson correlation may not be what you should be looking for, and you may have a severe multiplicity problem if you want to derive statements about individual features.
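To make the multiplicity point concrete, here is a minimal sketch on simulated data (not the asker's data). It assumes 50 hypothetical features of which only feature 0 is truly related to a trait score, approximates each Pearson p-value with the Fisher z-transform, and compares an uncorrected 0.05 threshold with the Benjamini-Hochberg procedure. The function names, sample size, and effect size are all illustrative assumptions, and Benjamini-Hochberg is just one of several possible corrections.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transform (a large-sample approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes its threshold
    return sorted(order[:cutoff])


# Simulated stand-in data: only feature 0 depends on the trait (an assumption).
random.seed(0)
n_users, n_features = 500, 50
trait = [random.gauss(0, 1) for _ in range(n_users)]
features = []
for j in range(n_features):
    if j == 0:
        features.append([0.5 * t + random.gauss(0, 1) for t in trait])
    else:
        features.append([random.gauss(0, 1) for _ in range(n_users)])

p_values = [approx_p_value(pearson_r(col, trait), n_users) for col in features]
naive_hits = [j for j, p in enumerate(p_values) if p <= 0.05]
bh_hits = benjamini_hochberg(p_values)

print("uncorrected 'significant' features:", naive_hits)
print("after Benjamini-Hochberg:", bh_hits)
```

With 49 pure-noise features, an uncorrected 0.05 threshold is expected to flag a couple of them by chance alone, which is why statements about individual features need some correction of this kind.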

"Scientifically correct" really is a bit of an extreme dichotomy for something that is often not 100% "yes" or 100% "no". In this case, there are a lot of assumptions that you would have to discuss, and there are probably "scientifically cleaner" ways to do something like this, but these ways would also be more expensive and complicated, and would probably give you a lot less labeled data.

Regarding the first point, I am only considering the face attributes contained in the pictures, not other photo properties (https://console.faceplusplus.com/documents/6329465). For the second point, we have already shown that the linguistic personality model trained on FB is far better than Twitter models (basically because Twitter allows only 170-character tweets, and this limits many things). Can you elaborate a little on 'Pearson correlation may not be what you should be looking for'? — Krebto, Mar 15 '19 at 13:48
Even if you limit yourself to things in the face, can this be affected by lighting/photo quality/make-up? Exactly because Twitter is limited to a fixed length, the communication style may differ and a FB model may not transfer. Pearson correlations are most suited to purely linear relationships without interactions between features, which may not be the case here. — Björn, Mar 15 '19 at 14:09
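As a small illustration of the linearity point in the comment above, the following sketch compares Pearson with the rank-based Spearman correlation on two made-up relationships: a monotone but strongly nonlinear one, and a U-shaped one. The data and helper names are assumptions for illustration only; Spearman is shown as one common alternative, not as a recommendation from the thread.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Monotone but nonlinear: y grows exponentially with x.
x = [i / 2 for i in range(1, 21)]
monotone = [math.exp(v) for v in x]

# U-shaped: y depends entirely on x, but not monotonically.
x_sym = [i - 10 for i in range(21)]
u_shaped = [v * v for v in x_sym]

print("monotone: pearson =", pearson_r(x, monotone), "spearman =", spearman_r(x, monotone))
print("u-shaped: pearson =", pearson_r(x_sym, u_shaped), "spearman =", spearman_r(x_sym, u_shaped))
```

In the monotone case Spearman sees a perfect association while Pearson understates it; in the U-shaped case both coefficients are essentially zero even though the second variable is a deterministic function of the first, so neither would flag that feature.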

score 0 · Answer 2 · answered Mar 15 '19 at 11:41

A critical assumption underlying the significance test associated with a Pearson correlation coefficient between two variables is that the variables must be bivariately normally distributed. ... See course material here http://oak.ucc.nau.edu/rh232/courses/eps525/handouts/pearson%20correlation%20coefficient%20-%20handout.doc

If you have verified that this assumption is not rejectable then you are good to go... if not then maybe not so much.

Note that there must be no anomalies (pulses, level shifts, etc.) in either series; otherwise normality is rejectable prima facie.
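If the normality assumption looks doubtful, one option that this answer does not mention is to get the p-value from a permutation test, which does not rely on bivariate normality. The sketch below is only an illustration on simulated, deliberately skewed data; the function names, the 999 permutations, and the sample size are assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


# Skewed (exponential) stand-in data, clearly not bivariate normal.
rng = random.Random(1)
feature = [rng.expovariate(1.0) for _ in range(60)]
related = [2.0 * f + rng.gauss(0, 0.1) for f in feature]
unrelated = [rng.expovariate(1.0) for _ in range(60)]

print("related:   p =", permutation_p_value(feature, related))
print("unrelated: p =", permutation_p_value(feature, unrelated))
```

The permutation test still only asks whether the linear association is stronger than chance pairing would produce, so it addresses the distributional assumption but not the linearity concern.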
