Background:
I read this:
google schools US government about gender pay gap.
It derives from this Google blog post by Eileen Naughton, VP of People Operations.
She asserts that Google is somehow "sharing" its top-level analysis publicly. "Top-level" is ambiguous, but in a corporate sense it means a summary of a summary of a summary. Her "top-level" thesis is that "ratings are blind to gender".
When I read articles like this BBC News piece, the hidden sexism in workplace language, or this HBR piece, how gender bias corrupts performance reviews, they say that gender leaks through in many ways: vague feedback instead of clear, specific references to skills; differing use of jargon; and agentic vs. communal terminology depending on the subject's gender. They say that if the employee being reviewed is a woman, the review is substantially more subjective, negative, and critical. Sequential reviews demonstrate stronger confirmation bias against women, and their successes are more often attributed to spending long hours rather than to ability or talent.
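To make concrete the kind of signal those articles describe, here is a toy sketch of an agentic-vs.-communal feature. The word lists below are illustrative placeholders of my own, not a validated lexicon from either article:

```python
# Illustrative only: toy feature for the agentic-vs.-communal signal described
# above. The word lists are placeholders, not a validated lexicon.
import re
from collections import Counter

AGENTIC = {"assertive", "confident", "decisive", "independent", "ambitious", "drove", "led"}
COMMUNAL = {"helpful", "supportive", "collaborative", "warm", "dependable", "assisted"}

def agentic_communal_score(review_text: str) -> float:
    """Return (agentic - communal) / total matched terms; 0.0 if nothing matches."""
    tokens = Counter(re.findall(r"[a-z']+", review_text.lower()))
    agentic = sum(tokens[w] for w in AGENTIC)
    communal = sum(tokens[w] for w in COMMUNAL)
    total = agentic + communal
    return (agentic - communal) / total if total else 0.0

print(agentic_communal_score("She was helpful and supportive, and assisted the team all year."))
print(agentic_communal_score("He was decisive, drove the roadmap, and led the launch."))
```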
Question:
Are there any existing machine-learning analyses in the public space (peer-reviewed and published, or at least something like a strong Kaggle winner's blog post) in which the gender of the subject, not necessarily the author, is indicated by grammar or word choice? A candidate benchmark would be something along the lines of the relatively recent Samsung emails, because they likely contain annual-review language along with gender.
I'm looking to be able to ascertain the most likely gender of the subject of the text, especially within the context of performance reviews, using leakage of the sort asserted by the BBC article.
By "language of the sort asserted by the BBC article" I do not mean the trivial solution of personal pronouns like "his" or "her" but instead super generic "great year" vs. specific comments about specific skills actually related to job performance.
I'm looking for usage of tools including but not limited to bag-of-words, word n-gram TF-IDF, fuzzy string matching, GloVe, and probabilistic latent semantic analysis.
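As a minimal sketch of the kind of baseline I have in mind, assuming a hypothetical labeled corpus of review texts and subject genders (the two toy documents and labels below are placeholders, not real data), with explicit gender markers masked so the classifier cannot take the pronoun shortcut:

```python
# Minimal sketch: word/word-pair TF-IDF plus a linear classifier, with gendered
# pronouns and titles masked so the model cannot use the trivial "his"/"her" cue.
# `reviews` and `subject_gender` are hypothetical placeholders, not a real dataset.
import re
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

GENDERED = re.compile(r"\b(he|she|his|her|hers|him|himself|herself|mr|mrs|ms)\b", re.I)

def mask_gendered_tokens(text: str) -> str:
    """Replace explicit gender markers so only indirect language signals remain."""
    return GENDERED.sub("_", text.lower())

reviews = [
    "Great year overall, she should keep it up.",
    "He delivered the billing migration two weeks early and mentored two juniors.",
]
subject_gender = ["F", "M"]  # placeholder labels

model = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=mask_gendered_tokens, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(reviews, subject_gender)

# With a real corpus, cross-validated accuracy well above chance on the masked
# text would suggest the subject's gender leaks through word choice alone.
print(model.predict(["Had a great year, a pleasure to have on the team."]))
```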
I expect that it is not only word choice but also clause engineering that carries a strong signal, but I would like to see it demonstrated.
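For example, a rough sketch of clause-level features, under my own assumption (not something the articles quantify) that vagueness shows up as short clauses, hedging terms, and few concrete skill verbs; the word lists here are illustrative guesses:

```python
# Rough sketch of clause-level features beyond bag-of-words. HEDGES and
# SKILL_TERMS are illustrative guesses, not derived from any cited study.
import re

HEDGES = {"maybe", "perhaps", "somewhat", "seems", "generally", "overall", "quite"}
SKILL_TERMS = {"deployed", "migrated", "negotiated", "debugged", "architected", "forecasted"}

def clause_features(review_text: str) -> dict:
    """Crude per-review statistics: clause length, hedging rate, concrete-skill rate."""
    clauses = [c.strip() for c in re.split(r"[.;,]", review_text) if c.strip()]
    words = re.findall(r"[a-z']+", review_text.lower())
    return {
        "mean_clause_len": sum(len(c.split()) for c in clauses) / max(len(clauses), 1),
        "hedge_rate": sum(w in HEDGES for w in words) / max(len(words), 1),
        "skill_term_rate": sum(w in SKILL_TERMS for w in words) / max(len(words), 1),
    }

print(clause_features("Overall a great year, seems to be doing quite well."))
print(clause_features("Migrated the payments service and debugged the replication lag."))
```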
Is there any data or work here? I want to hear what the data says.
Update:
- data-driven examples on an adjacent topic, especially in the context of self-reporting (link, link)
- there is reportedly a book coming out by Paola Cecchi-Dimeglio and Kim Kleman that may use non-machine-learning, more manual statistical analysis techniques to demonstrate and quantify bias in performance reviews (waiting for link)
More links relating to review inequality:
- https://hbr.org/2016/04/research-vague-feedback-is-holding-women-back
- (@Lepidopterist) https://cs224d.stanford.edu/reports/BartleAric.pdf
- IEEE on unintentionally encoding bias into AI
- http://juliasilge.com/blog/Gender-Pronouns/
Update (again):