
Background:
I read this: *Google schools US government about gender pay gap*. It derives from this Google blog post by Eileen Naughton, VP of People Operations.

She asserts that Google is somehow "sharing" its top-level analysis publicly. "Top-level" is ambiguous, but in a corporate sense it usually means a summary of a summary of a summary. Her "top-level" thesis is that "ratings are blind to gender".

When I read articles like this BBC News piece, *The hidden sexism in workplace language*, or this HBR article, *How gender bias corrupts performance reviews*, they say that gender leaks through in many ways: vague feedback instead of clear references to specific skills, differing use of jargon, and agentic vs. communal terminology by gender. They say that if the employee being reviewed is a woman, the reporting is substantially more subjective, negative, and critical; that sequential reviews show stronger confirmation bias against women; and that women's successes are more often attributed to spending long hours rather than to ability or talent.

Question:
Are there any existing machine-learning analyses in the public space (peer-reviewed and published, or at least something like a strong Kaggle winner's write-up) where the gender of the subject, not necessarily the author, is indicated by grammar or word choice? A candidate benchmark would be something along the lines of the relatively recent Samsung emails, because they likely contain annual-review language along with gender.

I'm looking to ascertain the most likely gender of the subject of a text, especially in the context of performance reviews, using leakage of the sort asserted by the BBC article.

By "language of the sort asserted by the BBC article" I do not mean the trivial solution of personal pronouns like "his" or "her", but rather the contrast between super-generic praise ("great year") and specific comments about specific skills actually related to job performance.

I'm looking for usage of tools including but not limited to bag-of-words, word n-gram TF-IDF, fuzzy string matching, GloVe, and probabilistic latent semantic analysis.
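To make the task concrete, here is a minimal sketch of the kind of pipeline I have in mind, with the trivial pronoun cues masked out. The review snippets and labels are entirely invented for illustration (no real dataset is claimed), and a plain bag-of-words Naive Bayes stands in for TF-IDF/GloVe so the example needs only the standard library:

```python
# Sketch only: bag-of-words Naive Bayes that tries to recover the
# *subject's* gender from review text after stripping pronouns.
# All review snippets are invented; labels mark the subject's gender.
import math
import re
from collections import Counter

PRONOUNS = {"he", "she", "his", "her", "him", "hers", "himself", "herself"}

def tokens(text):
    """Lowercase word tokens with the trivial pronoun cues removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in PRONOUNS]

class NaiveBayes:
    def fit(self, texts, labels):
        self.counts = {}            # label -> word Counter
        self.totals = {}            # label -> total token count
        self.priors = Counter(labels)
        self.vocab = set()
        for text, y in zip(texts, labels):
            c = self.counts.setdefault(y, Counter())
            for w in tokens(text):
                c[w] += 1
                self.vocab.add(w)
        for y, c in self.counts.items():
            self.totals[y] = sum(c.values())
        return self

    def predict(self, text):
        """Argmax over labels of log prior + Laplace-smoothed log likelihood."""
        best, best_lp = None, -math.inf
        V, n = len(self.vocab), sum(self.priors.values())
        for y in self.counts:
            lp = math.log(self.priors[y] / n)
            for w in tokens(text):
                lp += math.log((self.counts[y][w] + 1) / (self.totals[y] + V))
            if lp > best_lp:
                best, best_lp = y, lp
        return best

# Invented training snippets echoing the "vague vs. specific" pattern
# the BBC/HBR pieces describe.
reviews = [
    ("Had a great year, a real pleasure to work with.", "F"),
    ("So supportive and helpful, always willing to pitch in.", "F"),
    ("Shipped the payment service rewrite, cutting latency 40%.", "M"),
    ("Designed the sharding scheme and led the migration.", "M"),
]
model = NaiveBayes().fit(*zip(*reviews))
print(model.predict("A pleasure to have on the team, great attitude."))   # → F
print(model.predict("Led the database migration and improved throughput."))  # → M
```

On real data one would of course swap in TF-IDF weighting or GloVe embeddings, mask names and job titles as well as pronouns, and cross-validate; the point here is only the shape of the task.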

I expect that not only word choice but also clause construction carries a strong signal, but I would like to see it in the data.
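By "clause construction" I mean crude structural features like these. The word lists and the example sentences below are invented placeholders, not validated lexicons; they only illustrate the kind of features (metrics, agentic vs. communal verbs, hedging, clause length) one might feed to a classifier:

```python
# Sketch only: structural "clause engineering" features for a review.
# AGENTIC/COMMUNAL/HEDGES are invented placeholder word lists.
import re

AGENTIC = {"led", "drove", "built", "designed", "shipped", "negotiated"}
COMMUNAL = {"helped", "supported", "assisted", "collaborated", "nurtured"}
HEDGES = {"somewhat", "fairly", "quite", "perhaps", "generally"}

def clause_features(review):
    words = re.findall(r"[a-z']+", review.lower())
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    return {
        # Specific, measurable claims vs. vague praise
        "has_metric": int(bool(re.search(r"\d+%?", review))),
        "agentic": sum(w in AGENTIC for w in words),
        "communal": sum(w in COMMUNAL for w in words),
        "hedges": sum(w in HEDGES for w in words),
        # Longer clauses tend to carry more specific detail
        "mean_sentence_len": len(words) / len(sentences),
    }

print(clause_features("She helped the team and was quite supportive."))
print(clause_features("Led the redesign; cut page load time by 35%."))
```

A feature vector like this, concatenated with the bag-of-words features, is what I would expect to expose any clause-level leakage.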

Is there any data or work here? I want to hear what the data says.

Update:

  • data-driven examples on an adjacent topic, especially in the context of self-reporting (link, link)
  • there is reportedly a book coming out by Paola Cecchi-Dimeglio and Kim Kleman that may use manual statistical-analysis techniques, rather than machine learning, to demonstrate and quantify bias in performance reviews (waiting for link)

More links relating to review inequality:

Update (again):

  • The request to "Fight Inequality" seems to make people more opposed to it. (link) The process of trying to generate support either de-motivates, or motivates against it.
  • It looks like one of the "killers" is job attrition. link
    Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/56954/discussion-on-question-by-engrstudent-google-gender-pay-gap-vs). – whuber Apr 11 '17 at 19:04
    I don't think I really understand your research question? Let $F$ be an indicator for the female gender. Let $X$ be the content of a performance review. How does $\operatorname{E}[F \mid X]$ tell us anything useful about discrimination? – Matthew Gunn Apr 13 '17 at 13:18
  • I am looking at the difference between $\operatorname{E}[F \mid X]$ and $\operatorname{E}[M \mid X]$. I think they should be roughly equivalent and that there should not be a stunning difference. I think that, all else being equal, if there is a stunning difference, then it likely indicates both that there is discrimination and a possible way to engineer that discrimination out of the process. Google officially says "zero leakage" and several other sources say "truckloads of leakage". I just want to quantify the leakage as a first-step. Show me the data. Show me the math. – EngrStudent Apr 13 '17 at 13:40
    @EngrStudent "...if there is a stunning difference, then it likely indicates both that there is discrimination..." Except that's not true? Clearly personal pronouns (i.e. "he" and "she") will forecast gender and have nothing to do with discrimination. Names will forecast gender (eg. "Alice"). Job title will forecast gender because different jobs have different gender balance. I think what you're trying to get at is whether perceived performance is related to gender? It's an interesting idea. My problem is that's not the only channel linking gender and review text. – Matthew Gunn Apr 13 '17 at 14:09
    Let's say the text reveals the gender of the person. So what? Given the current environment this seems should only benefit women, because there's a pressure on employers to employ and advance equal number of females regardless of the supply of talent. – Aksakal Jun 09 '17 at 20:57
    I voted to close this question as off-topic because it's a subject-matter question (viz., can you detect gender from word choice?) rather than a question about data analysis itself. You might try https://cogsci.stackexchange.com/ – Kodiologist Jun 10 '17 at 00:47
  • @Kodiologist - I disagree that this is a psychology question. This is a data-analysis question. Can the data say it is a different question than "what is the human physics behind it". It is explicitly "is there any machine learning analysis in the public space that ....". – EngrStudent Jun 10 '17 at 01:08
  • Interrupting a work career to nurture children is understandable for mammals. – Carl Jun 12 '17 at 03:21
    @Carl - do non-human mammals get "careers", outside of a zoo or circus? – EngrStudent Apr 02 '18 at 21:07
    @EngrStudent Human specialization pales in comparison to the drone bee, whose neuter sex and specialization is genetically mandated. Unlike bees, some humans have lives outside of wage slavery. – Carl Apr 02 '18 at 22:41

0 Answers