I have been working on a project that takes a technical approach to revealing gender bias in various media sources.
There is a popular neural network model called word2vec that represents each word as a vector in a multidimensional vector space, known as a word embedding. Each word vector is positioned so that words that share common contexts are located close to one another in the space.
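To make "close to one another" concrete: closeness between word vectors is commonly measured with cosine similarity, the cosine of the angle between two vectors. The sketch below uses tiny made-up 3-dimensional vectors purely for illustration (real word2vec vectors usually have a few hundred dimensions and come from a trained model). The vectors and the `cosine_similarity` helper are hypothetical, not taken from the project.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, NOT taken from a trained model.
toy = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

print(cosine_similarity(toy["cat"], toy["dog"]))  # high: vectors point in similar directions
print(cosine_similarity(toy["cat"], toy["car"]))  # lower: vectors point in different directions
```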
Representing words this way opens new possibilities for optimizing search engines and for language analysis. What really intrigued me was that these word embeddings differ depending on the text data set they are trained on, reflecting the associations present in that text. For example, a word embedding trained on Wikipedia probably will not associate YAS with QUEEN.
Anyway, since these word embeddings reflect associations in our society, I decided to run an experiment to test whether certain words are more commonly associated with women or with men.
My approach was to project words onto a she<->he axis in this multidimensional vector space, which shows whether each word is more closely associated with she or with he. Supposedly gender-neutral words such as genius, intelligent, or bossy should ideally land right in the middle of the she<->he axis. However, this is seldom the case, which reveals the gender biases that propagate from the original text into the word embedding.
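The projection step can be sketched as follows. This is a minimal sketch under stated assumptions, not necessarily the exact method used in the project: it normalizes every vector to unit length, takes the axis direction as `she - he`, and returns the signed length of the word's projection onto that direction. Because `she` and `he` are unit vectors after normalization, their midpoint projects to exactly 0, so a perfectly neutral word scores 0, she-leaning words score positive, and he-leaning words score negative. The function names and toy vectors are hypothetical; with a real model you would look the vectors up from the trained embedding instead.

```python
import math

def unit(v):
    """Scale v to length 1."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def gender_score(word_vec, she_vec, he_vec):
    """Signed position of word_vec along the he<->she axis.

    > 0: closer to 'she', < 0: closer to 'he', 0: the midpoint.
    With unit-length she/he vectors, their midpoint projects to 0,
    so no separate centering step is needed.
    """
    she, he, w = unit(she_vec), unit(he_vec), unit(word_vec)
    axis = unit([s - h for s, h in zip(she, he)])  # direction from 'he' toward 'she'
    return sum(x * a for x, a in zip(w, axis))

# Hypothetical toy vectors for illustration only.
she = [1.0, 0.0, 0.0]
he = [0.0, 1.0, 0.0]
word = [0.8, 0.2, 0.5]

print(round(gender_score(word, she, he), 3))  # positive: this toy word leans toward 'she'
```

The sign convention matches the graphs: negative scores go in the left half and positive scores in the right half.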
In the attached images, words in the left half (negative x-axis) are more commonly associated with men, and words in the right half (positive x-axis) are more commonly associated with women. These graphs show that the media commonly associates toxic words with women. We consume this media every day, and are therefore subliminally absorbing these biases every day. Many people in our community believe that feminism isn’t relevant anymore because women and men have “equal rights”. Hopefully this evidence will serve as concrete proof of the disparities that exist in the way we perceive gender, and show that we still have a long way to go.
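Reading the graphs amounts to ordering words by their signed score along the she<->he axis and checking which side of zero each one falls on. A minimal sketch, assuming the scores have already been computed by projecting onto the she<->he axis as described above. The `order_along_axis` helper is hypothetical, and the `scores` dict holds made-up placeholder numbers with placeholder words, not results from either corpus.

```python
def order_along_axis(scores):
    """Return words left-to-right as on the plot, split by side.

    Negative scores (left half) lean toward 'he'; positive scores
    (right half) lean toward 'she'. Scores of exactly 0 are neutral.
    """
    ordered = sorted(scores, key=scores.get)  # most 'he'-leaning first
    he_side = [w for w in ordered if scores[w] < 0]
    neutral = [w for w in ordered if scores[w] == 0]
    she_side = [w for w in ordered if scores[w] > 0]
    return he_side, neutral, she_side

# Placeholder scores for illustration only; NOT measured from Wikipedia or Reddit.
scores = {"word_a": -0.3, "word_b": 0.1, "word_c": 0.0, "word_d": -0.05}
he_side, neutral, she_side = order_along_axis(scores)
print("he side: ", he_side)
print("neutral: ", neutral)
print("she side:", she_side)
```

With real floating-point scores an exact 0 is rare, so in practice one might treat a small band around 0 as neutral.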
Pic 1: Wikipedia, Pic 2: Reddit (working on better visualizations)
Link to project:
Thoughts would be appreciated!