Investigating Gender Stereotypes in Large Language Models
Large Language Models (LLMs) have made significant strides recently, surpassing previous benchmarks in various fields. However, this paper delves into an important aspect related to LLMs – their behavior concerning gender stereotypes. Gender biases have been a challenge for previous models, and this paper aims to explore this issue further. We propose a straightforward method to assess the presence of gender bias, using the widely-known WinoBias dataset as a basis but with some modifications. This dataset is highly likely to be included in the training data of current LLMs.
We evaluate four recently published LLMs and reveal that they exhibit biased assumptions about men and women. These assumptions align with societal perceptions rather than being grounded in factual information. Furthermore, we delve into the models’ explanations for their choices. Our findings suggest that not only do these explanations reinforce stereotypes, but a significant proportion of them also contain factual inaccuracies, obscuring the true reasons behind the models’ decisions.
This investigation brings to light an important characteristic of LLMs – their training on unbalanced datasets. Consequently, even with reinforcement learning and human feedback, they tend to reflect these imbalances back to us. Similar to other societal biases, it is crucial to subject LLMs to thorough testing to ensure fair treatment of marginalized individuals and communities.