The Impact of ChatGPT on Open Data: Decreasing Activity on Stack Overflow

Large Language Models (LLMs), such as BERT, GPT, and PaLM, have become central to Natural Language Processing and Understanding. OpenAI's ChatGPT, based on GPT-3.5 and GPT-4, is a widely used chatbot with over a million users. It can mimic human responses, generate original content, answer questions, summarize text, and translate between languages.

ChatGPT has become a valuable tool for users seeking information and assistance online. This convenience has a cost, however. When people put their questions privately to a language model instead of asking on public forums, fewer human-generated questions and answers end up in publicly accessible knowledge resources. Less open data, in turn, makes it harder to train future models on freely available information.

To investigate this, researchers studied activity on Stack Overflow, a popular Q&A site for programmers, to understand how the release of ChatGPT affected the production of open data. They found that activity on Stack Overflow fell significantly compared with its Chinese and Russian counterparts, where access to ChatGPT is restricted. Activity also declined relative to similar mathematics forums, where ChatGPT is less effective due to a lack of training data. The researchers estimated a 16% decline in Stack Overflow posts following the launch of ChatGPT, and this effect grew over time.
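The comparison described above follows the logic of a difference-in-differences design: the change on Stack Overflow is measured against the change on platforms where ChatGPT had little effect, so trends common to all platforms cancel out. The sketch below is a minimal two-platform, two-period illustration of that idea using log activity. It is not the authors' model, and the function name and post counts are hypothetical.

```python
import math


def did_percent_change(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Hypothetical 2x2 difference-in-differences on log activity.

    Returns the estimated percentage change in activity on the treated
    platform (e.g. Stack Overflow) relative to a control platform
    (e.g. a counterpart where ChatGPT is unavailable).
    """
    # Change in log activity on the treated platform
    treated_delta = math.log(treat_post) - math.log(treat_pre)
    # Change in log activity on the control platform
    control_delta = math.log(ctrl_post) - math.log(ctrl_pre)
    # The treatment effect in log points is the difference of the two changes
    beta = treated_delta - control_delta
    # Convert the log-point effect into a percentage change
    return (math.exp(beta) - 1.0) * 100.0


# Hypothetical weekly post counts, not data from the study:
# the treated platform falls from 1000 to 760, the control from 400 to 380.
effect = did_percent_change(1000, 760, 400, 380)
print(f"Estimated relative change: {effect:.1f}%")
```

Working in logs makes the estimate a proportional change, so platforms of very different sizes can be compared directly. Here the treated platform lost 24% of its activity while the control lost 5%, which yields a relative decline of about 20%.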

The team drew three key findings from their analysis:

1. Reduced Posting Activity: Stack Overflow saw fewer posts after the release of ChatGPT. The decline persisted and deepened over time, eventually reaching about 25%.

2. No Change in Post Votes: Despite the drop in posting activity, the votes received by Stack Overflow posts did not change significantly. This suggests that ChatGPT is displacing both low-quality and high-quality posts, not only weaker content.

3. Varying Effects Across Programming Languages: ChatGPT's impact differed across the programming languages discussed on Stack Overflow. Some languages, such as Python and JavaScript, saw a more pronounced decrease in posting activity. The size of the relative decline was associated with how prevalent each language is on GitHub, with more widely used languages tending to see larger declines.
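One simple way to check an association like the one in the third finding is to correlate each language's share of GitHub activity with its relative decline on Stack Overflow. The sketch below computes a Pearson correlation by hand. The language labels and all numbers are hypothetical placeholders, not figures from the study, and the authors' actual analysis may use a different method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two variance terms
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-language values (placeholders, not study data):
# share of GitHub activity and relative decline in Stack Overflow posts.
github_share = {"lang_a": 0.30, "lang_b": 0.20, "lang_c": 0.10, "lang_d": 0.05}
relative_decline = {"lang_a": 0.22, "lang_b": 0.18, "lang_c": 0.12, "lang_d": 0.08}

langs = sorted(github_share)
r = pearson([github_share[l] for l in langs], [relative_decline[l] for l in langs])
print(f"Correlation between GitHub share and decline: {r:.3f}")
```

A positive coefficient would indicate that languages with more GitHub activity, and hence presumably more material for ChatGPT to learn from, lost relatively more Stack Overflow posts.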

The authors concluded that the widespread use of LLMs like ChatGPT might limit the availability of open data for users and future models to learn from. This has implications for the accessibility and sharing of knowledge on the internet and the long-term viability of the AI ecosystem.

