A mysterious chatbot puzzled the AI community last week after a program called “gpt2-chatbot” appeared in LMSYS's Chatbot Arena, an online platform where leading language models compete. A user submits a question and receives two answers, then votes for the one they find better (or declares a tie).
Only after voting is it revealed which answer came from which model. Based on users' decisions, the system builds a leaderboard. At the time of writing, OpenAI's GPT-4-Turbo leads the rankings, with Anthropic's Claude 3 Opus and Google's Gemini 1.5 Pro also on the podium.
LMSYS, which runs the Arena, is an open-source language model research organization based at the University of California, Berkeley. The mysterious name gpt2 is almost certainly a reference to OpenAI's GPT-2 model, introduced in February 2019, an early precursor of today's GPT models. Naturally, there was immediate speculation that the mystery chatbot might be the long-awaited new model from OpenAI: a public testbed for GPT-5, or perhaps an interim GPT-4.5.
Sam Altman, the company's CEO, has previously said that today's powerful GPT-4 will soon be remembered as weak. When asked by Ars Technica, he reportedly replied only that GPT-2 has always been close to his heart. OpenAI declined to comment on the story.
Anonymous cuckoo clock
After the sudden spike in interest, many users began testing the chatbot, and it disappeared as quickly as it had appeared. It later resurfaced as “im-a-good-gpt2-chatbot” (a name Altman also shared on X). Meanwhile, a second version appeared under the name “im-also-a-good-gpt2-chatbot”. The latter may have been fielded by a competitor, or OpenAI may be testing another model.
According to observers, although OpenAI's fingerprints are all over the affair, it is almost certain that we are not seeing a test of GPT-5.
The gpt2 chatbot is good, very good, but if it's gpt-4.5, it's a huge disappointment
wrote Matt Shumer, CEO of HyperWrite.
According to Ethan Mollick of the University of Pennsylvania, the new bot's capabilities are at the GPT-4 level, but it is more adept at solving difficult mathematical problems and drawing ASCII art.
The LMSYS Arena rules allow unreleased language models to be tested anonymously, and such models are not included in the rankings. According to experts, the episode was mostly useful for popularizing the arena, but did nothing to make the field of AI seem more trustworthy or transparent.