On Monday night Hungarian time, OpenAI, the leading company in the field of artificial intelligence, presented a very impressive new model that is natively capable of conducting conversations in real time, practically without delay, based not only on written text but also on video, images, and audio. The model, called GPT-4o, will roll out gradually over the coming weeks, and the company announced that, unlike the current top model, it will be available to everyone for free.
Speculation about what exactly OpenAI would announce had already begun in the days before Monday's event. According to Reuters, much of the world's press treated it as fact that the company would take on Google's long-dominant search engine with an AI-based solution of its own, but CEO Sam Altman explained on X (formerly Twitter) that neither a search engine nor GPT-5 would be announced. He did add, though, that
they've been working hard on something new that they think people will love, and that it still feels like magic to him. From what we saw on Monday, that's not surprising at all.
They didn't waste much time at the event and covered the most important things first, namely that ChatGPT is getting a desktop version and a new UI, and that their new model, available to everyone for free, is called GPT-4o. On this point, Mira Murati, the company's CTO, said that an important element of their mission is to make their AI models freely available and easy to use, even as the models themselves become more complex.
They then jumped straight into GPT-4o, a GPT-4 level model that is much faster than the current top model (GPT-4 Turbo, introduced about half a year ago) and significantly improved in every input modality compared to its predecessor. The API that developers use is also being updated: according to Murati, it is twice as fast as the previous model and has five times higher rate limits, yet costs only half as much. The new model brings free users all the functionality that was previously available only to subscribers, although paying users will still get five times higher usage limits.
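For developers, the model is reached through the existing Chat Completions endpoint. Here is a minimal sketch, assuming the official `openai` Python package and the `gpt-4o` model identifier from the announcement (the prompt is illustrative):

```python
# Minimal sketch: calling GPT-4o through OpenAI's Chat Completions API.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",  # the new model's API identifier
    messages=[
        {"role": "user", "content": "In one sentence: what is GPT-4o?"},
    ],
)
print(response.choices[0].message.content)
```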
Murati said the models have become better and better in recent years, but handling them has now been greatly simplified, which is a big step toward making interaction between people and machines feel natural. The most significant change is that voice input, which until now required three models working together (one transcribing speech to text, another interpreting the text and composing a reply, and a third reading the reply aloud), works natively in GPT-4o. This means that
The model can communicate in real time, without delay, based on camera images, written text, and live speech, and judging by the presentation, it looks like something straight out of science fiction.
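For contrast, here is a hedged sketch of the kind of three-step pipeline the old voice mode stitched together from OpenAI's separate speech-to-text, chat, and text-to-speech endpoints; the file names and prompt are illustrative, and each network round trip adds the latency that the natively multimodal GPT-4o avoids:

```python
# Hedged sketch of the old three-model voice pipeline (transcribe -> reason -> speak).
# Each of the three calls is a separate network round trip, hence the delay.
from openai import OpenAI

client = OpenAI()

# 1) Speech-to-text: transcribe the user's recorded question (file name illustrative).
with open("question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2) Text reasoning: a second model interprets the text and composes a reply.
chat = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = chat.choices[0].message.content

# 3) Text-to-speech: a third model reads the reply aloud.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
speech.stream_to_file("reply.mp3")
```

Besides the seconds of waiting, the text hand-off between the steps discards tone, emphasis, and the possibility of interruption, which is exactly what handling audio natively in one model restores.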
That sci-fi comparison may seem like a stretch at first, but judging by the demos, they've actually put together something that a year ago you would have confidently called staged. Armed with the new model, you talk to ChatGPT the same way you would to Google Assistant or Siri, but unlike those, it converses in real time as if you were talking to another person. Not only because you don't have to wait seconds for a response, but because what it says and how it says it feels eerily natural.
The new model recognizes and reacts to human tone of voice in real time, and it doesn't get hung up on exact word choice; it even recognizes when someone is panting nervously or breathing calmly and can respond to it. You can interrupt it like a real person, and it can itself imitate different emotional styles. This was demonstrated by having it tell a story in an increasingly dramatic tone, and it did become more and more dramatic. It had to finish by singing the ending, and before doing so it sighed, exactly as a person put on the spot like that would.
Such elements, which aren't essential for conveying information but are common in human communication, came up with the model all the time. When they switched to video and ChatGPT started talking before it had even been shown what was wanted of it, it said things like, "Oops, I got a little too excited." The model then smoothly walked an OpenAI researcher through solving a linear equation written on paper, and when Murati interjected from the background, asking, "Okay, but what use is this in everyday life?", it even gave a little lecture on the topic.
And of course it was able to help with programming: in about a second it understood and summarized what the code shown to it on screen was doing, then did the same with the chart that appeared once the code was run. It also turned out it can translate in real time, and it could even tell from a researcher's face that he was in a good mood (before that, it cracked a joke because it was accidentally shown the table first). It all felt as if
we were just one step away from the holographic virtual intelligence of Mass Effect, or the AI system of the movie Her (it's no coincidence that Altman himself posted exactly that after the demo), and if GPT-4o really works that way, we might well be.
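For the curious, the screen-reading demos map onto the API's image input. A hedged sketch, with an illustrative screenshot path and prompt, of asking GPT-4o what a piece of code on screen does:

```python
# Hedged sketch: sending a screenshot (e.g. of code) to GPT-4o as image input.
# The file name and prompt are illustrative, not taken from the presentation.
import base64
from openai import OpenAI

client = OpenAI()

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does the code in this screenshot do?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```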
You can watch the full broadcast below, and on OpenAI's channel you'll find more interesting clips beyond the demos shown here, including joke-telling, interview preparation, meeting a dog, and two GPT-4os talking to each other and singing together.