OpenAI unveils GPT-4o: A conversational powerhouse with vision and voice

The updates are expected to roll out in the coming weeks, so get ready for a whole new way to connect with AI

A still taken from the OpenAI Spring Update live streamed on May 13, 2024. — YouTube/OpenAI

In OpenAI's first ever mainstream live event, CTO Mira Murati on Monday announced a focus on making their powerful artificial intelligence (AI) tools easier to use for everyone, "wherever you are." This includes a refreshed desktop application and a new free tier with capabilities previously only available in the paid version of ChatGPT.

"An important part of our mission is being able to make our advanced AI tools available to everyone for free," said Murati, highlighting their commitment to expanding access. They're achieving this by introducing a new model, GPT-4o, which brings GPT-4-level intelligence to the free tier of ChatGPT. Additionally, the requirement to sign up for ChatGPT will be removed entirely.

While rumours swirled about a voice assistant, the live demo showcased a Mac desktop app featuring the existing mobile Voice mode. This suggests a more interactive user experience, but not a complete virtual assistant takeover.

Both the GPT-4o update and the desktop app are expected to roll out over the next few weeks.

Murati hinted at a major shift in human-computer interaction with the introduction of GPT-4o. "There's so much we take for granted in how we communicate with each other," she said, emphasising the importance of natural interaction.

The new model, GPT-4o, breaks down barriers by understanding voice, text, and even visual information. "This lets us bring GPT-4 level intelligence to our free tier users," said Murati, highlighting months of development to achieve this goal.

With over 100 million people already using ChatGPT regularly, Murati revealed that GPT-4o is significantly more efficient than previous versions. This translates to more powerful GPTs (custom chatbots) being available for free. Users can also expect data, code, and vision tools, allowing them to analyse images without any usage limits.

OpenAI's generous free tier upgrade with GPT-4o leaves some wondering if the $20 monthly fee for ChatGPT Plus is still worth it.

CTO Murati assures users that there are still perks to the paid plan. She revealed that Plus subscribers will enjoy a significant advantage: five times more daily requests to the powerful GPT-4o compared to the free version. This translates to considerably more interaction and potentially faster results for those willing to pay.

Live chat gets real: GPT-4o brings live speech to OpenAI

OpenAI's latest update, GPT-4o, takes a big leap towards natural conversation with live speech capabilities. Unlike previous models that relied on text transcription, GPT-4o can directly understand and respond to spoken audio. This opens exciting possibilities for real-time interaction.

Imagine a conversation where you don't have to wait for pauses or complete sentences. The demo showcased this fluidity, with an OpenAI staffer even attempting some (perhaps not-so-relaxing) deep breaths into the microphone. GPT-4o, ever the helpful companion, not only picked up on the heavy breathing but even offered advice on improving breathing techniques! It even added a touch of humour, warning the staffer, "you're not a vacuum cleaner."

This ability to understand and respond to emotions in real-time speech further blurs the lines between human and machine interaction. Interrupting GPT-4o mid-sentence won't be a problem either, making conversations feel more natural and engaging.

See, chat, code: GPT-4o's powerhouse update

OpenAI's latest update, GPT-4o, isn't just about conversation. It also boasts impressive new vision capabilities, essentially granting it the power to "see" through your phone's camera.

Imagine getting stuck on a math problem. In the demo, the team threw GPT-4o a curveball: a handwritten equation. While it didn't provide a straight answer, it surprised everyone by offering step-by-step guidance, acting more like a patient tutor than a simple calculator. Even more impressive, the AI could detect changes made to the equation in real-time, adjusting its advice accordingly.

But GPT-4o isn't just a math whiz. It can also be a charmer! Another demo showcased the AI's ability to process written text. When shown a note reading "I heart ChatGPT," the AI's voice took on a more emotional tone, playfully acknowledging the affection. There was a lighter moment too, where the camera lingered on the presenter's outfit, prompting GPT-4o to offer a compliment — a glimpse into its ability to understand and respond to visual cues in real-time conversations.

The applications extend beyond handwritten notes and heartfelt messages. Running on a Mac, GPT-4o seamlessly transitioned to analysing code. With an almost human-like voice, it could not only view the code being written but also identify potential problems.

The vision capabilities aren't limited to phone cameras either. The desktop app demo revealed GPT-4o's ability to see what is on the user's screen. It analysed a displayed graph, providing insights and feedback, which makes it a powerful tool for data visualisation and analysis.

OpenAI's vision for GPT-4o paints a picture of a future where AI seamlessly integrates with our visual world, offering real-time assistance and understanding in a more natural way.

Good news for travellers

Imagine seamless conversations across languages. The demo highlighted ChatGPT Voice's ability to translate spoken Italian from Mira Murati to English in real time, and vice versa. This could be a game-changer for travellers and international communication.

The live demo also showcased the AI reading facial expressions through the camera: it detected a smile and prompted, "want to share the reason for your good vibes?" This emotional intelligence adds a new layer of understanding to human-AI interaction.

These advancements, along with the impressive voice assistant capabilities, echo CEO Sam Altman's description of GPT-4o as "magical." The model could revolutionise the way we interact with AI, possibly moving away from text-based interfaces. The update is expected to roll out in the coming weeks, so get ready for a whole new way to connect with AI.