ChatGPT can now speak, listen and see images

OpenAI collaborated with professional voice actors to train the models to speak.

The generative artificial intelligence (AI) space continues to heat up, with OpenAI unveiling GPT-4V, a vision-capable model, and multimodal conversational modes for its ChatGPT system.

With the new upgrades, announced on Sept. 25, ChatGPT users will be able to hold voice conversations with the chatbot. The models powering ChatGPT, GPT-3.5 and GPT-4, can now understand plain-language spoken queries and respond in one of five different voices.

ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb
— OpenAI (@OpenAI) September 25, 2023

According to a blog post from OpenAI, this new multimodal interface will allow users to interact with ChatGPT in novel ways:

“Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow up questions for a step by step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you.”

The upgraded version of ChatGPT will roll out to Plus and Enterprise users on mobile platforms in the next two weeks, with follow-on access for developers and other users “soon after.”

ChatGPT’s multimodal upgrade comes hot on the heels of the launch of DALL-E 3, OpenAI’s most advanced image generation system.

According to OpenAI, DALL-E 3 also integrates natural language processing, which lets users talk to the model to refine its results and use ChatGPT for help writing image prompts.

In other AI news, OpenAI competitor Anthropic announced a partnership with Amazon on Sept. 25. As Cointelegraph reported, Amazon will invest up to $4 billion in a deal that includes cloud services and hardware access. In return, Anthropic said it will provide enhanced support for Amazon Bedrock, Amazon’s foundation model service, along with “secure model customization and fine-tuning for businesses.”
