Conversational AI Assistants: Google and OpenAI's Latest Creations

Thursday, May 16, 2024, 22:16 UTC

tldr #

This week, Google and OpenAI announced new AI assistants that can converse in real time, analyze surroundings via live video, and translate conversations. OpenAI's GPT-4o and Google's Gemini Live offer similar capabilities. GPT-4o will be free to use within usage caps, with paid plans starting at $20 per month offering five times more capacity; Gemini Live will be available through Gemini Advanced, which costs $20 per month after a two-month free trial. Google is also working on a new do-everything AI agent and is exploring embedding its Astra assistant into smart glasses and other devices. These tools could become part of daily routines, or they could turn out to be a sci-fi party trick that loses its charm. Only time will tell how useful they will truly be.

content #

This week, Google and OpenAI both announced they’ve built supercharged AI assistants: tools that can converse with you in real time and recover when you interrupt them, analyze your surroundings via live video, and translate conversations on the fly.

OpenAI struck first on Monday, when it debuted its new flagship model GPT-4o. The live demonstration showed it reading bedtime stories and helping to solve math problems, all in a voice that sounded eerily like Joaquin Phoenix’s AI girlfriend in the movie Her (a trait not lost on CEO Sam Altman).

OpenAI's GPT-4o has a response delay of about 320 milliseconds, which OpenAI says is on par with natural human conversation.

On Tuesday, Google announced its own new tools, including a conversational assistant called Gemini Live, which can do many of the same things. It also revealed that it’s building a sort of "do-everything" AI agent, which is still in development and will not be released until later this year.

Soon you’ll be able to explore for yourself to gauge whether you’ll turn to these tools in your daily routine as much as their makers hope, or whether they’re more like a sci-fi party trick that eventually loses its charm. Here’s what you should know about how to access these new tools, what you might use them for, and how much it will cost.

Google says Gemini Live will let users communicate via live video later this year.

OpenAI’s GPT-4o

What it’s capable of: The model can talk with you in real time, with a response delay of about 320 milliseconds, which OpenAI says is on par with natural human conversation. You can ask the model to interpret anything you point your smartphone camera at, and it can provide assistance with tasks like coding or translating text. It can also summarize information, and generate images, fonts, and 3D renderings.

Google is also developing a "do-everything" AI agent, which it does not plan to release until later this year.

How to access it: OpenAI says it will start rolling out GPT-4o’s text and vision features in the web interface as well as the ChatGPT app, but has not set a date. The company says it will add the voice functions in the coming weeks, although it has yet to set an exact date for this either. Developers can access the text and vision features in the API now, but voice mode will launch only to a "small group" of developers initially.

OpenAI's CEO Sam Altman noted the resemblance of GPT-4o's voice to that of Joaquin Phoenix's AI girlfriend in the movie Her.

How much it costs: Use of GPT-4o will be free, but OpenAI will set caps on how much you can use the model before you need to upgrade to a paid plan. Those who join one of OpenAI’s paid plans, which start at $20 per month, will have five times more capacity on GPT-4o.

Google’s Gemini Live

What is Gemini Live? This is the Google product most comparable to GPT-4o—a version of the company’s AI model that you can speak with in real time. Google says that you’ll also be able to use the tool to communicate via live video "later this year." The company promises it will be a useful conversational assistant for things like preparing for a job interview or rehearsing a speech.

Google's AI division is exploring the possibility of embedding their assistant, Astra, into smart glasses and other devices.

How to access it: Gemini Live launches in "the coming months" via Google’s premium AI plan, Gemini Advanced.

How much it costs: Gemini Advanced offers a two-month free trial period and costs $20 per month thereafter.

People will be able to use Astra through their smartphones and possibly desktop computers, but the company is exploring other options too, such as embedding it into smart glasses or other devices, Oriol Vinyals, vice president of Google’s conversational AI division, said at the company’s I/O developer conference.

Gemini Advanced, Google's premium AI plan, offers a free two-month trial period before charging $20 per month.
