ChatGPT already has “eyes” and can talk about what surrounds us in real time. Science fiction begins to become reality


It hit theaters more than a decade ago: ‘Her’, a film that immersed us in a future where artificial intelligence (AI) assistants not only interacted with users by voice but could also “see” the environment through the camera. At the time, Siri had only been around for two years, and what the film proposed felt like a still very distant future. OpenAI is shortening the path to that reality.

ChatGPT can finally see what happens around us. Users can give it access to the camera when using advanced voice mode, and the chatbot will process the images in real time thanks to the multimodal GPT-4o model. A screen-sharing option will also be available so that it can obtain real-time data from the applications we are using.
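The ChatGPT app handles all of this natively, but as a rough illustration of the underlying idea — pairing a camera frame with a question for a multimodal model such as GPT-4o — here is a minimal sketch of how such a request body could be assembled. This is an assumption-laden illustration: the `build_vision_request` helper is hypothetical, and the real-time voice mode streams video through a dedicated interface rather than one-off requests like this.

```python
import base64

# Sketch: how a single camera frame could be packaged for a multimodal
# model such as GPT-4o in a Chat Completions-style request body.
# (Illustrative only; ChatGPT's advanced voice mode uses a streaming
# real-time interface, not one-off requests like this.)

def build_vision_request(image_bytes: bytes, question: str) -> dict:
    """Return a request payload pairing a user question with an image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4o",  # assumed model name for illustration
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
    }

# Example: package a (fake) JPEG frame with a question about the scene.
payload = build_vision_request(b"\xff\xd8fake-jpeg-bytes",
                               "What objects are on the table?")
print(payload["model"])
```

The key point the demo illustrates is exactly this pairing: the model receives both the visual frame and the spoken question, and reasons over them together.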

ChatGPT can now process video in real time

Once it is available on our device, starting to use this new capability will be very simple. We simply have to open the ChatGPT application and press the button in the upper right corner to start advanced voice mode. The next step is to tap the camera button. The interface also includes a button to switch between the front and rear cameras if we are using a phone.

OpenAI’s chatbot can now be much more useful. For example, we can ask it to help us perform certain tasks. A member of the team gave a demonstration this Thursday: he asked ChatGPT to show him, step by step, how to make filter coffee. The model was able to recognize each of the objects on the table in real time and guide the person throughout the process. The latency seemed negligible.

When the company announced this functionality in May of this year, it presented many other usage scenarios. Among them were a father solving math problems with his son, people playing rock, paper, scissors, and even an excited ChatGPT meeting a dog. One of the most interesting examples came from the accessibility options, which allowed the environment to be described in simple language.


To share our screen with the chatbot, we simply have to press the three-dot menu and then select Share screen. We will have to wait to see whether the chatbot’s vision capabilities meet expectations, and, like any AI model, it can make mistakes. In any case, OpenAI demonstrates that it remains at the forefront of the artificial intelligence race.

OpenAI says the new video mode will be available in the coming days “in most countries” for users of ChatGPT Plus ($20 a month) and ChatGPT Pro ($200 a month). If you are reading this from Spain, the company has indicated that it hopes to offer this feature in the European Union “soon.” For now, there is no arrival date in the countries of the community bloc, apparently due to regulatory issues.


GPT-4o is able to explain to us what it sees. This can be an exceptional aid for blind people.

Altman said this week in a message on X that some of OpenAI’s products may arrive later in Europe, and added that the company will likely be unable to offer some of them there at all. “We want to offer our products in Europe and we believe that a strong Europe is important for the world. We also have to comply with the regulations,” said the executive in the same message.

Images | OpenAI | Screenshot