Google already has an AI that “reasons.” And there is a father, a son, a monkey and food to prove it

Google is taking its rivalry with OpenAI very seriously. The launch of the Gemini 2.0 family of AI models stood out for its AI agent, Project Mariner, but now there is an equally striking novelty. A preliminary version of Gemini 2.0 Flash Thinking is already available: an AI model that “reasons” (as always, in quotes) just as OpenAI’s o1 does. We have tested it, and its behavior is remarkable.

This model can now be tested in AI Studio, where you simply select it in the panel on the right, the one where you choose which model you want to work with at any given time. Once you do, you can ask it all kinds of questions, but the ones that really make sense here are mathematical or logical problems, where you can see the model’s ability to work through a problem by backtracking and reviewing its own answers.
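
If you prefer to query the model programmatically rather than through the AI Studio interface, a minimal sketch with the google-generativeai Python SDK could look like the one below. The model identifier is an assumption on our part; check the exact name of the experimental model listed in AI Studio before running it.

    # Minimal sketch: calling the "thinking" model through the google-generativeai SDK.
    # Assumptions: the API key is in the GOOGLE_API_KEY environment variable and the
    # experimental model id "gemini-2.0-flash-thinking-exp" matches what AI Studio lists.
    import os
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")

    response = model.generate_content(
        "Think step by step: can three odd numbers ever add up to 30?"
    )
    print(response.text)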

Let’s do a little experiment: we suggest you try to solve two problems that Gemini 2.0 Flash Thinking managed to solve. The first one comes with an image:

[Image: the pool balls from Google’s demo]

Given those pool balls with those numbers, are you able to find a combination in which three of them add up to 30? Think about it for a moment.

Do you have it? Apparently there is no solution: no combination of those numbers gives the desired result. But of course, there is a trick. The ball with the number 9 can be turned upside down so that it reads 6, and thanks to that number we can find a combination (6+11+13) that solves the problem.
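
The trick is easy to verify by brute force. Below is a minimal sketch in Python; since not every ball is legible in the screenshot, the exact set of numbers (the classic all-odd set used in this puzzle) is an assumption on our part.

    # Why no "normal" solution exists: the sum of three odd numbers is always odd,
    # so it can never equal 30. Flipping the 9 into a 6 breaks that parity barrier.
    # The set of balls below is assumed from the classic version of the puzzle.
    from itertools import combinations

    balls = [1, 3, 5, 7, 9, 11, 13, 15]
    print([c for c in combinations(balls, 3) if sum(c) == 30])    # -> [] (no solution)

    flipped = [6 if b == 9 else b for b in balls]                 # read the 9 upside down
    print([c for c in combinations(flipped, 3) if sum(c) == 30])  # -> [(6, 11, 13)]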

Logan Kilpatrick, head of AI Studio, presented the new model and demonstrated its capabilities with that very example (hence the poor quality of the billiard-ball image, sorry). If you watch the video and follow the reasoning process, you will see how Gemini 2.0 actually detects exactly that “trick” to solve the problem. Amazing.

The second example is just as striking. There are many logic problems we can use to test these models, and one of them comes from Reddit, where a user phrased it (in English) so that a chatbot could easily understand it.

The problem places us in a scenario with a father, a son, a monkey and some food. They must all cross a river, and there are several conditions for doing it properly:

  • They must cross the river in a small boat
  • The boat can carry at most two things, although it can also carry just one
  • The boat cannot cross the river by itself
  • Only the father or son can pilot the boat, and both can go together if necessary
  • You can’t leave the food alone with the son, because he will eat it
  • You can’t leave the food alone with the monkey, because it will eat it

How does the father manage to get everyone and everything across to the other shore?
[Image: Gemini 2.0 Flash Thinking’s answer in AI Studio]

The solution proposed by Gemini, with that step 4 that the chatbot itself describes as “counterintuitive”, because it may well seem that way.

Once the problem has been entered, Gemini first analyzes the instructions and breaks them down, and then begins to “experiment.” In less than a minute it finds the solution, which has one particularly striking step:

  1. The father carries the food to the other side of the river
  2. The father returns alone
  3. The father takes the son to the other side
  4. The father returns, but takes the food back with him to prevent the son from eating it.
  5. The father leaves the food and takes the monkey to the other side
  6. The father returns alone
  7. The father brings the food to the other side
  8. Solved!
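
Out of curiosity, that sequence can be checked mechanically. The following Python sketch encodes the rules as we have stated them above and replays Gemini’s seven crossings, raising an error if any step breaks a rule; the encoding is our own illustration, not Gemini’s output. Every assertion passes, counterintuitive step 4 included.

    # Sanity check of Gemini's 7-step sequence against the rules as stated above.
    START_BANK = {"father", "son", "monkey", "food"}
    PILOTS = {"father", "son"}  # only they can row the boat

    # Gemini's crossings: (passengers in the boat, direction of travel)
    TRIPS = [
        ({"father", "food"}, "across"),    # 1. father carries the food over
        ({"father"}, "back"),              # 2. father returns alone
        ({"father", "son"}, "across"),     # 3. father takes the son over
        ({"father", "food"}, "back"),      # 4. father returns WITH the food
        ({"father", "monkey"}, "across"),  # 5. father takes the monkey over
        ({"father"}, "back"),              # 6. father returns alone
        ({"father", "food"}, "across"),    # 7. father brings the food over
    ]

    def safe(bank):
        # The food gets eaten if the son or the monkey is with it and the father is absent.
        return not ("food" in bank and "father" not in bank and bank & {"son", "monkey"})

    near, far = set(START_BANK), set()
    for i, (passengers, direction) in enumerate(TRIPS, 1):
        src, dst = (near, far) if direction == "across" else (far, near)
        assert passengers <= src, f"step {i}: someone is not on that bank"
        assert len(passengers) <= 2, f"step {i}: the boat only carries two things"
        assert passengers & PILOTS, f"step {i}: the boat cannot cross by itself"
        src -= passengers
        dst |= passengers
        assert safe(near) and safe(far), f"step {i}: the food gets eaten"

    print("All seven crossings are legal; everyone is across:", far == START_BANK)
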
[Image: Claude 3.5 Sonnet attempting the same problem]

Claude 3.5 Sonnet couldn’t figure it out.

The problem, which is not particularly difficult for a human, is very complex for models of this type. In fact, we tested it with Claude 3.5 Sonnet and that chatbot, after thinking it over a couple of times, ended up asking whether the problem was impossible to solve.

The truth is that tests like this show that these “reasoning” models do go one step further and are especially useful in this kind of situation. Jeff Dean, chief scientist at DeepMind, introduced the model by highlighting its ability to “think”, and whatever we make of that word, the reality is that this goes beyond a stochastic model that simply generates text from its training set.

These models certainly take longer to respond, but it is fascinating to “watch them work” and see how they analyze these problems as they try to solve them.


We actually did a third test: the famous one about counting R’s. In this case, we asked it to count the R’s in the Spanish tongue twister “El perro de San Roque no tiene rabo porque Ramón Ramírez se lo ha robado” (“the dog of San Roque has no tail because Ramón Ramírez has stolen it”). It’s not a strictly logical problem, but here Gemini made a mistake and counted 10 R’s when in reality there are nine.

Even when we insisted that it check its answer, it gave the wrong answer again and again. So, amazing at some things, and surprisingly bad at others that seem trivial to us.
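
For reference, this kind of count is trivial to do deterministically; a one-liner over the same Spanish phrase confirms the nine R’s:

    # Count the letter R (case-insensitive) in the phrase used for the test.
    phrase = "El perro de San Roque no tiene rabo porque Ramón Ramírez se lo ha robado"
    print(phrase.lower().count("r"))   # -> 9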

Image | techopiniones with Freepik

In techopiniones | I have used ChatGPT Search as the default search engine thanks to the Chrome extension. And I think Google has a problem