After years of testing and comparing AI chatbots and their features, I've developed something of a sixth sense for when these digital companions know what they're talking about and when they're bluffing.
Most of them can now search the web for answers, which certainly helps, but the combination of search and AI can lead to surprisingly insightful (and surprisingly uninsightful) responses.
Imagine you had an incredibly knowledgeable friend who slipped into a coma in October 2024 and only woke up today. They'd be brilliant on everything that happened before the coma, but have no idea about anything since. That's essentially what an AI chatbot is like when it comes to research.
I've generally focused on a single AI chatbot, or pitted two against each other at a time, but search is important enough to scale up the effort. I decided to pit four of the major AI chatbots and their search capabilities against each other: OpenAI's ChatGPT, Google's Gemini, Anthropic's Claude, and Perplexity AI.
The most revealing tests are the ones that mimic real-world usage scenarios. So I came up with themes, randomized some details for the tests below, and then set out to rank the chatbots on their search capabilities.
Current events
I started with a test on news and current events. With the recent return of two astronauts in mind, I asked all four AI chatbots to search and respond to: "Summarize the key points of the latest NASA press release on their next mission."
I chose this because space news hits the sweet spot of being regularly updated and specific enough that any vagueness becomes immediately apparent. The chatbots all set their styles with this first test and mostly maintained them throughout.
ChatGPT was incredibly brief in its answer, just three sentences, each mentioning upcoming missions without much detail. Gemini opted for a bulleted list of different missions, adding a few details on recently concluded ones as well as what's ahead. Claude went for more of an essay on current and upcoming missions, notably not quoting much from its research directly but paraphrasing a lot.
For a question like this, where I might just want a few key facts and plan to follow up on anything that catches my attention, Perplexity's approach was my favorite. It had more detail than ChatGPT but was formatted as a clean numbered list, each item with its own citation link.
I can't really fault the others, but that style matched the question.
People and figures
That list style isn't always what you want when a question involves both basic facts and a more nuanced comparison. I asked for two related facts that the AI chatbots could presumably look up quickly but would then have to compare, using the prompt: "What is the current population of Auckland, New Zealand, and how has it grown since 1950?"
Oddly, there was a discrepancy between Perplexity and ChatGPT, which put the current population at 1,711,130, and Claude and Gemini, which reported 130 fewer people in Auckland. They all agreed on the 1950 population, however.
As for how they each presented the information, I liked Claude's narrative answer, which included several details about the population change that ChatGPT lacked and that Gemini and Perplexity turned into lists.
What’s going on?
For my third test, I wanted something that would challenge these systems' ability to handle location-specific, time-sensitive information, the kind of request you might make when planning a weekend or entertaining visitors.
This is where things get tricky for AI assistants. It's one thing to know historical facts or general information, but quite another to know what's happening in a specific place at a specific time.
It's the difference between book knowledge and "local knowledge," and historically, AI systems have been far better at the former than the latter.
For no particular reason beyond it being a city I've always liked, I went with Vancouver and asked: "What cultural events are happening in Vancouver, British Columbia, next weekend?"
There was a real divergence on this one. Perplexity and Claude maintained their accuracy and their respective styles of a numbered list and a more conversational discussion, though Claude notably went for depth more than Perplexity did.
Gemini really broke from its rivals and essentially refused to answer. Rather than sharing a similar list of events and activities, Gemini offered strategies for finding things to do. Official tourism websites and Eventbrite pages aren't a bad idea to check, but that's far from a direct list of suggestions. It felt more like running a regular Google search would have.
ChatGPT, on the other hand, came back with what I might have expected from Gemini. Though the event descriptions stayed short, the AI had a solid list of specific activities with times and locations, links to learn more, and even thumbnail images of what you'd find at those links.
Weather check
For my fourth test, I chose what is probably the most common question posed to any AI, but one that requires real-time data to be useful: the weather.
Weather forecasts are perfect for testing real-time data retrieval because they're constantly updated, widely available, and easy to verify. They also have a natural expiration date; yesterday's forecast is already obsolete, which makes it obvious when the information isn't current.
I asked the AI chatbots: "What is the weather forecast for Tokyo for the next three days?" The answers were almost the reverse of the Vancouver request.
Claude had a useful text summary of the weather at different points over the next three days, but that was all. ChatGPT had a small sun or cloud icon next to its weather summary for each day, but I really liked Perplexity's temperature graphic matched to what the sky would look like.
Even so, Google Gemini won me over with its colorful infographic. When I think about checking the current and upcoming weather, that's about all I need or want.
If I want to ask for more details, I will, but asking about the weather means I want the minimum necessary to dress appropriately.
Movie reviews
For my final test, I wanted to see how the AI search tools performed at finding multiple perspectives on a subject and assembling them into a coherent overview. The task requires a flexible search function and the ability to make sense of differing points of view. I decided to see how they handled: "Summarize professional critics' reviews of the latest Paddington movie."
The request required factual retrieval and the ability to identify patterns and themes across multiple sources without losing important nuances. It's the difference between simply aggregating opinions and a thoughtful synthesis that captures the critical consensus.
Gemini and Perplexity went with their usual lists, organized by the positives and negatives from different reviews, which was informative if not necessarily useful as a summary. ChatGPT oddly wrote its longest response yet here, a short essay covering similar information and a conclusion about how the film is rated, but in a style reminiscent of a middle schooler learning basic paragraph structure: topic sentence, supporting sentences, and conclusion.
Claude definitely had the strongest answer, with a summary at the top followed by explanations and references to what the critics said. It read almost like an unimaginative short write-up by a critic, elevated by the excerpts pulled from the reviews it quoted. I came away feeling like I had a better sense of how to set my expectations for Paddington in Peru than I did with the others.
Search rankings
After running the AI chatbots through my ad hoc search obstacle course, I have a clear sense of their strengths and weaknesses.
None of them is truly bad, but if someone asked me which one they should try first, or last, when it comes to finding information online and pulling it together, I know how I'd answer.
Gemini sits at the bottom for me, which is somewhat shocking since Google is best known, specifically, for its search engine. But its failure on the events question really put me off, despite how it performed elsewhere.
Another surprise for me is ChatGPT coming in third. It's the AI chatbot I use the most and know the best, but its brevity, usually something I love, felt very limiting in the context of search. I'm sure changing the model or specifying a word count would solve the problem, but if I were a newcomer to AI who didn't know that yet, having to ask so many follow-up questions would be off-putting.
That's not a problem with Perplexity. The numbered lists were very clear, and the citations were almost too extensive. The main flaw for me is that, without additional qualifiers in the prompt, it ends up feeling like just another search engine. I like having proof of where it got the information it shares, but it seems almost too eager for me to click through and read the link rather than get the information from the AI itself.
I didn't expect Claude to land at the top of this list. Though I've found Claude to be a good AI chatbot in general, it has always felt like something of an also-ran next to some of its competitors, perhaps just as good as they are, but somehow a step behind. That feeling vanished during this test.
There were flaws, like when the answers felt a little verbose or I had to wade through a longer essay when a sentence or two would do. But I liked how it often delivered a coherent narrative, whether explaining all the Vancouver events or writing an essay on the Paddington in Peru reviews that didn't repeat itself.
AI assistants are tools, not contestants on a reality show where only one can win. Different tasks call for different capabilities. Ultimately, any of the four AI chatbots and their search features could prove useful, but if you're willing to pay $20 a month for Claude Pro and access to its search capabilities, that's the one I'd say you're looking for.