
The Truth About Those Misleading Headlines Proclaiming That Humans Outdo AI When It Comes To Combatting Human Loneliness


Studying the impact of AI on human loneliness is gaining traction and revealing important insights.


In today’s column, I examine the latest research on whether generative AI and large language models (LLMs) can aid people in reducing a personal sense of loneliness. The truth is that in some ways yes, and in other ways no; it is a mixed bag that depends on significant factors. I say this because some in the media are misinterpreting or overinflating new research on this crucial matter and making outrageous and false claims that humans always outdo AI when it comes to battling human loneliness.

Not so.

Here’s the backstory. A new study on this crucial topic was recently published in a prestigious journal and made quite a splash in the news. Numerous blaring headlines offered misleading assertions and characterizations about the impacts of AI usage on loneliness. I have serious doubts that those covering the story were fully familiar with what the study actually consisted of. Many probably just dipped into a few sentences here or there and opted to run with clickbait-style headlines.

I will share with you the true essence of the study. In addition, I go beyond the study and provide additional contextual considerations. There are important lessons to be learned about how to best interpret these kinds of studies. Plus, I hope that my remarks will aid those actively conducting new research in this field. I give helpful suggestions and discerning ideas on ways to formulate these types of studies.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

AI And Mental Well-Being

As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that produces mental health advice and performs AI-driven therapy. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For an extensive listing of my well over one hundred analyses and postings, see the link here and the link here.

There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas come into these endeavors, too. I frequently speak up about these pressing matters, including in an appearance on an episode of CBS’s 60 Minutes, see the link here.

AI Providing Mental Health Guidance

Millions upon millions of people are using generative AI as their ongoing advisor on mental health considerations (note that ChatGPT alone has over 900 million weekly active users, a notable proportion of which dip into mental health aspects, see my analysis at the link here). The top-ranked use of contemporary generative AI and LLMs is to consult with the AI on mental health facets; see my coverage at the link here.

This popular usage makes abundant sense. You can access most of the major generative AI systems for nearly free or at a super low cost, doing so anywhere and at any time. Thus, if you have any mental health qualms that you want to chat about, all you need to do is log in to AI and proceed forthwith on a 24/7 basis.

There are significant worries that AI can readily go off the rails or otherwise dispense unsuitable or even egregiously inappropriate mental health advice. Banner headlines last year accompanied the lawsuit filed against OpenAI for their lack of AI safeguards when it came to providing cognitive advisement.

Today’s generic LLMs, known as general-purpose AI, such as ChatGPT, GPT-5, Claude, Gemini, Grok, Copilot, and others, are not at all akin to the robust capabilities of human therapists. Meanwhile, specialized LLMs are being built to attain those desired qualities, though such AI is still primarily in the early development and testing stages. For more about purpose-built AI apps in mental health, see my in-depth coverage at the link here and the link here.

Need For Empirical Research

The prudent way to study AI and mental health is to take a robust scientific approach to the matter at hand. It is customary to make use of a randomized controlled trial (RCT) when conducting research in the field of psychology overall. I have repeatedly noted in my writings and talks that empirical research on the use of modern-era AI-based mental health apps is exceedingly sparse to date; see my in-depth discussion about this at the link here.

This is an unfortunate situation. The wheels of the world are avidly turning, and people are dipping into generic LLMs and customized LLMs for mental health support, yet we really do not know whether this is either good for society or perhaps undermining society. If the AI is doing a proper job of giving out psychological guidance, then we are undoubtedly benefiting from the advent of contemporary generative AI. On the other hand, if the downsides are significant and override the upsides, we are making a ghastly at-scale mistake.

Policymakers and lawmakers require reliable and well-designed research on AI and mental health so that they can proceed on a highly informed basis to craft suitable laws and promote practical policies. If they do not have proper analyses available, they are left to rely on loosey-goosey recommendations. There are plenty of talking heads who spout off about this or that when it comes to the impacts and direction of AI and mental health. Personal biases and off-the-cuff opinions abound.

Prime Examples Of Empirical Studies

I’ve been doing a series of analyses that showcase some of the best empirical work so far in the AI and mental health arena.

For example, I did an in-depth analysis of a now-classic research study that used a randomized controlled trial (RCT) for studying AI and mental health and was published in the New England Journal of Medicine AI, see the link here. This was a formative RCT study in the AI and mental health realm, particularly since it involved a notable generative AI chatbot known as Therabot. I discussed the ins and outs, pros and cons, associated with how such gold standard research studies are contributing to a budding knowledge base on this pressing topic.

In another one of my analyses, I reviewed an RCT study that involved a chatbot known as PATH, see the link here. In this instance, subjects were assessed via standardized tests for anxiety and depression using the widely utilized GAD-7 and PHQ-9 tests. The results indicated that the intervention or treatment group using PATH fared better by having lower GAD-7 and PHQ-9 scores than the control group. We can tentatively surmise that the treatment had a positive effect.
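
To make that kind of comparison concrete, here is a minimal sketch in Python of how a two-group difference in post-intervention scores is commonly tested. The GAD-7 numbers are simulated purely for illustration and are not the PATH study's actual data or effect sizes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated post-intervention GAD-7 scores (0-21 scale); purely
# illustrative, not the PATH study's actual data or effect sizes.
treatment = rng.normal(loc=6.0, scale=3.0, size=50).clip(0, 21)
control = rng.normal(loc=8.5, scale=3.0, size=50).clip(0, 21)

# Welch's two-sample t-test: lower anxiety scores in the treatment
# group are consistent with a positive intervention effect.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```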

Another of my analyses probed the use of waitlist controls when performing empirical research on AI and mental health; see the link here. When conducting carefully crafted experiments on AI and mental health, there are numerous choices that can be made about the design of a control group. In a typical study, the treatment group is presumably going to be asked to use AI as the principal treatment element. The control group, then, is going to serve as a fundamental comparison to the treatment group. The design doesn’t have to go that way, and numerous alternative setups are equally suitable.
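
For illustration, here is a bare-bones Python sketch of the allocation mechanics behind such designs, randomly assigning subjects across a treatment arm and two comparison arms. The arm names are hypothetical, and real studies commonly use blocked or stratified randomization rather than a simple shuffle.

```python
import random

def assign_arms(participant_ids,
                arms=("ai_chatbot", "human_peer", "journal_control"),
                seed=7):
    """Randomly allocate participants to study arms in roughly equal
    proportions. A simple shuffle-and-cycle scheme; real RCTs often
    use blocked or stratified randomization instead."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: arms[i % len(arms)] for i, pid in enumerate(ids)}

# 296 subjects, matching the sample size in the loneliness study.
allocation = assign_arms(range(296))
```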

Recently Published Empirical Study

You are now ready and primed to dive into this latest research study that was recently published – a notable journal article entitled “Is A Random Human Peer Better Than A Highly Supportive Chatbot In Reducing Loneliness Over Time?” by Ruo-Ning Li, Dunigan Folk, Abhay Singh, Lyle Ungar, Elizabeth Dunn, Journal of Experimental Social Psychology, July 2026, which made these salient points (excerpts):

  • “A growing body of empirical evidence suggests that people experience immediate psychological benefits from interacting with chatbots.”
  • “In this pre-registered study, we tested the effectiveness of a chatbot versus a human peer in reducing loneliness among 296 students in their first semester of university.”
  • “For two weeks, participants either interacted with a chatbot or a human peer, or simply wrote a brief journal entry (control condition).”
  • “At 9 PM each day, participants received a text message reminding them to interact with their conversation partner (or write their journal entry) on Discord if they had not already done so.”
  • “Participants who messaged with human partners reported significantly lower post-study loneliness (M = 1.85, SE = 0.03) compared to those in the control condition (M = 2.00, SE = 0.04), and compared to those in the AI condition (M = 1.98, SE = 0.04).”
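
As a rough plausibility check on those reported figures, and treating the groups as independent (which ignores the study's actual statistical model and covariance structure), dividing each difference in means by the combined standard error yields approximate z-statistics of about 3.0 for human vs. control and 2.6 for human vs. AI:

```python
import math

def approx_z(m1, se1, m2, se2):
    """Approximate z-statistic for a difference of two means, assuming
    independent groups; a back-of-the-envelope check only."""
    return (m1 - m2) / math.sqrt(se1**2 + se2**2)

# Post-study loneliness means and standard errors as reported.
human = (1.85, 0.03)
control = (2.00, 0.04)
ai = (1.98, 0.04)

print(f"control vs human: z ~ {approx_z(*control, *human):.1f}")  # ~3.0
print(f"AI vs human:      z ~ {approx_z(*ai, *human):.1f}")       # ~2.6
```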

The research paper is well worth reading and extensively goes into much detail about the RCT design. By and large, the study found that for the experimental setup they devised, those in the group who interacted via text with fellow students scored on various measures as having less loneliness at the end of the study than did those who were using the AI chatbot.

Mass media leaped to the rash conclusion that this was somehow ironclad proof that AI does a worse job of combating loneliness than fellow humans would do. We should take a reflective moment to closely explore the study and consider what reasonable conclusions and takeaways are viable.

Experimental Designs Involve Tradeoffs

Before I unpack some of the aspects of this notable research, I’d like to emphasize that this paper is yet another wonderful addition to the growing body of research needed in this realm. The researchers were quite clever in their RCT design. They did an amazing job. They methodically analyzed the findings and offered strengths and weaknesses associated with the approach undertaken. Etc.

That being said, all empirical studies involve tradeoffs. How many subjects to include? How should subjects be identified for the study? How long should the experiment last? What type of AI should be used? And so on.

I am not going to repeat their extensive analyses and will instead provide some additional points that I believe are worthy of added attention. Think of this as fellow colleagues civilly offering ways to understand what a study exhibits. This isn’t a right or wrong kind of consideration. Instead, it simply adds a keenly interested third-party perspective on these complex and vital matters.

Exploring The Nature Of The Subjects In The Study

Let’s start by exploring the nature of the subjects of the experiment.

The researchers intentionally chose the subjects in this experiment as first-semester college students at a Canadian university. An upside is that there are plenty of other studies suggesting that college students commonly report a strong sense of loneliness. In that sense, college students are a fruitful set of subjects for an experiment associated with lived feelings of loneliness. They are also relatively easy to recruit for experiments (they were offered course credit or $20 to participate).

There are various possible complications regarding such subjects.

Having been a professor for many years, my experience is that in the first year, and especially the first semester, students are excited about college and eager to get underway. They don’t seem likely to have yet encountered much loneliness at college, which typically sets in by the second or third year. The gist is that it might be worthwhile to consider doing a similar study with students who are further along in their college days and would seem more likely to have loneliness firmly on their minds.

Another consideration is that the students self-selected into the study. The researchers indicated they posted about the study by stating that the research was about student experiences in college. An important question is whether the students who stepped forward are predisposed to characteristics that might relate to the factors of the study, such as being more or less prone to loneliness, more or less prone to socializing, more or less prone to using AI, and so on.

Various other facets are worth mulling over. According to the presented statistics, 72% of the subjects were indicated as female. Colleges typically are around 50/50 when it comes to females and males, though female attendance proportions have been rising. In any case, are there any potential complications about generalizing from a principally female student set of subjects to what both males and females might do when it comes to loneliness?

The subjects were around 18 years old. They were attending a Canadian university. Might there be age-dependent facets about 18-year-olds when it comes to perceptions of loneliness? Likewise, perceptions about the use of AI? This raises the question of whether their responses are applicable to other ages, such as people in their 20s, 30s, 40s, and so on. Can we reasonably generalize in this instance? Also, might there be cultural embeddings associated with either being Canadian or being in attendance at a Canadian college? Etc.

The bottom line is that news headlines claiming this study can be fully generalized to somehow represent all humans, despite the cultural, age, gender, and other nuances of the subjects at hand, are making quite a leap of logic.

Additional Factors Of Consideration

When the study was undertaken, the AI that was chosen for use in the study was GPT-4o mini. That is now a somewhat dated version of OpenAI’s generative AI models. It would be interesting to see how GPT-5 would fare. Might a more advanced and robust AI chatbot do a better job at interacting and aiding in overcoming human loneliness? Maybe.

I was glad to see that the researchers gave instructions to GPT-4o mini about how it was to respond to the subjects overall. I say that I am glad of that because some studies just leave the AI at its defaults, which I think is a disservice to properly representing what AI can do.

There was, though, a twist that I would have suggested might be included. It’s this. The researchers nicely provided in their paper the full set of instructions, and upon inspecting those instructions, I noticed something that seems to be potentially missing. The instructions do not provide any directives for conversing on the specifics of the college the students are attending. This might have been crucial as a means of leveling the playing field.

The subjects who interacted with fellow students could easily converse about their in-common experience at the specific university they were attending. Indeed, the researchers noted that the human-to-human interactions at times led to impromptu study session get-togethers, volleyball matches, etc. This undoubtedly would give a decided edge to the human-to-human group because they are equally immersed in the particulars at that college.

If GPT-4o mini were infused with detailed information about the specific college, including college courses underway, events on the campus, and the like, the AI would be on a more even footing with the human-to-human interactions. Otherwise, the AI merely talks blandly about college in general and lacks the specifics associated with the interests of the subjects in the study.
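
As a sketch of what that kind of infusion could look like, the following uses the OpenAI Python SDK to fold campus-specific details into the chatbot's system prompt. The university name, events, and wording are illustrative assumptions, not the study's actual instructions, and the call assumes an OPENAI_API_KEY is set in the environment.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical campus-specific grounding; a real deployment might pull
# this from the university's course catalog and events calendar.
campus_context = (
    "You are chatting with a first-semester student at Example University. "
    "Current campus happenings: intramural volleyball sign-ups, midterm "
    "study groups in the main library, and the fall club fair on Friday."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Be a warm, supportive conversation partner. " + campus_context},
        {"role": "user",
         "content": "Honestly, I haven't really met anyone here yet."},
    ],
)
print(response.choices[0].message.content)
```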

One additional consideration about leveling the playing field is that the subjects met each other face-to-face at the initial lab visit, getting to meet their paired counterparts for the study in person. How much might this have instantly steered the subjects toward favoring the human-to-human interactions? An alternative approach might have been for the subjects to never meet face-to-face, or perhaps to meet remotely online, though there are tradeoffs whichever way you go on this.

Important AI Upsides

The human-to-human group and the human-to-AI group each exchanged around 8 to 10 messages per day. This is interesting. You might expect that the daily messaging would be more frequent in the case of the human-to-human group. Maybe subjects didn’t want to hound their fellow college mates, or they themselves were too busy to do more messaging. In any case, AI and humans were on par in the frequency of messaging as initiated by the subjects themselves. Score a point for AI.

Furthermore, there wasn’t any significant difference, as rated by the subjects, about a sense of closeness when it came to the human-to-human and human-to-AI groups. Again, that scores a point for the AI. The subjects also reported that the AI had the highest levels of expressed empathy overall. I’ve repeatedly indicated that AI can express empathy and that humans often accept this “artificial” empathy as being essentially real, see my discussion at the link here.

The study ran for 14 days, a two-week experiment. Is that a long enough time period? You can certainly suggest it is better than an experiment that lasts only one day or maybe just a week. But it is hard to extrapolate what might have happened if the study had lasted an entire semester or the full school year.

The curious aspect is that when the subjects were told they could optionally continue for an additional week, extending to 3 weeks in total, 14% of those using the AI continued, while 33% of those doing the human-to-human interactions continued. One perspective is that the 14% is a lot less than the 33%. I look at this somewhat differently. Why did only roughly one-third of the human-to-human pairs want to continue their interactions? To me, that’s quite an implied statement about their sense of connection to their paired counterpart. You would assume they would readily continue interacting with their human counterpart, especially since it was only the first semester, and they have another three and a half years to go with that person at the same college.

The World We Are In

Again, I applaud the researchers for undertaking an important study on the ins and outs of AI and mental health. We need more studies of this nature. I’ll continue to keep you posted on the latest research efforts.

The outlets behind the blaring headlines, having misstated or failed to do their homework on what the study actually involved, ought to be ashamed of their clickbait tactics. False narratives about the impacts of AI must be kept to a minimum. AI for mental health is abundantly a dual-sided proposition. AI can achieve at scale that which is good for human mental health, thankfully, but can also have sizable downsides if not suitably designed and deployed.

As per the wise words of Albert Einstein: “The important thing is to never stop questioning.”

Source: https://www.forbes.com/sites/lanceeliot/2026/05/13/the-truth-about-those-misleading-headlines-proclaiming-that-humans-outdo-ai-when-it-comes-to-combatting-human-loneliness/
