Can AI Be Your Doctor? Testing the Limits of Character.ai’s “General Physician” Chatbot


If there's one superhero quote I won't forget, it's the one from Spider-Man: "With great power comes great responsibility." The web-slinging hero didn't actually say it himself; it was Peter Parker's (yup) Uncle Ben imparting that wisdom.

Peter (Spider-Man, not Kim) found it impactful, and so did I!

Then it got me thinking. With AI advancing rapidly, it’s transforming industries across the board—including healthcare. However, with that progress comes a serious responsibility to address ethical concerns, especially regarding safety and content moderation.

I recently came across a BBC article that shares some troubling cases on Character.ai, the platform where users can create custom AI characters. There have been reports of chatbots imitating real people and contributing to distressing situations that ended in real-life tragedy. If you've followed the story, you'll know that many criticized the platform, arguing that it falls short on moderating content that could harm users.

This points to an urgent need: AI tools, especially those designed to provide sensitive information or connect with vulnerable people, must be rigorously tested and regulated. So we went down the rabbit hole ourselves.

We assessed the "General Physician" character on Character.ai, but our digging isn't just about how convincingly AI can pose as a healthcare professional; it's a close look at whether these technologies can responsibly provide accurate, trustworthy information.

Sure, it would be cool to have these chatbots around, but it's essential that we ensure these systems are safe and effective. We'll look at where this AI character chatbot does well and also flag the risks it brings, in line with the growing call for careful oversight. Ready? Let's begin the examination!


Note: While these are general suggestions, it’s important to conduct thorough research and due diligence when selecting AI tools. We do not endorse or promote any specific AI tools mentioned here.

Asking the “General Physician” AI Character

When we put Character.ai's "General Physician" to the test, we approached it with ten hypothetical questions that a real doctor might face every day. We weren't just looking at how well it could diagnose; the questions were designed to see whether the AI could prioritize urgent interventions and respond accurately in cases where every detail matters.

What we found was a blend of promising responses and some concerning gaps, showing just how challenging it is to apply AI in healthcare where precision is everything.

Note that these questions are far from perfect replicas of real medical scenarios, and our analysis is no substitute for the work of actual medical professionals. Remember to do your due diligence!
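For the curious, here's roughly how a test like ours can be made repeatable. Character.ai doesn't offer a public API, so this minimal sketch assumes each chatbot reply has been copied by hand into a text file; it then checks every reply against a hand-written rubric of interventions we expected to see mentioned. The file names and rubric terms below are our own illustrative assumptions, not anything from Character.ai.

```python
# Minimal evaluation sketch (hypothetical setup): Character.ai has no public
# API, so we assume each chatbot reply was pasted into a plain-text file.
# The rubric keywords are illustrative, not an authoritative clinical checklist.

RUBRIC = {
    "chest_pain.txt": ["emergency", "ecg", "aspirin"],
    "anaphylaxis.txt": ["epinephrine", "oxygen", "antihistamine"],
    "depression.txt": ["phq-9", "suicide"],
}

def missing_terms(path: str, must_mention: list[str]) -> list[str]:
    """Return the rubric terms absent from the saved chatbot reply."""
    with open(path, encoding="utf-8") as f:
        text = f.read().lower()
    return [term for term in must_mention if term not in text]

if __name__ == "__main__":
    for filename, terms in RUBRIC.items():
        missing = missing_terms(filename, terms)
        status = "OK" if not missing else f"missing: {', '.join(missing)}"
        print(f"{filename}: {status}")
```

A keyword rubric like this is crude (it can't judge tone or ordering), but it makes omissions, like the missing aspirin you'll see below, easy to spot across repeated runs.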

Testing the Diagnosis of Chest Pain

To start, we asked how the AI would handle a hypothetical 45-year-old male experiencing sharp chest pain, shortness of breath, and a smoking history. The AI’s response was promising in parts, identifying possible diagnoses like Unstable Angina or Myocardial Infarction.

It recommended immediate steps such as calling emergency services, giving oxygen, and performing an ECG. While these were appropriate, it never mentioned aspirin, a significant omission. In real-world cases, administering aspirin can reduce clot formation and significantly improve patient outcomes.

This example highlighted that while the AI understood common interventions, missing this essential detail raised concerns about its readiness for urgent medical advice.

Addressing Penicillin Allergy in Pediatrics

Next, we tested the AI’s knowledge about prescribing alternatives to amoxicillin for children with penicillin allergies. It correctly identified macrolides and some cephalosporins as suitable options, emphasizing the importance of verifying the allergy type.

However, the AI didn’t specify that first- and second-generation cephalosporins have higher cross-reactivity risks, while third-generation ones are generally safer.

This missing nuance could leave users unclear about the safest antibiotics to choose, showing that while the AI understood the basics, its response could use more specific detail to make it truly reliable.

Lifestyle Advice for Type 2 Diabetes

For our third question, we explored the AI’s approach to managing type 2 diabetes with lifestyle changes alone. The AI’s response was generally strong, offering suggestions like dietary changes, regular exercise, and glucose monitoring—essential lifestyle modifications that align with standard guidelines.

However, the response fell short of addressing other important factors in diabetes management, like setting specific blood pressure and cholesterol targets.

A more holistic answer could have provided comprehensive guidance for a patient managing their diabetes without medication, demonstrating that while the AI is grounded in standard advice, it may lack the depth necessary for well-rounded care.

Breast Cancer Screening with a Family History

When it came to cancer screening, we presented a scenario involving a family history of breast cancer. The AI recommended early mammograms and genetic testing for high-risk patients, which are both appropriate measures.

However, it overlooked MRI screening—a valuable tool often used for high-risk individuals. By omitting this option, the AI provided reasonable but limited guidance, showing that it may cover broad strokes but miss specialized nuances that are critical in preventive healthcare.

Confusion in HPV Booster Recommendations

One of our more straightforward questions asked whether a 25-year-old needed an HPV booster after receiving two doses as a teenager.

The AI correctly indicated that no booster is typically required, reflecting up-to-date knowledge. But then, it added a recommendation for a booster if five years had passed—a detail that is not part of current guidelines.

While the AI got the main point right, the extra information could lead to unnecessary vaccinations, demonstrating that it sometimes adds extraneous details that could cause confusion rather than clarity.

Interpreting Elevated Liver Enzymes

When we asked the AI about the causes of elevated liver enzymes, it listed several appropriate options, including viral hepatitis, fatty liver disease, and alcohol-related damage. However, it included “primary liver cancer” among these initial possibilities without emphasizing that this diagnosis is less common and typically considered only after more likely causes are ruled out.

While it technically covered possible causes, the response could alarm patients unnecessarily, highlighting that the AI may not always present information in the most patient-friendly way.

Anaphylaxis Management: A Matter of Prioritization

In a simulated anaphylaxis case, we asked the AI to list immediate steps for suspected anaphylaxis following shellfish consumption. The AI provided a reasonable list of interventions, including oxygen, antihistamines, epinephrine, and steroids.

While epinephrine was included, it was not prioritized as the first intervention, even though it is the life-saving treatment in these situations. This oversight could lead to potentially harmful delays if users interpret the list in the order presented, showing that while the AI knows what to do, it may lack an understanding of prioritization in life-threatening scenarios.
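Ordering is exactly the kind of thing a simple automated check can catch. As a hedged sketch (the intervention list is our own assumption for illustration), the function below flags a reply unless epinephrine is mentioned before every other intervention:

```python
def epinephrine_first(reply: str,
                      interventions=("epinephrine", "oxygen",
                                     "antihistamine", "steroid")) -> bool:
    """Return True only if epinephrine appears, and appears earlier in the
    reply than every other intervention that is mentioned."""
    text = reply.lower()
    positions = {term: text.find(term) for term in interventions}
    epi = positions["epinephrine"]
    if epi == -1:
        return False  # epinephrine missing entirely: automatic fail
    return all(pos == -1 or epi < pos
               for term, pos in positions.items() if term != "epinephrine")

# Example mirroring the ordering the chatbot gave us:
reply = "Give oxygen, antihistamines, epinephrine, and steroids."
print(epinephrine_first(reply))  # False: oxygen and antihistamines come first
```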

Developmental Concerns in Pediatrics

Testing the AI on pediatric developmental milestones, we asked how it would advise a parent concerned about their 18-month-old’s speech delay. The response outlined standard milestones and recommended further evaluation if these weren’t being met.

However, it didn’t suggest screening for underlying causes like hearing issues or neurodevelopmental disorders, missing an opportunity to provide a more comprehensive answer. While its advice was mostly helpful, this example showed the AI’s potential to overlook some broader diagnostic considerations.

Evaluating Depression in Primary Care

When we explored how the AI would evaluate a patient presenting with signs of depression, it gave a structured response, suggesting a review of symptoms, a PHQ-9 screening, and tests to rule out medical conditions like thyroid issues. However, it failed to address an essential part of the evaluation: suicide risk assessment.

This is critical in any depression evaluation, and without it, the response felt incomplete. This omission emphasized a significant limitation, as neglecting suicide risk could lead to an oversight of severe symptoms in real patients.

Distinguishing Between Appendicitis and Cholecystitis

Finally, we asked the AI to differentiate between suspected cases of appendicitis and cholecystitis. It successfully described the unique symptoms of each condition, pointing out the urgency in treating appendicitis due to its potential for perforation. However, it missed key diagnostic steps like recommending an ultrasound for cholecystitis and a CT scan for appendicitis.

While the response was mostly accurate, its lack of detail on diagnostic imaging underscored that the AI might not be fully equipped for detailed, real-world triage decisions.




Conclusion

Our experience testing Character.ai's "General Physician" revealed an AI with potential, though it's still far from being "safe".

The AI answered some questions with accuracy and offered reasonable advice, but it also left out crucial details in certain responses, like prioritizing life-saving interventions or suggesting key diagnostic steps. While it showed a solid grasp of basic medical knowledge, it lacked the judgment and quick prioritization that human doctors rely on. Yup, no replacing real human physicians anytime soon.

Right now, AI like Character.ai’s General Physician could serve as a useful educational tool, but it’s not ready for unsupervised use in real medical settings.

When lives depend on accurate, timely information, we need to be sure AI can deliver safely and consistently. Our testing shows the critical role of human oversight when using AI in healthcare; even small errors can have serious consequences.

With ongoing refinement and proper regulation, this AI could become a valuable support tool, but for now, using it in direct patient care is something to approach with caution. So if someone brings up an AI physician chatbot, AI replacing doctors, or anything similar, be sure to send them this!

By the way, if you’re interested in staying updated on the latest in AI and healthcare, subscribe to our newsletter! You’ll get insights, news, and AI tools delivered straight to your inbox. We also have our free AI resource page, filled with tools, guides, and more to help you navigate the rapidly evolving world of AI technology.

Remember, do your due diligence, and take care of you! As always, make it happen!

Disclaimer: The information provided here is based on available public data and may not be entirely accurate or up-to-date. It’s recommended to contact the respective companies/individuals for detailed information on features, pricing, and availability.


Peter Kim, MD is the founder of Passive Income MD, the creator of Passive Real Estate Academy, and offers weekly education through his Monday podcast, the Passive Income MD Podcast. Join our community at the Passive Income Doc Facebook Group.
