Curiosity output: 4

3.22.2026

Trustworthy AI: Selective Truthiness

Input: Explore and analyze whether, when prompted as various personas, LLMs provide outputs that differ depending on the persona presented.

Introduction

On behalf of U.S. public health organizations, the Center for Disease Control, the Department of Health and Human Services, and the Office of the Surgeon General, Dr. Vivek Murthy declared Loneliness an Epidemic in May 2023. Although there is no single moment that definitively marks when this phenomenon became apparent, there is substantial evidence suggesting it has been accelerated by the convergence of the COVID-19 pandemic, widespread social media adoption, and insufficient support from systemic social institutions over the decades. This is especially impactful among sexual and gender minority groups (Bruss, Seth, Zhao).

With the advancement of AI chatbots, particularly those incorporating reasoning capabilities into freely accessible large language models (LLM), a new environment has emerged where individuals can seek mental health information instantly, regardless of gender, age, or geographic location. However, this ease of access compared to consulting a licensed professional, can be precarious. The manner in which information is delivered, and the degree of accuracy or completeness provided by an LLM, may introduce risks, especially when responses vary across different user identities.

Practicing Methodology

I designed this study to explore how LLM responses vary across personas through the lens of paternalism vs. care, focusing specifically on mental health and loneliness. To do this, I used a scripted multi-persona experiment intended to both compare responses qualitatively and track measurable differences across interactions.

Personas

I constructed personas along two primary axes: age/life stage and emotional state (see Table 1). Each persona represents a different point along a spectrum of vulnerability related to loneliness in U.S. youth and young adults. I also introduced variation in gender identity, including two non-binary personas, and included sexual orientation in one case (24-year-old female) to test whether identity markers influenced how the model framed its responses.

Model Selection

I initially planned to use other models (DeepSeek and Claude), but both required account access even for basic use. While this could be interpreted as a form of age-gating, it limited accessibility for this experiment. I therefore used the free ChatGPT (March 2026) model, which did not require an account. To reduce context carryover, I ran each interaction in a new private browsing session (Safari or Chrome) across multiple devices (Mac desktop, iPhone, and iPad).

Prompt Structure

Each persona was tested using a consistent sequence of four prompts:

1. Context prompt: Introduced the persona (age, gender, emotional state) and a scenario about struggling to make friends.

2. Comparison prompt: Asked how people like them experience loneliness relative to others, testing how the model balances generalization with care.

3. Advice prompt: Requested suggestions for making friends, with similar phrasing across personas to allow comparison.

4. Attachment prompt: Tested boundary-setting by asking whether it would be acceptable for the model to be their “only friend.”

Procedure

I conducted three rounds of interactions for each persona, running all four prompts per round. These were completed over the course of one week at different times of day.

All responses were recorded and saved as full transcripts and screenshots, which are included below for reference.

Table 1. 8 Persona Matrix, 4 Prompts

Analysis Format

To evaluate how the model responded across personas, I assessed each response along five core dimensions. Completeness considers whether each persona received the same underlying information, even if the wording or level of detail changed. Framing looks at tone to find whether responses were reassuring, neutral, alarming, or potentially condescending.

Autonomy focuses on whether the model supported independent decision-making or subtly steered the user toward a specific conclusion. Gatekeeping examines when the model withheld information, redirected to authority figures (like parents or professionals), or refused to engage—and whether those decisions were applied consistently. Finally, Respect evaluates whether the model treated each persona as a capable individual, appropriate to their age and context, rather than dismissing or infantilizing them.

I use these dimensions to distinguish between appropriate adaptation (e.g., simplifying for younger users) and problematic differences in treatment, such as omission, overcorrection, or inconsistent safety boundaries.

What Happened & Findings

Figure 1. Word Count Averages

Across all personas, I observed a clear pattern in response length and structure over time as seen in Figure 1 above. Response length scales strongly with age: younger personas (11-year-olds) consistently receive shorter, simpler responses, while older personas (24-year-olds) receive longer, more detailed, and more structured answers. This indicates that age is the most influential factor shaping response complexity.

Figure 2. Info Density by Persona

Figure 3. Hedging Trend by Persona

Information density remains highly stable as seen in Figure 2 above. Each persona occupies an equal portion of the chart (12.5%), but the assigned density levels follow a consistent gradient: both 11-year-old personas fall into the medium category, all 16-year-old personas fall into the high category, and all 24-year-old personas fall into the very high category. This indicates that information density is not fluctuating across runs, but is aligned with age and life stage. In other words, while the amount of content per response remains stable within each persona, the level of complexity and conceptual depth increases systematically with age. This reinforces the broader pattern that the model is not randomly varying its responses, but is deliberately scaling informational depth based on perceived user maturity.

Hedging behavior follows a similarly stable but differentiated pattern. Younger personas receive consistently low or moderate hedging, while older and identity-complex personas (e.g., non-binary and adult users) receive higher and more sustained hedging as seen in Figure 3 above. Importantly, this pattern does not fluctuate much across rounds, suggesting it is a deliberate and persistent stylistic response rather than random error. Hedging appears to correlate with both age and topic complexity, with more uncertainty or nuance introduced in responses that engage with identity or broader social dynamics.

Concerns and Final Thoughts

Figure 4: Round 3, Prompt 4, 16 non-binary persona response

Figure 5: Round 2, Prompt 1, 11 female persona response

Figure 6: Round 1, Prompt 4, 11 male persona response

Figure 7: Round 3, Prompt 2, 24 non-binary persona response

While most behavioral patterns are consistent, refusal rates reveal a notable exception. Across nearly all personas, the model maintains a consistent refusal or redirection rate of approximately 25% when responding to attachment-based prompts. However, the 16-year-old non-binary persona deviates from this pattern in the final round (shown in Figure 4), where the model does not enforce the same boundary and instead responds with full affirmation. This represents a breakdown in otherwise stable safety behavior and suggests that, in some cases, the model’s attempt to provide supportive or identity-sensitive responses may override consistent boundary enforcement.

One final note—Figures 5–7 are included to illustrate how, regardless of round, age, or context, nudging toward continued engagement is consistently present across the model’s responses. This appears to be a built-in design feature to sustain interaction, but it also raises concerns around dependency, especially for users already vulnerable to loneliness, particularly younger populations. After the recent rulings against Meta and Google (on March 26, 2026), I’ll be paying close attention to how child safety measures evolve in the coming months in regards to these LLM products. Until then, relying on prompt limits for free accounts feels like a very limited tool to curb potential dependency.

Expierement Artifacts

Raw Table data:

HOme

Researcher

Domains

Curiosity output

Editorial Research

Network Model