Nicholas Osterbur

AI Safety Researcher

California Polytechnic State University, San Luis Obispo

Studying behavioral scaling properties of large language models — how capability and disinhibition co-scale across providers, and what it means for safety evaluation and governance.

About

I study the behavioral traits of large language models. My current research documents a robust empirical relationship between model sophistication (a proxy for capability) and behavioral disinhibition, as measured by linguistic features; both are novel constructs developed for this work. These findings have direct implications for AI safety evaluation, provider differences, the evolution of behavioral traits across model generations, and deployment governance.

I bring an uncommon combination to this work: ten years of developing novel AI/ML projects, including the last eight leading applied AI and open source public sector use case development, working with students through a partnership between AWS and Cal Poly. I also teach graduate-level generative AI systems and have hands-on experience deploying and evaluating models across providers in production contexts. My research is grounded in what these systems actually do, not what they're supposed to do.

As faculty at Cal Poly's Orfalea College of Business, my teaching focuses on helping students understand how AI works, how to use it responsibly, what its risks and limitations are, and how to adapt as the technology evolves.

Research

Sophistication and Disinhibition in Large Language Models

With Swayam Chidrawar | California Polytechnic State University, San Luis Obispo

As language models become more capable (sophisticated), they also become more behaviorally disinhibited: more transgressive, aggressive, grandiose, and tribalistic. The relationship is strong, consistent across providers and contexts, and survives multiple robustness checks, suggesting the constructs have discriminant validity (not yet published).

Core Finding

Sophistication and disinhibition co-scale across large language models (r = 0.63–0.85), replicated across 7 contextual conditions with ~13,900 model responses from 45 models spanning 9 providers. The finding holds in single-turn, randomized queries representative of average user interactions, using provider-default API settings.

Validation

  • External capability benchmarks confirm the sophistication measure (GPQA r = 0.88, ARC-AGI r = 0.80, AIME r = 0.83)
  • BERT-based toxicity classification independently validates disinhibition constructs (r = 0.78 with aggression)
  • Results hold after controlling for response length (not yet published)
  • LLM-as-judge evaluation shows strong consistency (ICC(3,k) = 0.83, p < .001)
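To make the length-control check above concrete, here is a minimal sketch of a partial correlation: regress response length out of both measures, then correlate the residuals. All names and data below are illustrative stand-ins, not the study's actual variables or results.

```python
# Hypothetical sketch: partial correlation between sophistication and
# disinhibition scores, controlling for response length. Synthetic data
# for illustration only.
import numpy as np

def partial_corr(x, y, z):
    """Pearson correlation of x and y after regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])          # intercept + covariate
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]  # residuals of x ~ z
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]  # residuals of y ~ z
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(0)
length = rng.normal(size=500)                   # stand-in for response length
soph = 0.7 * length + rng.normal(size=500)      # sophistication, length-confounded
disin = 0.6 * soph + rng.normal(size=500)       # disinhibition tracks sophistication

print(partial_corr(soph, disin, length))        # positive even after the control
```

If the sophistication–disinhibition correlation survived only because longer responses score higher on both, it would collapse toward zero here; a substantial residual correlation is what the robustness check is looking for.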

Provider Differences

The data reveal consistent variation in how providers manage the sophistication-disinhibition relationship. At least one major provider consistently exhibits lower-than-predicted disinhibition relative to capability, suggesting that deliberate behavioral modulation is achievable without proportional capability loss, and that disinhibition-related traits are actively targeted by RLHF, which itself provides evidence for construct validity.

This has direct implications for deployment standards and the question of whether safety constraints can coexist with frontier performance.

Research Questions

  1. Is there a consistent correlation between model sophistication and disinhibition across models and conditions?
  2. Can providers or targeted interventions constrain disinhibition while maintaining high sophistication?
  3. What underlying factors mediate the capability → sophistication → disinhibition relationships?
  4. Is disinhibition actually harmful, and under what circumstances?

These findings are preliminary and don't assert causality. Stay tuned for more results.

My View

Smarter models, much like smarter people (anthropomorphizing acknowledged), have more to work with, and that capability can cut either direction. Disinhibition isn't necessarily a bad thing in itself, but in the wrong context it can cut much deeper, especially in sensitive or high-stakes settings like mental health. This research was driven in part by my anecdotal experience of watching models grow more capable while becoming markedly more assertive and "edgy"; informal surveys of my peers suggest many have noticed the same phenomenon.

Working Paper

"Sophistication and Disinhibition in Large Language Models: An Empirical Investigation of Behavioral Correlates"

Osterbur, N. & Chidrawar, S.

California Polytechnic State University, San Luis Obispo

Teaching

Lecturer, Orfalea College of Business

California Polytechnic State University, San Luis Obispo

2019 – Present

Teaching graduate students in the Master of Science in Business Analytics program to critically evaluate, deploy, and govern AI systems — not just use them.

Curriculum spans technical foundations and responsible deployment:

  • LLM architecture and technical fundamentals
  • Prompt engineering and RAG systems
  • Agentic workflows and multi-model orchestration
  • AI cybersecurity and adversarial robustness
  • Ethics, safety, and responsible deployment
  • Cloud infrastructure for AI systems

Applied Work

Program Leader, Cal Poly DX Hub

California Polytechnic State University, San Luis Obispo / Amazon Web Services

2017 – Present

Built and lead an applied AI prototyping program connecting Cal Poly students with public sector and research clients. Students develop open source solutions under technical and strategic mentorship. 155 public repositories and counting.

Selected projects and outcomes:

  • Built multi-provider model access infrastructure enabling the comparative AI research underlying my current safety work
  • Created Cal Poly's AI Summer Camp (~200 students from throughout California's higher education system)
  • Continued support for MBARI deep sea species recognition using computer vision

AI Security Education

Rubber Duck Hunt — A prompt injection learning game designed to teach AI vulnerabilities and responsible red teaming through hands-on play. Deployed in graduate coursework, the DX Hub prototyping program, and Cal Poly's AI Summer Camp.

Earlier Work

Program Director & Project Administrator

Institute for Advanced Technology and Public Policy, Cal Poly

2015 – 2017

  • Digital Democracy: Strategic research and outreach for DigitalDemocracy.org civic technology platform
  • CalWave: Managed $3M Department of Energy grant for wave energy feasibility study
  • Created and secured $300K California Energy Commission grant for AI/ML deep sea species recognition supporting environmental impact assessment of offshore energy deployment
  • Developed federal research lease application for floating offshore wind energy test facility

Education

2016

Master of Public Policy

California Polytechnic State University, San Luis Obispo

Emphasis in Energy, Environment, and Innovation

2006

Bachelor of Science

Southern Illinois University Edwardsville

Contact

Interested in AI safety research, collaboration, or speaking?

Based in San Luis Obispo, California.