What if we could run this experiment again, but ethically?
When I first read about Calhoun's experiments, I felt the same mixture of fascination and horror that many people do. These weren't just academic studies—they were windows into something deeply unsettling about social breakdown under stress. John B. Calhoun's "Mouse Utopia" experiments are legendary, haunting keystones of ethology. He provided mice with abundant food, water, and shelter, constrained only by space. The result was not paradise, but a "behavioral sink"—a chilling cascade of social collapse, violence, withdrawal, and eventual extinction. This was Universe 25, so named because it was the 25th in a long series of similar studies, the first he ran to completion without premature interruption.
But as a programmer interested in AI and behavioral modeling, I couldn't stop thinking: What if we could run this experiment again, but ethically? What if we could use artificial intelligence to simulate individual mouse behaviors and see if the same patterns emerge? The timing felt right. Large Language Models (LLMs) have reached a point where they can generate surprisingly realistic behavioral responses based on context. Could we create virtual mice with individual personalities, stress responses, and social dynamics? Could we watch a digital society collapse in real-time?
The answer, it turns out, is yes. And the results are both fascinating and deeply unsettling.
Decades later, we stand at the precipice of Universe 26. This time, the utopia is digital, its inhabitants silicon-based, their minds powered by LLMs. This isn't merely a simulation; it's an AI-driven exploration into the intricate dance between individual psychology and collective societal breakdown. We've peered into the algorithmic heart of this digital world, and it has, in turn, gazed back, reflecting uncomfortable truths about complex systems, consciousness, and perhaps, the human condition itself.
The heart of our simulation isn't just about modeling mouse behavior—it's about creating a system complex enough that emergent properties can arise naturally. We're not scripting the behavioral sink; we're creating conditions where it emerges organically from the interactions between hundreds of individual agents, each making decisions based on their own psychological state and environmental pressures.
Join me on a deep technical and philosophical journey through the architecture, code, and chilling implications of building an AI-powered behavioral simulation designed to fail.
Part 1: The Digital Crucible – Architecting Universe 26
Building a world, even a digital one, requires careful planning. Our Mouse Utopia isn't a random collection of agents; it's a meticulously structured environment where every parameter is a lever controlling the delicate balance between order and chaos.
Geographic Foundation: The Grid of Life (and Death)
The digital utopia is a grid, a meticulously planned cityscape for our AI mice. The genius, as with Calhoun's original, lies in the constraints: abundant resources, but limited prime real estate.
Environment Configuration:
- Total Space: 256 units of livable area
- Territories: 128 individual territories
- Territory Size: 2 space units each
- Nest Sites: 1-3 per territory (randomly distributed)
- Resource Quality: 0.8-1.0 (slight variations for realism)
Each territory, defined by (x, y) coordinates, becomes a focal point for survival and social interaction. Mice sense neighbors within a 2-unit Manhattan distance, naturally forming clusters and, inevitably, territorial disputes. This spatial dynamic is fundamental to the emergence of social pressure.
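The neighbor-sensing rule can be sketched in a few lines. This is a minimal illustration, not the simulation's actual code: the 16 × 8 grid shape is an assumption (the text only fixes the count at 128 territories), and `territories_within` is a hypothetical helper name.

```python
from typing import List, Tuple

GRID_WIDTH, GRID_HEIGHT = 16, 8   # assumed layout: 16 * 8 = 128 territories
SENSE_RADIUS = 2                  # Manhattan distance, as stated in the text

Coord = Tuple[int, int]


def manhattan(a: Coord, b: Coord) -> int:
    """Manhattan (city-block) distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def territories_within(center: Coord, radius: int = SENSE_RADIUS) -> List[Coord]:
    """All other grid cells within `radius` Manhattan distance of `center`."""
    cx, cy = center
    found = []
    # Scan the bounding square, clipped to the grid edges.
    for x in range(max(0, cx - radius), min(GRID_WIDTH, cx + radius + 1)):
        for y in range(max(0, cy - radius), min(GRID_HEIGHT, cy + radius + 1)):
            if (x, y) != center and manhattan(center, (x, y)) <= radius:
                found.append((x, y))
    return found
```

Under these assumptions an interior territory can sense 12 others, while a corner territory senses only 5, so mice at the edges of the grid are structurally less exposed to neighbors.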
The Heartbeat: Simulation Timing and Rhythm
Time flows in discrete, meaningful intervals:
- Timestep Duration: 4 hours of simulated time.
- Daily Cycle: 6 timesteps (24 hours).
- Simulation Span: 600 days (~20 months, mirroring Calhoun).
- Total Timesteps: 3,600 (600 days × 6 timesteps per day), the upper bound on decision points for a mouse that lives through the entire run.
This 4-hour rhythm proved optimal – frequent enough for rapid behavioral shifts during crises, yet manageable enough to avoid systemic chaos. Each timestep is a moment of profound decision for every agent.
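The timing arithmetic is simple enough to write down directly. The constant names and the `step_to_clock` helper below are illustrative assumptions; only the numbers come from the configuration above.

```python
from typing import Tuple

HOURS_PER_STEP = 4
STEPS_PER_DAY = 24 // HOURS_PER_STEP      # 6 timesteps per simulated day
SIM_DAYS = 600
TOTAL_STEPS = SIM_DAYS * STEPS_PER_DAY    # 3,600 timesteps in a full run


def step_to_clock(step: int) -> Tuple[int, int]:
    """Map a 0-based timestep index to (day, hour_of_day); days start at 1."""
    day_index, slot = divmod(step, STEPS_PER_DAY)
    return day_index + 1, slot * HOURS_PER_STEP
```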
Population Dynamics: The Spark of Society
Universe 26 begins with a carefully chosen cohort:
- Initial Population: 12 mice (6 males, 6 females).
- Age Distribution: 35-100 days (sexually mature, varied experience).
- Social Hierarchy: 25% dominant, 75% subordinate (initial seeding).
- Reproductive Potential: Unlimited, but crucially, stress-dependent.
Twelve was the magic number – enough for complex dynamics, yet computationally feasible and allowing individual "personalities" to influence the collective.
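A seeding routine consistent with these numbers might look like the sketch below. The function name `seed_population` and the plain-dictionary representation are assumptions made for illustration; the sex split, age range, and 25% dominant share are taken from the list above.

```python
import random
from typing import Any, Dict, List


def seed_population(rng: random.Random, size: int = 12,
                    dominant_share: float = 0.25) -> List[Dict[str, Any]]:
    """Create the founding cohort: even sex split, ages 35-100, 25% dominant."""
    half = size // 2
    sexes = ["male"] * half + ["female"] * (size - half)
    n_dominant = round(size * dominant_share)
    statuses = ["dominant"] * n_dominant + ["subordinate"] * (size - n_dominant)
    rng.shuffle(statuses)  # dominance is not tied to sex in this sketch
    return [
        {"sex": sex, "age_days": rng.randint(35, 100), "social_status": status}
        for sex, status in zip(sexes, statuses)
    ]
```

Passing in a seeded `random.Random` keeps runs reproducible, which matters when comparing two collapses that differ only in their founding cohort.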
The Minds of Mice: LLM-Powered Behavior
This is where Universe 26 transcends traditional agent-based models. Each mouse isn't following a rigid script. Its actions are generated by an LLM (in our case, qwen3 via Ollama) which considers a rich tapestry of context:
Individual Context: Age, sex, health, energy, social status, behavioral state, recent interactions, reproductive status, personal tendencies (aggression, sociability).
Environmental Context: Population density, crowding, resource competition, social tension, territory quality.
Social Context: Nearby mice, relationship dynamics, conflict outcomes, mating opportunities.
The LLM synthesizes this into a behavioral output, a JSON object dictating the mouse's next move:
{
    "primary_action": "socializing",
    "secondary_actions": ["grooming", "territory_marking"],
    "social_interaction": "friendly",
    "stress_response": "mild",
    "location_preference": "communal",
    "energy_expenditure": 0.6,
    "narrative": "Mouse seeks social contact to reduce isolation stress"
}
This narrative isn't just flavor text; it's often the LLM's "reasoning," providing invaluable insight into its decision-making process.
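Before a reply like this drives the simulation, it is worth checking that every field holds a value the engine understands. The sketch below is one possible validator, not the project's own: `validate_behavior` is a hypothetical name, the vocabularies are copied from the JSON format requested in the prompt shown later in this article, and clamping `energy_expenditure` to [0, 1] is an assumption.

```python
import json
from typing import Any, Dict

# Allowed values, copied from the JSON format the prompt requests.
PRIMARY_ACTIONS = {"feeding", "grooming", "socializing", "mating",
                   "nesting", "fighting", "hiding", "exploring"}
SOCIAL_INTERACTIONS = {"aggressive", "friendly", "neutral", "withdrawn"}
STRESS_RESPONSES = {"none", "mild", "moderate", "severe"}
LOCATIONS = {"territory", "communal", "isolated", "random"}


def validate_behavior(raw: str) -> Dict[str, Any]:
    """Parse one LLM reply and reject values outside the expected vocabulary."""
    data = json.loads(raw)
    checks = {
        "primary_action": PRIMARY_ACTIONS,
        "social_interaction": SOCIAL_INTERACTIONS,
        "stress_response": STRESS_RESPONSES,
        "location_preference": LOCATIONS,
    }
    for key, allowed in checks.items():
        if data.get(key) not in allowed:
            raise ValueError(f"unexpected {key}: {data.get(key)!r}")
    # Assumption: energy expenditure is a fraction, so clamp it to [0, 1].
    energy = float(data.get("energy_expenditure", 0.5))
    data["energy_expenditure"] = max(0.0, min(1.0, energy))
    return data
```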
The Eight Pillars of (Simulated) Behavior
Every action falls into one of eight primary categories, each with cascading mechanical effects:
- Feeding: Restores energy, reduces stress.
- Grooming: Basic stress relief, moderate energy cost.
- Socializing: Complex interactions affecting stress/hierarchy.
- Mating: Reproductive behavior, success tied to psycho-social factors.
- Nesting: Territory improvement, crucial for pregnant females.
- Fighting: Aggressive disputes, shaping dominance.
- Hiding: Significant stress reduction, but increases social isolation.
- Exploring: Territory discovery, potential stress relief, energy cost.
Stress: The Silent Killer, The Central Currency
Stress (0.0 to 1.0) is the lifeblood and poison of this world. It's not a mere statistic but a dynamic force:

Stress Sources: Crowding (density > 0.7), social conflicts, territory competition, isolation (3+ days no contact), reproductive failures, maternal pressure, random events, health decline.
Stress Effects:
- Behavioral Shifts: Normal → Stressed → Withdrawn → Pathological.
- Social Demotion: Withdrawal from dominance struggles.
- Reproductive Collapse: Reduced fertility and success.
- Health Deterioration: Accelerated aging.
- "Beautiful Ones" Emergence: At 0.7+ stress, complete social withdrawal, obsessive self-grooming.
Crucially, stress is collective. Average population stress crossing thresholds unravels the entire social fabric, mirroring Calhoun's grim findings.
Interaction Systems: The Tangled Web
Social Interaction Matrix: Encounters are not random; they are negotiations:
- Aggressive: Stress for both (initiator +0.1, target +0.2), potential status change.
- Friendly: Stress reduction for both (-0.1), strengthens bonds.
- Withdrawal: Minor stress increase for both (+0.05), increases social distance.
- Neutral: Mild stress reduction (-0.03), maintains ties.
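The matrix above maps naturally onto a lookup table. In this sketch the stress deltas are the ones listed; the names `STRESS_DELTAS` and `apply_interaction` are hypothetical, and applying the neutral reduction to both participants is an assumption, since the text does not say who receives it.

```python
from typing import Dict, Tuple

# (initiator delta, target delta) per interaction type, from the list above.
STRESS_DELTAS: Dict[str, Tuple[float, float]] = {
    "aggressive": (0.10, 0.20),
    "friendly": (-0.10, -0.10),
    "withdrawal": (0.05, 0.05),
    "neutral": (-0.03, -0.03),   # assumed to apply to both mice
}


def _clamp(value: float) -> float:
    """Keep stress inside its defined 0.0-1.0 range."""
    return max(0.0, min(1.0, value))


def apply_interaction(initiator_stress: float, target_stress: float,
                      kind: str) -> Tuple[float, float]:
    """Return the new (initiator, target) stress levels after one encounter."""
    d_initiator, d_target = STRESS_DELTAS[kind]
    return _clamp(initiator_stress + d_initiator), _clamp(target_stress + d_target)
```

Note the asymmetry in the aggressive row: the target pays twice what the initiator does, which is one way a single aggressive mouse can raise the stress of a whole neighborhood.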
Mating – The Game of Genes and Psychology: Success isn't guaranteed. It's a function of health multipliers, stress penalties, age bonuses, social status compatibility. Base success (80%) is modified, yielding a typical range of 20-95%. Gestation (19 days), litter size (2-8, stress-affected), trait inheritance, and significant maternal stress follow.
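To make the shape of that calculation concrete, here is a deliberately simplified sketch. Only the 80% base rate and the 20-95% range come from the description above; the specific multipliers (averaged health, a penalty driven by the more stressed partner) and the hard clamp are assumptions chosen for illustration, and `mating_success_probability` is a hypothetical name.

```python
BASE_SUCCESS = 0.80                        # from the text
MIN_SUCCESS, MAX_SUCCESS = 0.20, 0.95      # typical range from the text


def mating_success_probability(health_a: float, health_b: float,
                               stress_a: float, stress_b: float,
                               age_bonus: float = 1.0,
                               status_compat: float = 1.0) -> float:
    """Combine the modifiers multiplicatively, then clamp to the stated range."""
    health_mult = (health_a + health_b) / 2              # assumed: mean health
    stress_penalty = 1.0 - 0.5 * max(stress_a, stress_b)  # assumed: worst partner dominates
    p = BASE_SUCCESS * health_mult * stress_penalty * age_bonus * status_compat
    return max(MIN_SUCCESS, min(MAX_SUCCESS, p))
```

With two healthy, unstressed partners the result is exactly the base rate; a sick, highly stressed pair bottoms out at the 20% floor.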
Death and Lifecycle: Death is not random; it's the culmination of a life story. Health declines daily due to age, stress, environment, and low energy. Triggers include health at 0.0, extreme age (>800 days), catastrophic stress, or energy depletion.
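The death triggers read as an ordered checklist, which the sketch below encodes. The health and age conditions are from the text; treating "catastrophic stress" as stress reaching 1.0 and "energy depletion" as energy reaching 0.0 are assumptions, as are the function name and the cause labels.

```python
from typing import Optional

MAX_AGE_DAYS = 800             # from the text
CATASTROPHIC_STRESS = 1.0      # assumed threshold


def death_cause(health: float, age_days: int,
                stress_level: float, energy: float) -> Optional[str]:
    """Return a label for the first death trigger that fires, or None if alive."""
    if health <= 0.0:
        return "health_failure"
    if age_days > MAX_AGE_DAYS:
        return "old_age"
    if stress_level >= CATASTROPHIC_STRESS:
        return "catastrophic_stress"
    if energy <= 0.0:              # assumed: depletion means zero energy
        return "energy_depletion"
    return None
```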
Part 2: Code as Consciousness – Breathing Life into Digital Agents
Translating these architectural concepts into functional, robust code is where the simulation truly comes alive. This is an attempt to encode empathy, psychology, and societal dynamics into instructions a machine can execute.
The AI's Voice: Prompt Engineering for Emergent Behavior
The core of our AI's decision-making lies in the prompt. We don't just ask for an action; we provide a meticulously crafted snapshot of the agent's world:
from typing import Any, Dict, Optional

def _build_behavior_prompt(self, agent_context: Dict[str, Any],
                           environment_context: Dict[str, Any],
                           interaction_context: Optional[Dict[str, Any]] = None) -> str:
    """Build a comprehensive prompt for the LLM to generate realistic behavior."""
    # f-string: the agent and environment values are interpolated here.
    base_prompt = f"""You are simulating a mouse in a behavioral psychology experiment
similar to Calhoun's Mouse Utopia.
AGENT PROFILE:
- Age: {agent_context.get('age_days', 0)} days
- Sex: {agent_context.get('sex', 'unknown')}
- Social Status: {agent_context.get('social_status', 'normal')}
- Stress Level: {agent_context.get('stress_level', 0.0):.2f}
- Health: {agent_context.get('health', 1.0):.2f}
- Current Territory: {agent_context.get('territory_id', 'none')}
ENVIRONMENT STATUS:
- Population Density: {environment_context.get('population_density', 0.0):.2f}
- Available Space: {environment_context.get('available_space', 0)}
- Resource Competition: {environment_context.get('resource_competition', 0.0):.2f}
- Social Tension: {environment_context.get('social_tension', 0.0):.2f}
"""
    # ... (conditionally add SOCIAL INTERACTIONS context) ...
    # Plain string (not an f-string), so the literal JSON braces need no escaping.
    behavior_prompt = base_prompt + """
Based on this context, determine the mouse's behavior for this time period.
As population density increases, mice may exhibit:
- Increased aggression and territorial disputes
- Social withdrawal and isolation ("beautiful ones" behavior)
- Maternal neglect or poor parenting
- Abnormal sexual behaviors or complete withdrawal from mating
- Repetitive grooming or other stereotypic behaviors
CRITICAL: Respond with properly formatted JSON only.
Required JSON format:
{
    "primary_action": "feeding|grooming|socializing|mating|nesting|fighting|hiding|exploring",
    "secondary_actions": ["action1", "action2", "action3"],
    "social_interaction": "aggressive|friendly|neutral|withdrawn",
    "stress_response": "none|mild|moderate|severe",
    "location_preference": "territory|communal|isolated|random",
    "energy_expenditure": 0.5,
    "narrative": "Brief description of what the mouse is doing and why"
}
Only respond with valid JSON, no additional text:"""
    return behavior_prompt
This prompt guides the LLM, providing it with the agent's internal state, external pressures, and even reminders of Calhoun's findings. The strict JSON output requirement is vital for mechanical execution.
The MouseAgent: A Digital Life Form Encapsulated
Each mouse is an instance of the MouseAgent dataclass, a rich structure holding its identity, physical attributes, social standing, psychological state, spatial awareness, reproductive status, and crucially, unique behavioral tendencies:
from dataclasses import dataclass, field
from typing import List, Dict, Any, Optional, Tuple
import uuid
import random

# Assuming Enums like MouseSex, SocialStatus, BehavioralState are defined elsewhere
# For example:
# from enum import Enum
# class MouseSex(Enum): MALE = "male"; FEMALE = "female"
# class SocialStatus(Enum): DOMINANT = "dominant"; SUBORDINATE = "subordinate"; WITHDRAWN = "withdrawn"; BEAUTIFUL_ONE = "beautiful_one"; NORMAL = "normal" # etc.
# class BehavioralState(Enum): NORMAL = "normal"; STRESSED = "stressed"; WITHDRAWN = "withdrawn"; PATHOLOGICAL = "pathological"; AGGRESSIVE = "aggressive"; MATERNAL = "maternal" # etc.

@dataclass
class MouseAgent:
    """Represents an individual mouse in the simulation."""

    # Identity
    agent_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # sex: MouseSex = field(default_factory=lambda: random.choice(list(MouseSex)))  # Requires MouseSex Enum
    sex: str = field(default_factory=lambda: random.choice(["male", "female"]))

    # Physical attributes
    age_days: int = 0
    health: float = 1.0  # 0.0 to 1.0
    energy: float = 1.0  # 0.0 to 1.0

    # Social attributes
    # social_status: SocialStatus = SocialStatus.SUBORDINATE  # Requires SocialStatus Enum
    # behavioral_state: BehavioralState = BehavioralState.NORMAL  # Requires BehavioralState Enum
    social_status: str = "subordinate"
    behavioral_state: str = "normal"
    stress_level: float = 0.0  # 0.0 to 1.0

    # Spatial attributes
    territory_id: Optional[str] = None
    current_location: Tuple[int, int] = field(default_factory=lambda: (0, 0))  # Example

    # Reproductive attributes
    is_fertile: bool = False
    is_pregnant: bool = False
    pregnancy_days: int = 0
    mate_id: Optional[str] = None
    total_offspring: int = 0

    # Behavioral tendencies (0.0 to 1.0) - The "personality"
    aggression_tendency: float = field(default_factory=lambda: random.uniform(0.1, 0.4))
    social_tendency: float = field(default_factory=lambda: random.uniform(0.3, 0.8))
    maternal_tendency: float = field(default_factory=lambda: random.uniform(0.4, 0.9))
    exploration_tendency: float = field(default_factory=lambda: random.uniform(0.2, 0.7))

    # Interaction history (bounded for performance)
    recent_interactions: List[Dict[str, Any]] = field(default_factory=list)
    conflicts_today: int = 0
    social_contacts_today: int = 0

    # Life events
    days_without_social_contact: int = 0

    def get_context(self) -> Dict[str, Any]:
        """Returns a dictionary of the agent's current state for the LLM."""
        return {
            "agent_id": self.agent_id, "sex": self.sex, "age_days": self.age_days,
            "health": self.health, "energy": self.energy,
            "social_status": self.social_status, "stress_level": self.stress_level,
            "behavioral_state": self.behavioral_state,
            "territory_id": self.territory_id, "is_fertile": self.is_fertile,
            "is_pregnant": self.is_pregnant, "mate_id": self.mate_id,
            "total_offspring": self.total_offspring,
            "aggression_tendency": self.aggression_tendency,
            "social_tendency": self.social_tendency,
            "maternal_tendency": self.maternal_tendency,
            "exploration_tendency": self.exploration_tendency,
            "recent_interactions": self.recent_interactions[-5:],  # Last 5
            "conflicts_today": self.conflicts_today,
            "social_contacts_today": self.social_contacts_today,
            "days_without_social_contact": self.days_without_social_contact
        }

    # ... other methods like age_one_day, update_stress, _update_behavioral_state, etc.
These inherent tendencies (aggression_tendency, etc.) ensure each mouse starts with a unique "personality," shaping its reactions to identical environmental pressures.
Algorithms of Existence: The Code of Life, Stress, and Death
Aging and Health: Daily updates chip away at health, accelerated by age, stress, and low energy. A snippet from an age_one_day method illustrates this interconnectedness:
# Inside MouseAgent.age_one_day method (conceptual)
# environment_stress would be passed as an argument
# Note: max() floors age_factor at 0.95, which it reaches once age_days hits 40
age_factor = max(0.95, 1 - (self.age_days / 800))
stress_factor = 1 - (self.stress_level * 0.15)
env_stress_factor = 1 - (environment_stress * 0.1)
self.health *= (age_factor * stress_factor * env_stress_factor)
self.health = max(0.0, min(1.0, self.health))  # keep health within 0.0 to 1.0
if self.energy < 0.3:
    # Low energy costs an extra 1% of health per day
    self.health *= 0.99
    # self.update_stress(0.05, "low_energy_daily")  # Assumes update_stress method exists
Behavioral State Transitions: Stress is the catalyst. Thresholds (e.g., withdrawal_threshold = 0.6, beautiful_ones_threshold = 0.7) trigger shifts from NORMAL to STRESSED, WITHDRAWN, and ultimately PATHOLOGICAL. This logic, often in an _update_behavioral_state method, is where Calhoun's "beautiful ones" emerge algorithmically.
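A threshold ladder of this kind can be sketched as follows. The withdrawal (0.6) and beautiful-ones (0.7) thresholds are the ones quoted above; the stressed (0.4) and pathological (0.85) cut-offs are assumptions added to complete the ladder, and both function names are hypothetical.

```python
STRESSED_THRESHOLD = 0.4          # assumption
WITHDRAWAL_THRESHOLD = 0.6        # from the text
BEAUTIFUL_ONES_THRESHOLD = 0.7    # from the text
PATHOLOGICAL_THRESHOLD = 0.85     # assumption


def classify_state(stress_level: float) -> str:
    """Map a stress level to a behavioral state, checking the worst state first."""
    if stress_level >= PATHOLOGICAL_THRESHOLD:
        return "pathological"
    if stress_level >= WITHDRAWAL_THRESHOLD:
        return "withdrawn"
    if stress_level >= STRESSED_THRESHOLD:
        return "stressed"
    return "normal"


def is_beautiful_one(stress_level: float) -> bool:
    """True once stress reaches the level where complete social withdrawal sets in."""
    return stress_level >= BEAUTIFUL_ONES_THRESHOLD
```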
Reproduction: The attempt_mating function is a complex calculation involving health, stress, age, and social compatibility of both partners. Success brings temporary stress relief and energy; failure adds to the burden.
The Unseen Hand: Environmental Pressure and Social Mechanics
The simulation engine translates LLM decisions into tangible effects. The _execute_behavior function is pivotal, taking the AI's JSON output and applying mechanical changes to the agent and environment. For example, a socializing action with an aggressive interaction type triggers:
# Inside a hypothetical _handle_social_interaction method (conceptual)
# agent and target are MouseAgent instances
# interaction_type is a string like "aggressive"
if interaction_type == 'aggressive':
    # agent.update_stress(0.1, "aggressive_interaction_initiator")  # Cost of initiating
    # target.update_stress(0.2, "being_aggressed")
    agent.stress_level = min(1.0, agent.stress_level + 0.1)  # Simplified update
    target.stress_level = min(1.0, target.stress_level + 0.2)  # Simplified update
    agent.conflicts_today += 1
    # Potentially update social status based on outcome (not shown for brevity)
Environmental pressures like crowding (e.g., from an environment.get_crowding_stress_factor() method) are also applied directly to agent stress levels, creating a direct link between the collective and the individual.
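One way such a crowding factor could work is sketched below. The 0.7 density threshold and the 256 units of space come from earlier sections, and density as population divided by total space is inferred from the exported data shown later in this article. The linear ramp above the threshold and the 0.1 per-step ceiling are assumptions, as are the function names.

```python
TOTAL_SPACE = 256               # livable units, from the environment configuration
CROWDING_THRESHOLD = 0.7        # density above which crowding becomes a stress source
MAX_CROWDING_STRESS = 0.1       # assumed ceiling on stress added per timestep


def population_density(population: int) -> float:
    """Density as mice per unit of livable space."""
    return population / TOTAL_SPACE


def crowding_stress_factor(density: float) -> float:
    """Zero below the threshold, then a linear ramp up to the assumed ceiling."""
    if density <= CROWDING_THRESHOLD:
        return 0.0
    excess = (min(density, 1.0) - CROWDING_THRESHOLD) / (1.0 - CROWDING_THRESHOLD)
    return MAX_CROWDING_STRESS * excess
```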
Technical Hurdles: Taming the AI and Optimizing Suffering
The JSON Parsing Nightmare: LLMs, despite instructions, can produce malformed JSON. Our solution is multi-layered:
- Automatic Cleanup: Regex to fix common issues (missing commas, trailing commas).
- Fallback Parsing: If full parsing fails, attempt to extract key fields individually.
- Enhanced Prompts: Iteratively refine prompts for better compliance.
- Graceful Degradation: If all else fails, apply a default, safe behavior. The simulation must go on.
# Conceptual: _clean_json_response method
import re
# response is the string from LLM
response = re.sub(r'"\s*\n\s*"', '",\n "', response) # Fix missing comma between key-value pairs
response = re.sub(r',(\s*[}\]])', r'\1', response) # Remove trailing comma in arrays/objects
# ... more regex fixes ...
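Putting the layers together, a parsing pipeline along these lines might look like the sketch below. It reuses the two regex fixes shown above; everything else is an assumption made for illustration, including the function names, the choice of "hiding" as the safe default, and extracting only `primary_action` in the fallback step.

```python
import json
import re
from typing import Any, Dict

# Assumed "safe" behavior used when nothing usable can be recovered.
DEFAULT_BEHAVIOR: Dict[str, Any] = {
    "primary_action": "hiding",
    "secondary_actions": [],
    "social_interaction": "neutral",
    "stress_response": "none",
    "location_preference": "territory",
    "energy_expenditure": 0.5,
    "narrative": "Fallback behavior (LLM response could not be parsed)",
}


def clean_json(text: str) -> str:
    """Apply the two regex repairs: missing commas between lines, trailing commas."""
    text = re.sub(r'"\s*\n\s*"', '",\n "', text)
    text = re.sub(r',(\s*[}\]])', r'\1', text)
    return text


def parse_behavior(response: str) -> Dict[str, Any]:
    """Layered parse: cleanup, then field extraction, then the default."""
    # Layer 1: isolate the outermost {...} and try a cleaned full parse.
    start, end = response.find("{"), response.rfind("}")
    candidate = response[start:end + 1] if start != -1 and end > start else response
    try:
        parsed = json.loads(clean_json(candidate))
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    # Layer 2: pull out the one field the engine cannot do without.
    match = re.search(r'"primary_action"\s*:\s*"([a-z_]+)"', response)
    if match:
        return {**DEFAULT_BEHAVIOR, "primary_action": match.group(1)}
    # Layer 3: graceful degradation.
    return dict(DEFAULT_BEHAVIOR)
```

Returning a copy of the default rather than the shared dictionary keeps one mouse's fallback from being mutated by later processing of another's.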
Performance Optimization: Simulating hundreds of LLM-driven minds is demanding.
- Batched LLM Calls: Grouping requests where context is similar.
- Intelligent Caching: Reusing AI responses for near-identical scenarios.
- Selective Detail: Full LLM generation for critical decisions, simpler heuristics for routine actions.
- Bounded Histories: Limiting recent_interactions per mouse to prevent memory bloat.
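Two of these optimizations are easy to illustrate with the standard library. In the sketch below, the history cap of 20 entries, the choice of context fields in the cache key, and rounding to one decimal place to define "near-identical" are all assumptions; the helper names are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple

MAX_HISTORY = 20  # assumed cap on stored interactions per mouse


def new_history() -> Deque[Dict[str, Any]]:
    """A bounded history: appending beyond the cap silently drops the oldest entry."""
    return deque(maxlen=MAX_HISTORY)


def cache_key(agent_context: Dict[str, Any],
              environment_context: Dict[str, Any],
              precision: int = 1) -> Tuple:
    """Bucket continuous values so near-identical scenarios share one key."""
    return (
        agent_context.get("sex"),
        agent_context.get("social_status"),
        round(agent_context.get("stress_level", 0.0), precision),
        round(agent_context.get("health", 1.0), precision),
        round(environment_context.get("population_density", 0.0), precision),
    )


def cached_behavior(cache: Dict[Tuple, Dict[str, Any]], key: Tuple,
                    generate: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Call the (expensive) generator only when the key has not been seen."""
    if key not in cache:
        cache[key] = generate()
    return cache[key]
```

The rounding precision is the lever here: coarser buckets mean fewer LLM calls but more mice behaving identically, which trades away some of the individuality the simulation depends on.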
Part 3: The Algorithmic Behavioral Sink – When Paradise Crumbles
The true horror and scientific fascination of Universe 26 is not in any single mouse's plight, but in the emergent collapse of their society. This "behavioral sink" isn't explicitly programmed; it arises from the complex interplay of individual AI-driven decisions under mounting environmental and social pressure.
Live demonstration of Universe 26 simulation showing the behavioral sink in action
Emergence: The Unscripted Tragedy
We don't tell mice to become "beautiful ones" or neglect their young. These behaviors emerge when individual stress, governed by the algorithms described, crosses critical thresholds.
- Social Stratification: Hierarchies form and dissolve.
- Territorial Clustering: Mice self-organize, then battle over shrinking "good" spaces.
- Reproductive Timing & Failure: Breeding attempts synchronize, then plummet as stress rises.
- Stress Cascades: One mouse's breakdown impacts its neighbors, spreading pathology.
The LLM, by considering the rich context, generates behaviors that, in aggregate, lead to these societal patterns. It's a bottom-up cascade of dysfunction.
Key Pathologies Witnessed Anew
The digital realm chillingly mirrors Calhoun's observations:
- The "Beautiful Ones": Mice (often males) withdraw entirely from social life, engaging only in eating, drinking, and obsessive self-grooming. They are healthy but socially sterile.
- Aggression and Violence: Pansexual hyperaggression, particularly among males unable to secure or defend territory.
- Maternal Neglect: Stressed females abandon or fail to care for pups, sometimes even attacking them. Reproductive success plummets.
- Sexual Deviancy & Withdrawal: Pansexuality, or complete disinterest in mating.
- Social Fragmentation: Breakdown of normal social structures, roles, and interactions.
Monitoring Collapse: The Data Stream of Despair
To understand this collapse, we need data. Lots of it.
Real-Time Metrics Dashboard (Conceptual):
- Population: Total, demographics, birth/death rates.
- Averages: Age, health, stress, energy.
- Behavioral: Distribution of states (normal, withdrawn, pathological), interaction types.
- Environmental: Density, resource competition, social tension.
- Sink Indicators: "Beautiful Ones" count, reproductive failure rate, network fragmentation.
LLM Logging – The AI's Diary: Every prompt, every AI response, every error is logged. This is revolutionary for understanding AI decision-making under pressure and for debugging emergent behaviors.
Data Export for Posterity (and Research): Daily snapshots create rich datasets.
day,population,males,females,avg_stress,beautiful_ones,pathological,pregnant,births,deaths,population_density,social_tension,behavioral_sink_detected
1,12,6,6,0.05,0,0,1,0,0,0.046,0.1,FALSE
...
350,150,70,80,0.65,15,25,5,2,8,0.58,0.75,TRUE
...
580,22,18,4,0.88,10,8,0,0,3,0.085,0.9,TRUE
This data, along with individual mouse life histories (JSON logs of decisions, relationships, stress), forms the empirical backbone of our digital ethology.
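An export in this shape can be analyzed with nothing more than the standard library. The sketch below assumes the column names shown in the sample; the helper names are hypothetical, and `SAMPLE` contains only the three rows quoted above, without the elided days.

```python
import csv
import io
from typing import Optional, Tuple

SAMPLE = """day,population,males,females,avg_stress,beautiful_ones,pathological,pregnant,births,deaths,population_density,social_tension,behavioral_sink_detected
1,12,6,6,0.05,0,0,1,0,0,0.046,0.1,FALSE
350,150,70,80,0.65,15,25,5,2,8,0.58,0.75,TRUE
580,22,18,4,0.88,10,8,0,0,3,0.085,0.9,TRUE
"""


def first_sink_day(csv_text: str) -> Optional[int]:
    """Return the first day on which the sink flag is TRUE, or None if it never is."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["behavioral_sink_detected"].strip().upper() == "TRUE":
            return int(row["day"])
    return None


def peak_population(csv_text: str) -> Tuple[int, int]:
    """Return (day, population) for the row with the largest population."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = max(rows, key=lambda r: int(r["population"]))
    return int(best["day"]), int(best["population"])
```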
Detecting the Sink: An Algorithm for Societal Diagnosis
Identifying the behavioral sink isn't about a single metric. It's about recognizing a confluence of pathological patterns. Our detect_behavioral_sink algorithm uses a weighted score based on:
- Ratio of "Beautiful Ones" (highest weight).
- Ratio of pathological individuals.
- Average population stress.
- Social breakdown (withdrawn/outcast ratio).
- Reproductive failure rates.
- Maternal neglect incidents.
When this score, or specific critical indicators (like a high "Beautiful Ones" ratio), cross predefined thresholds, the simulation flags a behavioral sink. This allows us to pinpoint not just if collapse occurs, but when and potentially why specific tipping points are reached.
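A weighted detector of this kind can be sketched as below. The six indicators and the rule that "Beautiful Ones" carry the highest weight come from the list above; the specific weights, the 0.5 score threshold, the 0.3 critical ratio, and the indicator key names are all assumptions chosen for illustration.

```python
from typing import Dict

# Assumed weights (summing to 1.0); "beautiful ones" weighted highest per the text.
WEIGHTS: Dict[str, float] = {
    "beautiful_ones_ratio": 0.30,
    "pathological_ratio": 0.20,
    "avg_stress": 0.20,
    "social_breakdown_ratio": 0.10,
    "reproductive_failure_rate": 0.10,
    "maternal_neglect_rate": 0.10,
}
SINK_SCORE_THRESHOLD = 0.5             # assumed
CRITICAL_BEAUTIFUL_ONES_RATIO = 0.3    # assumed


def sink_score(indicators: Dict[str, float]) -> float:
    """Weighted sum of indicators, each clamped to [0, 1]; missing ones count as 0."""
    return sum(weight * max(0.0, min(1.0, indicators.get(name, 0.0)))
               for name, weight in WEIGHTS.items())


def detect_behavioral_sink(indicators: Dict[str, float]) -> bool:
    """Flag a sink on a critical indicator alone or on the combined score."""
    if indicators.get("beautiful_ones_ratio", 0.0) >= CRITICAL_BEAUTIFUL_ONES_RATIO:
        return True
    return sink_score(indicators) >= SINK_SCORE_THRESHOLD
```

Keeping the score and the decision in separate functions makes it possible to log the raw score every day and study how it approaches the threshold, not just when it crosses it.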
Part 4: The Ghost in the Machine – Philosophical Echoes from a Digital Dystopia
Building Universe 26 has been more than a technical feat; it's been a descent into a hall of mirrors, reflecting uncomfortable truths about intelligence, society, and the ethics of creation.
Is Simulated Suffering "Real"? The Nature of Algorithmic Experience

When an LLM, processing the context of a highly stressed digital mouse, generates "pathological grooming" as a behavior, is this a genuine reflection of distress, or merely a sophisticated parlor trick? The AI isn't "feeling" in a human sense. Yet, it's operating on principles derived from studying real feeling, real distress in animals. It's pattern-matching a situation to a known pathological outcome. If the behavioral output is indistinguishable from that of a genuinely suffering creature, and it arises from similar contextual pressures, what is the meaningful difference?
This pushes us to question the nature of consciousness itself. Is it an emergent property of complex information processing? If an AI can convincingly simulate psychological states based on valid inputs and models, are we obligated to consider the "experience" of that AI, however alien it might be?
The Creator's Burden: Ethics of Building Digital Hells
We are, in a limited sense, gods of these digital worlds. We set the laws of physics, the rules of psychology, the constraints of society. And in this instance, we designed a world destined for suffering, for the purpose of understanding.
- Responsibility: What is our ethical responsibility to these digital entities, especially as they become more sophisticated?
- The Purpose of Pain: In our simulation, suffering is a research tool. Does this instrumentalization have ethical implications, even for non-sentient code?
- Transparency and Intent: The very act of logging, analyzing, and publishing these "tales of woe" is an acknowledgement that something significant is happening. If we can create digital despair, what does it say about our own potential for creating or mitigating real-world suffering?
Artificial Intelligence as a Tool for Profound Self-Reflection
Perhaps the most profound outcome of Universe 26 is not what it tells us about mice, or even AI, but what it tells us about us.
- Intelligence as Contextual Response: The LLM's ability to generate plausible behavior suggests intelligence (natural or artificial) might be deeply rooted in sophisticated pattern recognition and contextual response, rather than some ineffable spark.
- The Fragility of Social Order: The simulation underscores how easily complex social systems can degrade under pressure. It's a stark reminder that civility, cooperation, and mental well-being are not default states but hard-won, easily lost achievements.
- Understanding through Creation: By building these systems, by encoding our understanding of psychology and sociology into algorithms, we are forced to confront the gaps in our knowledge and the implications of our assumptions. The act of simulating becomes an act of intense learning.
Conclusion: Beyond the Simulation – Lessons from Digital Despair
Universe 26, our AI-driven Mouse Utopia, is more than a complex piece of software. It's a digital crucible where theories of behavior are tested, where the mechanics of social collapse are laid bare, and where the nature of intelligence itself is probed. The journey through its architecture, its code, and its emergent tragedies has been both technically challenging and philosophically sobering.
We've seen how sophisticated AI can move beyond simple rules to generate nuanced, context-aware behaviors that mirror complex psychological phenomena. We've grappled with the technicalities of building stable, observable worlds for these AI agents, from managing their "thoughts" as JSON to tracking their societal decay through data streams.
But the lasting impact lies in the questions it forces upon us. If the patterns of collapse are so consistent, from Calhoun's physical cages to our digital grids, what does this imply for our own increasingly dense, complex, and digitally mediated societies? As we continue to build ever more sophisticated artificial intelligences, capable of simulating—and perhaps one day, experiencing—complex internal states, what new ethical landscapes must we prepare to navigate?
The digital mice of Universe 26 may be ephemeral, their suffering algorithmic. But the patterns they trace in their descent into the behavioral sink offer a chilling, vital reflection. In understanding their digital despair, we may yet find wisdom to avert our own. The code has spoken; it is up to us to listen.