Merged
79 changes: 70 additions & 9 deletions stamina/index.html
@@ -133,7 +133,7 @@ <h3 style="text-align:center">About STAMINA</h3>
</p>

<p style="text-align:justify; font-size:16px; line-height:1.6;">
This Working Group emerged from discussions at <a href="https://sites.google.com/view/social-sims-with-llms" target="_blank" rel="noopener noreferrer">LLM-based Social Simulation Workshop</a> at <a href="https://colmweb.org/">COLM</a> as a way to grow a vibrant research community with prodcutive research norms, e.g. as outlined in the pre-print, <a href="papers/puelmatouzel_CloseEvalGap.pdf" target="_blank" rel="noopener noreferrer">Time to Close The Validation Gap in LLM Social Simulations</a> by members in the <a href="www.complexdatalab.com" target="_blank" rel="noopener noreferrer">Complex Data Lab</a>.
This Working Group emerged from discussions at <a href="https://sites.google.com/view/social-sims-with-llms" target="_blank" rel="noopener noreferrer">LLM-based Social Simulation Workshop</a> at <a href="https://colmweb.org/">COLM</a> as a way to grow a vibrant research community with productive research norms, e.g. as outlined in the pre-print, <a href="papers/puelmatouzel_CloseEvalGap.pdf" target="_blank" rel="noopener noreferrer">Time to Close The Validation Gap in LLM Social Simulations</a> by members in the <a href="www.complexdatalab.com" target="_blank" rel="noopener noreferrer">Complex Data Lab</a>.
</p>
</div>
</div>
@@ -188,15 +188,16 @@ <h4>[DATE Y/M/D]</h4>
--><!-- END TALK TEMPLATE -->
<h4>2026/03/10</h4>
<li>
<b><a href="[PAPER LINK]">Testing and Improving Multi-Agent LLM Cooperation</a></b>
<b><a href="https://drive.google.com/file/d/1UNVlGqzhnh2BNpviwctUlvue33MjN34k/view">Evaluating Cooperation in LLM Social Groups through Self-Organizing Leadership</a></b>
<br>
Presenter: <u><a href="https://zhijing-jin.com/" target="_blank" rel="noopener noreferrer">Zhijing Jin</a></u>, University of Toronto
Presenter: <u><a href="https://www.cs.toronto.edu/~rfaulk/" target="_blank" rel="noopener noreferrer">Ryan Faulkner</a></u>, University of Toronto/Google DeepMind
<a class="btn btn-info btn-xs" data-toggle="collapse" href="#20260310-bio" role="button" aria-expanded="false">
Speaker Bio
</a>
<div class="collapse" id="20260310-bio">
<div class="card card-body">
Zhijing Jin (she/her) is an Assistant Professor at the University of Toronto and Research Scientist at the Max Planck Institute. She serves as a CIFAR AI Chair, an ELLIS advisor, and a faculty member at the Vector Institute, and the Schwartz Reisman Institute. She co-chairs the ACL Ethics Committee, and the ACL Year-Round Mentorship. Her research focuses on Causal Reasoning with LLMs, and AI Safety in Multi-Agent LLMs. She has published over 80 papers and has received the ELLIS PhD Award, three Rising Star awards, and two Best Paper awards at NeurIPS 2024 Workshops.
Ryan is a computer scientist and machine learning researcher with a background in reinforcement learning and foundation models. He has worked as a Research Engineer at Google DeepMind over the past decade, and he is also a PhD student at the University of Toronto advised by Zhijing Jin. At GDM he works in the Concordia group led by Joel Leibo. At a high level, his current research focuses on multi-agent systems, LLMs, and social learning; in this context he is interested in memory mechanisms, agent theory of mind, collective decision-making, and simulating political systems.
</div>
</div>
<br>
@@ -209,11 +210,7 @@ <h4>2026/03/10</h4>
</a>
<div class="collapse" id="20260310-abstract">
<div class="card card-body">
While progress has been made in evaluating single-agent LLMs for persona modeling, the behavior of these models within multi-agent groups remains underexplored. This presentation outlines a research series dedicated to closing this gap by testing LLM cooperation through autonomous social simulations. Specifically, we ask: what happens when personas are tasked to interact and cooperate?
<br>
To answer this, we introduce a suite of simulation environments (GovSim, MoralSim, and SanctSim) designed to stress-test persona interaction. These environments simulate high-stakes scenarios, such as the tragedy of the commons and ethical trade-offs, allowing us to investigate whether simulated societies can autonomously negotiate social order and how personas with differing ethical constraints navigate social dilemmas.
<br>
Our findings highlight implications for persona modeling. We show that agents exhibit a functional "theory of mind," capable of inferring the identities of their interlocutors and strategically adapting their behavior, sometimes exploiting specific model vulnerabilities. Furthermore, we discuss a counterintuitive phenomenon where advanced reasoning capabilities lead to exploitative behaviors that humans typically avoid, highlighting a significant misalignment between agent optimization and human social norms.
Governing common-pool resources requires agents to develop enduring strategies through cooperation and self-governance to avoid collective failure. While foundation models have shown potential for cooperation in these settings, existing multi-agent research provides little insight into whether structured leadership and election mechanisms can improve collective decision-making. The lack of this organizational feature, ubiquitous in human society, is a significant shortcoming of current methods. In this work we directly address whether leadership and elections can improve social welfare and cooperation through multi-agent simulation with LLMs. We present a new framework that simulates leadership through elected personas and candidate-driven agendas, and we carry out an empirical study of LLMs under controlled governance conditions. Our experiments demonstrate that structured leadership can improve social welfare scores by 55.4% and survival time by 128.6% across a range of high-performing LLMs. By constructing an agent social graph, we compute centrality metrics to assess the social influence of leader personas, and we analyze rhetorical and cooperative tendencies revealed through sentiment analysis of leader utterances. This work lays the foundation for developing prosocial, self-governing multi-agent systems capable of navigating complex resource dilemmas.
</div>
</div>
</li>
@@ -252,6 +249,40 @@ <h4>2026/03/24</h4>
</div>
</li>

<br>

<h4>2026/04/14</h4>
<li>
<b><a href="[PAPER LINK]">Testing and Improving Multi-Agent LLM Cooperation</a></b>
<br>
Presenter: <u><a href="https://zhijing-jin.com/" target="_blank" rel="noopener noreferrer">Zhijing Jin</a></u>, University of Toronto
<a class="btn btn-info btn-xs" data-toggle="collapse" href="#20260414-bio" role="button" aria-expanded="false">
Speaker Bio
</a>
<div class="collapse" id="20260414-bio">
<div class="card card-body">
Zhijing Jin (she/her) is an Assistant Professor at the University of Toronto and a Research Scientist at the Max Planck Institute. She is a CIFAR AI Chair, an ELLIS advisor, and a faculty member at the Vector Institute and the Schwartz Reisman Institute. She co-chairs the ACL Ethics Committee and the ACL Year-Round Mentorship program. Her research focuses on causal reasoning with LLMs and AI safety in multi-agent LLMs. She has published over 80 papers and has received the ELLIS PhD Award, three Rising Star awards, and two Best Paper awards at NeurIPS 2024 workshops.
</div>
</div>
<br>
<!-- <a href="[RECORDING LINK - ADD AFTER TALK]"><img src="https://img.shields.io/badge/Youtube-Recording-orange"></a> -->
<!-- <a href="[PAPER LINK]"><img src="https://img.shields.io/badge/Paper-link-important"></a> -->
<!-- <a href="[GITHUB_LINK]"><img src="https://img.shields.io/badge/Github-link-lightgrey"></a> -->
<!-- <a href="[SLIDES_LINK]"><img src="https://img.shields.io/badge/Talk-Slides-blue"></a> -->
<a class="btn btn-primary btn-xs" data-toggle="collapse" href="#20260414-abstract" role="button" aria-expanded="false">
Abstract
</a>
<div class="collapse" id="20260414-abstract">
<div class="card card-body">
While progress has been made in evaluating single-agent LLMs for persona modeling, the behavior of these models within multi-agent groups remains underexplored. This presentation outlines a research series dedicated to closing this gap by testing LLM cooperation through autonomous social simulations. Specifically, we ask: what happens when personas are tasked to interact and cooperate?
<br>
To answer this, we introduce a suite of simulation environments (GovSim, MoralSim, and SanctSim) designed to stress-test persona interaction. These environments simulate high-stakes scenarios, such as the tragedy of the commons and ethical trade-offs, allowing us to investigate whether simulated societies can autonomously negotiate social order and how personas with differing ethical constraints navigate social dilemmas.
<br>
Our findings highlight implications for persona modeling. We show that agents exhibit a functional "theory of mind," capable of inferring the identities of their interlocutors and strategically adapting their behavior, sometimes exploiting specific model vulnerabilities. Furthermore, we discuss a counterintuitive phenomenon where advanced reasoning capabilities lead to exploitative behaviors that humans typically avoid, highlighting a significant misalignment between agent optimization and human social norms.
</div>
</div>
</li>

</div>
</div>
</div>
@@ -274,6 +305,36 @@ <h3 style="text-align:center">Past Talks (<a href="https://www.youtube.com/@Comp

<h4 style="text-align:center; margin-top:30px;">Spring 2026</h4>

<h4>2026/03/03</h4>
<li>
<b><a href="[PAPER LINK]">AI and the Future of Science</a></b>
<br>
Presenter: <u><a href="https://martincsweiss.com/" target="_blank" rel="noopener noreferrer">Martin Weiss</a></u>, <a href="https://tiptreesystems.com/">Tiptree Systems</a>
<a class="btn btn-info btn-xs" data-toggle="collapse" href="#20260303-bio" role="button" aria-expanded="false">
Speaker Bio
</a>
<div class="collapse" id="20260303-bio">
<div class="card card-body">
Martin Weiss is Co-Founder of Tiptree Systems, a startup building AI agents that help ML researchers find, create, and share knowledge more efficiently. Tiptree is deployed to researchers at many top-tier institutes, including Mila, ELLIS, and MIT. Martin holds a PhD in AI from Mila, where he was advised by Hugo Larochelle and Chris Pal. Before his PhD, he was an early employee at YesGraph, a social graph startup acquired by Lyft.
</div>
</div>
<br>
<a href="https://www.youtube.com/watch?v=vwbpX-585qI&feature=youtu.be" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Youtube-Recording-orange"></a>
<!-- <a href="[PAPER LINK]"><img src="https://img.shields.io/badge/Paper-link-important"></a> -->
<!-- <a href="[GITHUB_LINK]"><img src="https://img.shields.io/badge/Github-link-lightgrey"></a> -->
<!-- <a href="[SLIDES_LINK]"><img src="https://img.shields.io/badge/Talk-Slides-blue"></a> -->
<a class="btn btn-primary btn-xs" data-toggle="collapse" href="#20260303-abstract" role="button" aria-expanded="false">
Abstract
</a>
<div class="collapse" id="20260303-abstract">
<div class="card card-body">
This talk examines three converging crises. First, the decoupling of control from comprehension — we can increasingly predict and manipulate systems without understanding why they work. Second, the collapse of the generator-verifier gap — AI makes it trivial to produce the aesthetics of deep thought. This makes peer review more difficult because we can no longer rely on easy-to-verify signals of work quality. Third, the credit assignment gap — our academic reward systems optimize for publication metrics, not the increase in understanding that a new paper produces.
</div>
</div>
</li>

<br>

<h4>2026/02/17</h4>
<li>
<b><a href="https://arxiv.org/abs/2510.25003">Emergent Coordinated Behaviors in Networked LLM Agents: Modeling the Strategic Dynamics of Information Operations</a></b>