Monitoring, Market Primitives, and the Stability of Algorithmic Collusion (submitted)
This paper develops an analytical framework to study when sophisticated machine learning algorithms may learn to collude. Algorithms observe a state variable and update policies to maximize long-term payoffs; their long-run policies correspond to the stable equilibria of a tractable differential equation. In a repeated Bertrand game, I derive necessary and sufficient conditions under which Nash equilibria are learned. This reveals how the interplay between monitoring technology (state variables) and market conditions determines whether competitive or collusive outcomes emerge. I apply these insights to evaluate two key regulatory policies: limiting algorithmic data inputs and imposing competition in the software provider market.
The Algorithmic Advantage: How Reinforcement Learning Generates Rich Communication (submitted)
(with Emilio Calvano and Juha Tolvanen)
We analyze strategic communication when advice is generated by a reinforcement-learning algorithm rather than by a fully rational sender. Building on the cheap-talk framework of Crawford and Sobel (1982), an advisor adapts its messages based on payoff feedback, while a decision maker best-responds. We provide a theoretical analysis of the long-run communication outcomes induced by such reward-driven adaptation. With aligned preferences, we establish that learning robustly leads to informative communication even from uninformative initial policies. With misaligned preferences, no stable outcome exists; instead, learning generates cycles that sustain highly informative communication and payoffs exceeding those of any static equilibrium.
The Bounds to Algorithmic Collusion: Q-learning, gradient learning, and the Folk Theorem
(with Galit Ashkenazi-Golan, Domenico Mergoni Cecchelli, and Edward Plumb)
We explore the behaviour that emerges when learning agents repeatedly interact strategically, for a wide range of learning dynamics including Q-learning, projected gradient, replicator, and log-barrier dynamics. Going beyond the better-understood classes of potential games and zero-sum games, we consider the setting of a general repeated game with finite recall, under different forms of monitoring. We obtain a Folk Theorem-like result and characterise the set of payoff vectors that can be obtained by these dynamics, discovering a wide range of possibilities for the emergence of algorithmic collusion.
Strategic Learning: When slow and steady wins the race
(with Galit Ashkenazi-Golan, Edward Plumb, and Yufei Zhang)
Learning agents are increasingly involved in decision making. When these decisions arise in a strategic interaction, the question of strategically choosing the learning method emerges. We take an initial step toward understanding the implications of the strategic choice of one parameter: the speed of learning in multi-agent gradient learning. We use 2x2 games to map the considerations involved in choosing the speed strategically: the effect on basins of attraction, on cyclic behaviour, and on trajectories in dominance-solvable games. For the latter, we show that, while learning as fast as possible might intuitively seem optimal, this is not always the case.
Dormant Paper
Learning to Best Reply: On the Consistency of Multi-Agent Batch Reinforcement Learning
This paper provides asymptotic results for a class of model-free actor-critic batch reinforcement learning algorithms in the multi-agent setting. In each period, each agent faces an estimation problem (the critic, e.g. a value function) and a policy-updating problem (the actor). The estimation step is carried out by parametric function estimation based on a batch of past observations. Agents have no knowledge of each other's incentives and policies. I provide sufficient conditions for each agent's parametric function estimator to be consistent in the multi-agent environment, which enables agents to learn to best respond despite the non-stationarity inherent in multi-agent systems. The conditions depend on the environment, the batch size, and the policy step size.
These sufficient conditions are useful in the asymptotic analysis of multi-agent learning, for example when deriving long-run characterisations via stochastic approximation techniques.
Zombie Prevalence and Bank Health: Exploring Feedback Effects (R&R at Management Science)
(with Andreea Rotarescu and Kevin Song)
This paper investigates feedback effects between bank health and zombie firms—financially distressed firms receiving subsidized credit. The literature focuses on how banks create zombies, overlooking zombies’ impact on bank health. Using Spanish firm-bank data (2005-2014), we document a vicious cycle: lower bank capital ratios are associated with higher zombie activity in served industries, while higher zombie prevalence is associated with reduced bank capital. We link this to a previously unexplored mechanism where banks respond appropriately to observable financial distress through higher provisioning, but overlook risks from relationship borrowers receiving subsidized rates. Our findings suggest that this feedback stems not from financial distress alone, but from the combination of distress with interest rate subsidies.