Reinforcement Learning and Collusion (submitted)
Online Appendix
This paper develops an analytical framework to characterize the long-run policies learned by repeatedly interacting algorithms. In the model, algorithms observe a state variable and update their policies to maximize long-term discounted payoffs. I show that their long-run policies correspond to the stable equilibria of a tractable differential equation. I use this framework to analyze a repeated Bertrand game, where the stage-game Nash equilibrium serves as a non-collusive benchmark. I derive necessary and sufficient conditions under which this Nash equilibrium is learned, revealing how the interplay between monitoring technology (state variables) and market conditions (price elasticities, markups) determines whether competitive or collusive outcomes emerge. Finally, I apply these insights to evaluate two key regulatory policies: limiting data inputs for algorithms and imposing competition in the software provider market. My results demonstrate that the former is the more promising approach to curbing algorithmic collusion.
Strategic Communication and Algorithmic Advice
(with Emilio Calvano and Juha Tolvanen)
We study a model of communication in which a better-informed sender learns to communicate with a receiver who takes an action that affects the welfare of both. Specifically, we model the sender as a machine-learning-based algorithmic recommendation system and the receiver as a rational, best-responding agent that understands how the algorithm works. The results demonstrate robust communication, which either emerges from scratch (i.e., starting from babbling, where no common language initially exists) or persists when initialized. We show that the sender's learning hinders communication, limiting the extent of information transmission even when the preferences of the algorithm's designer and of the receiver are aligned. We then show that when the two are not aligned, a robust pattern emerges: the algorithm plays a cut-off strategy, pooling messages when its private information suggests actions in the direction of its preference bias and sending mostly separate signals otherwise.
Working Paper
Learning to Best Reply: On the Consistency of Multi-Agent Batch Reinforcement Learning
This paper provides asymptotic results for a class of model-free actor-critic batch reinforcement learning algorithms in the multi-agent setting. In each period, each agent faces an estimation problem (the critic, e.g. a value function) and a policy updating problem. The estimation step is done by parametric function estimation based on a batch of past observations. Agents have no knowledge of each other's incentives and policies. I provide sufficient conditions for each agent's parametric function estimator to be consistent in the multi-agent environment, which enables agents to learn to best respond despite the non-stationarity inherent in multi-agent systems. The conditions depend on the environment, batch size, and policy step size.
These sufficient conditions are useful in the asymptotic analysis of multi-agent learning, e.g. in the application of long-run characterisations using stochastic approximation techniques.
Estimating Dynamic Spillover Effects along Multiple Networks in a Linear Panel Model (submitted)
(with Andreea Rotarescu and Kevin Song)
This paper investigates feedback effects between bank health and zombie firms, that is, financially distressed firms receiving subsidized credit. The literature focuses on how banks create zombies, overlooking zombies’ impact on bank health. Using Spanish firm-bank data (2005-2014), we document a vicious cycle: lower bank capital ratios are associated with higher zombie activity in served industries, while higher zombie prevalence is associated with reduced bank capital. We link this to a previously unexplored mechanism in which banks respond appropriately to observable financial distress through higher provisioning, but overlook risks from relationship borrowers receiving subsidized rates. Our findings suggest that this feedback stems not from financial distress alone, but from the combination of distress with interest rate subsidies.