Reinforcement Learning and Collusion (submitted)
Online Appendix
This paper presents an analytical characterization of the long-run policies learned by algorithms that interact repeatedly. The algorithms observe a state variable and update their policies to maximize long-term discounted payoffs. I show that their long-run policies correspond to equilibria that are stable points of a tractable differential equation. As an example, I consider a repeated Bertrand game, for which learning the stage game Nash equilibrium serves as a non-collusive benchmark. I give necessary and sufficient conditions for this Nash equilibrium to be learned. I show how the interplay between monitoring technology (state variables) and market conditions, such as price elasticities and markups, determines whether the stage game Nash equilibrium or collusive equilibria may be learned. I apply this framework to analyze two key regulatory policies: limiting data inputs for algorithms and imposing competition in the software provider market. My results demonstrate that the former strategy holds more promise.
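To make the non-collusive benchmark concrete, the following is a minimal sketch of a stage game Nash price and a collusive price in a symmetric Bertrand duopoly. The linear demand form q_i = a - b*p_i + d*p_j, the marginal cost c, and all parameter values are illustrative assumptions and are not taken from the paper; the function names are hypothetical.

```python
# Illustrative symmetric differentiated Bertrand duopoly (assumed, not from the paper):
# demand q_i = a - b * p_i + d * p_j, constant marginal cost c, with b > d > 0.

def best_response(p_other, a=10.0, b=2.0, d=1.0, c=1.0):
    """Price maximizing (p - c) * (a - b * p + d * p_other) for a given rival price."""
    return (a + d * p_other + b * c) / (2.0 * b)

def nash_price(a=10.0, b=2.0, d=1.0, c=1.0, tol=1e-12, max_iter=10_000):
    """Symmetric stage game Nash price, found by iterating the best response."""
    p = c  # start from marginal-cost pricing
    for _ in range(max_iter):
        p_next = best_response(p, a, b, d, c)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("best-response iteration did not converge")

def collusive_price(a=10.0, b=2.0, d=1.0, c=1.0):
    """Common price maximizing joint profit (p - c) * (a - (b - d) * p) per firm."""
    return (a + (b - d) * c) / (2.0 * (b - d))

if __name__ == "__main__":
    print("Nash price:", nash_price())
    print("Collusive price:", collusive_price())
```

Under these assumed parameters the collusive price lies above the Nash price, which is the gap that a learning outcome can be compared against.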
Strategic Communication and Algorithmic Advice
(with Emilio Calvano and Juha Tolvanen)
We study a model of communication in which a better-informed sender learns to communicate with a receiver who takes an action that affects the welfare of both. Specifically, we model the sender as a machine-learning-based algorithmic recommendation system and the receiver as a rational, best-responding agent that understands how the algorithm works. The results demonstrate robust communication, which either emerges from scratch (i.e., originating from babbling, where no common language initially exists) or persists when initialized. We show that the sender's learning hinders communication, limiting the extent of information transmission even when the preferences of the algorithm's designer and the receiver are aligned. We then show that when the two are not aligned, a robust pattern emerges: the algorithm plays a cut-off strategy, pooling messages when its private information suggests actions in the direction of its preference bias and sending mostly separate signals otherwise.
Working Paper
Learning to Best Reply: On the Consistency of Multi-Agent Batch Reinforcement Learning
This paper provides asymptotic results for a class of model-free actor-critic batch reinforcement learning algorithms in the multi-agent setting. In each period, each agent faces an estimation problem (the critic, e.g. a value function) and a policy updating problem. The estimation step is done by parametric function estimation based on a batch of past observations. Agents have no knowledge of each other's incentives and policies. I provide sufficient conditions for each agent's parametric function estimator to be consistent in the multi-agent environment, which enables agents to learn to best respond despite the non-stationarity inherent in multi-agent systems. The conditions depend on the environment, batch size, and policy step size.
These sufficient conditions are useful in the asymptotic analysis of multi-agent learning, e.g. in the application of long-run characterisations using stochastic approximation techniques.
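The estimate-then-update loop described above can be sketched in a toy setting. This is only an illustration under stated assumptions, not the class of algorithms analyzed in the paper: the 2x2 prisoner's dilemma payoffs, the sigmoid policy, and the names `PAYOFF`, `run`, `batch_size`, and `step` are all hypothetical. Each agent's critic regresses rewards on indicators of its own action (which reduces to per-action batch means), and the actor takes a small gradient step on its policy parameter.

```python
import math
import random

# Hypothetical prisoner's dilemma payoffs: PAYOFF[own][other], 0 = cooperate, 1 = defect.
PAYOFF = [[3.0, 0.0], [5.0, 1.0]]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def run(iterations=300, batch_size=200, step=0.5, seed=0):
    """Return each agent's final probability of defecting."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]                # policy parameters; P(defect) = sigmoid(theta)
    q_hat = [[0.0, 0.0], [0.0, 0.0]]  # q_hat[i][a]: agent i's critic for own action a
    for _ in range(iterations):
        probs = [sigmoid(t) for t in theta]
        # Collect a batch of joint actions while both policies are held fixed.
        batch = [tuple(int(rng.random() < p) for p in probs)
                 for _ in range(batch_size)]
        for i in range(2):
            # Critic: least squares on own-action indicators, i.e. per-action batch means.
            # Agent i uses only its own actions and rewards, never the rival's payoffs.
            for a in (0, 1):
                rewards = [PAYOFF[acts[i]][acts[1 - i]]
                           for acts in batch if acts[i] == a]
                if rewards:  # keep the previous estimate if the action was not sampled
                    q_hat[i][a] = sum(rewards) / len(rewards)
        for i in range(2):
            # Actor: gradient of the estimated expected payoff for a sigmoid policy.
            p = probs[i]
            theta[i] += step * p * (1.0 - p) * (q_hat[i][1] - q_hat[i][0])
    return [sigmoid(t) for t in theta]

if __name__ == "__main__":
    print(run())
```

In this toy game defecting is a dominant action, so a consistent critic should steer both policies toward it. A larger `batch_size` reduces noise in the critic, while a smaller `step` keeps the data-generating policies closer to stationary between updates.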
Estimating Dynamic Spillover Effects along multiple Networks in a linear Panel Model (submitted)
(with Andreea Rotarescu and Kevin Song)
Spillover of economic outcomes often arises over multiple networks, and distinguishing their separate roles is important in empirical research. For example, the direction of spillover between two groups (such as banks and industrial sectors linked in a bipartite graph) has important economic implications, and a researcher may want to learn which direction is more prominent in the data. This requires an empirical methodology that allows for both directions of spillover simultaneously. In this paper, we develop a dynamic linear panel model and asymptotic inference with large $n$ and small $T$, where both directions of spillover are accommodated through multiple networks. Using this methodology, we perform an empirical study of spillovers between bank weakness and zombie-firm congestion in industrial sectors, using firm-bank matched data from Spain between 2005 and 2012. Overall, we find positive spillover in both directions between banks and sectors.