Research

Economics and Computer Science

Reinforcement Learning and Collusion

This paper presents an analytical characterization of the long-run policies learned by algorithms that interact repeatedly. These algorithms update policies, which are maps from observed states to actions. I show that the long-run policies correspond to equilibria that are stable points of a tractable differential equation. As a running example, I consider a repeated Cournot game of quantity competition, for which learning the stage-game Nash equilibrium serves as a non-collusive benchmark. I give necessary and sufficient conditions for this Nash equilibrium not to be learned. These conditions are requirements on the state variables that algorithms use to determine their actions, and on the stage game. When algorithms determine actions based only on the past period's price, the Nash equilibrium can be learned. However, agents may condition their actions on richer types of information beyond the past period's price. In that case, I give sufficient conditions such that the policies converge with positive probability to a collusive equilibrium, while never converging to the Nash equilibrium.
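
To make the non-collusive benchmark in the running example concrete, here is a minimal sketch of a Cournot stage game. It assumes a symmetric duopoly with linear inverse demand P = a - b(q1 + q2) and constant marginal cost c. These functional forms, the parameter values, and all function names are illustrative assumptions and are not taken from the paper. The sketch approximates the stage-game Nash quantity by iterating best responses and compares it with the per-firm quantity under joint profit maximization.

```python
# Illustrative sketch only: linear demand and constant marginal cost are
# assumptions made for this example, not the paper's specification.

def best_response(q_other, a, b, c):
    """Profit-maximizing quantity against a rival producing q_other."""
    # Profit is (a - b*(q + q_other) - c) * q; solve the first-order condition in q.
    return max(0.0, (a - c - b * q_other) / (2 * b))


def nash_quantity(a, b, c, iterations=200):
    """Approximate the symmetric stage-game Nash quantity by iterating best responses."""
    q1 = q2 = 0.0
    for _ in range(iterations):
        # Both firms update simultaneously against the rival's previous quantity.
        q1, q2 = best_response(q2, a, b, c), best_response(q1, a, b, c)
    return q1


def collusive_quantity(a, b, c):
    """Per-firm quantity when the two firms split the joint-profit-maximizing output."""
    return (a - c) / (4 * b)


def profit(q_own, q_other, a, b, c):
    """Stage-game profit of a firm producing q_own against q_other."""
    return (a - b * (q_own + q_other) - c) * q_own


if __name__ == "__main__":
    a, b, c = 10.0, 1.0, 1.0  # hypothetical demand and cost parameters
    q_nash = nash_quantity(a, b, c)
    q_coll = collusive_quantity(a, b, c)
    print("Nash:", q_nash, profit(q_nash, q_nash, a, b, c))
    print("Collusive:", q_coll, profit(q_coll, q_coll, a, b, c))
```

Under these hypothetical parameters the Nash quantity is (a - c) / (3b) = 3 per firm, while the collusive quantity is 2.25 per firm and yields a higher per-firm profit. That gap is why learning the stage-game Nash equilibrium is a natural non-collusive benchmark.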


Strategic Communication and Algorithmic Advice

(with Emilio Calvano and Juha Tolvanen)

We study a model of communication in which a better-informed sender learns to communicate with a receiver who takes an action that affects the welfare of both. Specifically, we model the sender as a machine-learning-based algorithmic recommendation system and the receiver as a rational, best-responding agent that understands how the algorithm works. The results demonstrate robust communication, which either emerges from scratch (i.e., originating from babbling, where no common language initially exists) or persists when initialized. We show that the sender's learning hinders communication, limiting the extent of information transmission even when the preferences of the algorithm's designer and of the receiver are aligned. We then show that when the two are not aligned, there is a robust pattern in which the algorithm plays a cut-off strategy, pooling messages when its private information suggests actions in the direction of its preference bias while sending mostly separate signals otherwise.


Working Paper

Learning to Best Reply: On the Consistency of Multi-Agent Batch Reinforcement Learning

This paper provides asymptotic results for a class of model-free actor-critic batch reinforcement learning algorithms in the multi-agent setting. At each period, each agent faces an estimation problem (the critic, e.g., a value function) and a policy-updating problem. The estimation step is done by parametric function estimation based on a batch of past observations. Agents have no knowledge of each other's incentives and policies. I provide sufficient conditions for each agent's parametric function estimator to be consistent in the multi-agent environment, which enables agents to learn to best respond despite the non-stationarity inherent in multi-agent systems. The conditions depend on the environment, batch size, and policy step size.

These sufficient conditions are useful in the asymptotic analysis of multi-agent learning, e.g., in the application of long-run characterisations using stochastic approximation techniques.

Econometric Theory

Estimating Dynamic Spillover Effects along Multiple Networks in a Linear Panel Model (submitted)

(with Andreea Rotarescu and Kevin Song)

Spillover of economic outcomes often arises over multiple networks, and distinguishing their separate roles is important in empirical research. For example, the direction of spillover between two groups (such as banks and industrial sectors linked in a bipartite graph) has important economic implications, and a researcher may want to learn which direction is more prominent in the data. This requires an empirical methodology that allows for both directions of spillover simultaneously. In this paper, we develop a dynamic linear panel model and asymptotic inference with large $n$ and small $T$, where both directions of spillover are accommodated through multiple networks. Using this methodology, we perform an empirical study of spillovers between bank weakness and zombie-firm congestion in industrial sectors, using firm-bank matched data from Spain between 2005 and 2012. Overall, we find positive spillover in both directions between banks and sectors.