An empirical risk minimization approach for offline inverse RL and Dynamic Discrete Choice models

Best AI papers explained - A podcast by Enoch H. Kang

This paper introduces **GLADIUS**, a novel **Empirical Risk Minimization (ERM)-based gradient method** for **Inverse Reinforcement Learning (IRL)** and **Dynamic Discrete Choice (DDC)** models. The core innovation is its ability to **infer rewards and Q-functions** without requiring explicit knowledge or estimation of **state-transition probabilities**, a common hurdle in **large state spaces**. The paper establishes **global optimality guarantees** by proving that its objective function satisfies the **Polyak-Łojasiewicz (PL) condition**, a less restrictive alternative to strong convexity. It also distinguishes IRL/DDC from **imitation learning (IL)**, arguing that IL is a "strictly easier" problem because it directly mimics behavior without inferring the underlying rewards, which limits its utility for **counterfactual reasoning**. Empirical results on the **bus engine replacement problem** and on **high-dimensional environments** validate GLADIUS's effectiveness and **scalability**, with the method outperforming existing non-oracle approaches.
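
To make the core idea concrete, below is a minimal, hypothetical sketch of an ERM-style objective for offline IRL/DDC: fit a Q-network so that observed choices are likely under a softmax(Q) policy and the soft-Bellman equation approximately holds on observed transitions, using only sampled (state, action, next-state) tuples rather than an estimated transition model. This is not the paper's actual GLADIUS algorithm; the names `QNet` and `erm_loss`, the synthetic data, and the single-sample Bellman residual are illustrative assumptions only.

```python
# Hypothetical sketch, not the paper's GLADIUS implementation:
# an ERM-style objective combining (i) choice log-likelihood under softmax(Q)
# and (ii) a squared soft-Bellman residual evaluated on observed transitions,
# so no state-transition probabilities are ever estimated.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Small MLP mapping a state vector to one value per discrete action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def erm_loss(q_net, r_net, s, a, s_next, beta=0.95, lam=1.0):
    """Empirical risk: choice negative log-likelihood + soft-Bellman residual.

    The expectation over next states is replaced by the single observed s'
    (a simplification for illustration; the paper treats this more carefully).
    """
    q = q_net(s)                                    # (batch, n_actions)
    nll = F.cross_entropy(q, a)                     # -log softmax(Q)[a]
    q_sa = q.gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) for taken actions
    v_next = torch.logsumexp(q_net(s_next), dim=1)  # soft value of next state
    r_sa = r_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # inferred reward r(s, a)
    bellman_residual = q_sa - (r_sa + beta * v_next)
    return nll + lam * bellman_residual.pow(2).mean()

# Tiny synthetic usage example with random offline data.
torch.manual_seed(0)
state_dim, n_actions, batch = 4, 3, 32
q_net, r_net = QNet(state_dim, n_actions), QNet(state_dim, n_actions)
s, s_next = torch.randn(batch, state_dim), torch.randn(batch, state_dim)
a = torch.randint(0, n_actions, (batch,))
opt = torch.optim.Adam(list(q_net.parameters()) + list(r_net.parameters()), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = erm_loss(q_net, r_net, s, a, s_next)
    loss.backward()
    opt.step()
```

Under these assumptions, the reward and Q-function are learned jointly by plain gradient descent on an empirical risk over logged data, which is the sense in which such an approach avoids modeling transition dynamics; the PL-condition argument in the paper is what justifies expecting gradient methods on this kind of objective to reach a global optimum.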