Compute-Optimal Scaling for Value-Based Deep RL
Best AI papers explained - A podcast by Enoch H. Kang

This paper investigates compute-optimal scaling strategies for value-based deep reinforcement learning (RL), focusing on how to allocate training resources for neural networks efficiently. It examines the interplay between model size and batch size, identifying a phenomenon the authors term TD-overfitting, in which smaller models degrade at larger batch sizes because they fit the evolving, lower-quality bootstrapped target values too closely. The research proposes a prescriptive rule for selecting the batch size that accounts for both model size and the updates-to-data (UTD) ratio, enabling better compute and data efficiency. The paper also provides a framework for allocating computational resources (such as the UTD ratio and model size) to reach a specific performance target or to maximize performance within a fixed budget, showing that these scaling decisions often follow predictable power-law relationships.
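
To make the quantities concrete, here is a minimal, self-contained sketch of a TD-learning loop in which the two compute knobs discussed in the episode, batch size and the updates-to-data (UTD) ratio, appear explicitly. All names (QNetwork, ReplayBuffer, utd_ratio) and the synthetic environment are illustrative assumptions, not the paper's code.

```python
"""Sketch of the batch-size / UTD trade-off in TD learning (illustrative only)."""
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, NUM_ACTIONS = 8, 4

class QNetwork(nn.Module):
    # Model size is controlled by `width`; this is one scaling axis the paper studies.
    def __init__(self, width=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, NUM_ACTIONS),
        )
    def forward(self, obs):
        return self.net(obs)

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)
    def __len__(self):
        return len(self.data)
    def add(self, transition):
        self.data.append(transition)
    def sample(self, batch_size):
        batch = random.sample(list(self.data), batch_size)
        obs, act, rew, nxt, done = zip(*batch)
        return (torch.stack(obs), torch.tensor(act), torch.tensor(rew),
                torch.stack(nxt), torch.tensor(done, dtype=torch.float32))

def random_transition():
    # Stand-in for real environment interaction.
    return (torch.randn(OBS_DIM), random.randrange(NUM_ACTIONS),
            random.random(), torch.randn(OBS_DIM), 0.0)

def td_step(q_net, target_net, opt, batch, gamma=0.99):
    obs, act, rew, nxt, done = batch
    with torch.no_grad():
        # Bootstrapped targets come from an evolving target network: they are
        # noisy, moving labels. Fitting them very precisely (e.g. a large batch
        # on a small model) is the TD-overfitting regime the episode describes.
        target = rew + gamma * (1 - done) * target_net(nxt).max(1).values
    q = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# The two compute knobs: batch_size, and utd_ratio (gradient updates taken per
# environment step). Compute per env step scales with both, plus model size.
utd_ratio, batch_size = 4, 256
q_net, target_net = QNetwork(), QNetwork()
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
buf = ReplayBuffer()

for step in range(1_000):
    buf.add(random_transition())        # one unit of data
    if len(buf) >= batch_size:
        for _ in range(utd_ratio):      # utd_ratio updates per unit of data
            td_step(q_net, target_net, opt, buf.sample(batch_size))
    if step % 100 == 0:                 # periodically sync the target network
        target_net.load_state_dict(q_net.state_dict())
```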
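The power-law relationships the summary mentions can be illustrated with a simple log-log regression. The numbers below are synthetic and the functional form N*(C) ≈ a·C^b is an assumption for illustration; the paper fits laws of this kind to its own measurements.

```python
import numpy as np

# Synthetic (compute budget, best model size) pairs, for illustration only.
compute = np.array([1e15, 1e16, 1e17, 1e18])           # FLOPs budgets
best_model_size = np.array([3e5, 1.1e6, 4e6, 1.5e7])   # parameters

# Fit N*(C) ~ a * C^b via linear regression in log-log space.
slope, intercept = np.polyfit(np.log(compute), np.log(best_model_size), 1)
a, b = np.exp(intercept), slope
print(f"fit: N*(C) ~ {a:.3g} * C^{b:.2f}")

# Extrapolate the fitted rule to a new, larger budget.
new_budget = 1e19
print(f"predicted model size at C={new_budget:.0e}: {a * new_budget**b:.3g}")
```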