Language Model Personalization via Reward Factorization

Best AI papers explained - A podcast by Enoch H. Kang

This paper discusses Personalization via Reward Factorization (PReF), a framework for personalizing Large Language Model (LLM) responses to individual user preferences. Unlike traditional Reinforcement Learning from Human Feedback (RLHF), which assumes universal preferences, PReF models each user's reward as a linear combination of "base reward functions" and efficiently infers the user-specific weights from minimal data (as few as 10 responses). The framework delivers significant improvements over existing personalization methods while addressing the computational cost of adapting LLMs to many diverse users. Through experiments with both synthetic and real users, the authors validate PReF's ability to achieve substantial personalization, evidenced by a 67% win rate against default GPT-4o responses in human evaluations.
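
To make the factorization idea concrete, here is a minimal, hypothetical Python sketch (not from the paper's code; all names and the number of base rewards are illustrative). It represents a user's reward as a linear combination of k base reward functions and infers the user-specific weights from roughly ten pairwise preferences via Bradley-Terry-style logistic regression, which is one standard way such weights can be fit.

```python
# Illustrative sketch of reward factorization: r_u(x, y) = w_u . [f_1(x,y), ..., f_k(x,y)].
# The user-specific weights w_u are inferred from a handful of pairwise preferences.

import numpy as np
from sklearn.linear_model import LogisticRegression

def base_features(prompt: str, response: str) -> np.ndarray:
    """Placeholder for k base reward functions f_1..f_k scored on (prompt, response)."""
    k = 4  # assumed number of base reward functions
    rng = np.random.default_rng(abs(hash((prompt, response))) % (2**32))
    return rng.normal(size=k)  # stand-in for real base-reward scores

# Suppose we have ~10 pairwise comparisons for one user:
# the user preferred response_a over response_b for each prompt.
comparisons = [(f"prompt {i}", f"resp A {i}", f"resp B {i}") for i in range(10)]

# Under a Bradley-Terry model, P(a preferred over b) = sigmoid(w_u . (f(a) - f(b))),
# so fitting w_u reduces to logistic regression on feature differences.
X = np.array([base_features(p, a) - base_features(p, b) for p, a, b in comparisons])
y = np.ones(len(comparisons))  # label 1: first response preferred

# Add the flipped pairs so both classes are present for the solver.
X = np.vstack([X, -X])
y = np.concatenate([y, np.zeros(len(comparisons))])

clf = LogisticRegression(fit_intercept=False).fit(X, y)
w_u = clf.coef_.ravel()  # inferred user-specific weights over base rewards

def user_reward(prompt: str, response: str) -> float:
    """User-personalized reward: linear combination of base reward scores."""
    return float(w_u @ base_features(prompt, response))
```

The key point the sketch illustrates is that once the base reward functions are fixed, personalizing to a new user only requires estimating a small weight vector, which is why so few labeled responses suffice.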