Personalized language modeling from personalized human feedback

Best AI papers explained - A podcast by Enoch H. Kang

This paper introduces Personalized-RLHF (P-RLHF), a framework for building personalized large language models (LLMs) that adapt to individual user preferences. Unlike standard Reinforcement Learning from Human Feedback (RLHF), which assumes all users share the same preferences, P-RLHF adds a lightweight user model that captures both explicit preferences (stated in a user's text) and implicit preferences (revealed by that user's feedback data). The user model is learned jointly with the LLM through new personalized alignment objectives such as Personalized Direct Preference Optimization (P-DPO), yielding better alignment with individual users and scaling more efficiently than non-personalized or prompting-based approaches. The method also sidesteps limitations of prior techniques that either require maintaining multiple LLMs or depend on predefined preference dimensions.
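As a rough sketch (not quoted from the episode or the paper), P-DPO can be pictured as the standard DPO objective with the policy additionally conditioned on a learned user representation u, where u combines an embedding of the user's explicit textual preferences with an implicit component inferred from that user's feedback history; the exact parameterization and the treatment of the reference model in the paper may differ:

\[
\mathcal{L}_{\text{P-DPO}}(\theta) \;=\; -\,\mathbb{E}_{(u,\,x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x, u)}{\pi_{\text{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x, u)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
\]

Here \(x\) is the prompt, \(y_w\) and \(y_l\) are the responses the user preferred and rejected, \(\pi_{\text{ref}}\) is the frozen reference model, and \(\beta\) is the usual DPO temperature; only the user model and the conditioning of \(\pi_\theta\) on \(u\) are new relative to standard DPO.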