General Intelligence Requires Reward-based Pretraining

Best AI papers explained - A podcast by Enoch H. Kang

This position paper argues that Large Language Models (LLMs), despite their utility as Artificial Useful Intelligence (AUI), struggle with the robust, adaptive reasoning required for Artificial General Intelligence (AGI) because their training overfits to specific data patterns. The authors propose shifting from the current supervised pretraining (SPT) paradigm to reward-based pretraining (RPT), much as AlphaZero surpassed AlphaGo by learning purely through self-play reinforcement learning rather than imitation of human games. To achieve this, they suggest training on synthetic tasks with reduced token spaces to foster generalizable reasoning skills, and decoupling knowledge from reasoning via an external memory system. Under this architecture, the reasoning module operates with a smaller context and relies on learned retrieval mechanisms for factual information, promoting more robust generalization to novel domains.
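To make the RPT idea concrete, here is a minimal sketch (not from the paper) of reward-based learning on a synthetic task over a deliberately small token space: a tabular REINFORCE policy must discover a hidden successor-token rule from a scalar episode reward alone, with no supervised next-token targets. The task, hyperparameters, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 8        # reduced token space, as the position paper advocates
SEQ_LEN = 5      # short synthetic episodes
LR = 0.3
EPISODES = 5000

# Tabular policy: one row of output logits per input token.
# A toy stand-in for the reasoning module; no supervised labels are used.
W = np.zeros((VOCAB, VOCAB))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

baseline = 0.0   # moving-average reward baseline to reduce gradient variance
for ep in range(EPISODES):
    x = rng.integers(VOCAB, size=SEQ_LEN)   # synthetic input sequence
    target = (x + 1) % VOCAB                # hidden rule: successor token

    # Sample an output sequence from the current policy.
    actions, grads = [], []
    for tok in x:
        p = softmax(W[tok])
        a = rng.choice(VOCAB, p=p)
        g = -p
        g[a] += 1.0                         # d log pi(a | tok) / d logits
        actions.append(a)
        grads.append((tok, g))

    # Scalar reward only: fraction of outputs matching the hidden rule.
    # The policy never sees the target tokens directly.
    reward = float(np.mean(np.array(actions) == target))
    baseline = 0.99 * baseline + 0.01 * reward
    advantage = reward - baseline
    for tok, g in grads:
        W[tok] += LR * advantage * g        # REINFORCE update

# After training, the greedy policy should implement the rule
# it was never shown, only rewarded for.
probe = np.arange(VOCAB)
print("learned map:", [int(np.argmax(W[t])) for t in probe])
print("target map: ", list((probe + 1) % VOCAB))
```

A full RPT system would replace the tabular policy with a sequence model and the toy rule with richer synthetic tasks, but the training signal, a reward rather than next-token labels, is the same in spirit.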