Out-of-Distribution Generalization via Composition: A Lens Through Induction Heads in Transformers
Study on Out-of-Distribution Generalization and Composition Mechanisms in Large Language Models

Paper Background

In recent years, large language models (LLMs) such as GPT-4 have demonstrated remarkable creativity in handling novel tasks, often solving problems with just a few examples. These tasks require models to generalize on distributions diffe...