Name
Algorithms/Understanding
Date & Time
Thursday, October 26, 2023, 9:00 AM - 10:30 AM
Speakers
Zeyuan Allen-Zhu, Meta / FAIR Labs
Physics of Language Models: Knowledge Storage, Extraction and Manipulation
Even if LLMs losslessly memorize the pretraining data, they may not be finetunable to extract that knowledge. Probing techniques suggest that data augmentation is necessary at the pretraining level, regardless of model size, training time, and finetuning choices. https://arxiv.org/abs/2309.14316
Why do LLMs need Chain of Thought even for basic questions (e.g., was Biden born on an even day)? We show that LLMs cannot efficiently manipulate knowledge even when that knowledge is 100% extractable; moreover, inverse knowledge search is simply impossible. https://arxiv.org/abs/2309.14402
This short presentation will cover one result from each of the two papers to give the full story. An extended talk will be available on YouTube shortly.
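To make the forward/inverse distinction in the second paper concrete, here is a minimal, hypothetical sketch in Python; the fact table and prompt templates are illustrative only and are not the probing setup used in the papers.

# Minimal, hypothetical sketch of forward vs. inverse knowledge queries.
# The fact table and templates are illustrative; they are not the papers' probing setup.
facts = {"Joe Biden": "November 20, 1942"}

def forward_query(entity: str) -> str:
    # Forward direction (entity -> attribute): the form an LLM can answer
    # once the knowledge is extractable.
    return f"Q: When was {entity} born? A: {facts[entity]}"

def inverse_query(birthday: str) -> str:
    # Inverse direction (attribute -> entity): the form the paper argues LLMs
    # trained on forward-order text essentially cannot answer.
    matches = [name for name, date in facts.items() if date == birthday]
    return f"Q: Who was born on {birthday}? A: {', '.join(matches)}"

print(forward_query("Joe Biden"))
print(inverse_query("November 20, 1942"))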
Sanjeev Arora, Princeton University
A Theory for Emergence of Complex Skills in LLMs
A driver of current AI research is the fact that new skills emerge in language models when their parameter count and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems forbiddingly difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework. Contributions include: (a) a statistical framework that relates the cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks; (b) mathematical analysis showing that the Scaling Laws imply a strong form of inductive bias that allows the pre-trained model to learn very efficiently (we informally call this slingshot generalization, since naively viewed it appears to give competence levels at skills that violate usual generalization theory); (c) a key example of slingshot generalization: competence at executing tasks involving k-tuples of skills emerges at essentially the same scaling and the same rate as competence on the elementary skills themselves.
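As a rough illustration of the rate claim in (c), a toy union-bound calculation (not the paper's actual analysis): if a model fails each elementary skill with probability at most $\epsilon$, then a task combining a $k$-tuple of skills succeeds with probability

\[
\Pr[\text{$k$-tuple task succeeds}] \;\ge\; 1 - \sum_{i=1}^{k}\Pr[\text{skill $i$ fails}] \;\ge\; 1 - k\epsilon,
\]

so driving the elementary failure rate down (e.g., by scaling) lifts competence on $k$-tuples at a rate that differs only by the factor $k$.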
Max Vladymyrov, Google
Understanding In-Context Learning
Description
30-minute talks
Location Name
Kline Tower: 14th Floor
Full Address
219 Prospect St
New Haven, CT 06511
United States
Session Type
Workshop