Name
Algorithms/Understanding
Date & Time
Thursday, October 26, 2023, 9:00 AM - 10:30 AM
Speakers
Zeyuan Allen-Zhu, Meta / FAIR Labs
Physics of Language Models: Knowledge Storage, Extraction and Manipulation
Even if LLMs losslessly memorize the pretraining data, they may not be finetunable to extract that knowledge. Probing techniques suggest that data augmentation is necessary at the pretraining level, regardless of model size, training time, and finetuning choices. https://arxiv.org/abs/2309.14316
Why do LLMs need Chain of Thought even for basic questions (e.g., was Biden born on an even day)? We show that LLMs cannot efficiently manipulate knowledge even when that knowledge is 100% extractable; moreover, inverse knowledge search is simply impossible. https://arxiv.org/abs/2309.14402
This short presentation will cover one result from each of the two papers to give the full story. An extended talk will be available on YouTube shortly.
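To make the forward/inverse distinction in the second paper concrete, here is a minimal, hypothetical sketch in Python; the fact table and prompt templates are illustrative only and are not the probing setup used in the papers.

# Minimal, hypothetical sketch of forward vs. inverse knowledge queries.
# The fact table and templates are illustrative; they are not the papers' probing setup.
facts = {"Joe Biden": "November 20, 1942"}

def forward_query(entity: str) -> str:
    # Forward direction (entity -> attribute): the form an LLM can answer
    # once the knowledge is extractable.
    return f"Q: When was {entity} born? A: {facts[entity]}"

def inverse_query(birthday: str) -> str:
    # Inverse direction (attribute -> entity): the form the paper argues LLMs
    # trained on forward-order text essentially cannot answer.
    matches = [name for name, date in facts.items() if date == birthday]
    return f"Q: Who was born on {birthday}? A: {', '.join(matches)}"

print(forward_query("Joe Biden"))
print(inverse_query("November 20, 1942"))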
Sanjeev Arora, Princeton University
A Theory for Emergence of Complex Skills in LLMs
A driver of current AI research is the fact that new skills emerge in language models when their parameter count and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems forbiddingly difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework. Contributions include: (a) a statistical framework that relates the cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks; (b) mathematical analysis showing that the Scaling Laws imply a strong form of inductive bias that allows the pre-trained model to learn very efficiently (we informally call this slingshot generalization, since naively viewed it appears to give competence levels at skills that violate usual generalization theory); (c) a key example of slingshot generalization: competence at executing tasks involving k-tuples of skills emerges at essentially the same scaling and the same rate as competence on the elementary skills themselves.
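As a rough illustration of the rate claim in (c), a toy union-bound calculation (not the paper's actual analysis): if a model fails each elementary skill with probability at most $\epsilon$, then a task combining a $k$-tuple of skills succeeds with probability

\[
\Pr[\text{$k$-tuple task succeeds}] \;\ge\; 1 - \sum_{i=1}^{k}\Pr[\text{skill $i$ fails}] \;\ge\; 1 - k\epsilon,
\]

so driving the elementary failure rate down (e.g., by scaling) lifts competence on $k$-tuples at a rate that differs only by the factor $k$.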
Max Vladymyrov, Google
Understanding In-Context Learning
Description
30-minute talks
Location Name
Kline Tower: 14th Floor
Full Address
219 Prospect St
New Haven, CT 06511
United States
Session Type
Workshop