Name
Algorithms
Date & Time
Friday, October 27, 2023, 10:45 AM - 12:30 PM
Speakers
Kyunghyun Cho, New York University
Vincent Cohen-Addad, Google Research: Data Selection / Active Learning
Nishanth Dikkala, Google Research: Alternating Updates - A Method for Efficiently Scaling Up Transformer Models
It has been well established that increasing scale in deep transformer networks leads to improved quality and performance. However, this increase in scale often comes with prohibitive increases in compute cost and inference latency. I will present our work on Alternating Updates (AltUp), a simple-to-implement method to increase a model's capacity without the computational burden. AltUp enables widening of the learned representation, i.e., the token embedding, while incurring only a negligible increase in latency. AltUp achieves this by working on a sub-block of the widened representation at each layer and using a predict-and-correct mechanism to update the inactivated blocks. Our experiments on benchmark transformer models and language tasks demonstrate the consistent effectiveness of AltUp across a diverse set of scenarios. Notably, on the SuperGLUE and SQuAD benchmarks, AltUp enables up to 87% speedup relative to the dense baselines at the same accuracy.
(A rough code sketch of this predict-and-correct idea appears after the speaker list.)
Ahmad Beirami, Google Research
David Woodruff, CMU
Location Name
FDS in Kline Tower: 13th Floor
Full Address
Kline Tower - 13th and 14th Floors
219 Prospect St
New Haven, CT 06511
United States
Session Type
Workshop