December 10, 2023
The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, it is unclear what data to best select for the model's performance across tasks. To study this, we develop a new framework based on a simple hypothesis: similar to how humans acquire interdependent skills in a deliberate order, there exists a natural order in how the LM best learns a set of skills from its training data. If such an order exists, it can be exploited for improved understanding of LMs and data-efficient training. Using this intuition, our framework formalizes the notion of a skill and of an ordered set of skills in terms of their associated data. We demonstrate that these ordered skill sets exist on synthetic and real data, and their existence enables skills later in the order to be learned with less data given that we train on their prerequisite skills. Interestingly, we find that these ordered skill sets are not captured by intuitive data groupings based on metadata and embedding clustering. Using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for learning skills more quickly in both the continual pre-training and fine-tuning regimes, where we aim to learn multiple skills in the former and an individual skill in the latter. In the continual pre-training setting, on the LEGO synthetic we show that our approach yields a 36.5-point increase in accuracy over random sampling. In the fine-tuning setting, on the Natural Instructions dataset, our online mixture with prerequisite skills reduces the validation loss on a target skill by 13.6%.
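The abstract describes an online sampling algorithm that adjusts the mixture over skills as training proceeds. As a minimal sketch of the general idea (not the paper's exact method, which also exploits a learned skills graph), the snippet below uses a hypothetical multiplicative-weights update that upweights skills whose validation loss remains high, then samples the next training batch's skill from the resulting mixture. The function names, the update form, and the learning rate `eta` are all illustrative assumptions.

```python
import math
import random

def update_mixture(weights, losses, eta=0.5):
    """Hypothetical multiplicative-weights step: skills with higher
    validation loss receive proportionally more sampling weight.
    (Skill-It's actual update additionally uses a skills graph.)"""
    new = {s: w * math.exp(eta * losses[s]) for s, w in weights.items()}
    total = sum(new.values())
    return {s: w / total for s, w in new.items()}

def sample_skill(weights, rng=random):
    """Draw the skill for the next training batch from the mixture."""
    skills = list(weights)
    return rng.choices(skills, weights=[weights[s] for s in skills])[0]
```

A training loop would alternate between training on batches drawn via `sample_skill`, measuring per-skill validation loss, and calling `update_mixture`, so that data budget flows toward skills that are still improving.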
Presentations on a similar topic, category, or by the same speaker:
Zekun Li, …
Max Torop, …
Jungtaek Kim, …
Han Cui, …