May 3, 2021
The predominant approach to language modeling encodes a sequence of tokens from left to right, but this discards a source of information: the order in which the sequence was naturally generated. One strategy for recovering this information is to decode both the content and the location of tokens. Prior work supervises content and location with hand-designed loss functions or bootstraps from a predefined ordering; both approaches require domain-specific insight. We address this limitation with an unsupervised learner that discovers high-quality autoregressive orders without a domain-specific prior. Our learner is a neural network that performs variational inference with the autoregressive order as a latent variable. Because the corresponding ELBO is not differentiable, we develop a practical algorithm for end-to-end optimization using policy gradients. Strong empirical results on image captioning and code generation suggest that our algorithm discovers sequence-specific autoregressive orders that are competitive with or better than fixed orders.
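To make the training setup concrete, the sketch below illustrates the general idea of optimizing an ELBO whose latent variable is a discrete token ordering: the decoder receives the usual likelihood gradient, while the order sampler receives a score-function (REINFORCE) gradient because sampling an ordering is not differentiable. This is a minimal toy illustration, not the paper's implementation: the Plackett-Luce order sampler, the uniform prior over orderings, the GRU decoder, the network sizes, and the training loop are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

VOCAB, HIDDEN, SEQ_LEN, BATCH = 50, 64, 8, 16


class OrderPolicy(nn.Module):
    """q(order | x): scores each position and samples an ordering (Plackett-Luce)."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.score = nn.Linear(HIDDEN, 1)

    def forward(self, tokens):
        logits = self.score(self.embed(tokens)).squeeze(-1)               # (B, T)
        # Gumbel perturbation + argsort draws an ordering from the Plackett-Luce model.
        gumbel = -torch.log(-torch.log(torch.rand_like(logits)))
        order = torch.argsort(logits + gumbel, dim=-1, descending=True)   # (B, T)
        # log q(order | x): sequentially pick each position among those still remaining.
        log_q = logits.new_zeros(logits.size(0))
        remaining = logits.clone()
        for t in range(logits.size(1)):
            idx = order[:, t : t + 1]                                      # (B, 1)
            log_q = log_q + remaining.gather(1, idx).squeeze(1) - torch.logsumexp(remaining, dim=-1)
            remaining = remaining.scatter(1, idx, float("-inf"))
        return order, log_q


class OrderedDecoder(nn.Module):
    """p(x | order): predicts each token given the tokens revealed earlier in the order."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def log_likelihood(self, tokens, order):
        reordered = tokens.gather(1, order)                                # tokens in sampled order
        bos = torch.zeros_like(reordered[:, :1])                           # token 0 doubles as <bos> in this toy
        hidden, _ = self.rnn(self.embed(torch.cat([bos, reordered[:, :-1]], dim=1)))
        logits = self.out(hidden)
        nll = F.cross_entropy(logits.reshape(-1, VOCAB), reordered.reshape(-1), reduction="none")
        return -nll.reshape(reordered.shape).sum(-1)                       # (B,) = log p(x | order)


policy, decoder = OrderPolicy(), OrderedDecoder()
opt = torch.optim.Adam(list(policy.parameters()) + list(decoder.parameters()), lr=1e-3)
baseline = 0.0                                                             # running-mean baseline, reduces variance

for step in range(200):
    x = torch.randint(0, VOCAB, (BATCH, SEQ_LEN))                          # toy data stands in for real sequences
    order, log_q = policy(x)
    log_p = decoder.log_likelihood(x, order)
    elbo = log_p - log_q                                                   # constant uniform-prior term dropped
    # Decoder: ordinary pathwise gradient through log_p.
    # Order sampler: REINFORCE gradient, advantage * grad log q(order | x).
    advantage = (elbo - baseline).detach()
    loss = -(log_p + advantage * log_q).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    baseline = 0.9 * baseline + 0.1 * elbo.mean().item()
```

The surrogate loss works because the expected score-function term has zero mean, so subtracting a baseline leaves the gradient unbiased while lowering its variance; in practice, the learned ordering policy and the decoder are trained jointly and end to end, as in the abstract.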