Attention is Not Only a Weight: Analyzing Transformers with Vector Norms

Nov 16, 2020