Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
Abstract: Transformers have the potential to learn longer-term dependencies, but are
limited by a fixed-length context in the setting of language modeling. We
propose a novel neural architecture, Transformer-XL, that enables learning
dependency beyond a fixed length without disrupting temporal coherence. It
consists of a segment-level recurrence mechanism and a novel positional
encoding scheme. Our method not only enables capturing longer-term dependency,
but also resolves the context fragmentation problem. As a res ...
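The segment-level recurrence described above can be illustrated with a toy sketch: hidden states computed for the previous segment are cached (with gradients stopped) and prepended as extra context when processing the current segment, so information can flow beyond a single fixed-length window. The function and parameter names below are illustrative stand-ins, not from the paper's actual implementation, and a single tanh projection stands in for a full Transformer layer.

```python
import numpy as np

def segment_recurrence(segments, d_model=4, mem_len=3, seed=0):
    """Toy sketch of Transformer-XL-style segment-level recurrence.

    `segments` is a list of arrays of shape (seg_len, d_model).
    Hidden states from each segment are cached and reused as extra
    context for the next segment; in training, no gradient would flow
    through the cache (here we just copy it).
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d_model, d_model)) * 0.1  # stand-in for one layer
    memory = np.zeros((0, d_model))                    # cache starts empty
    outputs = []
    for seg in segments:
        # Extended context: cached states from the previous segment + current input.
        context = np.concatenate([memory, seg], axis=0)
        h = np.tanh(context @ W)                       # stand-in "layer" computation
        h_cur = h[-seg.shape[0]:]                      # keep current-segment outputs only
        outputs.append(h_cur)
        # Cache the most recent mem_len states for the next segment.
        memory = h_cur[-mem_len:].copy()
    return outputs
```

Because each segment attends over the cached states as well as its own positions, the effective context length grows with depth instead of being capped at the segment length, which is what lets the model avoid the context fragmentation the abstract mentions.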