X-MOD 模型是由 Jonas Pfeiffer、Naman Goyal、Xi Lin、Xian Li、James Cross、Sebastian Riedel 和 Mikel Artetxe 在Lifting the Curse of Multilinguality by Pre-training Modular ...
SwitchTransformers 模型是由 William Fedus、Barret Zoph 和 Noam Shazeer 在Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity中提...
RoBERTa-PreLayerNorm 模型由 Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli 在 fairseq: A Fast, Extensible T...