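This post covers the paper "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training", in which Limin Wang's team at Nanjing University proposes the first VideoMAE framework, using an extremely high masking ratio (90%-95%), with performance S......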
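This post covers the paper "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation", in which Southwest Jiaotong University and MSRA propose UniVL, a unified video and language pre-training model for multimodal understanding and generation...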