Video Understanding (视频理解) […]
Image Captioning (图像字幕生成) […]
Visual Foundation Model (视觉基础模型) […]
Image-Text Matching (图文匹配) […]
Video-to-Text Generation (视频到文本生成) […]