🏢 Shanghai Jiao Tong University
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
·3420 words·17 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Shanghai Jiao Tong University
Show-o Turbo dramatically speeds up multimodal understanding and generation by leveraging parallel decoding and consistency distillation, achieving significant performance gains with fewer sampling st…
iFormer: Integrating ConvNet and Transformer for Mobile Application
·7046 words·34 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Classification
🏢 Shanghai Jiao Tong University
iFormer: A new family of mobile hybrid vision networks that expertly blends ConvNeXt’s fast local feature extraction with the efficient global modeling of self-attention, achieving top-tier accuracy a…
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
·3633 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 Shanghai Jiao Tong University
PC Agent: While you sleep, AI works! This AI system uses human cognition transfer to perform complex digital tasks, exceeding the capabilities of existing digital agents by efficiently learning from h…
Towards Universal Soccer Video Understanding
·2836 words·14 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Shanghai Jiao Tong University
Soccer video understanding gets a major boost with SoccerReplay-1988, the largest multi-modal dataset, and MatchVision, a new visual-language model achieving state-of-the-art performance on event clas…