🏢 University of Chinese Academy of Sciences
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
·3191 words·15 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 University of Chinese Academy of Sciences
HAVEN: A new benchmark to tackle the hallucination issue in video understanding of large multimodal models!
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
·4040 words·19 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 University of Chinese Academy of Sciences
UniPose: A unified multimodal framework for human pose comprehension, generation, and editing, enabling seamless transitions across various modalities and showcasing zero-shot generalization.
Continuous Speculative Decoding for Autoregressive Image Generation
·1799 words·9 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Chinese Academy of Sciences
Researchers have developed Continuous Speculative Decoding, boosting autoregressive image generation speed by up to 2.33x while maintaining image quality.