🏢 Hong Kong University of Science and Technology
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
·2715 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
LeviTor: Revolutionizing image-to-video synthesis with intuitive 3D trajectory control, generating realistic videos from static images by abstracting object masks into depth-aware control points.
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception
·2901 words·14 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
Enhance image captions significantly with DCE, a novel engine leveraging visual specialists to generate comprehensive, detailed descriptions surpassing LMM and human-annotated captions.
AniDoc: Animation Creation Made Easier
·2223 words·11 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
AniDoc automates cartoon animation line art video colorization, making animation creation easier!
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
·3380 words·16 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Hong Kong University of Science and Technology
Training-free method adds physical properties to 3D models using vision-language models.
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
·3111 words·15 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
Lyra: An efficient, speech-centric framework for omni-cognition, achieving state-of-the-art results across various modalities while being highly efficient.
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation
·2511 words·12 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
VideoGen-of-Thought (VGoT) creates high-quality, multi-shot videos by collaboratively generating scripts, keyframes, and video clips, ensuring narrative consistency and visual coherence.
OmniCreator: Self-Supervised Unified Generation with Universal Editing
·5399 words·26 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
OmniCreator: Self-supervised unified image+video generation & universal editing.
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
·4302 words·21 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
MagicDriveDiT generates high-resolution, long street-view videos with precise control, exceeding limitations of previous methods in autonomous driving.
Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
·3715 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
Golden Touchstone, a new bilingual benchmark, comprehensively evaluates financial LLMs across eight tasks, revealing model strengths and weaknesses and advancing FinLLM research.