VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control
·3223 words·16 mins·
Computer Vision
Video Understanding
🏢 Chinese University of Hong Kong
VideoPainter: Edit any video, any length, with user-guided instructions!
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
·2590 words·13 mins·
Computer Vision
Video Understanding
🏢 Chinese University of Hong Kong
TrajectoryCrafter: Precisely control camera movement in monocular videos with a novel diffusion model for coherent 4D content generation.
Dyve: Thinking Fast and Slow for Dynamic Process Verification
·1995 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
Dyve: A novel dynamic process verifier boosts LLM reasoning accuracy by combining fast, immediate checks with deeper, slower analyses for complex steps, achieving significant performance gain…
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
·3016 words·15 mins·
Computer Vision
Video Understanding
🏢 Chinese University of Hong Kong
Lumina-Video: Efficient and flexible video generation using a multi-scale Next-DiT architecture with motion control.
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
·2615 words·13 mins·
Computer Vision
Video Understanding
🏢 Chinese University of Hong Kong
MotionCanvas lets users design cinematic video shots with intuitive controls for camera and object movements, translating scene-space intentions into video animations.
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
·1697 words·8 mins·
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
CMoE efficiently transforms dense LLMs into sparse MoE architectures via expert carving, enabling fast inference without extensive retraining.
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
·2592 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
Large language models (LLMs) are rapidly evolving, yet they often struggle to adapt quickly to human preferences. This paper introduces Test-Time Preference Optimization (TPO), an innovative framework that…
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
·2565 words·13 mins·
Multimodal Learning
Vision-Language Models
🏢 Chinese University of Hong Kong
Dispider: A novel system enabling real-time interaction with video LLMs via disentangled perception, decision, and reaction modules for efficient, accurate responses to streaming video.
NILE: Internal Consistency Alignment in Large Language Models
·3034 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
The NILE framework significantly boosts LLM performance by aligning instruction-tuning datasets with pre-trained internal knowledge, achieving gains of up to 68.5%.
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
·3868 words·19 mins·
Computer Vision
3D Vision
🏢 Chinese University of Hong Kong
Neural LightRig uses multi-light diffusion to accurately estimate object normals and materials from a single image, outperforming existing methods.
Imagine360: Immersive 360 Video Generation from Perspective Anchor
·2648 words·13 mins·
Computer Vision
Image Generation
🏢 Chinese University of Hong Kong
Imagine360: Generating immersive 360° videos from perspective videos, improving quality and accessibility of 360° content creation.
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding
·4218 words·20 mins·
Multimodal Learning
Vision-Language Models
🏢 Chinese University of Hong Kong
Video-3D LLM achieves state-of-the-art 3D scene understanding by fusing video data with 3D positional encoding.