VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control
·3223 words·16 mins·
Computer Vision
Video Understanding
🏢 Chinese University of Hong Kong
VideoPainter: Edit any video, any length, with user-guided instructions!
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
·2590 words·13 mins·
Computer Vision
Video Understanding
🏢 Chinese University of Hong Kong
TrajectoryCrafter: Precisely control camera movement in monocular videos with a novel diffusion model for coherent 4D content generation.
Dyve: Thinking Fast and Slow for Dynamic Process Verification
·1995 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
Dyve: A novel dynamic process verifier boosts LLM reasoning accuracy by combining fast, immediate checks with deeper, slower analyses for complex steps, achieving significant performance gain…
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
·3016 words·15 mins·
Computer Vision
Video Understanding
🏢 Chinese University of Hong Kong
Lumina-Video: Efficient and flexible video generation using a multi-scale Next-DiT architecture with motion control.
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
·2615 words·13 mins·
Computer Vision
Video Understanding
🏢 Chinese University of Hong Kong
MotionCanvas lets users design cinematic video shots with intuitive controls for camera and object movements, translating scene-space intentions into video animations.
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
·1697 words·8 mins·
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
CMoE efficiently transforms dense LLMs into sparse MoE architectures via expert carving, enabling fast inference without extensive retraining.
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
·2592 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
Large language models (LLMs) are rapidly evolving, yet they often struggle to adapt quickly to human preferences. This paper introduces Test-Time Preference Optimization (TPO), an innovative framework that…
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
·2565 words·13 mins·
Multimodal Learning
Vision-Language Models
🏢 Chinese University of Hong Kong
Dispider: A novel system enabling real-time interaction with video LLMs via disentangled perception, decision, and reaction modules for efficient, accurate responses to streaming video.
NILE: Internal Consistency Alignment in Large Language Models
·3034 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
The NILE framework significantly boosts LLM performance by aligning instruction-tuning datasets with pre-trained internal knowledge, achieving gains of up to 68.5%.
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
·3868 words·19 mins·
Computer Vision
3D Vision
🏢 Chinese University of Hong Kong
Neural LightRig uses multi-light diffusion to accurately estimate object normals and materials from a single image, outperforming existing methods.
Imagine360: Immersive 360 Video Generation from Perspective Anchor
·2648 words·13 mins·
Computer Vision
Image Generation
🏢 Chinese University of Hong Kong
Imagine360: Generating immersive 360° videos from perspective videos, improving quality and accessibility of 360° content creation.
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding
·4218 words·20 mins·
Multimodal Learning
Vision-Language Models
🏢 Chinese University of Hong Kong
Video-3D LLM achieves state-of-the-art 3D scene understanding by fusing video data with 3D positional encoding.