🏢 Westlake University
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
·3300 words·16 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Westlake University
VidKV: Achieves 1.5x-bit KV cache quantization for VideoLLMs, maintaining performance without retraining.
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
·2550 words·12 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Westlake University
ETCH: Equivariantly fitting bodies to clothed humans through tightness for better pose and shape accuracy.
Autoregressive Image Generation with Randomized Parallel Decoding
·3693 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Westlake University
ARPG: Randomly generate high-quality images by parallel decoding, outperforming existing methods in efficiency, memory, and quality.
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
·3880 words·19 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 Westlake University
CARP: A novel visuomotor policy learning paradigm achieves high accuracy and 10x faster inference than state-of-the-art by combining autoregressive efficiency and diffusion model precision through a c…
Direct Preference Optimization Using Sparse Feature-Level Constraints
·2078 words·10 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Westlake University
Feature-level constrained Preference Optimization (FPO) boosts LLM alignment efficiency and stability by using sparse autoencoders and feature-level constraints, achieving significant improvements ove…