🏢 Westlake University

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
·3300 words·16 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Westlake University
VidKV: Achieves 1.x-bit KV cache quantization for VideoLLMs, maintaining performance without retraining.
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
·2550 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Westlake University
ETCH: Fits bodies to clothed humans via equivariant tightness for better pose and shape accuracy.
Autoregressive Image Generation with Randomized Parallel Decoding
·3693 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Westlake University
ARPG: Generates high-quality images with randomized parallel decoding, outperforming existing methods in efficiency, memory, and quality.
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
·3880 words·19 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Westlake University
CARP: A novel visuomotor policy learning paradigm achieves high accuracy and 10x faster inference than state-of-the-art by combining autoregressive efficiency and diffusion model precision through a c…
Direct Preference Optimization Using Sparse Feature-Level Constraints
·2078 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Westlake University
Feature-level constrained Preference Optimization (FPO) boosts LLM alignment efficiency and stability by using sparse autoencoders and feature-level constraints, achieving significant improvements ove…