🏢 Peking University
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
·2726 words·13 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Peking University
LLaVA-o1: A novel vision-language model achieves superior reasoning performance through structured, multi-stage processing and efficient inference-time scaling, surpassing even larger, closed-source m…
Large Language Models Can Self-Improve in Long-context Reasoning
·3316 words·16 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Peking University
LLMs can self-improve in long-context reasoning via SEALONG, a method that samples multiple model outputs and scores them with minimum Bayes risk to enable effective supervised fine-tuning or preference o…
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation
·2630 words·13 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Peking University
GaussianAnything: Interactive point cloud latent diffusion enables high-quality, editable 3D models from images or text, overcoming existing 3D generation limitations.
KMM: Key Frame Mask Mamba for Extended Motion Generation
·2527 words·12 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Peking University
KMM: Key Frame Mask Mamba generates extended, diverse human motion from text prompts by masking key frames in the Mamba architecture and using contrastive learning for improved text-motio…
Training-free Regional Prompting for Diffusion Transformers
·1817 words·9 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
Training-free Regional Prompting for FLUX improves compositional text-to-image generation by manipulating attention mechanisms, achieving fine-grained control without retraining.
DreamPolish: Domain Score Distillation With Progressive Geometry Generation
·2197 words·11 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Peking University
DreamPolish: A new text-to-3D model generates highly detailed 3D objects with polished surfaces and realistic textures using progressive geometry refinement and a novel domain score distillation tech…
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
·2152 words·11 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
HelloMeme enhances text-to-image models by integrating spatial knitting attentions, enabling high-fidelity meme video generation while preserving model generalization.