🏢 Peking University

LLaVA-o1: Let Vision Language Models Reason Step-by-Step
·2726 words·13 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Peking University
LLaVA-o1: A novel visual language model achieves superior reasoning performance through structured, multi-stage processing and efficient inference-time scaling, surpassing even larger, closed-source m…
Large Language Models Can Self-Improve in Long-context Reasoning
·3316 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Peking University
LLMs can now self-improve long-context reasoning via SEALONG, a novel method leveraging multiple model outputs and minimum Bayes risk scoring to enable effective supervised fine-tuning or preference o…
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation
·2630 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Peking University
GaussianAnything: Interactive point cloud latent diffusion enables high-quality, editable 3D models from images or text, overcoming existing 3D generation limitations.
KMM: Key Frame Mask Mamba for Extended Motion Generation
·2527 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Peking University
KMM: Key Frame Mask Mamba generates extended, diverse human motion from text prompts by innovatively masking key frames in the Mamba architecture and using contrastive learning for improved text-motio…
Training-free Regional Prompting for Diffusion Transformers
·1817 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
Training-free Regional Prompting for FLUX boosts compositional text-to-image generation by cleverly manipulating attention mechanisms, achieving fine-grained control without retraining.
DreamPolish: Domain Score Distillation With Progressive Geometry Generation
·2197 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Peking University
DreamPolish: A new text-to-3D model generates highly detailed 3D objects with polished surfaces and realistic textures using progressive geometry refinement and a novel domain score distillation tech…
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
·2152 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
HelloMeme enhances text-to-image models by integrating spatial knitting attentions, enabling high-fidelity meme video generation while preserving model generalization.