🏢 Tsinghua University
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
·2163 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tsinghua University
SparseFlex: Achieves high-resolution, arbitrary-topology 3D shape modeling via a sparse isosurface representation and sectional voxel training.
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
·6982 words·33 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Tsinghua University
ReaRAG enhances factuality in large reasoning models (LRMs) by integrating knowledge-guided reasoning with iterative retrieval augmented generation.
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
·10790 words·51 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tsinghua University
BizGen: Article-level visual text rendering for infographics generation.
Video-T1: Test-Time Scaling for Video Generation
·3231 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Tsinghua University
Video-T1 enhances video generation through test-time scaling, improving quality and consistency by viewing generation as a search for optimal video trajectories.
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
·4802 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
LEMMA: LLMs learn math via mistake analysis and correction, boosting performance without external critics.
XAttention: Block Sparse Attention with Antidiagonal Scoring
·2960 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
XAttention: Antidiagonal scoring unlocks block-sparse attention, slashing compute costs in long-context Transformers without sacrificing accuracy.
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
·2721 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tsinghua University
DeepMesh: RL-guided auto-regressive creation of artist-quality 3D meshes, enhanced by tokenization & DPO for human-aligned aesthetics.
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
·365 words·2 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tsinghua University
DiffMoE: Dynamically selects tokens for scalable diffusion transformers, unlocking new efficiency levels in image generation.
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
·3349 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Tsinghua University
DAPO: Open-sources an LLM reinforcement learning system that achieves SOTA AIME scores, fostering reproducible research at scale.
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
·2841 words·14 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
DeepPerception enhances MLLMs with cognitive visual perception, achieving superior grounding through knowledge integration & reasoning.
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
·2233 words·11 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Embodied AI
🏢 Tsinghua University
UniGoal: A novel framework for universal zero-shot goal-oriented navigation, outperforming task-specific methods with a unified approach.
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation
·1710 words·9 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 Tsinghua University
KUDA unifies dynamics learning and visual prompting with keypoints for open-vocabulary robot manipulation.
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
·2477 words·12 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
GTR: Prevents thought collapse in RL-based VLM agents through process guidance, enhancing performance in complex visual reasoning tasks.
ProReflow: Progressive Reflow with Decomposed Velocity
·1902 words·9 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tsinghua University
ProReflow: Improves diffusion model efficiency via progressive training and direction-focused velocity alignment.
SAGE: A Framework of Precise Retrieval for RAG
·3653 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Tsinghua University
SAGE: Precise RAG via semantic segmentation, adaptive chunking, and LLM feedback, boosting QA accuracy & cost-efficiency.
Identifying Sensitive Weights via Post-quantization Integral
·2603 words·13 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Tsinghua University
PQI: Accurately identifies sensitive weights post-quantization to improve LLM compression and performance.
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
·3674 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tsinghua University
ArtGS: Achieves state-of-the-art, efficient interactable replicas of complex articulated objects via Gaussian Splatting.
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
·3445 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Generation
🏢 Tsinghua University
FR-Spec: Frequency-Ranked Speculative Sampling accelerates LLMs by compressing the vocabulary space, achieving a 1.12x speedup over EAGLE-2.
JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework
·3675 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 Tsinghua University
JL1-CD: A new all-inclusive dataset and a multi-teacher knowledge distillation framework for robust remote sensing change detection, achieving state-of-the-art results.
Craw4LLM: Efficient Web Crawling for LLM Pretraining
·3024 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Craw4LLM: Efficiently crawls web pages for LLM pretraining by prioritizing pages by influence score, boosting data quality & cutting crawling waste.