🏢 Tsinghua University

SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
·2163 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
SparseFlex: Achieves high-resolution, arbitrary-topology 3D shape modeling via a sparse isosurface representation and sectional voxel training.
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
·6982 words·33 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Tsinghua University
ReaRAG enhances factuality in large reasoning models (LRMs) by integrating knowledge-guided reasoning with iterative retrieval augmented generation.
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
·10790 words·51 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
BizGen: Article-level visual text rendering for infographics generation.
Video-T1: Test-Time Scaling for Video Generation
·3231 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tsinghua University
Video-T1 enhances video generation through test-time scaling, improving quality and consistency by viewing generation as a search for optimal video trajectories.
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
·4802 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
LEMMA: LLMs learn math via mistake analysis and correction, boosting performance without external critics.
XAttention: Block Sparse Attention with Antidiagonal Scoring
·2960 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
XAttention: Antidiagonal scoring unlocks block-sparse attention, slashing compute costs in long-context Transformers without sacrificing accuracy.
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
·2721 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
DeepMesh: RL-guided auto-regressive creation of artist-quality 3D meshes, enhanced by tokenization & DPO for human-aligned aesthetics.
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
·365 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
DiffMoE: Dynamically selects tokens for scalable diffusion transformers, unlocking new efficiency levels in image generation.
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
·3349 words·16 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Tsinghua University
DAPO: Open-sources an LLM reinforcement learning system that achieves SOTA AIME scores, fostering reproducible research at scale.
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
·2841 words·14 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
DeepPerception enhances MLLMs with cognitive visual perception, achieving superior grounding through knowledge integration & reasoning.
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
·2233 words·11 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Tsinghua University
UniGoal: A novel framework for universal zero-shot goal-oriented navigation, outperforming task-specific methods with a unified approach.
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation
·1710 words·9 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Tsinghua University
KUDA unifies dynamics learning and visual prompting with keypoints for open-vocabulary robot manipulation.
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
·2477 words·12 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
GTR: Prevents thought collapse in RL-based VLM agents through process guidance, enhancing performance in complex visual reasoning tasks.
ProReflow: Progressive Reflow with Decomposed Velocity
·1902 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
ProReflow: Improves diffusion model efficiency via progressive training and direction-focused velocity alignment.
SAGE: A Framework of Precise Retrieval for RAG
·3653 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Tsinghua University
SAGE: Precise RAG via semantic segmentation, adaptive chunking, and LLM feedback, boosting QA accuracy & cost-efficiency.
Identifying Sensitive Weights via Post-quantization Integral
·2603 words·13 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Tsinghua University
PQI: Accurately identifies sensitive weights via a post-quantization integral, enhancing LLM compression & performance.
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
·3674 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
ArtGS: Achieves state-of-the-art, efficient interactable replicas of complex articulated objects via Gaussian Splatting.
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
·3445 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 Tsinghua University
FR-Spec: Frequency-Ranked Speculative Sampling accelerates LLMs by optimizing vocabulary space compression, achieving a 1.12x speedup over EAGLE-2.
JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework
·3675 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Tsinghua University
JL1-CD: New all-inclusive dataset & multi-teacher knowledge distillation framework for robust remote sensing change detection, achieving state-of-the-art results!
Craw4LLM: Efficient Web Crawling for LLM Pretraining
·3024 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Craw4LLM: Efficiently crawls web pages for LLM pretraining by prioritizing influence scores, boosting data quality & cutting crawling waste.