🏢 Tsinghua University

Craw4LLM: Efficient Web Crawling for LLM Pretraining
·3024 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Craw4LLM: Efficiently crawls web pages for LLM pretraining by prioritizing influence scores, boosting data quality and cutting crawling waste.
PAFT: Prompt-Agnostic Fine-Tuning
·3569 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
PAFT dynamically adjusts prompts during LLM fine-tuning, improving model robustness and generalization across diverse prompts without sacrificing performance or efficiency.
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
·4398 words·21 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 Tsinghua University
video-SALMONN-o1: An open-source audio-visual LLM enhances video understanding with a novel reasoning-intensive dataset and the pDPO method, achieving significant accuracy gains.
DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
·4451 words·21 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Tsinghua University
DexTrack achieves highly generalizable neural tracking control for dexterous robot manipulation by iteratively training a controller using high-quality demonstrations refined via homotopy optimization…
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
·2416 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
New benchmark COUNTERMATH enhances LLMs’ mathematical reasoning using counterexample-driven proofs, revealing current models’ limitations and paving the way for improved mathematical capabilities.
Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
·3355 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Steel-LLM: A fully open-source, resource-efficient Chinese LLM trained with transparency, achieving competitive performance despite limited resources.
Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile
·4798 words·23 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tsinghua University
Efficient-vDiT accelerates video generation by 7.8x using sparse attention and multi-step distillation.
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
·3884 words·19 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Smaller LLMs can outperform larger ones by strategically increasing computation during inference, defying conventional LLM scaling.
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
·2102 words·10 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
Ola: A novel 7B-parameter omni-modal language model achieves state-of-the-art performance across image, video, and audio tasks using a progressive modality alignment training strategy.
Process Reinforcement through Implicit Rewards
·3889 words·19 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
PRIME (Process Reinforcement through IMplicit rEwards) revolutionizes LLM training by efficiently using implicit process rewards from online policy rollouts and outcome labels, significantly boosting …
Improving Video Generation with Human Feedback
·4418 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tsinghua University
Human feedback boosts video generation! New VideoReward model & alignment algorithms significantly improve video quality and user prompt alignment, exceeding prior methods.
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
·2172 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Pairwise RM, a novel reward model with knockout tournaments, significantly boosts large language model accuracy in test-time scaling by comparing solution pairs, eliminating arbitrary scoring inconsis…
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
·4361 words·21 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
FilmAgent: A multi-agent framework automates end-to-end virtual film production using LLMs, exceeding single-agent performance in a collaborative workflow.
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
·3933 words·19 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
ComplexFuncBench, a new benchmark, rigorously evaluates LLMs’ complex function-calling abilities across real-world scenarios involving multi-step processes, constraints, and long contexts.
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
·1945 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
This survey paper explores the exciting new frontier of Large Reasoning Models (LRMs), focusing on how reinforcement learning and clever prompting techniques are boosting LLMs’ reasoning capabilities.
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
·4505 words·22 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
Parameter-Inverted Image Pyramid Networks (PIIP) drastically cut visual model computing costs without sacrificing accuracy by using smaller models for higher-resolution images and larger models for lo…
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
·5517 words·26 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
URSA-7B: A new multimodal model significantly improves chain-of-thought reasoning in mathematics!
On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis
·285 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
This paper unveils critical thresholds for efficient visual autoregressive model computation, proving sub-quartic time is impossible beyond a certain input matrix norm while establishing efficient app…
EpiCoder: Encompassing Diversity and Complexity in Code Generation
·5051 words·24 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
EpiCoder revolutionizes code generation by using feature trees to create diverse and complex training data, resulting in state-of-the-art performance on various benchmarks.
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
·3666 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tsinghua University
MotionBench, a new benchmark, reveals that existing video models struggle with fine-grained motion understanding. To address this, the authors propose TE Fusion, a novel architecture that improves mo…