
🏢 Tsinghua University

Dynamic Scaling of Unit Tests for Code Reward Modeling
·3208 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Boosting code generation accuracy with more unit tests! This research shows that increasing the number of unit tests used to evaluate code generated by LLMs significantly improves accuracy, especially…
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
·8988 words·43 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
VisionReward, a novel reward model, surpasses existing methods by precisely capturing multi-dimensional human preferences for image and video generation, enabling more accurate and stable model optimization…
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
·3981 words·19 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
New benchmarks, HumanEval Pro and MBPP Pro, reveal LLMs struggle with self-invoking code generation, highlighting a critical gap in current code reasoning capabilities.
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
·2203 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
FoPE enhances attention’s periodic extension for better length generalization in language models by addressing spectral damage in RoPE using Fourier Series and zeroing out destructive frequencies.
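The core mechanism this teaser names, zeroing out destructive low frequencies in the rotary embedding, can be illustrated with a minimal sketch. This is a hypothetical illustration built on standard RoPE frequencies, not the paper's actual formulation; the cutoff rule (dropping components that complete less than one full period within the training window) and all names are assumptions.

```python
import numpy as np

def rope_frequencies(dim, base=10000.0):
    # Standard RoPE inverse frequencies, one per rotary pair.
    return base ** (-np.arange(0, dim, 2) / dim)

def fope_like_frequencies(dim, train_len, base=10000.0):
    # Hypothetical sketch of the idea: zero out "destructive" frequencies
    # whose period exceeds the training context length, i.e. components
    # that never complete a full cycle during training.
    freqs = rope_frequencies(dim, base)
    destructive = (2 * np.pi / freqs) > train_len
    return np.where(destructive, 0.0, freqs)

freqs = fope_like_frequencies(dim=64, train_len=512)
print((freqs == 0).sum(), "of", freqs.size, "frequencies zeroed")
```

With a 512-token training window, the slowest-rotating half of the 32 rotary pairs falls under the cutoff, which is the kind of spectral pruning the summary alludes to.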
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
·3841 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
Distilled Decoding (DD) drastically speeds up image generation from autoregressive models by using flow matching to enable one-step sampling, achieving significant speedups while maintaining acceptable…
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
·5664 words·27 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
ReMoE: Revolutionizing Mixture-of-Experts with fully differentiable ReLU routing, achieving superior scalability and performance.
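The "fully differentiable ReLU routing" named above can be sketched in a few lines: an expert is active exactly when its routing logit is positive, and its contribution is scaled by that gate value, with no discrete top-k step. This is a minimal illustration under assumed shapes and names, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, num_experts = 16, 8

# Toy router and expert weights (illustrative only).
W_router = rng.normal(size=(d_model, num_experts)) * 0.1
experts = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(num_experts)]

def remoe_like_layer(x):
    # ReLU routing: the gate is max(logit, 0), so sparsity emerges from
    # the sign of the logits rather than from a non-differentiable top-k.
    gates = np.maximum(x @ W_router, 0.0)
    out = np.zeros_like(x)
    for e in range(num_experts):
        if gates[e] > 0:  # only experts with positive gates compute
            out += gates[e] * (x @ experts[e])
    return out, gates

x = rng.normal(size=d_model)
out, gates = remoe_like_layer(x)
print("active experts:", int((gates > 0).sum()), "of", num_experts)
```

Because the gate is a ReLU of the logits, gradients flow through every active expert's weight, which is what makes this routing scheme differentiable end to end.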
How to Synthesize Text Data without Model Collapse?
·5702 words·27 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Token-level editing prevents language model collapse from synthetic data by theoretically bounding test error and empirically improving model performance.
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
·3553 words·17 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
LLaVA-UHD v2 enhances MLLMs by integrating high-resolution visual details using a hierarchical window transformer.
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
·3747 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
The self-play method SPaR enhances LLMs' instruction-following abilities, beating GPT-4 on IFEval.
ColorFlow: Retrieval-Augmented Image Sequence Colorization
·2655 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
ColorFlow, a new AI model, accurately colorizes black-and-white image sequences while preserving character identity.
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
·3840 words·19 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
SynerGen-VL: A simpler, more powerful unified MLLM for image understanding and generation.
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
·9241 words·44 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
InternVL 2.5, a new open-source multimodal LLM, surpasses 70% on the MMMU benchmark, rivaling top commercial models through model, data, and test-time scaling strategies.
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
·4750 words·23 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
GENMAC: Multi-agent collaboration revolutionizes compositional text-to-video generation, achieving state-of-the-art results by iteratively refining videos via specialized agents.
Densing Law of LLMs
·1976 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
The capability density of LLMs grows exponentially, doubling roughly every three months, so newer models can match state-of-the-art performance with half the parameters, thus reducing inference costs.
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
·2260 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
MIDI: a novel multi-instance diffusion model generates compositional 3D scenes from single images by simultaneously creating multiple 3D instances with accurate spatial relationships and high generali…
2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constraints for High-Fidelity Indoor Scene Reconstruction
·2645 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
2DGS-Room: Seed-guided 2D Gaussian splatting with geometric constraints achieves state-of-the-art high-fidelity indoor scene reconstruction.
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
·3550 words·17 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
X-Prompt: a novel autoregressive vision-language model achieves universal in-context image generation by efficiently compressing contextual information and using a unified training framework for super…
Structured 3D Latents for Scalable and Versatile 3D Generation
·4249 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
Unified 3D latent representation (SLAT) enables versatile high-quality 3D asset generation, significantly outperforming existing methods.
Free Process Rewards without Process Labels
·3126 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Train high-performing Process Reward Models (PRMs) cheaply using only outcome-level labels, eliminating the need for costly step-by-step annotations!
AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos
·2678 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
AlphaTablets: A novel 3D plane representation enabling accurate, consistent, and flexible 3D planar reconstruction from monocular videos, achieving state-of-the-art results.