🏢 Tsinghua University

ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
·3933 words·19 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
ComplexFuncBench, a new benchmark, rigorously evaluates LLMs’ complex function-calling abilities across real-world scenarios involving multi-step processes, constraints, and long contexts.
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
·1945 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
This survey charts the emerging field of Large Reasoning Models (LRMs), focusing on how reinforcement learning and advanced prompting techniques boost LLMs’ reasoning capabilities.
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
·4505 words·22 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
Parameter-Inverted Image Pyramid Networks (PIIP) drastically cut visual model computing costs without sacrificing accuracy by using smaller models for higher-resolution images and larger models for lower-resolution images.
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
·5517 words·26 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
URSA-7B: A new multimodal model significantly improves chain-of-thought reasoning in mathematics!
On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis
·285 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
This paper unveils critical thresholds for efficient visual autoregressive model computation, proving sub-quartic time is impossible beyond a certain input matrix norm while establishing efficient approximation algorithms below that threshold.
EpiCoder: Encompassing Diversity and Complexity in Code Generation
·5051 words·24 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
EpiCoder revolutionizes code generation by using feature trees to create diverse and complex training data, resulting in state-of-the-art performance on various benchmarks.
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
·3666 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tsinghua University
MotionBench, a new benchmark, reveals that existing video models struggle with fine-grained motion understanding. To address this, the authors propose TE Fusion, a novel architecture that improves motion understanding.
Dynamic Scaling of Unit Tests for Code Reward Modeling
·3208 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Boosting code generation accuracy with more unit tests: this research shows that scaling the number of unit tests used to evaluate LLM-generated code significantly improves accuracy, especially on more challenging problems.
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
·8988 words·43 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
VisionReward, a novel reward model, surpasses existing methods by precisely capturing multi-dimensional human preferences for image and video generation, enabling more accurate and stable model optimization.
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
·3981 words·19 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
New benchmarks, HumanEval Pro and MBPP Pro, reveal LLMs struggle with self-invoking code generation, highlighting a critical gap in current code reasoning capabilities.
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
·2203 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
FoPE enhances attention’s periodic extension for better length generalization in language models by addressing spectral damage in RoPE using Fourier Series and zeroing out destructive frequencies.
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
·3841 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
Distilled Decoding (DD) drastically speeds up image generation from autoregressive models by using flow matching to enable one-step sampling, achieving significant speedups while maintaining acceptable generation quality.
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
·5664 words·27 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
ReMoE: Revolutionizing Mixture-of-Experts with fully differentiable ReLU routing, achieving superior scalability and performance.
How to Synthesize Text Data without Model Collapse?
·5702 words·27 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Token-level editing prevents language model collapse from synthetic data by theoretically bounding test error and empirically improving model performance.
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
·3553 words·17 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
LLaVA-UHD v2 enhances MLLMs by integrating high-resolution visual details using a hierarchical window transformer.
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
·3747 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
The self-play method SPaR enhances LLMs’ instruction-following abilities, beating GPT-4 on IFEval.
ColorFlow: Retrieval-Augmented Image Sequence Colorization
·2655 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
ColorFlow, a new AI model, accurately colorizes black-and-white image sequences while preserving character identity.
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
·3840 words·19 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
SynerGen-VL: A simpler, more powerful unified MLLM for image understanding and generation.
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
·9241 words·44 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
InternVL 2.5, a new open-source multimodal LLM, surpasses 70% on the MMMU benchmark, rivaling top commercial models through model, data, and test-time scaling strategies.
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
·4750 words·23 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
GenMAC: Multi-agent collaboration revolutionizes compositional text-to-video generation, achieving state-of-the-art results by iteratively refining videos via specialized agents.