🏢 Tsinghua University

Training and Evaluating Language Models with Template-based Data Generation
·415 words·2 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Tsinghua University
Researchers created TemplateGSM, a massive dataset of over 7 million grade-school math problems and solutions, using GPT-4 to generate problem templates, significantly advancing LLM training for mathematical reasoning.
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
·4076 words·20 mins
AI Generated · 🤗 Daily Papers · Computer Vision · Video Understanding · 🏢 Tsinghua University
TAPTRv3 achieves state-of-the-art long-video point tracking by leveraging spatial and temporal context to enhance feature querying, surpassing previous methods and demonstrating strong performance…
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS
·2022 words·10 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Tsinghua University
HiAR-ICL, a novel automated reasoning paradigm using Monte Carlo Tree Search, surpasses state-of-the-art accuracy in complex mathematical reasoning by shifting focus from specific examples to abstract…
Patience Is The Key to Large Language Model Reasoning
·477 words·3 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Tsinghua University
Boosting Large Language Model (LLM) reasoning without massive datasets: A novel training method encourages ‘patient’ reasoning, improving accuracy by up to 6.7% on benchmark tasks.
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
·3206 words·16 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Tsinghua University
SageAttention2 achieves 4-bit accurate attention, boosting inference speed by 2x compared to FlashAttention2, while maintaining end-to-end accuracy across diverse models.
AnimateAnything: Consistent and Controllable Animation for Video Generation
·2615 words·13 mins
AI Generated · 🤗 Daily Papers · Computer Vision · Video Understanding · 🏢 Tsinghua University
AnimateAnything: A unified approach enabling precise and consistent video manipulation via a novel optical flow representation and frequency stabilization.
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
·4459 words·21 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 Tsinghua University
To boost multimodal reasoning in LLMs, researchers developed Mixed Preference Optimization (MPO) and a large-scale dataset (MMPR), significantly improving reasoning accuracy and achieving performance…
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
·2885 words·14 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Tsinghua University
LLaMA-Mesh unifies 3D mesh generation with LLMs by representing meshes directly as text, enabling efficient text-to-3D conversion within a single model.
Sharingan: Extract User Action Sequence from Desktop Recordings
·9852 words·47 mins
AI Generated · 🤗 Daily Papers · Computer Vision · Video Understanding · 🏢 Tsinghua University
Sharingan extracts user action sequences from desktop recordings using novel VLM-based methods, achieving 70–80% accuracy and enabling robotic process automation (RPA).
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
·4045 words·19 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 Tsinghua University
JanusFlow harmonizes autoregression and rectified flow for unified multimodal understanding and generation, achieving state-of-the-art results on standard benchmarks.
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
·2263 words·11 mins
AI Generated · 🤗 Daily Papers · Computer Vision · 3D Vision · 🏢 Tsinghua University
DimensionX generates photorealistic 3D and 4D scenes from a single image via controllable video diffusion, enabling precise manipulation of spatial structure and temporal dynamics.
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
·3659 words·18 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Tsinghua University
WebRL, a self-evolving online curriculum reinforcement learning framework, empowers open LLMs to excel as high-performing web agents, surpassing proprietary models.
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
·4028 words·19 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Tsinghua University
Researchers discovered predictable scaling laws for activation sparsity in LLMs, showing how data, architecture, and model size influence sparsity, paving the way for more efficient and interpretable …
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
·3111 words·15 mins
AI Generated · 🤗 Daily Papers · AI Applications · Robotics · 🏢 Tsinghua University
DeeR-VLA dynamically adjusts the size of a multimodal large language model based on task difficulty, significantly reducing computational cost and memory usage in robotic control without compromising …
Constraint Back-translation Improves Complex Instruction Following of Large Language Models
·3717 words·18 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Tsinghua University
Constraint Back-translation enhances complex instruction following in LLMs by leveraging inherent constraints in existing datasets for efficient high-quality data creation.
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
·3766 words·18 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Human-AI Interaction · 🏢 Tsinghua University
AndroidLab, a novel framework, systematically benchmarks Android autonomous agents, improving LLM and LMM success rates on 138 tasks via a unified environment and an open-source dataset.