🏢 Tsinghua University
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
·2885 words·14 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
LLaMA-Mesh unifies 3D mesh generation with LLMs by representing meshes directly as plain text, enabling text-to-3D generation within a single model.
Sharingan: Extract User Action Sequence from Desktop Recordings
·9852 words·47 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Tsinghua University
Sharingan extracts user action sequences from desktop recordings using novel VLM-based methods, achieving 70-80% accuracy and enabling robotic process automation (RPA).
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
·4045 words·19 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
JanusFlow harmonizes autoregression and rectified flow for unified multimodal understanding and generation, achieving state-of-the-art results on standard benchmarks.
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
·2263 words·11 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tsinghua University
DimensionX generates photorealistic 3D and 4D scenes from a single image via controllable video diffusion, enabling precise manipulation of spatial structure and temporal dynamics.
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
·3659 words·18 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
WebRL, a self-evolving online curriculum reinforcement learning framework, trains open LLMs into high-performing web agents that surpass proprietary models.
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
·4028 words·19 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Researchers discovered predictable scaling laws for activation sparsity in LLMs, showing how data, architecture, and model size influence sparsity, paving the way for more efficient and interpretable …
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
·3111 words·15 mins
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 Tsinghua University
DeeR-VLA dynamically adjusts the size of a multimodal large language model based on task difficulty, significantly reducing computational cost and memory usage in robotic control without compromising …
Constraint Back-translation Improves Complex Instruction Following of Large Language Models
·3717 words·18 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Constraint Back-translation enhances complex instruction following in LLMs by leveraging inherent constraints in existing datasets for efficient high-quality data creation.
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
·3766 words·18 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 Tsinghua University
AndroidLab, a novel framework, systematically benchmarks Android autonomous agents, improving LLM and LMM success rates on 138 tasks via a unified environment and open-source dataset.