🏢 Tsinghua University
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion
·3025 words·15 mins·
AI Generated
AI Applications
Robotics
🏢 Tsinghua University
Make-An-Agent generates high-performing robotic control policies from single behavioral demonstrations using behavior-prompted diffusion, showcasing impressive generalization and real-world applicabil…
Make Continual Learning Stronger via C-Flat
·2055 words·10 mins·
Machine Learning
Continual Learning
🏢 Tsinghua University
Boost continual learning with C-Flat: a novel optimizer, usable with one line of code, that creates flatter loss landscapes for enhanced stability and generalization across various continual learning scenarios.
LoRA-GA: Low-Rank Adaptation with Gradient Approximation
·2382 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Tsinghua University
LoRA-GA: A novel initialization method dramatically speeds up low-rank adaptation (LoRA) for LLMs, achieving convergence rates comparable to full fine-tuning while improving performance.
Learning Cooperative Trajectory Representations for Motion Forecasting
·3261 words·16 mins·
AI Generated
AI Applications
Autonomous Vehicles
🏢 Tsinghua University
V2X-Graph: a novel cooperative motion forecasting framework achieving interpretable trajectory feature fusion for enhanced accuracy.
Learning 1D Causal Visual Representation with De-focus Attention Networks
·2168 words·11 mins·
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
De-focus Attention Networks achieve comparable performance to 2D non-causal models using 1D causal visual representation, solving the ‘over-focus’ issue in existing 1D causal vision models.
LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling
·2913 words·14 mins·
AI Generated
Computer Vision
3D Vision
🏢 Tsinghua University
LCM: a novel, locally constrained, compact point cloud model surpasses Transformer-based methods by significantly improving performance and efficiency in various downstream tasks.
iVideoGPT: Interactive VideoGPTs are Scalable World Models
·3466 words·17 mins·
AI Applications
Robotics
🏢 Tsinghua University
iVideoGPT: A scalable, interactive world model trained on millions of human & robot manipulation videos, enabling efficient video prediction and model-based reinforcement learning.
Instruction-Guided Visual Masking
·3666 words·18 mins·
AI Generated
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
Instruction-Guided Visual Masking (IVM) boosts multimodal instruction following by precisely focusing models on relevant image regions via visual masking, achieving state-of-the-art results on multipl…
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
·2045 words·10 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Tsinghua University
InfLLM: Training-free long-context extrapolation for LLMs via efficient context memory.
Inferring Neural Signed Distance Functions by Overfitting on Single Noisy Point Clouds through Finetuning Data-Driven based Priors
·3586 words·17 mins·
Computer Vision
3D Vision
🏢 Tsinghua University
This research presents LocalN2NM, a novel method for inferring neural signed distance functions (SDF) from single, noisy point clouds by finetuning data-driven priors, achieving faster inference and b…
Improving Adaptivity via Over-Parameterization in Sequence Models
·2081 words·10 mins·
AI Generated
AI Theory
Generalization
🏢 Tsinghua University
Over-parameterized gradient descent dynamically adapts to signal structure, improving sequence model generalization and outperforming fixed-kernel methods.
ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images
·2938 words·14 mins·
Computer Vision
3D Vision
🏢 Tsinghua University
ImOV3D: learns open-vocabulary 3D object detection for point clouds from 2D images alone.
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
·3058 words·15 mins·
Computer Vision
Image Generation
🏢 Tsinghua University
Researchers solve the conditional image leakage problem in image-to-video diffusion models by proposing a new inference strategy and a time-dependent noise distribution for training. This yields video…
HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction
·2361 words·12 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Tsinghua University
HLM-Cite: A hybrid language model workflow boosts scientific citation prediction accuracy by 17.6% and scales to 100K candidate papers, surpassing existing methods.
GuardT2I: Defending Text-to-Image Models from Adversarial Prompts
·3130 words·15 mins·
AI Generated
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
GuardT2I: A novel framework defends text-to-image models against adversarial prompts by translating latent guidance embeddings into natural language, enabling effective adversarial prompt detection wi…
Graph-enhanced Optimizers for Structure-aware Recommendation Embedding Evolution
·2186 words·11 mins·
AI Applications
Recommendation Systems
🏢 Tsinghua University
SEvo, a novel embedding update mechanism, directly injects graph structural information into recommendation embeddings, boosting performance significantly while avoiding the computational overhead of …
GO4Align: Group Optimization for Multi-Task Alignment
·3370 words·16 mins·
AI Generated
Machine Learning
Multi-Task Learning
🏢 Tsinghua University
GO4Align: Dynamically aligning multi-task learning to conquer task imbalance with superior efficiency!
GLinSAT: The General Linear Satisfiability Neural Network Layer By Accelerated Gradient Descent
·1911 words·9 mins·
AI Theory
Optimization
🏢 Tsinghua University
GLinSAT: A novel neural network layer efficiently solves general linear constraint satisfaction problems via accelerated gradient descent, enabling differentiable backpropagation and improved GPU perf…
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation
·2032 words·10 mins·
Computer Vision
3D Vision
🏢 Tsinghua University
GeoLRM: Generate stunning 3D models from just 21 images using a novel geometry-aware transformer, surpassing existing methods in efficiency and quality!
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
·2413 words·12 mins·
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
GenArtist uses a multimodal large language model as an AI agent to unify image generation and editing, achieving state-of-the-art performance by decomposing complex tasks and leveraging a comprehensiv…