🏢 Tsinghua University

Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion

26 September 2024·3025 words·15 mins

AI Generated AI Applications Robotics 🏢 Tsinghua University

Make-An-Agent generates high-performing robotic control policies from single behavioral demonstrations using behavior-prompted diffusion, showcasing impressive generalization and real-world applicabil…

Make Continual Learning Stronger via C-Flat

26 September 2024·2055 words·10 mins

Machine Learning Continual Learning 🏢 Tsinghua University

Boost continual learning with C-Flat: a novel, one-line-code optimizer creating flatter loss landscapes for enhanced stability and generalization across various continual learning scenarios.

LoRA-GA: Low-Rank Adaptation with Gradient Approximation

26 September 2024·2382 words·12 mins

Natural Language Processing Large Language Models 🏢 Tsinghua University

LoRA-GA: A novel initialization method dramatically speeds up low-rank adaptation (LoRA) for LLMs, achieving convergence rates comparable to full fine-tuning while improving performance.

Learning Cooperative Trajectory Representations for Motion Forecasting

26 September 2024·3261 words·16 mins

AI Generated AI Applications Autonomous Vehicles 🏢 Tsinghua University

V2X-Graph: a novel cooperative motion forecasting framework achieving interpretable trajectory feature fusion for enhanced accuracy.

Learning 1D Causal Visual Representation with De-focus Attention Networks

26 September 2024·2168 words·11 mins

Multimodal Learning Vision-Language Models 🏢 Tsinghua University

De-focus Attention Networks achieve comparable performance to 2D non-causal models using 1D causal visual representation, solving the ‘over-focus’ issue in existing 1D causal vision models.

LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling

26 September 2024·2913 words·14 mins

AI Generated Computer Vision 3D Vision 🏢 Tsinghua University

LCM: a novel, locally constrained, compact point cloud model that surpasses Transformer-based methods in both performance and efficiency across various downstream tasks.

iVideoGPT: Interactive VideoGPTs are Scalable World Models

26 September 2024·3466 words·17 mins

AI Applications Robotics 🏢 Tsinghua University

iVideoGPT: A scalable, interactive world model trained on millions of human and robot manipulation videos, enabling efficient video prediction and model-based reinforcement learning.

Instruction-Guided Visual Masking

26 September 2024·3666 words·18 mins

AI Generated Multimodal Learning Vision-Language Models 🏢 Tsinghua University

Instruction-Guided Visual Masking (IVM) boosts multimodal instruction following by precisely focusing models on relevant image regions via visual masking, achieving state-of-the-art results on multipl…

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

26 September 2024·2045 words·10 mins

AI Generated Natural Language Processing Large Language Models 🏢 Tsinghua University

InfLLM: Training-free long-context extrapolation for LLMs via efficient context memory.

Inferring Neural Signed Distance Functions by Overfitting on Single Noisy Point Clouds through Finetuning Data-Driven based Priors

26 September 2024·3586 words·17 mins

Computer Vision 3D Vision 🏢 Tsinghua University

This research presents LocalN2NM, a novel method for inferring neural signed distance functions (SDF) from single, noisy point clouds by finetuning data-driven priors, achieving faster inference and b…

Improving Adaptivity via Over-Parameterization in Sequence Models

26 September 2024·2081 words·10 mins

AI Generated AI Theory Generalization 🏢 Tsinghua University

Over-parameterized gradient descent dynamically adapts to signal structure, improving sequence model generalization and outperforming fixed-kernel methods.

ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images

26 September 2024·2938 words·14 mins

Computer Vision 3D Vision 🏢 Tsinghua University

ImOV3D: Open-vocabulary 3D object detection on point clouds, learned from 2D images alone.

Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model

26 September 2024·3058 words·15 mins

Computer Vision Image Generation 🏢 Tsinghua University

Researchers solve the conditional image leakage problem in image-to-video diffusion models by proposing a new inference strategy and a time-dependent noise distribution for training. This yields video…

HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction

26 September 2024·2361 words·12 mins

AI Generated Natural Language Processing Large Language Models 🏢 Tsinghua University

HLM-Cite: A hybrid language model workflow boosts scientific citation prediction accuracy by 17.6% and scales to 100K candidate papers, surpassing existing methods.

GuardT2I: Defending Text-to-Image Models from Adversarial Prompts

26 September 2024·3130 words·15 mins

AI Generated Multimodal Learning Vision-Language Models 🏢 Tsinghua University

GuardT2I: A novel framework defends text-to-image models against adversarial prompts by translating latent guidance embeddings into natural language, enabling effective adversarial prompt detection wi…

Graph-enhanced Optimizers for Structure-aware Recommendation Embedding Evolution

26 September 2024·2186 words·11 mins

AI Applications Recommendation Systems 🏢 Tsinghua University

SEvo, a novel embedding update mechanism, directly injects graph structural information into recommendation embeddings, boosting performance significantly while avoiding the computational overhead of …

GO4Align: Group Optimization for Multi-Task Alignment

26 September 2024·3370 words·16 mins

AI Generated Machine Learning Multi-Task Learning 🏢 Tsinghua University

GO4Align: Dynamically aligns tasks in multi-task learning to address task imbalance with superior efficiency.

GLinSAT: The General Linear Satisfiability Neural Network Layer By Accelerated Gradient Descent

26 September 2024·1911 words·9 mins

AI Theory Optimization 🏢 Tsinghua University

GLinSAT: A novel neural network layer efficiently solves general linear constraint satisfaction problems via accelerated gradient descent, enabling differentiable backpropagation and improved GPU perf…

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

26 September 2024·2032 words·10 mins

Computer Vision 3D Vision 🏢 Tsinghua University

GeoLRM: Generate stunning 3D models from just 21 images using a novel geometry-aware transformer, surpassing existing methods in efficiency and quality!

GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing

26 September 2024·2413 words·12 mins

Multimodal Learning Vision-Language Models 🏢 Tsinghua University

GenArtist uses a multimodal large language model as an AI agent to unify image generation and editing, achieving state-of-the-art performance by decomposing complex tasks and leveraging a comprehensiv…