RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
·2658 words·13 mins·
Computer Vision
Image Generation
🏢 Peking University
RectifID personalizes image generation by cleverly guiding a diffusion model using off-the-shelf classifiers, achieving identity preservation without needing extra training data.
Reasons and Solutions for the Decline in Model Performance after Editing
·2167 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Peking University
Boosting large language model performance after knowledge editing: A new method (D4S) minimizes model damage by regulating the explosive growth of parameter layers, enabling multiple effective edits.
RAGraph: A General Retrieval-Augmented Graph Learning Framework
·2459 words·12 mins·
AI Generated
Machine Learning
Graph Neural Networks
🏢 Peking University
RAGraph, a novel retrieval-augmented graph learning framework, boosts GNN generalization by integrating external graph data, significantly outperforming state-of-the-art methods.
Quality-Improved and Property-Preserved Polarimetric Imaging via Complementarily Fusing
·1809 words·9 mins·
Computer Vision
Image Enhancement
🏢 Peking University
This paper introduces a novel three-phase neural network framework that significantly enhances the quality of polarimetric images by complementarily fusing degraded noisy and blurry snapshots while preserving polarization properties.
PrivCirNet: Efficient Private Inference via Block Circulant Transformation
·3185 words·15 mins·
loading
·
loading
AI Theory
Privacy
🏢 Peking University
PrivCirNet accelerates private deep learning inference by transforming DNN weights into circulant matrices, converting matrix-vector multiplications into efficient 1D convolutions suitable for homomorphic encryption.
Pre-Trained Multi-Goal Transformers with Prompt Optimization for Efficient Online Adaptation
·2369 words·12 mins·
Machine Learning
Reinforcement Learning
🏢 Peking University
MGPO: Efficient online RL adaptation via prompt optimization of pre-trained multi-goal transformers.
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
·3100 words·15 mins·
Large Language Models
🏢 Peking University
PiSSA, a novel parameter-efficient fine-tuning method, surpasses LoRA by initializing adapter matrices with the principal singular values and singular vectors of the original model, achieving faster convergence and enhanced performance.
Pandora's Box: Towards Building Universal Attackers against Real-World Large Vision-Language Models
·2651 words·13 mins·
Multimodal Learning
Vision-Language Models
🏢 Peking University
Researchers developed a universal adversarial patch to fool real-world large vision-language models (LVLMs) across multiple tasks, without needing access to internal model details.
Panacea: Pareto Alignment via Preference Adaptation for LLMs
·2565 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Peking University
Panacea: a novel LLM alignment method achieving Pareto optimality via online preference adaptation using a single model.
Optimizing over Multiple Distributions under Generalized Quasar-Convexity Condition
·331 words·2 mins·
Machine Learning
Reinforcement Learning
🏢 Peking University
This paper proposes ‘generalized quasar-convexity’ for optimizing over multiple probability distributions, offering adaptive algorithms with better iteration complexity than existing methods.
Opponent Modeling based on Subgoal Inference
·2148 words·11 mins·
Machine Learning
Reinforcement Learning
🏢 Peking University
Opponent modeling based on subgoal inference (OMG) outperforms existing methods by inferring opponent subgoals, enabling better generalization to unseen opponents in multi-agent environments.
OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
·2396 words·12 mins·
Computer Vision
3D Vision
🏢 Peking University
OpenGaussian achieves point-level open-vocabulary 3D understanding with 3D Gaussian Splatting by training 3D instance features with high 3D consistency and employing a two-level codebook for feature discretization.
One-Step Diffusion Distillation through Score Implicit Matching
·2065 words·10 mins·
Computer Vision
Image Generation
🏢 Peking University
Score Implicit Matching (SIM) distills complex, multi-step diffusion models into high-quality single-step generators, achieving comparable performance while enabling much faster sampling.
On the Power of Small-size Graph Neural Networks for Linear Programming
·2361 words·12 mins·
AI Generated
AI Theory
Optimization
🏢 Peking University
Small-size Graph Neural Networks effectively solve Linear Programs!
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents
·3479 words·17 mins·
AI Generated
Multimodal Learning
Vision-Language Models
🏢 Peking University
OmniJARVIS: Unified vision-language-action tokenization enables open-world instruction-following agents via unified multimodal interaction data.
Multi-Agent Coordination via Multi-Level Communication
·1851 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 Peking University
SeqComm, a novel multi-level communication scheme, tackles multi-agent coordination by leveraging asynchronous decision-making and a two-phase communication process, offering improved efficiency and theoretical guarantees.
MotionBooth: Motion-Aware Customized Text-to-Video Generation
·2883 words·14 mins·
🏢 Peking University
MotionBooth: A new framework enabling precise control over both object and camera movements in customized text-to-video generation, achieving high-quality video while maintaining training efficiency.
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts
·4224 words·20 mins·
Multimodal Learning
Vision-Language Models
🏢 Peking University
MoLE: Mixture of Low-rank Experts enhances human-centric text-to-image diffusion models by using low-rank modules trained on high-quality face and hand datasets to improve the realism of faces and hands.
MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-Object Demand-driven Navigation
·4206 words·20 mins·
AI Generated
Multimodal Learning
Embodied AI
🏢 Peking University
MO-DDN: A new benchmark and coarse-to-fine exploration agent boosts embodied AI’s ability to handle multi-object, preference-based task planning.
MemoryFormer : Minimize Transformer Computation by Removing Fully-Connected Layers
·2036 words·10 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Peking University
MemoryFormer drastically cuts large language model computation by replacing fully-connected layers with memory-efficient hashing, enabling faster and more scalable AI.