RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
·2658 words·13 mins·
Computer Vision
Image Generation
🏢 Peking University
RectifID personalizes image generation by cleverly guiding a diffusion model using off-the-shelf classifiers, achieving identity preservation without needing extra training data.
Reasons and Solutions for the Decline in Model Performance after Editing
·2167 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Peking University
Boosting large language model performance after knowledge editing: A new method (D4S) minimizes model damage by regulating the explosive growth of parameter layers, enabling multiple effective edits.
RAGraph: A General Retrieval-Augmented Graph Learning Framework
·2459 words·12 mins·
AI Generated
Machine Learning
Graph Neural Networks
🏢 Peking University
RAGraph, a novel retrieval-augmented graph learning framework, boosts GNN generalization by integrating external graph data, significantly outperforming state-of-the-art methods.
Quality-Improved and Property-Preserved Polarimetric Imaging via Complementarily Fusing
·1809 words·9 mins·
Computer Vision
Image Enhancement
🏢 Peking University
This paper introduces a novel three-phase neural network framework that significantly enhances the quality of polarimetric images by complementarily fusing degraded noisy and blurry snapshots while preserving polarization properties.
PrivCirNet: Efficient Private Inference via Block Circulant Transformation
·3185 words·15 mins·
loading
·
loading
AI Theory
Privacy
🏢 Peking University
PrivCirNet accelerates private deep learning inference by transforming DNN weights into circulant matrices, converting matrix-vector multiplications into efficient 1D convolutions suitable for homomorphic encryption.
Pre-Trained Multi-Goal Transformers with Prompt Optimization for Efficient Online Adaptation
·2369 words·12 mins·
Machine Learning
Reinforcement Learning
🏢 Peking University
MGPO: Efficient online RL adaptation via prompt optimization of pre-trained multi-goal transformers.
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
·3100 words·15 mins·
Large Language Models
🏢 Peking University
PiSSA, a novel parameter-efficient fine-tuning method, surpasses LoRA by initializing adapter matrices with the principal singular values and singular vectors of the original model, achieving faster convergence and enhanced performance.
Pandora's Box: Towards Building Universal Attackers against Real-World Large Vision-Language Models
·2651 words·13 mins·
Multimodal Learning
Vision-Language Models
🏢 Peking University
Researchers developed a universal adversarial patch to fool real-world large vision-language models (LVLMs) across multiple tasks, without needing access to internal model details.
Panacea: Pareto Alignment via Preference Adaptation for LLMs
·2565 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Peking University
Panacea: a novel LLM alignment method achieving Pareto optimality via online preference adaptation using a single model.
Optimizing over Multiple Distributions under Generalized Quasar-Convexity Condition
·331 words·2 mins·
Machine Learning
Reinforcement Learning
🏢 Peking University
This paper proposes ‘generalized quasar-convexity’ for optimizing over multiple probability distributions, offering adaptive algorithms with better iteration complexity than existing methods.
Opponent Modeling based on Subgoal Inference
·2148 words·11 mins·
Machine Learning
Reinforcement Learning
🏢 Peking University
Opponent modeling based on subgoal inference (OMG) outperforms existing methods by inferring opponent subgoals, enabling better generalization to unseen opponents in multi-agent environments.
OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
·2396 words·12 mins·
Computer Vision
3D Vision
🏢 Peking University
OpenGaussian achieves point-level open-vocabulary 3D understanding with 3D Gaussian Splatting by training 3D instance features with high 3D consistency and employing a two-level codebook for feature discretization.
One-Step Diffusion Distillation through Score Implicit Matching
·2065 words·10 mins·
Computer Vision
Image Generation
🏢 Peking University
Score Implicit Matching (SIM) distills complex, multi-step diffusion models into high-quality single-step generators, achieving comparable performance while enabling much faster sampling.
On the Power of Small-size Graph Neural Networks for Linear Programming
·2361 words·12 mins·
AI Generated
AI Theory
Optimization
🏢 Peking University
Small-size Graph Neural Networks effectively solve Linear Programs!
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents
·3479 words·17 mins·
AI Generated
Multimodal Learning
Vision-Language Models
🏢 Peking University
OmniJARVIS: Unified vision-language-action tokenization enables open-world instruction-following agents via unified multimodal interaction data.
Multi-Agent Coordination via Multi-Level Communication
·1851 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 Peking University
SeqComm, a novel multi-level communication scheme, tackles multi-agent coordination by leveraging asynchronous decision-making and a two-phase communication process, offering improved efficiency and theoretical guarantees.
MotionBooth: Motion-Aware Customized Text-to-Video Generation
·2883 words·14 mins·
🏢 Peking University
MotionBooth: A new framework enabling precise control over both object and camera movements in customized text-to-video generation, achieving high-quality video while maintaining training efficiency.
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts
·4224 words·20 mins·
Multimodal Learning
Vision-Language Models
🏢 Peking University
MoLE: Mixture of Low-rank Experts enhances human-centric text-to-image diffusion models by using low-rank modules trained on high-quality face and hand datasets to improve the realism of faces and hands.
MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-Object Demand-driven Navigation
·4206 words·20 mins·
AI Generated
Multimodal Learning
Embodied AI
🏢 Peking University
MO-DDN: A new benchmark and coarse-to-fine exploration agent boosts embodied AI’s ability to handle multi-object, preference-based task planning.
MemoryFormer : Minimize Transformer Computation by Removing Fully-Connected Layers
·2036 words·10 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Peking University
MemoryFormer drastically cuts large language model computation by replacing fully-connected layers with memory-efficient hashing, enabling faster and more scalable AI.