
šŸ¢ Xi'an Jiaotong University

TPR: Topology-Preserving Reservoirs for Generalized Zero-Shot Learning
·2613 words·13 mins
Multimodal Learning Vision-Language Models 🏢 Xi'an Jiaotong University
Topology-Preserving Reservoirs (TPR) enhances CLIP’s zero-shot learning by using a dual-space alignment and a topology-preserving objective to improve generalization to unseen classes, achieving state…
Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models
·3455 words·17 mins
Computer Vision Image Generation 🏢 Xi'an Jiaotong University
Prompt-Agnostic Adversarial Perturbation (PAP) defends customized diffusion models against image tampering, achieving superior generalization over prompt-specific methods.
OneActor: Consistent Subject Generation via Cluster-Conditioned Guidance
·3168 words·15 mins
Computer Vision Image Generation 🏢 Xi'an Jiaotong University
OneActor: one-shot tuning for consistent subject image generation that bypasses laborious backbone tuning via semantic guidance, achieving a 4x speedup.
Neural P$^3$M: A Long-Range Interaction Modeling Enhancer for Geometric GNNs
·2015 words·10 mins
Machine Learning Deep Learning 🏢 Xi'an Jiaotong University
Neural P³M enhances geometric GNNs by incorporating mesh points to model long-range interactions in molecules, achieving state-of-the-art accuracy in predicting energy and forces.
Measuring Mutual Policy Divergence for Multi-Agent Sequential Exploration
·2042 words·10 mins
Machine Learning Reinforcement Learning 🏢 Xi'an Jiaotong University
MADPO, a novel MARL framework, uses mutual policy divergence maximization with conditional Cauchy-Schwarz divergence to enhance exploration and agent heterogeneity in sequential updating, outperformin…
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
·2344 words·12 mins
Natural Language Processing Question Answering 🏢 Xi'an Jiaotong University
New dataset MUSIC-AVQA-R and a multi-faceted cycle collaborative debiasing strategy significantly improve audio-visual question answering robustness.
Learning 3D Equivariant Implicit Function with Patch-Level Pose-Invariant Representation
·2788 words·14 mins
Computer Vision 3D Vision 🏢 Xi'an Jiaotong University
3D surface reconstruction revolutionized: PEIF leverages patch-level pose-invariant representations and 3D patch-level equivariance for state-of-the-art accuracy, even with varied poses and datasets!
IPM-LSTM: A Learning-Based Interior Point Method for Solving Nonlinear Programs
·2991 words·15 mins
AI Generated AI Theory Optimization 🏢 Xi'an Jiaotong University
IPM-LSTM accelerates nonlinear program solving by up to 70% using LSTM networks to approximate linear system solutions within the interior point method.
Flipped Classroom: Aligning Teacher Attention with Student in Generalized Category Discovery
·3153 words·15 mins
Image Classification 🏢 Xi'an Jiaotong University
FlipClass dynamically updates the teacher model in a teacher-student framework to align with the student’s attention, resolving learning inconsistencies and significantly improving generalized categor…
Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models
·3084 words·15 mins
Natural Language Processing Text Classification 🏢 Xi'an Jiaotong University
Boost language model performance across domains with ‘Concentration’: a new prompt optimization objective that prioritizes stable, deep-layer attention.
ColJailBreak: Collaborative Generation and Editing for Jailbreaking Text-to-Image Deep Generation
·2067 words·10 mins
Computer Vision Image Generation 🏢 Xi'an Jiaotong University
ColJailBreak cleverly circumvents AI safety filters by first generating safe images and then subtly injecting unsafe content using image editing.