🏢 Nanyang Technological University

What Is Missing For Graph Homophily? Disentangling Graph Homophily For Graph Neural Networks
·2555 words·12 mins
AI Generated AI Theory Representation Learning 🏢 Nanyang Technological University
Tri-Hom disentangles graph homophily into label, structural, and feature aspects, providing a more comprehensive and accurate metric for predicting GNN performance.
Transferable Adversarial Attacks on SAM and Its Downstream Models
·2130 words·10 mins
Computer Vision Image Segmentation 🏢 Nanyang Technological University
UMI-GRAT: A universal meta-initialized and gradient robust adversarial attack effectively exploits vulnerabilities in the Segment Anything Model (SAM) and its fine-tuned downstream models, even withou…
Robust Fine-tuning of Zero-shot Models via Variance Reduction
·2809 words·14 mins
Computer Vision Vision-Language Models 🏢 Nanyang Technological University
Variance Reduction Fine-tuning (VRF) simultaneously boosts in-distribution and out-of-distribution accuracy in fine-tuned zero-shot models, overcoming the ID-OOD trade-off.
Reasoning Multi-Agent Behavioral Topology for Interactive Autonomous Driving
·4093 words·20 mins
AI Generated AI Applications Autonomous Vehicles 🏢 Nanyang Technological University
BeTopNet uses braid theory to create a topological representation of multi-agent future driving behaviors, improving prediction and planning accuracy in autonomous driving systems.
Open-Vocabulary Object Detection via Language Hierarchy
·2960 words·14 mins
Computer Vision Object Detection 🏢 Nanyang Technological University
Language Hierarchical Self-training (LHST) enhances weakly-supervised object detection by integrating language hierarchy, mitigating label mismatch, and improving generalization across diverse dataset…
MVGamba: Unify 3D Content Generation as State Space Sequence Modeling
·2497 words·12 mins
Computer Vision 3D Vision 🏢 Nanyang Technological University
MVGamba: A unified, feed-forward 3D content generation model achieving state-of-the-art quality and speed using an RNN-like state space model for efficient multi-view Gaussian reconstruction.
Mitigating Object Hallucination via Concentric Causal Attention
·2174 words·11 mins
Multimodal Learning Vision-Language Models 🏢 Nanyang Technological University
Concentric Causal Attention (CCA) significantly reduces object hallucination in LVLMs by reorganizing visual tokens to mitigate the impact of long-term decay in Rotary Position Encoding.
Learning to Handle Complex Constraints for Vehicle Routing Problems
·3237 words·16 mins
AI Theory Optimization 🏢 Nanyang Technological University
Proactive Infeasibility Prevention (PIP) framework significantly improves neural methods for solving complex Vehicle Routing Problems by proactively preventing infeasible solutions and enhancing const…
Learning 3D Garment Animation from Trajectories of A Piece of Cloth
·2097 words·10 mins
Computer Vision 3D Vision 🏢 Nanyang Technological University
Animates diverse garments realistically from a single cloth’s trajectory using a disentangled learning approach and Energy Unit Network (EUNet).
Hybrid Mamba for Few-Shot Segmentation
·2385 words·12 mins
Computer Vision Image Segmentation 🏢 Nanyang Technological University
Hybrid Mamba Network (HMNet) boosts few-shot segmentation accuracy by efficiently fusing support and query features using a novel hybrid Mamba architecture, significantly outperforming current state-o…
Historical Test-time Prompt Tuning for Vision Foundation Models
·2286 words·11 mins
Computer Vision Image Segmentation 🏢 Nanyang Technological University
HisTPT: Historical Test-Time Prompt Tuning memorizes past learning, enabling robust online prompt adaptation for vision models, overcoming performance degradation in continuously changing data streams…
Generalizable Implicit Motion Modeling for Video Frame Interpolation
·2114 words·10 mins
Computer Vision Video Understanding 🏢 Nanyang Technological University
Generalizable Implicit Motion Modeling (GIMM) improves video frame interpolation by accurately predicting optical flows at any timestep, surpassing existing methods and achieving state-of-the-ar…
FASTopic: Pretrained Transformer is a Fast, Adaptive, Stable, and Transferable Topic Model
·3348 words·16 mins
AI Generated Natural Language Processing Topic Modeling 🏢 Nanyang Technological University
FASTopic: a pretrained transformer-based topic model achieving superior speed, adaptivity, stability, and transferability compared to existing methods.
Distributed-Order Fractional Graph Operating Network
·2406 words·12 mins
🏢 Nanyang Technological University
DRAGON: A novel GNN framework using distributed-order fractional calculus surpasses traditional methods by capturing complex graph dynamics with enhanced flexibility and performance.
ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model
·1913 words·9 mins
Computer Vision 3D Vision 🏢 Nanyang Technological University
ContextGS compresses 3D scenes with an anchor-level autoregressive model, achieving a 15x size reduction in 3D Gaussian Splatting while boosting rendering quality.
Beware of Road Markings: A New Adversarial Patch Attack to Monocular Depth Estimation
·2480 words·12 mins
AI Applications Autonomous Vehicles 🏢 Nanyang Technological University
Researchers developed AdvRM, a new adversarial patch attack against monocular depth estimation models, which effectively camouflages patches as road markings to mislead depth predictions for any obsta…
ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users
·3873 words·19 mins
AI Generated Computer Vision Image Generation 🏢 Nanyang Technological University
ART: A novel automatic red-teaming framework reveals safety vulnerabilities in popular text-to-image models by identifying unsafe outputs even from seemingly harmless prompts.