🏢 Nanyang Technological University

What Is Missing For Graph Homophily? Disentangling Graph Homophily For Graph Neural Networks

26 September 2024·2555 words·12 mins

AI Generated AI Theory Representation Learning 🏢 Nanyang Technological University

Tri-Hom disentangles graph homophily into label, structural, and feature aspects, providing a more comprehensive and accurate metric for predicting GNN performance.

Transferable Adversarial Attacks on SAM and Its Downstream Models

26 September 2024·2130 words·10 mins

Computer Vision Image Segmentation 🏢 Nanyang Technological University

UMI-GRAT: A universal meta-initialized and gradient robust adversarial attack effectively exploits vulnerabilities in the Segment Anything Model (SAM) and its fine-tuned downstream models, even withou…

Robust Fine-tuning of Zero-shot Models via Variance Reduction

26 September 2024·2809 words·14 mins

Computer Vision Vision-Language Models 🏢 Nanyang Technological University

Variance Reduction Fine-tuning (VRF) simultaneously boosts in-distribution and out-of-distribution accuracy in fine-tuned zero-shot models, overcoming the ID-OOD trade-off.

Reasoning Multi-Agent Behavioral Topology for Interactive Autonomous Driving

26 September 2024·4093 words·20 mins

AI Generated AI Applications Autonomous Vehicles 🏢 Nanyang Technological University

BeTopNet uses braid theory to create a topological representation of multi-agent future driving behaviors, improving prediction and planning accuracy in autonomous driving systems.

Open-Vocabulary Object Detection via Language Hierarchy

26 September 2024·2960 words·14 mins

Computer Vision Object Detection 🏢 Nanyang Technological University

Language Hierarchical Self-training (LHST) enhances weakly-supervised object detection by integrating language hierarchy, mitigating label mismatch, and improving generalization across diverse dataset…

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

26 September 2024·2497 words·12 mins

Computer Vision 3D Vision 🏢 Nanyang Technological University

MVGamba: A unified, feed-forward 3D content generation model achieving state-of-the-art quality and speed using an RNN-like state space model for efficient multi-view Gaussian reconstruction.

Mitigating Object Hallucination via Concentric Causal Attention

26 September 2024·2174 words·11 mins

Multimodal Learning Vision-Language Models 🏢 Nanyang Technological University

Concentric Causal Attention (CCA) reduces object hallucination in LVLMs by reorganizing visual tokens to mitigate the impact of long-term decay in Rotary Position Encoding.

Learning to Handle Complex Constraints for Vehicle Routing Problems

26 September 2024·3237 words·16 mins

AI Theory Optimization 🏢 Nanyang Technological University

Proactive Infeasibility Prevention (PIP) framework significantly improves neural methods for solving complex Vehicle Routing Problems by proactively preventing infeasible solutions and enhancing const…

Learning 3D Garment Animation from Trajectories of A Piece of Cloth

26 September 2024·2097 words·10 mins

Computer Vision 3D Vision 🏢 Nanyang Technological University

The method animates diverse garments realistically from the trajectories of a single piece of cloth, using a disentangled learning approach and the Energy Unit Network (EUNet).

Hybrid Mamba for Few-Shot Segmentation

26 September 2024·2385 words·12 mins

Computer Vision Image Segmentation 🏢 Nanyang Technological University

Hybrid Mamba Network (HMNet) boosts few-shot segmentation accuracy by efficiently fusing support and query features using a novel hybrid Mamba architecture, significantly outperforming current state-o…

Historical Test-time Prompt Tuning for Vision Foundation Models

26 September 2024·2286 words·11 mins

Computer Vision Image Segmentation 🏢 Nanyang Technological University

HisTPT: Historical Test-Time Prompt Tuning memorizes past learning, enabling robust online prompt adaptation for vision models, overcoming performance degradation in continuously changing data streams…

Generalizable Implicit Motion Modeling for Video Frame Interpolation

26 September 2024·2114 words·10 mins

Computer Vision Video Understanding 🏢 Nanyang Technological University

Generalizable Implicit Motion Modeling (GIMM) advances video frame interpolation by accurately predicting optical flows at any timestep, surpassing existing methods and achieving state-of-the-ar…

FASTopic: Pretrained Transformer is a Fast, Adaptive, Stable, and Transferable Topic Model

26 September 2024·3348 words·16 mins

AI Generated Natural Language Processing Topic Modeling 🏢 Nanyang Technological University

FASTopic: a pretrained transformer-based topic model achieving superior speed, adaptivity, stability, and transferability compared to existing methods.

Distributed-Order Fractional Graph Operating Network

26 September 2024·2406 words·12 mins

🏢 Nanyang Technological University

DRAGON: A GNN framework that uses distributed-order fractional calculus to capture complex graph dynamics, offering greater flexibility and better performance than traditional methods.

ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model

26 September 2024·1913 words·9 mins

Computer Vision 3D Vision 🏢 Nanyang Technological University

ContextGS: An anchor-level autoregressive context model for 3D scene compression, achieving a 15x size reduction in 3D Gaussian Splatting while improving rendering quality.

Beware of Road Markings: A New Adversarial Patch Attack to Monocular Depth Estimation

26 September 2024·2480 words·12 mins

AI Applications Autonomous Vehicles 🏢 Nanyang Technological University

Researchers developed AdvRM, a new adversarial patch attack against monocular depth estimation models, which effectively camouflages patches as road markings to mislead depth predictions for any obsta…

ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users

26 September 2024·3873 words·19 mins

AI Generated Computer Vision Image Generation 🏢 Nanyang Technological University

ART: A novel automatic red-teaming framework reveals safety vulnerabilities in popular text-to-image models by identifying unsafe outputs even from seemingly harmless prompts.