🏢 City University of Hong Kong

Revisiting the Integration of Convolution and Attention for Vision Backbone

26 September 2024·2197 words·11 mins

Computer Vision Image Classification 🏢 City University of Hong Kong

GLMix: A novel vision backbone that efficiently integrates convolutions and multi-head self-attention at different granularities, achieving state-of-the-art performance while addressing scalability issues.

PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference

26 September 2024·4348 words·21 mins

Computer Vision Image Generation 🏢 City University of Hong Kong

PrefPaint: Aligning image inpainting diffusion models with human preferences using reinforcement learning, resulting in significantly improved visual appeal.

Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model

26 September 2024·2122 words·10 mins

Computer Vision Image Classification 🏢 City University of Hong Kong

MSVMamba, a novel multi-scale vision model leveraging state-space models, achieves high accuracy in image classification and object detection while maintaining linear complexity, solving the long-rang…

Mixture of Adversarial LoRAs: Boosting Robust Generalization in Meta-Tuning

26 September 2024·2777 words·14 mins

Computer Vision Few-Shot Learning 🏢 City University of Hong Kong

Boosting Robust Few-Shot Learning with Adversarial Meta-Tuning!

LuSh-NeRF: Lighting up and Sharpening NeRFs for Low-light Scenes

26 September 2024·2414 words·12 mins

Computer Vision 3D Vision 🏢 City University of Hong Kong

LuSh-NeRF: A novel model that reconstructs sharp, bright NeRFs from hand-held low-light photos by sequentially modeling and removing noise and blur, outperforming existing methods.

LLM-ESR: Large Language Models Enhancement for Long-tailed Sequential Recommendation

26 September 2024·1957 words·10 mins

AI Applications Recommendation Systems 🏢 City University of Hong Kong

LLM-ESR enhances sequential recommendation by integrating semantic information from LLMs, significantly improving performance on long-tail users and items.

Learning Where to Edit Vision Transformers

26 September 2024·3346 words·16 mins

AI Generated Computer Vision Image Classification 🏢 City University of Hong Kong

Meta-learning a hypernetwork on CutMix-augmented data enables data-efficient and precise correction of vision transformer errors by identifying optimal parameters for fine-tuning.

G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models

26 September 2024·2323 words·11 mins

Multimodal Learning Vision-Language Models 🏢 City University of Hong Kong

G3: A novel framework that leverages Retrieval-Augmented Generation to achieve highly accurate worldwide image geolocalization, overcoming limitations of existing methods.

Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models

26 September 2024·4174 words·20 mins

AI Generated Computer Vision 3D Vision 🏢 City University of Hong Kong

OLIVINE uses visual foundation models for fine-grained image-to-LiDAR contrastive distillation, mitigating self-conflict issues and improving 3D representation learning.

CODA: A Correlation-Oriented Disentanglement and Augmentation Modeling Scheme for Better Resisting Subpopulation Shifts

26 September 2024·1907 words·9 mins

Machine Learning Deep Learning 🏢 City University of Hong Kong

CODA: A novel modeling scheme that tackles subpopulation shifts in machine learning by disentangling spurious correlations, augmenting data strategically, and using a reweighted consistency loss for improved…

Boosting Weakly Supervised Referring Image Segmentation via Progressive Comprehension

26 September 2024·5057 words·24 mins

AI Generated Natural Language Processing Vision-Language Models 🏢 City University of Hong Kong

PCNet boosts weakly-supervised referring image segmentation by progressively processing textual cues, mimicking human comprehension, and significantly improving target localization.

BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction

26 September 2024·2165 words·11 mins

AI Generated AI Applications Autonomous Vehicles 🏢 City University of Hong Kong

BehaviorGPT, a novel autoregressive Transformer, simulates realistic traffic agent behavior by modeling each time step as ‘current’, achieving top results in the 2024 Waymo Open Sim Agents Challenge.

Attention boosted Individualized Regression

26 September 2024·1826 words·9 mins

AI Generated AI Applications Healthcare 🏢 City University of Hong Kong

Attention boosted Individualized Regression (AIR) provides a novel individualized modeling framework for matrix data, leveraging sample-specific internal relations without needing extra sample similar…

Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

26 September 2024·2147 words·11 mins

Multimodal Learning Vision-Language Models 🏢 City University of Hong Kong

Compare2Score: A novel IQA model that teaches large multimodal models to translate comparative image quality judgments into continuous quality scores, significantly outperforming existing methods.

A versatile informative diffusion model for single-cell ATAC-seq data generation and analysis

26 September 2024·1769 words·9 mins

Machine Learning Deep Learning 🏢 City University of Hong Kong

ATAC-Diff: A versatile diffusion model for high-quality single-cell ATAC-seq data generation and analysis, surpassing state-of-the-art methods.