🏢 City University of Hong Kong
Revisiting the Integration of Convolution and Attention for Vision Backbone
·2197 words·11 mins·
Computer Vision
Image Classification
🏢 City University of Hong Kong
GLMix: A novel vision backbone efficiently integrates convolutions and multi-head self-attention at different granularities, achieving state-of-the-art performance while addressing scalability issues.
PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference
·4348 words·21 mins·
Computer Vision
Image Generation
🏢 City University of Hong Kong
PrefPaint: Aligning image inpainting diffusion models with human preferences using reinforcement learning, resulting in significantly improved visual appeal.
Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model
·2122 words·10 mins·
Computer Vision
Image Classification
🏢 City University of Hong Kong
MSVMamba: A novel multi-scale vision model that leverages state-space models to achieve high accuracy in image classification and object detection while maintaining linear complexity, solving the long-rang…
Mixture of Adversarial LoRAs: Boosting Robust Generalization in Meta-Tuning
·2777 words·14 mins·
Computer Vision
Few-Shot Learning
🏢 City University of Hong Kong
Boosting Robust Few-Shot Learning with Adversarial Meta-Tuning!
LuSh-NeRF: Lighting up and Sharpening NeRFs for Low-light Scenes
·2414 words·12 mins·
Computer Vision
3D Vision
🏢 City University of Hong Kong
LuSh-NeRF: A novel model reconstructs sharp, bright NeRFs from hand-held low-light photos by sequentially modeling and removing noise and blur, outperforming existing methods.
LLM-ESR: Large Language Models Enhancement for Long-tailed Sequential Recommendation
·1957 words·10 mins·
AI Applications
Recommendation Systems
🏢 City University of Hong Kong
LLM-ESR enhances sequential recommendation by integrating semantic information from LLMs, significantly improving performance on long-tail users and items.
Learning Where to Edit Vision Transformers
·3346 words·16 mins·
AI Generated
Computer Vision
Image Classification
🏢 City University of Hong Kong
Meta-learning a hypernetwork on CutMix-augmented data enables data-efficient and precise correction of vision transformer errors by identifying optimal parameters for fine-tuning.
G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models
·2323 words·11 mins·
loading
·
loading
Multimodal Learning
Vision-Language Models
🏢 City University of Hong Kong
G3: A novel framework leverages Retrieval-Augmented Generation to achieve highly accurate worldwide image geolocalization, overcoming limitations of existing methods.
Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models
·4174 words·20 mins·
loading
·
loading
AI Generated
Computer Vision
3D Vision
🏢 City University of Hong Kong
OLIVINE uses visual foundation models for fine-grained image-to-LiDAR contrastive distillation, mitigating self-conflict issues and improving 3D representation learning.
CODA: A Correlation-Oriented Disentanglement and Augmentation Modeling Scheme for Better Resisting Subpopulation Shifts
·1907 words·9 mins·
Machine Learning
Deep Learning
🏢 City University of Hong Kong
CODA: A novel modeling scheme tackles subpopulation shifts in machine learning by disentangling spurious correlations, augmenting data strategically, and using reweighted consistency loss for improved…
Boosting Weakly Supervised Referring Image Segmentation via Progressive Comprehension
·5057 words·24 mins·
AI Generated
Natural Language Processing
Vision-Language Models
🏢 City University of Hong Kong
PCNet boosts weakly-supervised referring image segmentation by progressively processing textual cues, mimicking human comprehension, and significantly improving target localization.
BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction
·2165 words·11 mins·
AI Generated
AI Applications
Autonomous Vehicles
🏢 City University of Hong Kong
BehaviorGPT, a novel autoregressive Transformer, simulates realistic traffic agent behavior by modeling each time step as ‘current’, achieving top results in the 2024 Waymo Open Sim Agents Challenge.
Attention boosted Individualized Regression
·1826 words·9 mins·
AI Generated
AI Applications
Healthcare
🏢 City University of Hong Kong
Attention boosted Individualized Regression (AIR) provides a novel individualized modeling framework for matrix data, leveraging sample-specific internal relations without needing extra sample similar…
Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare
·2147 words·11 mins·
Multimodal Learning
Vision-Language Models
🏢 City University of Hong Kong
Compare2Score: A novel IQA model teaches large multimodal models to translate comparative image quality judgments into continuous quality scores, significantly outperforming existing methods.
A versatile informative diffusion model for single-cell ATAC-seq data generation and analysis
·1769 words·9 mins·
Machine Learning
Deep Learning
🏢 City University of Hong Kong
ATAC-Diff: A versatile diffusion model for high-quality single-cell ATAC-seq data generation and analysis, surpassing state-of-the-art methods.