🏢 Hong Kong Polytechnic University

Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection
·2040 words·10 mins
3D Vision 🏢 Hong Kong Polytechnic University
Voxel Mamba: a group-free 3D object detection method using state space models, achieving higher accuracy and efficiency by overcoming the limitations of serialization-based Transformers.
Unveiling the Potential of Robustness in Selecting Conditional Average Treatment Effect Estimators
·1533 words·8 mins
AI Generated AI Theory Causality 🏢 Hong Kong Polytechnic University
A new, nuisance-free Distributionally Robust Metric (DRM) is proposed for selecting robust Conditional Average Treatment Effect (CATE) estimators, improving the reliability of personalized decision-ma…
Towards Safe Concept Transfer of Multi-Modal Diffusion via Causal Representation Editing
·3866 words·19 mins
AI Generated Multimodal Learning Vision-Language Models 🏢 Hong Kong Polytechnic University
Causal Representation Editing (CRE) improves safe image generation by precisely removing unsafe concepts from diffusion models, enhancing efficiency and flexibility.
Preventing Model Collapse in Deep Canonical Correlation Analysis by Noise Regularization
·2437 words·12 mins
Multimodal Learning Representation Learning 🏢 Hong Kong Polytechnic University
Noise Regularization rescues Deep Canonical Correlation Analysis from model collapse!
Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization
·2561 words·13 mins
Machine Learning Self-Supervised Learning 🏢 Hong Kong Polytechnic University
Orthogonality regularization prevents dimensional collapse in self-supervised learning, significantly boosting model performance across diverse benchmarks.
Personalized Adapter for Large Meteorology Model on Devices: Towards Weather Foundation Models
·7727 words·37 mins
AI Generated Machine Learning Federated Learning 🏢 Hong Kong Polytechnic University
LM-WEATHER uses pre-trained language models to create highly accurate, personalized weather models directly on resource-constrained devices, achieving state-of-the-art results with significantly reduc…
OwMatch: Conditional Self-Labeling with Consistency for Open-world Semi-Supervised Learning
·2493 words·12 mins
Machine Learning Semi-Supervised Learning 🏢 Hong Kong Polytechnic University
OwMatch: a novel framework conquering open-world semi-supervised learning challenges by combining conditional self-labeling and consistency for substantially enhanced accuracy across known and unknown…
One-Step Effective Diffusion Network for Real-World Image Super-Resolution
·2247 words·11 mins
Computer Vision Image Generation 🏢 Hong Kong Polytechnic University
OSEDiff: One-step diffusion network for real-world image super-resolution, achieving comparable or better results than multi-step methods with significantly reduced computational cost and improved ima…
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
·2608 words·13 mins
Large Language Models 🏢 Hong Kong Polytechnic University
MetaLA: Unified optimal linear approximation to softmax attention map, achieving linear complexity and surpassing existing models in various benchmarks.
KnowGPT: Knowledge Graph based Prompting for Large Language Models
·1971 words·10 mins
Natural Language Processing Question Answering 🏢 Hong Kong Polytechnic University
KnowGPT: A novel framework boosts Large Language Model accuracy by intelligently integrating knowledge graphs, significantly reducing factual errors and achieving near-human performance on benchmark d…
Entity Alignment with Noisy Annotations from Large Language Models
·1820 words·9 mins
Natural Language Processing Large Language Models 🏢 Hong Kong Polytechnic University
LLM4EA: A novel framework efficiently merges knowledge graphs using LLMs, overcoming noisy annotations and high costs via active learning and unsupervised label refinement, boosting accuracy and effic…
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
·2723 words·13 mins
Multimodal Learning Vision-Language Models 🏢 Hong Kong Polytechnic University
The DEMO framework enhances text-to-video generation by decomposing text encoding and conditioning into content and motion components, resulting in videos with significantly improved motion dynamics.
Cross-modal Representation Flattening for Multi-modal Domain Generalization
·3259 words·16 mins
AI Generated Multimodal Learning Vision-Language Models 🏢 Hong Kong Polytechnic University
Cross-Modal Representation Flattening (CMRF) improves multi-modal domain generalization by creating consistent flat loss regions and enhancing knowledge transfer between modalities, outperforming exis…
Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions
·2284 words·11 mins
Vision-Language Models 🏢 Hong Kong Polytechnic University
Can AI understand humor? A new benchmark, YESBUT, reveals that even state-of-the-art models struggle with the nuanced humor of juxtaposed comics, highlighting the need for improved AI in understandin…
Cost-efficient Knowledge-based Question Answering with Large Language Models
·1874 words·9 mins
AI Generated Natural Language Processing Question Answering 🏢 Hong Kong Polytechnic University
Coke: A cost-efficient KBQA strategy using LLMs and KGMs, maximizing accuracy while reducing GPT-4 fees by up to 20.89%.
Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation
·3843 words·19 mins
AI Applications Healthcare 🏢 Hong Kong Polytechnic University
DDL-CXR dynamically generates up-to-date chest X-ray image representations using latent diffusion models, effectively addressing the asynchronicity of multimodal clinical data to improve prediction.
AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models
·2295 words·11 mins
Multimodal Learning Vision-Language Models 🏢 Hong Kong Polytechnic University
AdaNeg dynamically generates negative proxies during testing to improve vision-language model OOD detection, significantly outperforming existing methods on ImageNet.