🏢 Hong Kong Polytechnic University

Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection

26 September 2024·2040 words·10 mins

3D Vision 🏢 Hong Kong Polytechnic University

Voxel Mamba: a group-free 3D object detection method using state space models, achieving higher accuracy and efficiency by overcoming limitations of serialization-based Transformers.

Unveiling the Potential of Robustness in Selecting Conditional Average Treatment Effect Estimators

26 September 2024·1533 words·8 mins

AI Generated AI Theory Causality 🏢 Hong Kong Polytechnic University

A new, nuisance-free Distributionally Robust Metric (DRM) is proposed for selecting robust Conditional Average Treatment Effect (CATE) estimators, improving the reliability of personalized decision-ma…

Towards Safe Concept Transfer of Multi-Modal Diffusion via Causal Representation Editing

26 September 2024·3866 words·19 mins

AI Generated Multimodal Learning Vision-Language Models 🏢 Hong Kong Polytechnic University

Causal Representation Editing (CRE) improves safe image generation by precisely removing unsafe concepts from diffusion models, enhancing efficiency and flexibility.

Preventing Model Collapse in Deep Canonical Correlation Analysis by Noise Regularization

26 September 2024·2437 words·12 mins

Multimodal Learning Representation Learning 🏢 Hong Kong Polytechnic University

Noise Regularization rescues Deep Canonical Correlation Analysis from model collapse!

Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization

26 September 2024·2561 words·13 mins

Machine Learning Self-Supervised Learning 🏢 Hong Kong Polytechnic University

Orthogonality regularization prevents dimensional collapse in self-supervised learning, significantly boosting model performance across diverse benchmarks.

Personalized Adapter for Large Meteorology Model on Devices: Towards Weather Foundation Models

26 September 2024·7727 words·37 mins

AI Generated Machine Learning Federated Learning 🏢 Hong Kong Polytechnic University

LM-WEATHER uses pre-trained language models to create highly accurate, personalized weather models directly on resource-constrained devices, achieving state-of-the-art results with significantly reduc…

OwMatch: Conditional Self-Labeling with Consistency for Open-world Semi-Supervised Learning

26 September 2024·2493 words·12 mins

Machine Learning Semi-Supervised Learning 🏢 Hong Kong Polytechnic University

OwMatch: a novel framework conquering open-world semi-supervised learning challenges by combining conditional self-labeling and consistency for substantially enhanced accuracy across known and unknown…

One-Step Effective Diffusion Network for Real-World Image Super-Resolution

26 September 2024·2247 words·11 mins

Computer Vision Image Generation 🏢 Hong Kong Polytechnic University

OSEDiff: One-step diffusion network for real-world image super-resolution, achieving comparable or better results than multi-step methods with significantly reduced computational cost and improved ima…

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

26 September 2024·2608 words·13 mins

Large Language Models 🏢 Hong Kong Polytechnic University

MetaLA: Unified optimal linear approximation to softmax attention map, achieving linear complexity and surpassing existing models in various benchmarks.

KnowGPT: Knowledge Graph based Prompting for Large Language Models

26 September 2024·1971 words·10 mins

Natural Language Processing Question Answering 🏢 Hong Kong Polytechnic University

KnowGPT: A novel framework boosts Large Language Model accuracy by intelligently integrating knowledge graphs, significantly reducing factual errors and achieving near-human performance on benchmark d…

Entity Alignment with Noisy Annotations from Large Language Models

26 September 2024·1820 words·9 mins

Natural Language Processing Large Language Models 🏢 Hong Kong Polytechnic University

LLM4EA: A novel framework efficiently merges knowledge graphs using LLMs, overcoming noisy annotations and high costs via active learning and unsupervised label refinement, boosting accuracy and effic…

Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning

26 September 2024·2723 words·13 mins

Multimodal Learning Vision-Language Models 🏢 Hong Kong Polytechnic University

DEMO framework enhances text-to-video generation by decomposing text encoding and conditioning into content and motion components, resulting in videos with significantly improved motion dynamics.

Cross-modal Representation Flattening for Multi-modal Domain Generalization

26 September 2024·3259 words·16 mins

AI Generated Multimodal Learning Vision-Language Models 🏢 Hong Kong Polytechnic University

Cross-Modal Representation Flattening (CMRF) improves multi-modal domain generalization by creating consistent flat loss regions and enhancing knowledge transfer between modalities, outperforming exis…

Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions

26 September 2024·2284 words·11 mins

Vision-Language Models 🏢 Hong Kong Polytechnic University

Can AI understand humor? A new benchmark, YESBUT, reveals that even state-of-the-art models struggle with the nuanced humor of juxtaposed comics, highlighting the need for improved AI in understandin…

Cost-efficient Knowledge-based Question Answering with Large Language Models

26 September 2024·1874 words·9 mins

AI Generated Natural Language Processing Question Answering 🏢 Hong Kong Polytechnic University

Coke: A cost-efficient KBQA strategy using LLMs and KGMs, maximizing accuracy while reducing GPT-4 fees by up to 20.89%.

Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation

26 September 2024·3843 words·19 mins

AI Applications Healthcare 🏢 Hong Kong Polytechnic University

DDL-CXR dynamically generates up-to-date chest X-ray image representations using latent diffusion models, effectively addressing asynchronous multimodal clinical data for improved prediction.

AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models

26 September 2024·2295 words·11 mins

Multimodal Learning Vision-Language Models 🏢 Hong Kong Polytechnic University

AdaNeg dynamically generates negative proxies during testing to improve vision-language model OOD detection, significantly outperforming existing methods on ImageNet.