🏢 Hong Kong University of Science and Technology

Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

26 September 2024·1463 words·7 mins

Reverse Transition Kernel (RTK) framework accelerates diffusion inference by enabling balanced subproblem decomposition, achieving superior convergence rates with RTK-MALA and RTK-ULD algorithms.

RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models

26 September 2024·2570 words·13 mins

Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology

RestoreAgent, an AI-powered image restoration agent, autonomously identifies and corrects multiple image degradations, exceeding human expert performance.

QGFN: Controllable Greediness with Action Values

26 September 2024·3928 words·19 mins

Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology

QGFN boosts Generative Flow Networks (GFNs) by cleverly combining their sampling policy with an action-value estimate, creating controllable and efficient generation of high-reward samples.

Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model

26 September 2024·4297 words·21 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology

Offline RL struggles with OOD action overestimation. QDQ tackles this by penalizing uncertain Q-values using a consistency model, enhancing offline RL performance.

Phased Consistency Models

26 September 2024·5013 words·24 mins

Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology

Phased Consistency Models (PCMs) revolutionize diffusion model generation by overcoming LCM limitations, achieving superior speed and quality in image and video generation.

Performative Control for Linear Dynamical Systems

26 September 2024·426 words·2 mins

AI Generated AI Applications Finance 🏢 Hong Kong University of Science and Technology

Performative control, where control policies change system dynamics, is analyzed, with sufficient conditions for unique solutions and a convergent algorithm for achieving them.

Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting

26 September 2024·1663 words·8 mins

🏢 Hong Kong University of Science and Technology

SSCNN, a novel decomposition-based model, achieves superior long-term time series forecasting accuracy using 99% fewer parameters than existing methods, proving that bigger isn’t always better.

LiT: Unifying LiDAR 'Languages' with LiDAR Translator

26 September 2024·2585 words·13 mins

AI Applications Autonomous Vehicles 🏢 Hong Kong University of Science and Technology

LiDAR Translator (LiT) unifies diverse LiDAR data through a novel data-driven translation framework, enabling zero-shot and multi-domain joint learning, thus improving autonomous driving systems.

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

26 September 2024·3222 words·16 mins

Natural Language Processing Large Language Models 🏢 Hong Kong University of Science and Technology

LISA, a layerwise importance sampling method, dramatically improves memory-efficient large language model fine-tuning, outperforming existing methods while using less GPU memory.

Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training

26 September 2024·2188 words·11 mins

AI Applications Robotics 🏢 Hong Kong University of Science and Technology

Actionable AI agents are trained efficiently via a novel framework, VPDD, which uses discrete diffusion to pre-train on massive human videos, and fine-tunes on limited robot data for superior multi-ta…

LaSe-E2V: Towards Language-guided Semantic-aware Event-to-Video Reconstruction

26 September 2024·2343 words·11 mins

Multimodal Learning Vision-Language Models 🏢 Hong Kong University of Science and Technology

LaSe-E2V, a language-guided, semantic-aware event-to-video reconstruction method, uses text descriptions to improve video quality and consistency.

Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning

26 September 2024·2281 words·11 mins

Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology

Kaleidoscope: Learnable Masks for Heterogeneous MARL achieves high sample efficiency and policy diversity by using learnable masks for adaptive partial parameter sharing.

Improving Neural ODE Training with Temporal Adaptive Batch Normalization

26 September 2024·3052 words·15 mins

AI Generated Machine Learning Deep Learning 🏢 Hong Kong University of Science and Technology

Boosting Neural ODE training, Temporal Adaptive Batch Normalization (TA-BN) resolves traditional Batch Normalization’s limitations by providing a continuous-time counterpart, enabling deeper networks …

Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms

26 September 2024·1596 words·8 mins

Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology

This paper significantly improves Bayes regret bounds for hierarchical Bayesian bandit algorithms, achieving logarithmic regret in finite action settings and enhanced bounds in multi-task linear and c…

HOPE: Shape Matching Via Aligning Different K-hop Neighbourhoods

26 September 2024·1940 words·10 mins

Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology

HOPE: a novel shape matching method achieving both accuracy and smoothness by aligning different k-hop neighborhoods and refining maps via local map distortion.

HAWK: Learning to Understand Open-World Video Anomalies

26 September 2024·3198 words·16 mins

Natural Language Processing Vision-Language Models 🏢 Hong Kong University of Science and Technology

HAWK: a novel framework leveraging interactive VLMs and motion modality achieves state-of-the-art performance in open-world video anomaly understanding, generating descriptions and answering questions…

GVKF: Gaussian Voxel Kernel Functions for Highly Efficient Surface Reconstruction in Open Scenes

26 September 2024·2497 words·12 mins

Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology

GVKF: A novel method achieves highly efficient and accurate 3D surface reconstruction in open scenes by integrating fast 3D Gaussian splatting with continuous scene representation using kernel regres…

GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning

26 September 2024·2396 words·12 mins

Multimodal Learning Vision-Language Models 🏢 Hong Kong University of Science and Technology

GITA, a novel framework, integrates visual graphs into language models for superior vision-language graph reasoning, outperforming existing LLMs and introducing the first vision-language dataset, GVLQ…

GIC: Gaussian-Informed Continuum for Physical Property Identification and Simulation

26 September 2024·2226 words·11 mins

3D Vision 🏢 Hong Kong University of Science and Technology

GIC: Novel hybrid framework leverages 3D Gaussian representation for accurate physical property estimation from visual observations, achieving state-of-the-art performance.

Free Lunch in Pathology Foundation Model: Task-specific Model Adaptation with Concept-Guided Feature Enhancement

26 September 2024·2805 words·14 mins

AI Applications Healthcare 🏢 Hong Kong University of Science and Technology

Boost pathology model accuracy with Concept Anchor-guided Task-specific Feature Enhancement (CATE)! This adaptable paradigm enhances feature extraction for specific tasks using task-relevant concepts,…