
🏢 Hong Kong University of Science and Technology

Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference
·1463 words·7 mins
🏢 Hong Kong University of Science and Technology
The Reverse Transition Kernel (RTK) framework accelerates diffusion inference by enabling balanced subproblem decomposition, achieving superior convergence rates with the RTK-MALA and RTK-ULD algorithms.
RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models
·2570 words·13 mins
Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology
RestoreAgent, an AI-powered image restoration agent, autonomously identifies and corrects multiple image degradations, exceeding human expert performance.
QGFN: Controllable Greediness with Action Values
·3928 words·19 mins
Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology
QGFN boosts Generative Flow Networks (GFNs) by combining their sampling policy with an action-value estimate, enabling controllable and efficient generation of high-reward samples.
Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model
·4297 words·21 mins
AI Generated Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology
Offline RL tends to overestimate the value of out-of-distribution (OOD) actions. QDQ tackles this by penalizing uncertain Q-values using a consistency model, enhancing offline RL performance.
Phased Consistency Models
·5013 words·24 mins
Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology
Phased Consistency Models (PCMs) overcome the limitations of LCMs in diffusion model generation, achieving superior speed and quality in image and video generation.
Performative Control for Linear Dynamical Systems
·426 words·2 mins
AI Generated AI Applications Finance 🏢 Hong Kong University of Science and Technology
This paper analyzes performative control, where control policies change the system dynamics. It offers sufficient conditions for unique solutions and proposes a convergent algorithm for achieving them.
Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting
·1663 words·8 mins
🏢 Hong Kong University of Science and Technology
SSCNN, a novel decomposition-based model, achieves superior long-term time series forecasting accuracy using 99% fewer parameters than existing methods, proving that bigger isn’t always better.
LiT: Unifying LiDAR 'Languages' with LiDAR Translator
·2585 words·13 mins
AI Applications Autonomous Vehicles 🏢 Hong Kong University of Science and Technology
LiDAR Translator (LiT) unifies diverse LiDAR data through a novel data-driven translation framework, enabling zero-shot and multi-domain joint learning, thus improving autonomous driving systems.
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
·3222 words·16 mins
Natural Language Processing Large Language Models 🏢 Hong Kong University of Science and Technology
LISA, a layerwise importance sampling method, dramatically improves memory-efficient large language model fine-tuning, outperforming existing methods while using less GPU memory.
Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training
·2188 words·11 mins
AI Applications Robotics 🏢 Hong Kong University of Science and Technology
Actionable AI agents are trained efficiently via a novel framework, VPDD, which uses discrete diffusion to pre-train on massive human videos, and fine-tunes on limited robot data for superior multi-ta…
LaSe-E2V: Towards Language-guided Semantic-aware Event-to-Video Reconstruction
·2343 words·11 mins
Multimodal Learning Vision-Language Models 🏢 Hong Kong University of Science and Technology
LaSe-E2V, a language-guided, semantic-aware approach to event-to-video reconstruction, uses text descriptions to improve video quality and consistency.
Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning
·2281 words·11 mins
Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology
Kaleidoscope achieves high sample efficiency and policy diversity in heterogeneous multi-agent reinforcement learning (MARL) by using learnable masks for adaptive partial parameter sharing.
Improving Neural ODE Training with Temporal Adaptive Batch Normalization
·3052 words·15 mins
AI Generated Machine Learning Deep Learning 🏢 Hong Kong University of Science and Technology
Boosting Neural ODE training, Temporal Adaptive Batch Normalization (TA-BN) resolves traditional Batch Normalization’s limitations by providing a continuous-time counterpart, enabling deeper networks …
Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms
·1596 words·8 mins
Machine Learning Reinforcement Learning 🏢 Hong Kong University of Science and Technology
This paper significantly improves Bayes regret bounds for hierarchical Bayesian bandit algorithms, achieving logarithmic regret in finite action settings and enhanced bounds in multi-task linear and c…
HOPE: Shape Matching Via Aligning Different K-hop Neighbourhoods
·1940 words·10 mins
Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology
HOPE: a novel shape matching method achieving both accuracy and smoothness by aligning different k-hop neighborhoods and refining maps via local map distortion.
HAWK: Learning to Understand Open-World Video Anomalies
·3198 words·16 mins
Natural Language Processing Vision-Language Models 🏢 Hong Kong University of Science and Technology
HAWK: a novel framework leveraging interactive VLMs and motion modality achieves state-of-the-art performance in open-world video anomaly understanding, generating descriptions and answering questions…
GVKF: Gaussian Voxel Kernel Functions for Highly Efficient Surface Reconstruction in Open Scenes
·2497 words·12 mins
Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology
GVKF: A novel method achieves highly efficient and accurate 3D surface reconstruction in open scenes by integrating fast 3D Gaussian splatting with continuous scene representation using kernel regres…
GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning
·2396 words·12 mins
Multimodal Learning Vision-Language Models 🏢 Hong Kong University of Science and Technology
GITA, a novel framework, integrates visual graphs into language models for superior vision-language graph reasoning, outperforming existing LLMs and introducing the first vision-language dataset, GVLQ…
GIC: Gaussian-Informed Continuum for Physical Property Identification and Simulation
·2226 words·11 mins
3D Vision 🏢 Hong Kong University of Science and Technology
GIC: Novel hybrid framework leverages 3D Gaussian representation for accurate physical property estimation from visual observations, achieving state-of-the-art performance.
Free Lunch in Pathology Foundation Model: Task-specific Model Adaptation with Concept-Guided Feature Enhancement
·2805 words·14 mins
AI Applications Healthcare 🏢 Hong Kong University of Science and Technology
Boost pathology model accuracy with Concept Anchor-guided Task-specific Feature Enhancement (CATE)! This adaptable paradigm enhances feature extraction for specific tasks using task-relevant concepts,…