🏢 Hong Kong University of Science and Technology
GaussianAvatar-Editor: Photorealistic Animatable Gaussian Head Avatar Editor
·2208 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Hong Kong University of Science and Technology
GaussianAvatar-Editor enables photorealistic, text-driven editing of animatable 3D head avatars, handling motion occlusions and ensuring temporal consistency.
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
·3018 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
Diffusion as Shader (DaS) achieves versatile video control by using 3D tracking videos as control signals in a unified video diffusion model, enabling precise manipulation across diverse tasks.
TransPixar: Advancing Text-to-Video Generation with Transparency
·2458 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
TransPixar generates high-quality videos with transparency by jointly training RGB and alpha channels, outperforming sequential generation methods.
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
·3152 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
VideoAnydoor inserts objects into videos with high fidelity and precise motion control via an end-to-end framework that couples an ID extractor with a pixel warper for robust detail preservation.
A3: Android Agent Arena for Mobile GUI Agents
·2276 words·11 mins·
AI Generated
🤗 Daily Papers
AI Applications
Human-AI Interaction
🏢 Hong Kong University of Science and Technology
Android Agent Arena (A3) is a novel evaluation platform for mobile GUI agents, offering diverse tasks, a flexible action space, and automated LLM-based evaluation to advance real-world AI agent research.
Edicho: Consistent Image Editing in the Wild
·2565 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
Edicho is a novel training-free method that achieves precise, consistent editing across diverse in-the-wild images by leveraging explicit correspondence.
Diving into Self-Evolving Training for Multimodal Reasoning
·3292 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Hong Kong University of Science and Technology
M-STAR, a novel self-evolving training framework, significantly boosts multimodal reasoning in large models without human annotation, achieving state-of-the-art results.
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
·2172 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
B-STaR dynamically balances exploration and exploitation in self-taught reasoners, achieving superior performance on mathematical, coding, and commonsense reasoning tasks.
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
·2604 words·13 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
MegaPairs synthesizes 26M+ high-quality multimodal retrieval training examples, enabling state-of-the-art zero-shot performance and surpassing existing methods trained on 70x more data.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
·2715 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
LeviTor brings intuitive 3D trajectory control to image-to-video synthesis, generating realistic videos from static images by abstracting object masks into depth-aware control points.
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception
·2901 words·14 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
DCE, a novel caption-enhancement engine, leverages visual specialists to generate comprehensive, detailed image descriptions that surpass LMM-generated and human-annotated captions.
AniDoc: Animation Creation Made Easier
·2223 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
AniDoc automates line-art video colorization for cartoon animation, making animation creation easier.
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
·3380 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Hong Kong University of Science and Technology
GaussianProperty, a training-free method, integrates physical properties into 3D Gaussians using vision-language models.
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
·3111 words·15 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
Lyra is an efficient, speech-centric framework for omni-cognition that achieves state-of-the-art results across multiple modalities.
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation
·2511 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
VideoGen-of-Thought (VGoT) creates high-quality, multi-shot videos by collaboratively generating scripts, keyframes, and video clips, ensuring narrative consistency and visual coherence.
OmniCreator: Self-Supervised Unified Generation with Universal Editing
·5399 words·26 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
OmniCreator unifies self-supervised image and video generation with universal editing.
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
·4302 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
MagicDriveDiT generates high-resolution, long street-view videos with precise control, overcoming the resolution and length limitations of previous methods for autonomous driving.
Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
·3715 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
Golden Touchstone, a new bilingual benchmark, comprehensively evaluates financial LLMs across eight tasks, revealing model strengths and weaknesses and advancing FinLLM research.