🏢 Hong Kong University of Science and Technology
Position: Interactive Generative Video as Next-Generation Game Engine
·1964 words·10 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
AI Applications
Gaming
🏢 Hong Kong University of Science and Technology
Interactive Generative Video (IGV) can revolutionize game creation by using AI to generate endless, novel content for next-gen game engines.
Temporal Regularization Makes Your Video Generator Stronger
·3350 words·16 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
FluxFlow: Make your video generator stronger via temporal regularization!
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
·2806 words·14 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
Rewards Are Enough!
Long-Video Audio Synthesis with Multi-Agent Collaboration
·2152 words·11 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Audio-Visual Learning
🏢 Hong Kong University of Science and Technology
LVAS-Agent: Multi-agent system conquers long-video audio synthesis with collaborative dubbing, script, design, & more!
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization
·2300 words·11 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
LightGen: Efficient image generation via knowledge distillation and direct preference optimization.
Learning Few-Step Diffusion Models by Trajectory Distribution Matching
·4283 words·21 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
TDM: a new diffusion distillation paradigm unifying trajectory distillation and distribution matching, surpassing teachers in a data-free manner with state-of-the-art performance and low training cost…
RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification
·2593 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
RectifiedHR: Enables training-free high-resolution image generation via energy rectification, boosting both efficiency and effectiveness.
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
·3779 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
GOAT: Adaptively boosts LoRA with SVD & MoE alignment, closing the gap with Full FT.
Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research
·3084 words·15 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
Perovskite-LLM: a new knowledge-enhanced system boosts perovskite solar cell research by integrating a domain-specific knowledge graph, high-quality datasets, and specialized LLMs for superior knowled…
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation
·2594 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
mmMamba: a novel framework creates linear-complexity multimodal models via distillation, drastically improving efficiency without sacrificing performance.
Atom of Thoughts for Markov LLM Test-Time Scaling
·2660 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
Atom of Thoughts (AOT) revolutionizes LLM test-time scaling by decomposing complex reasoning into independent sub-questions, drastically reducing computation while maintaining high accuracy.
FinMTEB: Finance Massive Text Embedding Benchmark
·3630 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
FinMTEB: A new benchmark reveals that general-purpose embedding models struggle in the finance domain; domain-specific models excel, and surprisingly, simple BoW outperforms sophisticated models on ce…
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
·3464 words·17 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Hong Kong University of Science and Technology
ThinkDiff empowers text-to-image diffusion models with multimodal reasoning by aligning vision-language models to an LLM decoder, achieving state-of-the-art results on in-context reasoning benchmarks.
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
·5174 words·25 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
CODEI/O: Condensing reasoning patterns from code into LLM training data for enhanced reasoning.
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers
·2569 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
CustomVideoX: Zero-shot personalized video generation, exceeding existing methods in quality & consistency via 3D reference attention and dynamic adaptation.
Generating Symbolic World Models via Test-time Scaling of Large Language Models
·2722 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
LLMs excel at complex reasoning but struggle with planning; this paper introduces a test-time scaling approach that enhances LLMs’ PDDL reasoning, enabling high-quality PDDL domain generation, outperf…
FlashVideo:Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
·4450 words·21 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
FlashVideo: Generate stunning high-resolution videos efficiently using a two-stage framework prioritizing fidelity and detail, achieving state-of-the-art results.
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
·3315 words·16 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Generation
🏢 Hong Kong University of Science and Technology
Llasa, a novel single-Transformer TTS model, achieves state-of-the-art performance by scaling both training and inference compute, improving naturalness, prosody and emotional expressiveness.
Weak-to-Strong Diffusion with Reflection
·4655 words·22 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Hong Kong University of Science and Technology
W2SD: A novel framework boosts diffusion model quality by using the difference between weak and strong models to refine sampling trajectories, achieving state-of-the-art performance.
GaussianAvatar-Editor: Photorealistic Animatable Gaussian Head Avatar Editor
·2208 words·11 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Hong Kong University of Science and Technology
GaussianAvatar-Editor enables photorealistic, text-driven editing of animatable 3D heads, solving motion occlusion and ensuring temporal consistency.