Paper Reviews by AI
2025
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers
·2569 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
CustomVideoX: Zero-shot personalized video generation, exceeding existing methods in quality & consistency via 3D reference attention and dynamic adaptation.
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
·3884 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Smaller LLMs can outperform larger ones by strategically increasing computation during inference, defying conventional LLM scaling.
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
·1752 words·9 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tongyi Lab, Alibaba Group
Animate Anyone 2 creates high-fidelity character animations by incorporating environmental context, resulting in seamless character-environment integration and more realistic object interactions.
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
·507 words·3 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Stanford University
Language models learn effective social deduction strategies in a virtual game by using their goal of predicting useful information as a dense reward signal, doubling win rates compared to standard RL.
The Curse of Depth in Large Language Models
·2429 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Medical Artificial Intelligence Laboratory, Westlake University
Deep layers in LLMs underperform due to Pre-Layer Normalization; LayerNorm Scaling resolves this by controlling output variance, significantly improving training efficiency.
LM2: Large Memory Models
·2722 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Convergence Labs Ltd
LM2: Large Memory Models enhance Transformers by adding an auxiliary memory module, significantly improving multi-step reasoning and long-context information synthesis.
Dual Caption Preference Optimization for Diffusion Models
·4961 words·24 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Arizona State University
Dual Caption Preference Optimization (DCPO) significantly boosts diffusion model image quality by using paired captions to resolve data distribution conflicts and irrelevant prompt issues.
3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection
·3328 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 Shanghai University
3CAD: A new large-scale, real-world dataset with diverse 3C product anomalies boosts unsupervised anomaly detection, enabling superior algorithm development via a novel Coarse-to-Fine framework.
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
·3420 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Shanghai Jiao Tong University
Show-o Turbo dramatically speeds up multimodal understanding and generation by leveraging parallel decoding and consistency distillation, achieving significant performance gains with fewer sampling steps.
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
·6090 words·29 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
APE: a novel method significantly speeds up context-augmented generation (CAG). By using adaptive parallel encoding, APE achieves a 4.5x speedup and maintains high accuracy even with 128K-length contexts.
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
·3961 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Fudan University
VideoRoPE enhances video processing in Transformer models by introducing a novel 3D rotary position embedding that preserves spatio-temporal relationships, resulting in superior performance across a variety of video tasks.
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
·5939 words·28 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Maryland
Boost LLM reasoning power at test time by recursively processing latent information, enabling dramatic performance gains with fewer parameters.
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
·3320 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 ISTA
QuEST enables stable, accurate LLM training using only 1-bit weights and activations, achieving Pareto-optimal performance compared to higher-precision models.
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
·5172 words·25 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 NVIDIA Research
QLIP: A new visual tokenizer unifying autoregressive multimodal understanding & generation with state-of-the-art reconstruction and zero-shot performance!
Goku: Flow Based Video Generative Foundation Models
·3430 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Hong Kong
Goku: a novel family of joint image-and-video generation models uses rectified flow Transformers, achieving industry-leading performance with a robust data pipeline and training infrastructure.
Generating Symbolic World Models via Test-time Scaling of Large Language Models
·2722 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
LLMs excel at complex reasoning but struggle with planning; this paper introduces a test-time scaling approach that enhances LLMs’ PDDL reasoning, enabling high-quality PDDL domain generation and outperforming prior approaches.
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
·4450 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
FlashVideo: Generate stunning high-resolution videos efficiently using a two-stage framework prioritizing fidelity and detail, achieving state-of-the-art results.
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
·2622 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of California, Los Angeles
DuoGuard: a novel two-player RL framework generates high-quality synthetic data, improving multilingual LLM safety and outperforming state-of-the-art models with a significantly smaller model size.
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting
·4072 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 National Yang Ming Chiao Tung University
AuraFusion360: High-quality 360° scene inpainting achieved via novel augmented unseen region alignment and a new benchmark dataset.
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
·8117 words·39 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 University of British Columbia
ARR: A novel zero-shot prompting method significantly boosts LLM performance on diverse question-answering tasks by explicitly incorporating question analysis, information retrieval, and step-by-step reasoning.