Paper Reviews by AI
2025
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
·3884 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Smaller LLMs can outperform larger ones by strategically increasing computation during inference, defying conventional LLM scaling.
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
·1752 words·9 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tongyi Lab, Alibaba Group
Animate Anyone 2 creates high-fidelity character animations by incorporating environmental context, resulting in seamless character-environment integration and more realistic object interactions.
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
·507 words·3 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Stanford University
Language models learn effective social deduction strategies in a virtual game by turning their goal of predicting useful information into a dense reward signal, doubling win rates compared to standard RL.
The Curse of Depth in Large Language Models
·2429 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Medical Artificial Intelligence Laboratory, Westlake University
Deep layers in LLMs underperform due to Pre-Layer Normalization; LayerNorm Scaling resolves this by controlling output variance, significantly improving training efficiency.
LM2: Large Memory Models
·2722 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Convergence Labs Ltd
LM2: Large Memory Models enhance Transformers by adding an auxiliary memory module, significantly improving multi-step reasoning and long-context information synthesis.
Dual Caption Preference Optimization for Diffusion Models
·4961 words·24 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Arizona State University
Dual Caption Preference Optimization (DCPO) significantly boosts diffusion model image quality by using paired captions to resolve data distribution conflicts and irrelevant prompt issues.
3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection
·3328 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 Shanghai University
3CAD: A new large-scale, real-world dataset with diverse 3C product anomalies boosts unsupervised anomaly detection, enabling superior algorithm development via a novel Coarse-to-Fine framework.
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
·3420 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Shanghai Jiao Tong University
Show-o Turbo dramatically speeds up multimodal understanding and generation by leveraging parallel decoding and consistency distillation, achieving significant performance gains with fewer sampling steps.
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
·6090 words·29 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
APE: a novel method that significantly speeds up context-augmented generation (CAG). Using adaptive parallel encoding, APE achieves a 4.5x speedup and maintains high accuracy even with 128K-length contexts.
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
·3961 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Fudan University
VideoRoPE enhances video processing in Transformer models by introducing a novel 3D rotary position embedding that preserves spatio-temporal relationships, resulting in superior performance across various benchmarks.
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
·5939 words·28 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Maryland
Boost LLM reasoning power at test time by recursively processing latent information, enabling dramatic performance gains with fewer parameters.
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
·3320 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 ISTA
QuEST enables stable, accurate LLM training using only 1-bit weights and activations, achieving Pareto-optimal performance compared to higher-precision models.
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
·5172 words·25 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 NVIDIA Research
QLIP: A new visual tokenizer unifying autoregressive multimodal understanding & generation with state-of-the-art reconstruction and zero-shot performance!
Goku: Flow Based Video Generative Foundation Models
·3430 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Hong Kong
Goku: a novel family of joint image-and-video generation models uses rectified flow Transformers, achieving industry-leading performance with a robust data pipeline and training infrastructure.
Generating Symbolic World Models via Test-time Scaling of Large Language Models
·2722 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
LLMs excel at complex reasoning but struggle with planning; this paper introduces a test-time scaling approach that enhances LLMs' PDDL reasoning, enabling high-quality PDDL domain generation that outperforms existing baselines.
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
·4450 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
FlashVideo: Generate stunning high-resolution videos efficiently using a two-stage framework prioritizing fidelity and detail, achieving state-of-the-art results.
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
·2622 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of California, Los Angeles
DuoGuard: a novel two-player RL framework that generates high-quality synthetic data, improving multilingual LLM safety and outperforming state-of-the-art models at a significantly smaller model size.
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting
·4072 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 National Yang Ming Chiao Tung University
AuraFusion360: High-quality 360° scene inpainting achieved via novel augmented unseen region alignment and a new benchmark dataset.
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
·8117 words·39 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 University of British Columbia
ARR: A novel zero-shot prompting method significantly boosts LLM performance on diverse question-answering tasks by explicitly incorporating question analysis, information retrieval, and step-by-step reasoning.
Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
·6016 words·29 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Brown University
Simple interactions can easily elicit harmful outputs from LLMs, a vulnerability that is often overlooked. The SPEAK EASY framework and HARMSCORE metric expose this vulnerability and provide tools for better safety evaluation.