
Paper Reviews by AI

2025

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
·2552 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
Critique Fine-Tuning (CFT) outperforms traditional supervised fine-tuning (SFT) in training language models, achieving comparable results with significantly less data and opening new avenues in AI.
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
·3663 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 UC Berkeley
Reinforcement learning (RL) surpasses supervised fine-tuning (SFT) in fostering generalization in foundation models, while SFT aids RL’s stability; a comparative study across text and visual domains r…
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
·4043 words·19 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing
SafeRAG: A new benchmark exposes critical security vulnerabilities in Retrieval-Augmented Generation (RAG) systems by introducing four novel attack types and a comprehensive dataset for evaluation, re…
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
·3794 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Seed-Foundation-Model Team, Bytedance
Boosting Large Language Model (LLM) performance, researchers introduce Over-Tokenized Transformers, decoupling input/output vocabularies to improve language modeling. Scaling input vocabularies improv…
Optimizing Large Language Model Training Using FP4 Quantization
·1562 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Research
First-ever FP4 training framework for LLMs achieves accuracy comparable to BF16 and FP8, enabling efficient ultra-low precision training.
Histoires Morales: A French Dataset for Assessing Moral Alignment
·8270 words·39 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Laboratoire Hubert Curien
HISTOIRESMORALES: a new French dataset tackles the crucial issue of aligning language models with human moral values, providing valuable resources for ethical AI research in a previously underserved l…
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
·3227 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
DIFFSPLAT repurposes 2D image diffusion models to natively generate high-quality 3D Gaussian splats, overcoming limitations in existing 3D generation methods.
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding
·2564 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Artificial Intelligence Institute, University of South Carolina
IndicMMLU-Pro: a new benchmark rigorously evaluates large language models’ multi-task language understanding capabilities across nine major Indian languages, pushing Indic language AI research forward…
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
·2407 words·12 mins
AI Generated 🤗 Daily Papers Speech and Audio Text-to-Speech 🏢 Chinese University of Hong Kong, Shenzhen
Emilia-Pipe and its resulting datasets, Emilia and Emilia-Large, offer the largest open-source, multilingual speech corpus, enabling more natural and spontaneous AI speech generation.
Atla Selene Mini: A General Purpose Evaluation Model
·1893 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Atla
Atla Selene Mini: A state-of-the-art small LLM judge surpassing larger models in benchmark performance!
iFormer: Integrating ConvNet and Transformer for Mobile Application
·7046 words·34 mins
AI Generated 🤗 Daily Papers Computer Vision Image Classification 🏢 Shanghai Jiao Tong University
iFormer: A new family of mobile hybrid vision networks that expertly blends ConvNeXt’s fast local feature extraction with the efficient global modeling of self-attention, achieving top-tier accuracy a…
Baichuan-Omni-1.5 Technical Report
·3756 words·18 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Baichuan Inc.
Baichuan-Omni-1.5: An open-source omni-modal LLM achieving SOTA performance across multiple modalities.
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer
·1758 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Peking University
ARWKV: A novel RNN-attention-based language model, distilled from a larger model, achieves strong performance using significantly fewer resources, opening a new path in efficient language model develo…
Relightable Full-Body Gaussian Codec Avatars
·3832 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 ETH Zurich
Relightable Full-Body Gaussian Codec Avatars: Realistic, animatable full-body avatars are now possible using learned radiance transfer and efficient 3D Gaussian splatting.
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
·2423 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 The Chinese University of Hong Kong, Shenzhen
RealCritic: A new benchmark effectively evaluates language models’ critique abilities using a closed-loop methodology, showcasing advanced reasoning models’ superiority in self and iterative critique.
Humanity's Last Exam
·2314 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Center for AI Safety
Humanity’s Last Exam (HLE): a groundbreaking multi-modal benchmark pushing the boundaries of large language model (LLM) capabilities, revealing a significant gap between current LLMs and human experts…
Chain-of-Retrieval Augmented Generation
·4155 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Microsoft Research
CoRAG, a novel Chain-of-Retrieval Augmented Generation model, dynamically refines queries for improved accuracy in multi-hop question answering, achieving state-of-the-art performance.
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
·4575 words·22 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University
Video-MMMU benchmark systematically evaluates Large Multimodal Models’ knowledge acquisition from videos across multiple disciplines and cognitive stages, revealing significant gaps between human and …
Temporal Preference Optimization for Long-Form Video Understanding
·2626 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Stanford University
Boosting long-form video understanding, Temporal Preference Optimization (TPO) enhances video-LLMs through preference learning, using a self-training method built on preference …
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models
·8384 words·40 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Research
SIGMA, a novel large language model, achieves up to 33.36% faster inference speeds by using DiffQKV attention, which differentially optimizes query, key, and value components in the attention mech…