Paper Reviews by AI
2025
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
·2552 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Critique Fine-Tuning (CFT) outperforms traditional supervised fine-tuning (SFT) in training language models, achieving comparable results with significantly less data and opening new avenues in AI.
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
·3663 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
Reinforcement learning (RL) surpasses supervised fine-tuning (SFT) in fostering generalization in foundation models, while SFT aids RL’s stability; a comparative study across text and visual domains r…
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
·4043 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing
SafeRAG: A new benchmark exposes critical security vulnerabilities in Retrieval-Augmented Generation (RAG) systems by introducing four novel attack types and a comprehensive dataset for evaluation, re…
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
·3794 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Seed-Foundation-Model Team, Bytedance
Boosting Large Language Model (LLM) performance, researchers introduce Over-Tokenized Transformers, decoupling input/output vocabularies to improve language modeling. Scaling input vocabularies improv…
Optimizing Large Language Model Training Using FP4 Quantization
·1562 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
First-ever FP4 training framework for LLMs achieves accuracy comparable to BF16 and FP8, enabling efficient ultra-low precision training.
Histoires Morales: A French Dataset for Assessing Moral Alignment
·8270 words·39 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Laboratoire Hubert Curien
HISTOIRESMORALES: a new French dataset tackles the crucial issue of aligning language models with human moral values, providing valuable resources for ethical AI research in a previously underserved language.
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
·3227 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
DIFFSPLAT repurposes 2D image diffusion models to natively generate high-quality 3D Gaussian splats, overcoming limitations in existing 3D generation methods.
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding
·2564 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Artificial Intelligence Institute, University of South Carolina
IndicMMLU-Pro: a new benchmark rigorously evaluates large language models’ multi-task language understanding capabilities across nine major Indian languages, pushing Indic language AI research forward…
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
·2407 words·12 mins·
AI Generated
🤗 Daily Papers
Speech and Audio
Text-to-Speech
🏢 Chinese University of Hong Kong, Shenzhen
Emilia-Pipe and its resulting datasets, Emilia and Emilia-Large, offer the largest open-source, multilingual speech corpus, enabling more natural and spontaneous AI speech generation.
Atla Selene Mini: A General Purpose Evaluation Model
·1893 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Atla
Atla Selene Mini: A state-of-the-art small LLM judge surpassing larger models in benchmark performance!
iFormer: Integrating ConvNet and Transformer for Mobile Application
·7046 words·34 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Classification
🏢 Shanghai Jiao Tong University
iFormer: A new family of mobile hybrid vision networks that expertly blends ConvNeXt’s fast local feature extraction with the efficient global modeling of self-attention, achieving top-tier accuracy a…
Baichuan-Omni-1.5 Technical Report
·3756 words·18 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Baichuan Inc.
Baichuan-Omni-1.5: An open-source omni-modal LLM achieving SOTA performance across multiple modalities.
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer
·1758 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Peking University
ARWKV: A novel RNN-attention-based language model, distilled from a larger model, achieves strong performance using significantly fewer resources, opening a new path in efficient language model development.
Relightable Full-Body Gaussian Codec Avatars
·3832 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 ETH Zurich
Relightable Full-Body Gaussian Codec Avatars: Realistic, animatable full-body avatars are now possible using learned radiance transfer and efficient 3D Gaussian splatting.
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
·2423 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 The Chinese University of Hong Kong, Shenzhen
RealCritic: A new benchmark effectively evaluates language models’ critique abilities using a closed-loop methodology, showcasing advanced reasoning models’ superiority in self and iterative critique.
Humanity's Last Exam
·2314 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Center for AI Safety
Humanity’s Last Exam (HLE): a groundbreaking multi-modal benchmark pushing the boundaries of large language model (LLM) capabilities, revealing a significant gap between current LLMs and human experts…
Chain-of-Retrieval Augmented Generation
·4155 words·20 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Microsoft Research
CoRAG, a novel Chain-of-Retrieval Augmented Generation model, dynamically refines queries for improved accuracy in multi-hop question answering, achieving state-of-the-art performance.
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
·4575 words·22 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Nanyang Technological University
Video-MMMU benchmark systematically evaluates Large Multimodal Models' knowledge acquisition from videos across multiple disciplines and cognitive stages, revealing significant gaps between human and …
Temporal Preference Optimization for Long-Form Video Understanding
·2626 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Stanford University
Boosting long-form video understanding, Temporal Preference Optimization (TPO) enhances video-LLMs by leveraging preference learning. It achieves this through a self-training method using preference …
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models
·8384 words·40 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
SIGMA, a novel large language model, achieves up to 33.36% faster inference speeds by using DiffQKV attention, which differentially optimizes query, key, and value components in the attention mechanism.