Paper Reviews by AI
2025
Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
·6016 words·29 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Brown University
Simple interactions can easily elicit harmful outputs from LLMs, a vulnerability that is often overlooked. The SPEAK EASY framework and HARMSCORE metric expose this vulnerability and provide tools for better safety evaluation.
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
·4088 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Classification
🏢 Johns Hopkins University
Smaller image patches improve vision transformer performance, defying conventional wisdom and revealing a new scaling law for enhanced visual understanding.
PILAF: Optimal Human Preference Sampling for Reward Modeling
·2374 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NYU
PILAF optimizes human feedback in reward modeling for better LLM alignment by using a novel response sampling strategy that aligns reward modeling with value optimization.
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
·2102 words·10 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
Ola: a novel 7B-parameter omni-modal language model achieves state-of-the-art performance across image, video, and audio tasks using a progressive modality alignment training strategy.
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
·2615 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Chinese University of Hong Kong
MotionCanvas lets users design cinematic video shots with intuitive controls for camera and object movements, translating scene-space intentions into video animations.
MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion
·2840 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 ByteDance
MAGA reformulates existing corpora to massively expand LLM pretraining data, boosting performance across various model sizes while maintaining quality.
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
·3315 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Generation
🏢 Hong Kong University of Science and Technology
Llasa, a novel single-Transformer TTS model, achieves state-of-the-art performance by scaling both training and inference compute, improving naturalness, prosody, and emotional expressiveness.
Linear Correlation in LM's Compositional Generalization and Hallucination
·8299 words·39 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC San Diego
Language models surprisingly exhibit linear relationships when composing knowledge; this linearity, resilient to fine-tuning, predicts compositional generalization and hallucination.
Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression
·3466 words·17 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 MIT
HMA: a novel approach for generating high-quality robotic videos 15x faster, enabling real-time policy evaluation and data augmentation for scaling robot learning.
FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
·3169 words·15 mins·
AI Generated
🤗 Daily Papers
Speech and Audio
Speech Coding
🏢 Concordia University
FocalCodec: a single codebook, low-bitrate speech codec using focal modulation, achieves competitive performance in speech resynthesis and voice conversion.
Fast Video Generation with Sliding Tile Attention
·4012 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 University of California, San Diego
Sliding Tile Attention (STA) boosts video generation speed by 2.43-3.53x without losing quality, by exploiting inherent data redundancy in video diffusion models.
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
·1697 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
CMoE efficiently transforms dense LLMs into sparse MoE architectures via expert carving, enabling fast inference without extensive retraining.
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation
·2217 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Salesforce AI Research
BOLT bootstraps Long Chain-of-Thought reasoning in LLMs without distillation, achieving impressive results across various benchmarks.
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
·2451 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
Researchers jointly optimize prompt content and format to significantly boost Large Language Model (LLM) performance.
Agency Is Frame-Dependent
·400 words·2 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Google DeepMind
Agency, a key concept in AI, is shown to be relative to the observer's perspective (frame-dependent), challenging traditional binary definitions and necessitating a more nuanced approach for AI systems.
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
·2337 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 UC Los Angeles
This paper introduces PointVid, a 3D-aware video generation framework using 3D point regularization to enhance video realism and address common issues like object morphing.
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
·3144 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Meta AI
Boosting language model reasoning: A novel hybrid approach using latent tokens drastically shortens reasoning traces, improving model performance and efficiency.
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
·4880 words·23 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Rutgers University
VISTA steers LVLMs away from hallucinations by cleverly adjusting token rankings during inference, improving visual grounding and semantic coherence.
Teaching Language Models to Critique via Reinforcement Learning
·4328 words·21 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Hong Kong
LLMs learn to critique and refine their output via reinforcement learning, significantly improving code generation.
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices
·3325 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Ulsan National Institute of Science and Technology
On-device Sora makes high-quality, diffusion-based text-to-video generation possible on smartphones, overcoming computational and memory limitations through novel techniques.