Paper Reviews by AI
2024
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
·2611 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Nanyang Technological University
Auto-built benchmark with up-to-date knowledge ensures contamination-free LLM evaluation.
AniDoc: Animation Creation Made Easier
·2223 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
AniDoc automates cartoon animation line art video colorization, making animation creation easier!
VidTok: A Versatile and Open-Source Video Tokenizer
·2918 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Microsoft Research
VidTok: an open-source, top-performing video tokenizer.
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
·3082 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
OmniEval: Automatic benchmark for evaluating financial RAG systems.
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models
·5510 words·26 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Beijing University of Posts and Telecommunications
New benchmark reveals how well AI understands and meets real-world human needs.
Move-in-2D: 2D-Conditioned Human Motion Generation
·2569 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Adobe Research
Move-in-2D generates realistic human motion sequences conditioned on a 2D scene image and text prompt, overcoming limitations of existing approaches and improving video synthesis.
MIVE: New Design and Benchmark for Multi-Instance Video Editing
·7714 words·37 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 KAIST
Edit many objects at once in videos! MIVE does it accurately without affecting other areas, a big step for AI video editing.
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
·5162 words·25 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 MIT
MoDE makes AI for robot control faster and more efficient.
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
·1458 words·7 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tongyi Lab
ChatDiT enables zero-shot, multi-turn image generation using pretrained diffusion transformers and a novel multi-agent framework.
Are Your LLMs Capable of Stable Reasoning?
·2140 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shanghai AI Laboratory
G-Pass@k & LiveMathBench: Evaluating the stability of LLMs.
Wonderland: Navigating 3D Scenes from a Single Image
·3153 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Toronto
Generate wide-scope 3D scenes from single images in a snap!
Whisper-GPT: A Hybrid Representation Audio Large Language Model
·1640 words·8 mins·
AI Generated
🤗 Daily Papers
Speech and Audio
Audio Generation
🏢 Stanford University
Whisper-GPT, a hybrid audio LLM, improves music/speech generation by combining audio waveforms and text.
StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors
·2185 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Nanjing University
Create realistic 3D heads with specific hairstyles from text, no 3D hair data needed!
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
·3747 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Self-play method SPaR enhances LLMs' instruction-following abilities, beating GPT-4 on IFEval.
Sequence Matters: Harnessing Video Models in 3D Super-Resolution
·4603 words·22 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Sungkyunkwan University
Leveraging video models, researchers achieve state-of-the-art 3D super-resolution by generating ‘video-like’ sequences from unordered images, eliminating artifacts and computational demands.
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
·3575 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Huawei Noah's Ark Lab
SepLLM shrinks LLMs, speeding them up by over 50% without losing much accuracy.
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
·4628 words·22 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Renmin University of China
RetroLLM unifies retrieval & generation in LLMs, boosting accuracy and cutting costs.
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
·3969 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Peking University
MOVIS enhances 3D scene generation by improving cross-view consistency in multi-object novel view synthesis.
Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning
·2945 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Federated Learning
🏢 MIPT
Simple tweak, big privacy win: MLP-based architectures boost data protection in federated learning.
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
·3912 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Chinese University of Hong Kong
IDArb: A diffusion model for decomposing images into intrinsic components like albedo, normal, and material properties, handling varying views and lighting.