Paper Reviews by AI
2024
Learned Compression for Compressed Learning
·2966 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Classification
🏢 University of Texas at Austin
WaLLoC: a novel neural codec boosts compressed-domain learning by combining wavelet transforms with asymmetric autoencoders, achieving high compression ratios with minimal computation and uniform dime…
JuStRank: Benchmarking LLM Judges for System Ranking
·13985 words·66 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 IBM Research
JuStRank: a benchmark of LLM judges as system rankers reveals critical judge qualities (decisiveness, bias) that affect ranking accuracy, highlighting that instance-level performance doesn’t guarantee accurate system-level…
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
·4018 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Nanjing University
InstanceCap improves text-to-video generation through detailed, instance-aware captions.
GenEx: Generating an Explorable World
·2719 words·13 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Embodied AI
🏢 Johns Hopkins University
GenEx generates explorable 3D worlds from a single image, enabling embodied AI agents to explore and learn.
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
·6779 words·32 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Georgia Institute of Technology
Gaze-LLE achieves state-of-the-art gaze target estimation by using a frozen DINOv2 encoder and a lightweight decoder, simplifying architecture and improving efficiency.
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
·4390 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tencent AI Lab
FreeSplatter: a novel feed-forward framework reconstructs high-quality 3D scenes from uncalibrated sparse-view images, estimating camera poses in seconds.
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
·2401 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Nanyang Technological University
FreeScale generates stunning 8K images and high-fidelity videos without retraining.
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
·2812 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Virginia Tech
Edit images precisely with AI, no masks needed!
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
·3185 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 CUHK MMLab
EasyRef uses multimodal LLMs to generate images from multiple references, overcoming limitations of prior methods by capturing consistent visual elements and offering improved zero-shot generalization…
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
·3252 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
DisPose disentangles pose guidance for controllable human image animation, generating diverse animations while preserving appearance consistency using only sparse skeleton pose input, eliminating the …
Arbitrary-steps Image Super-resolution via Diffusion Inversion
·3889 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Nanyang Technological University
InvSR: a novel image super-resolution technique using diffusion inversion, enabling flexible sampling steps for efficient and high-fidelity results.
TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning
·1675 words·8 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 Princeton University
TidyBot++: Low-cost, open-source holonomic mobile base makes robot learning easier.
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs
·2774 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Saudi Data & Artificial Intelligence Authority
Fine-tuning small language models? Tweak the learning rate and batch size for a reasoning boost!
Multimodal Latent Language Modeling with Next-Token Diffusion
·4442 words·21 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Generation
🏢 Microsoft Research
LatentLM: a novel multimodal model unifying discrete & continuous data via next-token diffusion, surpassing existing methods in performance & scalability across various tasks.
Video Motion Transfer with Diffusion Transformers
·3141 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 University of Oxford
DiTFlow: training-free video motion transfer using Diffusion Transformers, enabling realistic motion control in synthesized videos via Attention Motion Flow.
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
·3117 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Hong Kong
UniReal: a universal framework for image generation and editing, unifying diverse tasks via learning real-world dynamics from video data, achieving highly realistic and versatile results.
STIV: Scalable Text and Image Conditioned Video Generation
·5285 words·25 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Apple
STIV: A novel, scalable method for text and image-conditioned video generation, systematically improving model architectures, training, and data curation for superior performance.
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
·6546 words·31 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Document Parsing
🏢 Shanghai AI Laboratory
OmniDocBench, a novel benchmark, tackles limitations in current document parsing by introducing a diverse, high-quality dataset with comprehensive annotations, enabling fair multi-level evaluation of …
ObjCtrl-2.5D: Training-free Object Control with Camera Poses
·3506 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Nanyang Technological University
ObjCtrl-2.5D: Training-free, precise image-to-video object control using 3D trajectories and camera poses.
Mobile Video Diffusion
·3393 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Qualcomm AI Research
MobileVD: The first mobile-optimized video diffusion model, achieving 523x efficiency improvement over state-of-the-art with minimal quality loss, enabling realistic video generation on smartphones.