Skip to main content

Paper Reviews by AI

2024

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
·6546 words·31 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Document Parsing 🏢 Shanghai AI Laboratory
OmniDocBench, a novel benchmark, tackles limitations in current document parsing by introducing a diverse, high-quality dataset with comprehensive annotations, enabling fair multi-level evaluation of …
ObjCtrl-2.5D: Training-free Object Control with Camera Poses
·3506 words·17 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University
ObjCtrl-2.5D: Training-free, precise image-to-video object control using 3D trajectories and camera poses.
Mobile Video Diffusion
·3393 words·16 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Qualcomm AI Research
MobileVD: The first mobile-optimized video diffusion model, achieving 523x efficiency improvement over state-of-the-art with minimal quality loss, enabling realistic video generation on smartphones.
Granite Guardian
·4191 words·20 mins· loading · loading
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 IBM Research
Granite Guardian: Open-source risk detection models for LLMs, surpassing existing models in accuracy and offering comprehensive coverage across multiple risk dimensions, promoting safer AI.
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
·3186 words·15 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Stanford University
FiVA dataset and its adaptation framework enable unprecedented fine-grained control over visual attributes in text-to-image generation, empowering users to craft highly customized images.
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
·3317 words·16 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Institute of Automation, Chinese Academy of Sciences
FireFlow makes editing images faster and better.
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
·4676 words·22 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai Artificial Intelligence Laboratory
Introducing Evaluation Agent, a faster, more flexible human-like framework for evaluating visual generative AI.
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
·3918 words·19 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
DiffSensei: A new framework generates customized manga with dynamic multi-character control using multi-modal LLMs and diffusion models, outperforming existing methods.
Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation
·1928 words·10 mins· loading · loading
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Pisa
Contextualized AI counterspeech significantly outperforms generic methods by adapting to the moderation context and user, improving persuasiveness without sacrificing other qualities.
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
·2853 words·14 mins· loading · loading
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Mohamed Bin Zayed University of Artificial Intelligence
BiMediX2, a bilingual medical expert LMM excels in diverse medical modalities.
A New Federated Learning Framework Against Gradient Inversion Attacks
·2925 words·14 mins· loading · loading
AI Generated 🤗 Daily Papers Machine Learning Federated Learning 🏢 School of Computing and Data Science, University of Hong Kong
HyperFL: A new federated learning framework breaking the direct connection between shared parameters and private data, effectively defending against gradient inversion attacks while maintaining favora…
Training Large Language Models to Reason in a Continuous Latent Space
·2859 words·14 mins· loading · loading
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Meta AI
LLMs are trained to reason using language, but COCONUT lets them reason directly in a continuous latent space, boosting performance on logical tasks requiring complex planning.
MoViE: Mobile Diffusion for Video Editing
·2482 words·12 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Qualcomm AI Research
MoViE: Mobile Diffusion for Video Editing achieves 12 FPS video editing on mobile phones by optimizing existing image editing models, achieving a major breakthrough in on-device video processing.
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
·3931 words·19 mins· loading · loading
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Huawei Noah's Ark Lab
ILLUME: A unified multi-modal LLM efficiently integrates visual understanding & generation, achieving competitive performance with significantly less data.
EMOv2: Pushing 5M Vision Model Frontier
·6258 words·30 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Classification 🏢 Tencent AI Lab
EMOv2 achieves state-of-the-art performance in various vision tasks using a novel Meta Mobile Block, pushing the 5M parameter lightweight model frontier.
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
·3880 words·19 mins· loading · loading
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Westlake University
CARP: A novel visuomotor policy learning paradigm achieves high accuracy and 10x faster inference than state-of-the-art by combining autoregressive efficiency and diffusion model precision through a c…
PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations
·5378 words·26 mins· loading · loading
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Department of Artificial Intelligence, Sungkyunkwan University
Physics-Informed Gaussians (PIGs) revolutionize PDE solving by using adaptive, learnable Gaussian functions for superior accuracy and efficiency.
Fully Open Source Moxin-7B Technical Report
·334 words·2 mins· loading · loading
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Northeastern University
Moxin-LLM: A fully open-source 7B parameter LLM achieving superior zero-shot performance, promoting transparency and reproducibility in AI research.
Chimera: Improving Generalist Model with Domain-Specific Experts
·4776 words·23 mins· loading · loading
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Shanghai Artificial Intelligence Laboratory
Chimera boosts large multimodal models’ performance on specialized tasks by cleverly integrating domain-specific expert models, achieving state-of-the-art results on multiple benchmarks.
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
·3252 words·16 mins· loading · loading
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 University of Adelaide
State-Adaptive Mixture of Experts (SAME) model excels in generic language-guided visual navigation by consolidating diverse tasks and dynamically adapting to varying instruction granularities.