Paper Reviews by AI
2024
ColorFlow: Retrieval-Augmented Image Sequence Colorization
·2655 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tsinghua University
ColorFlow, a new AI model, accurately colorizes black-and-white image sequences while preserving character identity.
Smaller Language Models Are Better Instruction Evolvers
·5507 words·26 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Beijing University of Posts and Telecommunications
Smaller is better: SLMs outperform LLMs in evolving complex & diverse instructions for AI training.
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
·3380 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Hong Kong University of Science and Technology
Training-free method adds physical properties to 3D models using vision-language models.
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
·5380 words·26 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Corporation
New benchmark for evaluating long-context models finds sub-O(n) methods lacking in real-world use cases.
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images
·2021 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of British Columbia
New attack fools breast ultrasound AI using subtle text prompts.
Large Action Models: From Inception to Implementation
·2938 words·14 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 Microsoft
From language models to action models: building AI that does things.
Byte Latent Transformer: Patches Scale Better Than Tokens
·4848 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
BLT: a tokenizer-free LLM that processes raw bytes as dynamic patches for better efficiency and robustness.
BrushEdit: All-In-One Image Inpainting and Editing
·3281 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
BrushEdit revolutionizes interactive image editing with instructions & inpainting.
Apollo: An Exploration of Video Understanding in Large Multimodal Models
·1887 words·9 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Meta GenAI
Apollo LMMs achieve SOTA on video understanding tasks by exploring and optimizing the design and training of video-LMMs.
Word Sense Linking: Disambiguating Outside the Sandbox
·2984 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Word Sense Disambiguation
🏢 Sapienza University of Rome
Word Sense Linking (WSL) revolutionizes word sense disambiguation by tackling its real-world limitations. It combines span identification and sense linking in plain text, offering better integration …
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective
·1893 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 National Library of Norway
A Norwegian study shows that training on copyrighted material improves LLMs but raises legal and ethical concerns.
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
·3840 words·19 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
SynerGen-VL: A simpler, more powerful unified MLLM for image understanding and generation.
Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages
·1855 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Machine Translation
🏢 Indian Institute of Technology Madras
Shiksha: A new multilingual translation dataset and model surpasses existing benchmarks for Indian languages, focusing on scientific, technical, and educational domains.
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
·3495 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Santa Barbara
RuleArena, a new benchmark, rigorously evaluates large language models’ ability to apply complex, real-world rules across diverse scenarios, revealing significant shortcomings in current LLMs’ rule-gu…
Phi-4 Technical Report
·2630 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
Phi-4: a 14B parameter LLM surpassing its teacher model (GPT-4) in STEM-focused QA through innovative synthetic data generation and post-training techniques.
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
·5249 words·25 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Microsoft Research
OLA-VLM boosts multimodal LLMs’ visual understanding by distilling knowledge from specialized visual encoders into the LLM’s internal representations during pretraining, achieving significant performa…
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
·3868 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Chinese University of Hong Kong
Neural LightRig uses multi-light diffusion to accurately estimate object normals and materials from a single image, outperforming existing methods.
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
·3107 words·15 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Generation
🏢 University of Edinburgh
VMB generates music from videos, images, and text, using description and retrieval bridges to improve quality and controllability.
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
·3111 words·15 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
Lyra: an efficient, speech-centric framework for omni-cognition that achieves state-of-the-art results across modalities at a fraction of the usual compute and data cost.
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
·2785 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ETH Zurich
LoRACLR merges multiple LoRA models for high-fidelity multi-concept image generation, using a contrastive objective to ensure concept distinctiveness and prevent interference.