
Computer Vision

Visual Counter Turing Test (VCT²): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI)
·195 words·1 min
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of South Carolina
New benchmark VCT² reveals limitations of AI-generated image detectors; Visual AI Index (VAI) provides a robust evaluation framework.
Optimizing Brain Tumor Segmentation with MedNeXt: BraTS 2024 SSA and Pediatrics
·1682 words·8 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
A MedNeXt-based model ensemble optimizes brain tumor segmentation across diverse populations, achieving state-of-the-art results on the BraTS 2024 SSA and pediatric datasets.
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
·3474 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Seoul National University
Diptych Prompting: a novel zero-shot subject-driven image generator leveraging large-scale text-to-image models and inpainting for precise subject alignment and high-quality image synthesis.
TEXGen: a Generative Diffusion Model for Mesh Textures
·3720 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong
TEXGen: a generative diffusion model that creates high-resolution 3D mesh textures directly from text and image prompts, surpassing prior methods in quality and efficiency.
Style-Friendly SNR Sampler for Style-Driven Generation
·4866 words·23 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Seoul National University
Style-friendly SNR sampler biases diffusion model training toward higher noise levels, enabling the model to learn reference styles and generate images with higher style fidelity.
OminiControl: Minimal and Universal Control for Diffusion Transformer
·3446 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 National University of Singapore
OminiControl: A minimal, universal framework efficiently integrates image conditions into diffusion transformers, enabling diverse and precise control over image generation.
Morph: A Motion-free Physics Optimization Framework for Human Motion Generation
·2160 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tencent AI Lab
Morph, a novel motion-free physics optimization framework, drastically enhances the physical plausibility of generated human motion using synthetic data, achieving state-of-the-art quality.
Material Anything: Generating Materials for Any 3D Object via Diffusion
·4056 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Northwestern Polytechnical University
Material Anything generates realistic materials for any 3D object via diffusion.
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
·2991 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 KAIST
CoordTok, a novel video tokenizer, drastically reduces the token count for long videos, enabling memory-efficient training of diffusion models for high-quality long-video generation.
Stable Flow: Vital Layers for Training-Free Image Editing
·2773 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Snap Research
Stable Flow achieves diverse, consistent image editing without training by strategically injecting source image features into specific ‘vital’ layers of a diffusion transformer model.
SegBook: A Simple Baseline and Cookbook for Volumetric Medical Image Segmentation
·2952 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Stanford University
SegBook, a large-scale benchmark, reveals that fine-tuning models pre-trained on full-body CT significantly improves performance on a variety of downstream medical image segmentation tasks.
Novel View Extrapolation with Video Diffusion Priors
·2381 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Nanyang Technological University
ViewExtrapolator leverages Stable Video Diffusion to realistically extrapolate novel views far beyond training data, dramatically improving the quality of 3D scene generation.
MyTimeMachine: Personalized Facial Age Transformation
·3186 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of North Carolina at Chapel Hill
MyTimeMachine personalizes facial age transformation using just 50 personal photos, outperforming existing methods by generating re-aged faces that closely match a person’s actual appearance at various ages.
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
·4302 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology
MagicDriveDiT generates high-resolution, long street-view videos with precise control for autonomous driving, overcoming the limitations of previous methods.
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
·3966 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Nanyang Technological University
VBench++: A new benchmark suite meticulously evaluates video generative models across 16 diverse dimensions, aligning with human perception for improved model development and fairer comparisons.
Stylecodes: Encoding Stylistic Information For Image Generation
·237 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 String
StyleCodes enables easy style sharing for image generation by encoding styles as compact strings, enhancing control and collaboration while minimizing quality loss.
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
·2784 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 University of Washington
SAMURAI enhances the Segment Anything Model 2 for real-time, zero-shot visual object tracking by incorporating motion-aware memory and motion modeling, significantly improving accuracy and robustness.
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
·3219 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Bilkent University
ITACLIP boosts training-free semantic segmentation by architecturally enhancing CLIP, integrating LLM-generated class descriptions, and employing image engineering, achieving state-of-the-art results.
Generative World Explorer
·1739 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Johns Hopkins University
Generative World Explorer (Genex) enables agents to imaginatively explore environments, updating beliefs with generated observations for better decision-making.
Continuous Speculative Decoding for Autoregressive Image Generation
·1799 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Chinese Academy of Sciences
Researchers have developed Continuous Speculative Decoding, boosting autoregressive image generation speed by up to 2.33x while maintaining image quality.