Computer Vision
Segment Any Motion in Videos
·2413 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 UC Berkeley
New method for moving object segmentation by combining long-range motion cues, semantic features, and SAM2, achieving state-of-the-art performance in challenging scenarios.
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
·2259 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 KAIST
ORIGEN: First zero-shot 3D orientation grounding in text-to-image generation.
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging
·2702 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Chinese University of Hong Kong, Shenzhen
Hi3DGen: High-fidelity 3D geometry generation from images via normal bridging.
X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
·2612 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 The Chinese University of Hong Kong
X2-Gaussian enables continuous-time 4D CT reconstruction via dynamic radiative Gaussian splatting and self-supervised respiratory motion learning.
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
·3977 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Shanghai Artificial Intelligence Laboratory
VBench-2.0: A new benchmark suite advancing video generation evaluation with intrinsic faithfulness metrics.
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
·2163 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tsinghua University
SparseFlex achieves high-resolution, arbitrary-topology 3D shape modeling via a sparse isosurface representation and sectional voxel training.
Reconstructing Humans with a Biomechanically Accurate Skeleton
·2828 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Texas at Austin
HSMR: Reconstructing 3D humans with a biomechanically accurate skeleton model from a single image, enhancing pose realism.
Optimal Stepsize for Diffusion Sampling
·3204 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Chinese Academy of Sciences
Optimal Stepsize Distillation accelerates diffusion sampling by distilling knowledge from reference trajectories, achieving 10x speedup with minimal performance loss.
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
·3416 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Shanghai AI Laboratory
Lumina-Image 2.0: A unified and efficient image generative framework, outperforming previous models with only 2.6B parameters.
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
·2412 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Waterloo
LOCATEdit refines cross-attention maps with graph Laplacian regularization, achieving precise, localized text-guided image editing without artifacts.
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
·2431 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Shanghai AI Laboratory
LeX-Art: High-quality text-to-image generation via scalable data synthesis.
Exploring the Evolution of Physics Cognition in Video Generation: A Survey
·3260 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Huazhong University of Science and Technology
This survey explores the evolution of physics cognition in video generation, addressing the gap between visual realism and physical accuracy.
ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model
·1950 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Alibaba Group
ChatAnyone: Stylized real-time portrait video generation with hierarchical motion diffusion model.
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
·393 words·2 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 KAIST
Using richer unconditional priors helps fine-tuned diffusion models generate better images and videos.
Synthetic Video Enhances Physical Fidelity in Video Synthesis
·4236 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 ByteDance Seed
Synthetic data can enhance the physical realism of video synthesis, paving the way for more believable generated content.
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
·2359 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Huazhong University of Science and Technology
Free4D: Tuning-free 4D scene generation with spatial-temporal consistency.
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
·4642 words·22 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 UCLA
Feature4X: 4D agentic AI from monocular video with Gaussian feature fields.
DINeMo: Learning Neural Mesh Models with no 3D Annotations
·1595 words·8 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Johns Hopkins University
DINeMo learns neural mesh models with no 3D annotations, leveraging pseudo-correspondence from visual foundation models for enhanced pose estimation.
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
·10790 words·51 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tsinghua University
BizGen: Article-level visual text rendering for infographics generation.
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
·2885 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Central South University
LongTextAR advances long-text image generation via a novel tokenizer, enabling accurate, controllable, and high-fidelity text rendering in images.