Computer Vision
MoViE: Mobile Diffusion for Video Editing
·2482 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Qualcomm AI Research
MoViE optimizes existing image editing diffusion models to deliver 12 FPS video editing on mobile phones, a major breakthrough in on-device video processing.
EMOv2: Pushing 5M Vision Model Frontier
·6258 words·30 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Classification
🏢 Tencent AI Lab
EMOv2 achieves state-of-the-art performance in various vision tasks using a novel Meta Mobile Block, pushing the 5M parameter lightweight model frontier.
Mind the Time: Temporally-Controlled Multi-Event Video Generation
·4541 words·22 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 University of Toronto
MinT generates coherent videos containing multiple precisely timed events via temporal control, surpassing existing methods.
Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment
·2984 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Robotics
🏢 UC Berkeley
RAPL efficiently aligns robots with human preferences using minimal feedback by aligning visual representations before reward learning.
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
·276 words·2 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Fudan University
LiFT leverages human feedback, including reasoning, to effectively align text-to-video models with human preferences, significantly improving video quality.
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality
·2050 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Zhejiang University
ZipAR exploits spatial locality in images to decode tokens in parallel, accelerating autoregressive high-resolution image generation by up to 91%.
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
·3379 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 VinAI Research
SwiftEdit achieves lightning-fast, high-quality text-guided image editing in just 0.23 seconds via a novel one-step diffusion process.
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
·5538 words·26 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance
Infinity, a novel bitwise autoregressive model, sets new records in high-resolution image synthesis, outperforming top diffusion models in speed and quality.
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
·7357 words·35 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
HumanEdit: A new human-rewarded dataset revolutionizes instruction-based image editing by providing high-quality, diverse image pairs with detailed instructions, enabling precise model evaluation and …
Hidden in the Noise: Two-Stage Robust Watermarking for Images
·3984 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 New York University
WIND: A novel, distortion-free image watermarking method leveraging diffusion models’ initial noise for robust AI-generated content authentication.
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
·3014 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance
AnyDressing: Customizable multi-garment virtual dressing via a novel latent diffusion model!
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
·3265 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tencent AI Lab
NVComposer: A novel generative NVS model boosts synthesis quality by implicitly inferring spatial relationships from multiple sparse, unposed images, eliminating reliance on external alignment.
MV-Adapter: Multi-view Consistent Image Generation Made Easy
·3888 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 School of Software, Beihang University
MV-Adapter easily turns existing image generators into multi-view consistent ones, improving efficiency and adaptability.
MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities
·3868 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 School of Artificial Intelligence, Shanghai Jiao Tong University
MRGen, a novel diffusion-based data engine, controllably synthesizes MRI data for unannotated modalities, boosting segmentation model performance.
Mimir: Improving Video Diffusion Models for Precise Text Understanding
·3398 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Ant Group
Mimir: A novel framework that harmonizes LLMs and video diffusion models for precise text understanding in video generation, producing high-quality videos with superior text comprehension.
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
·2260 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tsinghua University
MIDI: a novel multi-instance diffusion model generates compositional 3D scenes from single images by simultaneously creating multiple 3D instances with accurate spatial relationships and high generali…
Imagine360: Immersive 360 Video Generation from Perspective Anchor
·2648 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Chinese University of Hong Kong
Imagine360: Generating immersive 360° videos from perspective videos, improving quality and accessibility of 360° content creation.
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
·4118 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Zhejiang University
ScoreLiDAR: Distilling diffusion models for 5x faster, higher-quality 3D LiDAR scene completion!
CleanDIFT: Diffusion Features without Noise
·3337 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 CompVis @ LMU Munich, MCML
CleanDIFT revolutionizes diffusion feature extraction by leveraging clean images and a lightweight fine-tuning method, significantly boosting performance across various tasks without noise or timestep…
2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constraints for High-Fidelity Indoor Scene Reconstruction
·2645 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tsinghua University
2DGS-Room: Seed-guided 2D Gaussian splatting with geometric constraints achieves state-of-the-art high-fidelity indoor scene reconstruction.