🏢 KAIST

ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation

28 March 2025·2259 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 KAIST

ORIGEN: the first zero-shot method for 3D orientation grounding in text-to-image generation.

Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models

26 March 2025·393 words·2 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 KAIST

Fixing fine-tuned diffusion models! By drawing on richer unconditional priors, fine-tuned models generate better images and videos.

Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing

25 March 2025·2020 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 KAIST

Inference-time scaling for flow models enhances alignment with user preferences via stochastic generation and budget allocation.

Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models

12 March 2025·410 words·2 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 KAIST

The new ‘Silent Branding Attack’ poisons text-to-image models so that they embed brand logos in generated images without any text trigger in the prompt, raising ethical concerns for image generation tools.

Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling

11 March 2025·3192 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 KAIST

SynCoS: synchronized sampling generates high-quality, coherent long videos from text without extra training!

Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

7 March 2025·1708 words·9 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 KAIST

Sketch-of-Thought (SoT) reduces LLM token usage by up to 76% while maintaining (or improving) accuracy through cognitive-inspired sketching.

Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models

20 February 2025·5119 words·25 mins

AI Generated 🤗 Daily Papers AI Theory Safety 🏢 KAIST

LLMs fail to act safely when user-specific safety standards must be taken into account; the authors introduce a new benchmark to evaluate this gap.

SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models

18 February 2025·2481 words·12 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 KAIST

SafeRoute efficiently enhances LLM safety by adaptively using smaller and larger safety guard models, maximizing accuracy while minimizing costs.

MIVE: New Design and Benchmark for Multi-Instance Video Editing

17 December 2024·7714 words·37 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 KAIST

Edit many objects in a video at once! MIVE does this accurately without affecting other regions, a big step for AI video editing.

Controllable Human Image Generation with Personalized Multi-Garments

25 November 2024·4062 words·20 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 KAIST

BootComp: generate realistic human images wearing multiple garments using a novel synthetic data pipeline & diffusion model, enabling diverse applications like virtual try-on.

Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction

22 November 2024·2991 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 KAIST

CoordTok, a novel video tokenizer, drastically reduces the token count for long videos, enabling memory-efficient training of diffusion models for high-quality long video generation.