🏢 KAIST
Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance
·1792 words·9 mins·
Computer Vision
Image Generation
🏢 KAIST
Self-guidance boosts masked generative models’ image synthesis, achieving superior quality and diversity with fewer steps!
TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation
·2140 words·11 mins·
Video Understanding
🏢 KAIST
TrackIME enhances video point tracking by pruning the search space, improving both accuracy and efficiency.
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
·2412 words·12 mins·
Computer Vision
Image Segmentation
🏢 KAIST
PixelCLIP: Open-vocabulary semantic segmentation without pixel-level labels! Leveraging unlabeled image masks from Vision Foundation Models and an online clustering algorithm, PixelCLIP achieves imp…
SyncTweedies: A General Generative Framework Based on Synchronized Diffusions
·4065 words·20 mins·
AI Generated
Computer Vision
Image Generation
🏢 KAIST
SyncTweedies: a zero-shot diffusion synchronization framework generates diverse visual content (images, panoramas, 3D textures) by synchronizing multiple diffusion processes without fine-tuning, demon…
Stochastic Optimal Control for Diffusion Bridges in Function Spaces
·2194 words·11 mins·
Machine Learning
Deep Learning
🏢 KAIST
Researchers extended stochastic optimal control theory to infinite-dimensional spaces, enabling the creation of diffusion bridges for generative modeling in function spaces, demonstrating applications…
Stochastic Extragradient with Flip-Flop Shuffling & Anchoring: Provable Improvements
·1705 words·9 mins·
AI Generated
AI Theory
Optimization
🏢 KAIST
Stochastic extragradient with flip-flop shuffling & anchoring achieves provably faster convergence in minimax optimization.
Simulation-Free Training of Neural ODEs on Paired Data
·3545 words·17 mins·
AI Generated
Machine Learning
Deep Learning
🏢 KAIST
Train Neural ODEs without simulations, achieving high performance on regression and classification by using flow matching in the embedding space of data pairs.
Regularized Q-Learning
·1497 words·8 mins·
Machine Learning
Reinforcement Learning
🏢 KAIST
RegQ: A novel regularized Q-learning algorithm ensures convergence with linear function approximation, solving a long-standing instability problem in reinforcement learning.
Provable Benefit of Cutout and CutMix for Feature Learning
·1796 words·9 mins·
Image Classification
🏢 KAIST
CutMix and Cutout data augmentation methods provably improve feature learning by enabling the network to learn rarer features and noise vectors more effectively.
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning
·4495 words·22 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 KAIST
LLMs boost tabular data prediction by generating optimized features via decision tree reasoning, outperforming existing methods.
Online Adaptation of Language Models with a Memory of Amortized Contexts
·2374 words·12 mins·
Natural Language Processing
Large Language Models
🏢 KAIST
MAC efficiently updates large language models (LLMs) online using a memory of compressed contexts, improving real-time knowledge retention and adaptation.
Neural Pose Representation Learning for Generating and Transferring Non-Rigid Object Poses
·3744 words·18 mins·
AI Generated
Computer Vision
3D Vision
🏢 KAIST
Learn disentangled 3D object poses and transfer them between different object identities using a novel neural pose representation, boosting 3D shape generation!
Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images
·2520 words·12 mins·
AI Generated
Computer Vision
3D Vision
🏢 KAIST
MHCDIFF: a novel pipeline using multi-hypotheses conditioned point cloud diffusion for accurate 3D human reconstruction from occluded images, outperforming state-of-the-art methods.
Molecule Generation with Fragment Retrieval Augmentation
·2469 words·12 mins·
Machine Learning
Deep Learning
🏢 KAIST
f-RAG: A novel fragment-based molecular generation framework boosts drug discovery by combining retrieval augmentation with a generative model, enabling exploration beyond existing fragments and signi…
Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
·3140 words·15 mins·
Large Language Models
🏢 KAIST
Bayesian Optimization Model Fusion (BOMF) significantly boosts language model fine-tuning by optimizing both loss and metrics through multi-objective Bayesian optimization, yielding considerable perfo…
Mitigating Covariate Shift in Behavioral Cloning via Robust Stationary Distribution Correction
·2650 words·13 mins·
Machine Learning
Reinforcement Learning
🏢 KAIST
DrilDICE robustly tackles covariate shift in offline imitation learning by using a stationary distribution correction and a distributionally robust objective, significantly improving performance.
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
·3263 words·16 mins·
Multimodal Learning
Vision-Language Models
🏢 KAIST
Meteor: Mamba-based Traversal of Rationale achieves significant vision-language improvements by efficiently embedding multifaceted rationales in a large language model, without scaling the model or us…
Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers
·3286 words·16 mins·
AI Generated
Computer Vision
Image Classification
🏢 KAIST
Decoupled Token Embedding for Merging (DTEM) significantly improves Vision Transformer efficiency by using a decoupled embedding module for relaxed token merging, achieving consistent performance gain…
Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
·3521 words·17 mins·
Computer Vision
Image Generation
🏢 KAIST
MuDI: a novel framework for multi-subject image personalization, effectively decoupling identities to prevent mixing using segmented subjects and a new evaluation metric.
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
·4632 words·22 mins·
Natural Language Processing
Large Language Models
🏢 KAIST
LLMs’ factual knowledge acquisition during pretraining is surprisingly non-linear: more data doesn’t guarantee better knowledge retention, and forgetting follows a power law.