
🏢 KAIST

Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance
·1792 words·9 mins
Computer Vision Image Generation 🏢 KAIST
Self-guidance boosts masked generative models’ image synthesis, achieving superior quality and diversity with fewer steps!
TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation
·2140 words·11 mins
Video Understanding 🏢 KAIST
TrackIME enhances video point tracking by pruning the search space via instance motion estimation, improving both accuracy and efficiency.
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
·2412 words·12 mins
Computer Vision Image Segmentation 🏢 KAIST
PixelCLIP: Open-vocabulary semantic segmentation without pixel-level labels! Leveraging unlabeled image masks from Vision Foundation Models and an online clustering algorithm, PixelCLIP achieves imp…
SyncTweedies: A General Generative Framework Based on Synchronized Diffusions
·4065 words·20 mins
AI Generated Computer Vision Image Generation 🏢 KAIST
SyncTweedies: a zero-shot diffusion synchronization framework generates diverse visual content (images, panoramas, 3D textures) by synchronizing multiple diffusion processes without fine-tuning, demon…
Stochastic Optimal Control for Diffusion Bridges in Function Spaces
·2194 words·11 mins
Machine Learning Deep Learning 🏢 KAIST
Researchers extended stochastic optimal control theory to infinite-dimensional spaces, enabling the creation of diffusion bridges for generative modeling in function spaces, demonstrating applications…
Stochastic Extragradient with Flip-Flop Shuffling & Anchoring: Provable Improvements
·1705 words·9 mins
AI Generated AI Theory Optimization 🏢 KAIST
Stochastic extragradient with flip-flop shuffling & anchoring achieves provably faster convergence in minimax optimization.
Simulation-Free Training of Neural ODEs on Paired Data
·3545 words·17 mins
AI Generated Machine Learning Deep Learning 🏢 KAIST
Train Neural ODEs without simulations, achieving high performance on regression and classification by using flow matching in the embedding space of data pairs.
Regularized Q-Learning
·1497 words·8 mins
Machine Learning Reinforcement Learning 🏢 KAIST
RegQ: A novel regularized Q-learning algorithm ensures convergence with linear function approximation, solving a long-standing instability problem in reinforcement learning.
Provable Benefit of Cutout and CutMix for Feature Learning
·1796 words·9 mins
Image Classification 🏢 KAIST
CutMix and Cutout data augmentation methods provably improve feature learning by enabling the network to learn rarer features and noise vectors more effectively.
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning
·4495 words·22 mins
AI Generated Natural Language Processing Large Language Models 🏢 KAIST
LLMs boost tabular data prediction by generating optimized features via decision tree reasoning, outperforming existing methods.
Online Adaptation of Language Models with a Memory of Amortized Contexts
·2374 words·12 mins
Natural Language Processing Large Language Models 🏢 KAIST
MAC: Efficiently updates large language models (LLMs) using a memory of compressed contexts for improved real-time knowledge retention and adaptation.
Neural Pose Representation Learning for Generating and Transferring Non-Rigid Object Poses
·3744 words·18 mins
AI Generated Computer Vision 3D Vision 🏢 KAIST
Learn disentangled 3D object poses and transfer them between different object identities using a novel neural pose representation, boosting 3D shape generation!
Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images
·2520 words·12 mins
AI Generated Computer Vision 3D Vision 🏢 KAIST
MHCDIFF: a novel pipeline using multi-hypotheses conditioned point cloud diffusion for accurate 3D human reconstruction from occluded images, outperforming state-of-the-art methods.
Molecule Generation with Fragment Retrieval Augmentation
·2469 words·12 mins
Machine Learning Deep Learning 🏢 KAIST
f-RAG: A novel fragment-based molecular generation framework boosts drug discovery by combining retrieval augmentation with a generative model, enabling exploration beyond existing fragments and signi…
Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
·3140 words·15 mins
Large Language Models 🏢 KAIST
Bayesian Optimization Model Fusion (BOMF) significantly boosts language model fine-tuning by optimizing both loss and metrics through multi-objective Bayesian optimization, yielding considerable perfo…
Mitigating Covariate Shift in Behavioral Cloning via Robust Stationary Distribution Correction
·2650 words·13 mins
Machine Learning Reinforcement Learning 🏢 KAIST
DrilDICE robustly tackles covariate shift in offline imitation learning by using a stationary distribution correction and a distributionally robust objective, significantly improving performance.
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
·3263 words·16 mins
Multimodal Learning Vision-Language Models 🏢 KAIST
Meteor: Mamba-based Traversal of Rationale achieves significant vision-language improvements by efficiently embedding multifaceted rationales in a large language model, without scaling the model or us…
Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers
·3286 words·16 mins
AI Generated Computer Vision Image Classification 🏢 KAIST
Decoupled Token Embedding for Merging (DTEM) significantly improves Vision Transformer efficiency by using a decoupled embedding module for relaxed token merging, achieving consistent performance gain…
Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
·3521 words·17 mins
Computer Vision Image Generation 🏢 KAIST
MuDI: a novel framework for multi-subject image personalization that decouples subject identities to prevent identity mixing, using segmented subjects and a new evaluation metric.
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
·4632 words·22 mins
Natural Language Processing Large Language Models 🏢 KAIST
LLMs’ factual knowledge acquisition during pretraining is surprisingly non-linear: more data does not guarantee better knowledge retention, and forgetting follows a power law.