Spotlight Others

Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model

26 September 2024·2167 words·11 mins

Multimodal Learning Vision-Language Models 🏢 Wuhan University

Text-DiFuse: A novel interactive multi-modal image fusion framework that leverages text-modulated diffusion models for superior performance in complex scenarios.

Tetrahedron Splatting for 3D Generation

26 September 2024·2346 words·12 mins

3D Vision 🏢 Fudan University

TeT-Splatting: a novel 3D representation enabling fast convergence, real-time rendering, and precise mesh extraction for high-fidelity 3D generation.

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

26 September 2024·2393 words·12 mins

Image Generation 🏢 Nankai University

StoryDiffusion enhances long-range image & video generation by introducing a simple yet effective self-attention mechanism and a semantic motion predictor, achieving high content consistency without t…

Stabilized Proximal-Point Methods for Federated Optimization

26 September 2024·1402 words·7 mins

Federated Learning 🏢 Saarland University

S-DANE & ACC-S-DANE achieve best-known communication complexity for federated learning, improving local computation efficiency via stabilized proximal-point methods.

SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams

26 September 2024·2631 words·13 mins

Image Generation 🏢 Peking University

SpikeReveal: Self-supervised learning unlocks sharp video sequences from blurry, real-world spike camera data, overcoming limitations of prior supervised approaches.

SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models

26 September 2024·1983 words·10 mins

AI Applications Education 🏢 University of Science and Technology of China

SocraticLM achieves a Socratic teaching paradigm, surpassing GPT-4 by 12%, through a novel multi-agent training pipeline and a comprehensive evaluation system.

Slight Corruption in Pre-training Data Makes Better Diffusion Models

26 September 2024·4250 words·20 mins

Image Generation 🏢 Carnegie Mellon University

Slightly corrupting pre-training data significantly improves diffusion models’ image generation quality, diversity, and fidelity.

Skinned Motion Retargeting with Dense Geometric Interaction Perception

26 September 2024·2892 words·14 mins

AI Applications Gaming 🏢 Tsinghua University

MeshRet: A novel retargeting framework that uses dense geometric interaction modeling for realistic, artifact-free skinned character animation.

Semi-Supervised Sparse Gaussian Classification: Provable Benefits of Unlabeled Data

26 September 2024·1347 words·7 mins

Semi-Supervised Learning 🏢 Weizmann Institute of Science

This study proves that combining labeled and unlabeled data significantly improves high-dimensional sparse Gaussian classification, offering a polynomial-time SSL algorithm that outperforms supervised…

Semi-supervised Multi-label Learning with Balanced Binary Angular Margin Loss

26 September 2024·1724 words·9 mins

Semi-Supervised Learning 🏢 College of Computer Science and Technology, Jilin University

S2ML2-BBAM: A new semi-supervised multi-label learning method that balances feature angle distributions to improve accuracy and fairness.

Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences

26 September 2024·2131 words·11 mins

Generative Learning 🏢 University of Toronto

Curated synthetic data provably optimizes human preferences in iterative generative model training, maximizing expected reward while mitigating variance.

SegVol: Universal and Interactive Volumetric Medical Image Segmentation

26 September 2024·2947 words·14 mins

Image Segmentation 🏢 Peking University

SegVol: A universal, interactive 3D medical image segmentation model achieving state-of-the-art performance across diverse anatomical categories.

Second-order forward-mode optimization of recurrent neural networks for neuroscience

26 September 2024·2260 words·11 mins

🏢 University of Cambridge

SOFO: a novel second-order optimizer that enables efficient and memory-friendly RNN training for neuroscience tasks, surpassing Adam’s performance, especially on long time horizons.

Schrödinger Bridge Flow for Unpaired Data Translation

26 September 2024·3752 words·18 mins

Transfer Learning 🏢 Google DeepMind

Accelerate unpaired data translation with Schrödinger Bridge Flow, a novel algorithm solving optimal transport problems efficiently without repeatedly training models!

Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation

26 September 2024·2541 words·12 mins

Image Generation 🏢 Shanghai Jiao Tong University

DisCo: a novel framework for generalizable complex image generation using scene graph disentanglement and composition, achieving superior performance over existing methods.

Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers

26 September 2024·2614 words·13 mins

AI Applications Robotics 🏢 Massachusetts Institute of Technology

Heterogeneous Pre-trained Transformers (HPT) enable robots to learn generalizable policies from diverse data, drastically improving performance on unseen tasks.

Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits

26 September 2024·2531 words·12 mins

🏢 Eindhoven University of Technology

Researchers scaled continuous latent variable models by building DAG-shaped probabilistic integral circuits (PICs) and training them efficiently using tensorized architectures and neural functional sh…

SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection

26 September 2024·2771 words·14 mins

Object Detection 🏢 Nankai University

SARDet-100K: A new benchmark dataset and open-source toolkit that revolutionize large-scale SAR object detection.

Saliency-driven Experience Replay for Continual Learning

26 September 2024·2628 words·13 mins

Image Classification 🏢 University of Catania

Boosting AI’s continual learning via saliency-driven experience replay, achieving up to 20% accuracy improvement.

SA3DIP: Segment Any 3D Instance with Potential 3D Priors

26 September 2024·1792 words·9 mins

3D Vision 🏢 Xidian University

SA3DIP boosts 3D instance segmentation accuracy by cleverly using 3D spatial and textural cues alongside 2D multi-view masks, overcoming limitations of previous methods.