Spotlight Others
2024
Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model
·2167 words·11 mins·
Multimodal Learning
Vision-Language Models
Wuhan University
Text-DiFuse: A novel interactive multi-modal image fusion framework that leverages text-modulated diffusion models for superior performance in complex scenarios.
Tetrahedron Splatting for 3D Generation
·2346 words·12 mins·
3D Vision
Fudan University
TeT-Splatting: a novel 3D representation enabling fast convergence, real-time rendering, and precise mesh extraction for high-fidelity 3D generation.
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
·2393 words·12 mins·
Image Generation
Nankai University
StoryDiffusion enhances long-range image & video generation by introducing a simple yet effective self-attention mechanism and a semantic motion predictor, achieving high content consistency without t…
Stabilized Proximal-Point Methods for Federated Optimization
·1402 words·7 mins·
Federated Learning
Saarland University
S-DANE & ACC-S-DANE achieve best-known communication complexity for federated learning, improving local computation efficiency via stabilized proximal-point methods.
SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams
·2631 words·13 mins·
Image Generation
Peking University
SpikeReveal: Self-supervised learning unlocks sharp video sequences from blurry, real-world spike camera data, overcoming limitations of prior supervised approaches.
SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models
·1983 words·10 mins·
AI Applications
Education
University of Science and Technology of China
SocraticLM achieves a Socratic teaching paradigm, surpassing GPT-4 by 12%, through a novel multi-agent training pipeline and a comprehensive evaluation system.
Slight Corruption in Pre-training Data Makes Better Diffusion Models
·4250 words·20 mins·
Image Generation
Carnegie Mellon University
Slightly corrupting pre-training data significantly improves diffusion models’ image generation quality, diversity, and fidelity.
Skinned Motion Retargeting with Dense Geometric Interaction Perception
·2892 words·14 mins·
AI Applications
Gaming
Tsinghua University
MeshRet: A novel retargeting framework that uses dense geometric interaction modeling for realistic, artifact-free skinned character animation.
Semi-Supervised Sparse Gaussian Classification: Provable Benefits of Unlabeled Data
·1347 words·7 mins·
Semi-Supervised Learning
Weizmann Institute of Science
This study proves that combining labeled and unlabeled data significantly improves high-dimensional sparse Gaussian classification, offering a polynomial-time SSL algorithm that outperforms supervised…
Semi-supervised Multi-label Learning with Balanced Binary Angular Margin Loss
·1724 words·9 mins·
Semi-Supervised Learning
College of Computer Science and Technology, Jilin University
S2ML2-BBAM: A new semi-supervised multi-label learning method that balances feature angle distributions to improve accuracy and fairness.
Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences
·2131 words·11 mins·
Generative Learning
University of Toronto
Curated synthetic data provably optimizes human preferences in iterative generative model training, maximizing expected reward while mitigating variance.
SegVol: Universal and Interactive Volumetric Medical Image Segmentation
·2947 words·14 mins·
Image Segmentation
Peking University
SegVol: A universal, interactive 3D medical image segmentation model achieving state-of-the-art performance across diverse anatomical categories.
Second-order forward-mode optimization of recurrent neural networks for neuroscience
·2260 words·11 mins·
University of Cambridge
SOFO: a novel second-order optimizer that enables efficient and memory-friendly RNN training for neuroscience tasks, surpassing Adam’s performance, especially on long time horizons.
Schrodinger Bridge Flow for Unpaired Data Translation
·3752 words·18 mins·
Transfer Learning
Google DeepMind
Accelerate unpaired data translation with Schrödinger Bridge Flow, a novel algorithm solving optimal transport problems efficiently without repeatedly training models!
Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation
·2541 words·12 mins·
Image Generation
Shanghai Jiao Tong University
DisCo: a novel framework for generalizable complex image generation using scene graph disentanglement and composition, achieving superior performance over existing methods.
Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers
·2614 words·13 mins·
AI Applications
Robotics
Massachusetts Institute of Technology
Heterogeneous Pre-trained Transformers (HPT) enables robots to learn generalizable policies from diverse data, drastically improving performance on unseen tasks.
Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits
·2531 words·12 mins·
Eindhoven University of Technology
Researchers scaled continuous latent variable models by building DAG-shaped probabilistic integral circuits (PICs) and training them efficiently using tensorized architectures and neural functional sh…
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection
·2771 words·14 mins·
Object Detection
Nankai University
SARDet-100K: A new benchmark dataset and open-source toolkit for large-scale SAR object detection.
Saliency-driven Experience Replay for Continual Learning
·2628 words·13 mins·
Image Classification
University of Catania
Boosting AI’s continual learning via saliency-driven experience replay, achieving up to 20% accuracy improvement.
SA3DIP: Segment Any 3D Instance with Potential 3D Priors
·1792 words·9 mins·
3D Vision
Xidian University
SA3DIP boosts 3D instance segmentation accuracy by cleverly using 3D spatial and textural cues alongside 2D multi-view masks, overcoming limitations of previous methods.