Spotlight Others
2024
X-Ray: A Sequential 3D Representation For Generation
·2206 words·11 mins·
3D Vision
🏢 National University of Singapore
X-Ray: a novel sequential 3D representation that generates complete object surfaces from a single image.
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
·3423 words·17 mins·
Multimodal Learning
Vision-Language Models
🏢 University of California, Santa Barbara
T2IScoreScore objectively evaluates text-to-image prompt faithfulness metrics using semantic error graphs, revealing that simpler metrics surprisingly outperform complex, computationally expensive one…
Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection
·2040 words·10 mins·
3D Vision
🏢 Hong Kong Polytechnic University
Voxel Mamba: a group-free 3D object detection method using state space models, achieving higher accuracy and efficiency by overcoming limitations of serialization-based Transformers.
Voila-A: Aligning Vision-Language Models with User's Gaze Attention
·2566 words·13 mins·
Multimodal Learning
Vision-Language Models
🏢 SKLSDE Lab, Beihang University
Voila-A enhances vision-language models by aligning their attention with user gaze, improving real-world application effectiveness and interpretability.
VMamba: Visual State Space Model
·2891 words·14 mins·
Image Classification
🏢 University of Chinese Academy of Sciences
VMamba: a vision backbone achieving linear time complexity using Visual State Space (VSS) blocks and 2D Selective Scan (SS2D) for efficient visual representation.
VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
·2266 words·11 mins·
Multimodal Learning
Vision-Language Models
🏢 Carnegie Mellon University
VLMs learn to generate their own memories by abstracting experiences from noisy demonstrations and human feedback, significantly boosting in-context learning performance.
Unveiling Encoder-Free Vision-Language Models
·2435 words·12 mins·
Multimodal Learning
Vision-Language Models
🏢 Peking University
EVE, a groundbreaking encoder-free vision-language model, rivals encoder-based counterparts using a fraction of the data and resources, demonstrating efficient, transparent training for pure decoder-o…
Unitary Convolutions for Learning on Graphs and Groups
·2134 words·11 mins·
🏢 Harvard University
Novel unitary group convolutions enable stable deep learning on graphs, preventing over-smoothing and enhancing model robustness.
Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement
·3015 words·15 mins·
Image Generation
🏢 Shanghai Jiao Tong University
Enhance deep neural network privacy and trustworthiness with unified gradient-based machine unlearning, leveraging remain geometry for efficient forgetting and performance preservation.
Trajectory Flow Matching with Applications to Clinical Time Series Modelling
·1814 words·9 mins·
AI Applications
Healthcare
🏢 McGill University
Simulation-free Neural SDE training via Trajectory Flow Matching unlocks scalability and stability for modeling complex real-world time series, particularly in clinical settings.
TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation
·2140 words·11 mins·
Video Understanding
🏢 KAIST
TrackIME enhances video point tracking by cleverly pruning the search space, resulting in improved accuracy and efficiency.
Towards Universal Mesh Movement Networks
·2599 words·13 mins·
🏢 Imperial College London
Universal Mesh Movement Network (UM2N) revolutionizes mesh movement for PDE solvers, enabling zero-shot adaptation to diverse problems and significantly accelerating simulations with improved accuracy…
Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration
·1888 words·9 mins·
Multimodal Learning
Vision-Language Models
🏢 Zhejiang University
UniKE: A unified multimodal editing method achieves superior reliability, generality, and locality by disentangling knowledge into semantic and truthfulness spaces, enabling enhanced collaboration bet…
Towards Understanding Evolving Patterns in Sequential Data
·1866 words·9 mins·
🏢 Western University
EVORATE quantifies evolving patterns in sequential data, enabling better model selection and temporal analysis for improved machine learning.
Towards training digitally-tied analog blocks via hybrid gradient computation
·2580 words·13 mins·
🏢 Montreal Institute for Learning Algorithms
Hybrid neural networks, combining digital feedforward and analog energy-based blocks, are trained end-to-end via a novel BP-EP gradient chaining algorithm, achieving state-of-the-art results on ImageN…
Tolerant Algorithms for Learning with Arbitrary Covariate Shift
·419 words·2 mins·
🏢 University of Pennsylvania
This paper introduces efficient algorithms for learning under arbitrary covariate shift, addressing limitations of prior approaches by enabling classifiers to abstain from predictions in high-shift sc…
The ALCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators
·2284 words·11 mins·
🏢 University of Wisconsin-Madison
ALCHEmist, a novel automated labeling system, reduces data annotation costs by 500x compared to LLM annotators while improving accuracy by an average of 12.9%.
TFG: Unified Training-Free Guidance for Diffusion Models
·3585 words·17 mins·
Multimodal Learning
Vision-Language Models
🏢 Stanford University
TFG: A unified, training-free framework for boosting diffusion model performance by efficiently searching its algorithm-agnostic design space.
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
·2216 words·11 mins·
Image Generation
🏢 Institute of Information Engineering, Chinese Academy of Sciences
TextCtrl: a novel diffusion-based scene text editing method using prior guidance control, achieving superior style fidelity and accuracy with a new real-world benchmark dataset, ScenePair.
Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts
·2351 words·12 mins·
AI Applications
Manufacturing
🏢 DFKI
Text2CAD: generates sequential CAD designs from text prompts ranging from beginner to expert level.