Spotlight Others

X-Ray: A Sequential 3D Representation For Generation

26 September 2024·2206 words·11 mins

3D Vision 🏢 National University of Singapore

X-Ray: A novel 3D representation generating complete object surfaces from a single image!

Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)

26 September 2024·3423 words·17 mins

Multimodal Learning Vision-Language Models 🏢 University of California, Santa Barbara

T2IScoreScore objectively evaluates text-to-image prompt faithfulness metrics using semantic error graphs, revealing that simpler metrics surprisingly outperform complex, computationally expensive one…

Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection

26 September 2024·2040 words·10 mins

3D Vision 🏢 Hong Kong Polytechnic University

Voxel Mamba: a group-free 3D object detection method using state space models, achieving higher accuracy and efficiency by overcoming limitations of serialization-based Transformers.

Voila-A: Aligning Vision-Language Models with User's Gaze Attention

26 September 2024·2566 words·13 mins

Multimodal Learning Vision-Language Models 🏢 SKLSDE Lab, Beihang University

Voila-A enhances vision-language models by aligning their attention with user gaze, improving real-world application effectiveness and interpretability.

VMamba: Visual State Space Model

26 September 2024·2891 words·14 mins

Image Classification 🏢 University of Chinese Academy of Sciences

VMamba: a vision backbone achieving linear time complexity using Visual State Space (VSS) blocks and 2D Selective Scan (SS2D) for efficient visual representation.

VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought

26 September 2024·2266 words·11 mins

Multimodal Learning Vision-Language Models 🏢 Carnegie Mellon University

VLMs learn to generate their own memories by abstracting experiences from noisy demonstrations and human feedback, significantly boosting in-context learning performance.

Unveiling Encoder-Free Vision-Language Models

26 September 2024·2435 words·12 mins

Multimodal Learning Vision-Language Models 🏢 Peking University

EVE, a groundbreaking encoder-free vision-language model, rivals encoder-based counterparts using a fraction of the data and resources, demonstrating efficient, transparent training for pure decoder-o…

Unitary Convolutions for Learning on Graphs and Groups

26 September 2024·2134 words·11 mins

🏢 Harvard University

Unitary group convolutions enable stable deep learning on graphs, preventing over-smoothing and enhancing model robustness.

Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement

26 September 2024·3015 words·15 mins

Image Generation 🏢 Shanghai Jiao Tong University

Enhance deep neural network privacy and trustworthiness with unified gradient-based machine unlearning, leveraging remain geometry for efficient forgetting and performance preservation.

Trajectory Flow Matching with Applications to Clinical Time Series Modelling

26 September 2024·1814 words·9 mins

AI Applications Healthcare 🏢 McGill University

Simulation-free Neural SDE training via Trajectory Flow Matching unlocks scalability and stability for modeling complex real-world time series, particularly in clinical settings.

TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation

26 September 2024·2140 words·11 mins

Video Understanding 🏢 KAIST

TrackIME enhances video point tracking by cleverly pruning the search space, resulting in improved accuracy and efficiency.

Towards Universal Mesh Movement Networks

26 September 2024·2599 words·13 mins

🏢 Imperial College London

Universal Mesh Movement Network (UM2N) revolutionizes mesh movement for PDE solvers, enabling zero-shot adaptation to diverse problems and significantly accelerating simulations with improved accuracy…

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration

26 September 2024·1888 words·9 mins

Multimodal Learning Vision-Language Models 🏢 Zhejiang University

UniKE: A unified multimodal editing method achieves superior reliability, generality, and locality by disentangling knowledge into semantic and truthfulness spaces, enabling enhanced collaboration bet…

Towards Understanding Evolving Patterns in Sequential Data

26 September 2024·1866 words·9 mins

🏢 Western University

EVORATE quantifies evolving patterns in sequential data, enabling better model selection and temporal analysis for improved machine learning.

Towards training digitally-tied analog blocks via hybrid gradient computation

26 September 2024·2580 words·13 mins

🏢 Montreal Institute for Learning Algorithms

Hybrid neural networks, combining digital feedforward and analog energy-based blocks, are trained end-to-end via a novel BP-EP gradient chaining algorithm, achieving state-of-the-art results on ImageN…

Tolerant Algorithms for Learning with Arbitrary Covariate Shift

26 September 2024·419 words·2 mins

🏢 University of Pennsylvania

This paper introduces efficient algorithms for learning under arbitrary covariate shift, addressing limitations of prior approaches by enabling classifiers to abstain from predictions in high-shift sc…

The ALCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators

26 September 2024·2284 words·11 mins

🏢 University of Wisconsin-Madison

Alchemist, a novel automated labeling system, reduces data annotation costs by 500x compared to LLMs while improving accuracy by an average of 12.9%.

TFG: Unified Training-Free Guidance for Diffusion Models

26 September 2024·3585 words·17 mins

Multimodal Learning Vision-Language Models 🏢 Stanford University

TFG: A unified, training-free framework for boosting diffusion model performance by efficiently searching its algorithm-agnostic design space.

TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control

26 September 2024·2216 words·11 mins

Image Generation 🏢 Institute of Information Engineering, Chinese Academy of Sciences

TextCtrl: a novel diffusion-based scene text editing method using prior guidance control, achieving superior style fidelity and accuracy with a new real-world benchmark dataset, ScenePair.

Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts

26 September 2024·2351 words·12 mins

AI Applications Manufacturing 🏢 DFKI

Text2CAD generates sequential CAD designs from text prompts ranging from beginner to expert level.