Posters
2024
Segment, Shuffle, and Stitch: A Simple Layer for Improving Time-Series Representations
·3043 words·15 mins·
Machine Learning
Representation Learning
Queen's University
Boost time-series model accuracy with Segment, Shuffle, and Stitch (S3)! This simple layer shuffles data segments to enhance representation learning, improving classification, forecasting, and anomaly detection.
Segment Anything without Supervision
·1959 words·10 mins·
Computer Vision
Image Segmentation
UC Berkeley
Unsupervised SAM (UnSAM) achieves competitive image segmentation results without human annotation, surpassing previous unsupervised methods and even improving supervised SAM’s accuracy.
Segment Any Change
·2244 words·11 mins·
Computer Vision
Image Segmentation
Stanford University
AnyChange achieves zero-shot image change detection by adapting the Segment Anything Model (SAM) via a training-free bitemporal latent matching method, significantly outperforming previous state-of-the-art methods.
SEEV: Synthesis with Efficient Exact Verification for ReLU Neural Barrier Functions
·1687 words·8 mins·
AI Theory
Safety
Washington University in St. Louis
SEEV framework efficiently verifies ReLU neural barrier functions by reducing activation regions and using tight over-approximations, significantly improving verification efficiency without sacrificing…
Seek Commonality but Preserve Differences: Dissected Dynamics Modeling for Multi-modal Visual RL
·2815 words·14 mins·
Machine Learning
Reinforcement Learning
Peking University
Dissected Dynamics Modeling (DDM) excels at multi-modal visual reinforcement learning by cleverly separating and integrating common and unique features across different sensory inputs for more accurate…
Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
·3346 words·16 mins·
AI Generated
Multimodal Learning
Vision-Language Models
ByteDance Inc.
Boosting vision-language model performance, Contrastive ALignment (CAL) prioritizes visually correlated text tokens during training via a simple, computationally efficient re-weighting strategy, significantly…
Seeing Beyond the Crop: Using Language Priors for Out-of-Bounding Box Keypoint Prediction
·2045 words·10 mins·
Multimodal Learning
Vision-Language Models
University of Waterloo
TokenCLIPose leverages language priors to predict human keypoints beyond bounding boxes, significantly improving pose estimation accuracy on ice hockey, lacrosse, and CrowdPose datasets.
Secret Collusion among AI Agents: Multi-Agent Deception via Steganography
·5189 words·25 mins·
AI Generated
AI Theory
Safety
UC Berkeley
AI agents can secretly collude using steganography, hiding their interactions from oversight. This research formalizes this threat, analyzes LLMs’ capabilities, and proposes mitigation strategies.
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
·2158 words·11 mins·
Multimodal Learning
Vision-Language Models
Shanghai AI Laboratory
SearchLVLMs: A plug-and-play framework efficiently augments large vision-language models with up-to-date internet knowledge via hierarchical filtering, significantly improving accuracy on visual question answering.
Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices
·2763 words·13 mins·
AI Generated
Machine Learning
Deep Learning
New York University
Revolutionizing large neural networks, this paper introduces a continuous parameterization of structured matrices, discovering that full-rank structures without parameter sharing achieve optimal scaling.
Search for Efficient Large Language Models
·2477 words·12 mins·
Natural Language Processing
Large Language Models
Northeastern University
Training-free architecture search finds optimal subnets in LLMs, boosting inference speed and slashing memory needs without retraining.
SE(3)-bi-equivariant Transformers for Point Cloud Assembly
·3085 words·15 mins·
AI Generated
Computer Vision
3D Vision
University of Gothenburg
SE(3)-bi-equivariant Transformers (BITR) revolutionize point cloud assembly by guaranteeing robust alignment even with non-overlapping clouds, thanks to their unique equivariance properties.
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
·2596 words·13 mins·
Natural Language Processing
Large Language Models
Indiana University
SDP4Bit achieves up to 4.08x speedup in LLM training by quantizing weight differences and gradients to ~4 bits, maintaining accuracy.
SCube: Instant Large-Scale Scene Reconstruction using VoxSplats
·3116 words·15 mins·
Computer Vision
3D Vision
University of Toronto
SCube: Instant large-scale 3D scene reconstruction from sparse images using VoxSplats, a novel 3D Gaussian splat representation.
SCOREQ: Speech Quality Assessment with Contrastive Regression
·2555 words·12 mins·
Speech and Audio
Speech Quality Assessment
University College Dublin
SCOREQ: a novel triplet loss contrastive regression approach for superior speech quality prediction, addressing generalization issues in no-reference metrics.
Score-Optimal Diffusion Schedules
·2200 words·11 mins·
Machine Learning
Deep Learning
University of Oxford
Researchers developed a novel algorithm to automatically find optimal schedules for denoising diffusion models (DDMs), significantly improving sample quality and efficiency without manual parameter tuning.
Score-based generative models are provably robust: an uncertainty quantification perspective
·293 words·2 mins·
AI Theory
Robustness
Université Côte d'Azur
Score-based generative models are provably robust to multiple error sources, as shown via a novel Wasserstein uncertainty propagation theorem.
Score-based 3D molecule generation with neural fields
·4106 words·20 mins·
AI Generated
Machine Learning
Deep Learning
Prescient Design
FuncMol: A new neural field model generates 3D molecules efficiently, outperforming existing methods by achieving an order of magnitude faster sampling speed.
Score Distillation via Reparametrized DDIM
·4128 words·20 mins·
Computer Vision
Image Generation
MIT
Researchers improved 3D shape generation from 2D diffusion models by showing that existing Score Distillation Sampling is a reparameterized version of DDIM and fixing its high-variance noise issue via…
Schur Nets: exploiting local structure for equivariance in higher order graph neural networks
·1825 words·9 mins·
AI Theory
Representation Learning
University of Chicago
Schur Nets boost higher-order GNNs by efficiently exploiting local graph structure for automorphism equivariance, achieving improved performance without the computational burden of traditional methods.