
Posters

2024

ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models
·1869 words·9 mins
AI Applications Finance 🏢 University of Waterloo
ClavaDDPM synthesizes multi-relational data using cluster-guided diffusion models, efficiently capturing long-range dependencies and outperforming existing methods.
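The "cluster-guided" bridge can be pictured with a short sketch: cluster the joined parent/child rows and treat the cluster label as the shared conditioning variable linking each table's generative model. Everything below is an illustrative assumption of mine (toy tables, KMeans with 8 clusters, a per-cluster Gaussian standing in for the conditional diffusion model), not ClavaDDPM's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy relational data: each parent row has 5 correlated child rows.
rng = np.random.default_rng(0)
parents = rng.normal(size=(200, 2))
parent_rep = np.repeat(parents, 5, axis=0)
children = parent_rep + rng.normal(scale=0.3, size=(1000, 2))

# Cluster the joined parent/child rows; the label becomes the bridge
# variable that both tables' generative models condition on.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(
    np.hstack([parent_rep, children])
)

# Conditional sampling: pick a cluster, then draw a synthetic child row from
# that cluster's empirical Gaussian (a stand-in for a conditional diffusion
# model). Parent-child dependence flows through the cluster label.
c = int(rng.integers(0, 8))
members = children[labels == c]
synthetic = rng.multivariate_normal(members.mean(axis=0), np.cov(members.T))
print("cluster", c, "synthetic child row:", synthetic.round(2))
```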
Classifier-guided Gradient Modulation for Enhanced Multimodal Learning
·2128 words·10 mins
Multimodal Learning Multimodal Understanding 🏢 Shanghai AI Lab
Classifier-Guided Gradient Modulation (CGGM) enhances multimodal learning by balancing the training process, considering both gradient magnitude and direction, leading to consistent performance improv…
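To make "balancing gradient magnitude and direction" concrete, here is a minimal numpy sketch that equalizes per-modality gradient norms and reports pairwise direction agreement. The equal-norm rule and the `balance_gradients` helper are illustrative assumptions; CGGM derives its modulation signal from a classifier rather than from this simple rescaling.

```python
import numpy as np

def balance_gradients(grads):
    """Rescale each modality's gradient to a common magnitude and report
    pairwise direction agreement (cosine similarity). Illustrative only."""
    norms = {m: np.linalg.norm(g) for m, g in grads.items()}
    target = np.mean(list(norms.values()))              # shared target norm
    balanced = {m: g * target / (norms[m] + 1e-12) for m, g in grads.items()}

    mods = list(grads)
    for i in range(len(mods)):
        for j in range(i + 1, len(mods)):
            gi, gj = balanced[mods[i]], balanced[mods[j]]
            cos = gi @ gj / (np.linalg.norm(gi) * np.linalg.norm(gj) + 1e-12)
            print(f"cos({mods[i]}, {mods[j]}) = {cos:+.3f}")
    return balanced

# Toy usage: an audio gradient that dwarfs the visual one before balancing.
rng = np.random.default_rng(0)
grads = {"audio": 10.0 * rng.normal(size=128), "visual": 0.1 * rng.normal(size=128)}
balanced = balance_gradients(grads)
print({m: round(float(np.linalg.norm(g)), 3) for m, g in balanced.items()})
```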
Classification Done Right for Vision-Language Pre-Training
·1685 words·8 mins
Multimodal Learning Vision-Language Models 🏢 ByteDance Research
SuperClass, a novel vision-language pre-training method, achieves superior performance on various downstream tasks by directly using tokenized raw text as supervised classification labels, eliminating…
Classification Diffusion Models: Revitalizing Density Ratio Estimation
·2385 words·12 mins
Computer Vision Image Generation 🏢 Technion - Israel Institute of Technology
Classification Diffusion Models (CDMs) revolutionize density ratio estimation by integrating the strengths of diffusion models and classifiers, achieving state-of-the-art image generation and likeliho…
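The density-ratio estimation the summary refers to rests on a classical identity: with balanced samples from p and q, the logit of the Bayes-optimal classifier equals log p(x) - log q(x). A minimal scikit-learn sketch on two unit-variance Gaussians (where the true log ratio is exactly 2x) illustrates that principle; the toy setup is my own, not the paper's model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Samples from p = N(+1, 1) and q = N(-1, 1); true log p(x)/q(x) = 2x.
rng = np.random.default_rng(0)
xp = rng.normal(loc=+1.0, size=(5000, 1))
xq = rng.normal(loc=-1.0, size=(5000, 1))
X = np.vstack([xp, xq])
y = np.concatenate([np.ones(len(xp)), np.zeros(len(xq))])   # 1 = "from p"

clf = LogisticRegression().fit(X, y)

# With balanced classes, logit(P(y=1|x)) = log p(x) - log q(x), so the
# decision function is a direct estimate of the log density ratio.
xs = np.linspace(-3, 3, 7).reshape(-1, 1)
for xi, est in zip(xs.ravel(), clf.decision_function(xs)):
    print(f"x={xi:+.1f}  estimated log-ratio={est:+.2f}  true={2 * xi:+.2f}")
```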
Class Distribution Shifts in Zero-Shot Learning: Learning Robust Representations
·2470 words·12 mins
AI Theory Representation Learning 🏢 Hebrew University of Jerusalem
Zero-shot learning models often fail in real-world scenarios due to unseen class distribution shifts. This work introduces a novel algorithm that learns robust representations by creating synthetic d…
CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models
·3973 words·19 mins
Multimodal Learning Vision-Language Models 🏢 University of New South Wales (UNSW Sydney)
CLAP4CLIP enhances vision-language model continual learning by using probabilistic finetuning, improving performance and uncertainty estimation.
CigTime: Corrective Instruction Generation Through Inverse Motion Editing
·2228 words·11 mins
Natural Language Processing Vision-Language Models 🏢 Hong Kong University of Science and Technology
CigTime generates corrective motion instructions from motion pairs using motion editing and large language models. This innovative approach improves upon baselines by leveraging motion triplets for f…
CIFD: Controlled Information Flow to Enhance Knowledge Distillation
·3139 words·15 mins
Multimodal Learning Vision-Language Models 🏢 Samsung Research
CIFD, a novel knowledge distillation method, drastically cuts training costs while boosting performance, particularly for large datasets, by using Rate-Distortion Modules instead of Teacher Assistants…
ChronoEpilogi: Scalable Time Series Selection with Multiple Solutions
·2554 words·12 mins
AI Theory Causality 🏢 University of Cergy Paris
ChronoEpilogi efficiently finds all minimal sets of time-series variables optimally predicting a target, improving forecasting while providing crucial insights for knowledge discovery and causal model…
Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models
·2162 words·11 mins
AI Applications Healthcare 🏢 Cornell University
Chimera: a novel 2D state space model effectively captures complex multivariate time series dependencies, achieving superior forecasting, classification, and anomaly detection.
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
·1676 words·8 mins
Natural Language Processing Large Language Models 🏢 Shanghai University of Finance and Economics
CherryQ, a novel quantization method, leverages parameter heterogeneity in LLMs to achieve superior performance by selectively quantizing less critical parameters while preserving essential ones.
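As a concrete picture of "selectively quantizing less critical parameters while preserving essential ones", here is a hedged numpy sketch of mixed-precision quantization: the most critical weights stay in float while the rest are uniformly quantized to 3 bits. The magnitude-based criticality proxy and the `cherry_quantize` helper are stand-ins of mine; CherryQ's actual heterogeneity-based criterion differs.

```python
import numpy as np

def cherry_quantize(w, keep_frac=0.01, bits=3):
    """Keep the top `keep_frac` of weights by |magnitude| ("cherries") in
    full precision; symmetrically quantize the rest to `bits` bits."""
    flat = w.ravel().copy()
    k = max(1, int(keep_frac * flat.size))
    cherry = np.zeros(flat.size, dtype=bool)
    cherry[np.argsort(np.abs(flat))[-k:]] = True

    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(flat[~cherry]).max() / qmax      # scale ignores cherries
    out = np.clip(np.round(flat / scale), -qmax - 1, qmax) * scale
    out[cherry] = flat[cherry]                      # cherries stay full precision
    return out.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(256, 256))
w.ravel()[rng.choice(w.size, 50, replace=False)] *= 40   # rare outlier weights

naive = cherry_quantize(w, keep_frac=1e-9)   # k=1: effectively no cherries
mixed = cherry_quantize(w, keep_frac=0.001)  # ~65 cherries absorb the outliers
print(f"near-uniform 3-bit MSE: {np.mean((w - naive) ** 2):.2e}")
print(f"cherry-kept  3-bit MSE: {np.mean((w - mixed) ** 2):.2e}")
```

Keeping even 0.1% of the weights in full precision shrinks the reconstruction error substantially here, because outlier weights no longer force a coarse quantization grid onto everything else.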
ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model
·1826 words·9 mins
Natural Language Processing Vision-Language Models 🏢 East China Normal University
ChatTracker boosts visual tracking by intelligently using a large language model to refine object descriptions, achieving performance on par with state-of-the-art methods.
ChatQA: Surpassing GPT-4 on Conversational QA and RAG
·4802 words·23 mins
AI Generated Natural Language Processing Question Answering 🏢 NVIDIA
ChatQA, a new suite of models, outperforms GPT-4 in conversational QA and RAG by using a two-stage instruction tuning method and a cost-effective dense retriever.
ChatCam: Empowering Camera Control through Conversational AI
·1805 words·9 mins
Multimodal Learning Vision-Language Models 🏢 Hong Kong University of Science and Technology
ChatCam empowers users to control cameras via natural language, using CineGPT for text-conditioned trajectory generation and an Anchor Determinator for precise placement, enabling high-quality video r…
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
·2344 words·12 mins
Natural Language Processing Large Language Models 🏢 Zhejiang University
Chat-Scene: Bridging 3D scenes and LLMs using object identifiers for efficient, object-level interaction and improved scene comprehension.
CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition
·2356 words·12 mins
Computer Vision Action Recognition 🏢 Sun Yat-Sen University
CHASE: A novel method for skeleton-based multi-entity action recognition that cleverly adapts skeleton positions to minimize data bias and boost accuracy.
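Reading the title literally, the "convex hull adaptive shift" can be sketched as re-centering the multi-entity skeleton around a point constrained to lie inside the joints' convex hull (softmax weights over all joints). This geometric core is my interpretation for illustration only; in CHASE the combination weights come from a learned network trained end to end.

```python
import numpy as np

def convex_hull_shift(joints, logits):
    """Shift a multi-entity skeleton so its origin is a convex combination
    of all joint positions. joints: (E, J, 3) for E entities, J joints.
    `logits` stands in for the output of a small learned network."""
    points = joints.reshape(-1, 3)        # all E*J candidate points
    w = np.exp(logits - logits.max())
    w /= w.sum()                          # weights on the simplex
    origin = w @ points                   # guaranteed inside the convex hull
    return joints - origin

# Toy usage: two entities positioned far from the coordinate origin.
rng = np.random.default_rng(0)
skeleton = rng.normal(size=(2, 17, 3)) + np.array([5.0, 0.0, 0.0])
shifted = convex_hull_shift(skeleton, logits=np.zeros(2 * 17))
print("mean before:", skeleton.mean(axis=(0, 1)).round(2))
print("mean after :", shifted.mean(axis=(0, 1)).round(2))
```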
Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization
·3113 words·15 mins
Machine Learning Deep Learning 🏢 UC Los Angeles
Boosting in-distribution generalization is achieved by strategically altering the training data distribution to reduce simplicity bias and promote uniform feature learning.
Challenges of Generating Structurally Diverse Graphs
·2126 words·10 mins
AI Theory Optimization 🏢 HSE University
Researchers developed novel algorithms to generate structurally diverse graphs, improving graph algorithm testing and neural network evaluation.
Chain-of-Thought Reasoning Without Prompting
·2324 words·11 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs can reason effectively without prompting by simply adjusting the decoding process to reveal inherent chain-of-thought paths.
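The decoding adjustment can be sketched in a few lines: branch on the top-k first tokens instead of only the greedy one, continue each branch greedily, and prefer the branch whose per-step token distributions are most confident (largest top-1 minus top-2 probability margin). The `step_probs` interface, the toy model, and the margin-over-all-steps scoring below are my simplifications; as I recall, the paper computes this margin over the identified answer tokens.

```python
import numpy as np

def cot_decode(step_probs, k=3, max_len=8, eos=0):
    """Branch on the top-k first tokens, decode each branch greedily, and
    rank branches by mean top1-top2 probability margin (confidence proxy).
    `step_probs(prefix)` stands in for an LLM's next-token distribution."""
    p0 = step_probs(())
    scored = []
    for t0 in np.argsort(p0)[::-1][:k]:            # top-k branch points
        path, margins = [int(t0)], []
        while len(path) < max_len and path[-1] != eos:
            p = step_probs(tuple(path))
            top2 = np.partition(p, -2)[-2:]        # [second-largest, largest]
            margins.append(float(top2[1] - top2[0]))
            path.append(int(np.argmax(p)))         # greedy continuation
        scored.append((float(np.mean(margins)) if margins else 0.0, path))
    scored.sort(key=lambda s: s[0], reverse=True)  # most confident first
    return scored

# Toy stand-in model: a deterministic table of next-token distributions.
def toy_step_probs(prefix, vocab=6):
    local = np.random.default_rng(abs(hash(prefix)) % 2**32)
    return local.dirichlet(np.full(vocab, 0.3))

for confidence, path in cot_decode(toy_step_probs):
    print(f"confidence={confidence:.3f}  path={path}")
```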
Chain of Thoughtlessness? An Analysis of CoT in Planning
·2944 words·14 mins
Natural Language Processing Large Language Models 🏢 Arizona State University
Chain of Thought prompting in LLMs offers limited generalizability, providing performance gains only when prompts are highly specific to problem types, highlighting a critical trade-off between perfor…