
Posters

2024

ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models
·1869 words·9 mins
AI Applications Finance 🏢 University of Waterloo
ClavaDDPM synthesizes multi-relational data using cluster-guided diffusion models, efficiently capturing long-range dependencies and outperforming existing methods.
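The "cluster-guided" bridge can be pictured with a short sketch: cluster the joined parent/child rows and treat the cluster label as the shared conditioning variable linking each table's generative model. Everything below is an illustrative assumption of mine (toy tables, KMeans with 8 clusters, a per-cluster Gaussian standing in for the conditional diffusion model), not ClavaDDPM's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy relational data: each parent row has 5 correlated child rows.
rng = np.random.default_rng(0)
parents = rng.normal(size=(200, 2))
parent_rep = np.repeat(parents, 5, axis=0)
children = parent_rep + rng.normal(scale=0.3, size=(1000, 2))

# Cluster the joined parent/child rows; the label becomes the bridge
# variable that both tables' generative models condition on.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(
    np.hstack([parent_rep, children])
)

# Conditional sampling: pick a cluster, then draw a synthetic child row from
# that cluster's empirical Gaussian (a stand-in for a conditional diffusion
# model). Parent-child dependence flows through the cluster label.
c = int(rng.integers(0, 8))
members = children[labels == c]
synthetic = rng.multivariate_normal(members.mean(axis=0), np.cov(members.T))
print("cluster", c, "synthetic child row:", synthetic.round(2))
```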
Classifier-guided Gradient Modulation for Enhanced Multimodal Learning
·2128 words·10 mins
Multimodal Learning Multimodal Understanding 🏢 Shanghai AI Lab
Classifier-Guided Gradient Modulation (CGGM) enhances multimodal learning by balancing the training process, considering both gradient magnitude and direction, leading to consistent performance improv…
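To make "balancing gradient magnitude and direction" concrete, here is a minimal numpy sketch that equalizes per-modality gradient norms and reports pairwise direction agreement. The equal-norm rule and the `balance_gradients` helper are illustrative assumptions; CGGM derives its modulation signal from a classifier rather than from this simple rescaling.

```python
import numpy as np

def balance_gradients(grads):
    """Rescale each modality's gradient to a common magnitude and report
    pairwise direction agreement (cosine similarity). Illustrative only."""
    norms = {m: np.linalg.norm(g) for m, g in grads.items()}
    target = np.mean(list(norms.values()))              # shared target norm
    balanced = {m: g * target / (norms[m] + 1e-12) for m, g in grads.items()}

    mods = list(grads)
    for i in range(len(mods)):
        for j in range(i + 1, len(mods)):
            gi, gj = balanced[mods[i]], balanced[mods[j]]
            cos = gi @ gj / (np.linalg.norm(gi) * np.linalg.norm(gj) + 1e-12)
            print(f"cos({mods[i]}, {mods[j]}) = {cos:+.3f}")
    return balanced

# Toy usage: an audio gradient that dwarfs the visual one before balancing.
rng = np.random.default_rng(0)
grads = {"audio": 10.0 * rng.normal(size=128), "visual": 0.1 * rng.normal(size=128)}
balanced = balance_gradients(grads)
print({m: round(float(np.linalg.norm(g)), 3) for m, g in balanced.items()})
```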
Classification Done Right for Vision-Language Pre-Training
·1685 words·8 mins
Multimodal Learning Vision-Language Models 🏢 ByteDance Research
SuperClass, a novel vision-language pre-training method, achieves superior performance on various downstream tasks by directly using tokenized raw text as supervised classification labels, eliminating…
Classification Diffusion Models: Revitalizing Density Ratio Estimation
·2385 words·12 mins
Computer Vision Image Generation 🏢 Technion - Israel Institute of Technology
Classification Diffusion Models (CDMs) revolutionize density ratio estimation by integrating the strengths of diffusion models and classifiers, achieving state-of-the-art image generation and likeliho…
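The density-ratio estimation the summary refers to rests on a classical identity: with balanced samples from p and q, the logit of the Bayes-optimal classifier equals log p(x) - log q(x). A minimal scikit-learn sketch on two unit-variance Gaussians (where the true log ratio is exactly 2x) illustrates that principle; the toy setup is my own, not the paper's model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Samples from p = N(+1, 1) and q = N(-1, 1); true log p(x)/q(x) = 2x.
rng = np.random.default_rng(0)
xp = rng.normal(loc=+1.0, size=(5000, 1))
xq = rng.normal(loc=-1.0, size=(5000, 1))
X = np.vstack([xp, xq])
y = np.concatenate([np.ones(len(xp)), np.zeros(len(xq))])   # 1 = "from p"

clf = LogisticRegression().fit(X, y)

# With balanced classes, logit(P(y=1|x)) = log p(x) - log q(x), so the
# decision function is a direct estimate of the log density ratio.
xs = np.linspace(-3, 3, 7).reshape(-1, 1)
for xi, est in zip(xs.ravel(), clf.decision_function(xs)):
    print(f"x={xi:+.1f}  estimated log-ratio={est:+.2f}  true={2 * xi:+.2f}")
```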
Class Distribution Shifts in Zero-Shot Learning: Learning Robust Representations
·2470 words·12 mins
AI Theory Representation Learning 🏢 Hebrew University of Jerusalem
Zero-shot learning models often fail in real-world scenarios due to unseen class distribution shifts. This work introduces a novel algorithm that learns robust representations by creating synthetic d…
CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models
·3973 words·19 mins
Multimodal Learning Vision-Language Models 🏢 University of New South Wales (UNSW Sydney)
CLAP4CLIP enhances vision-language model continual learning by using probabilistic finetuning, improving performance and uncertainty estimation.
CigTime: Corrective Instruction Generation Through Inverse Motion Editing
·2228 words·11 mins
Natural Language Processing Vision-Language Models 🏢 Hong Kong University of Science and Technology
CigTime generates corrective motion instructions from motion pairs using motion editing and large language models. This innovative approach improves upon baselines by leveraging motion triplets for f…
CIFD: Controlled Information Flow to Enhance Knowledge Distillation
·3139 words·15 mins
Multimodal Learning Vision-Language Models 🏢 Samsung Research
CIFD, a novel knowledge distillation method, drastically cuts training costs while boosting performance, particularly for large datasets, by using Rate-Distortion Modules instead of Teacher Assistants…
ChronoEpilogi: Scalable Time Series Selection with Multiple Solutions
·2554 words·12 mins
AI Theory Causality 🏢 University of Cergy Paris
ChronoEpilogi efficiently finds all minimal sets of time-series variables optimally predicting a target, improving forecasting while providing crucial insights for knowledge discovery and causal model…
Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models
·2162 words·11 mins
AI Applications Healthcare 🏢 Cornell University
Chimera: a novel 2D state space model effectively captures complex multivariate time series dependencies, achieving superior forecasting, classification, and anomaly detection.
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
·1676 words·8 mins
Natural Language Processing Large Language Models 🏢 Shanghai University of Finance and Economics
CherryQ, a novel quantization method, leverages parameter heterogeneity in LLMs to achieve superior performance by selectively quantizing less critical parameters while preserving essential ones.
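As a concrete picture of "selectively quantizing less critical parameters while preserving essential ones", here is a hedged numpy sketch of mixed-precision quantization: the most critical weights stay in float while the rest are uniformly quantized to 3 bits. The magnitude-based criticality proxy and the `cherry_quantize` helper are stand-ins of mine; CherryQ's actual heterogeneity-based criterion differs.

```python
import numpy as np

def cherry_quantize(w, keep_frac=0.01, bits=3):
    """Keep the top `keep_frac` of weights by |magnitude| ("cherries") in
    full precision; symmetrically quantize the rest to `bits` bits."""
    flat = w.ravel().copy()
    k = max(1, int(keep_frac * flat.size))
    cherry = np.zeros(flat.size, dtype=bool)
    cherry[np.argsort(np.abs(flat))[-k:]] = True

    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(flat[~cherry]).max() / qmax      # scale ignores cherries
    out = np.clip(np.round(flat / scale), -qmax - 1, qmax) * scale
    out[cherry] = flat[cherry]                      # cherries stay full precision
    return out.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(256, 256))
w.ravel()[rng.choice(w.size, 50, replace=False)] *= 40   # rare outlier weights

naive = cherry_quantize(w, keep_frac=1e-9)   # k=1: effectively no cherries
mixed = cherry_quantize(w, keep_frac=0.001)  # ~65 cherries absorb the outliers
print(f"near-uniform 3-bit MSE: {np.mean((w - naive) ** 2):.2e}")
print(f"cherry-kept  3-bit MSE: {np.mean((w - mixed) ** 2):.2e}")
```

Keeping even 0.1% of the weights in full precision shrinks the reconstruction error substantially here, because outlier weights no longer force a coarse quantization grid onto everything else.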
ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model
·1826 words·9 mins
Natural Language Processing Vision-Language Models 🏢 East China Normal University
ChatTracker boosts visual tracking by intelligently using a large language model to refine object descriptions, achieving performance on par with state-of-the-art methods.
ChatQA: Surpassing GPT-4 on Conversational QA and RAG
·4802 words·23 mins
AI Generated Natural Language Processing Question Answering 🏢 NVIDIA
ChatQA, a new suite of models, outperforms GPT-4 in conversational QA and RAG by using a two-stage instruction tuning method and a cost-effective dense retriever.
ChatCam: Empowering Camera Control through Conversational AI
·1805 words·9 mins
Multimodal Learning Vision-Language Models 🏢 Hong Kong University of Science and Technology
ChatCam empowers users to control cameras via natural language, using CineGPT for text-conditioned trajectory generation and an Anchor Determinator for precise placement, enabling high-quality video r…
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
·2344 words·12 mins
Natural Language Processing Large Language Models 🏢 Zhejiang University
Chat-Scene: Bridging 3D scenes and LLMs using object identifiers for efficient, object-level interaction and improved scene comprehension.
CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition
·2356 words·12 mins
Computer Vision Action Recognition 🏢 Sun Yat-Sen University
CHASE: A novel method for skeleton-based multi-entity action recognition that cleverly adapts skeleton positions to minimize data bias and boost accuracy.
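Reading the title literally, the "convex hull adaptive shift" can be sketched as re-centering the multi-entity skeleton around a point constrained to lie inside the joints' convex hull (softmax weights over all joints). This geometric core is my interpretation for illustration only; in CHASE the combination weights come from a learned network trained end to end.

```python
import numpy as np

def convex_hull_shift(joints, logits):
    """Shift a multi-entity skeleton so its origin is a convex combination
    of all joint positions. joints: (E, J, 3) for E entities, J joints.
    `logits` stands in for the output of a small learned network."""
    points = joints.reshape(-1, 3)        # all E*J candidate points
    w = np.exp(logits - logits.max())
    w /= w.sum()                          # weights on the simplex
    origin = w @ points                   # guaranteed inside the convex hull
    return joints - origin

# Toy usage: two entities positioned far from the coordinate origin.
rng = np.random.default_rng(0)
skeleton = rng.normal(size=(2, 17, 3)) + np.array([5.0, 0.0, 0.0])
shifted = convex_hull_shift(skeleton, logits=np.zeros(2 * 17))
print("mean before:", skeleton.mean(axis=(0, 1)).round(2))
print("mean after :", shifted.mean(axis=(0, 1)).round(2))
```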
Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization
·3113 words·15 mins
Machine Learning Deep Learning 🏢 UC Los Angeles
Boosting in-distribution generalization is achieved by strategically altering the training data distribution to reduce simplicity bias and promote uniform feature learning.
Challenges of Generating Structurally Diverse Graphs
·2126 words·10 mins
AI Theory Optimization 🏢 HSE University
Researchers developed novel algorithms to generate structurally diverse graphs, improving graph algorithm testing and neural network evaluation.
Chain-of-Thought Reasoning Without Prompting
·2324 words·11 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs can reason effectively without prompting by simply adjusting the decoding process to reveal inherent chain-of-thought paths.
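The decoding adjustment can be sketched in a few lines: branch on the top-k first tokens instead of only the greedy one, continue each branch greedily, and prefer the branch whose per-step token distributions are most confident (largest top-1 minus top-2 probability margin). The `step_probs` interface, the toy model, and the margin-over-all-steps scoring below are my simplifications; as I recall, the paper computes this margin over the identified answer tokens.

```python
import numpy as np

def cot_decode(step_probs, k=3, max_len=8, eos=0):
    """Branch on the top-k first tokens, decode each branch greedily, and
    rank branches by mean top1-top2 probability margin (confidence proxy).
    `step_probs(prefix)` stands in for an LLM's next-token distribution."""
    p0 = step_probs(())
    scored = []
    for t0 in np.argsort(p0)[::-1][:k]:            # top-k branch points
        path, margins = [int(t0)], []
        while len(path) < max_len and path[-1] != eos:
            p = step_probs(tuple(path))
            top2 = np.partition(p, -2)[-2:]        # [second-largest, largest]
            margins.append(float(top2[1] - top2[0]))
            path.append(int(np.argmax(p)))         # greedy continuation
        scored.append((float(np.mean(margins)) if margins else 0.0, path))
    scored.sort(key=lambda s: s[0], reverse=True)  # most confident first
    return scored

# Toy stand-in model: a deterministic table of next-token distributions.
def toy_step_probs(prefix, vocab=6):
    local = np.random.default_rng(abs(hash(prefix)) % 2**32)
    return local.dirichlet(np.full(vocab, 0.3))

for confidence, path in cot_decode(toy_step_probs):
    print(f"confidence={confidence:.3f}  path={path}")
```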
Chain of Thoughtlessness? An Analysis of CoT in Planning
·2944 words·14 mins
Natural Language Processing Large Language Models 🏢 Arizona State University
Chain of Thought prompting in LLMs offers limited generalizability, providing performance gains only when prompts are highly specific to problem types, highlighting a critical trade-off between perfor…