🏢 Harbin Institute of Technology
Toward a Stable, Fair, and Comprehensive Evaluation of Object Hallucination in Large Vision-Language Models
·2235 words·11 mins·
Multimodal Learning
Vision-Language Models
🏢 Harbin Institute of Technology
LeHaCE: a novel framework for evaluating object hallucination in LVLMs, improving evaluation stability and fairness by accounting for instruction-induced image description length variations.
Structured Matrix Basis for Multivariate Time Series Forecasting with Interpretable Dynamics
·2313 words·11 mins·
AI Generated
Machine Learning
Deep Learning
🏢 Harbin Institute of Technology
Sumba: a novel forecasting model achieves up to 8.5% improvement by using a structured matrix basis to generate dynamic spatial structures with lower variance and better interpretability.
Rethinking Imbalance in Image Super-Resolution for Efficient Inference
·2134 words·11 mins·
Computer Vision
Image Generation
🏢 Harbin Institute of Technology
WBSR: A novel framework for efficient image super-resolution that tackles data and model imbalances, achieving superior performance with an approximately 34% reduction in computational cost.
Parameter Competition Balancing for Model Merging
·3629 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Harbin Institute of Technology
PCB-MERGING: A training-free model merging technique boosts performance by intelligently balancing parameter competition across multiple tasks.
Optimal Transport-based Labor-free Text Prompt Modeling for Sketch Re-identification
·3342 words·16 mins·
AI Generated
Computer Vision
Image Re-Identification
🏢 Harbin Institute of Technology
Optimal Transport-based Labor-free Text Prompt Modeling (OLTM) leverages VQA and optimal transport for highly accurate sketch-based person re-identification without manual labeling.
MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
·2311 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Harbin Institute of Technology
MoGU: A framework dynamically balances safety and usability in LLMs by routing benign and malicious instructions to different LLM variants, leading to safer, more useful responses.
Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance
·2532 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Harbin Institute of Technology
Boosting LLMs’ abstract reasoning via ‘Meaningful Learning’: A new dataset and learning paradigm significantly enhance LLMs’ capacity for abstract reasoning, moving beyond simple memorization.
LG-VQ: Language-Guided Codebook Learning
·3656 words·18 mins·
Multimodal Learning
Vision-Language Models
🏢 Harbin Institute of Technology
LG-VQ: A novel language-guided codebook learning framework boosts multi-modal performance.
High-Resolution Image Harmonization with Adaptive-Interval Color Transformation
·3030 words·15 mins·
Computer Vision
Image Generation
🏢 Harbin Institute of Technology
AICT: Adaptive-Interval Color Transformation harmonizes high-resolution images by predicting pixel-wise color changes, adaptively adjusting sampling intervals to capture local variations, and using a …
Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration
·2339 words·11 mins·
Large Language Models
🏢 Harbin Institute of Technology
DEEPEN: a training-free LLM ensemble framework fusing probability distributions in a relative space to overcome vocabulary misalignment, improving performance consistently across benchmarks.
EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals
·3265 words·16 mins·
Machine Learning
Self-Supervised Learning
🏢 Harbin Institute of Technology
EEGPT: A pretrained transformer model revolutionizes EEG signal representation by using a dual self-supervised learning method, achieving state-of-the-art results across various tasks.
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation
·2382 words·12 mins·
Large Language Models
🏢 Harbin Institute of Technology
FUNCODER: a novel code generation framework that combines a divide-and-conquer strategy with functional consensus to generate code meeting complex requirements.
Discrete Modeling via Boundary Conditional Diffusion Processes
·2908 words·14 mins·
AI Generated
Natural Language Processing
Text Generation
🏢 Harbin Institute of Technology
Bridging the gap between continuous diffusion models and discrete data, this work introduces a novel boundary-conditional approach achieving superior performance in language modeling and image generat…