🏢 Harbin Institute of Technology
Toward a Stable, Fair, and Comprehensive Evaluation of Object Hallucination in Large Vision-Language Models
·2235 words·11 mins·
Multimodal Learning
Vision-Language Models
🏢 Harbin Institute of Technology
LeHaCE: a novel framework for evaluating object hallucination in LVLMs, improving evaluation stability and fairness by accounting for instruction-induced image description length variations.
Structured Matrix Basis for Multivariate Time Series Forecasting with Interpretable Dynamics
·2313 words·11 mins·
AI Generated
Machine Learning
Deep Learning
🏢 Harbin Institute of Technology
Sumba: a novel forecasting model achieves up to 8.5% improvement by using a structured matrix basis to generate dynamic spatial structures with lower variance and better interpretability.
Rethinking Imbalance in Image Super-Resolution for Efficient Inference
·2134 words·11 mins·
Computer Vision
Image Generation
🏢 Harbin Institute of Technology
WBSR: A novel framework for efficient image super-resolution that tackles data and model imbalances, achieving superior performance with an approximately 34% reduction in computational cost.
Parameter Competition Balancing for Model Merging
·3629 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Harbin Institute of Technology
PCB-MERGING: A training-free model merging technique boosts performance by intelligently balancing parameter competition across multiple tasks.
Optimal Transport-based Labor-free Text Prompt Modeling for Sketch Re-identification
·3342 words·16 mins·
AI Generated
Computer Vision
Image Re-Identification
🏢 Harbin Institute of Technology
Optimal Transport-based Labor-free Text Prompt Modeling (OLTM) leverages VQA and optimal transport for highly accurate sketch-based person re-identification without manual labeling.
MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
·2311 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Harbin Institute of Technology
MoGU: A framework dynamically balances safety and usability in LLMs by routing benign and malicious instructions to different LLM variants, leading to safer, more useful responses.
Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance
·2532 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Harbin Institute of Technology
Boosting LLMs’ abstract reasoning via ‘Meaningful Learning’: A new dataset and learning paradigm significantly enhance LLMs’ capacity for abstract reasoning, moving beyond simple memorization.
LG-VQ: Language-Guided Codebook Learning
·3656 words·18 mins·
Multimodal Learning
Vision-Language Models
🏢 Harbin Institute of Technology
LG-VQ: A novel language-guided codebook learning framework boosts multi-modal performance.
High-Resolution Image Harmonization with Adaptive-Interval Color Transformation
·3030 words·15 mins·
Computer Vision
Image Generation
🏢 Harbin Institute of Technology
AICT: Adaptive-Interval Color Transformation harmonizes high-resolution images by predicting pixel-wise color changes, adaptively adjusting sampling intervals to capture local variations, and using a …
Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration
·2339 words·11 mins·
Large Language Models
🏢 Harbin Institute of Technology
DEEPEN: a training-free LLM ensemble framework fusing probability distributions in a relative space to overcome vocabulary misalignment, improving performance consistently across benchmarks.
EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals
·3265 words·16 mins·
Machine Learning
Self-Supervised Learning
🏢 Harbin Institute of Technology
EEGPT: A pretrained transformer model revolutionizes EEG signal representation by using a dual self-supervised learning method, achieving state-of-the-art results across various tasks.
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation
·2382 words·12 mins·
Large Language Models
🏢 Harbin Institute of Technology
FUNCODER: a novel code generation framework that combines a divide-and-conquer strategy with functional consensus to generate code meeting complex requirements.
Discrete Modeling via Boundary Conditional Diffusion Processes
·2908 words·14 mins·
AI Generated
Natural Language Processing
Text Generation
🏢 Harbin Institute of Technology
Bridging the gap between continuous diffusion models and discrete data, this work introduces a novel boundary-conditional approach achieving superior performance in language modeling and image generat…