🏢 University of Science and Technology of China

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

26 September 2024·2766 words·13 mins

AI Generated Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China

VideoLLM-MoD boosts online video-language model efficiency by selectively skipping redundant vision token computations, achieving ~42% faster training and ~30% memory savings without sacrificing perfo…

Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions

26 September 2024·2152 words·11 mins

Machine Learning Reinforcement Learning 🏢 University of Science and Technology of China

TRACER, a novel robust offline RL algorithm, uses Bayesian inference to handle uncertainty from diverse data corruptions, significantly outperforming existing methods.

Towards Next-Generation Logic Synthesis: A Scalable Neural Circuit Generation Framework

26 September 2024·2522 words·12 mins

AI Applications Hardware Design 🏢 University of Science and Technology of China

A novel regularized triangle-shaped neural network framework, T-Net, achieves highly accurate and scalable logic circuit generation, significantly outperforming existing methods.

Towards Neuron Attributions in Multi-Modal Large Language Models

26 September 2024·1551 words·8 mins

Natural Language Processing Large Language Models 🏢 University of Science and Technology of China

NAM: a novel neuron attribution method for MLLMs, revealing modality-specific semantic knowledge and enabling multi-modal knowledge editing.

Towards Accurate and Fair Cognitive Diagnosis via Monotonic Data Augmentation

26 September 2024·2089 words·10 mins

AI Applications Education 🏢 University of Science and Technology of China

CMCD framework tackles data sparsity in cognitive diagnosis by using monotonic data augmentation to improve accuracy and fairness of diagnostic results.

Toward Dynamic Non-Line-of-Sight Imaging with Mamba Enforced Temporal Consistency

26 September 2024·2152 words·11 mins

Computer Vision 3D Vision 🏢 University of Science and Technology of China

The new ST-Mamba method speeds up dynamic NLOS imaging by leveraging temporal consistency across frames for high-resolution video reconstruction, overcoming the speed limitations of traditional methods.

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

26 September 2024·2422 words·12 mins

Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China

TabPedia: a novel large vision-language model, achieves superior visual table understanding by seamlessly integrating diverse tasks via a concept synergy mechanism and a new benchmark.

SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models

26 September 2024·1983 words·10 mins

AI Applications Education 🏢 University of Science and Technology of China

SocraticLM achieves a Socratic teaching paradigm, surpassing GPT-4 by 12%, through a novel multi-agent training pipeline and a comprehensive evaluation system.

Neural Krylov Iteration for Accelerating Linear System Solving

26 September 2024·2149 words·11 mins

🏢 University of Science and Technology of China

Neural Krylov Iteration (NeurKItt) accelerates linear system solving by using a neural operator to predict invariant subspaces, drastically reducing iteration counts and computation time.

MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting

26 September 2024·3095 words·15 mins

AI Generated Computer Vision 3D Vision 🏢 University of Science and Technology of China

MotionGS enhances deformable 3D Gaussian splatting for dynamic scenes by using motion flow to guide deformation, significantly improving reconstruction accuracy and outperforming state-of-the-art meth…

MILP-StuDio: MILP Instance Generation via Block Structure Decomposition

26 September 2024·3596 words·17 mins

AI Theory Optimization 🏢 University of Science and Technology of China

MILP-StuDio generates high-quality mixed-integer linear programming instances by preserving crucial block structures, significantly improving learning-based solver performance.

Masked Pre-training Enables Universal Zero-shot Denoiser

26 September 2024·4914 words·24 mins

Computer Vision Image Generation 🏢 University of Science and Technology of China

Masked Pre-training empowers a universal, fast zero-shot image denoiser!

LoTLIP: Improving Language-Image Pre-training for Long Text Understanding

26 September 2024·3183 words·15 mins

AI Generated Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China

LoTLIP boosts language-image pre-training for superior long text understanding by cleverly integrating corner tokens and utilizing a massive dataset of 100M long-caption images.

Improving Generalization of Dynamic Graph Learning via Environment Prompt

26 September 2024·2632 words·13 mins

AI Applications Smart Cities 🏢 University of Science and Technology of China

EpoD, a novel dynamic graph learning model, significantly improves generalization via a self-prompted learning mechanism for environment inference and a structural causal model utilizing dynamic subgr…

How Control Information Influences Multilingual Text Image Generation and Editing?

26 September 2024·2075 words·10 mins

Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China

TextGen enhances multilingual visual text generation and editing by optimizing control information using Fourier analysis and a two-stage framework, achieving state-of-the-art results.

Homology Consistency Constrained Efficient Tuning for Vision-Language Models

26 September 2024·1675 words·8 mins

Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China

Constraining vision-language model tuning via persistent homology ensures consistent image-text alignment, improving few-shot learning and domain generalization.

Get Rid of Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework

26 September 2024·2117 words·10 mins

AI Applications Smart Cities 🏢 University of Science and Technology of China

CMuST: a novel continuous multi-task spatiotemporal learning framework tackles urban data limitations by enabling cross-interactions and task-level cooperation for enhanced generalization and adaptabi…

Generalization Error Bounds for Two-stage Recommender Systems with Tree Structure

26 September 2024·386 words·2 mins

AI Theory Generalization 🏢 University of Science and Technology of China

Two-stage recommender systems using tree structures achieve better generalization with more branches and harmonized training data distributions across stages.

FreqMark: Invisible Image Watermarking via Frequency Based Optimization in Latent Space

26 September 2024·3551 words·17 mins

AI Generated Computer Vision Image Generation 🏢 University of Science and Technology of China

FreqMark: Robust invisible image watermarking via latent frequency space optimization, resisting regeneration attacks and achieving >90% bit accuracy with high image quality.

Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting

26 September 2024·1899 words·9 mins

Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China

Frolic: A label-free framework boosts zero-shot vision model accuracy by learning prompt distributions and correcting label bias, achieving state-of-the-art performance across multiple datasets.