🏢 University of Science and Technology of China

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
·2766 words·13 mins
AI Generated Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
VideoLLM-MoD boosts online video-language model efficiency by selectively skipping redundant vision token computations, achieving ~42% faster training and ~30% memory savings without sacrificing perfo…
Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions
·2152 words·11 mins
Machine Learning Reinforcement Learning 🏢 University of Science and Technology of China
TRACER, a novel robust offline RL algorithm, uses Bayesian inference to handle uncertainty from diverse data corruptions, significantly outperforming existing methods.
Towards Next-Generation Logic Synthesis: A Scalable Neural Circuit Generation Framework
·2522 words·12 mins
AI Applications Hardware Design 🏢 University of Science and Technology of China
A novel regularized triangle-shaped neural network framework, T-Net, achieves highly accurate and scalable logic circuit generation, significantly outperforming existing methods.
Towards Neuron Attributions in Multi-Modal Large Language Models
·1551 words·8 mins
Natural Language Processing Large Language Models 🏢 University of Science and Technology of China
NAM: a novel neuron attribution method for MLLMs, revealing modality-specific semantic knowledge and enabling multi-modal knowledge editing.
Towards Accurate and Fair Cognitive Diagnosis via Monotonic Data Augmentation
·2089 words·10 mins
AI Applications Education 🏢 University of Science and Technology of China
CMCD framework tackles data sparsity in cognitive diagnosis by using monotonic data augmentation to improve accuracy and fairness of diagnostic results.
Toward Dynamic Non-Line-of-Sight Imaging with Mamba Enforced Temporal Consistency
·2152 words·11 mins
Computer Vision 3D Vision 🏢 University of Science and Technology of China
Dynamic NLOS imaging gets a speed boost! New ST-Mamba method leverages temporal consistency across frames for high-resolution video reconstruction, overcoming speed limitations of traditional methods.
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
·2422 words·12 mins
Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
TabPedia: a novel large vision-language model, achieves superior visual table understanding by seamlessly integrating diverse tasks via a concept synergy mechanism and a new benchmark.
SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models
·1983 words·10 mins
AI Applications Education 🏢 University of Science and Technology of China
SocraticLM achieves a Socratic teaching paradigm, surpassing GPT-4 by 12%, through a novel multi-agent training pipeline and a comprehensive evaluation system.
Neural Krylov Iteration for Accelerating Linear System Solving
·2149 words·11 mins
🏢 University of Science and Technology of China
Neural Krylov Iteration (NeurKItt) accelerates linear system solving by using a neural operator to predict invariant subspaces, drastically reducing iteration counts and computation time.
MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
·3095 words·15 mins
AI Generated Computer Vision 3D Vision 🏢 University of Science and Technology of China
MotionGS enhances deformable 3D Gaussian splatting for dynamic scenes by using motion flow to guide deformation, significantly improving reconstruction accuracy and outperforming state-of-the-art meth…
MILP-StuDio: MILP Instance Generation via Block Structure Decomposition
·3596 words·17 mins
AI Theory Optimization 🏢 University of Science and Technology of China
MILP-StuDio generates high-quality mixed-integer linear programming instances by preserving crucial block structures, significantly improving learning-based solver performance.
Masked Pre-training Enables Universal Zero-shot Denoiser
·4914 words·24 mins
Computer Vision Image Generation 🏢 University of Science and Technology of China
Masked Pre-training empowers a universal, fast zero-shot image denoiser!
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
·3183 words·15 mins
AI Generated Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
LoTLIP boosts language-image pre-training for superior long text understanding by cleverly integrating corner tokens and utilizing a massive dataset of 100M long-caption images.
Improving Generalization of Dynamic Graph Learning via Environment Prompt
·2632 words·13 mins
AI Applications Smart Cities 🏢 University of Science and Technology of China
EpoD, a novel dynamic graph learning model, significantly improves generalization via a self-prompted learning mechanism for environment inference and a structural causal model utilizing dynamic subgr…
How Control Information Influences Multilingual Text Image Generation and Editing?
·2075 words·10 mins
Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
TextGen enhances multilingual visual text generation and editing by optimizing control information using Fourier analysis and a two-stage framework, achieving state-of-the-art results.
Homology Consistency Constrained Efficient Tuning for Vision-Language Models
·1675 words·8 mins
Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
Constraining vision-language model tuning via persistent homology ensures consistent image-text alignment, improving few-shot learning and domain generalization.
Get Rid of Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework
·2117 words·10 mins
AI Applications Smart Cities 🏢 University of Science and Technology of China
CMuST: a novel continuous multi-task spatiotemporal learning framework tackles urban data limitations by enabling cross-interactions and task-level cooperation for enhanced generalization and adaptabi…
Generalization Error Bounds for Two-stage Recommender Systems with Tree Structure
·386 words·2 mins
AI Theory Generalization 🏢 University of Science and Technology of China
Two-stage recommender systems using tree structures achieve better generalization with more branches and harmonized training data distributions across stages.
FreqMark: Invisible Image Watermarking via Frequency Based Optimization in Latent Space
·3551 words·17 mins
AI Generated Computer Vision Image Generation 🏢 University of Science and Technology of China
FreqMark: Robust invisible image watermarking via latent frequency space optimization, resisting regeneration attacks and achieving >90% bit accuracy with high image quality.
Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting
·1899 words·9 mins
Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
Frolic: A label-free framework boosts zero-shot vision model accuracy by learning prompt distributions and correcting label bias, achieving state-of-the-art performance across multiple datasets.