🏢 University of Science and Technology of China

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
·2766 words·13 mins
AI Generated Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
VideoLLM-MoD boosts online video-language model efficiency by selectively skipping redundant vision token computations, achieving ~42% faster training and ~30% memory savings without sacrificing perfo…
Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions
·2152 words·11 mins
Machine Learning Reinforcement Learning 🏢 University of Science and Technology of China
TRACER, a novel robust offline RL algorithm, uses Bayesian inference to handle uncertainty from diverse data corruptions, significantly outperforming existing methods.
Towards Next-Generation Logic Synthesis: A Scalable Neural Circuit Generation Framework
·2522 words·12 mins
AI Applications Hardware Design 🏢 University of Science and Technology of China
A novel regularized triangle-shaped neural network framework, T-Net, achieves highly accurate and scalable logic circuit generation, significantly outperforming existing methods.
Towards Neuron Attributions in Multi-Modal Large Language Models
·1551 words·8 mins
Natural Language Processing Large Language Models 🏢 University of Science and Technology of China
NAM: a novel neuron attribution method for MLLMs, revealing modality-specific semantic knowledge and enabling multi-modal knowledge editing.
Towards Accurate and Fair Cognitive Diagnosis via Monotonic Data Augmentation
·2089 words·10 mins
AI Applications Education 🏢 University of Science and Technology of China
CMCD framework tackles data sparsity in cognitive diagnosis by using monotonic data augmentation to improve accuracy and fairness of diagnostic results.
Toward Dynamic Non-Line-of-Sight Imaging with Mamba Enforced Temporal Consistency
·2152 words·11 mins
Computer Vision 3D Vision 🏢 University of Science and Technology of China
Dynamic NLOS imaging gets a speed boost! New ST-Mamba method leverages temporal consistency across frames for high-resolution video reconstruction, overcoming speed limitations of traditional methods.
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
·2422 words·12 mins
Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
TabPedia: a novel large vision-language model, achieves superior visual table understanding by seamlessly integrating diverse tasks via a concept synergy mechanism and a new benchmark.
SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models
·1983 words·10 mins
AI Applications Education 🏢 University of Science and Technology of China
SocraticLM achieves a Socratic teaching paradigm, surpassing GPT-4 by 12%, through a novel multi-agent training pipeline and a comprehensive evaluation system.
Neural Krylov Iteration for Accelerating Linear System Solving
·2149 words·11 mins
🏢 University of Science and Technology of China
Neural Krylov Iteration (NeurKItt) accelerates linear system solving by using a neural operator to predict invariant subspaces, drastically reducing iteration counts and computation time.
MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
·3095 words·15 mins
AI Generated Computer Vision 3D Vision 🏢 University of Science and Technology of China
MotionGS enhances deformable 3D Gaussian splatting for dynamic scenes by using motion flow to guide deformation, significantly improving reconstruction accuracy and outperforming state-of-the-art meth…
MILP-StuDio: MILP Instance Generation via Block Structure Decomposition
·3596 words·17 mins
AI Theory Optimization 🏢 University of Science and Technology of China
MILP-StuDio generates high-quality mixed-integer linear programming instances by preserving crucial block structures, significantly improving learning-based solver performance.
Masked Pre-training Enables Universal Zero-shot Denoiser
·4914 words·24 mins
Computer Vision Image Generation 🏢 University of Science and Technology of China
Masked Pre-training empowers a universal, fast zero-shot image denoiser!
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
·3183 words·15 mins
AI Generated Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
LoTLIP boosts language-image pre-training for superior long text understanding by cleverly integrating corner tokens and utilizing a massive dataset of 100M long-caption images.
Improving Generalization of Dynamic Graph Learning via Environment Prompt
·2632 words·13 mins
AI Applications Smart Cities 🏢 University of Science and Technology of China
EpoD, a novel dynamic graph learning model, significantly improves generalization via a self-prompted learning mechanism for environment inference and a structural causal model utilizing dynamic subgr…
How Control Information Influences Multilingual Text Image Generation and Editing?
·2075 words·10 mins
Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
TextGen enhances multilingual visual text generation and editing by optimizing control information using Fourier analysis and a two-stage framework, achieving state-of-the-art results.
Homology Consistency Constrained Efficient Tuning for Vision-Language Models
·1675 words·8 mins
Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
Constraining vision-language model tuning via persistent homology ensures consistent image-text alignment, improving few-shot learning and domain generalization.
Get Rid of Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework
·2117 words·10 mins
AI Applications Smart Cities 🏢 University of Science and Technology of China
CMuST: a novel continuous multi-task spatiotemporal learning framework tackles urban data limitations by enabling cross-interactions and task-level cooperation for enhanced generalization and adaptabi…
Generalization Error Bounds for Two-stage Recommender Systems with Tree Structure
·386 words·2 mins
AI Theory Generalization 🏢 University of Science and Technology of China
Two-stage recommender systems using tree structures achieve better generalization with more branches and harmonized training data distributions across stages.
FreqMark: Invisible Image Watermarking via Frequency Based Optimization in Latent Space
·3551 words·17 mins
AI Generated Computer Vision Image Generation 🏢 University of Science and Technology of China
FreqMark: Robust invisible image watermarking via latent frequency space optimization, resisting regeneration attacks and achieving >90% bit accuracy with high image quality.
Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting
·1899 words·9 mins
Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
Frolic: A label-free framework boosts zero-shot vision model accuracy by learning prompt distributions and correcting label bias, achieving state-of-the-art performance across multiple datasets.