🏢 University of Science and Technology of China
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
2766 words · 13 mins
AI Generated
Multimodal Learning
Vision-Language Models
🏢 University of Science and Technology of China
VIDEOLLM-MOD boosts online video-language model efficiency by selectively skipping redundant vision token computations, achieving ~42% faster training and ~30% memory savings without sacrificing perfo…
Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions
2152 words · 11 mins
Machine Learning
Reinforcement Learning
🏢 University of Science and Technology of China
TRACER, a novel robust offline RL algorithm, uses Bayesian inference to handle uncertainty from diverse data corruptions, significantly outperforming existing methods.
Towards Next-Generation Logic Synthesis: A Scalable Neural Circuit Generation Framework
2522 words · 12 mins
AI Applications
Hardware Design
🏢 University of Science and Technology of China
A novel regularized triangle-shaped neural network framework, T-Net, achieves highly accurate and scalable logic circuit generation, significantly outperforming existing methods.
Towards Neuron Attributions in Multi-Modal Large Language Models
1551 words · 8 mins
Natural Language Processing
Large Language Models
🏢 University of Science and Technology of China
NAM: a novel neuron attribution method for MLLMs, revealing modality-specific semantic knowledge and enabling multi-modal knowledge editing.
Towards Accurate and Fair Cognitive Diagnosis via Monotonic Data Augmentation
2089 words · 10 mins
AI Applications
Education
🏢 University of Science and Technology of China
CMCD framework tackles data sparsity in cognitive diagnosis by using monotonic data augmentation to improve accuracy and fairness of diagnostic results.
Toward Dynamic Non-Line-of-Sight Imaging with Mamba Enforced Temporal Consistency
2152 words · 11 mins
Computer Vision
3D Vision
🏢 University of Science and Technology of China
Dynamic NLOS imaging gets a speed boost! The new ST-Mamba method leverages temporal consistency across frames for high-resolution video reconstruction, overcoming the speed limitations of traditional methods.
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
2422 words · 12 mins
Multimodal Learning
Vision-Language Models
🏢 University of Science and Technology of China
TabPedia: a novel large vision-language model, achieves superior visual table understanding by seamlessly integrating diverse tasks via a concept synergy mechanism and a new benchmark.
SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models
1983 words · 10 mins
AI Applications
Education
🏢 University of Science and Technology of China
SocraticLM achieves a Socratic teaching paradigm, surpassing GPT-4 by 12%, through a novel multi-agent training pipeline and a comprehensive evaluation system.
Neural Krylov Iteration for Accelerating Linear System Solving
2149 words · 11 mins
🏢 University of Science and Technology of China
Neural Krylov Iteration (NeurKItt) accelerates linear system solving by using a neural operator to predict invariant subspaces, drastically reducing iteration counts and computation time.
MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
3095 words · 15 mins
AI Generated
Computer Vision
3D Vision
🏢 University of Science and Technology of China
MotionGS enhances deformable 3D Gaussian splatting for dynamic scenes by using motion flow to guide deformation, significantly improving reconstruction accuracy and outperforming state-of-the-art meth…
MILP-StuDio: MILP Instance Generation via Block Structure Decomposition
3596 words · 17 mins
AI Theory
Optimization
🏢 University of Science and Technology of China
MILP-StuDio generates high-quality mixed-integer linear programming instances by preserving crucial block structures, significantly improving learning-based solver performance.
Masked Pre-training Enables Universal Zero-shot Denoiser
4914 words · 24 mins
Computer Vision
Image Generation
🏢 University of Science and Technology of China
Masked Pre-training empowers a universal, fast zero-shot image denoiser!
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
3183 words · 15 mins
AI Generated
Multimodal Learning
Vision-Language Models
🏢 University of Science and Technology of China
LoTLIP boosts language-image pre-training for superior long text understanding by cleverly integrating corner tokens and utilizing a massive dataset of 100M long-caption images.
Improving Generalization of Dynamic Graph Learning via Environment Prompt
2632 words · 13 mins
AI Applications
Smart Cities
🏢 University of Science and Technology of China
EpoD, a novel dynamic graph learning model, significantly improves generalization via a self-prompted learning mechanism for environment inference and a structural causal model utilizing dynamic subgr…
How Control Information Influences Multilingual Text Image Generation and Editing?
2075 words · 10 mins
Multimodal Learning
Vision-Language Models
🏢 University of Science and Technology of China
TextGen enhances multilingual visual text generation and editing by optimizing control information using Fourier analysis and a two-stage framework, achieving state-of-the-art results.
Homology Consistency Constrained Efficient Tuning for Vision-Language Models
1675 words · 8 mins
Multimodal Learning
Vision-Language Models
🏢 University of Science and Technology of China
Constraining vision-language model tuning via persistent homology ensures consistent image-text alignment, improving few-shot learning and domain generalization.
Get Rid of Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework
2117 words · 10 mins
AI Applications
Smart Cities
🏢 University of Science and Technology of China
CMuST: a novel continuous multi-task spatiotemporal learning framework tackles urban data limitations by enabling cross-interactions and task-level cooperation for enhanced generalization and adaptabi…
Generalization Error Bounds for Two-stage Recommender Systems with Tree Structure
386 words · 2 mins
AI Theory
Generalization
🏢 University of Science and Technology of China
Two-stage recommender systems using tree structures achieve better generalization with more branches and harmonized training data distributions across stages.
FreqMark: Invisible Image Watermarking via Frequency Based Optimization in Latent Space
3551 words · 17 mins
AI Generated
Computer Vision
Image Generation
🏢 University of Science and Technology of China
FreqMark: Robust invisible image watermarking via latent frequency space optimization, resisting regeneration attacks and achieving >90% bit accuracy with high image quality.
Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting
1899 words · 9 mins
Multimodal Learning
Vision-Language Models
🏢 University of Science and Technology of China
Frolic: A label-free framework boosts zero-shot vision model accuracy by learning prompt distributions and correcting label bias, achieving state-of-the-art performance across multiple datasets.