
🏢 Zhejiang University

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
·2672 words·13 mins
AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University
ZipCache: Efficient KV cache quantization for LLMs using salient token identification, achieving 4.98x compression with minimal accuracy loss!
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
·3638 words·18 mins
AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University
WISE, a novel dual-memory architecture, solves the impossible triangle of reliability, generalization, and locality in lifelong LLM editing by employing a side memory for knowledge updates and a route…
Vision-Language Navigation with Energy-Based Policy
·1855 words·9 mins
Multimodal Learning Vision-Language Models 🏢 Zhejiang University
Energy-based Navigation Policy (ENP) revolutionizes Vision-Language Navigation by modeling joint state-action distributions, achieving superior performance across diverse benchmarks.
Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation
·2279 words·11 mins
Computer Vision Image Segmentation 🏢 Zhejiang University
DiffewS, a novel framework, leverages diffusion models for few-shot semantic segmentation, significantly outperforming existing methods in multiple settings.
UniIF: Unified Molecule Inverse Folding
·2175 words·11 mins
AI Generated Machine Learning Deep Learning 🏢 Zhejiang University
UniIF: A unified model revolutionizes molecule inverse folding, achieving state-of-the-art results across protein, RNA, and material design by employing a novel geometric block attention network.
Unified Generative and Discriminative Training for Multi-modal Large Language Models
·3972 words·19 mins
Multimodal Learning Vision-Language Models 🏢 Zhejiang University
Unified generative-discriminative training boosts multimodal large language models (MLLMs)! Sugar, a novel approach, leverages dynamic sequence alignment and a triple kernel to enhance global and fin…
Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration
·1888 words·9 mins
Multimodal Learning Vision-Language Models 🏢 Zhejiang University
UniKE: A unified multimodal editing method achieves superior reliability, generality, and locality by disentangling knowledge into semantic and truthfulness spaces, enabling enhanced collaboration bet…
TopoFR: A Closer Look at Topology Alignment on Face Recognition
·2430 words·12 mins
Computer Vision Face Recognition 🏢 Zhejiang University
TopoFR enhances face recognition by aligning topological structures between input and latent spaces. Using persistent homology, it preserves crucial data structure info, overcoming overfitting. A har…
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
·4872 words·23 mins
Large Language Models 🏢 Zhejiang University
TOPA: Extending LLMs for video understanding using only text data.
TFGDA: Exploring Topology and Feature Alignment in Semi-supervised Graph Domain Adaptation through Robust Clustering
·1822 words·9 mins
Machine Learning Transfer Learning 🏢 Zhejiang University
TFGDA: Leveraging graph topology and feature alignment for superior semi-supervised domain adaptation.
Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting
·3727 words·18 mins
AI Generated Computer Vision 3D Vision 🏢 Zhejiang University
Spec-Gaussian enhances 3D Gaussian splatting by using anisotropic spherical Gaussians for view-dependent appearance modeling, achieving superior real-time rendering of scenes with specular and anisotr…
Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding
·391 words·2 mins
Machine Learning Reinforcement Learning 🏢 Zhejiang University
SDEPO, a new natural policy gradient algorithm, efficiently solves zero-sum Markov games with continuous state spaces, achieving near-optimal convergence independent of state space cardinality.
Simple and Fast Distillation of Diffusion Models
·3151 words·15 mins
Computer Vision Image Generation 🏢 Zhejiang University
Simple and Fast Distillation (SFD) drastically accelerates diffusion model training by 1000x, achieving state-of-the-art results in few-step image generation with minimal fine-tuning.
Scene Graph Generation with Role-Playing Large Language Models
·2597 words·13 mins
Multimodal Learning Vision-Language Models 🏢 Zhejiang University
SDSGG outperforms leading scene graph generation methods by using LLMs to create scene-specific descriptions, adapting to diverse visual relations.
Rethinking the Diffusion Models for Missing Data Imputation: A Gradient Flow Perspective
·3317 words·16 mins
Machine Learning Unsupervised Learning 🏢 Zhejiang University
NewImp boosts diffusion models’ missing data imputation by curbing sample diversity and eliminating data masking, achieving superior accuracy.
PSL: Rethinking and Improving Softmax Loss from Pairwise Perspective for Recommendation
·2716 words·13 mins
AI Applications Recommendation Systems 🏢 Zhejiang University
Pairwise Softmax Loss (PSL) improves recommendation accuracy by enhancing Softmax Loss (SL) with alternative activation functions, resulting in tighter ranking metric surrogates and better noise resis…
PowerPM: Foundation Model for Power Systems
·2167 words·11 mins
AI Applications Smart Cities 🏢 Zhejiang University
PowerPM: A foundation model revolutionizing power system analysis by mastering complex ETS data through a novel self-supervised pre-training approach, achieving state-of-the-art performance.
PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection
·5033 words·24 mins
AI Generated Computer Vision 3D Vision 🏢 Zhejiang University
PointAD: a novel zero-shot 3D anomaly detection method that uses CLIP’s strong generalization abilities to identify anomalies in unseen objects by transferring knowledge from both points and pixels.
PhyloGen: Language Model-Enhanced Phylogenetic Inference via Graph Structure Generation
·3549 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University
PhyloGen uses a genomic language model to generate and optimize phylogenetic trees, offering faster and more accurate evolutionary analysis than traditional methods.
Parallelizing Model-based Reinforcement Learning Over the Sequence Length
·2553 words·12 mins
Machine Learning Reinforcement Learning 🏢 Zhejiang University
The PaMoRL framework speeds up model-based reinforcement learning by parallelizing the model and policy learning stages over the sequence length while maintaining high sample efficiency.