🏢 Zhejiang University

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification

26 September 2024·2672 words·13 mins

AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University

ZipCache: Efficient KV cache quantization for LLMs using salient token identification, achieving 4.98x compression with minimal accuracy loss!

WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

26 September 2024·3638 words·18 mins

AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University

WISE, a novel dual-memory architecture, solves the impossible triangle of reliability, generalization, and locality in lifelong LLM editing by employing a side memory for knowledge updates and a route…

Vision-Language Navigation with Energy-Based Policy

26 September 2024·1855 words·9 mins

Multimodal Learning Vision-Language Models 🏢 Zhejiang University

Energy-based Navigation Policy (ENP) revolutionizes Vision-Language Navigation by modeling joint state-action distributions, achieving superior performance across diverse benchmarks.

Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation

26 September 2024·2279 words·11 mins

Computer Vision Image Segmentation 🏢 Zhejiang University

DiffewS: a novel framework that leverages diffusion models for few-shot semantic segmentation, significantly outperforming existing methods in multiple settings.

UniIF: Unified Molecule Inverse Folding

26 September 2024·2175 words·11 mins

AI Generated Machine Learning Deep Learning 🏢 Zhejiang University

UniIF: A unified model revolutionizes molecule inverse folding, achieving state-of-the-art results across protein, RNA, and material design by employing a novel geometric block attention network.

Unified Generative and Discriminative Training for Multi-modal Large Language Models

26 September 2024·3972 words·19 mins

Multimodal Learning Vision-Language Models 🏢 Zhejiang University

Unified generative-discriminative training boosts multimodal large language models (MLLMs)! Sugar, a novel approach, leverages dynamic sequence alignment and a triple kernel to enhance global and fin…

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration

26 September 2024·1888 words·9 mins

Multimodal Learning Vision-Language Models 🏢 Zhejiang University

UniKE: A unified multimodal editing method achieves superior reliability, generality, and locality by disentangling knowledge into semantic and truthfulness spaces, enabling enhanced collaboration bet…

TopoFR: A Closer Look at Topology Alignment on Face Recognition

26 September 2024·2430 words·12 mins

Computer Vision Face Recognition 🏢 Zhejiang University

TopoFR enhances face recognition by aligning topological structures between the input and latent spaces. Using persistent homology, it preserves crucial structural information in the data and mitigates overfitting. A har…

TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment

26 September 2024·4872 words·23 mins

Large Language Models 🏢 Zhejiang University

TOPA: Extending LLMs for video understanding using only text data.

TFGDA: Exploring Topology and Feature Alignment in Semi-supervised Graph Domain Adaptation through Robust Clustering

26 September 2024·1822 words·9 mins

Machine Learning Transfer Learning 🏢 Zhejiang University

TFGDA: Leveraging graph topology and feature alignment for superior semi-supervised domain adaptation.

Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting

26 September 2024·3727 words·18 mins

AI Generated Computer Vision 3D Vision 🏢 Zhejiang University

Spec-Gaussian enhances 3D Gaussian splatting by using anisotropic spherical Gaussians for view-dependent appearance modeling, achieving superior real-time rendering of scenes with specular and anisotr…

Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding

26 September 2024·391 words·2 mins

Machine Learning Reinforcement Learning 🏢 Zhejiang University

SDEPO, a new natural policy gradient algorithm, efficiently solves zero-sum Markov games with continuous state spaces, achieving near-optimal convergence independent of state space cardinality.

Simple and Fast Distillation of Diffusion Models

26 September 2024·3151 words·15 mins

Computer Vision Image Generation 🏢 Zhejiang University

Simple and Fast Distillation (SFD) drastically accelerates diffusion model training by 1000x, achieving state-of-the-art results in few-step image generation with minimal fine-tuning.

Scene Graph Generation with Role-Playing Large Language Models

26 September 2024·2597 words·13 mins

Multimodal Learning Vision-Language Models 🏢 Zhejiang University

SDSGG outperforms leading scene graph generation methods by using LLMs to create scene-specific descriptions, adapting to diverse visual relations.

Rethinking the Diffusion Models for Missing Data Imputation: A Gradient Flow Perspective

26 September 2024·3317 words·16 mins

Machine Learning Unsupervised Learning 🏢 Zhejiang University

NewImp boosts diffusion models’ missing data imputation by curbing sample diversity and eliminating data masking, achieving superior accuracy.

PSL: Rethinking and Improving Softmax Loss from Pairwise Perspective for Recommendation

26 September 2024·2716 words·13 mins

AI Applications Recommendation Systems 🏢 Zhejiang University

Pairwise Softmax Loss (PSL) improves recommendation accuracy by enhancing Softmax Loss (SL) with alternative activation functions, resulting in tighter ranking metric surrogates and better noise resis…

PowerPM: Foundation Model for Power Systems

26 September 2024·2167 words·11 mins

AI Applications Smart Cities 🏢 Zhejiang University

PowerPM: A foundation model revolutionizing power system analysis by mastering complex ETS data through a novel self-supervised pre-training approach, achieving state-of-the-art performance.

PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection

26 September 2024·5033 words·24 mins

AI Generated Computer Vision 3D Vision 🏢 Zhejiang University

PointAD: a novel zero-shot 3D anomaly detection method using CLIP’s strong generalization abilities to identify anomalies in unseen objects by transferring knowledge from both points and pixels.

PhyloGen: Language Model-Enhanced Phylogenetic Inference via Graph Structure Generation

26 September 2024·3549 words·17 mins

AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University

PhyloGen uses a genomic language model to generate and optimize phylogenetic trees, offering faster and more accurate evolutionary analysis than traditional methods.

Parallelizing Model-based Reinforcement Learning Over the Sequence Length

26 September 2024·2553 words·12 mins

Machine Learning Reinforcement Learning 🏢 Zhejiang University

The PaMoRL framework speeds up model-based reinforcement learning by parallelizing the model and policy learning stages over the sequence length while maintaining high sample efficiency.