🏢 Zhejiang University

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions

26 September 2024·308 words·2 mins

AI Theory Optimization 🏢 Zhejiang University

The Adam optimizer achieves near-optimal convergence in non-convex settings with unbounded gradients and relaxed noise assumptions, improving the theoretical understanding of Adam and supporting its practical use.

MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence

26 September 2024·2462 words·12 mins

Multimodal Learning Multimodal Generation 🏢 Zhejiang University

MoMu-Diffusion: a novel framework that learns long-term motion-music synchronization, generating realistic and beat-matched sequences surpassing existing methods.

Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks

26 September 2024·2425 words·12 mins

Machine Learning Deep Learning 🏢 Zhejiang University

Model LEGO (MDA) revolutionizes deep learning by enabling the creation of new models by assembling and disassembling task-aware components from pre-trained models, eliminating the need for retraining.

MKGL: Mastery of a Three-Word Language

26 September 2024·2110 words·10 mins

Large Language Models 🏢 Zhejiang University

Researchers taught a large language model (LLM) a three-word ‘Knowledge Graph Language’ (KGL) to improve knowledge graph (KG) completion, drastically reducing errors compared to other methods.

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes

26 September 2024·1847 words·9 mins

Computer Vision Image Generation 🏢 Zhejiang University

MimicTalk generates realistic, expressive talking videos in minutes using a pre-trained model adapted to individual identities.

MaskFactory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation

26 September 2024·1960 words·10 mins

Computer Vision Image Segmentation 🏢 Zhejiang University

MaskFactory generates high-quality synthetic data for dichotomous image segmentation, improving model training efficiency and accuracy.

Locating What You Need: Towards Adapting Diffusion Models to OOD Concepts In-the-Wild

26 September 2024·3829 words·18 mins

AI Generated Computer Vision Image Generation 🏢 Zhejiang University

CATOD framework improves text-to-image generation by actively learning high-quality training data to accurately depict out-of-distribution concepts.

LG-CAV: Train Any Concept Activation Vector with Language Guidance

26 September 2024·3860 words·19 mins

AI Generated Computer Vision Vision-Language Models 🏢 Zhejiang University

LG-CAV leverages vision-language models to train concept activation vectors (CAVs) without labeled data, achieving superior accuracy and enabling state-of-the-art model…

Learning-Augmented Algorithms for the Bahncard Problem

26 September 2024·3280 words·16 mins

AI Theory Optimization 🏢 Zhejiang University

PFSUM, a novel learning-augmented algorithm, leverages short-term predictions to achieve superior performance in solving the Bahncard problem, outperforming existing methods with improved consistency …

Learning Complete Protein Representation by Dynamically Coupling of Sequence and Structure

26 September 2024·2792 words·14 mins

AI Generated Natural Language Processing Representation Learning 🏢 Zhejiang University

CoupleNet dynamically links protein sequences and structures for improved representations, surpassing state-of-the-art methods in function prediction, particularly for uncommon proteins.

Knowledge Circuits in Pretrained Transformers

26 September 2024·3083 words·15 mins

Natural Language Processing Large Language Models 🏢 Zhejiang University

Researchers unveil ‘knowledge circuits’ within LLMs, revealing how knowledge is collaboratively encoded and utilized, leading to improved LLM design and interpretations of model behavior.

Information Re-Organization Improves Reasoning in Large Language Models

26 September 2024·2018 words·10 mins

Natural Language Processing Large Language Models 🏢 Zhejiang University

InfoRE: A novel method improving large language models’ reasoning by reorganizing information to highlight logical relationships, resulting in a 4% average accuracy boost across various tasks.

Improved Regret for Bandit Convex Optimization with Delayed Feedback

26 September 2024·324 words·2 mins

AI Theory Optimization 🏢 Zhejiang University

A novel algorithm, D-FTBL, achieves improved regret bounds for bandit convex optimization with delayed feedback, tightly matching existing lower bounds in worst-case scenarios.

Harmonizing Stochasticity and Determinism: Scene-responsive Diverse Human Motion Prediction

26 September 2024·2828 words·14 mins

AI Generated Computer Vision 3D Vision 🏢 Zhejiang University

DiMoP3D: Predicting diverse, physically realistic human motions in 3D scenes by harmonizing stochasticity and determinism.

Graph Diffusion Policy Optimization

26 September 2024·2821 words·14 mins

AI Generated Machine Learning Reinforcement Learning 🏢 Zhejiang University

GDPO: A novel method optimizes graph diffusion models for any objective using reinforcement learning, achieving state-of-the-art performance in diverse graph generation tasks.

Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching

26 September 2024·2541 words·12 mins

AI Generated Multimodal Learning Audio-Visual Learning 🏢 Zhejiang University

FRIEREN: a novel video-to-audio generation network using rectified flow matching achieves state-of-the-art performance by improving audio quality, temporal alignment, and generation efficiency.

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

26 September 2024·1815 words·9 mins

Computer Vision Video Understanding 🏢 Zhejiang University

FreeLong: Generate high-fidelity long videos without retraining using spectral blending of global and local video features!

FOOGD: Federated Collaboration for Both Out-of-distribution Generalization and Detection

26 September 2024·3370 words·16 mins

Machine Learning Federated Learning 🏢 Zhejiang University

FOOGD: A novel federated learning framework that simultaneously tackles out-of-distribution generalization and detection by estimating probability density for reliable global distribution guidance.

FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models

26 September 2024·2387 words·12 mins

Computer Vision Image Generation 🏢 Zhejiang University

FashionR2R leverages diffusion models to translate rendered fashion images into photorealistic counterparts while preserving fine-grained clothing textures.

Extracting Training Data from Molecular Pre-trained Models

26 September 2024·2322 words·11 mins

AI Generated AI Theory Privacy 🏢 Zhejiang University

Researchers reveal a high risk of training data extraction from molecular pre-trained models, challenging the assumption that model sharing alone adequately protects against data theft.