🏢 Zhejiang University

Extending Multi-modal Contrastive Representations

26 September 2024·2089 words·10 mins

Multimodal Learning Vision-Language Models 🏢 Zhejiang University

Ex-MCR: Efficiently build unified multi-modal representations by extending, not connecting, pre-trained spaces, achieving superior performance with less paired data and training.

Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis

26 September 2024·2012 words·10 mins

Computer Vision 3D Vision 🏢 Zhejiang University

eFreeSplat: a novel, epipolar-free 3D Gaussian splatting model for generalizable novel view synthesis, surpassing state-of-the-art methods by achieving superior geometry reconstruction and novel view …

Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection

26 September 2024·1698 words·8 mins

Machine Learning Semi-Supervised Learning 🏢 Zhejiang University

RDSS, a novel sample selection method for semi-supervised learning, boosts model accuracy by minimizing α-MMD, striking a balance between sample representativeness and diversity.

Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control

26 September 2024·3239 words·16 mins

Natural Language Processing Large Language Models 🏢 Zhejiang University

To boost LLM trustworthiness, researchers introduce Sparse Activation Control, a training-free method that concurrently enhances safety, factuality, and bias mitigation by selectively controlling atte…

Enhancing LLM’s Cognition via Structurization

26 September 2024·3694 words·18 mins

AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University

LLMs struggle with complex, long-form text. This paper introduces ‘context structurization,’ transforming unstructured text into a structured format to enhance LLM comprehension. Experiments across …

Enhancing LLM Reasoning via Vision-Augmented Prompting

26 September 2024·2157 words·11 mins

Multimodal Learning Multimodal Reasoning 🏢 Zhejiang University

Vision-Augmented Prompting (VAP) boosts LLM reasoning by automatically generating images from textual problem descriptions, incorporating visual-spatial clues to significantly improve accuracy across …

Dual-Perspective Activation: Efficient Channel Denoising via Joint Forward-Backward Criterion for Artificial Neural Networks

26 September 2024·1941 words·10 mins

AI Theory Interpretability 🏢 Zhejiang University

Dual-Perspective Activation (DPA) efficiently denoises ANN channels by jointly using forward and backward propagation criteria, improving sparsity and accuracy.

DRIP: Unleashing Diffusion Priors for Joint Foreground and Alpha Prediction in Image Matting

26 September 2024·1974 words·10 mins

Computer Vision Image Segmentation 🏢 Zhejiang University

DRIP: A novel image matting method using pre-trained latent diffusion models achieves state-of-the-art performance by jointly predicting foreground and alpha values, significantly improving accuracy a…

DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation

26 September 2024·2631 words·13 mins

AI Generated Computer Vision 3D Vision 🏢 Zhejiang University

DreamMesh4D: Generating high-fidelity dynamic 3D meshes from monocular video using a novel Gaussian-mesh hybrid representation and adaptive hybrid skinning.

DMNet: Self-comparison Driven Model for Subject-independent Seizure Detection

26 September 2024·2151 words·11 mins

AI Applications Healthcare 🏢 Zhejiang University

DMNet: A novel self-comparison driven model significantly improves subject-independent seizure detection from intracranial EEG, outperforming existing methods.

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

26 September 2024·2444 words·12 mins

Multimodal Learning Vision-Language Models 🏢 Zhejiang University

DiffPano generates scalable, consistent, and diverse panoramic images from text descriptions and camera poses using a novel spherical epipolar-aware diffusion model.

Delving into the Reversal Curse: How Far Can Large Language Models Generalize?

26 September 2024·3631 words·18 mins

Natural Language Processing Large Language Models 🏢 Zhejiang University

Large language models struggle to generalize knowledge when facing seemingly simple reversals, a phenomenon termed the ‘reversal curse.’ This study reveals that this limitation is strongly linked to t…

DECRL: A Deep Evolutionary Clustering Jointed Temporal Knowledge Graph Representation Learning Approach

26 September 2024·2468 words·12 mins

AI Generated Machine Learning Representation Learning 🏢 Zhejiang University

DECRL: A novel deep learning approach for temporal knowledge graph representation learning, capturing high-order correlation evolution and outperforming existing methods.

DataStealing: Steal Data from Diffusion Models in Federated Learning with Multiple Trojans

26 September 2024·3940 words·19 mins

AI Generated Machine Learning Federated Learning 🏢 Zhejiang University

Attackers can steal massive private data from federated learning diffusion models using multiple Trojans and an advanced attack, AdaSCP, which circumvents existing defenses.

Data-faithful Feature Attribution: Mitigating Unobservable Confounders via Instrumental Variables

26 September 2024·1976 words·10 mins

AI Theory Interpretability 🏢 Zhejiang University

Data-faithful feature attribution tackles misinterpretations from unobservable confounders by using instrumental variables to train confounder-free models, leading to more robust and accurate feature …

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

26 September 2024·3110 words·15 mins

AI Applications Autonomous Vehicles 🏢 Zhejiang University

LeapAD, a novel autonomous driving paradigm, uses a dual-process architecture mirroring human cognition to achieve continuous learning and improved adaptability. Employing a VLM for efficient scene u…

Context and Geometry Aware Voxel Transformer for Semantic Scene Completion

26 September 2024·2245 words·11 mins

3D Vision 🏢 Zhejiang University

CGFormer: a novel voxel transformer boosting semantic scene completion accuracy by using context-aware queries and 3D deformable attention, outperforming existing methods on SemanticKITTI and SSCBench…

Con4m: Context-aware Consistency Learning Framework for Segmented Time Series Classification

26 September 2024·2306 words·11 mins

Machine Learning Deep Learning 🏢 Zhejiang University

Con4m, a novel consistency learning framework, leverages contextual information to effectively classify segmented time series with inconsistent boundary labels and varying durations of classes, signif…

Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers

26 September 2024·2344 words·12 mins

Natural Language Processing Large Language Models 🏢 Zhejiang University

Chat-Scene: Bridging 3D scenes and LLMs using object identifiers for efficient, object-level interaction and improved scene comprehension.

Can Graph Neural Networks Expose Training Data Properties? An Efficient Risk Assessment Approach

26 September 2024·1856 words·9 mins

AI Theory Privacy 🏢 Zhejiang University

A new, efficient attack reveals properties of the data a GNN was trained on.