🏢 Zhejiang University
Extending Multi-modal Contrastive Representations
·2089 words·10 mins·
Multimodal Learning
Vision-Language Models
🏢 Zhejiang University
Ex-MCR efficiently builds unified multi-modal representations by extending, rather than connecting, pre-trained spaces, achieving superior performance with less paired data and training.
Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis
·2012 words·10 mins·
Computer Vision
3D Vision
🏢 Zhejiang University
eFreeSplat: a novel, epipolar-free 3D Gaussian splatting model for generalizable novel view synthesis, surpassing state-of-the-art methods by achieving superior geometry reconstruction and novel view …
Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection
·1698 words·8 mins·
Machine Learning
Semi-Supervised Learning
🏢 Zhejiang University
RDSS, a novel sample selection method for semi-supervised learning, boosts model accuracy by minimizing α-MMD, striking a balance between sample representativeness and diversity.
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
·3239 words·16 mins·
Natural Language Processing
Large Language Models
🏢 Zhejiang University
To improve LLM trustworthiness, researchers introduce Sparse Activation Control, a training-free method that concurrently enhances safety, factuality, and bias mitigation by selectively controlling atte…
Enhancing LLM’s Cognition via Structurization
·3694 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Zhejiang University
LLMs struggle with complex, long-form text. This paper introduces ‘context structurization,’ transforming unstructured text into a structured format to enhance LLM comprehension. Experiments across …
Enhancing LLM Reasoning via Vision-Augmented Prompting
·2157 words·11 mins·
loading
·
loading
Multimodal Learning
Multimodal Reasoning
🏢 Zhejiang University
Vision-Augmented Prompting (VAP) boosts LLM reasoning by automatically generating images from textual problem descriptions, incorporating visual-spatial clues to significantly improve accuracy across …
Dual-Perspective Activation: Efficient Channel Denoising via Joint Forward-Backward Criterion for Artificial Neural Networks
·1941 words·10 mins·
AI Theory
Interpretability
🏢 Zhejiang University
Dual-Perspective Activation (DPA) efficiently denoises ANN channels by jointly using forward and backward propagation criteria, improving sparsity and accuracy.
DRIP: Unleashing Diffusion Priors for Joint Foreground and Alpha Prediction in Image Matting
·1974 words·10 mins·
Computer Vision
Image Segmentation
🏢 Zhejiang University
DRIP: A novel image matting method using pre-trained latent diffusion models achieves state-of-the-art performance by jointly predicting foreground and alpha values, significantly improving accuracy a…
DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation
·2631 words·13 mins·
AI Generated
Computer Vision
3D Vision
🏢 Zhejiang University
DreamMesh4D: Generating high-fidelity dynamic 3D meshes from monocular video using a novel Gaussian-mesh hybrid representation and adaptive hybrid skinning.
DMNet: Self-comparison Driven Model for Subject-independent Seizure Detection
·2151 words·11 mins·
AI Applications
Healthcare
🏢 Zhejiang University
DMNet: A novel self-comparison driven model significantly improves subject-independent seizure detection from intracranial EEG, outperforming existing methods.
DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion
·2444 words·12 mins·
Multimodal Learning
Vision-Language Models
🏢 Zhejiang University
DiffPano generates scalable, consistent, and diverse panoramic images from text descriptions and camera poses using a novel spherical epipolar-aware diffusion model.
Delving into the Reversal Curse: How Far Can Large Language Models Generalize?
·3631 words·18 mins·
Natural Language Processing
Large Language Models
🏢 Zhejiang University
Large language models struggle to generalize knowledge when facing seemingly simple reversals, a phenomenon termed the ‘reversal curse.’ This study reveals that this limitation is strongly linked to t…
DECRL: A Deep Evolutionary Clustering Jointed Temporal Knowledge Graph Representation Learning Approach
·2468 words·12 mins·
AI Generated
Machine Learning
Representation Learning
🏢 Zhejiang University
DECRL: A novel deep learning approach for temporal knowledge graph representation learning, capturing high-order correlation evolution and outperforming existing methods.
DataStealing: Steal Data from Diffusion Models in Federated Learning with Multiple Trojans
·3940 words·19 mins·
AI Generated
Machine Learning
Federated Learning
🏢 Zhejiang University
Attackers can steal large amounts of private data from diffusion models trained with federated learning by using multiple Trojans and an advanced attack, AdaSCP, which circumvents existing defenses.
Data-faithful Feature Attribution: Mitigating Unobservable Confounders via Instrumental Variables
·1976 words·10 mins·
AI Theory
Interpretability
🏢 Zhejiang University
Data-faithful feature attribution tackles misinterpretations from unobservable confounders by using instrumental variables to train confounder-free models, leading to more robust and accurate feature …
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
·3110 words·15 mins·
AI Applications
Autonomous Vehicles
🏢 Zhejiang University
LeapAD, a novel autonomous driving paradigm, uses a dual-process architecture mirroring human cognition to achieve continuous learning and improved adaptability. Employing a VLM for efficient scene u…
Context and Geometry Aware Voxel Transformer for Semantic Scene Completion
·2245 words·11 mins·
3D Vision
🏢 Zhejiang University
CGFormer: a novel voxel transformer boosting semantic scene completion accuracy by using context-aware queries and 3D deformable attention, outperforming existing methods on SemanticKITTI and SSCBench…
Con4m: Context-aware Consistency Learning Framework for Segmented Time Series Classification
·2306 words·11 mins·
Machine Learning
Deep Learning
🏢 Zhejiang University
Con4m, a novel consistency learning framework, leverages contextual information to effectively classify segmented time series with inconsistent boundary labels and varying durations of classes, signif…
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
·2344 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Zhejiang University
Chat-Scene: Bridging 3D scenes and LLMs using object identifiers for efficient, object-level interaction and improved scene comprehension.
Can Graph Neural Networks Expose Training Data Properties? An Efficient Risk Assessment Approach
·1856 words·9 mins·
AI Theory
Privacy
🏢 Zhejiang University
A new, efficient attack reveals properties of the data used to train GNN models.