🏢 Fudan University
Unified Lexical Representation for Interpretable Visual-Language Alignment
·1730 words·9 mins·
Multimodal Learning
Vision-Language Models
🏢 Fudan University
LexVLA: A novel visual-language alignment framework learns unified lexical representations for improved interpretability and efficient cross-modal retrieval.
Towards Global Optimal Visual In-Context Learning Prompt Selection
·2618 words·13 mins·
AI Generated
Computer Vision
Image Segmentation
🏢 Fudan University
Partial2Global: A novel VICL framework achieving globally optimal prompt selection, significantly improving visual in-context learning across various tasks.
Tetrahedron Splatting for 3D Generation
·2346 words·12 mins·
3D Vision
🏢 Fudan University
TeT-Splatting: a novel 3D representation enabling fast convergence, real-time rendering, and precise mesh extraction for high-fidelity 3D generation.
Taming Generative Diffusion Prior for Universal Blind Image Restoration
·4450 words·21 mins·
AI Generated
Computer Vision
Image Generation
🏢 Fudan University
BIR-D tames generative diffusion models for universal blind image restoration, dynamically updating parameters to handle various complex degradations without assuming degradation model types.
TAIA: Large Language Models are Out-of-Distribution Data Learners
·2712 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Fudan University
LLMs struggle with downstream tasks using mismatched data. TAIA, a novel inference-time method, solves this by selectively using only attention parameters during inference after training all parameter…
SpeechAlign: Aligning Speech Generation to Human Preferences
·1822 words·9 mins·
Natural Language Processing
Text Generation
🏢 Fudan University
SpeechAlign: Iteratively aligning speech generation models to human preferences via preference optimization, bridging distribution gaps for improved speech quality.
S2HPruner: Soft-to-Hard Distillation Bridges the Discretization Gap in Pruning
·2415 words·12 mins·
Machine Learning
Deep Learning
🏢 Fudan University
S2HPruner bridges the discretization gap in neural network pruning via a novel soft-to-hard distillation framework, achieving superior performance across various benchmarks without fine-tuning.
Penalty-based Methods for Simple Bilevel Optimization under Hölderian Error Bounds
·1969 words·10 mins·
Machine Learning
Optimization
🏢 Fudan University
This paper proposes penalty-based methods for simple bilevel optimization, achieving (ε, ε^β)-optimal solutions with improved complexity under Hölderian error bounds.
MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing
·3629 words·18 mins·
Computer Vision
3D Vision
🏢 Fudan University
MVInpainter: Pose-free multi-view consistent inpainting bridges 2D and 3D editing by simplifying 3D editing to a multi-view 2D inpainting task.
MSA Generation with Seqs2Seqs Pretraining: Advancing Protein Structure Predictions
·2100 words·10 mins·
Machine Learning
Self-Supervised Learning
🏢 Fudan University
Self-supervised generative model MSA-Generator boosts protein structure prediction accuracy by producing high-quality MSAs, especially for challenging sequences lacking homologs.
Motion Forecasting in Continuous Driving
·1965 words·10 mins·
AI Applications
Autonomous Vehicles
🏢 Fudan University
RealMotion: a novel motion forecasting framework for continuous driving that outperforms existing methods by accumulating historical scene information and sequentially refining predictions, achieving …
Mixtures of Experts for Audio-Visual Learning
·2112 words·10 mins·
Multimodal Learning
Audio-Visual Learning
🏢 Fudan University
AVMoE: a novel parameter-efficient transfer learning approach for audio-visual learning, dynamically allocates expert models (unimodal and cross-modal adapters) based on task demands, achieving superi…
MeLLoC: Lossless Compression with High-order Mechanism Learning
·1838 words·9 mins·
AI Generated
AI Applications
Healthcare
🏢 Fudan University
MeLLoC: Mechanism Learning for Lossless Compression, a novel approach that combines high-order mechanism learning with classical encoding, significantly improves lossless compression for scientific da…
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
·2500 words·12 mins·
Multimodal Learning
Vision-Language Models
🏢 Fudan University
Lumen: A novel LMM architecture decouples perception learning into task-agnostic and task-specific stages, enabling versatile vision-centric capabilities and surpassing existing LMM-based approaches.
Low Precision Local Training is Enough for Federated Learning
·2011 words·10 mins·
Machine Learning
Federated Learning
🏢 Fudan University
Low-precision local training, surprisingly, is sufficient for accurate federated learning, significantly reducing communication and computation costs.
LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Control and Rendering
·2138 words·11 mins·
Multimodal Learning
Vision-Language Models
🏢 Fudan University
LiveScene: Language-embedded interactive radiance fields efficiently reconstruct and control complex scenes with multiple interactive objects, achieving state-of-the-art results.
Knowledge Graph Completion by Intermediate Variables Regularization
·2107 words·10 mins·
AI Generated
Machine Learning
Deep Learning
🏢 Fudan University
Novel intermediate variables regularization boosts knowledge graph completion!
Iterative Methods via Locally Evolving Set Process
·3065 words·15 mins·
AI Theory
Optimization
🏢 Fudan University
This paper proposes a novel framework, the locally evolving set process, to develop faster localized iterative methods for solving large-scale graph problems, achieving significant speedup over existi…
GenRec: Unifying Video Generation and Recognition with Diffusion Models
·2342 words·11 mins·
Computer Vision
Video Understanding
🏢 Fudan University
GenRec: One diffusion model to rule both video generation and recognition!
FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation
·2089 words·10 mins·
AI Applications
Autonomous Vehicles
🏢 Fudan University
Fourier Neural Processes (FNP) revolutionize data assimilation by enabling accurate analysis of observations with varying resolutions, improving weather forecasting and Earth system modeling.