
🏢 EPFL

Why the Metric Backbone Preserves Community Structure
·2073 words·10 mins·
AI Theory Optimization 🏢 EPFL
Metric backbone graph sparsification surprisingly preserves community structure, offering an efficient and robust method for analyzing large networks.
Why Do We Need Weight Decay in Modern Deep Learning?
·3285 words·16 mins·
AI Theory Optimization 🏢 EPFL
Weight decay’s role in modern deep learning is surprisingly multifaceted: rather than acting solely as a regularizer, it shapes optimization dynamics, improving generalization and training stability.
SuperDeepFool: a new fast and accurate minimal adversarial attack
·4315 words·21 mins·
AI Generated AI Theory Robustness 🏢 EPFL
SuperDeepFool: a fast, accurate algorithm for generating minimal adversarial perturbations, significantly improving the robustness evaluation and adversarial training of deep learning models.
SGD vs GD: Rank Deficiency in Linear Networks
·381 words·2 mins·
AI Theory Optimization 🏢 EPFL
SGD surprisingly diminishes network rank, unlike GD, due to a repulsive force between eigenvalues, offering insights into deep learning generalization.
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
·4063 words·20 mins·
Large Language Models 🏢 EPFL
Constant learning rate with cooldown replaces the cosine schedule in LLM training, enabling cost-effective scaling experiments beyond fixed training durations.
SAMPa: Sharpness-aware Minimization Parallelized
·2453 words·12 mins·
Machine Learning Optimization 🏢 EPFL
SAMPa parallelizes the two gradient computations in Sharpness-Aware Minimization (SAM), achieving a 2x speedup and improved generalization.
Revisiting Ensembling in One-Shot Federated Learning
·3849 words·19 mins·
AI Generated Machine Learning Federated Learning 🏢 EPFL
FENS: a novel federated ensembling scheme that boosts one-shot federated learning accuracy to near iterative FL levels, while maintaining low communication costs.
Local to Global: Learning Dynamics and Effect of Initialization for Transformers
·2433 words·12 mins·
AI Generated Natural Language Processing Text Generation 🏢 EPFL
Transformers’ learning dynamics depend heavily on initialization and the Markovian properties of the data, leading to convergence to either global or local minima; the paper proves this, offers initialization guidelines, and …
Implicit Bias of Mirror Flow on Separable Data
·1523 words·8 mins·
AI Theory Optimization 🏢 EPFL
Mirror flow’s implicit bias on separable data is formally characterized, revealing convergence towards a maximum-margin classifier determined by the potential’s ‘horizon function’.
Graph Edit Distance with General Costs Using Neural Set Divergence
·3177 words·15 mins·
Machine Learning Deep Learning 🏢 EPFL
GRAPHEDX, a novel neural network, accurately estimates graph edit distance with varying operation costs, outperforming existing methods.
Generative Modelling of Structurally Constrained Graphs
·5840 words·28 mins·
AI Generated AI Applications Healthcare 🏢 EPFL
ConStruct: Generating realistic graphs with guaranteed structural properties via constrained diffusion.
Fine-Tuning Personalization in Federated Learning to Mitigate Adversarial Clients
·1780 words·9 mins·
Machine Learning Federated Learning 🏢 EPFL
Fine-tuning personalization in federated learning mitigates adversarial clients; the optimal level of collaboration depends on data heterogeneity and the fraction of adversaries.
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
·3222 words·16 mins·
Natural Language Processing Large Language Models 🏢 EPFL
DenseFormer adds a depth-weighted averaging step to transformers, improving data efficiency and outperforming baselines in memory usage and inference time without increasing model size.
CoBo: Collaborative Learning via Bilevel Optimization
·1628 words·8 mins·
Machine Learning Federated Learning 🏢 EPFL
CoBo: a novel bilevel optimization algorithm for collaborative learning that surpasses existing methods by efficiently selecting helpful clients, delivering superior performance and scalability.