🏢 Gaoling School of Artificial Intelligence, Renmin University of China

StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses
·2873 words·14 mins
Natural Language Processing Dialogue Systems 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
StreamingDialogue revolutionizes prolonged dialogue learning by compressing long contexts into conversational attention sinks, minimizing information loss and achieving a 4x speedup with 18x less memory.
Reflective Multi-Agent Collaboration based on Large Language Models
·2567 words·13 mins
Natural Language Processing Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
COPPER enhances LLM-based multi-agent collaboration via a self-reflection mechanism and counterfactual PPO. It improves reflection quality, alleviates credit assignment issues, and shows strong performance.
P$^2$C$^2$Net: PDE-Preserved Coarse Correction Network for efficient prediction of spatiotemporal dynamics
·3055 words·15 mins
AI Generated Machine Learning Deep Learning 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
P2C2Net: A physics-encoded neural network efficiently predicts complex spatiotemporal dynamics using coarse grids and limited training data, achieving state-of-the-art results.
Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
·2892 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Over-parameterized Distillation Framework (OPDF) boosts knowledge distillation by efficiently over-parameterizing student models via tensor decomposition, significantly improving performance without i…
On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
·2131 words·11 mins
AI Generated AI Theory Optimization 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Autoregressively trained transformers surprisingly learn algorithms during pretraining, enabling in-context learning; this paper reveals when and why this ‘mesa-optimization’ happens.
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
·2372 words·12 mins
Natural Language Processing Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
MoICE, a novel plug-in, significantly enhances LLMs’ long context awareness by dynamically routing attention using multiple RoPE angles, achieving superior performance with high inference efficiency.
Lower Bounds of Uniform Stability in Gradient-Based Bilevel Algorithms for Hyperparameter Optimization
·1822 words·9 mins
Machine Learning Optimization 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
This paper establishes tight lower bounds for the uniform stability of gradient-based bilevel programming algorithms used for hyperparameter optimization, resolving a key open problem regarding the ti…
FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding
·2233 words·11 mins
Multimodal Learning Vision-Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
FineCLIP boosts fine-grained image understanding by combining real-time self-distillation with semantically rich regional contrastive learning, significantly outperforming existing methods.
Exploring Context Window of Large Language Models via Decomposed Positional Vectors
·3403 words·16 mins
Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Researchers extend large language models’ context windows with training-free methods that analyze and manipulate positional vectors, improving long-text processing.
Are High-Degree Representations Really Unnecessary in Equivariant Graph Neural Networks?
·2234 words·11 mins
AI Theory Representation Learning 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
High-degree representations significantly boost the expressiveness of E(3)-equivariant GNNs, overcoming limitations of lower-degree models on symmetric structures, as demonstrated theoretically and empirically.