🏢 Gaoling School of Artificial Intelligence, Renmin University of China

StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses
·2873 words·14 mins
Natural Language Processing Dialogue Systems 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
StreamingDialogue revolutionizes prolonged dialogue learning by compressing long contexts into conversational attention sinks, minimizing information loss and achieving a 4x speedup with 18x less memory.
Reflective Multi-Agent Collaboration based on Large Language Models
·2567 words·13 mins
Natural Language Processing Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
COPPER enhances LLM-based multi-agent collaboration via a self-reflection mechanism and counterfactual PPO. It improves reflection quality, alleviates credit assignment issues, and shows strong performance.
P$^2$C$^2$Net: PDE-Preserved Coarse Correction Network for efficient prediction of spatiotemporal dynamics
·3055 words·15 mins
AI Generated Machine Learning Deep Learning 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
P2C2Net: A physics-encoded neural network efficiently predicts complex spatiotemporal dynamics using coarse grids and limited training data, achieving state-of-the-art results.
Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
·2892 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Over-parameterized Distillation Framework (OPDF) boosts knowledge distillation by efficiently over-parameterizing student models via tensor decomposition, significantly improving performance without i…
On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
·2131 words·11 mins
AI Generated AI Theory Optimization 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Autoregressively trained transformers surprisingly learn algorithms during pretraining, enabling in-context learning; this paper reveals when and why this ‘mesa-optimization’ happens.
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
·2372 words·12 mins
Natural Language Processing Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
MoICE, a novel plug-in, significantly enhances LLMs’ long context awareness by dynamically routing attention using multiple RoPE angles, achieving superior performance with high inference efficiency.
Lower Bounds of Uniform Stability in Gradient-Based Bilevel Algorithms for Hyperparameter Optimization
·1822 words·9 mins
Machine Learning Optimization 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
This paper establishes tight lower bounds for the uniform stability of gradient-based bilevel programming algorithms used for hyperparameter optimization, resolving a key open problem regarding the ti…
FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding
·2233 words·11 mins
Multimodal Learning Vision-Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
FineCLIP boosts fine-grained image understanding by combining real-time self-distillation with semantically rich regional contrastive learning, significantly outperforming existing methods.
Exploring Context Window of Large Language Models via Decomposed Positional Vectors
·3403 words·16 mins
Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Researchers extend large language models’ context windows with training-free methods that analyze and manipulate positional vectors, improving long-text processing.
Are High-Degree Representations Really Unnecessary in Equivariant Graph Neural Networks?
·2234 words·11 mins
AI Theory Representation Learning 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
High-degree representations significantly boost the expressiveness of E(3)-equivariant GNNs, overcoming limitations of lower-degree models on symmetric structures, as demonstrated theoretically and empirically.