Natural Language Processing
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models
·4898 words·23 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Oregon
LUSIFER: a novel zero-shot approach that equips English-centric LLM embedding models for multilingual tasks without explicit multilingual training data, significantly enhancing performance, especially fo…
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
·3334 words·16 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
Polarizing SSMs’ state transition matrices enhances long-range dependency modeling by mitigating recency bias and over-smoothing.
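The polarization trick can be illustrated on a toy diagonal SSM recurrence. The sketch below is a minimal illustration, not the paper's code: one channel's transition value is pinned at 0 (a memoryless channel, countering over-smoothing) and one at 1 (a fully retentive channel, countering recency bias), while the rest stay learnable.

```python
import numpy as np

def polarized_ssm_scan(x, a_mid):
    """x: (seq_len, channels); a_mid: learnable decays in (0, 1) for the middle channels."""
    # "Polarization": pin one channel's transition value to 0 and one to 1.
    # a = 0 gives a memoryless channel (no over-smoothing);
    # a = 1 gives a perfectly retentive channel (no recency bias).
    a = np.concatenate([[0.0], a_mid, [1.0]])
    h = np.zeros(x.shape[1])
    ys = []
    for x_t in x:            # diagonal linear recurrence: h_t = a * h_{t-1} + x_t
        h = a * h + x_t
        ys.append(h.copy())
    return np.stack(ys)

x = np.random.randn(16, 8)
y = polarized_ssm_scan(x, a_mid=np.full(6, 0.9))
print(y.shape)               # (16, 8): channel 0 echoes inputs, channel 7 sums all history
```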
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
·3050 words·15 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Generation
🏢 Singapore University of Technology and Design
TANGOFLUX: Blazing-fast, high-fidelity text-to-audio generation using novel CLAP-Ranked Preference Optimization.
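The CLAP-Ranked Preference Optimization loop can be summarized as: sample several audio candidates per prompt, rank them by CLAP text-audio similarity, and keep the best and worst as a preference pair. A minimal sketch, assuming stand-in `generate_audio` and `clap_score` functions rather than TangoFlux's actual API:

```python
def build_preference_pairs(prompts, generate_audio, clap_score, n_candidates=4):
    """Construct (chosen, rejected) pairs for DPO-style preference training."""
    pairs = []
    for prompt in prompts:
        candidates = [generate_audio(prompt) for _ in range(n_candidates)]
        # Rank candidates by CLAP text-audio similarity (higher = more faithful).
        ranked = sorted(candidates, key=lambda audio: clap_score(prompt, audio))
        # Best and worst generations form the preference pair used to
        # fine-tune the flow-matching generator.
        pairs.append({"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]})
    return pairs
```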
MapQaTor: A System for Efficient Annotation of Map Query Datasets
·3496 words·17 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Department of Computer Science and Engineering
MAPQATOR: a web app that streamlines creation of reproducible geospatial QA datasets, boosting annotation speed by 30x!
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
·3981 words·19 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
New benchmarks, HumanEval Pro and MBPP Pro, reveal LLMs struggle with self-invoking code generation, highlighting a critical gap in current code reasoning capabilities.
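"Self-invoking" means the model must first solve a base problem and then reuse that solution inside a harder follow-up. A toy illustration of the problem shape (not an actual benchmark item):

```python
# Base problem: return the sorted unique elements of a list.
def unique_sorted(xs):
    return sorted(set(xs))

# Self-invoking problem: the correct solution must call the base solution.
# Given a list of lists, return each sublist's sorted unique elements,
# ordered by how many unique elements each sublist has.
def unique_sorted_groups(groups):
    return sorted((unique_sorted(g) for g in groups), key=len)

assert unique_sorted([3, 1, 3, 2]) == [1, 2, 3]
assert unique_sorted_groups([[3, 1, 3], [5]]) == [[5], [1, 3]]
```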
Facilitating large language model Russian adaptation with Learned Embedding Propagation
·2350 words·12 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Lomonosov Moscow State University
Researchers introduce Learned Embedding Propagation (LEP), a novel technique that efficiently adapts large language models (LLMs) to new languages using minimal training data, thus overcoming limitati…
Efficiently Serving LLM Reasoning Programs with Certaindex
·4124 words·20 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC San Diego
Dynasor optimizes LLM reasoning by dynamically allocating compute based on a novel ‘certaindex’ metric, reducing compute by up to 50% and increasing query rates by 3.3x.
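One way to read the idea: periodically measure how strongly sampled reasoning paths agree, and stop spending compute once agreement is high. The sketch below uses answer agreement as the certainty proxy; the names and the proxy itself are assumptions, not Dynasor's implementation:

```python
from collections import Counter

def solve_with_early_exit(sample_answer, max_samples=16, check_every=4, threshold=0.75):
    """sample_answer() draws one reasoning path and returns its final answer."""
    answers = []
    for i in range(1, max_samples + 1):
        answers.append(sample_answer())
        if i % check_every == 0:
            top_answer, count = Counter(answers).most_common(1)[0]
            certainty = count / len(answers)   # agreement across sampled paths
            if certainty >= threshold:         # confident: stop allocating compute
                return top_answer, i
    return Counter(answers).most_common(1)[0][0], max_samples
```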
OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System
·379 words·2 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Information Extraction
🏢 Zhejiang University
OneKE: a dockerized, schema-guided LLM agent system efficiently extracts knowledge from diverse sources, offering adaptability and robust error handling.
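A minimal sketch of what "schema-guided with error handling" can look like, assuming a stand-in `call_llm` function rather than OneKE's actual agents: prompt with an explicit JSON schema, validate the output, and retry on malformed responses.

```python
import json

SCHEMA = {"person": "string", "organization": "string", "role": "string"}

def extract(text, call_llm, max_retries=2):
    prompt = (f"Extract fields matching this JSON schema: {json.dumps(SCHEMA)}\n"
              f"Text: {text}\nReturn only JSON.")
    for _ in range(max_retries + 1):
        try:
            record = json.loads(call_llm(prompt))
            if set(record) == set(SCHEMA):      # keys must match the schema
                return record
        except json.JSONDecodeError:
            pass                                 # malformed output: ask again
    return None
```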
Xmodel-2 Technical Report
·2582 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Xiaoduo AI Lab
Xmodel-2: A 1.2B parameter LLM achieving state-of-the-art reasoning performance through efficient architecture and training, now publicly available!
Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
·269 words·2 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Intel Labs
Boost fine-tuned LLMs’ performance without sacrificing safety by merging pre- and post-tuning model weights!
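The mechanism in the summary is weight-space interpolation between the safety-aligned pre-tuning checkpoint and the task-fine-tuned one. A minimal PyTorch sketch; the interpolation weight `alpha` is a tunable assumption, not a value from the paper:

```python
import torch

@torch.no_grad()
def merge_models(pre_tuned, post_tuned, alpha=0.5):
    """Interpolate parameters: alpha=1 keeps the safety-aligned base,
    alpha=0 keeps the task-fine-tuned model. Assumes float parameters."""
    post_state = post_tuned.state_dict()
    merged = {name: alpha * w_pre + (1 - alpha) * post_state[name]
              for name, w_pre in pre_tuned.state_dict().items()}
    post_tuned.load_state_dict(merged)
    return post_tuned
```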
Token-Budget-Aware LLM Reasoning
·3147 words·15 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Nanjing University
TALE: A novel framework dynamically adjusts token budgets in LLM reasoning prompts, slashing costs by ~70% with minimal accuracy loss.
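A budget-aware prompt can be as simple as stating the allowance inside the instruction and estimating that allowance per question. The template and the zero-shot budget probe below are illustrative assumptions in the spirit of TALE, not the paper's exact method:

```python
def budgeted_prompt(question, budget):
    # State the token allowance directly in the reasoning instruction.
    return (f"{question}\n"
            f"Let's think step by step and use less than {budget} tokens.")

def estimate_budget(question, call_llm):
    # One cheap way to set a budget: ask the model itself for an estimate
    # (`call_llm` is a stand-in for any completion API).
    reply = call_llm(f"How many tokens of reasoning does this question need? "
                     f"Answer with a number only.\n{question}")
    return int(reply.strip())
```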
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation
·2542 words·12 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Science and Technology of China
Molar: A novel multimodal LLM framework boosts sequential recommendation accuracy by cleverly aligning collaborative filtering with rich item representations from text and non-text data.
YuLan-Mini: An Open Data-efficient Language Model
·4206 words·20 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Renmin University of China
YuLan-Mini: An open, data-efficient 2.42B parameter LLM achieving top-tier performance with innovative training techniques.
In Case You Missed It: ARC 'Challenge' Is Not That Challenging
·2565 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Snowflake AI Research
Evaluating LLMs on multiple-choice questions one option at a time is flawed; presenting all options simultaneously reveals much higher accuracy and challenges existing benchmark rankings.
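The two evaluation setups being contrasted can be made concrete with a stand-in log-likelihood scorer (`loglik` is hypothetical, not a real API): scoring each option in isolation versus showing all options and scoring the answer letter.

```python
def eval_separately(question, options, loglik):
    # "Cloze" style: score each option alone; the model never sees its rivals.
    scores = [loglik(f"Q: {question}\nA: {opt}") for opt in options]
    return max(range(len(options)), key=scores.__getitem__)

def eval_jointly(question, options, loglik):
    # MC style: show every option in one prompt and score the answer letters.
    letters = "ABCD"[: len(options)]
    listing = "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
    scores = [loglik(f"Q: {question}\n{listing}\nAnswer: {l}") for l in letters]
    return max(range(len(options)), key=scores.__getitem__)
```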
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
·2127 words·10 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Dialogue Systems
🏢 Peking University
Friends-MMC: A new dataset facilitates multi-modal multi-party conversation understanding by providing 24,000+ utterances with video, audio, and speaker annotations, enabling advancements in character…
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
·2203 words·11 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
FoPE enhances attention’s periodic extension for better length generalization in language models by addressing spectral damage in RoPE using Fourier Series and zeroing out destructive frequencies.
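The "zero out destructive frequencies" step can be sketched on a standard RoPE frequency table: components whose period exceeds the training context are clipped to zero. The Fourier-series component of FoPE is not shown, and the cutoff choice is an assumption:

```python
import numpy as np

def rope_frequencies(dim, base=10000.0):
    # Standard RoPE angular frequencies: theta_i = base ** (-2i / dim).
    return base ** (-np.arange(0, dim, 2) / dim)

def fope_clip(freqs, min_freq):
    # Frequencies below the floor complete less than one full period over
    # the training context, so they are under-trained and generalize poorly;
    # zero them out (turning those dimensions into a constant term).
    return np.where(freqs >= min_freq, freqs, 0.0)

freqs = rope_frequencies(128)
clipped = fope_clip(freqs, min_freq=2 * np.pi / 4096)  # one period per 4096 tokens
```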
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
·402 words·2 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Machine Translation
🏢 Tencent AI Lab
DRT-o1 leverages long chain-of-thought reasoning to significantly boost machine translation quality, particularly for complex sentences with metaphors and similes, achieving substantial improvements o…
Deliberation in Latent Space via Differentiable Cache Augmentation
·3569 words·17 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Frozen LLMs get a performance boost by augmenting their key-value cache with latent embeddings generated by a differentiable offline coprocessor.
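Conceptually, a small trained coprocessor reads the frozen model's cached states and emits latent embeddings that are appended to the context as extra soft tokens before decoding resumes. A toy module in that spirit; shapes and names are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class Coprocessor(nn.Module):
    """Maps a frozen LLM's cached hidden states to a few latent embeddings."""
    def __init__(self, d_model, n_latents=8):
        super().__init__()
        # Learned queries, one per latent "soft token" (d_model must divide by 8).
        self.latent_queries = nn.Parameter(torch.randn(n_latents, d_model))
        self.attend = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, cache_states):             # (batch, seq, d_model)
        q = self.latent_queries.expand(cache_states.size(0), -1, -1)
        latents, _ = self.attend(q, cache_states, cache_states)
        # These latents would be appended to the frozen LLM's context;
        # only the coprocessor is trained, the base model stays frozen.
        return latents
```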
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
·2172 words·11 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
B-STaR dynamically balances exploration and exploitation in self-taught reasoners, achieving superior performance in mathematical, coding, and commonsense reasoning tasks.
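A monitor-and-adjust loop in that spirit might track how many distinct correct solutions each round produces (exploration) and how clean the kept data is (exploitation), nudging sampling temperature and the reward threshold accordingly. The update rule below is an illustrative assumption, not the paper's configuration:

```python
def adjust(temperature, threshold, unique_correct, avg_reward,
           explore_target=4.0, exploit_target=0.7):
    """One self-training round's hyperparameter update."""
    if unique_correct < explore_target:   # too few distinct correct paths:
        temperature = min(temperature + 0.1, 1.2)   # sample more diversely
    if avg_reward < exploit_target:       # kept data too noisy:
        threshold = min(threshold + 0.05, 0.95)     # filter more selectively
    return temperature, threshold
```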
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression
·4375 words·21 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
This study reveals that gist token-based context compression in LLMs, while effective for some tasks, suffers from key failure patterns. The authors propose fine-grained autoencoding and segment-wise…