Information Extraction

GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction

14 March 2025·2910 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 Xi'an Jiaotong University

GKG-LLM: unifying knowledge graph construction with a novel three-stage framework that supports domain adaptation and resource efficiency.

Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents

11 March 2025·3678 words·18 mins

AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 Renmin University of China

PLM-based retrievers overrate low-perplexity documents, causing source bias. This paper reveals the causal effect and offers a fix.

IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval

6 March 2025·5266 words·25 mins

AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences

IFIR: a new benchmark for instruction-following retrieval in expert domains, revealing current model limitations.

Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest

16 February 2025·3405 words·16 mins

AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 UC San Diego

Cuckoo: a novel information extraction (IE) model that leverages LLM pre-training data, achieving superior performance in few-shot settings by reframing next-token prediction as token extraction.

OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System

28 December 2024·379 words·2 mins

AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 Zhejiang University

OneKE: a dockerized, schema-guided LLM agent system that efficiently extracts knowledge from diverse sources, offering adaptability and robust error handling.

LongKey: Keyphrase Extraction for Long Documents

26 November 2024·3409 words·17 mins

AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 University of Luxembourg

LongKey: a novel framework that extracts keyphrases from lengthy documents using an encoder-based language model and max-pooling, outperforming existing methods across diverse datasets.

RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval

7 November 2024·523 words·3 mins

AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 IIT Kharagpur

RetrieveGPT enhances code-mixed information retrieval by merging GPT-3.5 Turbo prompts with a novel mathematical model, improving the accuracy of relevant document extraction from complex, sequenced c…