🏢 LMU Munich & Munich Center for Machine Learning

GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
GlotCC: an open multilingual corpus and pipeline for minority languages, covering more than 1000 languages.