🏢 LMU Munich & Munich Center for Machine Learning
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
1865 words · 9 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
GlotCC: an open multilingual corpus and pipeline for minority languages, covering more than 1,000 languages.