🏢 KTH Royal Institute of Technology
Large Language Models and Mathematical Reasoning Failures
·397 words·2 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 KTH Royal Institute of Technology
Large language models struggle with mathematical word problems, demonstrating flaws in reasoning despite achieving high accuracy; a new study highlights these persistent gaps in generalization abiliti…
Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance
·1604 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 KTH Royal Institute of Technology
LLMs’ performance on language complexity tasks (LIX and ADD) correlates strongly with their general capabilities, suggesting that complexity metrics can serve as noisy zero-shot proxies for model evaluation.
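
For readers unfamiliar with LIX, the following is a minimal sketch of the standard readability formula: average sentence length plus the percentage of words longer than six letters. The function name `lix`, the regex-based tokenization, and the choice of `.`, `!`, and `?` as sentence delimiters are assumptions made for illustration; they are not taken from the paper, whose exact preprocessing may differ.

```python
import re


def lix(text: str) -> float:
    """Compute the LIX readability score for a text.

    LIX = (words / sentences) + 100 * (long words / words),
    where a long word has more than six letters.
    Tokenization here is a simplifying assumption, not the paper's method.
    """
    # Words: runs of letters (Unicode-aware), ignoring digits and underscores.
    words = re.findall(r"[^\W\d_]+", text)
    # Sentences: non-empty chunks between ., ! or ? (an assumed delimiter set).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


if __name__ == "__main__":
    sample = "The cat sat. Elephants remember everything."
    # 6 words, 2 sentences, 3 long words: 6/2 + 100*3/6
    print(lix(sample))
```

A model asked to compute LIX can be scored against a reference value like this one, which is what makes the metric usable without task-specific training data.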