
🏢 OpenAI

Expect the Unexpected: FailSafe Long Context QA for Finance
·2633 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 OpenAI
FailSafeQA benchmark rigorously evaluates LLMs’ resilience against diverse human-interaction variations, revealing critical weaknesses in even high-performing models, particularly regarding hallucinat…
Kimi k1.5: Scaling Reinforcement Learning with LLMs
·1386 words·7 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 OpenAI
Kimi k1.5: A multimodal LLM trained with RL achieves state-of-the-art reasoning performance by scaling long-context RL training and improving policy optimization.