🏢 OpenAI
Expect the Unexpected: FailSafe Long Context QA for Finance
·2633 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 OpenAI
FailSafeQA benchmark rigorously evaluates LLMs’ resilience against diverse human-interaction variations, revealing critical weaknesses in even high-performing models, particularly regarding hallucinat…
Kimi k1.5: Scaling Reinforcement Learning with LLMs
·1386 words·7 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 OpenAI
Kimi k1.5: a multimodal LLM trained with RL achieves state-of-the-art reasoning by scaling long-context RL training and improving policy optimization.