
🏢 IBM Research

JuStRank: Benchmarking LLM Judges for System Ranking
·13985 words·66 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 IBM Research
JuStRank: a benchmark of LLM judges as system rankers reveals judge qualities (decisiveness, bias) that affect ranking accuracy, and highlights that instance-level performance doesn’t guarantee accurate system-level…
Granite Guardian
·4191 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 IBM Research
Granite Guardian: Open-source risk detection models for LLMs, surpassing existing models in accuracy and offering comprehensive coverage across multiple risk dimensions, promoting safer AI.
DELIFT: Data Efficient Language model Instruction Fine Tuning
·1830 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 IBM Research
DELIFT (Data Efficient Language Model Instruction Fine-Tuning) drastically reduces the data needed for effective LLM fine-tuning without sacrificing performance.