🏢 IBM Research
JuStRank: Benchmarking LLM Judges for System Ranking
13985 words · 66 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 IBM Research
JuStRank: a benchmark of LLM judges as system rankers reveals critical judge qualities (decisiveness, bias) that affect ranking accuracy, highlighting that instance-level performance doesn’t guarantee accurate system-level…
Granite Guardian
4191 words · 20 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 IBM Research
Granite Guardian: Open-source risk detection models for LLMs, surpassing existing models in accuracy and offering comprehensive coverage across multiple risk dimensions, promoting safer AI.
DELIFT: Data Efficient Language model Instruction Fine Tuning
1830 words · 9 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 IBM Research
DELIFT (Data Efficient Language Model Instruction Fine-Tuning) drastically reduces the data needed for effective LLM fine-tuning without sacrificing performance.