llm
llmgate
Failure-Taxonomy Authoring Tool for LLM Evals
Tooling for the hard 60-80% of eval-dev time (Hamel Husain): authoring and maintaining failure taxonomies, which no tool owns.
evals, failure-taxonomy, observability, tooling, whitespace
DISTILL-llm whitespace callout. The eval ecosystem (Langfuse, Braintrust, Phoenix, Patronus, Galileo) covers traces and scoring but not the labor-intensive failure-taxonomy authoring that dominates real eval development. Named whitespace with no dominant vendor.
source: DISTILL-llm.md