Agent-as-Judge

Evaluation of the generative capabilities of LLM agents
