Agent-as-Judge

Evaluation of the generative capabilities of LLM agents

aij_judge_task_1_train.csv - training data for the first subtask, with the following fields:

  • prompt — prompts to the model, described in the Description tab
  • score — expert evaluation
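
As a quick check of the two fields listed above, the training file can be read with Python's standard csv module. This is a minimal sketch, not part of the provided materials: the helper names load_train and score_counts are hypothetical, and scores are kept as strings because the description does not state their type or range.

```python
import csv
from collections import Counter


def load_train(path):
    """Read the training CSV and return a list of (prompt, score) pairs."""
    # newline="" lets the csv module handle line breaks inside quoted prompts.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return [(row["prompt"], row["score"]) for row in reader]


def score_counts(pairs):
    """Count how often each score value occurs (scores stay as strings)."""
    return Counter(score for _, score in pairs)


# Example use, assuming the file sits in the working directory:
# pairs = load_train("aij_judge_task_1_train.csv")
# print(len(pairs), score_counts(pairs).most_common())
```

Looking at the score distribution first helps when deciding how to prompt or fine-tune the judging model.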

run.py - inference code that uses vLLM

sample_submission.csv - example of the results file produced by running the model

The solution archive must contain the file run.py, which accepts two arguments: --test_path, the path to the test CSV file whose rows (fields id and prompt) need to be evaluated, and --pred_path, the path where the file with the answers should be saved.
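
The command-line contract described above could be sketched as follows. This is only an illustration under stated assumptions, not the provided run.py: the judge function is a hypothetical placeholder that returns a constant instead of calling a model through vLLM, and the output columns id and score are assumed, so the real layout should be taken from sample_submission.csv.

```python
import argparse
import csv


def judge(prompt):
    """Hypothetical stand-in for the real model call (e.g. via vLLM).

    Returns a constant so the sketch runs without a model.
    """
    return 0


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Score prompts from a test CSV.")
    parser.add_argument("--test_path", required=True,
                        help="test CSV with the fields id and prompt")
    parser.add_argument("--pred_path", required=True,
                        help="where to save the file with the answers")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    with open(args.test_path, newline="", encoding="utf-8") as src, \
         open(args.pred_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        # Assumption: the answers file has the columns id and score;
        # check sample_submission.csv for the actual layout.
        writer = csv.DictWriter(dst, fieldnames=["id", "score"])
        writer.writeheader()
        for row in reader:
            writer.writerow({"id": row["id"], "score": judge(row["prompt"])})


# When saved as run.py, finish with the usual entry point:
# if __name__ == "__main__":
#     main()
```

With that entry point in place, the script would be invoked as python run.py --test_path test.csv --pred_path pred.csv, where both file names are placeholders.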