Agent-as-Judge
Evaluation of the generative capabilities of LLM agents
aij_judge_task_1_train.csv: training data for the first subtask, with the fields:
- prompt: the prompt given to the model, as described in the Description tab
- score: the expert's score
run.py: the inference code, which uses vLLM
sample_submission.csv: an example of the results file produced by running the model
The solution archive must contain the file run.py, which accepts two arguments: --test_path, the path to the test CSV file with the fields id and prompt to be evaluated, and --pred_path, the path where the file with the answers should be saved.