Spaces: whyu/MM-Vet-v2_Evaluator

whyu committed on Aug 2, 2024

Commit 7a59021

1 Parent(s): 6829c48

Add arXiv link

Files changed (1)

app.py CHANGED

@@ -301,7 +301,7 @@ def grade(file_obj, progress=gr.Progress()):
 model_result_example = "https://raw.githubusercontent.com/yuweihao/MM-Vet/main/v2/results/gpt-4o-2024-05-13_detail-high.json"
 markdown = f"""
-# [MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities](https://github.com/yuweihao/MM-Vet/tree/main/v2)
+# [MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities](https://arxiv.org/abs/2408.00765)
 We offer MM-Vet v2 LLM-based (GPT-4) evaluator to grade open-ended outputs from your models.