Add pipeline tag, library name and link to code
This PR adds the `pipeline_tag` (`image-text-to-text`) and `library_name: transformers` metadata to the model card, so that the model is discoverable at https://huggingface.co/models?pipeline_tag=image-text-to-text. It also adds a link to the GitHub repository, making the code easier to find.
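Once merged, the new `pipeline_tag` also makes the model show up in programmatic Hub queries. Below is a minimal sketch, assuming a recent `huggingface_hub` release; the `author`/`pipeline_tag` filter combination is illustrative and not part of this PR's changes.

```python
# Minimal sketch: list CodeGoat24 models carrying the image-text-to-text pipeline tag.
# Assumes a recent huggingface_hub release; not part of the changes in this PR.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(author="CodeGoat24", pipeline_tag="image-text-to-text"):
    print(model.id)
```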
README.md CHANGED

```diff
@@ -1,5 +1,6 @@
 ---
-
+base_model:
+- lmms-lab/llava-onevision-qwen2-0.5b-ov
 datasets:
 - CodeGoat24/HPD
 - CodeGoat24/LiFT-HRA
@@ -9,8 +10,9 @@ datasets:
 - CodeGoat24/VideoFeedback
 - CodeGoat24/LLaVA-Critic-113k
 - CodeGoat24/VideoDPO
-
-
+license: mit
+library_name: transformers
+pipeline_tag: image-text-to-text
 ---
 
 ## Model Summary
@@ -19,7 +21,8 @@ base_model:
 
 For further details, please refer to the following resources:
 - 📰 Paper: https://arxiv.org/pdf/2503.05236
-- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/
+- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/think
+- 💻 Github Repo: https://github.com/CodeGoat24/UnifiedReward
 - 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
 - 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
 - 📧 Point of Contact: [Yibin Wang](https://codegoat24.github.io)
@@ -79,12 +82,22 @@ image_tensor = [_image.to(dtype=torch.float16, device=device) for _image in imag
 conv_template = "qwen_1_5" # Make sure you use correct chat template for different models
 
 # pairwise ranking
-critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of the answers provided by a Large Multimodal Model (LMM). Determine which answer is better and explain your reasoning with specific details. Your task is provided as follows
+critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of the answers provided by a Large Multimodal Model (LMM). Determine which answer is better and explain your reasoning with specific details. Your task is provided as follows:
+Question: [What this image presents?]
+The first response: [The image is a black and white sketch of a line that appears to be in the shape of a cross. The line is a simple and straightforward representation of the cross shape, with two straight lines intersecting at a point.]
+The second response: [This is a handwritten number seven.]
+ASSISTANT:
+"
 
 # pointwise scoring
-# critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of answer answers provided by a Large Multimodal Model (LMM). Score the response out of 100 and explain your reasoning with specific details. Your task is provided as follows
-
-
+# critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of answer answers provided by a Large Multimodal Model (LMM). Score the response out of 100 and explain your reasoning with specific details. Your task is provided as follows:
+Question: [What this image presents?]
+The LMM response: [This is a handwritten number seven.]
+ASSISTANT:
+"
+
+question = DEFAULT_IMAGE_TOKEN + "
+" + critic_prompt
 conv = copy.deepcopy(conv_templates[conv_template])
 conv.append_message(conv.roles[0], question)
 conv.append_message(conv.roles[1], None)
```
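For context, the diff ends just before the generation step of the card's usage snippet. In the LLaVA-OneVision codebase that the snippet builds on, the conversation is usually completed roughly as sketched below; this uses the upstream `llava` helpers (`tokenizer_image_token`, `IMAGE_TOKEN_INDEX`) together with the `tokenizer`, `model`, `device`, `image_tensor`, and `image_sizes` objects prepared earlier in the card, and it is not part of this PR's changes.

```python
# Sketch of the usual continuation (upstream LLaVA-OneVision API; not part of this PR).
from llava.constants import IMAGE_TOKEN_INDEX
from llava.mm_utils import tokenizer_image_token

prompt = conv.get_prompt()  # render the chat template built above into a single prompt string

# Tokenize the prompt, mapping the image placeholder to its special token index
input_ids = tokenizer_image_token(
    prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to(device)

# Generate the critique / score conditioned on the preprocessed image tensors
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=False,
    max_new_tokens=512,
)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```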