Add pipeline tag, library name and link to code
This PR adds the `pipeline_tag` (`image-text-to-text`) and `library_name: transformers` metadata to the model card, so that the model is discoverable at https://huggingface.co/models?pipeline_tag=image-text-to-text. It also adds a link to the GitHub repository, making the code easier to find.
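Once merged, the new `pipeline_tag` also makes the model show up in programmatic Hub queries. Below is a minimal sketch, assuming a recent `huggingface_hub` release; the `author`/`pipeline_tag` filter combination is illustrative and not part of this PR's changes.

```python
# Minimal sketch: list CodeGoat24 models carrying the image-text-to-text pipeline tag.
# Assumes a recent huggingface_hub release; not part of the changes in this PR.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(author="CodeGoat24", pipeline_tag="image-text-to-text"):
    print(model.id)
```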
README.md CHANGED

```diff
@@ -1,5 +1,6 @@
 ---
-
+base_model:
+- lmms-lab/llava-onevision-qwen2-0.5b-ov
 datasets:
 - CodeGoat24/HPD
 - CodeGoat24/LiFT-HRA
@@ -9,8 +10,9 @@ datasets:
 - CodeGoat24/VideoFeedback
 - CodeGoat24/LLaVA-Critic-113k
 - CodeGoat24/VideoDPO
-
-
+license: mit
+library_name: transformers
+pipeline_tag: image-text-to-text
 ---
 
 ## Model Summary
@@ -19,7 +21,8 @@ base_model:
 
 For further details, please refer to the following resources:
 - 📰 Paper: https://arxiv.org/pdf/2503.05236
-- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/
+- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/think
+- 💻 Github Repo: https://github.com/CodeGoat24/UnifiedReward
 - 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
 - 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
 - 📧 Point of Contact: [Yibin Wang](https://codegoat24.github.io)
@@ -79,12 +82,22 @@ image_tensor = [_image.to(dtype=torch.float16, device=device) for _image in imag
 conv_template = "qwen_1_5" # Make sure you use correct chat template for different models
 
 # pairwise ranking
-critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of the answers provided by a Large Multimodal Model (LMM). Determine which answer is better and explain your reasoning with specific details. Your task is provided as follows
+critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of the answers provided by a Large Multimodal Model (LMM). Determine which answer is better and explain your reasoning with specific details. Your task is provided as follows:
+Question: [What this image presents?]
+The first response: [The image is a black and white sketch of a line that appears to be in the shape of a cross. The line is a simple and straightforward representation of the cross shape, with two straight lines intersecting at a point.]
+The second response: [This is a handwritten number seven.]
+ASSISTANT:
+"
 
 # pointwise scoring
-# critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of answer answers provided by a Large Multimodal Model (LMM). Score the response out of 100 and explain your reasoning with specific details. Your task is provided as follows
-
-
+# critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of answer answers provided by a Large Multimodal Model (LMM). Score the response out of 100 and explain your reasoning with specific details. Your task is provided as follows:
+Question: [What this image presents?]
+The LMM response: [This is a handwritten number seven.]
+ASSISTANT:
+"
+
+question = DEFAULT_IMAGE_TOKEN + "
+" + critic_prompt
 conv = copy.deepcopy(conv_templates[conv_template])
 conv.append_message(conv.roles[0], question)
 conv.append_message(conv.roles[1], None)
```
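For context, the diff ends just before the generation step of the card's usage snippet. In the LLaVA-OneVision codebase that the snippet builds on, the conversation is usually completed roughly as sketched below; this uses the upstream `llava` helpers (`tokenizer_image_token`, `IMAGE_TOKEN_INDEX`) together with the `tokenizer`, `model`, `device`, `image_tensor`, and `image_sizes` objects prepared earlier in the card, and it is not part of this PR's changes.

```python
# Sketch of the usual continuation (upstream LLaVA-OneVision API; not part of this PR).
from llava.constants import IMAGE_TOKEN_INDEX
from llava.mm_utils import tokenizer_image_token

prompt = conv.get_prompt()  # render the chat template built above into a single prompt string

# Tokenize the prompt, mapping the image placeholder to its special token index
input_ids = tokenizer_image_token(
    prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to(device)

# Generate the critique / score conditioned on the preprocessed image tensors
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=False,
    max_new_tokens=512,
)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```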