nielsr HF Staff committed on
Commit 2fac493 · verified · 1 Parent(s): 8dcf8cf

Add pipeline tag, library name and link to code

This PR adds `pipeline_tag: image-text-to-text` and `library_name: transformers` to the model card so that the model is discoverable at https://huggingface.co/models?pipeline_tag=image-text-to-text. It also adds a link to the GitHub repository, making the code easier to find.
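As a rough illustration of what the metadata change amounts to, the sketch below lists the top-level keys of a README's YAML front matter using only the standard library. The helper `front_matter_keys` and the sample text are hypothetical; the sample copies only the keys this PR touches and omits the `datasets` list.

```python
def front_matter_keys(readme_text: str) -> list[str]:
    """Return top-level keys found between the opening and closing '---' lines."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter block at the top of the file
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        # Top-level keys are unindented, are not list items, and contain a colon.
        if line and not line[0].isspace() and not line.startswith("-") and ":" in line:
            keys.append(line.split(":", 1)[0])
    return keys


# Hypothetical sample mirroring the keys added or moved by this PR.
sample_readme = """---
base_model:
- lmms-lab/llava-onevision-qwen2-0.5b-ov
license: mit
library_name: transformers
pipeline_tag: image-text-to-text
---

## Model Summary
"""

keys = front_matter_keys(sample_readme)
print(keys)
```

This is only a line-based check, not a YAML parser; nested mappings or quoted keys would need a real YAML library.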

Files changed (1)
  1. README.md +21 -8
README.md CHANGED
@@ -1,5 +1,6 @@
  ---
- license: mit
  datasets:
  - CodeGoat24/HPD
  - CodeGoat24/LiFT-HRA
@@ -9,8 +10,9 @@ datasets:
  - CodeGoat24/VideoFeedback
  - CodeGoat24/LLaVA-Critic-113k
  - CodeGoat24/VideoDPO
- base_model:
- - lmms-lab/llava-onevision-qwen2-0.5b-ov
  ---

  ## Model Summary
@@ -19,7 +21,8 @@ base_model:

  For further details, please refer to the following resources:
  - 📰 Paper: https://arxiv.org/pdf/2503.05236
- - 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/
  - 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
  - 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
  - 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)
@@ -79,12 +82,22 @@ image_tensor = [_image.to(dtype=torch.float16, device=device) for _image in imag
  conv_template = "qwen_1_5" # Make sure you use correct chat template for different models

  # pairwise ranking
- critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of the answers provided by a Large Multimodal Model (LMM). Determine which answer is better and explain your reasoning with specific details. Your task is provided as follows:\nQuestion: [What this image presents?]\nThe first response: [The image is a black and white sketch of a line that appears to be in the shape of a cross. The line is a simple and straightforward representation of the cross shape, with two straight lines intersecting at a point.]\nThe second response: [This is a handwritten number seven.]\nASSISTANT:\n"

  # pointwise scoring
- # critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of answer answers provided by a Large Multimodal Model (LMM). Score the response out of 100 and explain your reasoning with specific details. Your task is provided as follows:\nQuestion: [What this image presents?]\nThe LMM response: [This is a handwritten number seven.]\nASSISTANT:\n "
-
- question = DEFAULT_IMAGE_TOKEN + "\n" + critic_prompt
  conv = copy.deepcopy(conv_templates[conv_template])
  conv.append_message(conv.roles[0], question)
  conv.append_message(conv.roles[1], None)
 
  ---
+ base_model:
+ - lmms-lab/llava-onevision-qwen2-0.5b-ov
  datasets:
  - CodeGoat24/HPD
  - CodeGoat24/LiFT-HRA
 
  - CodeGoat24/VideoFeedback
  - CodeGoat24/LLaVA-Critic-113k
  - CodeGoat24/VideoDPO
+ license: mit
+ library_name: transformers
+ pipeline_tag: image-text-to-text
  ---

  ## Model Summary
 

  For further details, please refer to the following resources:
  - 📰 Paper: https://arxiv.org/pdf/2503.05236
+ - 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/think
+ - 💻 Github Repo: https://github.com/CodeGoat24/UnifiedReward
  - 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
  - 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
  - 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)
 
  conv_template = "qwen_1_5" # Make sure you use correct chat template for different models

  # pairwise ranking
+ critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of the answers provided by a Large Multimodal Model (LMM). Determine which answer is better and explain your reasoning with specific details. Your task is provided as follows:
+ Question: [What this image presents?]
+ The first response: [The image is a black and white sketch of a line that appears to be in the shape of a cross. The line is a simple and straightforward representation of the cross shape, with two straight lines intersecting at a point.]
+ The second response: [This is a handwritten number seven.]
+ ASSISTANT:
+ "

  # pointwise scoring
+ # critic_prompt = "Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of answer answers provided by a Large Multimodal Model (LMM). Score the response out of 100 and explain your reasoning with specific details. Your task is provided as follows:
+ Question: [What this image presents?]
+ The LMM response: [This is a handwritten number seven.]
+ ASSISTANT:
+ "
+
+ question = DEFAULT_IMAGE_TOKEN + "
+ " + critic_prompt
  conv = copy.deepcopy(conv_templates[conv_template])
  conv.append_message(conv.roles[0], question)
  conv.append_message(conv.roles[1], None)
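
As displayed, the added lines split the `critic_prompt` and `question` strings across physical lines, and Python does not accept a raw line break inside a single double-quoted string literal. A minimal sketch of the same pairwise prompt built with explicit `\n` escapes follows. `build_pairwise_prompt` is a hypothetical helper, not part of the README, and `DEFAULT_IMAGE_TOKEN` is redefined locally as `"<image>"` on the assumption that this matches the LLaVA constant, so that the snippet runs without LLaVA installed.

```python
# Assumption: local stand-in for the DEFAULT_IMAGE_TOKEN constant imported in the README.
DEFAULT_IMAGE_TOKEN = "<image>"


def build_pairwise_prompt(question: str, first: str, second: str) -> str:
    """Hypothetical helper: assemble the pairwise-ranking prompt on explicit newline escapes."""
    return (
        "Given an image and a corresponding question, please serve as an unbiased "
        "and fair judge to evaluate the quality of the answers provided by a Large "
        "Multimodal Model (LMM). Determine which answer is better and explain your "
        "reasoning with specific details. Your task is provided as follows:\n"
        f"Question: [{question}]\n"
        f"The first response: [{first}]\n"
        f"The second response: [{second}]\n"
        "ASSISTANT:\n"
    )


critic_prompt = build_pairwise_prompt(
    "What this image presents?",
    "The image is a black and white sketch of a line that appears to be in the "
    "shape of a cross. The line is a simple and straightforward representation of "
    "the cross shape, with two straight lines intersecting at a point.",
    "This is a handwritten number seven.",
)

# Same concatenation as the README line before this change, with the escape kept intact.
question = DEFAULT_IMAGE_TOKEN + "\n" + critic_prompt
```

Adjacent string literals inside parentheses are concatenated by Python, which keeps each source line short while producing the same single-line-escaped prompt as the version removed by this diff.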