English-only LLaVA model trained on MultiInstruct. Resource associated with the paper "Extending Large Language Models to Multimodality for non-English Languages".

For further information, refer to the GitHub repository: https://github.com/swapUniba/LVLMs-NonEnglish

Inference Example

import torch
import requests

from io import BytesIO
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_path = 'swap-uniba/LLaVA-LLaMA-2-base-EN-FT'

processor = AutoProcessor.from_pretrained(
    model_path
)

model = LlavaForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# The prompt follows the LLaVA / LLaMA-2 instruction format, with <image> marking where the image is inserted
conv_str = "[INST] How many cats are present in this image?\n<image>[/INST]"

# Download the example image
image = Image.open(BytesIO(requests.get("https://farm1.staticflickr.com/36/100071458_515d1884d1_z.jpg").content))

# Tokenize the text and preprocess the image
batch = processor(
    images=[image],
    text=[conv_str],
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=1024,
)

# Move the inputs to the model device and cast floating-point tensors (the pixel values) to the model dtype
batch = batch.to(model.device, torch.bfloat16)

# Greedy decoding
outs = model.generate(**batch, do_sample=False, num_beams=1, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt
print(processor.tokenizer.decode(outs[0, batch.input_ids.shape[-1]:], skip_special_tokens=True))
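The same steps can be wrapped in a small helper for asking arbitrary questions about an image. This is a minimal sketch reusing the model and processor loaded above; the ask_image name and the local file path are illustrative, not part of the repository.

def ask_image(image, question, max_new_tokens=512):
    """Ask a single question about a PIL image, reusing the model and processor loaded above."""
    prompt = f"[INST] {question}\n<image>[/INST]"
    inputs = processor(images=[image], text=[prompt], return_tensors="pt")
    inputs = inputs.to(model.device, torch.bfloat16)
    out = model.generate(**inputs, do_sample=False, num_beams=1, max_new_tokens=max_new_tokens)
    # Return only the newly generated tokens
    return processor.tokenizer.decode(out[0, inputs.input_ids.shape[-1]:], skip_special_tokens=True)

# Example usage with a local file (path is illustrative)
# print(ask_image(Image.open("cats.jpg"), "How many cats are present in this image?"))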