MiniLingua-1b-Instruct

MiniLingua-1b-Instruct is an instruction-tuned multilingual model based on the MiniLingua-1b base model. It supports a diverse set of European languages as well as programming code, making it suitable for instruction following, multilingual generation, and downstream tasks such as question answering and summarisation.

Supported Languages

  • Bulgarian
  • Czech
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Italian
  • Polish
  • Portuguese
  • Spanish
  • Swedish
  • Programming code

Instruction Tuning

This preview instruction-tuned version of MiniLingua-1b was trained for one epoch on 1.2 million instructions drawn from a mix of high-quality instruction datasets.

Supervised fine-tuning (SFT) was performed on Aalto University's Triton cluster using 4 H200 GPUs.
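
The exact recipe and dataset mix are not published in this card; the snippet below is only a minimal sketch of a comparable single-epoch SFT run, assuming the TRL library and a hypothetical local instructions.jsonl file:

# Minimal single-epoch SFT sketch (illustrative; not the exact MiniLingua recipe).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical local file of instruction records; the real dataset mix is not listed here.
dataset = load_dataset("json", data_files="instructions.jsonl", split="train")

trainer = SFTTrainer(
    model="minilingua-ai/MiniLingua-1b",  # the base model this checkpoint was tuned from
    train_dataset=dataset,
    args=SFTConfig(output_dir="minilingua-sft", num_train_epochs=1),  # one epoch, as above
)
trainer.train()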

Intended Use

This model is a preview release intended for:

  • Multilingual instruction following
  • Evaluation and benchmarking
  • Research in low- and high-resource European languages

Use with transformers

Quick start with Transformers, for both GPU- and CPU-enabled environments:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_name = "minilingua-ai/MiniLingua-1b-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" (requires the accelerate package) places the model on a GPU
# when one is available and falls back to CPU otherwise; on CPU-only machines
# prefer torch.float32, since float16 inference on CPU is slow or unsupported.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)
gen = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "Translate from Bulgarian: Здравейте! Как сте? Translation:"
# Greedy decoding (do_sample=False) makes the output deterministic.
out = gen(prompt, max_new_tokens=128, do_sample=False)
print(out[0]["generated_text"])
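
For chat-style prompting, the tokenizer's chat template can be used instead of a raw prompt string. The sketch below assumes the checkpoint ships a chat template, which this card does not confirm; it reuses the model and tokenizer loaded above.

# Sketch: chat-style prompting via the tokenizer's chat template (if one is defined).
messages = [{"role": "user", "content": "Summarise in one sentence: MiniLingua supports 13 European languages and code."}]
# apply_chat_template formats the conversation and appends the assistant turn marker.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output_ids = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))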

Limitations

  • This version is a first-stage SFT release; no alignment steps have been applied.
  • Some languages may show uneven instruction-following ability depending on resource availability and instruction diversity.

License: Apache-2.0
