---
license: cc-by-nc-4.0
language:
- sk
base_model:
- Qwen/Qwen3-0.6B
pipeline_tag: text-generation
datasets:
- kinit/synthetic-queries-and-ml-instructions
extra_gated_prompt: |
  Access Request Terms:

  By requesting access to the SensAI model, you confirm that you:
  - will use the materials solely for research and non-commercial purposes;
  - will cite the SensAI project and respect the [CC-BY-NC-4.0 License](https://creativecommons.org/licenses/by-nc/4.0/);
  - will not attempt to extract, infer, or reconstruct data from the model or dataset;
  - will ensure that your downstream use complies with applicable laws, regulations, and ethical AI principles.
---

# Qwen3-0.6B QLoRA Adapter

This repository contains a QLoRA adapter for Qwen3-0.6B.

The model was fine-tuned on an instruction-following task ("extract metadata") using the synthetic dataset [synthetic-queries-and-ml-instructions](https://huggingface.co/datasets/kinit/synthetic-queries-and-ml-instructions).
## Description
- Base model: Qwen3-0.6B
- Adapter: QLoRA
- Task: extract metadata
- Quantization: 4-bit
- GPU: NVIDIA RTX 3090
- Dataset: synthetic-queries-and-ml-instructions
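
The dataset can be pulled from the Hub with the `datasets` library. A minimal sketch (the dataset may be gated, and the split layout is an assumption based on the evaluation section below):

```python
from datasets import load_dataset

# Dataset ID from this card's metadata; access may require accepting
# the gating terms on the Hub first.
ds = load_dataset("kinit/synthetic-queries-and-ml-instructions")
print(ds)  # inspect the available splits, e.g. a ~1.5K-row test split
```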
## LoRA Configuration

| Parameter | Value |
|---|---|
| r | 32 |
| lora_alpha | 16 |
| lora_dropout | 0.05 |
| bias | none |
| task_type | CAUSAL_LM |
| target_modules | q_proj, k_proj, v_proj, o_proj |
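
For reference, the table maps onto a `peft` `LoraConfig` along these lines (a sketch reconstructed from the values above, not the project's actual training script):

```python
from peft import LoraConfig

# LoRA configuration mirroring the table above.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```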
## Training Parameters
| Parameter | Value |
|---|---|
| Max sequence length | 4096 |
| Epochs | 2 |
| Learning rate | 7e-4 |
| Train batch size per device | 4 |
| Eval batch size per device | 8 |
| Gradient accumulation steps | 4 |
| Optimizer | paged_adamw_8bit |
| Scheduler | cosine |
| Warmup ratio | 0.03 |
| FP16 | True |
| Save & Eval steps | 100 |
| Early stopping patience | 3 |
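
These values correspond roughly to Hugging Face `TrainingArguments` plus an `EarlyStoppingCallback`. The sketch below is our reconstruction under that assumption; the output directory and the steps-based strategy fields are placeholders, not the published training script:

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Reconstruction of the table above as TrainingArguments. The max
# sequence length (4096) is applied at tokenization time, not here.
training_args = TrainingArguments(
    output_dir="qwen3-0.6b-qlora",  # placeholder
    num_train_epochs=2,
    learning_rate=7e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    fp16=True,
    save_steps=100,
    eval_steps=100,
    eval_strategy="steps",
    save_strategy="steps",
    load_best_model_at_end=True,  # required for early stopping
)

# Early stopping with patience 3, as listed in the table.
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)
```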
## Evaluation Results

We evaluated the base Qwen3-0.6B model and our fine-tuned model on the test split (1.5K rows) of the dataset mentioned in the description.
| Metric | Qwen3-0.6B | Qwen3-0.6B + QLoRA (Fine-tuned) |
|---|---|---|
| Invalidly parsed (%) | 47.8 | 0.27 |
| Complete accuracy (%) | 0.47 | 80.6 |
| Missing attributes (%) | 41.73 | 7.93 |
| Extra attributes (%) | 32.27 | 6.53 |
| Incorrect attributes (%) | 41.4 | 5.4 |
- Invalidly parsed: the percentage of examples where the model output had an invalid or missing JSON format.
- Complete accuracy: the percentage of examples where all attributes in the output matched the ground-truth attributes.
- Missing attributes: the percentage of examples where the model output is missing at least one attribute that is present in the ground-truth example.
- Extra attributes: the percentage of examples where the model output contains attributes that are not present in the ground-truth example.
- Incorrect attributes: the percentage of examples where the model output has incorrect attribute values compared to the ground-truth example.
The percentages for missing, extra and incorrect attributes may exceed 100% in total, since a single example can fall into multiple categories simultaneously. For instance, a model output could omit a required attribute (missing) while also adding an irrelevant one (extra).
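
To make the definitions concrete, one way to assign an example to these categories is sketched below. Our actual evaluation script is not published here; the flat JSON attribute layout and the function name are assumptions for illustration:

```python
import json

def categorize(output_str: str, truth: dict) -> set[str]:
    """Assign a single example to the metric categories defined above.

    `output_str` is the raw model output, `truth` the ground-truth
    attribute dict. Illustrative sketch, not the project's eval script.
    """
    try:
        pred = json.loads(output_str)
    except (json.JSONDecodeError, TypeError):
        return {"invalidly_parsed"}

    categories = set()
    missing = truth.keys() - pred.keys()    # in truth, absent in output
    extra = pred.keys() - truth.keys()      # in output, absent in truth
    shared = truth.keys() & pred.keys()
    incorrect = {k for k in shared if pred[k] != truth[k]}

    if missing:
        categories.add("missing_attributes")
    if extra:
        categories.add("extra_attributes")
    if incorrect:
        categories.add("incorrect_attributes")
    if not categories:
        categories.add("complete_accuracy")
    return categories
```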
## How to use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_name = "Qwen/Qwen3-0.6B"
ft_model_name = "kinit/qwen3-0.6B-extract-ml-instructions"

tokenizer = AutoTokenizer.from_pretrained(ft_model_name)

# Load the base model with the 4-bit quantization set up first
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=quant_config,
    device_map="auto"
)

# Load the LoRA adapter weights into the base model
model = PeftModel.from_pretrained(base_model, ft_model_name).eval()

# Preprocess the input
# (Slovak: "I want to classify car license plate numbers with a CNN
# architecture using the 'LicencePlates_ImageDataset' dataset.")
prompt = "Chcem realizovať klasifikáciu ŠPZ čísel áut pomocou CNN architektúry za pomoci datasetu 'LicencePlates_ImageDataset'."
messages = [
    {"role": "user", "content": f"User query: {prompt}"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Perform inference; do_sample=True is needed for temperature to take effect
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1_024,
    do_sample=True,
    temperature=0.25
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Model response containing a tool call that needs to be parsed next;
# the tool-call arguments represent the extracted metadata
model_response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(model_response)
```
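
The printed response should contain a tool call whose arguments are the extracted metadata. Qwen-family models wrap tool calls in `<tool_call> ... </tool_call>` tags around a JSON object, so a minimal parsing step might look like the sketch below; adjust the pattern to the adapter's actual output format:

```python
import json
import re

# Depending on tokenizer settings, the <tool_call> tags may or may not
# survive decoding, so fall back to parsing the raw response as JSON.
match = re.search(r"<tool_call>\s*(.*?)\s*</tool_call>", model_response, re.DOTALL)
payload = match.group(1) if match else model_response.strip()

try:
    tool_call = json.loads(payload)
    # The tool-call arguments hold the extracted metadata.
    print(tool_call.get("arguments", tool_call))
except json.JSONDecodeError:
    print("Could not parse a JSON tool call from the response.")
```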
## Additional Information

This work was supported by a Výskumná Agentúra grant within the project SensAI - Morálna citlivosť a ľudské práva pre spracovanie jazykov s obmedzenými zdrojmi (Moral Sensitivity and Human Rights for Low-Resource Language Processing; Grant No. 09I01-03-V04-00100/2025/VA).
## License & Attribution

This model was created within the SensAI project and is released under the [CC-BY-NC-4.0 License](https://creativecommons.org/licenses/by-nc/4.0/). It is a derivative of the Qwen3-0.6B model, which is licensed under the Apache License 2.0.
## Access Request Terms
Access to this repository is restricted. Please review and agree to the following terms before requesting access.