Translation
Safetensors
French
Latin
t5

ByT5-Small for Normalization

This model normalizes automatic text recognition (ATR) output according to the CATMuS guidelines, for both Latin and Old French. It fixes spacing, but it tends to overnormalize and to add punctuation.
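Because the model may add punctuation that is absent from the ATR input, it can be useful to flag such lines for manual review. The following is a minimal sketch of one way to do that. `added_punctuation` is a hypothetical helper, not part of the model or of `transformers`. It counts only characters whose Unicode general category is punctuation (`P*`) and reports those that appear more often in the output than in the source. The sample pair is the input and output from the usage example.

```python
import unicodedata
from collections import Counter


def added_punctuation(source: str, normalized: str) -> Counter:
    """Return punctuation marks that occur more often in `normalized` than in `source`.

    Hypothetical review helper: it only compares counts of characters whose
    Unicode general category starts with "P"; it does not judge whether an
    added mark is correct.
    """
    def punct(text: str) -> Counter:
        return Counter(ch for ch in text if unicodedata.category(ch).startswith("P"))

    # Counter subtraction keeps only positive differences.
    return punct(normalized) - punct(source)


src = "Scͥbo uobiᷤᷤ ñ pauli ł donati. "
out = "scribo uobis, non Pauli uel Donati"
print(added_punctuation(src, out))  # Counter({',': 1})
```

A non-empty result only means the output contains punctuation not present in the input, so such lines are candidates for review, not necessarily errors.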

```python
import unicodedata

from transformers import pipeline

pipe = pipeline(
    task="text2text-generation",  # sequence-to-sequence generation; change if needed
    model="comma-project/normalization-byt5-small",  # Hub model ID (or a local directory)
    tokenizer="comma-project/normalization-byt5-small",
)

# Convert the input to Unicode NFD (decomposed form) before inference.
pipe(unicodedata.normalize("NFD", "Scͥbo uobiᷤᷤ ñ pauli ł donati. "))
# [{'generated_text': 'scribo uobis, non Pauli uel Donati'}]
```
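To normalize a whole page rather than a single line, the same steps can be applied to a list of lines. This is a minimal sketch. `normalize_lines` is a hypothetical helper, and it assumes that calling the pipeline on a list of strings returns one `{'generated_text': ...}` dict per input. Extra keyword arguments are forwarded to the pipeline call. For example, `max_new_tokens` could be passed if outputs look truncated. Suitable values for this model are not documented here.

```python
import unicodedata
from typing import Callable, List


def normalize_lines(lines: List[str], generate: Callable, **generate_kwargs) -> List[str]:
    """Normalize ATR lines with a text2text-generation pipeline (sketch).

    `generate` is expected to behave like the `pipe` object from the usage
    example: given a list of strings, return one {'generated_text': ...}
    dict per input.
    """
    if not lines:
        return []
    # Decompose every line to NFD, as in the single-line example.
    prepared = [unicodedata.normalize("NFD", line) for line in lines]
    outputs = generate(prepared, **generate_kwargs)
    return [out["generated_text"] for out in outputs]


# Stand-in for the real pipeline, used only to show the data flow without
# downloading the model: it echoes its (already NFD-normalized) input.
def echo_pipe(texts, **kwargs):
    return [{"generated_text": t} for t in texts]


result = normalize_lines(["ñ pauli", "ł donati."], echo_pipe)
print(result == [unicodedata.normalize("NFD", "ñ pauli"), "ł donati."])  # True
```

With the real model, the call would be `normalize_lines(page_lines, pipe)`. Passing the generator as an argument keeps the helper testable with a stand-in such as `echo_pipe`.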
Safetensors · Model size: 0.3B params · Tensor type: F32

Model tree for comma-project/normalization-byt5-small

Base model: google/byt5-small (this model is fine-tuned from it)
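The base model, google/byt5-small, reads raw UTF-8 bytes rather than subword tokens. Input length is therefore counted in bytes, and the NFD step in the usage example can make an input longer. The sketch below approximates the byte-to-id mapping under a stated assumption: ids 0-2 are reserved for padding, end-of-sequence and unknown tokens, each byte b maps to b + 3, and an end-of-sequence id is appended. `byte_ids` is a hypothetical helper. For real inputs, use the model's own tokenizer.

```python
import unicodedata

# Assumption: ByT5-style scheme where ids 0-2 are <pad>, </s> and <unk>,
# each UTF-8 byte b maps to id b + 3, and </s> (id 1) is appended.
NUM_SPECIAL = 3
EOS_ID = 1


def byte_ids(text: str) -> list[int]:
    """Approximate ByT5 input ids for `text` (sketch, not the real tokenizer)."""
    return [b + NUM_SPECIAL for b in text.encode("utf-8")] + [EOS_ID]


for form in ("NFC", "NFD"):
    s = unicodedata.normalize(form, "ñ")
    print(form, len(s), "code point(s)", len(byte_ids(s)), "ids")
# NFC 1 code point(s) 3 ids
# NFD 2 code point(s) 4 ids
```

In NFD, "ñ" becomes "n" followed by U+0303 COMBINING TILDE, which takes two bytes in UTF-8. That is why the decomposed form needs one more id than the precomposed one.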

Dataset used to train comma-project/normalization-byt5-small