ByT5-Small for Normalization
This model normalizes ATR (automatic text recognition) output following the CATMuS guidelines, for both Latin and Old French. It fixes spacing, but it tends to overnormalize and to add punctuation.
from transformers import pipeline
import unicodedata
pipe = pipeline(
    task="text2text-generation",  # change if needed
    model="comma-project/normalization-byt5-small",  # Hub model ID; a path to a local directory also works
    tokenizer="comma-project/normalization-byt5-small",
)
pipe(unicodedata.normalize("NFD", "Scͥbo uobiᷤᷤ ñ pauli ł donati. "))
# [{'generated_text': 'scribo uobis, non Pauli uel Donati'}]
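Since the model tends to add punctuation, it can help to process a page line by line and flag the marks that were not in the ATR input. The following is a minimal sketch, not part of the model's own tooling: `normalize_lines` and `added_punctuation` are hypothetical helper names, skipping blank lines is an assumption, and `pipe` is expected to behave like the pipeline above (one call per string, returning a list of dicts with a `generated_text` key).

```python
import unicodedata
from collections import Counter
from typing import Callable, Iterable, List


def normalize_lines(pipe: Callable, lines: Iterable[str]) -> List[str]:
    """Normalize several ATR lines, one pipeline call per line (hypothetical helper)."""
    results = []
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines are skipped rather than sent to the model
        # Same NFD decomposition as in the single-line example.
        prepared = unicodedata.normalize("NFD", line)
        # A call on one string returns a list of dicts; take the first result.
        results.append(pipe(prepared)[0]["generated_text"])
    return results


def added_punctuation(source: str, output: str) -> List[str]:
    """List punctuation marks that occur more often in the output than in the source."""

    def punct(text: str) -> Counter:
        # Unicode general categories starting with "P" are punctuation.
        return Counter(ch for ch in text if unicodedata.category(ch).startswith("P"))

    extra = punct(output) - punct(source)  # Counter subtraction keeps positive counts only
    return sorted(extra.elements())
```

With the pipeline defined above, `normalize_lines(pipe, page_lines)` returns one normalized string per non-blank line, and `added_punctuation(line, normalized)` lists marks such as `,` that appear only in the model output, so they can be reviewed by hand.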
Base model: google/byt5-small