Collection of domain-specific corpora, mainly in Galician.
Collection of Galician datasets for training, fine-tuning, and instruction tuning.
TTS models trained using the CoquiTTS Python library.
Experiments associated with the paper 'Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study'.
Datasets for training and evaluation of ASR models.
CorpusNÓS is the largest collection of Galician-language data for training LLMs.
Collection of Galician datasets for LLM evaluation. It includes translations of existing datasets as well as datasets we created ourselves.
- Open Generative Large Language Models for Galician
  Paper • 2406.13893 • Published
- proxectonos/Carvalho-Salamandra-Instruct
  Text Generation • 8B • Updated • 89
- Nos-PT/Llama-Carvalho-PT-GL
  Text Generation • 8B • Updated • 2.81k • 2
- proxectonos/Llama-3.1-Carballo-Instr3
  Text Generation • 8B • Updated • 155
Automatic speech recognition (ASR) models.
Legacy MT models trained with earlier libraries and datasets.
Datasets for training and evaluation of TTS models.