Proxecto Nós

non-profit

https://nos.gal/

proxectoNos

proxectonos

Recent Activity

silviapasuarez updated a dataset 2 days ago

proxectonos/summarization_gl

silviapasuarez updated a dataset 2 days ago

proxectonos/xstorycloze_gl

silviapasuarez updated a dataset 2 days ago

proxectonos/galcola

proxectonos's collections (12)

Domain Specific Corpora

Collection of domain-specific corpora, mainly in Galician.

proxectonos/corpus_dominio_legal_administrativo

Preview • Updated 5 days ago • 36
proxectonos/corpus_dominio_periodistico

Viewer • Updated 4 days ago • 280k • 34
proxectonos/corpus_dominio_cientifico

Preview • Updated 4 days ago • 42
proxectonos/corpus_dominio_museistico_patrimonio

Viewer • Updated 2 days ago • 14.5k • 48

Text Datasets for Fine-tuning and Instruction tuning

Collection of datasets in Galician for fine-tuning, instruction tuning or training purposes.

proxectonos/SciELO-GL

Preview • Updated 20 days ago • 90 • 2
proxectonos/corpus_paralelo_idioms

Viewer • Updated Dec 19, 2025 • 13.2k • 35 • 1
proxectonos/DGT-GL

Viewer • Updated Dec 19, 2025 • 640k • 12
proxectonos/Finetuning-MT

Viewer • Updated Dec 17, 2025 • 199k • 7

proxectonos/Nos_MT-CT2-es-gl

Updated Jan 19 • 4
proxectonos/Nos_MT-CT2-en-gl

Updated Sep 12, 2025 • 1
proxectonos/Nos_MT-CT2-gl-en

Updated Sep 12, 2025 • 1
proxectonos/Nos_MT-CT2-gl-es

Updated Sep 12, 2025

TTS Models

TTS models trained using the CoquiTTS Python library.

proxectonos/Nos_TTS-celtia-vits-graphemes

Text-to-Speech • Updated Nov 13, 2025 • 7 • 1
proxectonos/Nos_TTS-brais-vits-graphemes

Text-to-Speech • Updated Nov 11, 2025 • 5
proxectonos/Nos_TTS-icia-vits-phonemes

Text-to-Speech • Updated 5 days ago
proxectonos/Nos_TTS-sabela-vits-phonemes

Text-to-Speech • Updated Nov 7, 2025 • 2

Instruction Pretrained Experiments

Experiments associated with the paper 'Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study'

proxectonos/Llama-3.1-Carballo-Instr3

Text Generation • 8B • Updated 4 days ago • 155
proxectonos/Llama-3.1-Carballo

Text Generation • 8B • Updated 4 days ago • 271 • 4
proxectonos/Llama-3.1-Carballo-Instr1

Text Generation • 8B • Updated Dec 2, 2025 • 3

ASR Datasets

Datasets for training and evaluation of ASR models.

proxectonos/Nos_Transcrispeech-GL

Viewer • Updated May 6, 2025 • 13.5k • 17
proxectonos/Nos_RG-Podcast-GL

Viewer • Updated Mar 27, 2025 • 38.7k • 32 • 1
proxectonos/Nos_Parlaspeech-GL

Viewer • Updated May 14, 2025 • 798k • 16 • 2
proxectonos/Nos_Telexornais-GL

Viewer • Updated Dec 5, 2025 • 148k • 23

CorpusNÓS: A massive Galician corpus for training LLM

CorpusNÓS is the largest collection of Galician-language data for training LLMs.

proxectonos/corpusnos

Viewer • Updated 5 days ago • 10.8M • 65

Text Datasets for Evaluation

Collection of datasets in Galician for LLM evaluation. It includes translations of existing datasets as well as datasets created by us.

proxectonos/GlBBQ

Viewer • Updated 16 days ago • 27.3k • 62
proxectonos/aya_nos

Updated 21 days ago • 9 • 1
proxectonos/mgsm_gl

Viewer • Updated Dec 17, 2025 • 258 • 83
proxectonos/parafrases_gl

Updated 2 days ago • 62

Text Models

Open Generative Large Language Models for Galician

Paper • 2406.13893 • Published Jun 19, 2024
proxectonos/Carvalho-Salamandra-Instruct

Text Generation • 8B • Updated 4 days ago • 89
Nos-PT/Llama-Carvalho-PT-GL

Text Generation • 8B • Updated 4 days ago • 2.81k • 2
proxectonos/Llama-3.1-Carballo-Instr3

Text Generation • 8B • Updated 4 days ago • 155

ASR Models

Automatic Speech Recognition models

proxectonos/Nos_ASR-wav2vec2-large-xlsr-53-gl-with-lm

Automatic Speech Recognition • 0.3B • Updated Feb 12, 2025 • 12 • 3
proxectonos/Nos_ASR-wav2vec2-xls-r-300m-gl

Automatic Speech Recognition • 0.3B • Updated Feb 19 • 70

MT Models (former)

Older MT models trained with older libraries and datasets.

proxectonos/Nos_MT-OpenNMT-en-gl

Updated Jun 16, 2025 • 1
proxectonos/Nos_MT-OpenNMT-es-gl

Updated Apr 11, 2025 • 1
proxectonos/Nos_MT-OpenNMT-eu-gl

Updated Apr 11, 2025
proxectonos/Nos_MT-OpenNMT-ca-gl

Updated Apr 11, 2025

TTS Datasets

Datasets for training and evaluation of TTS models.

proxectonos/CRPIH_UVigo-GL-Voices_extended

Preview • Updated Dec 9, 2025 • 258 • 1
proxectonos/Nos_Celtia-GL

Preview • Updated Nov 14, 2025 • 286 • 1
proxectonos/Nos_Brais-GL

Preview • Updated Nov 14, 2025 • 247 • 1

Team members 31
