HiTZ zentroa

non-profit

https://www.hitz.eus/

hitz_zentroa

hitz-zentroa

AI & ML interests

Natural Language Processing, Signal Processing

Recent Activity

enekovalero updated a collection about 2 hours ago

Merge and Conquer

enekovalero updated a dataset about 2 hours ago

HiTZ/ifeval_gl

Papers

MEG-to-MEG Transfer Learning and Cross-Task Speech/Silence Detection with Limited Data

MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification

HiTZ's collections 42

Merge and Conquer

Papers and resources published under the ALIA project.

HiTZ/gl_Qwen3-8B-Base

Text Generation • 8B • Updated Dec 20, 2025 • 4
HiTZ/gl_Llama-3.1-8B

Text Generation • 8B • Updated Dec 20, 2025 • 3
HiTZ/eu_Qwen3-14B-Base

Text Generation • Updated Dec 20, 2025 • 1 • 1
HiTZ/es_Qwen3-14B-Base

Text Generation • 15B • Updated Dec 20, 2025 • 3

TTS

HiTZ/TTS-gl_brais

Text-to-Speech • Updated Dec 16, 2025 • 6
HiTZ/TTS-gl_sabela

Text-to-Speech • Updated Dec 16, 2025
HiTZ/TTS-eu_antton

Text-to-Speech • Updated Dec 16, 2025 • 3
HiTZ/TTS-eu_maider

Text-to-Speech • Updated Dec 16, 2025 • 4

Multimodal Latxa

Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

Paper • 2511.09396 • Published Nov 12, 2025
HiTZ/Latxa-Llama-3.1-VL-8B-Instruct

Image-Text-to-Text • 8B • Updated 13 days ago • 60
HiTZ/Llama-Latxa-3.1-VL-8B-Instruct

Image-Text-to-Text • 8B • Updated 14 days ago • 23
HiTZ/pixmo-ask-model-anything_eu

Viewer • Updated 14 days ago • 146k • 18

ASR Datasets

Datasets for training and benchmarking ASR in Basque, Spanish, and bilingual Basque-Spanish

HiTZ/composite_corpus_eseu_v1.0

Viewer • Updated May 12, 2025 • 742k • 611 • 2
HiTZ/composite_corpus_eu_v2.1

Viewer • Updated Dec 19, 2024 • 407k • 128 • 2
HiTZ/composite_corpus_es_v1.0

Viewer • Updated May 12, 2025 • 526k • 374
HiTZ/benchmark_eseu_testsets

Updated Apr 19, 2025 • 94

Nvidia NeMo

Nvidia NeMo STT models

HiTZ/stt_eu_conformer_transducer_large_v2

Automatic Speech Recognition • Updated Feb 11 • 5 • 1
HiTZ/stt_eu_conformer_transducer_large

Automatic Speech Recognition • Updated Nov 28, 2025 • 35 • 2
HiTZ/stt_eu_conformer_ctc_large

Automatic Speech Recognition • Updated Nov 28, 2025 • 41 • 2
HiTZ/stt_eseu_conformer_transducer_large

Automatic Speech Recognition • Updated Nov 28, 2025 • 17

Whisper

Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages

Paper • 2503.23542 • Published Mar 30, 2025 • 9
HiTZ/whisper-lm-ngrams

Automatic Speech Recognition • Updated Apr 4, 2025
HiTZ/whisper-tiny-eu

Updated Dec 16, 2025 • 25
HiTZ/whisper-small-eu

Updated Dec 16, 2025 • 23

Multilingual TruthfulQA

Truth Knows No Language: Evaluating Truthfulness Beyond English

Truth Knows No Language: Evaluating Truthfulness Beyond English

Paper • 2502.09387 • Published Feb 13, 2025 • 1
HiTZ/truthfulqa-multi

Viewer • Updated May 21, 2025 • 4.12k • 336 • 2
HiTZ/truthfulqa-multi-MT

Viewer • Updated May 22, 2025 • 4.12k • 9
HiTZ/truthful_judge

Viewer • Updated May 22, 2025 • 135k • 17

Ask2Transformers

Ask2Transformers models

Ask2Transformers: Zero-Shot Domain labelling with Pre-trained Language Models

Paper • 2101.02661 • Published Jan 7, 2021
Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction

Paper • 2109.03659 • Published Sep 8, 2021
Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning

Paper • 2205.01376 • Published May 3, 2022
ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations

Paper • 2203.13602 • Published Mar 25, 2022 • 1

MATE

Vision-Language Models Struggle to Align Entities across Modalities

Vision-Language Models Struggle to Align Entities across Modalities

Paper • 2503.03854 • Published Mar 5, 2025 • 1
HiTZ/MATE

Viewer • Updated May 29, 2025 • 11k • 84

BERnaT

Basque Encoders for Representing Natural Textual Diversity

HiTZ/BERnaT-base

Fill-Mask • 0.1B • Updated Jan 16 • 361 • 1
HiTZ/BERnaT-medium

Fill-Mask • 51.4M • Updated Jan 16 • 10 • 1
HiTZ/BERnaT-large

Fill-Mask • 0.4B • Updated Jan 16 • 7 • 1
HiTZ/BERnaT-base-NERC

Token Classification • 0.1B • Updated Mar 16, 2025

Alpaca LoRA MT

Alpaca LoRA MT models and dataset

HiTZ/alpaca-lora-7b-en-pt-es-ca-eu-gl-at

Updated Mar 24, 2023 • 1
HiTZ/alpaca-lora-13b-en-pt-es-ca-eu-gl-at

Updated Mar 25, 2023
HiTZ/alpaca-lora-30b-en-pt-es-ca-eu-gl-at

Updated Mar 25, 2023
HiTZ/alpaca-lora-65b-en-pt-es-ca

Updated Apr 2, 2023 • 2

Pretraining Datasets

Basque Pretraining Datasets

HiTZ/latxa-corpus-v1.1

Viewer • Updated about 1 month ago • 4.13M • 139 • 1
HiTZ/euscrawl

Updated Feb 14, 2023 • 91 • 4
orai-nlp/ZelaiHandi

Viewer • Updated May 19, 2025 • 2.25M • 87 • 9

Instruction Datasets

Basque Instruction Datasets

HiTZ/alpaca_mt

Updated Apr 7, 2023 • 66 • 9
OpenAssistant/oasst1

Viewer • Updated May 2, 2023 • 88.8k • 10.8k • 1.49k
CohereLabs/aya_dataset

Viewer • Updated Apr 15, 2025 • 206k • 3.39k • 343
CohereLabs/aya_collection

Viewer • Updated Apr 15, 2025 • 514M • 4.01k • 232

OPT RM

OPT reward models

Training Language Models with Language Feedback at Scale

Paper • 2303.16755 • Published Mar 28, 2023 • 1
HiTZ/lmloss-opt-rm-1.3b

Text Generation • Updated Apr 7, 2023 • 3
HiTZ/rmloss-opt-rm-13b

Text Generation • Updated Apr 7, 2023 • 1

Medical-mT5

An open-source text-to-text multilingual model for the medical domain.

HiTZ/Medical-mT5-large

Text Generation • 1B • Updated Apr 12, 2024 • 1.53k • 23
HiTZ/Medical-mT5-xl

Text Generation • Updated Apr 12, 2024 • 37 • 4
HiTZ/Medical-mT5-large-multitask

Text Generation • 1B • Updated May 6, 2024 • 4
HiTZ/Medical-mT5-xl-multitask

Text Generation • 4B • Updated Apr 12, 2024 • 10 • 2

BasqueParl

A Bilingual Corpus of Basque Parliamentary Transcriptions

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions

Paper • 2205.01506 • Published May 3, 2022
HiTZ/basqueparl

Viewer • Updated Mar 8, 2024 • 343k • 15 • 1

Speech to Text

Basque Speech to Text models

Demo Basque ASR

Space • Running • Transcribe speech from an audio file • 5
HiTZ/stt_eu_conformer_ctc_large

Automatic Speech Recognition • Updated Nov 28, 2025 • 41 • 2
HiTZ/stt_eu_conformer_transducer_large

Automatic Speech Recognition • Updated Nov 28, 2025 • 35 • 2
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages

Paper • 2503.23542 • Published Mar 30, 2025 • 9

EriBERTa

HiTZ/EriBERTa-base

Fill-Mask • 0.1B • Updated Jul 1, 2025 • 79 • 3
HiTZ/Multilingual-Medical-Corpus

Viewer • Updated Apr 12, 2024 • 67.4M • 483 • 43

IXAmBERT

Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque

ixa-ehu/ixambert-base-cased

Updated Jan 7, 2023 • 12 • 3
ixa-hitz/elkarhizketak

Updated Jan 18, 2024 • 21 • 1

Machine Translation

HiTZ/mt-hitz-en-eu

Updated Jun 17, 2024 • 4 • 3
HiTZ/mt-hitz-es-eu

Updated Jun 17, 2024 • 87
HiTZ/mt-hitz-eu-en

Updated Jun 25, 2024
HiTZ/mt-hitz-gl-eu

Updated Jun 17, 2024

Odesia Challenge 2024

IXA Submission for the 2024 ODESIA Challenge

HiTZ/Qwen2.5-14B-Instruct_ODESIA

Text Generation • 15B • Updated Feb 4, 2025 • 1
HiTZ/Hermes-3-Llama-3.1-8B_ODESIA

Text Generation • 8B • Updated Sep 18, 2024 • 2
HiTZ/gemma-2b-it_ODESIA

Text Generation • 3B • Updated Sep 20, 2024 • 3

Latxa Instruct

Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque

Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque

Paper • 2506.07597 • Published Jun 9, 2025
HiTZ/Latxa-Llama-3.1-8B-Instruct

Text Generation • 8B • Updated Dec 15, 2025 • 2.12k • 11
HiTZ/Latxa-Llama-3.1-70B-Instruct

Text Generation • 71B • Updated Jun 12, 2025 • 311 • 6
HiTZ/Latxa-Llama-3.1-70B-Instruct-FP8

Text Generation • 71B • Updated Jun 12, 2025 • 18 • 1

Latxa VL

Multilingual multimodal instruct models

HiTZ/Latxa-Qwen3-VL-2B-Instruct

Image-Text-to-Text • 2B • Updated Dec 15, 2025 • 608
HiTZ/Latxa-Qwen3-VL-4B-Instruct

Image-Text-to-Text • 4B • Updated Dec 15, 2025 • 209 • 3
HiTZ/Latxa-Qwen3-VL-8B-Instruct

Image-Text-to-Text • 770k • Updated 21 days ago • 232 • 2
HiTZ/Latxa-Qwen3-VL-32B-Instruct

Image-Text-to-Text • 1.14M • Updated 21 days ago • 164 • 2

Cap&Punct

MarianMT-based models for translation tasks

HiTZ/cap-punct-eu

Translation • 76.9M • Updated Jan 13 • 19
HiTZ/cap-punct-es

Translation • 76.9M • Updated Jan 13 • 55

Pyannote

Diarization models for VAD and Speaker Recognition

HiTZ/pyannote-segmentation-3.0-RTVE

Automatic Speech Recognition • Updated Nov 13, 2025 • 3

Speech Collection

STT models, diarization models, and datasets for training ASR in Spanish, Basque, and bilingual Basque-Spanish

Nvidia NeMo

Collection • Nvidia NeMo STT models • 5 items • Updated 7 days ago
Whisper

Collection • 30 items • Updated 14 days ago
Pyannote

Collection • Diarization models for VAD and Speaker Recognition • 1 item • Updated 14 days ago
ASR Datasets

Collection • Datasets for training and benchmarking ASR in Basque, Spanish, and bilingual Basque-Spanish • 4 items • Updated 14 days ago

Latxa

Latxa: An Open Language Model and Evaluation Suite for Basque

Latxa: An Open Language Model and Evaluation Suite for Basque

Paper • 2403.20266 • Published Mar 29, 2024 • 4
HiTZ/latxa-7b-v1.2

Text Generation • Updated Jul 2, 2024 • 35 • 6
HiTZ/latxa-13b-v1.2

Text Generation • Updated Jul 2, 2024 • 4 • 2
HiTZ/latxa-70b-v1.2

Text Generation • Updated Jul 3, 2024 • 135

GoLLIE

We present GoLLIE, a Large Language Model trained to follow annotation guidelines that outperforms previous approaches on zero-shot IE.

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction

Paper • 2310.03668 • Published Oct 5, 2023 • 1
HiTZ/GoLLIE-7B

Text Generation • Updated Oct 10, 2023 • 1.14k • 29
HiTZ/GoLLIE-13B

Text Generation • Updated Oct 20, 2023 • 30 • 7
HiTZ/GoLLIE-34B

Text Generation • Updated Oct 20, 2023 • 195 • 38

Metaphor Processing

Datasets and models for metaphor detection and interpretation via NLI in Spanish and English

Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection

Paper • 2210.10358 • Published Oct 19, 2022
HiTZ/cometa

Viewer • Updated Apr 15, 2024 • 3.63k • 75
HiTZ/xlm-roberta-large-metaphor-detection-es

Token Classification • Updated Feb 26, 2024
HiTZ/mdeberta-base-metaphor-detection-es

Token Classification • Updated Feb 26, 2024 • 1

EusCrawl

Does Corpus Quality Really Matter for Low-Resource Languages?

Does Corpus Quality Really Matter for Low-Resource Languages?

Paper • 2203.08111 • Published Mar 15, 2022
HiTZ/euscrawl

Updated Feb 14, 2023 • 91 • 4
ixa-ehu/roberta-eus-euscrawl-large-cased

Fill-Mask • 0.4B • Updated Sep 11, 2023 • 29 • 3
ixa-ehu/roberta-eus-euscrawl-base-cased

Fill-Mask • Updated Mar 16, 2022 • 226 • 2

Basque Language Proficiency

HiTZ/EusProficiency

Viewer • Updated Apr 1, 2024 • 5.17k • 523 • 2
HiTZ/EusReading

Viewer • Updated Apr 1, 2024 • 352 • 660 • 2
orai-nlp/bl2mp

Viewer • Updated May 19, 2025 • 1.8k • 18

Lemmatization

On the Role of Morphological Information for Contextual Lemmatization

On the Role of Morphological Information for Contextual Lemmatization

Paper • 2302.00407 • Published Feb 1, 2023
HiTZ/xlm-roberta-large-lemma-eu

Token Classification • Updated Jun 24, 2024 • 2
HiTZ/xlm-roberta-large-lemma-en

Token Classification • Updated Jun 24, 2024 • 1
HiTZ/xlm-roberta-large-lemma-tr

Token Classification • Updated Jun 24, 2024

Evaluation Datasets

Basque Evaluation Datasets

HiTZ/This-is-not-a-dataset

Viewer • Updated Feb 23, 2024 • 381k • 171 • 6
HiTZ/EusProficiency

Viewer • Updated Apr 1, 2024 • 5.17k • 523 • 2
HiTZ/EusReading

Viewer • Updated Apr 1, 2024 • 352 • 660 • 2
HiTZ/EusTrivia

Viewer • Updated Apr 1, 2024 • 1.72k • 587 • 1

Basque Encoders

Basque Encoder Language Models

ixa-ehu/roberta-eus-euscrawl-large-cased

Fill-Mask • 0.4B • Updated Sep 11, 2023 • 29 • 3
ixa-ehu/roberta-eus-euscrawl-base-cased

Fill-Mask • Updated Mar 16, 2022 • 226 • 2
ixa-ehu/roberta-eus-cc100-base-cased

Fill-Mask • 0.2B • Updated Sep 11, 2023 • 9 • 1
ixa-ehu/roberta-eus-mc4-base-cased

Fill-Mask • Updated Mar 16, 2022 • 7 • 1

Composite Corpus

HiTZ/composite_corpus_eseu_v1.0

Viewer • Updated May 12, 2025 • 742k • 611 • 2
HiTZ/composite_corpus_eu_v2.1

Viewer • Updated Dec 19, 2024 • 407k • 128 • 2
HiTZ/composite_corpus_es_v1.0

Viewer • Updated May 12, 2025 • 526k • 374

Lessons in Evaluation of Spanish Encoder-only Models

State-of-the-art encoder-only models for Spanish. From the paper "Lessons learned from the evaluation of Spanish Language Models"

HiTZ/xlm-roberta-large-xnli-es

Text Classification • Updated Mar 8, 2024 • 1
Lessons learned from the evaluation of Spanish Language Models

Paper • 2212.08390 • Published Dec 16, 2022

This is not a dataset

A Large Negation Benchmark to Challenge Large Language Models

This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models

Paper • 2310.15941 • Published Oct 24, 2023 • 6
HiTZ/This-is-not-a-dataset

Viewer • Updated Feb 23, 2024 • 381k • 171 • 6

CONAN-EUS: Counternarrative Generation in Basque and Spanish

Counternarrative Generation in Basque and Spanish

Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation

Paper • 2403.09159 • Published Mar 14, 2024
HiTZ/CONAN-EUS

Viewer • Updated Mar 15, 2024 • 33.2k • 56
HiTZ/mt5-counter-narrative-eu

Text Generation • Updated Mar 15, 2024 • 4
HiTZ/mt5-counter-narrative-es

Text Generation • Updated Mar 15, 2024 • 5

BERTeus

Give your Text Representation Models some Love: the Case for Basque

Give your Text Representation Models some Love: the Case for Basque

Paper • 2004.00033 • Published Mar 31, 2020
ixa-ehu/berteus-base-cased

Feature Extraction • 0.1B • Updated Sep 11, 2023 • 130 • 5

Antidote Project

Data and models generated within the Antidote Project (https://univ-cotedazur.eu/antidote)

HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine

Paper • 2306.06029 • Published Jun 9, 2023
Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain

Paper • 2404.07613 • Published Apr 11, 2024
HiTZ/casimedicos-exp

Viewer • Updated Mar 23, 2024 • 2.49k • 264 • 3
HiTZ/casimedicos-squad

Preview • Updated Apr 14, 2024 • 20 • 1

XNLIeu

XNLIeu: a dataset for cross-lingual NLI in Basque

XNLIeu: a dataset for cross-lingual NLI in Basque

Paper • 2404.06996 • Published Apr 10, 2024
HiTZ/xnli-eu

Viewer • Updated Jul 17, 2025 • 801k • 202

Medical MT

HiTZ/medical_enes-eu

Updated Jun 27, 2024 • 3
HiTZ/medical_en-eu

Updated Jun 27, 2024
HiTZ/medical_es-eu

Updated May 15, 2025 • 6


Team members 61
