Papers and resources published under the ALIA project.
AI & ML interests
Natural Language Processing, Signal Processing
Papers
MEG-to-MEG Transfer Learning and Cross-Task Speech/Silence Detection with Limited Data
MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification
Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque
- Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque
  Paper • 2511.09396 • Published
- HiTZ/Latxa-Llama-3.1-VL-8B-Instruct
  Image-Text-to-Text • 8B • Updated • 60
- HiTZ/Llama-Latxa-3.1-VL-8B-Instruct
  Image-Text-to-Text • 8B • Updated • 23
- HiTZ/pixmo-ask-model-anything_eu
  Viewer • Updated • 146k • 18
Collection with datasets for training and benchmark-evaluating ASR in Basque, Spanish and Bilingual Basque-Spanish
Nvidia NeMo STT models
- HiTZ/stt_eu_conformer_transducer_large_v2
  Automatic Speech Recognition • Updated • 5 • 1
- HiTZ/stt_eu_conformer_transducer_large
  Automatic Speech Recognition • Updated • 35 • 2
- HiTZ/stt_eu_conformer_ctc_large
  Automatic Speech Recognition • Updated • 41 • 2
- HiTZ/stt_eseu_conformer_transducer_large
  Automatic Speech Recognition • Updated • 17
Truth Knows No Language: Evaluating Truthfulness Beyond English
Ask2Transformers models
- Ask2Transformers: Zero-Shot Domain labelling with Pre-trained Language Models
  Paper • 2101.02661 • Published
- Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction
  Paper • 2109.03659 • Published
- Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning
  Paper • 2205.01376 • Published
- ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations
  Paper • 2203.13602 • Published • 1
Vision-Language Models Struggle to Align Entities across Modalities
Basque Encoders for Representing Natural Textual Diversity
Alpaca LoRA MT models and dataset
Basque Pretraining Datasets
Basque Instruction Datasets
OPT reward models
An open-source text-to-text multilingual model for the medical domain.
A Bilingual Corpus of Basque Parliamentary Transcriptions
Basque Speech to Text models
- Demo Basque ASR
  🎤 5 • Transcribe speech from an audio file
- HiTZ/stt_eu_conformer_ctc_large
  Automatic Speech Recognition • Updated • 41 • 2
- HiTZ/stt_eu_conformer_transducer_large
  Automatic Speech Recognition • Updated • 35 • 2
- Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages
  Paper • 2503.23542 • Published • 9
Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque
IXA Submission for the 2024 ODESIA Challenge
Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque
- Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque
  Paper • 2506.07597 • Published
- HiTZ/Latxa-Llama-3.1-8B-Instruct
  Text Generation • 8B • Updated • 2.12k • 11
- HiTZ/Latxa-Llama-3.1-70B-Instruct
  Text Generation • 71B • Updated • 311 • 6
- HiTZ/Latxa-Llama-3.1-70B-Instruct-FP8
  Text Generation • 71B • Updated • 18 • 1
Multilingual multimodal instruct models
- HiTZ/Latxa-Qwen3-VL-2B-Instruct
  Image-Text-to-Text • 2B • Updated • 608
- HiTZ/Latxa-Qwen3-VL-4B-Instruct
  Image-Text-to-Text • 4B • Updated • 209 • 3
- HiTZ/Latxa-Qwen3-VL-8B-Instruct
  Image-Text-to-Text • 770k • Updated • 232 • 2
- HiTZ/Latxa-Qwen3-VL-32B-Instruct
  Image-Text-to-Text • 1.14M • Updated • 164 • 2
MarianMT based models for translation tasks
Diarization models for VAD and Speaker Recognition
Collection with STT models, Diarization models and datasets for training ASR in Spanish, Basque and Bilingual
Latxa: An Open Language Model and Evaluation Suite for Basque
We present GoLLIE, a Large Language Model trained to follow annotation guidelines that outperforms previous approaches on zero-shot IE.
Datasets and models for metaphor detection and interpretation via NLI in Spanish and English
- Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection
  Paper • 2210.10358 • Published
- HiTZ/cometa
  Viewer • Updated • 3.63k • 75
- HiTZ/xlm-roberta-large-metaphor-detection-es
  Token Classification • Updated
- HiTZ/mdeberta-base-metaphor-detection-es
  Token Classification • Updated • 1
Does Corpus Quality Really Matter for Low-Resource Languages?
On the Role of Morphological Information for Contextual Lemmatization
- On the Role of Morphological Information for Contextual Lemmatization
  Paper • 2302.00407 • Published
- HiTZ/xlm-roberta-large-lemma-eu
  Token Classification • Updated • 2
- HiTZ/xlm-roberta-large-lemma-en
  Token Classification • Updated • 1
- HiTZ/xlm-roberta-large-lemma-tr
  Token Classification • Updated
Basque Evaluation Datasets
Basque Encoder Language Models
- ixa-ehu/roberta-eus-euscrawl-large-cased
  Fill-Mask • 0.4B • Updated • 29 • 3
- ixa-ehu/roberta-eus-euscrawl-base-cased
  Fill-Mask • Updated • 226 • 2
- ixa-ehu/roberta-eus-cc100-base-cased
  Fill-Mask • 0.2B • Updated • 9 • 1
- ixa-ehu/roberta-eus-mc4-base-cased
  Fill-Mask • Updated • 7 • 1
State-of-the-art encoder-only models for Spanish. From the paper "Lessons learned from the evaluation of Spanish Language Models"
A Large Negation Benchmark to Challenge Large Language Models
Counternarrative Generation in Basque and Spanish
Give your Text Representation Models some Love: the Case for Basque
Data and models generated within the Antidote Project (https://univ-cotedazur.eu/antidote)
- HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine
  Paper • 2306.06029 • Published
- Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain
  Paper • 2404.07613 • Published
- HiTZ/casimedicos-exp
  Viewer • Updated • 2.49k • 264 • 3
- HiTZ/casimedicos-squad
  Preview • Updated • 20 • 1
XNLIeu: a dataset for cross-lingual NLI in Basque