# Russian Relation Extraction Model
This model is trained for the task of Relation Extraction between named entities in Russian text. It takes a piece of text and two marked entities within it as input and predicts the most likely semantic relationship between them (e.g., `WORKS_AS`, `WORKPLACE`, or `SPOUSE`).
The model is based on the R-BERT architecture and has been fine-tuned on the Nerel dataset.
## Model Details

- **Base Model:** DeepPavlov/rubert-base-cased
- **Architecture:** R-BERT. The model leverages not only the `[CLS]` token representation but also the averaged representations of each entity's tokens, along with embeddings for their types (e.g., `PERSON`, `ORGANIZATION`). This allows the model to better understand the context and the nature of the interacting entities.
- **Language:** Russian
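The feature combination described above can be sketched as follows. This is a minimal illustration of the R-BERT-style head, not the model's actual implementation: the class name, layer names, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RBERTHead(nn.Module):
    """Sketch: concatenate the [CLS] vector, the averaged token vectors of
    each entity span, and embeddings of the two entity types, then classify.
    All names and sizes here are illustrative, not the released model's."""
    def __init__(self, hidden=768, n_types=10, n_relations=30):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, hidden)
        self.classifier = nn.Linear(hidden * 5, n_relations)

    def forward(self, seq_out, cls_out, e1_mask, e2_mask, e1_type, e2_type):
        # seq_out: (B, T, H); cls_out: (B, H); e*_mask: (B, T), 1 on entity tokens
        def avg(mask):
            m = mask.unsqueeze(-1).float()
            return (seq_out * m).sum(1) / m.sum(1).clamp(min=1)
        feats = torch.cat([cls_out, avg(e1_mask), avg(e2_mask),
                           self.type_emb(e1_type), self.type_emb(e2_type)], dim=-1)
        return self.classifier(feats)  # one logit per relation type
```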
## How to Use
This model is intended to be used in a pipeline, following a Named Entity Recognition (NER) model. After a NER model has identified entities in the text, this model can be used to predict relationships between all possible pairs of those entities.
For practical use, the easiest way to deploy this model is via the provided Docker container, which exposes a REST API.
### Deployment with Docker

1. Pull the Docker image:

   ```bash
   docker pull mrpzzios/bertre:1.3
   ```

2. Run the container (with GPU acceleration via `--gpus all`; drop the flag for CPU-only inference):

   ```bash
   docker run -d --gpus all -p 8000:8000 --name bertre-api mrpzzios/bertre:1.3
   ```

3. Send a request to the API:

   ```bash
   curl -X POST "http://localhost:8000/predict" \
     -H "Content-Type: application/json" \
     -d '{
       "chunks": ["Президент Башкирии Муртаза Рахимов решил поменять главу своей администрации."],
       "entities_list": [
         [[19, 34, "PERSON"], [0, 18, "PROFESSION"], [50, 75, "PROFESSION"]]
       ]
     }'
   ```
## Training and Inference Methodology

### Training Process (Multi-Label Approach)
The model was trained on the Nerel dataset using a multi-label formulation, which is crucial for handling cases where a single pair of entities can have multiple valid relationships.
- **Label Representation:** The relationship labels for each training example were converted into a binary vector (bitmask), in which each index corresponds to a specific relation type: a value of `1` indicates that the relation exists, while `0` indicates it does not.
- **Loss Function:** Consequently, `BCEWithLogitsLoss` was used as the loss function. It is well suited to multi-label tasks because it evaluates each output logit from the model independently against the corresponding value in the target bitmask. This teaches the model to assess the evidence for each relationship type on its own merits, rather than forcing it to choose just one during training, which yields a more nuanced understanding of the data.
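The bitmask construction and loss can be sketched as follows. The relation inventory here is an illustrative subset, not the full NEREL schema, and `to_bitmask` is a hypothetical helper:

```python
import torch
import torch.nn as nn

# Illustrative subset of relation types (not the full NEREL inventory).
RELATIONS = ["WORKS_AS", "WORKPLACE", "SPOUSE", "FOUNDED_BY"]

def to_bitmask(gold_relations):
    """Convert the set of gold relation names for one entity pair
    into a binary target vector with one slot per relation type."""
    target = torch.zeros(len(RELATIONS))
    for rel in gold_relations:
        target[RELATIONS.index(rel)] = 1.0
    return target

# A single entity pair can carry several valid relations at once.
target = to_bitmask({"WORKS_AS", "WORKPLACE"})
logits = torch.tensor([2.1, 1.4, -3.0, -1.2])  # illustrative model outputs

# BCEWithLogitsLoss scores each logit independently against its slot.
loss = nn.BCEWithLogitsLoss()(logits, target)
```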
### Inference Process (Single-Label Output)
During inference, the model produces a vector of logits (raw scores) for all possible relationship types. To provide a single, most confident prediction, the following steps are taken:
- The model identifies the single relationship with the highest logit score.
- The confidence score for this winning relationship (the `relation_strength`) is calculated by applying a sigmoid function to its logit value, converting the raw score into a more interpretable value between 0 and 1.
This approach combines the robust learning of a multi-label setup with a decisive single-label output, making it practical for downstream applications that expect one definitive relationship per entity pair.
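The two inference steps above can be sketched in a few lines. The relation list and the helper name `predict_single` are illustrative assumptions:

```python
import torch

RELATIONS = ["WORKS_AS", "WORKPLACE", "SPOUSE", "FOUNDED_BY"]  # illustrative subset

def predict_single(logits):
    """Pick the relation with the highest logit, then sigmoid that logit
    to obtain a 0-1 relation_strength for the winning relation."""
    idx = int(torch.argmax(logits))
    strength = torch.sigmoid(logits[idx]).item()
    return RELATIONS[idx], strength

logits = torch.tensor([2.1, 1.4, -3.0, -1.2])
label, strength = predict_single(logits)  # label == "WORKS_AS"
```

Note that only the winning logit is passed through the sigmoid; the scores are not normalized into a softmax distribution, so `relation_strength` reflects the model's independent confidence in that one relation.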
### Constrained Decoding
During inference and evaluation, a schema of type constraints derived from the Nerel dataset's annotation guidelines was applied. This prevents the model from predicting logically impossible relations (e.g., a SPOUSE relation between a PERSON and an ORGANIZATION), which significantly improves prediction precision.
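One common way to implement such type constraints is to mask the logits of inadmissible relations before taking the argmax. The sketch below illustrates this; the constraint table is a hypothetical fragment, not the actual schema derived from NEREL's guidelines:

```python
import torch

RELATIONS = ["WORKS_AS", "WORKPLACE", "SPOUSE"]
# Hypothetical (head_type, tail_type) pairs each relation admits.
ALLOWED = {
    "WORKS_AS": {("PERSON", "PROFESSION")},
    "WORKPLACE": {("PERSON", "ORGANIZATION")},
    "SPOUSE": {("PERSON", "PERSON")},
}

def constrain(logits, head_type, tail_type):
    """Set the logits of type-incompatible relations to -inf so they
    can never be selected by the argmax."""
    masked = logits.clone()
    for i, rel in enumerate(RELATIONS):
        if (head_type, tail_type) not in ALLOWED[rel]:
            masked[i] = float("-inf")
    return masked

logits = torch.tensor([0.5, 1.0, 2.0])
masked = constrain(logits, "PERSON", "ORGANIZATION")
# SPOUSE can no longer win, even though its raw logit was highest.
best = RELATIONS[int(torch.argmax(masked))]  # "WORKPLACE"
```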
## Evaluation
The model was evaluated on the validation split of the Nerel dataset. The following metrics were achieved (macro average):
| Metric | Value |
|---|---|
| F1-score | 0.7500 |
| Precision | 0.8286 |
| Recall | 0.7246 |
Note: These metrics were obtained with constrained decoding applied during evaluation.
## Limitations and Bias
- The model's performance is highly dependent on the quality of the upstream Named Entity Recognition (NER) model. Errors from the NER stage will propagate and cause errors in relation extraction.
- Performance may degrade on texts from domains that differ significantly from the news and encyclopedic articles found in the Nerel dataset.
- Like many language models, this model may reproduce statistical biases present in its training data. For example, it might associate certain professions more strongly with a particular gender.