---
license: mit
tags:
- biology
- transformers
- Feature Extraction
---

## Usage
|
|
### Load tokenizer and model
|
|
```python
from transformers import AutoTokenizer, AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
```
|
|
The default attention implementation is `"sdpa"` (PyTorch's scaled dot-product attention, which can dispatch to FlashAttention kernels). If you want to use the basic implementation instead, replace it with `"eager"`. Please refer to [here](https://huggingface.co/CompBioDSA/MutBERT/blob/main/modeling_mutbert.py#L438).
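Assuming the custom model code honors Transformers' standard `attn_implementation` keyword (an assumption; if it is not accepted, change the setting in the modeling file linked above instead), the switch can be made at load time:

```python
from transformers import AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
# NOTE: assumes the remote code accepts the standard `attn_implementation`
# argument ("sdpa" is the default, "eager" the basic fallback); if it does
# not, edit modeling_mutbert.py directly.
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    attn_implementation="eager",
)
```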
|
|
### Get embeddings
|
|
```python
import torch
import torch.nn.functional as F

from transformers import AutoTokenizer, AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

dna = "ATCGGGGCCCATTA"
inputs = tokenizer(dna, return_tensors='pt')["input_ids"]

# Convert token ids to one-hot vectors over the vocabulary, the input
# format this model expects; len(tokenizer) is the vocab size.
mut_inputs = F.one_hot(inputs, num_classes=len(tokenizer)).float().to("cpu")
last_hidden_state = model(mut_inputs).last_hidden_state  # [1, sequence_length, 768]
# or: last_hidden_state = model(mut_inputs)[0]           # [1, sequence_length, 768]

# embedding with mean pooling
embedding_mean = torch.mean(last_hidden_state[0], dim=0)
print(embedding_mean.shape)  # expected: torch.Size([768])

# embedding with max pooling
embedding_max = torch.max(last_hidden_state[0], dim=0)[0]
print(embedding_max.shape)  # expected: torch.Size([768])
```
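Both pooling strategies collapse the `[sequence_length, 768]` hidden states into a single 768-dimensional sequence embedding. A framework-free toy sketch of the same reductions (3 tokens, hidden size 4, made-up values):

```python
# Toy stand-in for last_hidden_state[0]: 3 tokens, hidden size 4.
hidden = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 0.0, 1.0, 2.0],
]

# Mean pooling: average each hidden dimension across all tokens.
embedding_mean = [sum(col) / len(col) for col in zip(*hidden)]

# Max pooling: keep the largest value of each hidden dimension.
embedding_max = [max(col) for col in zip(*hidden)]

print(embedding_mean)  # one value per hidden dimension (4 here)
print(embedding_max)   # [9.0, 6.0, 7.0, 8.0]
```

Mean pooling is the more common default for sequence-level embeddings; max pooling keeps only the strongest activation per dimension.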
|
|
### Using as a classifier
|
|
```python
from transformers import AutoModelForSequenceClassification

model_name = "CompBioDSA/pig-mutbert-ref"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    trust_remote_code=True,
    num_labels=2,
)
```
|
|
### With RoPE scaling
|
|
The allowed types for RoPE scaling are `linear` and `dynamic`. To extend the model's context window, pass the `rope_scaling` parameter to `from_pretrained`.

For example, to scale the model's context window by 2x:

```python
from transformers import AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    rope_scaling={'type': 'dynamic', 'factor': 2.0},  # 2.0 for 2x scaling, 4.0 for 4x, etc.
)
```