Understanding the MLM Prediction Task Through Pre-trained Language Models

Source: 深度学习自然语言处理 (Deep Learning for NLP), 2022-11-14


Prompt learning is currently an important topic in NLP, and many articles have already discussed it.

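In essence, prompt learning can be understood as a way of redefining downstream tasks: it recasts almost every downstream task as the pre-training task of a language model, which avoids the gap between the pre-trained model and the downstream task.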

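As a result, almost any downstream NLP task can be attempted without training data (zero-shot), and with only a small labelled dataset the approach can reportedly match or even exceed fine-tuning. It also makes the way different tasks are handled more uniform. A purely literal understanding of the idea is not enough, however, so this article explains it in a simple and direct way, through code.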

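To that end, this article walks through code for five topics: the MLM prediction task as seen from a pre-trained language model; MLM prediction with a prompt_template; Prompt-MLM prediction with a verbalizer (spelled verblize in the code) that maps words to classes; zero-shot prompt-based sentiment classification in practice; and zero-shot prompt-based NER in practice.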

I. The MLM prediction task as seen from a pre-trained language model

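MLM (masked language modeling) and NSP (next sentence prediction) are the two pre-training tasks of BERT-style pre-trained language models. MLM asks the model to predict a masked word from the words around it. The model structure is very simple, as shown below: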

    import torch.nn as nn
    from transformers import BertForMaskedLM


    class Bert_Model(nn.Module):
        def __init__(self, bert_path, config_file):
            super(Bert_Model, self).__init__()
            # Load the pre-trained weights together with the MLM head.
            self.bert = BertForMaskedLM.from_pretrained(bert_path, config=config_file)

        def forward(self, input_ids, attention_mask, token_type_ids):
            outputs = self.bert(input_ids=input_ids,
                                attention_mask=attention_mask,
                                token_type_ids=token_type_ids)
            # The MLM head returns one raw score (logit) per vocabulary entry
            # for every position; apply softmax to turn them into probabilities.
            logit = outputs[0]  # shape: [bs, seq_len, config.vocab_size]
            return logit

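The following snippet uses bert-base-uncased from Hugging Face to fill in a missing word. It returns the words the pre-trained model considers most probable at the [MASK] position (the candidates come from the model's own vocabulary).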

For example, given the sentence "natural language processing is a [MASK] technology.", the model is asked to predict the word at [MASK]:

    >>> from transformers import pipeline
    >>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
    >>> unmasker("natural language processing is a [MASK] technology.")
    [{'score': 0.18927036225795746, 'token': 3274, 'token_str': 'computer', 'sequence': 'natural language processing is a computer technology.'},
     {'score': 0.14354903995990753, 'token': 4807, 'token_str': 'communication', 'sequence': 'natural language processing is a communication technology.'},
     {'score': 0.09429361671209335, 'token': 2047, 'token_str': 'new', 'sequence': 'natural language processing is a new technology.'},
     {'score': 0.05184786394238472, 'token': 2653, 'token_str': 'language', 'sequence': 'natural language processing is a language technology.'},
     {'score': 0.04084266722202301, 'token': 15078, 'token_str': 'computational', 'sequence': 'natural language processing is a computational technology.'}]

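Sorted by probability from high to low, the predictions for [MASK] are computer, communication, new, language, and computational. This suggests that the pre-trained language model has captured the idea that NLP is a computer, communication, and language technology.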
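To make the scoring concrete, here is a minimal sketch of what happens at the [MASK] position: the model emits one logit per vocabulary entry, a softmax turns the logits into probabilities, and the top-k entries are reported. The five-word vocabulary and the logit values are made up for illustration, and the helper names softmax and top_k are hypothetical; a real model such as bert-base-uncased scores about 30,000 vocabulary entries.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(vocab, logits, k=3):
    """Return the k most probable (token, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical toy vocabulary and mask-position logits (not real model output).
toy_vocab = ["computer", "communication", "new", "banana", "the"]
toy_logits = [4.0, 3.7, 3.3, -1.0, 0.5]

for token, prob in top_k(toy_vocab, toy_logits):
    print(f"{token}: {prob:.3f}")
```

The 'score' field in the pipeline output above plays the role of prob here: it is a probability over the whole vocabulary, which is why even the best candidate stays well below 1.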

II. MLM prediction with a prompt_template

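Since the MLM head can predict sensible words for a masked position, it must encode a good deal of contextual knowledge. Can we go one step further and have it perform text classification? That is, can we use the [MASK] prediction to produce a word related to the class, and then map that word to a concrete class?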

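This is exactly the idea behind prompting: align the downstream task with a pre-training task of the language model, such as NSP or MLM. Two concepts make the alignment work. The first is the prompt_template, a prompt that tells the model to generate a word relevant to the task; the second, the verbalizer, is covered in Section III. By concatenating the task text with the prompt_template, we construct an input of the same form as the pre-training task.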

For example:

    >>> from transformers import pipeline
    >>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
    >>> text = "I really like the film a lot."
    >>> prompt_template = "Because it was [MASK]."
    >>> pred1 = unmasker(text + prompt_template)
    >>> pred1
    [{'score': 0.14730973541736603, 'token': 2307, 'token_str': 'great', 'sequence': 'i really like the film a lot. because it was great.'},
     {'score': 0.10884211212396622, 'token': 6429, 'token_str': 'amazing', 'sequence': 'i really like the film a lot. because it was amazing.'},
     {'score': 0.09781625121831894, 'token': 2204, 'token_str': 'good', 'sequence': 'i really like the film a lot. because it was good.'},
     {'score': 0.04627735912799835, 'token': 4569, 'token_str': 'fun', 'sequence': 'i really like the film a lot. because it was fun.'},
     {'score': 0.043138038367033005, 'token': 10392, 'token_str': 'fantastic', 'sequence': 'i really like the film a lot. because it was fantastic.'}]
    >>> text = "this movie makes me very disgusting. "
    >>> prompt_template = "Because it was [MASK]."
    >>> pred2 = unmasker(text + prompt_template)
    >>> pred2
    [{'score': 0.05464331805706024, 'token': 9643, 'token_str': 'awful', 'sequence': 'this movie makes me very disgusting. because it was awful.'},
     {'score': 0.050322480499744415, 'token': 2204, 'token_str': 'good', 'sequence': 'this movie makes me very disgusting. because it was good.'},
     {'score': 0.04008950665593147, 'token': 9202, 'token_str': 'horrible', 'sequence': 'this movie makes me very disgusting. because it was horrible.'},
     {'score': 0.03569378703832626, 'token': 3308, 'token_str': 'wrong', 'sequence': 'this movie makes me very disgusting. because it was wrong.'},
     {'score': 0.033358603715896606, 'token': 2613, 'token_str': 'real', 'sequence': 'this movie makes me very disgusting. because it was real.'}]

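Here we used one sentence with positive sentiment and one with negative sentiment. In both cases the top-ranked word matches the sentiment of the sentence (although "good" still ranks second for the negative sentence), which supports the feasibility of the approach.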
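The concatenation step can be wrapped in a small helper. The sketch below is one possible way to do it, not part of the original code: build_prompt and MASK_TOKEN are hypothetical names, and the check that the template holds exactly one mask token is an added assumption that keeps the later "find the mask position" step unambiguous.

```python
MASK_TOKEN = "[MASK]"  # BERT's mask token; other models may use a different one


def build_prompt(text, template):
    """Append a prompt template to the task text, separated by one space."""
    if template.count(MASK_TOKEN) != 1:
        raise ValueError("template must contain exactly one mask token")
    return text.strip() + " " + template.strip()


prompt_input = build_prompt("I really like the film a lot.", "Because it was [MASK].")
print(prompt_input)  # I really like the film a lot. Because it was [MASK].
```

The resulting string can be passed to unmasker(...) in the same way as text + prompt_template above.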

III. Prompt-MLM prediction with a verbalizer for class mapping

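Besides the prompt template, the other key component is the verbalizer (spelled verblize in the code), which maps words to classes. The words predicted by the MLM head vary a lot, so they have to be aligned with concrete classes. For example, "great", "amazing", "good", "fun", "fantastic", and "better" are aligned to "positive"; when the model predicts one of these words, the overall prediction is set to positive.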

Likewise, "awful", "horrible", "bad", "wrong", and "ugly" are mapped to "negative", so a prediction of any of them sets the overall class to negative:

    >>> verblize_dict = {"pos": ["great", "amazing", "good", "fun", "fantastic", "better"],
    ...                  "neg": ["awful", "horrible", "bad", "wrong", "ugly"]
    ... }
    >>> hash_dict = dict()
    >>> for k, v in verblize_dict.items():
    ...     for v_ in v:
    ...         hash_dict[v_] = k
    ...
    >>> hash_dict
    {'great': 'pos', 'amazing': 'pos', 'good': 'pos', 'fun': 'pos', 'fantastic': 'pos', 'better': 'pos', 'awful': 'neg', 'horrible': 'neg', 'bad': 'neg', 'wrong': 'neg', 'ugly': 'neg'}

We can then apply this mapping directly to the predictions above, which gives:

    >>> [{"label": hash_dict[i["token_str"]], "score": i["score"]} for i in pred1]
    [{'label': 'pos', 'score': 0.14730973541736603},
     {'label': 'pos', 'score': 0.10884211212396622},
     {'label': 'pos', 'score': 0.09781625121831894},
     {'label': 'pos', 'score': 0.04627735912799835},
     {'label': 'pos', 'score': 0.043138038367033005}]
    >>> [{"label": hash_dict.get(i["token_str"], i["token_str"]), "score": i["score"]} for i in pred2]
    [{'label': 'neg', 'score': 0.05464331805706024},
     {'label': 'pos', 'score': 0.050322480499744415},
     {'label': 'neg', 'score': 0.04008950665593147},
     {'label': 'neg', 'score': 0.03569378703832626},
     {'label': 'real', 'score': 0.033358603715896606}]

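Taking the top-1 prediction gives the class directly. Alternatively, several predictions can be combined, for example by computing the share of each class among the top 10, to obtain the final result: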

    {"text": "I really like the film a lot.", "label": "pos"}
    {"text": "this movie makes me very disgusting. ", "label": "neg"}
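The "share of each class" idea can be sketched as a score-weighted vote. This is one reasonable reading of the description, not the author's implementation: the function name vote is hypothetical, words outside the verbalizer (such as "real") are simply ignored, and the input is the top-5 list printed above rather than a top-10 list.

```python
from collections import defaultdict


def vote(predictions, word_to_label):
    """Sum the scores of mapped words per label and normalise them to shares."""
    totals = defaultdict(float)
    for item in predictions:
        label = word_to_label.get(item["token_str"])
        if label is not None:  # ignore words outside the verbalizer
            totals[label] += item["score"]
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {label: score / norm for label, score in totals.items()}


hash_dict = {"great": "pos", "amazing": "pos", "good": "pos", "fun": "pos",
             "fantastic": "pos", "better": "pos", "awful": "neg",
             "horrible": "neg", "bad": "neg", "wrong": "neg", "ugly": "neg"}

# Top-5 predictions for the negative sentence, copied from pred2 above.
pred2 = [
    {"token_str": "awful", "score": 0.05464331805706024},
    {"token_str": "good", "score": 0.050322480499744415},
    {"token_str": "horrible", "score": 0.04008950665593147},
    {"token_str": "wrong", "score": 0.03569378703832626},
    {"token_str": "real", "score": 0.033358603715896606},
]

shares = vote(pred2, hash_dict)
print(max(shares, key=shares.get), shares)
```

On these five predictions the negative words hold roughly 72% of the mapped score mass, so the vote agrees with the top-1 label while being less sensitive to a single stray word such as "good".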

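This gives a rough picture of what lies at the core of prompting in the zero-shot setting. A natural next question is what to do when labelled data is available: how to continue training, how to design a better prompt template, and how to build a good word-to-class mapping. These are the follow-up research questions of prompt learning.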

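Going one step further, we can assemble a complete prompt-based classifier that can also be trained on labelled data. The sample implementation below shows the general idea; we save it as prompt.py: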

    from transformers import AutoModelForMaskedLM, AutoTokenizer
    import torch


    class Prompting(object):
        def __init__(self, **kwargs):
            model_path = kwargs['model']
            tokenizer_path = kwargs['model']
            if "tokenizer" in kwargs.keys():
                tokenizer_path = kwargs['tokenizer']
            self.model = AutoModelForMaskedLM.from_pretrained(model_path)
            # Fixed: load the tokenizer from tokenizer_path (the original ignored it).
            self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

        def prompt_pred(self, text):
            """
            Take a sequence containing [MASK] and return every vocabulary token
            with its score (raw logit) at the mask position, sorted descending.
            """
            indexed_tokens = self.tokenizer(text, return_tensors="pt").input_ids
            tokenized_text = self.tokenizer.convert_ids_to_tokens(indexed_tokens[0])
            mask_pos = tokenized_text.index(self.tokenizer.mask_token)
            self.model.eval()
            with torch.no_grad():
                outputs = self.model(indexed_tokens)
                predictions = outputs[0]
            values, indices = torch.sort(predictions[0, mask_pos], descending=True)
            result = list(zip(self.tokenizer.convert_ids_to_tokens(indices.tolist()), values))
            self.scores_dict = {a: b for a, b in result}
            return result

        def compute_tokens_prob(self, text, token_list1, token_list2):
            """
            token_list1 holds positive-sentiment words (e.g. good, great) and
            token_list2 holds negative-sentiment words (e.g. bad, terrible).
            The mask-position scores of each list are summed, and a softmax over
            the two sums gives the class probabilities [positive, negative].
            Words that are not vocabulary tokens contribute 0.
            """
            _ = self.prompt_pred(text)
            score1 = [self.scores_dict[token1] if token1 in self.scores_dict.keys() else 0
                      for token1 in token_list1]
            score1 = sum(score1)
            score2 = [self.scores_dict[token2] if token2 in self.scores_dict.keys() else 0
                      for token2 in token_list2]
            score2 = sum(score2)
            softmax_rt = torch.nn.functional.softmax(torch.Tensor([score1, score2]), dim=0)
            return softmax_rt

        def fine_tune(self, sentences, labels, prompt=" Since it was [MASK].",
                      goodToken="good", badToken="bad"):
            """
            Fine-tune on labelled data. Each label indexes [goodToken, badToken]:
            0 means positive, 1 means negative.
            """
            # Fixed: use self.tokenizer (the original referenced an undefined global).
            good = self.tokenizer.convert_tokens_to_ids(goodToken)
            bad = self.tokenizer.convert_tokens_to_ids(badToken)
            # torch.optim.AdamW replaces the deprecated transformers.AdamW.
            optimizer = torch.optim.AdamW(self.model.parameters(), lr=1e-3)
            loss_func = torch.nn.CrossEntropyLoss()
            self.model.train()  # prompt_pred switches the model to eval mode
            for sen, label in zip(sentences, labels):
                tokenized_text = self.tokenizer.tokenize(sen + prompt)
                indexed_tokens = self.tokenizer.convert_tokens_to_ids(tokenized_text)
                tokens_tensor = torch.tensor([indexed_tokens])
                mask_pos = tokenized_text.index(self.tokenizer.mask_token)
                outputs = self.model(tokens_tensor)
                predictions = outputs[0]
                pred = predictions[0, mask_pos][[good, bad]]
                # CrossEntropyLoss applies log-softmax itself, so pass raw logits
                # (the original applied softmax first, i.e. a double softmax).
                loss = loss_func(pred.unsqueeze(0), torch.tensor([label]))
                optimizer.zero_grad()  # Fixed: clear gradients from the previous step
                loss.backward()
                optimizer.step()
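To see what the scoring in compute_tokens_prob does numerically, the sketch below reproduces it in plain Python on made-up logits. The function name class_probs and the values in toy_scores are hypothetical; only the arithmetic (sum the logits of each word list, then softmax the two sums) follows the code above.

```python
import math


def class_probs(scores, token_list1, token_list2):
    """Sum the logits of each word list, then softmax over the two sums."""
    s1 = sum(scores.get(t, 0.0) for t in token_list1)  # unknown words add 0
    s2 = sum(scores.get(t, 0.0) for t in token_list2)
    m = max(s1, s2)  # subtract the max for numerical stability
    e1, e2 = math.exp(s1 - m), math.exp(s2 - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)


# Hypothetical mask-position logits (not real model output).
toy_scores = {"good": 8.0, "great": 7.0, "bad": 8.5, "terrible": 7.0}

balanced = class_probs(toy_scores, ["good"], ["bad"])
unbalanced = class_probs(toy_scores, ["good", "great"], ["bad"])
print(balanced)    # positive is below 0.5 because "bad" outscores "good"
print(unbalanced)  # positive is now close to 1
```

The second call illustrates a property worth keeping in mind: because raw logits are summed, a longer word list tends to win regardless of the individual scores, so the two lists should be of comparable length. An alternative, not used in the article, would be to convert the logits to probabilities over the whole vocabulary first and compare the probability mass of each list.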

IV. Zero-shot prompt-based sentiment classification in practice

Next we run zero-shot prompt classification on IMDB movie-review examples; the steps below show the overall logic:

1. Load the model and the Prompting class

    >>> from transformers import AutoModelForMaskedLM, AutoTokenizer
    >>> import torch
    >>> model_path = "bert-base-uncased"
    >>> tokenizer = AutoTokenizer.from_pretrained(model_path)
    >>> from prompt import Prompting
    >>> prompting = Prompting(model=model_path)

2. Predict sentiment directly with prompt_pred

    >>> prompt = "Because it was [MASK]."
    >>> text = "I really like the film a lot."
    >>> prompting.prompt_pred(text + prompt)[:10]
    [('great', tensor(9.5558)), ('amazing', tensor(9.2532)), ('good', tensor(9.1464)),
     ('fun', tensor(8.3979)), ('fantastic', tensor(8.3277)), ('wonderful', tensor(8.2719)),
     ('beautiful', tensor(8.1584)), ('awesome', tensor(8.1071)), ('incredible', tensor(8.0140)),
     ('funny', tensor(7.8785))]
    >>> text = "I did not like the film."
    >>> prompting.prompt_pred(text + prompt)[:10]
    [('bad', tensor(8.6784)), ('funny', tensor(8.1660)), ('good', tensor(7.9858)),
     ('awful', tensor(7.7454)), ('scary', tensor(7.3526)), ('boring', tensor(7.1553)),
     ('wrong', tensor(7.1402)), ('terrible', tensor(7.1296)), ('horrible', tensor(6.9923)),
     ('ridiculous', tensor(6.7731))]

3. Predict sentiment with a neg/pos word verbalizer

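    >>> text = "not worth watching"
    >>> prompting.compute_tokens_prob(text + prompt, token_list1=["great", "amazin", "good"], token_list2=["bad", "awfull", "terrible"])
    tensor([0.1496, 0.8504])
    >>> text = "I strongly recommend that moview"
    >>> prompting.compute_tokens_prob(text + prompt, token_list1=["great", "amazin", "good"], token_list2=["bad", "awfull", "terrible"])
    tensor([0.9321, 0.0679])
    >>> text = "I strongly recommend that moview"
    >>> prompting.compute_tokens_prob(text + prompt, token_list1=["good"], token_list2=["bad"])
    tensor([0.9223, 0.0777])

The spellings "amazin", "awfull", and "moview" are kept exactly as in the original calls, because the outputs shown were produced with them. Any word in a list that is not a single vocabulary token is scored as 0 by compute_tokens_prob.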

V. Zero-shot prompt-based NER in practice

A natural follow-up question: since this approach handles classification, can it also be used for named entity recognition?

It can. The brute-force way is to enumerate candidate spans and then ask the model, for each span, which entity type it belongs to.
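The brute-force enumeration can be sketched as follows. This is an illustrative assumption about how candidate spans might be generated, since the article does not show this step: candidate_spans and ner_prompts are hypothetical helpers, whitespace splitting stands in for a real tokenizer, and the query uses the "is a type of [MASK]." template from this section.

```python
def candidate_spans(tokens, max_len=2):
    """Enumerate every contiguous span of up to max_len tokens."""
    spans = []
    for start in range(len(tokens)):
        last = min(start + max_len, len(tokens))
        for end in range(start + 1, last + 1):
            spans.append(" ".join(tokens[start:end]))
    return spans


def ner_prompts(sentence, max_len=2):
    """Build one '<span> is a type of [MASK].' query per candidate span."""
    tokens = sentence.rstrip(".").split()  # naive whitespace tokenisation
    return [f"{sentence} {span} is a type of [MASK]."
            for span in candidate_spans(tokens, max_len)]


prompts = ner_prompts("John went to Paris.", max_len=1)
for p in prompts:
    print(p)
```

Each generated string can then be scored with prompt_pred or compute_tokens_prob. The number of queries grows with sentence length times max_len, which is why the article calls the approach brute force.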

1. Define the prompt template

As before, we define a template. Take a person as an example: John is a very common name, so the model can recognize it as a person without needing any context.

    Sentence. John is a type of [MASK].

2. Predict directly with prompt_pred

Running it directly shows how well this works:

    >>> prompting.prompt_pred("John went to Paris to visit the University. John is a type of [MASK].")[:5]
    [('man', tensor(8.1382)), ('john', tensor(7.1325)), ('guy', tensor(6.9672)), ('writer', tensor(6.4336)), ('philosopher', tensor(6.3823))]
    >>> prompting.prompt_pred("Savaş went to Paris to visit the university. Savaş is a type of [MASK].")[:5]
    [('philosopher', tensor(7.6558)), ('poet', tensor(7.5621)), ('saint', tensor(7.0104)), ('man', tensor(6.8890)), ('pigeon', tensor(6.6780))]

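3. Predict the entity type with a class-word verbalizer

Next we add class words to the prediction. Because the goal here is to recognize persons, we use words related to the person class as token_list1, for example ["person", "man"], and words for other types as token_list2, for example ["location", "city", "place"]. Other entity classes can be handled the same way by building a word-list dictionary for each.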

    >>> prompting.compute_tokens_prob("It is a type of [MASK].", token_list1=["person", "man"], token_list2=["location", "city", "place"])
    tensor([0.7603, 0.2397])
    >>> # The context-free prompt above gives a baseline of 0.76; a span whose
    >>> # person probability is above 0.76 is judged to be a person.
    >>> prompting.compute_tokens_prob("Savaş went to Paris to visit the parliament. Savaş is a type of [MASK].", token_list1=["person", "man"], token_list2=["location", "city", "place"])
    tensor([9.9987e-01, 1.2744e-04])

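These results suggest that prompt-based classification is a direct and effective way to do zero-shot entity recognition: "Savaş" is judged to be a person with a probability of about 0.9999. Asking about a location in the same sentence gives: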

    >>> prompting.compute_tokens_prob("Savaş went to Laris to visit the parliament. Laris is a type of [MASK].", token_list1=["person", "man"], token_list2=["location", "city", "place"])
    tensor([0.3263, 0.6737])

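In this example, the location "Laris" is judged to be a person with a probability of only 0.3263, well below the 0.76 baseline, which further supports the effectiveness of the approach.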
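The baseline comparison described above can be written as a one-line decision rule. The sketch below is a hedged illustration: is_person is a hypothetical helper, the default baseline is the 0.7603 obtained from the context-free prompt "It is a type of [MASK].", and the two probabilities are the ones reported above.

```python
def is_person(person_prob, baseline=0.7603):
    """Label a span as a person only if it beats the context-free baseline."""
    return person_prob > baseline


# Person probabilities reported above for the two spans.
results = {"Savaş": 0.99987, "Laris": 0.3263}
for span, prob in results.items():
    print(span, "person" if is_person(prob) else "not person")
```

Comparing against the context-free baseline rather than a fixed 0.5 compensates for the model's prior preference for person words with this template and these word lists.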

Summary

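This article walked through code for five topics: the MLM prediction task as seen from a pre-trained language model; MLM prediction with a prompt_template; Prompt-MLM prediction with a verbalizer that maps words to classes; zero-shot prompt-based sentiment classification; and zero-shot prompt-based NER.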

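The core of prompt learning is to model downstream tasks uniformly as the training task of a pre-trained language model, so that as much of the pre-trained model's potential as possible can be put to use. How to construct the prompt template and the corresponding label words is an interesting question in its own right and worth following.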

Reviewing editor: Li Qian (李倩)
