Choosing the right recipe and workflow
So you have an NER problem you want to solve and data to annotate, and you want to get it done as efficiently as possible. But how do you pick the right workflow for your use case?
- Fully manual: This is the most classic way of annotating data. You’re shown a raw text and highlight all entity spans in the text. At the end of the process, you export “gold-standard” data that you can train your model with. Fully manual annotation is often the best choice if you want to create a new dataset but are starting completely from scratch, without any existing resources you can leverage. In Prodigy, you can use this workflow with the ner.manual recipe (https://prodi.gy/docs/recipes#ner-manual). Every NER project starts with some manual annotation: these annotations are the seed from which an initial model can be trained, and that model can then predict annotations for the data you have not labelled yet. The workflows below describe how a human then reviews such suggestions.
- Manual with suggestions from patterns: Fully manual annotation can easily get tedious, which leads to mistakes and inconsistent data. You’re often doing the same thing over and over. For instance, in your data, mentions of “New York” may pretty much always refer to the location. Instead of labelling it by hand every single time, you can use keyword lists and patterns describing the tokens you’re looking for and make Prodigy pre-highlight the candidates for you. Even if your patterns only help 50% of the time, that’s still 50% less work for you. In Prodigy, you can use this workflow with ner.manual (https://prodi.gy/docs/recipes#ner-manual) and the --patterns option. Also see the docs on patterns.
- Manual with suggestions from model: Instead of patterns, you can also use an existing model to highlight suggested entities, including where each one starts and ends, and save you time. This workflow is also useful if you want to train a model with a mix of new and existing categories. You can use the existing model to help you label the entity types you’re interested in, correct any mistakes it makes and add a new category on top. In Prodigy, you can use this workflow with the ner.correct recipe (https://prodi.gy/docs/recipes#ner-correct) for spaCy models or a custom recipe for any other model that predicts named entities. In practice this becomes a cycle: correct the model’s mistakes, retrain it, let it predict on new data, and annotate on top of those predictions.
- Binary with active learning and a model in the loop: This workflow is useful if you already have a model and want to fine-tune it on more data. Instead of annotating every example, you can use the model to suggest the most relevant examples for you to annotate and give it feedback on its predictions. There are many different ways you can select the “best” examples, and a whole line of research dedicated to exploring active learning techniques. Prodigy’s ner.teach recipe (https://prodi.gy/docs/recipes#ner-teach) implements simple uncertainty sampling with beam search: for each example, the annotation model gets a number of analyses and asks you to accept or reject the entity analyses it’s most uncertain about. In other words, it surfaces the borderline examples, the ones the model is not fully sure it has recognized correctly, and hands them back to a human to decide. Based on your decisions, the model is updated in the loop and guided towards better predictions. Prodigy also includes utilities that let you implement custom workflows with a model in the loop.
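To make the pattern-based workflow more concrete, here is a minimal sketch of what pre-highlighting from patterns amounts to. It assumes pattern entries shaped as one JSON object per line with a "label" and a list of per-token descriptions, which is the style Prodigy's pattern files follow as far as I know. The whitespace tokenizer and the helper names (`to_jsonl`, `tokenize`, `suggest_spans`) are purely illustrative stand-ins and not part of Prodigy.

```python
import json

# Hypothetical pattern entries: one JSON object per line of a patterns file,
# each with a label and a list of per-token descriptions.
PATTERNS = [
    {"label": "GPE", "pattern": [{"lower": "new"}, {"lower": "york"}]},
    {"label": "GPE", "pattern": [{"lower": "berlin"}]},
]


def to_jsonl(patterns):
    """Serialize the patterns as JSONL, one entry per line."""
    return "\n".join(json.dumps(entry) for entry in patterns)


def tokenize(text):
    """Split on whitespace and keep character offsets (stand-in for a real tokenizer)."""
    tokens, cursor = [], 0
    for word in text.split():
        start = text.index(word, cursor)
        cursor = start + len(word)
        tokens.append({"text": word, "start": start, "end": cursor})
    return tokens


def suggest_spans(text, patterns):
    """Return character spans for every token sequence that matches a pattern."""
    tokens = tokenize(text)
    spans = []
    for entry in patterns:
        wanted = [token["lower"] for token in entry["pattern"]]
        for i in range(len(tokens) - len(wanted) + 1):
            window = tokens[i : i + len(wanted)]
            if [token["text"].lower() for token in window] == wanted:
                spans.append(
                    {
                        "start": window[0]["start"],
                        "end": window[-1]["end"],
                        "label": entry["label"],
                    }
                )
    return sorted(spans, key=lambda span: span["start"])


text = "She moved from Berlin to New York in 2019"
spans = suggest_spans(text, PATTERNS)
for span in spans:
    print(span["label"], text[span["start"] : span["end"]])
```

Saved to a file (say `patterns.jsonl`, a hypothetical name), the output of `to_jsonl(PATTERNS)` is the kind of input the --patterns option points at. The annotator then only has to confirm or fix the pre-highlighted spans instead of marking each one by hand.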
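The model-suggestion workflow can be pictured in a similar way: an existing model proposes spans, and the annotator removes the wrong ones and adds a new category on top. The sketch below hard-codes the "model output" and the annotator's decisions, so it only illustrates the data flow. The function names and the example sentence are assumptions made for illustration, and the record layout (text, character-offset spans, an "accept" answer) is assumed to resemble what an annotation tool would export.

```python
def make_span(text, phrase, label):
    """Build a character-offset span for the first occurrence of phrase."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


def apply_corrections(task, rejected, added):
    """Drop rejected suggestions, add the annotator's spans, mark the task accepted."""
    kept = [
        span
        for span in task["spans"]
        if (span["start"], span["end"], span["label"]) not in rejected
    ]
    spans = sorted(kept + added, key=lambda span: span["start"])
    return {"text": task["text"], "spans": spans, "answer": "accept"}


text = "Tim Cook introduced the iPhone in Cupertino"

# Hard-coded stand-in for an existing model's output; one suggestion is wrong.
suggested = {
    "text": text,
    "spans": [
        make_span(text, "Tim Cook", "PERSON"),
        make_span(text, "iPhone", "ORG"),
        make_span(text, "Cupertino", "GPE"),
    ],
}

# The annotator rejects the wrong ORG suggestion and adds a new PRODUCT category.
wrong = make_span(text, "iPhone", "ORG")
rejected = {(wrong["start"], wrong["end"], wrong["label"])}
added = [make_span(text, "iPhone", "PRODUCT")]

gold = apply_corrections(suggested, rejected, added)
for span in gold["spans"]:
    print(span["label"], text[span["start"] : span["end"]])
```

Records like `gold` are what the next training round would consume, which is how the correct, retrain and predict cycle keeps improving the suggestions.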
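The selection step behind the binary active-learning workflow can be sketched as plain uncertainty sampling: rank candidate entities by how close the model's score is to 0.5 and ask about those first. This is not the ner.teach implementation, which works with beam search and updates the model in the loop. The scores below are made up, and `uncertainty` and `most_uncertain` are hypothetical helper names.

```python
def uncertainty(score):
    """Map a probability to [0, 1]: 1.0 at a score of 0.5, 0.0 at a score of 0.0 or 1.0."""
    return 1.0 - abs(score - 0.5) * 2.0


def most_uncertain(candidates, n=2):
    """Return the n candidates whose scores are closest to 0.5."""
    ranked = sorted(candidates, key=lambda c: uncertainty(c["score"]), reverse=True)
    return ranked[:n]


# Made-up candidate entities with made-up model scores.
candidates = [
    {"text": "Apple", "label": "ORG", "score": 0.97},
    {"text": "Paris", "label": "PERSON", "score": 0.52},
    {"text": "Amazon", "label": "ORG", "score": 0.41},
    {"text": "May", "label": "DATE", "score": 0.08},
]

queue = most_uncertain(candidates, n=2)
for candidate in queue:
    # In a real loop, the annotator's accept/reject answer would update the model.
    print(f"Is '{candidate['text']}' a {candidate['label']}? (score {candidate['score']})")
```

Confident predictions such as "Apple" (0.97) and "May" (0.08) are skipped, because a yes or no on them would tell the model little it does not already believe.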