Entity recognition is performed with the Danish and English pre-trained models published by spaCy.
The Danish model has been further trained on top of spaCy's pre-trained Danish model to improve its accuracy and to let it recognize literals. See Pypi Repository for more information on where to find the custom model.
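# Each model is installed as its own Python package and imported by name.
import en_core_web_lg
import da_core_news_lg
# load() returns a ready-to-use spaCy pipeline for that language.
nlp_en = en_core_web_lg.load()
nlp_da = da_core_news_lg.load()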
Full code available here.
The entity recognition is performed using either the nlp_en or nlp_da variable defined in Loading a SpaCy Model.
def GetTokens(text: str):
    result = DetectLang(text)
    if result == "da":
        return nlp_da(text)
    elif result == "en":
        return nlp_en(text)
    else:
        raise UndetectedLanguageException()
Full code available here.
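The language dispatch can also be written as a lookup table, which avoids adding a new branch for every supported language. The following is only a minimal sketch, not the project's implementation: get_tokens, toy_detect_lang and the fake pipelines are hypothetical stand-ins so that the example runs without any spaCy model installed.

```python
from typing import Callable, Dict


class UndetectedLanguageException(Exception):
    """Raised when no pipeline is registered for the detected language."""


def get_tokens(text: str,
               detect_lang: Callable[[str], str],
               pipelines: Dict[str, Callable[[str], object]]):
    """Run the pipeline registered for the language detected in `text`."""
    lang = detect_lang(text)
    pipeline = pipelines.get(lang)
    if pipeline is None:
        raise UndetectedLanguageException(f"no pipeline for language {lang!r}")
    return pipeline(text)


# Stand-in detector (assumption): treats text with Danish letters as Danish.
def toy_detect_lang(text: str) -> str:
    return "da" if any(ch in "æøåÆØÅ" for ch in text) else "en"


# Stand-in pipelines (assumption): they only split on whitespace.
pipelines = {
    "en": lambda text: ("en", text.split()),
    "da": lambda text: ("da", text.split()),
}

result = get_tokens("Jeg bor i København", toy_detect_lang, pipelines)
```

With the real models, the table would presumably be {"en": nlp_en, "da": nlp_da} and the detector would be DetectLang.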
GetTokens returns a spaCy Doc, which exposes information such as each entity's start and end index, the sentence each entity belongs to, and so on.
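To illustrate what can be read from a Doc, the sketch below builds one by hand. It assumes spaCy v3 is installed and uses a blank English pipeline with manually assigned entities instead of the trained models, so the sample sentence, labels and token positions are made up for the example.

```python
import spacy
from spacy.tokens import Span

# Blank pipeline (assumption): tokenizer only, plus a rule-based sentence splitter.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("Anna lives in Copenhagen. She works at Novo Nordisk.")

# Entities are set by hand here; a trained model would predict them.
doc.ents = [
    Span(doc, 0, 1, label="PERSON"),   # "Anna"
    Span(doc, 3, 4, label="GPE"),      # "Copenhagen"
    Span(doc, 8, 10, label="ORG"),     # "Novo Nordisk"
]

# Collect text, label, character offsets and the containing sentence.
rows = [
    (ent.text, ent.label_, ent.start_char, ent.end_char, ent.sent.text)
    for ent in doc.ents
]

for row in rows:
    print(row)
```

Each entity is a Span: start and end give token positions within the Doc, while start_char and end_char give character offsets into the original text.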