The pipeline starts when the DirectoryWatcher detects a new file (an article) in a watched directory. This file is produced by pipeline A (text extraction).
Since the sudden exit of the controversial CEO Martin Kjær last week,
both he and the executive board in Region North Jutland
have been in hiding.
some/article.txt
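The detection step can be illustrated with a minimal polling sketch. It is not the actual DirectoryWatcher implementation, which is not shown here; the function name poll_new_files and the idea of tracking already-seen paths in a set are assumptions made for illustration only.

```python
from pathlib import Path


def poll_new_files(directory: Path, seen: set[Path]) -> list[Path]:
    """Return the files in `directory` that were not seen on earlier calls.

    Hypothetical helper: the real DirectoryWatcher may use OS-level
    notifications instead of polling.
    """
    # Collect regular files that have not been reported before.
    new_files = sorted(
        p for p in directory.iterdir() if p.is_file() and p not in seen
    )
    # Remember them so that each file triggers the pipeline only once.
    seen.update(new_files)
    return new_files
```

Calling poll_new_files repeatedly with the same set reports each file once, so a file such as some/article.txt would start the pipeline a single time.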
The input must be preprocessed before the Entity Recognizer can use it. Preprocessing removes newlines and adds punctuation where needed.
Since the sudden exit of the controversial CEO Martin Kjær last week, both he and the executive board in Region North Jutland. have been in hiding.
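The preprocessing step might look roughly like the sketch below. The rule used here (append a period to any line that does not already end in punctuation, then join the lines with spaces) is an assumption inferred from the example above, and the name preprocess is hypothetical; the real preprocessor may apply different rules.

```python
def preprocess(text: str) -> str:
    """Join the lines of `text` into one line, adding missing punctuation.

    Assumed rule: a line that does not end in a punctuation mark gets a
    trailing period before the lines are joined.
    """
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines entirely
        if line[-1] not in ".,;:!?":
            line += "."  # add punctuation where the line break was the only separator
        parts.append(line)
    # Replace the removed newlines with single spaces.
    return " ".join(parts)
```

Under this rule a period is added after "Region North Jutland" because that line ends without punctuation, even though the sentence continues on the next line. This reproduces the example output and shows that a line-based heuristic can insert punctuation in the middle of a sentence.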