Concept Linking is responsible for forming triples consisting of a subject (entity IRI), a predicate, and an object (DBpedia class URI), where the subject is related to the object through the "type" relation. The subject, along with its entity IRI, is provided by group B.
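For illustration, a resulting triple could be represented as follows (shown here as a plain Python tuple; the DBpedia class is one possible object, and the subject IRI follows the pattern used in the examples below):
# Illustrative triple; the exact representation in the pipeline may differ.
triple = (
    "knox-kb01.srv.aau.dk/Barack_Obama",                # subject: entity IRI (from group B)
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",  # predicate: rdf:type
    "http://dbpedia.org/ontology/Person",               # object: DBpedia class URI
)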
Four different solutions have been implemented. These are described further in Solutions.
By default, the solution that runs is PromptEngineering. To change this, see Change the solution running.
The Concept Linking part of KNOX has four different solutions.
Each has its own requirements, which can be toggled when needed.
Read the documentation for the TripleConstruction API.
Clone the TripleConstruction GitHub repository.
To install the necessary requirements for all solutions, execute the command pip install -r requirements.txt within the PreprocessingLayer_TripleConstruction folder. Alternatively, if you want to install the requirements for a particular solution only, open the file concept_linking/requirements.txt, comment out the requirements for the solutions that are not needed, and install the requirements again from the PreprocessingLayer_TripleConstruction folder.
Only Needed for Prompt Engineering Solution
To be able to run the Prompt Engineering solution, a Llama API server needs to be running. This can be done either by using a local Docker container, by running it directly from the IDE, or by using the Llama API server in the KNOX pipeline.
It is possible to use a local LlamaServer. It can be found in ../concept_linking/tools/LlamaServer.
Using Docker (Local)
Run docker-compose up --build in the folder PreprocessingLayer_TripleConstruction/concept_linking/tools/LlamaServer.
Using IDE (Local)
To run the Prompt Engineering solution solely from the IDE, the Llama server can be started from a terminal.
NOTE: Remember to include the requirements for the llama_server in /concept_linking/requirements.txt when installing the overall requirements, OR install them directly from /concept_linking/tools/LlamaServer/requirements.txt.
Change directory to /concept_linking/tools/LlamaServer and run the command python .\llama_cpu_server.py to start the server.
Since this is meant as a tool for running Llama locally on Windows, a C++ installation is required. C++ can be installed via the Visual Studio Installer: select "Desktop development with C++" and press Modify/Install.
Go to the directory /concept_linking/PromptEngineering/main and set the api_url accordingly:
api_url={domain or ip+port of llama server hosted in the knox pipeline}
Refer to the Server Distribution document for specific DNS and IP+port information.
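For example (the address below is hypothetical; use the real DNS or ip+port from the Server Distribution document):
api_url = "http://127.0.0.1:5000"  # hypothetical local Llama server address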
Run the solution locally using Docker (download Docker Desktop).
Call the API at http://127.0.0.1:4444/tripleconstruction with a correctly formatted body and the API_SECRET as the Authorization header. A body could be:
[
    {
        "fileName": "Artikel.txt",
        "language": "en",
        "sentences": [
            {
                "sentence": "Barack Obama is a person and he is married to Michelle Obama.",
                "sentenceStartIndex": 0,
                "sentenceEndIndex": 24,
                "entityMentions": [
                    {
                        "name": "Barack Obama",
                        "type": "Entity",
                        "label": "PERSON",
                        "startIndex": 0,
                        "endIndex": 11,
                        "iri": "knox-kb01.srv.aau.dk/Barack_Obama"
                    },
                    {
                        "name": "Michelle Obama",
                        "type": "Entity",
                        "label": "PERSON",
                        "startIndex": 12,
                        "endIndex": 24,
                        "iri": "knox-kb01.srv.aau.dk/Michelle_Obama"
                    }
                ]
            }
        ]
    }
]
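A sketch of how such a request could be sent from Python, assuming the requests package is installed (API_SECRET below is a placeholder, not the real value):
import requests

API_SECRET = "<your-api-secret>"  # placeholder, not a real value

# The body shown above, trimmed to one entity mention for brevity.
body = [{
    "fileName": "Artikel.txt",
    "language": "en",
    "sentences": [{
        "sentence": "Barack Obama is a person and he is married to Michelle Obama.",
        "sentenceStartIndex": 0,
        "sentenceEndIndex": 24,
        "entityMentions": [{
            "name": "Barack Obama",
            "type": "Entity",
            "label": "PERSON",
            "startIndex": 0,
            "endIndex": 11,
            "iri": "knox-kb01.srv.aau.dk/Barack_Obama"
        }]
    }]
}]

response = requests.post(
    "http://127.0.0.1:4444/tripleconstruction",
    json=body,                              # serialized as JSON
    headers={"Authorization": API_SECRET},  # API_SECRET as Authorization header
)
print(response.status_code, response.text)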
Each solution can also be run individually via its entry file ./{solution_name.py}.
To change the solution running on the server, open PreprocessingLayer_TripleConstruction/concept_linking/main.py. Uncomment the specific solution that needs to run on the server, and comment out the rest. All requirements need to be uncommented in order to build the Docker image and for GitHub Actions to work.
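The comment-toggle pattern in main.py might look roughly like this (the module paths and function name below are illustrative, not the actual ones):
# Illustrative sketch only; see concept_linking/main.py for the real imports and calls.
from solutions.PromptEngineering import main as prompt_engineering
# from solutions.MachineLearning import main as machine_learning
# from solutions.StringComparison import main as string_comparison
# from solutions.UntrainedSpacy import main as untrained_spacy

prompt_engineering.run()  # only the uncommented solution runs on the server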
Prerequisites for running code from IDE
- A Python interpreter, version 3.10 or 3.11 (not newer!)
- All relevant requirements installed
This solution can be run in either train or predict mode. Train will train the model, while predict will perform Entity Type Classification. Consult the code in /concept_linking/solutions/MachineLearning/main.py for more info.
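A hypothetical sketch of switching between the two modes (the real entry point and function names live in /concept_linking/solutions/MachineLearning/main.py and may differ):
# Hypothetical sketch; consult MachineLearning/main.py for the actual interface.
import sys

def train():
    # Placeholder: fit the entity type classification model on training data.
    print("training...")

def predict():
    # Placeholder: classify entity mentions into ontology types.
    print("predicting...")

if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else "predict"
    train() if mode == "train" else predict()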
Uses the LLM Llama2. The following prompt is given to the model:
prompt_template = {
    "system_message": ("The input sentence is all your knowledge. \n"
                       "Do not answer if it can't be found in the sentence. \n"
                       "Do not use bullet points. \n"
                       "Do not identify entity mentions yourself, use the provided ones \n"
                       "Given the input in the form of the content from a file: \n"
                       "[Sentence]: {content_sentence} \n"
                       "[EntityMention]: {content_entity} \n"),
    "user_message": ("Classify the [EntityMention] in regards to ontology classes: {ontology_classes} \n"
                     "The output answer must be in JSON in the following format: \n"
                     "{{ \n"
                     "'Entity': 'Eiffel Tower', \n"
                     "'Class': 'ArchitecturalStructure' \n"
                     "}} \n"),
    "max_tokens": 4092
}
The variables {content_sentence} and {content_entity} are found in an earlier part of the KNOX pipeline.
The variable {ontology_classes} is fetched from the Ontology endpoint provided by group E (Database Layer).
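As a minimal sketch, the template strings can be filled with Python's str.format (the values below are hypothetical; the real ones come from the pipeline and the Ontology endpoint):
# Minimal sketch; in the real solution the values come from the pipeline, not literals.
system_message = prompt_template["system_message"].format(
    content_sentence="Barack Obama is a person and he is married to Michelle Obama.",
    content_entity=["Barack Obama", "Michelle Obama"],
)
user_message = prompt_template["user_message"].format(
    ontology_classes=["Person", "Place", "Organisation"],  # from the Ontology endpoint
)
Note the doubled braces {{ }} in user_message: str.format reduces them to literal braces, which keeps the JSON output example intact.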
Performs simple string similarity.
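A minimal sketch of what simple string similarity could look like, using Python's standard difflib (the actual solution's matching logic may differ):
# Minimal sketch using difflib; illustrative only.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Return a similarity ratio between 0.0 and 1.0.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

ontology_classes = ["Person", "Place", "Organisation"]  # illustrative subset
best_match = max(ontology_classes, key=lambda c: similarity("person", c))
print(best_match)  # -> Person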
Maps spaCy labels to relevant ontology classes.
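Such a mapping could be expressed as a simple dictionary (the pairs below are illustrative; see the spaCy label files in the documents folder for the actual mapping):
# Illustrative label-to-class pairs only; not the project's actual mapping.
SPACY_LABEL_TO_ONTOLOGY_CLASS = {
    "PERSON": "Person",  # e.g. "Barack Obama"
    "GPE": "Place",      # geopolitical entities: countries, cities, states
    "ORG": "Organisation",
}

ontology_class = SPACY_LABEL_TO_ONTOLOGY_CLASS.get("PERSON")  # -> "Person"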
Local API server based on Llama2.
The OntologyGraphBuilder has primarily been used to help with the evaluation of the solutions as well as visualizing the ontology classes.
Install the necessary requirements.
Within the OntologyGraphBuilder folder, you'll find the DAG.py file. This file is dedicated to building and visualizing a Directed Acyclic Graph (DAG) based on the ontology. The core function in DAG.py, named build_dag(), requires a specified node to act as the root of the visualized graph. Additionally, the boolean parameter include_superclasses determines whether the superclasses of the node should also be visualized.
The build_dag() function can be called like in the file main.py:
import DAG
if __name__ == '__main__':
# Default if only a node is provided is FullTree: False, Visualization: False
DAG.build_dag('Person', False, True)
Install the requirements for this tool: navigate to the directory ../concept_linking/tools/OntologyGraphBuilder/ and run the command pip install -r requirements.txt.
The evaluation folder contains functions to generate a graph based on evaluation data in the file data/files/EvaluationData/Results/name_of_result.json.
Install the necessary requirements.
Read the evaluation file using the function read_scores_from_json(), then run the function evaluate_dataset() with the loaded data.
Example (assuming the functions are imported from the Evaluation tool):
json_file_path = 'data/files/EvaluationData/Results/name_of_result.json'
scores = read_scores_from_json(json_file_path)
evaluate_dataset(scores, 'Distribution of Points: Machine Learning Solution on Danish Dataset')
Install the requirements for this tool: navigate to the directory ../concept_linking/tools/Evaluation/ and run the command pip install -r requirements.txt.
The data folder contains files and documents used for input and output to the solutions.
The documents folder contains text files with ontology-related classes and data types, as well as the spaCy labels used for the untrained spaCy solution.
The files folder contains the output folder for each of the four solutions, as well as the EvaluationData folder containing evaluation sets and their results. Furthermore, the ontology.ttl file is also found in the files folder.
The Python testing framework unittest has been utilised to test the solution. The testing framework discovers all directories following the naming convention test_. In addition, all Python files beginning with test_ inside those directories will be run.
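As a minimal sketch, a discoverable test file (hypothetical name and contents) could look like this and be run with python -m unittest discover:
# test_example.py -- hypothetical file; any test_*.py file in a test_ directory is discovered.
import unittest

class TestConceptLinking(unittest.TestCase):
    def test_smoke(self):
        # Replace with assertions against the actual solution's output.
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    unittest.main()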
Read more about testing the TripleConstruction API
The connection to the Meta Data API has not yet been implemented; this is described under future work.