Relation extraction is responsible for forming triples consisting of a subject (entity IRI), a predicate (DBpedia IRI), and an object (entity IRI), where the subject has some relation to the object. The subject and object entities, along with their IRIs, are provided by group B.
The code for Relation Extraction can be found in the /relation_extraction folder of the TripleConstruction repository.
A basic explanation of what relation extraction is can be read here.
Read the documentation for the POST endpoint on the TripleConstruction API.
The input to the Relation Extraction solution (the output from group B, Entity Extraction) must be in the format of the body of the POST request to the TripleConstruction API.
The output of the Relation Extraction solution is sent as a POST request to the upsert triples endpoint provided by group E (Database API). The output has the following format:
```json
{
    "triples": [
        [
            "knox-kb01.srv.aau.dk/Barack_Obama",
            "http://dbpedia.org/ontology/spouse",
            "knox-kb01.srv.aau.dk/Michelle_Obama"
        ],
        [
            "knox-kb01.srv.aau.dk/Michelle_Obama",
            "http://dbpedia.org/ontology/spouse",
            "knox-kb01.srv.aau.dk/Barack_Obama"
        ]
    ]
}
```
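For illustration, a minimal sketch of how this payload could be posted, assuming the hypothetical endpoint URL below and the requests library (the real upsert triples URL is documented by group E):

```python
# Sketch only: the endpoint URL is a placeholder, not the real Database API address.
import requests

payload = {
    "triples": [
        [
            "knox-kb01.srv.aau.dk/Barack_Obama",
            "http://dbpedia.org/ontology/spouse",
            "knox-kb01.srv.aau.dk/Michelle_Obama",
        ],
    ]
}

# POST the triples to the upsert triples endpoint (placeholder URL).
response = requests.post("https://example.org/upsert-triples", json=payload)
response.raise_for_status()
```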
The architecture of the Relation Extraction solution can be found below.
Making changes to the method of relation extraction can be done through the RelationExtractor class.
```python
from relation_extraction.multilingual.main import begin_relation_extraction


class RelationExtractor():
    @classmethod
    def begin_extraction(cls, data):
        begin_relation_extraction(data)
```
The class contains a single method, begin_extraction, which should call the main entry point that begins the relation extraction.
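A hypothetical usage sketch; the import path is an assumption, and data stands for the parsed JSON body of the POST request from group B:

```python
# Hypothetical import path; adjust to where RelationExtractor lives in the repository.
from relation_extraction.relation_extractor import RelationExtractor

# `data` is the parsed JSON body of the POST request from group B
# (see the TripleConstruction API documentation for the exact schema).
data = ...
RelationExtractor.begin_extraction(data)
```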
Making changes to subclasses of APIHandler can be done individually by changing their implementations of API_endpoint or send_request.
```python
from abc import ABCMeta, abstractmethod


class APIHandler(metaclass=ABCMeta):
    @property
    @classmethod
    @abstractmethod
    def API_endpoint(cls):
        """Property used to define the API_endpoint for the subclass of APIHandler"""
        pass

    @classmethod
    @abstractmethod
    def send_request(cls, request):
        pass
```
Making changes to the APIHandler should include the decorators @abstractmethod and @classmethod. This means that subclasses of APIHandler must implement these methods and/or properties. If some methods or properties should not be shared between all subclasses of APIHandler, they should instead be added only to the subclasses where they apply.
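As an illustration, a minimal sketch of a concrete subclass; the class name, import path, and endpoint URL are assumptions, not names taken from the repository:

```python
import requests

from relation_extraction.API_handler import APIHandler  # hypothetical import path


class ExampleMessenger(APIHandler):
    @classmethod
    def API_endpoint(cls):
        """Defines the endpoint this messenger sends requests to (placeholder URL)."""
        return "https://example.org/api"

    @classmethod
    def send_request(cls, request):
        """Sends the request as JSON to the endpoint and returns the response."""
        return requests.post(url=cls.API_endpoint(), json=request)
```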
The Python testing framework unittest has been utilised to test the solution. The framework discovers all directories whose names begin with test_; in addition, all Python files beginning with test_ inside those directories will be run.
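For example, a test file following this convention might look like the sketch below (the file path and test name are hypothetical); discovery can then be run from the root with python -m unittest:

```python
# Hypothetical file: relation_extraction/test_example/test_relation_extractor.py
import unittest

from relation_extraction.multilingual.main import begin_relation_extraction


class TestRelationExtraction(unittest.TestCase):
    def test_entry_point_is_callable(self):
        # Placeholder assertion; the real tests exercise the extraction logic.
        self.assertTrue(callable(begin_relation_extraction))


if __name__ == "__main__":
    unittest.main()
```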
Read more about testing the TripleConstruction API.
The solution utilises Watchtower to fetch the latest updates that have been pushed to GitHub.
Read more about CI/CD for the TripleConstruction API.
Making changes to a component will generally only affect the layer that is being worked on.
If you wish to use a local Llama 2 model instead of the Llama API, first install Llama-cpp, then download either the 7B model or the 13B model. Note that this does not work using Docker; you must run the server without Docker. Add the following to requirements.txt:

```
llama_cpp_python==0.2.20
```

Once the desired model is downloaded, change the send_request method on the LLMMessenger from the following implementation:
```python
# Imports used by this implementation.
import os

import requests


def send_request(request):
    HEADERS = {"Access-Authorization": os.getenv("ACCESS_SECRET")}
    response = requests.post(url=LLMMessenger.API_endpoint(), json=request, headers=HEADERS)
    return response
```
to the following implementation:
```python
from llama_cpp import Llama


def send_request(request):
    # Put the location of the GGUF model that you've downloaded from Hugging Face here
    model_path = "path/to/Llama2Model"

    # Create a Llama model
    model = Llama(model_path=model_path, n_ctx=4096)

    # Build the Llama 2 chat prompt from the system and user messages
    prompt = f"""<s>[INST] <<SYS>>
{request["system_message"]}
<</SYS>>
{request["user_message"]} [/INST]"""

    # Run the model
    output = model(prompt, max_tokens=request["max_tokens"], echo=True)

    return output
```
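Note that, unlike the API implementation, llama_cpp returns a completion dictionary rather than an HTTP response; to the best of our understanding of llama-cpp-python, the generated text is found under output["choices"][0]["text"], and with echo=True the prompt is included in that text. Callers of send_request may need to be adjusted accordingly.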
Once the send_request method is changed, run the program as described in How to run and deploy.
To calculate Precision, Recall, and F1 for the solution, a script has been made. This script is found in relation_extraction/evaluation/evaluation.py and should be run from the root using the following command:

```
python -m relation_extraction.evaluation.evaluation
```
The script will evaluate the multilingual solution using the WebNLG dataset found in the same folder and export evaluation_results.json. At the bottom of this file, the metrics can be found in an object of the following format:
"result": {
"total_expected_triples": xxx,
"hits": xxx,
"hit_percentage": x.xx
},
"score": {
"precision": x.xx,
"recall": x.xx,
"F1_score": x.xx
}
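These scores presumably follow the standard definitions. A minimal sketch, assuming hits counts correctly extracted triples and that the totals of extracted (produced) and expected (gold) triples are available; all names here are hypothetical:

```python
# Standard precision/recall/F1 definitions; variable names are hypothetical.
def compute_scores(hits, total_extracted, total_expected):
    precision = hits / total_extracted if total_extracted else 0.0
    recall = hits / total_expected if total_expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "F1_score": f1}
```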
Note that when running the evaluation script, the LLMMessenger can be implemented using either the local implementation of Llama or the API implementation.
The following points are elaborated in the 'Future Work' section of the Relation Extraction report.