An API built by groups C (Relation Extraction) and D (Concept Linking) | KNOX 2023
PreprocessingLayer TripleConstruction is responsible for creating triples that can be utilised by group E (Database API) to construct a knowledge graph.
The triples will be data stored in the form of a subject (entity IRI), predicate (DBpedia IRI), and object (entity IRI), where the subject has some relation to the object.
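To make the shape of a triple concrete, here is a minimal Python sketch. The `Triple` class and the predicate IRI `http://dbpedia.org/ontology/spouse` are illustrative assumptions and are not taken from the TripleConstruction code; the entity IRIs are borrowed from the example request body on this page.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one subject-predicate-object statement."""
    subject: str    # entity IRI
    predicate: str  # DBpedia IRI describing the relation
    object: str     # entity IRI


# Illustrative values only: the predicate is an assumed DBpedia ontology IRI.
triple = Triple(
    subject="knox-kb01.srv.aau.dk/Barack_Obama",
    predicate="http://dbpedia.org/ontology/spouse",
    object="knox-kb01.srv.aau.dk/Michelle_Obama",
)

# Print the triple in a simple angle-bracket notation.
print(" ".join(f"<{part}>" for part in triple))
```

A named tuple keeps the three positions readable while still unpacking like a plain tuple.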
The code can be found in the TripleConstruction repository, which was created by groups C (Relation Extraction) and D (Concept Linking).
The service is running on:
http://knox-preproc01.srv.aau.dk:4444/tripleconstruction
but should only be accessed through the Access-API:
knox-proxy01.srv.aau.dk/tripleconstruction-api/tripleconstruction
Note that making a POST request to this endpoint requires both the Access-Authorization and the Authorization headers.
Key | Value |
---|---|
Access-Authorization | The value of the key ACCESS_SECRET |
Authorization | The value of the key API_SECRET |
Learn more about .env secrets in KNOX.
The body of the request must be valid JSON and match the following format (this is an example of the output from Entity extraction):
[
  {
    "language": "en",
    "metadataId": "790261e8-b8ec-4801-9cbd-00263bcc666d",
    "sentences": [
      {
        "sentence": "Barack Obama was married to Michelle Obama two days ago.",
        "sentenceStartIndex": 20,
        "sentenceEndIndex": 62,
        "entityMentions": [
          { "name": "Barack Obama", "type": "Entity", "label": "PERSON", "startIndex": 0, "endIndex": 12, "iri": "knox-kb01.srv.aau.dk/Barack_Obama" },
          { "name": "Michelle Obama", "type": "Entity", "label": "PERSON", "startIndex": 59, "endIndex": 73, "iri": "knox-kb01.srv.aau.dk/Michelle_Obama" },
          { "name": "two days ago", "type": "Literal", "label": "DATE", "startIndex": 74, "endIndex": 86, "iri": null }
        ]
      }
    ]
  }
]
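The example body above can be read with a few lines of Python. The sketch below is not how the Relation Extraction or Concept Linking components consume the data (that code is not shown here); it only illustrates the nesting of documents, sentences, and entity mentions. The helper `candidate_pairs` is hypothetical, and the payload is a trimmed copy of the example.

```python
import json
from itertools import combinations

# Trimmed copy of the example request body shown above.
payload = json.loads("""
[
  {
    "language": "en",
    "metadataId": "790261e8-b8ec-4801-9cbd-00263bcc666d",
    "sentences": [
      {
        "sentence": "Barack Obama was married to Michelle Obama two days ago.",
        "entityMentions": [
          {"name": "Barack Obama", "type": "Entity", "iri": "knox-kb01.srv.aau.dk/Barack_Obama"},
          {"name": "Michelle Obama", "type": "Entity", "iri": "knox-kb01.srv.aau.dk/Michelle_Obama"},
          {"name": "two days ago", "type": "Literal", "iri": null}
        ]
      }
    ]
  }
]
""")


def candidate_pairs(documents):
    """Yield (sentence, subject IRI, object IRI) for every pair of linked entities.

    Hypothetical helper: mentions without an IRI (such as literals) are skipped.
    """
    for document in documents:
        for sentence in document["sentences"]:
            iris = [m["iri"] for m in sentence["entityMentions"] if m["iri"]]
            for subject_iri, object_iri in combinations(iris, 2):
                yield sentence["sentence"], subject_iri, object_iri


pairs = list(candidate_pairs(payload))
for _, subject_iri, object_iri in pairs:
    print(subject_iri, "->", object_iri)
```

Only the two mentions with an IRI form a pair here; the literal "two days ago" has `"iri": null` and is left out.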
Code | Description | Result |
---|---|---|
200 | The POST request was correctly formatted and has been received by the server. | Relation Extraction and Concept Linking have run and completed without errors. |
401 | Unauthorised request. | Nothing is executed. |
404 | Invalid endpoint. | Nothing is executed. |
422 | The POST request was incorrectly formatted, so the server could not parse the data. | Nothing is executed. |
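Putting the headers, body, and status codes together, a client call could be prepared roughly as follows. This is a sketch under stated assumptions: the `http://` scheme is assumed because the Access-API address is given without one, the `Content-Type` header is an addition that the tables above do not require, and `build_request` and `describe_status` are hypothetical helper names. The request is only built, not sent.

```python
import json
import urllib.request

# Assumption: the Access-API is reachable over plain HTTP; the page gives no scheme.
ENDPOINT = "http://knox-proxy01.srv.aau.dk/tripleconstruction-api/tripleconstruction"

# Descriptions paraphrased from the status code table above.
STATUS_DESCRIPTIONS = {
    200: "Request received; Relation Extraction and Concept Linking completed.",
    401: "Unauthorised request.",
    404: "Invalid endpoint.",
    422: "Request was incorrectly formatted and could not be parsed.",
}


def build_request(payload, access_secret, api_secret, url=ENDPOINT):
    """Build (but do not send) a POST request carrying both required headers."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Access-Authorization": access_secret,
            "Authorization": api_secret,
            "Content-Type": "application/json",  # assumption, not required by the table
        },
        method="POST",
    )


def describe_status(code):
    """Translate a response status code into a description from the table."""
    return STATUS_DESCRIPTIONS.get(code, f"Unexpected status code {code}")


request = build_request([], access_secret="***", api_secret="***")
print(request.get_method(), request.full_url)
print(describe_status(422))
```

To actually send it, pass the object to `urllib.request.urlopen(request)`; responses with a 4xx status are raised as `urllib.error.HTTPError`, whose `code` attribute can be fed to `describe_status`.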
Clone the TripleConstruction GitHub repository.
git clone https://github.com/Knox-AAU/PreprocessingLayer_TripleConstruction.git
Make sure to create a .env file with the correct environment variables.
Learn more about .env secrets in KNOX.
To run the solution, a .env file must be created in the root folder and contain the following keys with their corresponding values.
API_SECRET=***
ACCESS_SECRET=***
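A quick pre-flight check can confirm that both keys are present before the server is started. The sketch below is not part of the repository; `parse_env` and `missing_keys` are hypothetical helpers, and the parser only handles simple `KEY=VALUE` lines.

```python
REQUIRED_KEYS = ("API_SECRET", "ACCESS_SECRET")


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Example content with one key deliberately left out.
example = "API_SECRET=***\n# ACCESS_SECRET is intentionally left out\n"
print(missing_keys(parse_env(example)))  # prints ['ACCESS_SECRET']
```

In practice the text would come from reading the .env file in the root folder instead of the inline `example` string.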
If you wish to run the solution locally (not through Docker), first install Python (version 3.11). Then run the following commands to install the required libraries and start the server.
pip install -r requirements.txt
python -m server.server
Docker should be installed if you want to run the solution in a container (download Docker desktop).
Run the Docker container on your local machine using this command (changes to the files are only picked up when the command includes the --build flag):
docker-compose up --build
Access the knox-preproc01.srv.aau.dk server (the TripleConstruction API is running on port 4444):
Remember to replace <your-aau-mail@student.aau.dk> and <your_port>.
ssh <your-aau-mail@student.aau.dk>@knox-preproc01.srv.aau.dk -L <your_port>:localhost:4444
Deployment is normally handled by Watchtower on push to main. If manual deployment is needed, run the following command:
Remember to replace <your_port> and the values of API_SECRET and ACCESS_SECRET.
docker run --name tc_api -p 0.0.0.0:4444:<your_port> --add-host=host.docker.internal:host-gateway -e API_SECRET=*** -e ACCESS_SECRET=*** -d ghcr.io/knox-aau/preprocessinglayer_tripleconstruction:main
The Python testing framework unittest has been utilised to test the API. The testing framework discovers all directories that follow the naming convention test_, and all Python files beginning with test_ inside those directories will be run.
Run the tests using this command:
python -m unittest
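For orientation, a test file following this convention might look like the sketch below. The helper `is_authorized` and the test names are hypothetical and do not come from the repository; in a real project the helper would be imported from the code under test, and the file would be saved with a name starting with test_ inside a test_ directory.

```python
import unittest


def is_authorized(header_value, secret):
    """Hypothetical helper mirroring the kind of check an API test might cover."""
    return secret is not None and header_value == secret


class TestIsAuthorized(unittest.TestCase):
    def test_matching_secret_is_accepted(self):
        self.assertTrue(is_authorized("s3cret", "s3cret"))

    def test_wrong_or_missing_header_is_rejected(self):
        self.assertFalse(is_authorized("wrong", "s3cret"))
        self.assertFalse(is_authorized(None, "s3cret"))


# These two lines only exist so the sketch runs standalone;
# `python -m unittest` would discover and run the test case on its own.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestIsAuthorized)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each method whose name starts with `test` is collected as a separate test, so a failure in one does not hide the outcome of the other.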
If you wish to run the tests in a local Docker container, first make sure the container is running. Then open a shell inside the container and run the tests using the following commands:
docker exec -ti server-container /bin/bash
python -m unittest
Read more about how tests are continuously run with GitHub Actions.
The TripleConstruction API is built using Flask.
Change the code of server.py to expand upon the API.
from flask import Flask, request, jsonify
import json
import os
from relation_extraction.relation_extractor import RelationExtractor
from concept_linking.main import entity_type_classification

app = Flask(__name__)

@app.route('/tripleconstruction', methods=["POST"])
def do_triple_construction():
    print("Received POST request...")

    authorization_header = request.headers.get("Authorization")
    if authorization_header != os.getenv("API_SECRET"):
        message = "Unauthorized"
        return jsonify(error=f"Error occurred! {message}"), 401

    try:
        post_data = request.get_data().decode('utf-8')
        post_json = json.loads(post_data)

        RelationExtractor.begin_extraction(post_json) # Relation Extraction
        entity_type_classification(post_json) # Concept Linking

        message = "Post request was successfully processed. Relation Extraction and Concept Linking completed."
        return jsonify(message=message), 200

    except Exception as e:
        return jsonify(error=f"Error occured: {str(e)}"), 422

@app.errorhandler(404)
def page_not_found(error):
    message = "Invalid endpoint"
    return jsonify(error=message), 404

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=4444)
The TripleConstruction API contains a single POST endpoint, and the implementation for this endpoint can be changed in lines 10-30.
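Before changing the endpoint, it can help to see its control flow in isolation. The following framework-free sketch mirrors the order of checks in server.py (authorization first, then parsing and processing) without depending on Flask. The function `handle_post` and its `process` parameter are hypothetical stand-ins for the real handler and for the Relation Extraction and Concept Linking calls.

```python
import json


def handle_post(headers, body, api_secret, process=lambda data: None):
    """Return (response_dict, status_code) following the endpoint's control flow.

    Framework-free sketch: `process` is a hypothetical stand-in for the
    Relation Extraction and Concept Linking calls.
    """
    # Reject the request when the Authorization header does not match the secret.
    if headers.get("Authorization") != api_secret:
        return {"error": "Error occurred! Unauthorized"}, 401
    try:
        data = json.loads(body.decode("utf-8"))
        process(data)
    except Exception as exc:  # any failure is reported as 422, as in server.py
        return {"error": f"Error occurred: {exc}"}, 422
    return {"message": "Post request was successfully processed."}, 200


print(handle_post({"Authorization": "secret"}, b"[]", "secret"))
print(handle_post({}, b"[]", "secret"))
print(handle_post({"Authorization": "secret"}, b"not json", "secret"))
```

Keeping the decision logic in a plain function like this makes it testable with unittest without starting the server; a new processing step would be added inside the `try` block so that its failures also map to 422.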
Every time something is pushed to main, a new Docker image will be built and deployed (Continuous deployment).
name: build-and-deploy-docker-image
on:
  push:
    branches: ["main"]
env:
  # Use docker.io for Docker Hub if empty
  REGISTRY: ghcr.io
  # github.repository as <account>/<repo>
  IMAGE_NAME: ${{ github.repository }}
jobs:
  docker_build_and_deploy_image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v3
      - name: Log into registry ${{ env.REGISTRY }}
        uses: docker/login-action@28218f9b04b4f3f62068d7b6ce6ca5b26e35336c
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@98669ae865ea3cffbcbaa878cf57c20bbf1c6c38
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Build and push Docker image
        uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
        with:
          context: ./
          file: ./Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
This workflow is responsible for creating and deploying the new Docker image. The workflow runs only on push to the main branch.
The workflow for testing (Continuous integration) is defined as follows:
name: test
on:
  push:
    branches: ["**"]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3
      - name: Setup python
        uses: actions/setup-python@v3
        with:
          python-version: 3.11.0
      - name: Install dependencies
        run: |
          echo "Installing dependencies"
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run tests
        run: |
          echo "Testing..."
          python -m unittest -b || exit 1
This workflow runs the tests discovered in the /test directory. The workflow runs on a push to any branch in the repository.
Group C (Relation Extraction)
Group D (Concept Linking)