The KNOX chatbot is an interactive tool that allows users to write prompts regarding the data stored in the KNOX knowledge graph and receive a response in natural language.
The current state of the KNOX chatbot can be observed in the workflow below. All modules in KNOX are connected using the Knowledge Retriever, which works as follows: the user prompt is passed to the spaCy module, which extracts the entities it mentions; the extracted entities are sent to the Database API module, which currently queries Wikidata for the subjects, objects and predicates that pertain to each given entity; the retrieved triples are then passed to the Llama 2 module, together with the original user prompt and, in the future, an initial prompt stored in a file 'Prompt Format.txt'.
Input: The input to the chatbot component of KNOX is a user prompt.
Output: The output of the chatbot component of KNOX is a response in natural language.
A workflow showing how the chatbot component of KNOX works. For a more in-depth description of the workflow, see the report made by Group G 2023.
In order to get started with the KNOX chatbot, you first need to clone the chatbot repository:
git clone https://github.com/Knox-AAU/FunctionalityLayer_Chatbot
Then, open a console in the repository /FunctionalityLayer_Chatbot. To set up the whole project in Docker, run the following command (Make sure to have Docker installed and the Docker engine running):
docker-compose -f docker-compose-dev-nollama.yml up -d --build
This command will build and start the whole project. The docker-compose-dev-nollama file is used when Llama is not locally installed. If Llama2.gguf is available locally and placed inside the /llama
directory of the repository, run the same command on docker-compose-dev.yml instead. (Llama2.gguf is not in the repository by default, as it is too large to store on GitHub.)
With the services running, you should be able to call them, either programmatically or via an API platform such as Postman. Postman collections with example calls can be found in the FunctionalityLayer_Chatbot/PostmanCollections
directory. It is a good idea to set up an environment to store the "access-authorization" header as a secret if calls to the Knox database API are to be made from Postman. You can also define the Postman requests yourself by using the endpoints and bodies described below. See How to run the model as an API? for an example of how to do this. See https://learning.postman.com/docs/sending-requests/requests/ for info about Postman and creating requests in general.
Knowledge Retriever:
http://localhost:5001/knowledge_retriever
Example body:
{
"input_string": "Hvor ligger Aalborg og Berlin?",
"run_llama": true
}
This will call the Knowledge Retriever with the prompt "Hvor ligger Aalborg og Berlin?" ("Where are Aalborg and Berlin located?"). "run_llama" is a property that dictates whether the Knowledge Retriever will call the Llama module or skip it. The Knowledge Retriever is the main module and is responsible for calling the other modules.
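The services can also be called programmatically. As a minimal sketch (assuming the services are running locally as described above), the Knowledge Retriever could be called with Python's requests library like this:
import requests

# Call the locally running Knowledge Retriever with a Danish prompt.
# Set "run_llama" to False to skip the Llama module and only retrieve triples.
response = requests.post(
    "http://localhost:5001/knowledge_retriever",
    json={
        "input_string": "Hvor ligger Aalborg og Berlin?",
        "run_llama": True,
    },
    timeout=300,  # response generation through Llama can take a while
)
print(response.json())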
spaCy:
http://localhost:5003/extract_entities
Example body:
{
"input_string": "Hvor ligger Aalborg og Berlin?"
}
CallApi Module
http://localhost:5002/get_triples
Example body:
{
"keywords": ["Aalborg", "Berlin"]
}
Llama module
http://localhost:5004/llama
Example body:
{
"system_message": "You are a helpful assistant",
"user_message": "Generate a list of 5 funny dog names",
"max_tokens": 100
}
To call the Knowledge Retriever on the server, the endpoint http://knox-func01.srv.aau.dk:5001/knowledge_retriever
can be used with the same body format as shown above. Note that the caller must be connected to the AAU network, either physically or through a VPN, in order to call it.
The purpose of spaCy in the KNOX chatbot is to extract entities from a user prompt, which can then be used as the subject in a query on the KNOX knowledge graphs.
To use spaCy's entity extraction, run the following installation commands:
pip install -U pip setuptools wheel
pip install -U spacy
python -m spacy download da_core_news_md
When the installation is finished, you can import spaCy into your Python files and use the model you have installed (in this case, the Danish medium model da_core_news_md).
The following snippet is an example of how spaCy can be used to extract entities from a given string. It is also the direct implementation of spaCy in the KNOX chatbot; this implementation is found in the function extract_entities() in /spaCy/spaCy.py.
import spacy

# Load the Danish medium model
nlp = spacy.load('da_core_news_md')

def extract_entities(input_string):
    # Run the spaCy pipeline on the input and collect the text of each named entity
    doc = nlp(input_string)
    entities = [ent.text for ent in doc.ents]
    return entities
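As a usage example, calling the function on the prompt from earlier should return the recognised entities (the exact output depends on the model's predictions):
print(extract_entities("Hvor ligger Aalborg og Berlin?"))
# Expected output (model-dependent): ['Aalborg', 'Berlin']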
spaCy was the best tool we could find for Natural Language Understanding (NLU). Its entity extraction works by identifying words in a given string that correspond to a named entity in the spaCy model.
"A named entity is a “real-world object” that is assigned a name – for example, a person, a country, a product or a book title. spaCy can recognize various types of named entities in a document, by asking the model for a prediction." - spaCy.io
Entity extraction is only a small part of the functionality behind spaCy, which covers features across a broad spectrum of Natural Language Processing (NLP) concepts.
The purpose of setting up an API connection to the knowledge base is to query the knowledge base and retrieve a knowledge graph. This knowledge graph is used as part of a prompt for the LLM, so the chatbot can answer based on the data in KNOX.
The 'API_connections' folder in the KNOX chatbot repository contains the code for connecting to two different knowledge bases. Within this folder there are two Python files: one is 'database_API' and the other is 'wikidata_API'.
The 'database_API' file is used to connect to the KNOX database. This file is currently unfinished, as it was decided to use WikiData as a placeholder for the KNOX database instead.
The 'wikidata_API' file is used to connect to the WikiData API, in order to query WikiData's knowledge graphs for the desired information.
Some of the functions in 'database_API' may not currently work, because of changes made to the KNOX database endpoint.
get_triples()
get_triples() is called using a POST request with a JSON object keywords as input, which contains a list of keywords. The function iterates through each keyword and generates two queries per keyword using get_endpoint_url(). It then calls the KNOX endpoint twice for each keyword, using the keyword first as a subject and then as an object, so that all data where the keyword appears as either a subject or an object is retrieved. A JSON object containing the triples is returned.
get_endpoint_url()
get_endpoint_url() is a function used to generate the queries that get_triples() needs. It takes a keyword string and the type of query to generate as input; the type can be either "object" or "subject". The query is built by taking the KNOX URL and appending &s= or &o= to it, depending on whether we are querying for a subject or an object.
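As an illustrative sketch only (the real KNOX endpoint URL is configured in the repository and is not reproduced here), the helper could look roughly like this:
# Hypothetical base URL; the actual KNOX Database API URL is defined in the repository.
KNOX_QUERY_URL = "http://<knox-database-api>/query?format=json"

def get_endpoint_url(keyword, query_type):
    # Append the keyword as either the subject (&s=) or the object (&o=) of the query
    parameter = "&s=" if query_type == "subject" else "&o="
    return KNOX_QUERY_URL + parameter + keyword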
All functions in 'wikidata_API' should work, unless WikiData has changed its SPARQL endpoint.
get_triples()
get_triples() works similarly to the one found in 'database_API'. It also takes a JSON object keywords as input and generates queries for each keyword. In order to create a query, the function calls get_wikidata_id(), which retrieves the WikiData ID for the keyword; this ID is necessary for generating the WikiData query. The WikiData query is then built, querying for subject, object and predicate. The function then calls get_triples_from_wikidata() with the query as input, and the result is returned as a JSON object containing triples.
get_wikidata_id()
get_wikidata_id() is a function used to retrieve a keyword's WikiData ID. A keyword (or entity) is given as input. It uses the WikiData endpoint https://www.wikidata.org/w/api.php to get all results in WikiData that have the entity as their 'title'. The function then returns the first ID in the list.
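A minimal sketch of this lookup using the wbsearchentities action of the WikiData API is shown below; the exact action and parameters used in the repository may differ:
import requests

def get_wikidata_id(entity):
    # Search WikiData for items matching the keyword and return the ID of the first result
    response = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": entity,
            "language": "en",
            "format": "json",
        },
        timeout=10,
    )
    results = response.json().get("search", [])
    return results[0]["id"] if results else None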
get_triples_from_wikidata()
get_triples_from_wikidata() is used to call the WikiData SPARQL endpoint with a query as input. It uses the SPARQLWrapper Python library to connect to the WikiData SPARQL endpoint at https://query.wikidata.org/sparql. It then queries the endpoint and returns a JSON object containing triples. The function also calls format_triple_object(), which formats the triples so they match the desired input to the LLM, and these formatted triples are returned.
format_triple_object()
format_triple_object() takes a specific triple JSON object from WikiData as input and formats it. The JSON is formatted to contain triples with the keys s, p and o. The formatted triple is then returned.
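The following is a simplified sketch of this flow using SPARQLWrapper. The SPARQL variable names (subject, predicate, object) follow the description above, but the repository's query construction and formatting code may differ in detail:
from SPARQLWrapper import SPARQLWrapper, JSON

WIKIDATA_SPARQL_URL = "https://query.wikidata.org/sparql"

def get_triples_from_wikidata(query):
    # Connect to the public WikiData SPARQL endpoint and request JSON results.
    # WikiData asks for a descriptive User-Agent, passed here via the agent parameter.
    sparql = SPARQLWrapper(WIKIDATA_SPARQL_URL, agent="KNOX-chatbot-example/0.1")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # Format every returned binding into the triple structure the LLM prompt expects
    bindings = results["results"]["bindings"]
    return [format_triple_object(binding) for binding in bindings]

def format_triple_object(binding):
    # Keep only the plain values and rename them to the keys s, p and o
    return {
        "s": binding["subject"]["value"],
        "p": binding["predicate"]["value"],
        "o": binding["object"]["value"],
    }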
The purpose of the Llama 2 LLM in the KNOX chatbot is response generation.
Llama 2 comes in multiple sizes: 7B, 13B, and 70B, where B stands for billions of parameters. The larger the model, the better the performance, but the greater the resources needed.
Besides size, the models also come in different types. Large Language Models (LLMs) usually require high-end GPUs to run, which the standard Meta Llama does. However, the community has made llama.cpp, which allows Llama models to run solely on your CPU, greatly improving performance if you lack GPU resources. llama.cpp applies a custom quantization approach to compress the models into the GGUF format, which reduces their size and the resources needed.
The model that has been used in this project is Llama2.gguf.
Because the whole project is written in Python, the Llama handling should also be in Python. However, llama.cpp itself is written in C++, so we need a Pythonic way to run the Llama2.gguf model. This has also been addressed by the community in a project called llama-cpp-python, which can be installed with:
pip install llama-cpp-python
The Llama model in this project was deployed as a docker container, so you will need Docker running on your system. For this you can use Docker Desktop.
llama-cpp-python helps us create a fast, local model endpoint using Flask. The model is invoked in a parameterized way, so that the maximum number of tokens, the system message, and the user message can be selected dynamically for each request.
First step is to install Flask to create a server:
pip install Flask
A serving endpoint can now be created that accepts POST requests on http://localhost:5004/llama and expects a JSON input with the max_tokens, system_message, and user_message properties specified.
Here is how this is done:
from flask import Flask, request, jsonify
from llama_cpp import Llama

# Create a Flask object
app = Flask(__name__)
model = None


@app.route('/llama', methods=['POST'])
def generate_response():
    global model
    try:
        data = request.get_json()

        # Check if the required fields are present in the JSON data
        if 'system_message' in data and 'user_message' in data and 'max_tokens' in data:
            system_message = data['system_message']
            user_message = data['user_message']
            max_tokens = int(data['max_tokens'])

            # Prompt creation
            prompt = f"""<s>[INST] <<SYS>>
{system_message}
<</SYS>>
{user_message} [/INST]"""

            # Create the model if it was not previously created
            if model is None:
                # Put the location of the GGUF model that you've downloaded from HuggingFace here
                model_path = "Llama2.gguf"
                # Create the model
                model = Llama(model_path=model_path)

            # Run the model
            output = model(prompt, max_tokens=max_tokens, echo=True)

            return jsonify(output)
        else:
            return jsonify({"error": "Missing required parameters"}), 400
    except Exception as e:
        return jsonify({"Error": str(e)}), 500


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5004, debug=True)
This file is saved in the same directory as Llama2.gguf with the name llama_cpu_server.py.
To test locally, run:
python llama_cpu_server.py
This starts a local server on port 5004. Now you can use Postman and create a POST request to http://localhost:5004/llama. Set the "Body" to raw and enter this:
{
"system_message": "You are a helpful assistant",
"user_message": "Generate a list of 5 funny dog names",
"max_tokens": 50
}
When you hit "Send" the output should look something like this:
{
    "id": "cmpl-078b3c61-8ced-4d7e-8fb1-688083f97a89",
    "object": "text_completion",
    "created": 1698571747,
    "model": "D:\\models\\llama-2-7b-chat.Q2_K.gguf",
    "choices": [
        {
            "text": "<s>[INST] <<SYS>>\nYou are a helpful assistant\n<</SYS>>\nGenerate a list of 5 funny dog names [/INST] Of course! Here are five funny dog names that might bring a smile to your face:\n\n1. Barky McSnugglepants - This name is perfect for a fluffy, snuggly dog who loves to get cozy with their human family.\n2. Puddles McSquishy - This name is great for a dog that's always getting into sticky situations, like muddy paws and accidental kiss",
            "index": 0,
            "logprobs": "None",
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 38,
        "completion_tokens": 100,
        "total_tokens": 138
    }
}
The Llama file was too large to store in the GitHub repository, so another solution had to be utilized instead of having GitHub Actions build and push the Llama image. Therefore, it was decided to build and push it to the Docker Hub manually. The Docker image was built for both ARM and AMD processors, to ensure that multiple types of machines would be able to run the image.
A Dockerfile was created that contains the model and the server logic:
# Use python as base image
FROM python
# Set the working directory in the container
WORKDIR /llama
# Copy the current directory contents into the container at /llama
COPY ./llama_cpu_server.py /llama/llama_cpu_server.py
COPY ./Llama2.gguf* /llama/Llama2.gguf
# Install the needed packages
RUN pip install llama-cpp-python
RUN pip install Flask
# Expose port 5004 outside of the container
EXPOSE 5004
# Run llama_cpu_server.py when the container launches
CMD ["python", "llama_cpu_server.py"]
This Dockerfile is saved in the same folder as llama_cpu_server.py.
Afterwards, the Docker image can be built from the root folder of the project with:
docker-compose -f docker-compose-dev.yml build
Once built you have to tag the build with:
docker tag functionalitylayer_chatbot-llama:latest eshes/knox-group-g:master_llama
Then you have to login to Docker:
docker login
Now that you have logged in you can push to the Docker Hub repository with:
docker buildx build --platform linux/amd64,linux/arm64 -t eshes/knox-group-g:master_llama --push .
The build does take a while, so be patient.
Once it is done, Watchtower will pull the image from the repository once every hour; see Making changes to the project.
Once the image is on the server and running, you can use Postman to create a POST request for the Llama API.
Connect to the server:
ssh <Student Initials>@student.aau.dk@knox-func01.srv.aau.dk -L 8000:localhost:5001
Send the POST request to:
http://knox-func01.srv.aau.dk:5004/llama
Add a header with Content-Type in the Key field and application/json in the Value field, and use the following body:
{
"system_message": "You are a helpful assistant",
"user_message": "Generate a list of 5 funny dog names",
"max_tokens": 50
}
Right now, the model is not production-ready code. Flask is running in debug mode, and not all model parameters are exposed. The model does not utilize the full chat capabilities, because no user session is implemented, so previous context is lost with every new request. Additionally, this solution is not scalable in its current form, and multiple requests can break the server.
The purpose of the initial prompt is to tell the LLM how it should process and interpret the data and the user input.
A .txt file named 'Prompt Format' is stored in Docker. This file stores the initial prompt to the LLM, which contains a description of how the LLM should respond, as well as placeholders for the user input (the question) and for the RDF data from KNOX.
The full 'Prompt Format' .txt file can be seen below:
##Context:
You are a virtual assistant.
The data is all your knowledge.
Do not answer if it can't be found in the data. If so, respond in one sentence "I can't answer that question".
Do not recognize the data.
Do not use bullet points
##Prompt format:
data:
{DATA_PLACEHOLDER}
question:
{QUESTION_PLACEHOLDER}
This prompt is currently designed specifically for a 13B Llama model, i.e. a model with 13 billion parameters used to interpret the prompt and generate a response. The model that ended up being used was Llama 7B, which can take at most 16384 tokens as input.
Hugging Face contains online 'spaces' for using Llama, which were used to test the initial prompt.
The "Prompt Format.txt" file is found in /Llama
. Changing the contents of this file does not change the prompt given to Llama, as this feature was never fully implemented. The prompt contains {DATA_PLACEHOLDER}
and {QUESTION_PLACEHOLDER}
. They are supposed to act as placeholders specifying the placement of important information.
{DATA_PLACEHOLDER}
defines the placement of the data recieved from the KNOX API
{QUESTION_PLACEHOLDER}
defines the placement of the user input (the question recieved from the frontent)
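Had the feature been fully implemented, substituting the placeholders could look like the following sketch (the function name and file path are hypothetical):
def build_initial_prompt(rdf_data, question, template_path="Llama/Prompt Format.txt"):
    # Read the initial prompt template stored in 'Prompt Format.txt'
    with open(template_path, encoding="utf-8") as template_file:
        template = template_file.read()

    # Insert the RDF data from KNOX and the user's question at their placeholders
    return (template
            .replace("{DATA_PLACEHOLDER}", rdf_data)
            .replace("{QUESTION_PLACEHOLDER}", question))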
The project on the server runs on the docker-compose-prod.yml file, which gets the images of the project modules from the Docker Hub.
When changes are made to the main branch of the GitHub repository, GitHub Actions automatically builds new images with the changes and pushes them to the Docker Hub. Since Watchtower is running on the server, it detects changes to the images of the running containers once every hour and rebuilds the containers on the server with the latest changes pushed to the GitHub repository. This works for all modules except the Llama module: the Llama image has a size of 3 GB, which is too large to store in a regular GitHub repository. Therefore, it has to be pushed manually to the Docker Hub when local changes are made to the module, as shown in the Llama 2 LLM section.
To connect to the server, use the following command:
ssh <Student Initials>@student.aau.dk@knox-func01.srv.aau.dk -L 8000:localhost:5001
The port forwarding configuration is not required unless you want to call the Knowledge Retriever on the server using localhost while connected. By default, you are placed in the directory of your student ID when connecting to the server: /home/student.aau.dk/<StudentId>
To use Docker on the server, one must be part of the "Docker" group, or have sudo access. To add someone to the Docker group, use the following command:
sudo usermod -aG docker <StudentId>@student.aau.dk
If changes are made to the docker-compose-prod file, it should be uploaded to the server again. The file is placed at /srv/data on the knox-func01.srv.aau.dk server.
The new docker-compose-prod.yml file can be uploaded to the server from your local computer, using SFTP. However, first you need to have write access to the folder where the docker-compose-prod file is placed. For that, a group has been created on the server. To add a user to that group, use the following command:
sudo usermod -a -G edit-data <StudentId>@student.aau.dk
To replace the old docker-compose-prod.yml file with the new one, the following commands can be used. See the link for an explanation of the commands.
Open a terminal window and navigate to the FunctionalityLayer_Chatbot repository on your local machine.
Connect to the server with SFTP in the same terminal window.
sftp <Student Initials>@student.aau.dk@knox-func01.srv.aau.dk
Navigate to the location of the docker-compose-prod.yml file on the server
cd /srv/data
Put the new version of the docker-compose-prod.yml file on the server from your PC
put docker-compose-prod.yml
Connect to the server with ssh
ssh <Student Initials>@student.aau.dk@knox-func01.srv.aau.dk
Navigate to the docker-compose file location
cd /srv/data
Rebuild the project
docker-compose -f docker-compose-prod.yml up -d --build
The project should now be up and running with the new changes.
To continue working on the project, you will have to replace the current integrated Docker Hub repository eshes/knox-group-g with your own repository.
Create a repository on Docker Hub.
You then have to replace every reference to the old Docker Hub repository inside the workflow files on the GitHub repository with the new Docker Hub repository that you have created for the group.
To do this, you need to access the two workflow files .github/workflows/docker-pr.yml and .github/workflows/docker-push-main-pylint.yml. Inside these files, you need to find all instances where the eshes/knox-group-g Docker Hub repository is referenced and replace it with your own Docker Hub repository name, as seen in the example below:
tags: eshes/knox-group-g:${{ steps.extract_branch.outputs.branch }}_KnowledgeRetriever
becomes
tags: {username}/{docker hub repo name}:${{ steps.extract_branch.outputs.branch }}_KnowledgeRetriever
with:
  username: ${{ secrets.DOCKERHUB_USERNAME }}
  password: ${{ secrets.DOCKERHUB_TOKEN }}
  DISCORD_WEBHOOK: ${{ secrets.DISCORD_WEBHOOK }}
To replace the current DOCKERHUB_USERNAME and DOCKERHUB_TOKEN, go into the GitHub repository settings for Actions and hit the Edit button on each of them, then enter the required information: in this case, the username of the account that has push access and a PAT (Personal Access Token) for that account, which can be created on the Docker Hub page.
You could place the real keys directly in the files; however, it is safer to reference them through the secrets in the settings, which are not accessible to the public.
spacy:
  # Get the Docker image for the app service using the Dockerfile in the ./spaCy directory.
  image: eshes/knox-group-g:Master_spaCy # Change this to refer to the new repository
  container_name: entity_extraction_prod
  # Map port 5003 in the container to port 5003 on the host machine.
  ports:
    - "5003:5003"
  networks:
    - chatbotnetwork
  hostname: spacy-container
If you wish to add a new service to the solution, there are a few things that require changing. First of all, a Dockerfile for the service has to be created, so it can run in an isolated environment and be part of the solution.
Secondly, the service has to be added to the docker-compose files. The added service will likely have the same structure as the other services in the docker-compose files. When the service has been added, the docker-compose-prod file has to be uploaded to the server as described above.
Finally, the GitHub Actions must also be updated to build and push the service to the Docker Hub as a Docker image. The GitHub Actions workflows can be found in the .github folder of the GitHub repository. Here, the docker-pr.yml and docker-push-main-pylint.yml files have to be updated. The docker-pr.yml workflow runs whenever a pull request to the main branch is made. It runs tests, then builds and pushes the Docker images to the Docker Hub in order to test that the images can be built without issues (these images do not trigger the project being updated on the server). The docker-push-main-pylint.yml workflow is triggered when the main branch is pushed to. It pushes all the images to the Docker Hub with the tag Master_[ServiceName]. An example of the code to push one of the services is displayed below. It is important to push for both the amd64 and arm64 architectures, as different machines are only able to run Docker images built for specific architectures.
- name: Push Api connections image
  uses: docker/build-push-action@v5
  with:
    context: API_Connections/
    push: true
    tags: eshes/knox-group-g:Master_ApiConnect
    platforms: linux/amd64,linux/arm64
When this is done, the new service has been properly added to the server and the CI/CD pipeline.
The Llama module is unstable in its current state and is unable to handle multiple requests, in which case it crashes and has to be started manually again. To restart Llama, you must be a member of the "Docker" group on the Linux server (see the Joining the Docker group section) or have sudo access. Then, use the following command to start it again:
docker start llama_prod
The biggest problem of the KNOX chatbot is the efficiency and quality of the responses it generates. There are a couple of options for improving this:
Currently, the chatbot uses Wikidata as its main source of information, outside of the data Llama was trained on. However, this is not the purpose of the KNOX project; the chatbot should be connected to the KNOX knowledge base and use the relevant data from that source. The problem is that the KNOX knowledge base does not currently hold enough relevant information to support arbitrary user prompts through the chatbot. Fortunately, an endpoint for quickly connecting the chatbot to the KNOX knowledge base already exists for when enough data is available. See the section API to knowledge base on how to set up the chatbot with the KNOX knowledge base. For more information regarding future work, see the report by Group G 2023.