Update Dockerfile to Ubuntu 22.04 + integrate qlever script #1439

Open. Wants to merge 3 commits into base: master

hannahbast
Member

The old Dockerfile called `ServerMain` directly using a small selection
of environment variables with outdated names. It was also outdated in
other respects.

The new Dockerfile installs the `qlever` script, so that it can be
called from inside the container.

Remaining questions / TODOs, feedback welcome:

1. Right now, the script is installed as part of the docker build via
   `pipx install qlever`. Is this the right way to do it?
   Alternatives would be to clone the GitHub repo and `pipx install -e
   .` from there, or include the GitHub repo as a submodule of this
   repository.

2. How do we handle the QLever UI? We could just call `qlever ui` from
   inside the container. But that would pull the Docker image for the
   QLever UI and run a Docker container inside of a Docker container.
   It's possible, but not the right way to do it. If both are needed,
   the container for the QLever engine and the container for the QLever
   UI should run side by side.

3. The `qlever setup-config` command should have options that overwrite
   the variables in the produced Qleverfile. In particular, there should
   be an option for setting `SYSTEM = native`. Otherwise it has to be
   stated explicitly for each command where that is relevant (in
   particular: `qlever index`, `qlever start`, `qlever
   example-queries`).
@hannahbast
Member Author

@ludovicm67 Can you try this and let me know your feedback? It addresses some of the issues that we have discussed. In particular, you can now use the qlever script from inside the container (with autocompletion). Or you can run the container directly with a sequence of qlever commands.

If I understood you correctly, you would also like to run the QLever UI. This would also work here, but running a Docker container (for the QLever UI) inside of a Docker container is generally not a good idea. I don't have a solution yet, but here are some ideas:

  1. Run the Docker container with -v /var/run/docker.sock:/var/run/docker.sock so that Docker commands inside of the container are using the Docker daemon outside of the container. I am not sure how portable this is and whether there are security concerns.

  2. Use docker-compose to start two containers, one for the QLever backend and one for the QLever UI. What I don't like about this approach is that it adds complexity that is not needed for users that want just the backend and not the UI. Then again, for other (but not all) triple stores, the UI is often considered an integral part of the software.

  3. Have a Python package also for the QLever UI. Then it could be installed with pipx install qlever-ui, analogously to the pipx install qlever for the qlever script. It does make sense because the QLever UI is a Python application. I have to think about that a little more (and play around with it and see how it works).

codecov bot commented Aug 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.33%. Comparing base (0b9d26f) to head (fabb137).
Report is 4 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #1439       +/-   ##
===========================================
- Coverage   89.66%   58.33%   -31.33%     
===========================================
  Files         345      568      +223     
  Lines       29947    65968    +36021     
  Branches     3306     8727     +5421     
===========================================
+ Hits        26851    38485    +11634     
- Misses       1954    24839    +22885     
- Partials     1142     2644     +1502     

@ludovicm67

@hannahbast Thanks for the updates!

Remaining questions / TODOs, feedback welcome:

  1. Right now, the script is installed as part of the docker build via pipx install qlever. Is this the right way to do it? Alternatives would be to clone the GitHub repo and pipx install -e . from there, or include the GitHub repo as a submodule of this repository.

I think it's great like this.
I would just recommend pinning a specific version of the qlever package, so that the build is reproducible and does not break when a new version is released.

  2. How do we handle the QLever UI? We could just call qlever ui from inside the container. But that would pull the Docker image for the QLever UI and run a Docker container inside of a Docker container. It's possible, but not the right way to do it. If both are needed, the container for the QLever engine and the container for the QLever UI should run side by side.

I think it's fine to run the QLever UI in a separate container.
That way, the user can easily decide whether or not to use the UI.
The user can also run the UI on a different machine than the engine, or one instance of the UI for multiple server instances, which can be useful in some cases.

  3. The qlever setup-config command should have options that overwrite the variables in the produced Qleverfile. In particular, there should be an option for setting SYSTEM = native. Otherwise it has to be stated explicitly for each command where that is relevant (in particular: qlever index, qlever start, qlever example-queries).

Yes, I'm currently experimenting with environment variables and a simple shell script that generates the Qleverfile from them.
I will provide some feedback on this soon.
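
A minimal sketch of what such a generator script could look like, assuming an INI-style Qleverfile. Only SYSTEM = native comes from this thread; the section names, the other keys, and the QLEVER_* variable names are made up for illustration and are not the actual Qleverfile schema.

```shell
#!/bin/sh
# Sketch: write a small Qleverfile from environment variables.
# Only "SYSTEM = native" is taken from the discussion; the section names,
# the other keys, and the QLEVER_* variable names are hypothetical.
generate_qleverfile() {
  out="${1:-Qleverfile}"
  cat > "$out" <<EOF
[data]
NAME = ${QLEVER_NAME:-example}

[server]
PORT = ${QLEVER_PORT:-7001}

[runtime]
SYSTEM = ${QLEVER_SYSTEM:-native}
EOF
}
```

Calling `generate_qleverfile ./Qleverfile` at container start would then write the file with whatever values the environment provides, falling back to the defaults otherwise.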

@ludovicm67
ludovicm67 commented Aug 12, 2024

@hannahbast

@ludovicm67 Can you try this and let me know your feedback? It addresses some of the issues that we have discussed. In particular, you can now use the qlever script from inside the container (with autocompletion). Or you can run the container directly with a sequence of qlever commands.

Yes, I will give it a try soon and let you know my feedback.

If I understood you correctly, you would also like to run the QLever UI. This would also work here, but running a Docker container (for the QLever UI) inside of a Docker container is generally not a good idea.

Fully agree with you. It is better to have the QLever UI as a separate container.
Running a Docker container inside a Docker container should be avoided.

I don't have a solution yet, but here are some ideas:

  1. Run the Docker container with -v /var/run/docker.sock:/var/run/docker.sock so that Docker commands inside of the container are using the Docker daemon outside of the container. I am not sure how portable this is and whether there are security concerns.

I would not recommend this approach. It may lead to security issues, and I don't think it is a good practice.

  2. Use docker-compose to start two containers, one for the QLever backend and one for the QLever UI. What I don't like about this approach is that it adds complexity that is not needed for users that want just the backend and not the UI. Then again, for other (but not all) triple stores, the UI is often considered an integral part of the software.

This is a good approach.
It is better to have two separate containers for the backend and the UI.
It will make the deployment easier and more flexible.

It's also the easiest way to deploy the full stack (backend and UI) for users who want both; a simple docker compose up will be enough.
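
For comparison, the same side-by-side layout can be sketched with plain docker commands instead of Compose. This is only an illustration: the image names, container names, network name, and UI port are placeholders supplied by the caller, and 7001 is simply the port used in the old ENTRYPOINT of this Dockerfile. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: backend and UI as two containers on one user-defined network.
# Image names, container names, and the UI port are placeholders, not
# published defaults; 7001 is the port from the old ENTRYPOINT in this PR.
run() {
  # Print the command unless DRY_RUN=0, so the sketch is safe to try.
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

start_stack() {
  net="${QLEVER_NETWORK:-qlever-net}"
  run docker network create "$net"
  run docker run -d --name qlever-server --network "$net" \
    -p 7001:7001 "${QLEVER_IMAGE:?set QLEVER_IMAGE}"
  run docker run -d --name qlever-ui --network "$net" \
    -p "${UI_PORT:?set UI_PORT}:${UI_PORT}" \
    "${QLEVER_UI_IMAGE:?set QLEVER_UI_IMAGE}"
}
```

Users who only want the backend would skip the second docker run, which is the flexibility argued for above.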

  3. Have a Python package also for the QLever UI. Then it could be installed with pipx install qlever-ui, analogously to the pipx install qlever for the qlever script. It does make sense because the QLever UI is a Python application. I have to think about that a little more (and play around with it and see how it works).

The issue with this is that you will run multiple processes at the same time (the backend and the UI) in a single container which is not recommended.
It will also increase the complexity of the container and the image size.

In the future, the best approach would be to have two separate containers: one for the backend and one for the UI.

And two variants, a minimal one and a complete one, which would result in the following images:

  • qlever (server):
    • qlever:vX.Y.Z (minimal => only the server binary, no shell)
    • qlever:vX.Y.Z-cli (complete => the server binary, the qlever CLI, auto-generation of the Qleverfile and a shell)
  • qlever-ui (UI):
    • qlever-ui:vX.Y.Z (minimal => only the minimum to run the UI, no shell)
    • qlever-ui:vX.Y.Z-cli (complete => the content of the minimal version with the qlever-ui CLI, auto-generation of the Qleverfile and a shell)

That way, qlever-control can use the minimal images by default.
People that want to run binaries directly can also use that variant.

People who want autoconfiguration, want to work with a Qleverfile, or want a shell to debug things more easily can use the complete images.

@ludovicm67 ludovicm67 left a comment

During the build of the image, the following warnings are shown:

#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 1)
#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 9)
#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 21)

I made some suggestions to fix the issue mentioned in the warnings.

Example of logs where the warning is visible: https://github.com/ad-freiburg/qlever/actions/runs/10345045337/job/28631489324#step:8:175
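
As a rough sketch, the casing could also be normalized with a one-line sed filter. It only handles simple lines of the form FROM image as stage; the stage name in the usage example is hypothetical.

```shell
#!/bin/sh
# Sketch: print a Dockerfile with "as" on simple FROM lines changed to "AS".
# Handles only lines of the form "FROM <image> as <stage>"; other lines
# are passed through unchanged.
fix_from_as_casing() {
  sed -E 's/^(FROM[[:space:]]+[^[:space:]]+)[[:space:]]+as[[:space:]]+/\1 AS /' "$1"
}
```

For example, a line `FROM ubuntu:24.04 as base` comes out as `FROM ubuntu:24.04 AS base`, while a RUN line that happens to contain the word "as" is left alone.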

@hannahbast
Member Author

@ludovicm67 Thank you for your comments. I changed all the as to AS. And I agree that it makes sense to have separate containers for the QLever Engine and the QLever UI. But I see the following problem:

  1. Right now, leaving the QLever UI aside, everything is in one Docker container, including the compiled binaries and the qlever script. That way, one does not have to call the binaries directly (like before), but one can simply write something like qlever index from within the Docker container.

  2. With two different containers running in parallel (one for the QLever Engine and one for the QLever UI), where does the qlever script run? In one of the containers, with access to the other container? Or in a third container, with access to both containers? I think it's important that there is one container, where one can use the qlever script interactively if one wants to.

@ludovicm67

@hannahbast

@ludovicm67 Thank you for your comments. I changed all the as to AS.

Perfect 👍

And I agree that it makes sense to have separate containers for the QLever Engine and the QLever UI. But I see the following problem:

  1. Right now, leaving the QLever UI aside, everything is in one Docker container, including the compiled binaries and the qlever script. That way, one does not have to call the binaries directly (like before), but one can simply write something like qlever index from within the Docker container.

I'm not sure I see the problem here; that is what is expected, no?

  2. With two different containers running in parallel (one for the QLever Engine and one for the QLever UI), where does the qlever script run? In one of the containers, with access to the other container? Or in a third container, with access to both containers? I think it's important that there is one container, where one can use the qlever script interactively if one wants to.

In both: one will call the qlever get-data, qlever index and qlever start commands for the server part (probably with an option to skip the first two, to avoid re-downloading the data and rebuilding the index on every restart of the container when the data is persisted).

The other one will call qlever ui for the UI part.

In both containers, the Qleverfile will be generated on the fly from the environment variables (or can be mounted directly).
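
A minimal sketch of a server-side entrypoint along these lines, assuming a Qleverfile is already present in the index directory. The marker file and the INDEX_DIR variable are hypothetical; the three qlever commands are the ones named above.

```shell
#!/bin/sh
# Sketch of a server-container entrypoint. The marker file and INDEX_DIR
# are hypothetical; the qlever commands are the ones named in the thread.
start_server() {
  index_dir="${INDEX_DIR:-/index}"
  marker="$index_dir/.index-built"
  cd "$index_dir" || return 1
  if [ ! -f "$marker" ]; then
    # First start (or no persistent volume): fetch data, build the index.
    qlever get-data && qlever index && touch "$marker" || return 1
  fi
  qlever start
}
```

With a persistent volume mounted at the index directory, a restarted container finds the marker file and goes straight to qlever start.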

The qlever commands to debug the index, and so on would need to be run in the server container, as it is where the data is stored.

You can additionally release a qlever-control (or qlever-cli) container that has the qlever script, but to be useful it would need to be able to get information from a remote server (as the data, the index, etc. are stored in the server container).
If so, it could also be used by people who don't want to, or can't, set up a proper Python environment to run qlever.

@hannahbast
Member Author

@ludovicm67 Just for clarification: Item 1 in my previous comment was not a problem (on the contrary), but just the first item on the list leading up to the problem description. Let me think about what you wrote and then come back to you. Having the qlever script in both containers looks wrong to me since it would also require having the Qleverfile in both containers. I understand that it could be generated (by the same mechanism in both), but it still looks wrong. Mounting it is not always an option: if I understood you correctly, there are scenarios where one wants everything inside of the Docker container(s).

@ludovicm67

@hannahbast basically what we need to have for the UI container image:

  • Know the endpoint to target:
    • What is the hostname of the server?
    • What is the port of the server?
  • Display information about the database:
    • Get the database name
    • Get the description of the database
    • Maybe some useful other metadata?
  • Support for multiple endpoints (as a second step, if not easily possible in a first version)
  • Manage users

By default, it ships with a basic test admin user if I remember correctly.
It would be great if we could disable/enable the admin part and/or dynamically configure the user.

I suggested using the qlever script because it already takes care of the endpoints to target and the database metadata, so that there is a single thing to maintain.
But I'm not opposed to another way of doing this; the only thing I would expect is that it can be configured dynamically on the fly if needed.
As generating the Qleverfile dynamically would be done in the server part, reusing that work for the UI could make sense, to reduce the effort to get into something that works well and that can be easily maintained.

But I remain completely open for any other solution.
What I wrote are just simple ideas that could solve the base issues (having containers that could be easily deployed and configured) in order to open a discussion.

@hannahbast
Member Author

@ludovicm67 Thanks a lot and the exchange is much appreciated. I will think about it and come back to you.

ENV CACHE_MAX_NUM_ENTRIES 1000
# Need the shell to get the INDEX_PREFIX environment variable
ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain -i \"/index/${INDEX_PREFIX}\" -j 8 -m ${MEMORY_FOR_QUERIES} -c ${CACHE_MAX_SIZE_GB} -e ${CACHE_MAX_SIZE_GB_SINGLE_ENTRY} -k ${CACHE_MAX_NUM_ENTRIES} -p 7001 \"$@\"", "--"]
ENTRYPOINT ["bash"]

I would suggest using the default entrypoint from the Ubuntu Docker image (preferred):

Suggested change
- ENTRYPOINT ["bash"]

Or explicitly set it to an empty one:

Suggested change
- ENTRYPOINT ["bash"]
+ ENTRYPOINT [""]

USER qlever
ENV PATH=/app/:$PATH
RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install qlever

Something that might be good to do is to lock a specific version of the qlever CLI, to avoid unwanted breaking changes if the image gets rebuilt.

First define an ARG like this at the top of the Dockerfile (so that it's easy to know what to change in case of upgrade):

ARG QLEVER_VERSION="0.5.3"

Then you might need to redeclare the QLEVER_VERSION arg in the build stage that uses it, by just adding:

ARG QLEVER_VERSION

in that stage.

And then you can use it like this, if I'm not mistaken about how to specify a package version with pipx (I guess it should behave like pip):

Suggested change
- RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install qlever
+ RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install "qlever==${QLEVER_VERSION}"

You can take a look here for an example: https://github.com/zazukoians/qlever-tests/blob/76ada0b53174beb79d11ed14662536689a165fff/docker/server.Dockerfile

Member

@Qup42 Qup42 left a comment

The image can be optimized for a smaller size. The 2 suggested changes are the low-hanging fruit and reduce the compressed image size by 40MB/15% (uncompressed: 100MB/12.5%). There is probably potential for further space savings. Further investigation is required to evaluate the cost/benefit of these.

It should be 24.04 in the PR title instead of 22.04.

ENV LC_CTYPE C.UTF-8
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV LC_CTYPE=C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository -y ppa:mhier/libboost-latest
This is no longer required. The used boost packages are from the official package repositories now.

Suggested change
- RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository -y ppa:mhier/libboost-latest

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev
RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion
Remove apt cache.

Suggested change
- RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion
+ RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion && rm -rf /var/lib/apt/lists/*
