Update Dockerfile to Ubuntu 22.04 + integrate qlever script #1439

hannahbast · 2024-08-12T02:40:00Z

The old Dockerfile called `ServerMain` directly using a small selection of environment variables with outdated names. It was also outdated in other respects.

The new Dockerfile installs the `qlever` script, so that it can be called from inside the container.

Remaining questions / TODOs, feedback welcome:

1. Right now, the script is installed as part of the Docker build via `pipx install qlever`. Is this the right way to do it? Alternatives would be to clone the GitHub repo and run `pipx install -e .` from there, or to include the GitHub repo as a submodule of this repository.
2. How do we handle the QLever UI? We could just call `qlever ui` from inside the container. But that would pull the Docker image for the QLever UI and run a Docker container inside of a Docker container. It's possible, but not the right way to do it. If both are needed, the container for the QLever engine and the container for the QLever UI should run side by side.
3. The `qlever setup-config` command should have options that overwrite the variables in the produced Qleverfile. In particular, there should be an option for setting `SYSTEM = native`. Otherwise it has to be stated explicitly for each command where that is relevant (in particular: `qlever index`, `qlever start`, `qlever example-queries`).

hannahbast · 2024-08-12T03:30:22Z

@ludovicm67 Can you try this and let me know your feedback? It addresses some of the issues that we have discussed. In particular, you can now use the qlever script from inside the container (with autocompletion). Or you can run the container directly with a sequence of qlever commands.

If I understood you correctly, you would also like to run the QLever UI. This would also work here, but running a Docker container (for the QLever UI) inside of a Docker container is generally not a good idea. I don't have a solution yet, but here are some ideas:

1. Run the Docker container with `-v /var/run/docker.sock:/var/run/docker.sock` so that Docker commands inside of the container use the Docker daemon outside of the container. I am not sure how portable this is and whether there are security concerns.
2. Use docker-compose to start two containers, one for the QLever backend and one for the QLever UI. What I don't like about this approach is that it adds complexity that is not needed for users who want just the backend and not the UI. Then again, for other (but not all) triple stores, the UI is often considered an integral part of the software.
3. Have a Python package also for the QLever UI. Then it could be installed with `pipx install qlever-ui`, analogously to the `pipx install qlever` for the `qlever` script. It does make sense because the QLever UI is a Python application. I have to think about that a little more (and play around with it and see how it works).

codecov · 2024-08-12T09:05:55Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.33%. Comparing base (0b9d26f) to head (fabb137).
Report is 4 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #1439       +/-   ##
===========================================
- Coverage   89.66%   58.33%   -31.33%     
===========================================
  Files         345      568      +223     
  Lines       29947    65968    +36021     
  Branches     3306     8727     +5421     
===========================================
+ Hits        26851    38485    +11634     
- Misses       1954    24839    +22885     
- Partials     1142     2644     +1502


ludovicm67 · 2024-08-12T09:41:51Z

@hannahbast Thanks for the updates!

> Remaining questions / TODOs, feedback welcome:
>
> 1. Right now, the script is installed as part of the Docker build via `pipx install qlever`. Is this the right way to do it? Alternatives would be to clone the GitHub repo and run `pipx install -e .` from there, or to include the GitHub repo as a submodule of this repository.

I think it's great like this.
I would just recommend pinning a specific version of the qlever package, so that the build is reproducible and does not break when a new version is released.

> 2. How do we handle the QLever UI? We could just call `qlever ui` from inside the container. But that would pull the Docker image for the QLever UI and run a Docker container inside of a Docker container. It's possible, but not the right way to do it. If both are needed, the container for the QLever engine and the container for the QLever UI should run side by side.

I think it's fine to run the QLever UI in a separate container.
That way, the user can decide if they want to use the UI or not easily.
The user can also run the UI on a different machine than the engine, or one instance of the UI for multiple server instances, which can be useful in some cases.

> 3. The `qlever setup-config` command should have options that overwrite the variables in the produced Qleverfile. In particular, there should be an option for setting `SYSTEM = native`. Otherwise it has to be stated explicitly for each command where that is relevant (in particular: `qlever index`, `qlever start`, `qlever example-queries`).

Yes, I'm currently experimenting with environment variables and a simple shell script that generates the Qleverfile from them.
I will provide some feedback on this soon.
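
A minimal sketch of what such a generator script could look like. The environment variable names (`QLEVER_NAME`, `QLEVER_PORT`, `QLEVER_SYSTEM`) and the function name are hypothetical, and only three keys are written. The section layout (`[data]`, `[server]`, `[runtime]`) is an assumption about the Qleverfile format and should be checked against a file produced by `qlever setup-config`.

```shell
# Sketch: write a small Qleverfile from environment variables.
# Variable names and defaults are hypothetical; a real Qleverfile
# contains more settings than the three written here.
generate_qleverfile() {
  out="${1:-Qleverfile}"
  cat > "$out" <<EOF
[data]
NAME = ${QLEVER_NAME:-mydataset}

[server]
PORT = ${QLEVER_PORT:-7001}

[runtime]
SYSTEM = ${QLEVER_SYSTEM:-native}
EOF
}
```

Calling `QLEVER_NAME=olympics generate_qleverfile /tmp/Qleverfile` would write a file whose `[runtime]` section defaults to `SYSTEM = native`, so the setting does not have to be repeated for each `qlever` command.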

ludovicm67 · 2024-08-12T09:55:57Z

@hannahbast

> @ludovicm67 Can you try this and let me know your feedback? It addresses some of the issues that we have discussed. In particular, you can now use the qlever script from inside the container (with autocompletion). Or you can run the container directly with a sequence of qlever commands.

Yes, I will give it a try soon and let you know my feedback.

> If I understood you correctly, you would also like to run the QLever UI. This would also work here, but running a Docker container (for the QLever UI) inside of a Docker container is generally not a good idea.

Fully agree with you. It is better to have the QLever UI as a separate container.
Running a Docker container inside a Docker container should be avoided.

> I don't have a solution yet, but here are some ideas:
>
> 1. Run the Docker container with `-v /var/run/docker.sock:/var/run/docker.sock` so that Docker commands inside of the container are using the Docker daemon outside of the container. I am not sure how portable this is and whether there are security concerns.

I would not recommend this approach. It may lead to security issues, and I don't think it is a good practice.

> 2. Use docker-compose to start two containers, one for the QLever backend and one for the QLever UI. What I don't like about this approach is that it adds complexity that is not needed for users that want just the backend and not the UI. Then again, for other (but not all) triple stores, the UI is often considered an integral part of the software.

This is a good approach.
It is better to have two separate containers for the backend and the UI.
It will make the deployment easier and more flexible.

It's also the easiest way to deploy the full stack (backend and UI) for users who want to use both; a simple `docker compose up` will be enough.
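
As a rough sketch of that setup, the following function writes a two-service Compose file. Everything in it is a placeholder rather than something defined by this PR: the image names and tags, the service names, and the port mappings are hypothetical, and how each container is configured and started is left open.

```shell
# Sketch: write a Compose file with one service per container.
# Image names, tags, service names, and ports are hypothetical placeholders.
write_compose_file() {
  cat > "${1:-compose.yaml}" <<'EOF'
services:
  qlever-server:
    image: example/qlever:vX.Y.Z-cli   # hypothetical image name and tag
    ports:
      - "7001:7001"                    # hypothetical port mapping
  qlever-ui:
    image: example/qlever-ui:vX.Y.Z    # hypothetical image name and tag
    ports:
      - "8176:7000"                    # hypothetical port mapping
    depends_on:
      - qlever-server
EOF
}
```

With such a file, `docker compose up` starts both containers, while `docker compose up qlever-server` starts only the backend, so users who do not need the UI are not forced to run it.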

> 3. Have a Python package also for the QLever UI. Then it could be installed with `pipx install qlever-ui`, analogously to the `pipx install qlever` for the qlever script. It does make sense because the QLever UI is a Python application. I have to think about that a little more (and play around with it and see how it works).

The issue with this is that you would run multiple processes (the backend and the UI) at the same time in a single container, which is not recommended.
It will also increase the complexity of the container and the image size.

In the future, the best approach would be to have two separate containers: one for the backend and one for the UI.

And two variants: a minimal one and a complete one, which will result in the following images:

qlever (server):
- qlever:vX.Y.Z (minimal => only the server binary, no shell)
- qlever:vX.Y.Z-cli (complete => the server binary, the qlever CLI, auto-generation of the Qleverfile and a shell)
qlever-ui (UI):
- qlever-ui:vX.Y.Z (minimal => only the minimum to run the UI, no shell)
- qlever-ui:vX.Y.Z-cli (complete => the content of the minimal version with the qlever-ui CLI, auto-generation of the Qleverfile and a shell)

That way, qlever-control can use the minimal images by default.
People that want to run binaries directly can also use that variant.

People who want autoconfiguration, want to work with a Qleverfile, or want a shell to debug things more easily can use the complete images.

ludovicm67

During the build of the image, the following warnings are shown:

#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 1)
#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 9)
#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 21)

I made some suggestions to fix the issue mentioned in the warning message.

Example of logs where the warning is visible: https://github.com/ad-freiburg/qlever/actions/runs/10345045337/job/28631489324#step:8:175
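
For context, the `FromAsCasing` check fires on stage declarations such as `FROM ubuntu:24.04 as base`, where the `as` keyword does not match the casing of `FROM`. A small sketch for rewriting such lines in bulk follows; the function name is made up for illustration, and it does not handle `FROM` lines that carry extra flags such as `--platform`.

```shell
# Sketch: print a Dockerfile with "FROM <image> as <name>" rewritten to
# "FROM <image> AS <name>". Other lines are passed through unchanged.
fix_from_as_casing() {
  sed -E 's/^([[:space:]]*FROM[[:space:]]+[^[:space:]]+)[[:space:]]+as[[:space:]]+/\1 AS /' "$1"
}
```

The function writes the result to standard output, so it can be compared against the original with `diff` before overwriting anything.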

Dockerfile

hannahbast · 2024-08-12T15:04:30Z

@ludovicm67 Thank you for your comments. I changed all the `as` to `AS`. And I agree that it makes sense to have separate containers for the QLever Engine and the QLever UI. But I see the following problem:

1. Right now, leaving the QLever UI aside, everything is in one Docker container, including the compiled binaries and the qlever script. That way, one does not have to call the binaries directly (like before), but one can simply write something like `qlever index` from within the Docker container.
2. With two different containers running in parallel (one for the QLever Engine and one for the QLever UI), where does the qlever script run? In one of the containers, with access to the other container? Or in a third container, with access to both containers? I think it's important that there is one container where one can use the qlever script interactively if one wants to.

ludovicm67 · 2024-08-12T15:43:34Z

@hannahbast

> @ludovicm67 Thank you for your comments. I changed all the `as` to `AS`.

Perfect 👍

> And I agree that it makes sense to have separate containers for the QLever Engine and the QLever UI. But I see the following problem:
>
> 1. Right now, leaving the QLever UI aside, everything is in one Docker container, including the compiled binaries and the qlever script. That way, one does not have to call the binaries directly (like before), but one can simply write something like `qlever index` from within the Docker container.

I'm not sure I see the problem here; isn't that what is expected?

> 2. With two different containers running in parallel (one for the QLever Engine and one for the QLever UI), where does the qlever script run? In one of the containers, with access to the other container? Or in a third container, with access to both containers? I think it's important that there is one container, where one can use the qlever script interactively if one wants to.

In both. One container will call the `qlever get-data`, `qlever index` and `qlever start` commands for the server part (probably with an option to skip the first two, to avoid redownloading the data and rebuilding the index on every restart of the container if there is persistent storage).

The other one will call qlever ui for the UI part.

In both containers, the Qleverfile will be generated on the fly from the environment variables (or can be mounted directly).

The qlever commands to debug the index, and so on would need to be run in the server container, as it is where the data is stored.
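
The start-up flow described above for the server container could be sketched as follows. The marker file `.index-built`, the `QLEVER_CMD` override, and the function name are hypothetical; only the `qlever get-data`, `qlever index`, and `qlever start` commands come from the discussion. Whether `qlever start` keeps the container in the foreground is not addressed here.

```shell
# Sketch: start the server, skipping download and indexing when a previous
# run already built the index in a persistent working directory.
# QLEVER_CMD (hypothetical) allows swapping in a stub; default is `qlever`.
start_server() (
  workdir="${1:-.}"
  qlever="${QLEVER_CMD:-qlever}"
  cd "$workdir" || exit 1
  if [ ! -e .index-built ]; then
    "$qlever" get-data || exit 1
    "$qlever" index || exit 1
    touch .index-built   # hypothetical marker for "index is ready"
  fi
  "$qlever" start
)
```

The function body runs in a subshell, so the `cd` does not affect the caller. On the first start it runs all three commands; on later restarts with the same working directory it only runs `qlever start`.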

You can additionally release a qlever-control (or qlever-cli) container that has the qlever script, but it would need to be able to get information from a remote server (as the data, the index, etc. are stored in the server container) to be useful.
If so, it could also be used by people who don't want to, or can't, set up a proper Python environment to run qlever.

hannahbast · 2024-08-12T15:51:45Z

@ludovicm67 Just for clarification: Item 1 in my previous comment was not a problem (on the contrary), but just the first item on the list leading up to the problem description. Let me think about what you wrote and then come back to you. Having the qlever script in both containers looks wrong to me since it would also require having the Qleverfile in both containers. I understand that it could be generated (by the same mechanism in both), but it still looks wrong. Mounting it is not always an option: if I understood you correctly, there are scenarios where one wants everything inside of the Docker container(s).

ludovicm67 · 2024-08-12T16:20:50Z

@hannahbast Basically, this is what we need for the UI container image:

- Know the endpoint to target:
  - What is the hostname of the server?
  - What is the port of the server?
- Display information about the database:
  - Get the database name
  - Get the description of the database
  - Maybe some other useful metadata?
- Support for multiple endpoints (as a second step, if not easily possible in a first version)
- Manage users

By default, it ships with a basic test admin user if I remember correctly.
It would be great if we could disable/enable the admin part and/or dynamically configure the user.

I suggested using the qlever script because it already takes care of the endpoints to target and the database metadata, so that there is a single thing to maintain.
But I'm not opposed to another way of doing this; the only thing I would expect is that it can be configured dynamically on the fly if needed.
As generating the Qleverfile dynamically would be done in the server part, reusing that work for the UI could make sense, to reduce the effort needed to get something that works well and can be easily maintained.

But I remain completely open for any other solution.
What I wrote are just simple ideas that could solve the base issues (having containers that could be easily deployed and configured) in order to open a discussion.

hannahbast · 2024-08-12T16:24:19Z

@ludovicm67 Thanks a lot and the exchange is much appreciated. I will think about it and come back to you.

ludovicm67 · 2024-08-14T15:58:45Z

Dockerfile

-ENV CACHE_MAX_NUM_ENTRIES 1000
-# Need the shell to get the INDEX_PREFIX environment variable
-ENTRYPOINT ["/bin/sh", "-c", "exec ServerMain -i \"/index/${INDEX_PREFIX}\" -j 8 -m ${MEMORY_FOR_QUERIES} -c ${CACHE_MAX_SIZE_GB} -e ${CACHE_MAX_SIZE_GB_SINGLE_ENTRY} -k ${CACHE_MAX_NUM_ENTRIES} -p 7001 \"$@\"", "--"]
+ENTRYPOINT ["bash"]


I would suggest using the default entrypoint from the Ubuntu Docker image (preferred):

Suggested change

-ENTRYPOINT ["bash"]

Or explicitly set it to an empty one:

Suggested change

-ENTRYPOINT ["bash"]
+ENTRYPOINT [""]

ludovicm67 · 2024-08-14T16:03:25Z

Dockerfile

 USER qlever
-ENV PATH=/app/:$PATH
+RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install qlever


Something that might be good to do is to lock a specific version of the qlever CLI, to avoid unwanted breaking changes if the image gets rebuilt.

First define an ARG like this at the top of the Dockerfile (so that it's easy to know what to change in case of upgrade):

ARG QLEVER_VERSION="0.5.3"

Then you might need to redeclare the QLEVER_VERSION arg in the build stage that uses it, by just adding:

ARG QLEVER_VERSION

in that stage.

And then you can use it like this, if I'm not mistaken about how to specify a package version with pipx (I guess it should behave like pip):

Suggested change

-RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install qlever
+RUN PIPX_HOME=/qlever/.local PIPX_BIN_DIR=/qlever/.local/bin PIPX_MAN_DIR=/qlever/.local/share pipx install "qlever==${QLEVER_VERSION}"

You can take a look here for an example: https://github.com/zazukoians/qlever-tests/blob/76ada0b53174beb79d11ed14662536689a165fff/docker/server.Dockerfile

sonarcloud · 2024-08-15T20:34:51Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

Qup42

The image can be optimized for a smaller size. The 2 suggested changes are the low-hanging fruit and reduce the compressed image size by 40MB/15% (uncompressed: 100MB/12.5%). There is probably potential for further space savings. Further investigation is required to evaluate the cost/benefit of these.

It should be 24.04 in the PR title instead of 22.04.

Qup42 · 2024-08-22T10:44:47Z

Dockerfile

-ENV LC_CTYPE C.UTF-8
+ENV LANG=C.UTF-8
+ENV LC_ALL=C.UTF-8
+ENV LC_CTYPE=C.UTF-8
 ENV DEBIAN_FRONTEND=noninteractive
 RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository -y ppa:mhier/libboost-latest


This is no longer required. The Boost packages used now come from the official package repositories.

Suggested change

-RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository -y ppa:mhier/libboost-latest

Qup42 · 2024-08-22T10:45:24Z

Dockerfile

 ENV DEBIAN_FRONTEND=noninteractive
-RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.81-dev libboost-program-options1.81-dev libboost-iostreams1.81-dev libboost-url1.81-dev
+RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion


Remove the apt cache in the same layer, so that the package lists do not end up in the image.

Suggested change

-RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion
+RUN apt-get update && apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime make lbzip2 libjemalloc-dev libzstd-dev libssl-dev libboost1.83-dev libboost-program-options1.83-dev libboost-iostreams1.83-dev libboost-url1.83-dev pipx bash-completion && rm -rf /var/lib/apt/lists/*

hannahbast requested a review from joka921 August 12, 2024 13:33

ludovicm67 reviewed Aug 12, 2024


Replace as by AS (three times)

471d2d1

ludovicm67 mentioned this pull request Aug 12, 2024

Use Ubuntu 24.04 as base image #1436

Open

ludovicm67 reviewed Aug 14, 2024


Use ENV=... instead of ENV ... everywhere

fabb137

Qup42 reviewed Aug 22, 2024
