feat: add azure blob storage support
khyurri committed Jul 25, 2024
1 parent 1e18f79 commit 584b2e2
Showing 76 changed files with 1,028 additions and 1,591 deletions.
19 changes: 11 additions & 8 deletions .env.example
@@ -17,22 +17,27 @@ POSTGRES_DB=badgerdoc
# You should repeat aws creds in both of sections
# because minio lib doesn't use env vars

S3_PROVIDER=minio
STORAGE_PROVIDER=minio

# Boto configuration
# Minio configuration
# In case of public host differs from minio internal address
# setup this value
MINIO_PUBLIC_HOST=
# Minio dev configuration
S3_SECURE=false

# Boto configuration
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=
AWS_REGION=

# Minio dev configuration
# Azure Blob Storage Configuration
AZURE_CONNECTION_STRING=

S3_SECURE=false
AWS_REGION=

# TODO: We need to unify this configuration, boto3 requires with http, Minio without
# TODO: DEPRECATED S3_ENDPOINT_URL
S3_ENDPOINT_URL=http://badgerdoc-minio:9000
S3_ENDPOINT=badgerdoc-minio:9000
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
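The `.env.example` changes above introduce `STORAGE_PROVIDER` and `AZURE_CONNECTION_STRING`, and the TODO notes that boto3 wants the endpoint with a scheme while the MinIO client wants it without. The following minimal Python sketch shows how a service might validate these variables at startup. The helper names, the `"azure"` provider value, and the per-provider list of required variables are assumptions for illustration. None of them comes from the `badgerdoc_storage` library added in this commit.

```python
from typing import Dict, Mapping

# Hypothetical mapping of provider -> required variables, inferred from
# .env.example; the real badgerdoc_storage library may check other names.
REQUIRED_SETTINGS = {
    "minio": ("S3_ENDPOINT", "S3_ACCESS_KEY", "S3_SECRET_KEY"),
    "azure": ("AZURE_CONNECTION_STRING",),  # "azure" value is an assumption
}


def check_storage_env(env: Mapping[str, str]) -> str:
    """Return the selected provider, raising ValueError if settings are missing."""
    provider = env.get("STORAGE_PROVIDER", "minio").strip().lower()
    if provider not in REQUIRED_SETTINGS:
        raise ValueError(f"unsupported STORAGE_PROVIDER: {provider!r}")
    missing = [name for name in REQUIRED_SETTINGS[provider] if not env.get(name)]
    if missing:
        raise ValueError(f"{provider} storage needs: {', '.join(missing)}")
    return provider


def boto3_endpoint_url(env: Mapping[str, str]) -> str:
    """Build a boto3-style URL (with scheme) from the MinIO-style S3_ENDPOINT."""
    secure = env.get("S3_SECURE", "false").strip().lower() in ("1", "true", "yes")
    scheme = "https" if secure else "http"
    return f"{scheme}://{env['S3_ENDPOINT']}"


def parse_azure_connection_string(conn: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs on the first '=' only,
    because base64 account keys can themselves end with '='."""
    return dict(item.split("=", 1) for item in conn.split(";") if "=" in item)
```

At startup a service could call `check_storage_env(os.environ)`. Deriving the URL from `S3_ENDPOINT` and `S3_SECURE`, as `boto3_endpoint_url` does, is one way to retire the deprecated `S3_ENDPOINT_URL`.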
@@ -91,8 +96,6 @@ USERS_SERVICE_PORT=8080
# Web configuration

WEB_CORS=*
KAFKA_BOOTSTRAP_SERVER=badgerdoc-kafka:9092 # TODO: remove port
KAFKA_SEARCH_TOPIC=search
AGREEMENT_SCORE_SERVICE_HOST=localhost:5000 # TODO: remove port
MAX_REQ_SIZE=100M

1 change: 1 addition & 0 deletions .github/workflows/annotation.yml
@@ -41,6 +41,7 @@ jobs:
poetry install --no-root
poetry add ../lib/filter_lib
poetry add ../lib/tenants
poetry add ../lib/badgerdoc_storage
poetry run pytest
env:
POSTGRES_HOST: 127.0.0.1
1 change: 1 addition & 0 deletions .github/workflows/assets.yml
@@ -44,6 +44,7 @@ jobs:
poetry install --no-root --no-interaction
poetry add ../lib/filter_lib
poetry add ../lib/tenants
poetry add ../lib/badgerdoc_storage
- name: Test with pytest
run: |
cd assets
1 change: 1 addition & 0 deletions .github/workflows/convert.yml
@@ -26,6 +26,7 @@ jobs:
poetry install --all-extras
poetry add --editable ../lib/filter_lib
poetry add --editable ../lib/tenants
poetry add ../lib/badgerdoc_storage
- name: Run linters and checkers [mypy -> pylint]
working-directory: ./convert
run: |
2 changes: 1 addition & 1 deletion Makefile
@@ -10,7 +10,7 @@ build_base:
mkdir -p build_dir
cp -r lib/ build_dir/lib
cp infra/docker/python_base/Dockerfile build_dir
${_DOCKER_} build --target base build_dir/ -t 818863528939.dkr.ecr.eu-central-1.amazonaws.com/badgerdoc/python_base:0.1.7
${_DOCKER_} build --target base build_dir/ -t 818863528939.dkr.ecr.eu-central-1.amazonaws.com/badgerdoc/python_base:0.1.8

build_base_3.12:
mkdir -p build_dir_3.12
16 changes: 3 additions & 13 deletions README.md
@@ -159,21 +159,11 @@ docker-compose -f airflow/docker-compose-dev.yaml up -d
This docker-compose file was downloaded from the Apache Airflow website:
https://airflow.apache.org/docs/apache-airflow/2.7.0/docker-compose.yaml with only a few modifications added.

## Set up ClearML as Models service in local mode
## Set up Azure Blob Storage

ClearML runs using its own resources without sharing them with BadgerDoc.
### Enable CORS
https://learn.microsoft.com/en-us/rest/api/storageservices/cross-origin-resource-sharing--cors--support-for-the-azure-storage-services
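The linked page describes CORS rules for Azure Storage. The sketch below assumes the `azure-storage-blob` v12 SDK and shows how the rules could be set from Python with `BlobServiceClient.set_service_properties`. The function names and the allowed-methods list are hypothetical. Check which methods the web client actually issues before relying on them.

```python
from typing import Sequence


def cors_rule_kwargs(origins: Sequence[str], max_age: int = 3600) -> dict:
    """Keyword arguments for azure.storage.blob.CorsRule.
    The method list is an assumption, not taken from BadgerDoc."""
    return {
        "allowed_origins": list(origins),
        "allowed_methods": ["GET", "HEAD", "PUT", "OPTIONS"],
        "allowed_headers": ["*"],
        "exposed_headers": ["*"],
        "max_age_in_seconds": max_age,
    }


def enable_blob_cors(connection_string: str, origins: Sequence[str]) -> None:
    """Apply a single CORS rule to the storage account's Blob service."""
    # Imported lazily so cors_rule_kwargs works without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient, CorsRule

    client = BlobServiceClient.from_connection_string(connection_string)
    # Passing cors=[...] replaces any CORS rules already set on the account.
    client.set_service_properties(cors=[CorsRule(**cors_rule_kwargs(origins))])
```

For example, `enable_blob_cors(os.environ["AZURE_CONNECTION_STRING"], ["*"])` mirrors the permissive `WEB_CORS=*` default. A deployed account should list concrete origins instead.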

1. Copy `clearml/.env.example` to `clearml/.env` running:
```
cp clearml/.env.example clearml/.env
```

2. Run:
```
docker-compose -f clearml/docker-compose-dev.yaml up -d
```

This docker-compose file was downloaded from the ClearML GitHub: https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml with a few modifications added.

## How to install required dependencies locally

1 change: 1 addition & 0 deletions airflow/requirements.txt
@@ -9,3 +9,4 @@ openai==0.28.0
pydantic==2.3.0
apache-airflow-providers-amazon==8.7.1
minio==7.1.16
epam.indigo==1.19.0
2 changes: 1 addition & 1 deletion annotation/Dockerfile
@@ -1,4 +1,4 @@
ARG base_image=818863528939.dkr.ecr.eu-central-1.amazonaws.com/badgerdoc/python_base:0.1.7
ARG base_image=818863528939.dkr.ecr.eu-central-1.amazonaws.com/badgerdoc/python_base:0.1.8
FROM ${base_image} as build

ENV PYTHONPATH /opt/annotation
2 changes: 0 additions & 2 deletions annotation/annotation/annotations/__init__.py
@@ -4,7 +4,6 @@
S3_START_PATH,
DuplicateAnnotationError,
accumulate_pages_info,
add_search_annotation_producer,
check_task_pages,
construct_annotated_doc,
create_manifest_json,
@@ -13,7 +12,6 @@
)

__all__ = [
add_search_annotation_producer,
row_to_dict,
accumulate_pages_info,
S3_START_PATH,