Merge pull request #197 from RADAR-base/dev
Release 2.0.2
yatharthranjan committed Jul 15, 2019
2 parents a6bd481 + c73074d commit 50123b6
Showing 10 changed files with 81 additions and 22 deletions.
55 changes: 54 additions & 1 deletion dcompose-stack/radar-cp-hadoop-stack/README.md
@@ -2,6 +2,13 @@

This docker-compose stack contains the full operational RADAR-base platform. Once configured, it is meant to run on a single server with at least 16 GB memory and 4 CPU cores. It is tested on Ubuntu 16.04 and on macOS 11.1 with Docker 17.06.

## Prerequisites

- A Linux server that is available 24/7, with HTTP(S) ports open to the internet and with a domain name.
- Root access on the server.
- Docker, Docker Compose, Java (JDK or JRE) and Git installed.
- Basic knowledge of Docker, Docker Compose and Git.
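A quick sanity check for these prerequisites might look like the following sketch (it assumes the usual command names `docker`, `docker-compose`, `java` and `git`; adjust for your distribution):

```shell
#!/usr/bin/env bash
# Report whether each required command is available on the PATH.
require_cmd() {
  if command -v "$1" > /dev/null 2>&1; then
    echo "found: $1"
  else
    echo "MISSING: $1"
  fi
}

# The stack expects roughly these tools to be installed.
for cmd in docker docker-compose java git; do
  require_cmd "$cmd"
done
```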

## Configuration

### Required
@@ -195,7 +202,53 @@ You can check the logs of CRON by typing `grep CRON /var/log/syslog`.
### HDFS
This folder contains useful scripts to manage the extraction of data from HDFS in the RADAR-base Platform.
#### Advanced Tuning
To scale storage horizontally, you can add multiple paths as destinations for data storage as follows:
- Add the required paths as environment variables in the `.env` file, following the pattern of the other HDFS paths, `HDFS_DATA_DIR_<NODE#>_<VOLUME#>`:
```
...
HDFS_DATA_DIR_1_1=/usr/local/var/lib/docker/hdfs-data-1
HDFS_DATA_DIR_2_1=/usr/local/var/lib/docker/hdfs-data-2
HDFS_DATA_DIR_3_1=/usr/local/var/lib/docker/hdfs-data-3
HDFS_DATA_DIR_1_2=/usr/local/var/lib/docker/hdfs-data-4
HDFS_DATA_DIR_2_2=/usr/local/var/lib/docker/hdfs-data-5
HDFS_DATA_DIR_3_2=/usr/local/var/lib/docker/hdfs-data-6
...
```
- Mount these paths into the container using volume mounts (similar to the one already present):
```yaml
...
volumes:
- "${HDFS_DATA_DIR_1_1}:/hadoop/dfs/data"
- "${HDFS_DATA_DIR_1_2}:/hadoop/dfs/data2"
...
```
This assumes the host-path environment variables are named `HDFS_DATA_DIR_1_1` and `HDFS_DATA_DIR_1_2`.
- Set `HADOOP_DFS_DATA_DIR` to a comma-delimited set of paths (possibly on different volumes) in the environment of each datanode service in the `./docker-compose.yml` file:
```yaml
...
environment:
SERVICE_9866_NAME: datanode
SERVICE_9867_IGNORE: "true"
SERVICE_9864_IGNORE: "true"
HADOOP_HEAPSIZE: 1000
HADOOP_NAMENODE1_HOSTNAME: hdfs-namenode-1
HADOOP_DFS_REPLICATION: 2
HADOOP_DFS_DATA_DIR: file:///hadoop/dfs/data,file:///hadoop/dfs/data2
...
```
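Since `HADOOP_DFS_DATA_DIR` is a single comma-delimited value of `file://` URIs, it can be composed from the container-side mount points. A small sketch (the helper name `join_data_dirs` is illustrative, not part of the stack):

```shell
#!/usr/bin/env bash
# Join container-side data directories into the comma-delimited
# file:// URI list that HADOOP_DFS_DATA_DIR expects.
join_data_dirs() {
  local result=""
  local dir
  for dir in "$@"; do
    if [ -z "$result" ]; then
      result="file://$dir"
    else
      result="$result,file://$dir"
    fi
  done
  echo "$result"
}

# Absolute paths gain a third slash, matching the value used above.
HADOOP_DFS_DATA_DIR=$(join_data_dirs /hadoop/dfs/data /hadoop/dfs/data2)
echo "$HADOOP_DFS_DATA_DIR"
```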
- Add a check at the top of the `./lib/perform-install` script to make sure that the parent directory exists for each host directory:
```bash
...
check_parent_exists HDFS_DATA_DIR_1_1 ${HDFS_DATA_DIR_1_1}
...
```
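The `check_parent_exists` helper is provided by the install scripts; a minimal sketch of what such a check might do (an illustration, not the repository's actual implementation) is:

```shell
#!/usr/bin/env bash
# Fail early if the parent directory of a configured path is missing,
# rather than letting Docker silently create it with root ownership.
check_parent_exists() {
  local name=$1
  local path=$2
  if [ -z "$path" ]; then
    echo "Directory variable $name is not set" >&2
    return 1
  fi
  if [ ! -d "$(dirname "$path")" ]; then
    echo "Parent directory of $name ($path) does not exist" >&2
    return 1
  fi
  return 0
}

# /tmp exists on any Linux host, so this check passes.
check_parent_exists HDFS_DATA_DIR_1_1 /tmp/hdfs-data-1 && echo "ok"
```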
#### Management
The RADAR-base platform contains useful scripts to manage HDFS and the extraction of data from it.
- `bin/hdfs-upgrade VERSION`
- Perform an upgrade from an older version of the [Smizy HDFS base image](https://hub.docker.com/r/smizy/hadoop-base/) to a newer one. E.g. from `2.7.6-alpine`, which is compatible with the `uhopper` image, to `3.0.3-alpine`.
2 changes: 1 addition & 1 deletion dcompose-stack/radar-cp-hadoop-stack/bin/hdfs-restructure
@@ -16,7 +16,7 @@ cd $DIR
. .env

# HDFS restructure version
DOCKER_IMAGE=radarbase/radar-hdfs-restructure:0.5.3
DOCKER_IMAGE=radarbase/radar-hdfs-restructure:0.5.7

NUM_THREADS=${RESTRUCTURE_NUM_THREADS:-3}
# HDFS restructure script flags
9 changes: 6 additions & 3 deletions dcompose-stack/radar-cp-hadoop-stack/docker-compose.yml
@@ -23,6 +23,8 @@ networks:
management:
driver: bridge
internal: true
# driver_opts:
# com.docker.network.driver.mtu: 1450
hadoop:
external: true

@@ -279,8 +281,8 @@ services:
KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
KAFKA_SCHEMA_REGISTRY: http://schema-registry-1:8081
KAFKA_NUM_BROKERS: 3
RADAR_NUM_PARTITIONS: 3
RADAR_NUM_REPLICATION_FACTOR: 3
KAFKA_NUM_PARTITIONS: 3
KAFKA_NUM_REPLICATION: 3

#---------------------------------------------------------------------------#
# RADAR Hot Storage #
@@ -527,7 +529,7 @@ services:
# RADAR HDFS connector #
#---------------------------------------------------------------------------#
radar-hdfs-connector:
image: radarbase/radar-connect-hdfs-sink:0.2.0
image: radarbase/radar-connect-hdfs-sink:0.2.1
restart: on-failure
volumes:
- ./etc/hdfs-connector/sink-hdfs.properties:/etc/kafka-connect/sink-hdfs.properties
@@ -703,6 +705,7 @@ services:
MANAGEMENTPORTAL_CATALOGUE_SERVER_SERVER_URL: http://catalog-server:9010/source-types
MANAGEMENTPORTAL_COMMON_ADMIN_PASSWORD: ${MANAGEMENTPORTAL_COMMON_ADMIN_PASSWORD}
MANAGEMENTPORTAL_COMMON_PRIVACY_POLICY_URL: ${MANAGEMENTPORTAL_COMMON_PRIVACY_POLICY_URL}
MANAGEMENTPORTAL_OAUTH_META_TOKEN_TIMEOUT: PT2H
SPRING_APPLICATION_JSON: '{"managementportal":{"oauth":{"checkingKeyAliases":["${MANAGEMENTPORTAL_OAUTH_CHECKING_KEY_ALIASES_0}","${MANAGEMENTPORTAL_OAUTH_CHECKING_KEY_ALIASES_1}"]}}}'
JHIPSTER_SLEEP: 10 # gives time for the database to boot before the application
JAVA_OPTS: -Xmx256m # maximum heap size for the JVM running ManagementPortal, increase this as necessary
2 changes: 1 addition & 1 deletion dcompose-stack/radar-cp-hadoop-stack/etc/env.template
@@ -39,4 +39,4 @@ ENABLE_OPTIONAL_SERVICES=false
FITBIT_API_CLIENT_ID=fitbit-client
FITBIT_API_CLIENT_SECRET=fitbit-secret
NGINX_PROXIES=
RADAR_SCHEMAS_VERSION=0.4.3
RADAR_SCHEMAS_VERSION=0.5.1
@@ -7,3 +7,4 @@ rotate.interval.ms=900000
hdfs.url=hdfs://hdfs-namenode-1:8020
format.class=org.radarcns.sink.hdfs.AvroFormatRadar
topics.dir=topicAndroidNew
avro.codec=snappy
3 changes: 2 additions & 1 deletion dcompose-stack/radar-cp-hadoop-stack/images/hdfs/Dockerfile
@@ -1,7 +1,8 @@
ARG BASE_VERSION=3.0.3-alpine
FROM smizy/hadoop-base:${BASE_VERSION}

ENV HADOOP_DFS_NAME_DIR file://hadoop/dfs/name
ENV HADOOP_DFS_NAME_DIR file:///hadoop/dfs/name
ENV HADOOP_DFS_DATA_DIR file:///hadoop/dfs/data

COPY ./hdfs-site.xml.mustache ${HADOOP_CONF_DIR}/
COPY ./entrypoint.sh /usr/local/bin/
@@ -22,6 +22,11 @@
<value>{{HADOOP_DFS_NAME_DIR}}</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>{{HADOOP_DFS_DATA_DIR}}</value>
</property>

<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
@@ -1,31 +1,26 @@
FROM openjdk:8-alpine
FROM openjdk:11-jdk-slim

MAINTAINER Joris Borgdorff <joris@thehyve.nl>
LABEL authors="Joris Borgdorff<joris@thehyve.nl>,Yatharth Ranjan<yatharth.ranjan@kcl.ac.uk>"

ENV KAFKA_SCHEMA_REGISTRY=http://schema-registry-1:8081
ENV KAFKA_NUM_PARTITIONS=3
ENV KAFKA_NUM_REPLICATION=2
ENV KAFKA_NUM_BROKERS=3
ENV KAFKA_ZOOKEEPER_CONNECT=zookeeper-1:2181

RUN apk add --no-cache \
bash \
curl \
rsync \
tar
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
rsync \
&& rm -rf /var/lib/apt/lists/*

RUN mkdir -p /schema/merged /schema/java/src /schema/java/classes /usr/share/java

WORKDIR /schema

ENV JQ_VERSION=1.5
RUN curl -L#o /usr/bin/jq https://github.com/stedolan/jq/releases/download/jq-${JQ_VERSION}/jq-linux64 \
&& chmod +x /usr/bin/jq
RUN curl -#o /usr/share/java/avro-tools.jar \
"$(curl -s http://www.apache.org/dyn/closer.cgi/avro/\?as_json \
| jq --raw-output ".preferred")avro/avro-1.8.2/java/avro-tools-1.8.2.jar"
"http://archive.apache.org/dist/avro/avro-1.8.2/java/avro-tools-1.8.2.jar"

ARG SCHEMAS_VERSION=0.4.2
ARG SCHEMAS_VERSION=0.5.1

ENV RADAR_SCHEMAS_VERSION=${SCHEMAS_VERSION}

@@ -51,6 +51,7 @@ fi
# Check provided directories and configurations
check_parent_exists HDFS_DATA_DIR_1 ${HDFS_DATA_DIR_1}
check_parent_exists HDFS_DATA_DIR_2 ${HDFS_DATA_DIR_2}
check_parent_exists HDFS_DATA_DIR_3 ${HDFS_DATA_DIR_3}
check_parent_exists HDFS_NAME_DIR_1 ${HDFS_NAME_DIR_1}
check_parent_exists HDFS_NAME_DIR_2 ${HDFS_NAME_DIR_2}
check_parent_exists MONGODB_DIR ${MONGODB_DIR}
4 changes: 2 additions & 2 deletions dcompose-stack/radar-cp-hadoop-stack/optional-services.yml
@@ -20,7 +20,7 @@ services:
volumes:
- "./etc/redcap-integration:/usr/local/etc/radar-redcap-int"
healthcheck:
test: ["CMD", "wget", "-IX", "POST", "http://localhost:8080/redcap/trigger"]
test: ["CMD-SHELL", "wget --post-data {} http://localhost:8080/redcap/trigger 2>&1 | grep -q 500 || exit 1"]
interval: 1m
timeout: 5s
retries: 3
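The revised healthcheck treats any response from the trigger endpoint, even an HTTP 500, as a sign the service is up (a POST without a valid payload is expected to fail with 500). Its grep logic can be exercised in isolation against simulated wget output:

```shell
#!/usr/bin/env bash
# Simulate the kind of output wget produces when the endpoint answers
# with a 500; the container counts as healthy as long as it answers.
simulated_wget_output="HTTP request sent, awaiting response... 500 Internal Server Error"

if echo "$simulated_wget_output" | grep -q 500; then
  echo "healthy"
else
  echo "unhealthy"
fi
```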
@@ -29,7 +29,7 @@ services:
# RADAR Fitbit connector #
#---------------------------------------------------------------------------#
radar-fitbit-connector:
image: radarbase/kafka-connect-rest-fitbit-source:0.2.0
image: radarbase/kafka-connect-rest-fitbit-source:0.2.1
restart: on-failure
volumes:
- ./etc/fitbit/docker/source-fitbit.properties:/etc/kafka-connect/source-fitbit.properties