Greenish provides a few REST endpoints:
Across all jobs, find the job that is missing the largest number of period datasets, and return that number.
$ curl --silent -G http://0.0.0.0:8080/maxlag | jq .
{
"lag": 0
}
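The lag rule described above can be sketched in a few lines of Python. This is only an illustration, not Greenish's own implementation: it assumes the per-job counts are available in the shape returned by /summary, and `max_lag` is a hypothetical helper name.

```python
def max_lag(summary):
    """Return the largest 'missing' count across all jobs in all groups.

    `summary` is assumed to be the decoded JSON list returned by /summary.
    """
    counts = [job["missing"] for group in summary for job in group["status"]]
    # With no jobs at all, report zero lag (an assumption of this sketch).
    return max(counts, default=0)


# Trimmed-down sample in the /summary shape.
sample = [
    {"name": "Group1", "status": [
        {"name": "Job1", "missing": 4},
        {"name": "Job2", "missing": 2},
    ]},
    {"name": "Group2", "status": [
        {"name": "Job3", "missing": 6},
    ]},
]
print({"lag": max_lag(sample)})  # {'lag': 6}
```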
Display a summary of all monitoring tasks, which is useful for a quick glance:
$ curl --silent -G http://0.0.0.0:8080/summary | jq .
[
{
"group_id": 0,
"name": "Group1",
"status": [
{
"job_id": 0,
"name": "Job1",
"missing": 4,
"oldest_mising_period": 10,
"alert_level": "warn"
},
{
"job_id": 1,
"name": "Job2",
"missing": 2,
"oldest_mising_period": 3,
"alert_level": "normal"
}
]
},
{
"group_id": 1,
"name": "Group2",
"status": [
{
"job_id": 0,
"name": "Job3",
"missing": 6,
"oldest_mising_period": 6,
"alert_level": "critical"
},
{
"job_id": 1,
"name": "Job4",
"missing": 0,
"oldest_mising_period": 0,
"alert_level": "great"
}
]
}
]
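A summary payload like the one above is easy to post-process on the client side. The following sketch, in Python, picks out the jobs that need attention; `jobs_at_levels` is a hypothetical helper, and the only assumption is that the decoded JSON has the fields shown in the sample.

```python
def jobs_at_levels(summary, levels):
    """Yield (group name, job name, missing count) for jobs whose
    alert_level is one of `levels`."""
    for group in summary:
        for job in group["status"]:
            if job["alert_level"] in levels:
                yield group["name"], job["name"], job["missing"]


# Trimmed-down copy of the sample response.
sample = [
    {"name": "Group1", "status": [
        {"name": "Job1", "missing": 4, "alert_level": "warn"},
        {"name": "Job2", "missing": 2, "alert_level": "normal"},
    ]},
    {"name": "Group2", "status": [
        {"name": "Job3", "missing": 6, "alert_level": "critical"},
        {"name": "Job4", "missing": 0, "alert_level": "great"},
    ]},
]

for group, job, missing in jobs_at_levels(sample, {"warn", "critical"}):
    print(f"{group}/{job}: {missing} missing")
```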
Display the periods that are missing for each monitoring task:
$ curl --silent -G http://0.0.0.0:8080/missing | jq .
[
{
"group": {
"group_id": 0,
"name": "Group1",
"jobs": [
{
"job_id": 0,
"name": "Job1",
"cmd": "/tmp/first_script",
"time_pattern": "yyyy-MM-dd-HH",
"frequency": "hourly",
"timezone": {
"zone_id": "UTC"
},
"lookback": 24,
"start_at": 1593093930,
"alert_levels": {
"great": 0,
"normal": 1,
"warn": 2,
"critical": 3
},
"env": []
}
]
},
"status": [
{
"job": {
"job_id": 0,
"name": "Job1",
"cmd": "/tmp/first_script",
"time_pattern": "yyyy-MM-dd-HH",
"frequency": "hourly",
"timezone": {
"zone_id": "UTC"
},
"lookback": 24,
"start_at": 1593093930,
"alert_levels": {
"great": 0,
"normal": 1,
"warn": 2,
"critical": 3
},
"env": []
},
"updated_at": 1593567901,
"period_health": [
{
"period": "2020-06-27-20",
"ok": false
}
...
A very detailed view for all monitoring tasks:
$ curl --silent -G http://0.0.0.0:8080/state | jq .
[
{
"group": {
"group_id": 0,
"name": "Group1",
"jobs": [
{
"job_id": 0,
"name": "Job1",
"owner": "Reporting Team",
"cmd": "/tmp/first_script",
"time_pattern": "yyyy-MM-dd-HH",
"frequency": "hourly",
"timezone": {
"zone_id": "UTC"
},
"lookback": 24,
"start_at": 1593093930,
"alert_levels": {
"great": 0,
"normal": 1,
"warn": 2,
"critical": 3
},
"env": []
}
]
},
"status": [
{
"job": {
"job_id": 0,
"name": "Job1",
"cmd": "/tmp/first_script",
"time_pattern": "yyyy-MM-dd-HH",
"frequency": "hourly",
"timezone": {
"zone_id": "UTC"
},
"lookback": 24,
"start_at": 1593093930,
"alert_levels": {
"great": 0,
"normal": 1,
"warn": 2,
"critical": 3
},
"env": []
},
"updated_at": 1593567901,
"period_health": [
{
"period": "2020-06-27-20",
"ok": true
},
{
"period": "2020-06-27-21",
"ok": true
},
...
You can query a single group by its id:
$ curl --silent -G localhost:8080/group/1 | jq .
{
"group": {
"group_id": 1,
"name": "Group2",
"jobs": [
{
"job_id": 0,
"name": "Job3",
"cmd": "/tmp/third_script",
"time_pattern": "yyyy-MM-dd",
"frequency": "monthly",
"timezone": {
...
You can also focus on a single job, and query it:
$ curl --silent -G localhost:8080/group/1/job/0 | jq .
{
"job": {
"job_id": 0,
"name": "Job3",
"cmd": "/tmp/third_script",
"time_pattern": "yyyy-MM-dd",
"frequency": "monthly",
"timezone": {
"zone_id": "UTC"
},
"lookback": 3,
"start_at": 1593093930,
"alert_levels": {
"great": 0,
"normal": 1,
"warn": 2,
"critical": 3
},
"env": []
},
"updated_at": 1593585049298,
"period_health": [
{
"period": "2020-05-01",
"ok": true
},
{
"period": "2020-06-01",
"ok": true
},
{
"period": "2020-07-01",
"ok": true
}
]
}
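The period_health list in a job status can be reduced to just the periods whose check did not pass. A minimal Python sketch, assuming the decoded JSON has the shape shown above; `missing_periods` is a hypothetical helper name.

```python
def missing_periods(job_status):
    """Return the periods whose check did not pass, in response order."""
    return [
        entry["period"]
        for entry in job_status["period_health"]
        if not entry["ok"]
    ]


# Illustrative status with one failing period (made-up data in the same shape).
status = {
    "job": {"job_id": 0, "name": "Job3"},
    "period_health": [
        {"period": "2020-05-01", "ok": True},
        {"period": "2020-06-01", "ok": False},
        {"period": "2020-07-01", "ok": True},
    ],
}
print(missing_periods(status))  # ['2020-06-01']
```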
Finally, you can also get the output of a check (monitoring script) for a job:
$ curl --silent -G localhost:8080/group/0/job/0/stdout
This is useful when debugging the monitoring script, or when the stdout contains further information that is not exposed in the UI/API.
You can refresh the entire state at once:
$ curl --silent -G localhost:8080/state/refresh | jq .
{
"ok": "State refresh is scheduled"
}
You can refresh the state of a single group by its id:
$ curl --silent -G localhost:8080/group/0/refresh | jq .
{
"ok": "Group status refresh is scheduled"
}
You can also refresh the state of a single job by its id:
$ curl --silent -G localhost:8080/group/0/job/0/refresh | jq .
{
"ok": "Job status refresh is scheduled"
}
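The three refresh endpoints follow one pattern, which a client could capture in a small helper. This Python sketch only builds the URL; `refresh_url` is a hypothetical name, and the paths are taken from the curl examples above.

```python
def refresh_url(base_url, group_id=None, job_id=None):
    """Build the refresh URL for the whole state, one group, or one job."""
    if job_id is not None and group_id is None:
        raise ValueError("a job refresh needs the id of its group")
    base = base_url.rstrip("/")
    if group_id is None:
        return f"{base}/state/refresh"
    if job_id is None:
        return f"{base}/group/{group_id}/refresh"
    return f"{base}/group/{group_id}/job/{job_id}/refresh"


print(refresh_url("http://localhost:8080"))
print(refresh_url("http://localhost:8080", group_id=0))
print(refresh_url("http://localhost:8080", group_id=0, job_id=0))
```

Ids are compared against `None` rather than tested for truthiness, because 0 is a valid group and job id.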
Checks whether any of the last 5 state refreshes succeeded; if so, the service is considered to be in good health.
$ curl --silent -G http://0.0.0.0:8080/health | jq .
{
"health": "good"
}
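The health rule can be illustrated with a bounded history of refresh outcomes. This Python sketch is not Greenish's implementation: the class name `RefreshHistory`, the "bad" value, and the result for an empty history are all assumptions, since only "good" appears in the sample above.

```python
from collections import deque


class RefreshHistory:
    """Track recent state refresh outcomes and derive a health value."""

    def __init__(self, window=5):
        # Only the most recent `window` outcomes are kept.
        self._outcomes = deque(maxlen=window)

    def record(self, succeeded):
        self._outcomes.append(bool(succeeded))

    def health(self):
        # Good if at least one remembered refresh succeeded.
        # "bad" and the empty-history result are assumptions of this sketch.
        return "good" if any(self._outcomes) else "bad"


history = RefreshHistory()
for outcome in [True, False, False, False, False]:
    history.record(outcome)
print(history.health())  # good: one success is still inside the window

history.record(False)  # the sixth outcome pushes the only success out
print(history.health())  # bad
```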
Print basic information about the service.
$ curl --silent -G localhost:8080/system | jq .
{
"service": "Greenish",
"namespace": "Staging",
"version": "1.4.0-SNAPSHOT",
"uptime": 1784338
}
Greenish can also export data to Prometheus. These are the supported metrics:
TYPE: GAUGE
NAME: greenish_active_refresh_tasks
HELP: Current number active state refresh tasks
LABELS: job_id
TYPE: HISTOGRAM
NAME: greenish_state_refresh_time_seconds
HELP: Job state refreshing time
LABELS: job_id
TYPE: COUNTER
NAME: greenish_state_refresh_total
HELP: Total number of job state refresh instances
LABELS: job_id
TYPE: COUNTER
NAME: greenish_state_refresh_expired_total
HELP: Total number of expired job state refresh instances
LABELS: job_id
TYPE: COUNTER
NAME: greenish_state_refresh_failed_total
HELP: Total number of failed job state refresh instances
LABELS: job_id
TYPE: GAUGE
NAME: greenish_missing_periods_total
HELP: Current number of missing dataset periods
LABELS: job_id
TYPE: GAUGE
NAME: greenish_oldest_missing_period
HELP: The oldest missing period
LABELS: job_id
Prometheus metrics can be accessed at the /prometheus endpoint:
$ curl --silent -G localhost:8080/prometheus
# HELP greenish_active_refresh_tasks Current number active state refresh tasks
# TYPE greenish_active_refresh_tasks gauge
greenish_active_refresh_tasks{job_id="job_2",} 1.0
greenish_active_refresh_tasks{job_id="job_1",} 0.0
greenish_active_refresh_tasks{job_id="job_4",} 1.0
greenish_active_refresh_tasks{job_id="job_3",} 1.0
...
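Output in this text format can be read without a Prometheus server. The Python sketch below parses sample lines into (name, labels, value) tuples; `parse_samples` is a hypothetical helper that covers only simple lines like the ones above (no escaped quotes or braces inside label values, no timestamps), so treat it as a debugging aid rather than a full parser.

```python
import re

# metric name, optional {labels}, then the value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Parse simple exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a sample line
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


text = """\
# HELP greenish_active_refresh_tasks Current number active state refresh tasks
# TYPE greenish_active_refresh_tasks gauge
greenish_active_refresh_tasks{job_id="job_2",} 1.0
greenish_active_refresh_tasks{job_id="job_1",} 0.0
"""

busy = [
    labels["job_id"]
    for name, labels, value in parse_samples(text)
    if name == "greenish_active_refresh_tasks" and value > 0
]
print(busy)  # ['job_2']
```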