Metrics

The Grey Matter Metrics Filter sets up a local metrics server to gather and report real-time statistics for the sidecar, microservice, and host system.

Gathered Metrics

Total Stats

  • metrics version

  • total requests

  • total HTTP

  • total HTTPS

  • total RPC

  • total RPC/TLS

  • total requests

  • total 200

  • total 2xx

  • latency (avg)

  • latency (count)

  • latency max

  • latency min

  • latency sum

  • latency p50

  • latency p90

  • latency p95

  • latency p99

  • latency p9990

  • latency p9999

  • number of errors

  • incoming throughput

  • outgoing throughput

Route Stats

For each route that is addressed, the following stats will be computed and reported.

  • total requests

  • total 200

  • total 2xx

  • latency (avg)

  • latency (count)

  • latency max

  • latency min

  • latency sum

  • latency p50

  • latency p90

  • latency p95

  • latency p99

  • latency p9990

  • latency p9999

  • number of errors

  • incoming throughput

  • outgoing throughput
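
As a rough illustration of how the latency fields above relate to one another, here is a minimal Python sketch. It is not the filter's implementation. It assumes that p9990 and p9999 denote the 99.9th and 99.99th percentiles, that percentiles use the nearest-rank method, and that statistics are computed over a bounded window of recent samples (the `deque` stands in for a buffer of configurable size). The function name `latency_summary` is hypothetical.

```python
import math
from collections import deque


def latency_summary(samples_ms):
    """Summarise latency samples (milliseconds) into the fields listed above."""
    ordered = sorted(samples_ms)
    count = len(ordered)
    if count == 0:
        return {"count": 0}

    def percentile(p):
        # Nearest-rank percentile (an assumption, the filter may interpolate).
        rank = max(1, math.ceil(p / 100.0 * count))
        return ordered[rank - 1]

    total = sum(ordered)
    return {
        "avg": total / count,
        "count": count,
        "max": ordered[-1],
        "min": ordered[0],
        "sum": total,
        "p50": percentile(50),
        "p90": percentile(90),
        "p95": percentile(95),
        "p99": percentile(99),
        "p9990": percentile(99.9),
        "p9999": percentile(99.99),
    }


# A bounded window keeps only the most recent samples; older ones fall out.
window = deque(maxlen=4)
for latency in [5, 1, 2, 3, 4, 10]:
    window.append(latency)

summary = latency_summary(window)
```

With a window of four, only the samples 2, 3, 4, and 10 remain, so `count` reports the window occupancy rather than the total number of requests seen.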

Host Stats

  • num goroutines

  • start time

  • CPU percent used

  • CPU cores on system

  • OS

  • OS architecture

  • memory available

  • memory used

  • memory used %

  • process memory used

Prometheus

Optionally, this filter can serve the computed statistics in a format suitable for scraping by Prometheus. The Prometheus endpoint is hosted at {METRICS_HOST}:{METRICS_PORT}{METRICS_PROMETHEUS_URI_PATH} and can be scraped directly using any of the service discovery mechanisms that Prometheus supports.
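
To make the endpoint template concrete, the following Python sketch assembles the dashboard and Prometheus URLs from a set of filter options. The helper `metrics_urls` is hypothetical; the default values are taken from the filter configuration options documented on this page, and choosing the `https` scheme when `use_metrics_tls` is true is an assumption.

```python
def metrics_urls(config):
    """Build the metrics endpoint URLs from filter options (illustrative only)."""
    # Defaults as documented in the filter configuration options.
    defaults = {
        "metrics_host": "0.0.0.0",
        "metrics_port": 8081,
        "metrics_dashboard_uri_path": "/metrics",
        "metrics_prometheus_uri_path": "/prometheus",
        "use_metrics_tls": False,
    }
    merged = {**defaults, **config}

    # Assumption: a TLS-enabled metrics server is reached over https.
    scheme = "https" if merged["use_metrics_tls"] else "http"
    base = f"{scheme}://{merged['metrics_host']}:{merged['metrics_port']}"
    return {
        "dashboard": base + merged["metrics_dashboard_uri_path"],
        "prometheus": base + merged["metrics_prometheus_uri_path"],
    }


urls = metrics_urls({"metrics_port": 9080})
print(urls["prometheus"])
```

Note that 0.0.0.0 is a listen address; a scraper would normally substitute the address at which the sidecar is reachable.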

AWS CloudWatch

The metrics filter can also push the compiled statistics directly to AWS CloudWatch. This allows Grey Matter Proxy metrics to trigger actions such as Auto Scaling, or to support closer monitoring within AWS.

Filter Configuration Options

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| metrics_port | Integer | 8081 | Port the metrics server listens on |
| metrics_host | String | 0.0.0.0 | Host the metrics server listens on |
| metrics_dashboard_uri_path | String | /metrics | The HTTP path to query JSON metrics data |
| metrics_prometheus_uri_path | String | /prometheus | The HTTP path to be scraped by Prometheus |
| prometheus_system_metrics_interval_seconds | Integer | 15 | |
| metrics_ring_buffer_size | Integer | 4096 | Size of the cache of active metrics data |
| metrics_key_function | String | "" | Function to provide internal rollup of URL paths when reporting metrics |
| metrics_key_depth | String | "1" | Truncate URLs to the first path section |
| use_metrics_tls | Boolean | false | If true, the metrics server uses TLS |
| server_ca_cert_path | String | | SSL trust file to use when serving metrics over TLS |
| server_cert_path | String | | SSL certificate to use when serving metrics over TLS |
| server_key_path | String | | SSL private key file to use when serving metrics over TLS |
| enable_cloudwatch | Boolean | false | If true, report metrics to AWS CloudWatch |
| cw_reporting_interval_seconds | Integer | | Interval at which to send metrics to AWS CloudWatch |
| cw_namespace | String | | Namespace for CloudWatch metrics |
| cw_dimensions | String | | Dimensions to report to CloudWatch |
| cw_metrics_routes | String | | URI paths to send metrics for |
| cw_metrics_values | String | | Metrics keys to send metrics for |
| cw_debug | Boolean | false | Verbose debugging for the CloudWatch connection |
| aws_region | String | | AWS region for access |
| aws_access_key_id | String | | AWS access key |
| aws_secret_access_key | String | | AWS secret access key |
| aws_session_token | String | | AWS session token |
| aws_profile | String | | AWS profile to use for login |
| aws_config_file | String | | Location on disk of the AWS config file |
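
The description of metrics_key_depth ("Truncate URLs to the first path section") can be illustrated with a small Python sketch. This is only one plausible reading of the option, not the filter's actual rollup logic; the function name `truncate_path` is hypothetical.

```python
def truncate_path(path, depth):
    """Keep only the first `depth` sections of a URL path (illustrative only)."""
    # The option is documented as a string, so accept "1" as well as 1.
    depth = int(depth)
    sections = [section for section in path.split("/") if section]
    return "/" + "/".join(sections[:depth])


rolled_up = truncate_path("/services/catalog/1.0/summary", "1")
print(rolled_up)
```

Under this reading, a larger depth keeps more of the path and therefore reports more distinct route keys.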

Example Configuration

http_filters:
  - name: gm.metrics
    config:
      metrics_port: 9080
      metrics_host: 0.0.0.0
      metrics_dashboard_uri_path: "/metrics"
      metrics_prometheus_uri_path: "/prometheus"
      metrics_ring_buffer_size: 4096
      use_metrics_tls: false
      enable_cloudwatch: false

Example Responses

/metrics

{
"grey-matter-metrics-version": "1.0.0",
"Total/requests": 22091,
"HTTP/requests": 0,
"HTTPS/requests": 22091,
"RPC/requests": 0,
"RPC_TLS/requests": 0,
"route/services/catalog/1.0/summary/GET/requests": 3345,
"route/services/catalog/1.0/summary/GET/routes": "",
"route/services/catalog/1.0/summary/GET/status/200": 3345,
"route/services/catalog/1.0/summary/GET/status/2XX": 3345,
"route/services/catalog/1.0/summary/GET/latency_ms.avg": 0.000000,
"route/services/catalog/1.0/summary/GET/latency_ms.count": 7,
"route/services/catalog/1.0/summary/GET/latency_ms.max": 0,
"route/services/catalog/1.0/summary/GET/latency_ms.min": 0,
"route/services/catalog/1.0/summary/GET/latency_ms.sum": 0,
"route/services/catalog/1.0/summary/GET/latency_ms.p50": 0,
"route/services/catalog/1.0/summary/GET/latency_ms.p90": 0,
"route/services/catalog/1.0/summary/GET/latency_ms.p95": 0,
"route/services/catalog/1.0/summary/GET/latency_ms.p99": 0,
"route/services/catalog/1.0/summary/GET/latency_ms.p9990": 0,
"route/services/catalog/1.0/summary/GET/latency_ms.p9999": 0,
"route/services/catalog/1.0/summary/GET/errors.count": 0,
"route/services/catalog/1.0/summary/GET/in_throughput": 0,
"route/services/catalog/1.0/summary/GET/out_throughput": 25970425,
"route/services/sense/1.0/recommendation/GET/requests": 3350,
"route/services/sense/1.0/recommendation/GET/routes": "",
"route/services/sense/1.0/recommendation/GET/status/200": 3341,
"route/services/sense/1.0/recommendation/GET/status/503": 9,
"route/services/sense/1.0/recommendation/GET/status/2XX": 3341,
"route/services/sense/1.0/recommendation/GET/status/5XX": 9,
"route/services/sense/1.0/recommendation/GET/latency_ms.avg": 0.000000,
"route/services/sense/1.0/recommendation/GET/latency_ms.count": 7,
"route/services/sense/1.0/recommendation/GET/latency_ms.max": 0,
"route/services/sense/1.0/recommendation/GET/latency_ms.min": 0,
"route/services/sense/1.0/recommendation/GET/latency_ms.sum": 0,
"route/services/sense/1.0/recommendation/GET/latency_ms.p50": 0,
"route/services/sense/1.0/recommendation/GET/latency_ms.p90": 0,
"route/services/sense/1.0/recommendation/GET/latency_ms.p95": 0,
"route/services/sense/1.0/recommendation/GET/latency_ms.p99": 0,
"route/services/sense/1.0/recommendation/GET/latency_ms.p9990": 0,
"route/services/sense/1.0/recommendation/GET/latency_ms.p9999": 0,
"route/services/sense/1.0/recommendation/GET/errors.count": 0,
"route/services/sense/1.0/recommendation/GET/in_throughput": 0,
"route/services/sense/1.0/recommendation/GET/out_throughput": 1450994,
"all/requests": 21924,
"all/routes": "",
"all/status/304": 112,
"all/status/200": 21803,
"all/status/503": 9,
"all/status/2XX": 21803,
"all/status/5XX": 9,
"all/status/3XX": 112,
"all/latency_ms.avg": 0.013428,
"all/latency_ms.count": 4096,
"all/latency_ms.max": 13,
"all/latency_ms.min": 0,
"all/latency_ms.sum": 55,
"all/latency_ms.p50": 0,
"all/latency_ms.p90": 0,
"all/latency_ms.p95": 0,
"all/latency_ms.p99": 0,
"all/latency_ms.p9990": 4,
"all/latency_ms.p9999": 13,
"all/errors.count": 0,
"all/in_throughput": 132437,
"all/out_throughput": 3622059,
"route//GET/requests": 13,
"route//GET/routes": "",
"route//GET/status/304": 12,
"route//GET/status/200": 1,
"route//GET/status/3XX": 12,
"route//GET/status/2XX": 1,
"route//GET/latency_ms.avg": 0.000000,
"route//GET/latency_ms.count": 1,
"route//GET/latency_ms.max": 0,
"route//GET/latency_ms.min": 0,
"route//GET/latency_ms.sum": 0,
"route//GET/latency_ms.p50": 0,
"route//GET/latency_ms.p90": 0,
"route//GET/latency_ms.p95": 0,
"route//GET/latency_ms.p99": 0,
"route//GET/latency_ms.p9990": 0,
"route//GET/latency_ms.p9999": 0,
"route//GET/errors.count": 0,
"route//GET/in_throughput": 0,
"route//GET/out_throughput": 1628356,
"go_metrics/runtime/num_goroutines": 6,
"system/start_time": 1570507704592,
"system/cpu.pct": 100.000000,
"system/cpu_cores": 4,
"os": "linux",
"os_arch": "amd64",
"system/memory/available": 5576384512,
"system/memory/used": 10214662144,
"system/memory/used_percent": 63.169011,
"process/memory/used": 72286456
}
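
A client consuming the JSON response above will usually want to regroup the flat keys by route. The Python sketch below does this under the assumption, inferred from the sample, that route keys have the shape `route/<path>/<METHOD>/<metric>`. The helper names `group_route_metrics` and `server_error_ratio` are hypothetical, and the embedded sample is a small excerpt of the response shown above.

```python
import json
from collections import defaultdict

HTTP_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS"}


def group_route_metrics(metrics):
    """Group flat 'route/<path>/<METHOD>/<metric>' keys by (path, method)."""
    routes = defaultdict(dict)
    for key, value in metrics.items():
        parts = key.split("/")
        if parts[0] != "route":
            continue  # skip totals, "all/..." and host statistics
        method_positions = [i for i, part in enumerate(parts) if part in HTTP_METHODS]
        if not method_positions:
            continue
        idx = method_positions[-1]  # the method follows the path
        path = "/" + "/".join(parts[1:idx])
        metric = "/".join(parts[idx + 1:])
        routes[(path, parts[idx])][metric] = value
    return dict(routes)


def server_error_ratio(route_metrics):
    """Fraction of requests on a route that ended in a 5XX status."""
    requests = route_metrics.get("requests", 0)
    if requests == 0:
        return 0.0
    return route_metrics.get("status/5XX", 0) / requests


sample = json.loads("""
{
  "all/requests": 21924,
  "route/services/sense/1.0/recommendation/GET/requests": 3350,
  "route/services/sense/1.0/recommendation/GET/status/5XX": 9,
  "route//GET/requests": 13
}
""")

grouped = group_route_metrics(sample)
sense = grouped[("/services/sense/1.0/recommendation", "GET")]
ratio = server_error_ratio(sense)
```

Here `ratio` is 9 divided by 3350, and the root route appears under the key `("/", "GET")`.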

/prometheus

...
http_request_duration_seconds_bucket{key="all",method="",status="401",le="0.005"} 1
http_request_duration_seconds_bucket{key="all",method="",status="401",le="0.01"} 2
http_request_duration_seconds_bucket{key="all",method="",status="401",le="0.025"} 2
http_request_duration_seconds_bucket{key="all",method="",status="401",le="0.05"} 2
http_request_duration_seconds_bucket{key="all",method="",status="401",le="0.1"} 2
http_request_duration_seconds_bucket{key="all",method="",status="401",le="0.25"} 2
http_request_duration_seconds_bucket{key="all",method="",status="401",le="0.5"} 2
http_request_duration_seconds_bucket{key="all",method="",status="401",le="1"} 2
http_request_duration_seconds_bucket{key="all",method="",status="401",le="2.5"} 2
http_request_duration_seconds_bucket{key="all",method="",status="401",le="5"} 2
http_request_duration_seconds_bucket{key="all",method="",status="401",le="10"} 2
http_request_duration_seconds_bucket{key="all",method="",status="401",le="+Inf"} 2
http_request_duration_seconds_sum{key="all",method="",status="401"} 0.01088538
http_request_duration_seconds_count{key="all",method="",status="401"} 2
http_request_duration_seconds_bucket{key="all",method="",status="503",le="0.005"} 0
http_request_duration_seconds_bucket{key="all",method="",status="503",le="0.01"} 0
http_request_duration_seconds_bucket{key="all",method="",status="503",le="0.025"} 0
http_request_duration_seconds_bucket{key="all",method="",status="503",le="0.05"} 0
http_request_duration_seconds_bucket{key="all",method="",status="503",le="0.1"} 0
http_request_duration_seconds_bucket{key="all",method="",status="503",le="0.25"} 7
http_request_duration_seconds_bucket{key="all",method="",status="503",le="0.5"} 9
http_request_duration_seconds_bucket{key="all",method="",status="503",le="1"} 9
http_request_duration_seconds_bucket{key="all",method="",status="503",le="2.5"} 9
http_request_duration_seconds_bucket{key="all",method="",status="503",le="5"} 9
http_request_duration_seconds_bucket{key="all",method="",status="503",le="10"} 9
http_request_duration_seconds_bucket{key="all",method="",status="503",le="+Inf"} 9
http_request_duration_seconds_sum{key="all",method="",status="503"} 1.9743323400000001
http_request_duration_seconds_count{key="all",method="",status="503"} 9
# HELP http_request_size_bytes number of bytes read from the request
# TYPE http_request_size_bytes counter
http_request_size_bytes{key="/",method="GET",status="200"} 0
http_request_size_bytes{key="/",method="GET",status="304"} 0
http_request_size_bytes{key="/app-icon-144x144.png",method="GET",status="200"} 0
http_request_size_bytes{key="/app-icon-144x144.png",method="GET",status="304"} 0
http_request_size_bytes{key="/appConfig.js",method="GET",status="304"} 0
http_request_size_bytes{key="/favicon.ico",method="GET",status="200"} 0
http_request_size_bytes{key="/manifest.json",method="GET",status="304"} 0
http_request_size_bytes{key="/outdatedbrowser.min.css",method="GET",status="200"} 0
http_request_size_bytes{key="/outdatedbrowser.min.css",method="GET",status="304"} 0
http_request_size_bytes{key="/outdatedbrowser.min.js",method="GET",status="200"} 0
http_request_size_bytes{key="/outdatedbrowser.min.js",method="GET",status="304"} 0
http_request_size_bytes{key="/services/catalog/1.0/metrics",method="GET",status="200"} 0
http_request_size_bytes{key="/services/catalog/1.0/summary",method="GET",status="200"} 0
http_request_size_bytes{key="/services/data/latest/props",method="GET",status="200"} 0
http_request_size_bytes{key="/services/data/latest/read",method="POST",status="200"} 1379
http_request_size_bytes{key="/services/data/latest/self",method="GET",status="200"} 0
http_request_size_bytes{key="/services/data/latest/show",method="GET",status="200"} 0
http_request_size_bytes{key="/services/data/latest/static",method="GET",status="200"} 0
http_request_size_bytes{key="/services/data/latest/static",method="GET",status="304"} 0
http_request_size_bytes{key="/services/data/latest/stream",method="GET",status="200"} 0
http_request_size_bytes{key="/services/data/latest/stream",method="GET",status="206"} 0
http_request_size_bytes{key="/services/data/latest/stream",method="GET",status="304"} 0
http_request_size_bytes{key="/services/gm-control-api/1.0/v1.0",method="GET",status="200"} 0
http_request_size_bytes{key="/services/jwt/latest/policies",method="GET",status="200"} 0
http_request_size_bytes{key="/services/jwt/latest/policies",method="GET",status="401"} 0
http_request_size_bytes{key="/services/jwt/latest/tokens",method="GET",status="307"} 0
http_request_size_bytes{key="/services/kibana/1.0/api",method="GET",status="200"} 0
http_response_size_bytes{key="all",method="",status="200"} 1.61519157e+08
http_response_size_bytes{key="all",method="",status="206"} 8.7419618e+07
http_response_size_bytes{key="all",method="",status="304"} 0
http_response_size_bytes{key="all",method="",status="307"} 67
http_response_size_bytes{key="all",method="",status="401"} 102
http_response_size_bytes{key="all",method="",status="503"} 513
# HELP non_tls_requests Number of requests not using TLS.
...
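
The histogram series above expose a `_sum` and a `_count` per label set, so a mean request duration can be derived from them. The following Python sketch parses a few sample lines with a deliberately simplified regular expression (it does not handle escaped quotes or braces inside label values) and computes that mean. The names `parse_samples` and `mean_duration_seconds` are hypothetical; a real consumer would more likely rely on Prometheus itself or an official client library.

```python
import re

SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Parse labelled sample lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip elisions such as "..." and unlabelled lines
        labels = dict(LABEL_RE.findall(match["labels"]))
        samples.append((match["name"], labels, float(match["value"])))
    return samples


def mean_duration_seconds(samples, status, key="all"):
    """Mean request duration for one status code, from the _sum and _count series."""
    total = count = None
    for name, labels, value in samples:
        if labels.get("key") != key or labels.get("status") != status:
            continue
        if name == "http_request_duration_seconds_sum":
            total = value
        elif name == "http_request_duration_seconds_count":
            count = value
    if total is None or not count:
        return None
    return total / count


text = """
...
http_request_duration_seconds_bucket{key="all",method="",status="503",le="+Inf"} 9
http_request_duration_seconds_sum{key="all",method="",status="503"} 1.9743323400000001
http_request_duration_seconds_count{key="all",method="",status="503"} 9
# HELP http_request_size_bytes number of bytes read from the request
"""

samples = parse_samples(text)
mean_503 = mean_duration_seconds(samples, "503")
```

For the 503 series in the sample, this gives roughly 0.22 seconds per request.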