Monitoring and observing Vector

Use logs and metrics generated by Vector itself in your Vector topology

Although Vector is primarily used to handle observability data from a wide variety of sources, we also strive to make Vector highly observable itself. To that end, Vector provides two sources, internal_logs and internal_metrics, that you can use to handle logs and metrics produced by Vector just like you would logs and metrics from any other source.

Logs

Vector provides clear, informative, well-structured logs via the internal_logs source. This section shows you how to use them in your Vector topology.

Which logs Vector pipes through the internal_logs source is determined by the log level, which defaults to info.

In addition to exposing logs through the internal_logs source, Vector also writes its logs to stderr, where they can be captured by Kubernetes, systemd, or whatever else you use to run Vector.

Accessing logs

You can access Vector’s logs by adding an internal_logs source to your topology. Here’s an example configuration that takes Vector’s logs and pipes them to the console as plain text:

[sources.vector_logs]
type = "internal_logs"

[sinks.console]
type = "console"
inputs = ["vector_logs"]
encoding.codec = "text"

Using Vector logs

Once Vector logs enter your topology through the internal_logs source, you can treat them like logs from any other system, i.e. you can transform them and send them off to any number of sinks. The configuration below, for example, transforms Vector’s logs using the remap transform and Vector Remap Language and then stores those logs in ClickHouse:

[sources.vector_logs]
type = "internal_logs"

[transforms.modify]
type = "remap"
inputs = ["vector_logs"]

# Reformat the timestamp to Unix time
source = '''
  .timestamp = to_unix_timestamp(to_timestamp!(.timestamp))
'''

[sinks.database]
type = "clickhouse"
inputs = ["modify"]
host = "http://localhost:8123"
table = "vector-log-data"
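Because internal logs behave like any other log events, you can also filter them before they reach a sink. The sketch below keeps only warnings and errors and prints them as JSON. It assumes that each internal log event carries its level in a .metadata.level field with uppercase values such as "WARN" and "ERROR"; inspect an actual event from your Vector version before relying on that. The component names only_problems and console are illustrative.

```toml
[sources.vector_logs]
type = "internal_logs"

# Keep only warnings and errors.
# Assumption: the level lives in .metadata.level and is uppercase.
[transforms.only_problems]
type = "filter"
inputs = ["vector_logs"]
condition = '.metadata.level == "WARN" || .metadata.level == "ERROR"'

# Print the surviving events as JSON so that all fields are visible.
[sinks.console]
type = "console"
inputs = ["only_problems"]
encoding.codec = "json"
```

Printing as JSON rather than text is a convenient way to discover which fields the events actually contain before you write a stricter condition.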

Configuring logs

Levels

Vector logs at the info level by default. You can set a different level when starting up your instance using either command-line flags or the VECTOR_LOG environment variable. The table below details these options:

Method | Description
-v flag | Drops the log level to debug
-vv flag | Drops the log level to trace
-q flag | Raises the log level to warn
-qq flag | Raises the log level to error
-qqq flag | Disables logging
VECTOR_LOG=<level> environment variable | Sets the log level. Must be one of trace, debug, info, warn, error, off.

Stack traces

You can enable full error backtraces by setting the RUST_BACKTRACE=full environment variable. More on this in the Troubleshooting guide.

Metrics

You can monitor metrics produced by Vector using the internal_metrics source. As with Vector’s internal logs, you can configure an internal_metrics source and use the piped-in metrics however you wish. Here’s an example configuration that delivers Vector’s metrics to a Prometheus remote write endpoint.

[sources.vector_metrics]
type = "internal_metrics"

[sinks.prometheus]
type = "prometheus_remote_write"
endpoint = "https://localhost:8087/"
inputs = ["vector_metrics"]
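If your Prometheus server scrapes targets rather than receiving remote writes, a variant of the configuration above could expose the metrics over HTTP instead. This is a minimal sketch, assuming the prometheus_exporter sink with its address option and the scrape_interval_secs option of the internal_metrics source; the listen address and the interval shown here are illustrative values, not recommendations.

```toml
[sources.vector_metrics]
type = "internal_metrics"
# How often Vector collects its own metrics (illustrative value).
scrape_interval_secs = 15

# Serve the collected metrics for a Prometheus server to scrape.
[sinks.prometheus_exporter]
type = "prometheus_exporter"
inputs = ["vector_metrics"]
# Illustrative listen address; pick one that suits your environment.
address = "0.0.0.0:9598"
```

With this in place, you would point a Prometheus scrape job at the chosen address.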

Metrics catalogue

The table below provides a list of internal metrics provided by Vector. See the docs for the internal_metrics source for more detailed information about the available metrics.

Description | Data type
Number of clients attached to a component. | gauge
The average round-trip time (RTT) for the current window. | histogram
The number of outbound requests currently awaiting a response. | histogram
The concurrency limit that the adaptive concurrency feature has decided on for this current window. | histogram
The observed round-trip time (RTT) for requests. | histogram
The number of events recorded by the aggregate transform. | counter
The number of failed metric updates, incremental adds, encountered by the aggregate transform. | counter
The number of flushes done by the aggregate transform. | counter
The number of times the Vector GraphQL API has been started. | counter
The number of bytes currently in the buffer. | gauge
The number of events dropped by this non-blocking buffer. | counter
The number of events currently in the buffer. | gauge
The number of bytes received by this buffer. | counter
The number of events received by this buffer. | counter
The duration spent sending a payload to this buffer. | histogram
The number of bytes sent by this buffer. | counter
The number of events sent by this buffer. | counter
Has a fixed value of 1.0. Contains build information such as Rust and Vector versions. | gauge
The total number of files checkpointed. | counter
The total number of errors identifying files via checksum. | counter
The total number of metrics collections completed for this component. | counter
The duration spent collecting metrics for this component. | histogram
The total number of times a command has been executed. | counter
The command execution duration in seconds. | histogram
The number of events dropped by this component. | counter
The total number of errors encountered by this component. | counter
The size in bytes of each event received by the source. | histogram
The number of raw bytes accepted by this component from source origins. | counter
The number of event bytes accepted by this component either from tagged origins like file and uri, or cumulatively from other origins. | counter
A histogram of the number of events passed in each internal batch in Vector’s internal topology. Note that this is separate from sink-level batching. It is mostly useful for low-level debugging of performance issues in Vector caused by small internal batches. | histogram
The number of events accepted by this component either from tagged origins like file and uri, or cumulatively from other origins. | counter
The number of raw bytes sent by this component to destination sinks. | counter
The total number of event bytes emitted by this component. | counter
The total number of events emitted by this component. | counter
The total number of times a connection has been established. | counter
The total number of errors reading datagram. | counter
The total number of errors sending data via the connection. | counter
The total number of times the connection has been shut down. | counter
The total number of container events processed. | counter
The total number of times Vector stopped watching for container logs. | counter
The total number of times Vector started watching for container logs. | counter
The total number of events discarded by this component. | counter
The total number of files Vector has found to watch. | counter
The total number of files deleted. | counter
The total number of times Vector has resumed watching a file. | counter
The total number of times Vector has stopped watching a file. | counter
The duration spent handling a gRPC request. | histogram
The total number of gRPC messages received. | counter
The total number of gRPC messages sent. | counter
The total number of sent HTTP requests, tagged with the request method. | counter
The round-trip time (RTT) of HTTP requests, tagged with the response code. | histogram
The total number of HTTP requests, tagged with the response code. | counter
The round-trip time (RTT) of HTTP requests. | histogram
The total number of HTTP requests issued by this component. | counter
The duration spent handling an HTTP request. | histogram
The total number of HTTP requests received. | counter
The total number of HTTP responses sent. | counter
The total number of metrics emitted from the internal metrics registry. | gauge
The total number of metrics emitted from the internal metrics registry. This metric is deprecated in favor of internal_metrics_cardinality. | counter
The total number of invalid records that have been discarded. | counter
The total number of failures to parse a message as a JSON object. | counter
The total number of edge cases encountered while picking format of the Kubernetes log message. | counter
Total number of message bytes (including framing) received from Kafka brokers. | counter
Total number of messages consumed, not including ignored messages (due to offset, etc), from Kafka brokers. | counter
The Kafka consumer lag. | gauge
Total number of message bytes (including framing, such as per-Message framing and MessageSet/batch framing) transmitted to Kafka brokers. | counter
Total number of messages transmitted (produced) to Kafka brokers. | counter
Current number of messages in producer queues. | gauge
Current total size of messages in producer queues. | gauge
Total number of bytes transmitted to Kafka brokers. | counter
Total number of requests sent to Kafka brokers. | counter
Total number of bytes received from Kafka brokers. | counter
Total number of responses received from Kafka brokers. | counter
The total memory currently being used by the Lua runtime. | gauge
The total number of failed efforts to refresh AWS EC2 metadata. | counter
The total number of AWS EC2 metadata refreshes. | counter
The number of current open connections to Vector. | gauge
The total number of open files. | counter
The total number of Protocol Buffers errors thrown during communication between Vector instances. | counter
The total number of times the Vector instance has quit. | counter
The total number of times the Vector instance has been reloaded. | counter
The total number of errors sending messages. | counter
The difference between the timestamp recorded in each event and the time when it was ingested, expressed as fractional seconds. | histogram
The number of outstanding Splunk HEC indexer acknowledgement acks. | gauge
The total number of successful deletions of SQS messages. | counter
The total number of SQS messages successfully processed. | counter
The total number of times successfully receiving SQS messages. | counter
The total number of received SQS messages. | counter
The total number of times an S3 record in an SQS message was ignored (for an event that was not ObjectCreated). | counter
The number of stale events that Vector has flushed. | counter
The total number of times the Vector instance has been started. | counter
The total number of errors reading from stdin. | counter
The total number of times the Vector instance has been stopped. | counter
The total number of streams. | counter
The total number of events discarded because the tag has been rejected after hitting the configured value_limit. | counter
The total number of errors encountered parsing RFC 3339 timestamps. | counter
The total number of seconds the Vector instance has been up. | gauge
The total number of errors converting bytes to a UTF-8 string in UDP mode. | counter
A ratio from 0 to 1 of the load on a component. A value of 0 would indicate a completely idle component that is simply waiting for input. A value of 1 would indicate a component that is never idle. This value is updated every 5 seconds. | gauge
The total number of times new values for a key have been rejected because the value limit has been reached. | counter
The total number of times the Windows service has been installed. | counter
The total number of times the Windows service has been restarted. | counter
The total number of times the Windows service has been started. | counter
The total number of times the Windows service has been stopped. | counter
The total number of times the Windows service has been uninstalled. | counter
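
Since internal metrics flow through the topology like any other metric events, you can narrow them down to the few you care about before shipping them. The sketch below keeps a single error counter and prints it to the console. It assumes that the metric described above as "The total number of errors encountered by this component" is named component_errors_total and that metric events expose their name as .name in VRL; confirm both against the internal_metrics source documentation for your version. The component name errors_only is illustrative.

```toml
[sources.vector_metrics]
type = "internal_metrics"

# Keep only the per-component error counter.
# Assumption: the metric name is component_errors_total.
[transforms.errors_only]
type = "filter"
inputs = ["vector_metrics"]
condition = '.name == "component_errors_total"'

# Print the matching metric events as JSON for inspection.
[sinks.console]
type = "console"
inputs = ["errors_only"]
encoding.codec = "json"
```

Filtering early keeps the volume sent to downstream sinks small, at the cost of discarding metrics you might want later.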

Troubleshooting

For more information, see the Troubleshooting guide.

How it works

Event-driven observability

Vector employs an event-driven observability strategy that ensures consistent and correlated telemetry data. You can read more about our approach in RFC 2064.

Log rate limiting

Vector rate limits log events in the hot path. This enables you to get granular insight without the risk of saturating IO and disrupting the service. The trade-off is that repeated occurrences of the same log event are suppressed rather than logged each time.
