SiftD Query Engine (SQE) App for Splunk

The SQE App for Splunk enables real-time aggregate analysis of logs, metrics, and other operational information across multiple providers, including Datadog, AWS CloudWatch, GCP Logging, Grafana, Kubernetes, GitHub, and more.

Installation

  • You need to have the proper administrative capabilities on your Splunk clusters to install apps.
  • SQE only needs to be installed on Splunk's search head tier. The default app package supports Linux servers running on x86-64. For other platforms, please contact us at info@siftd.ai.
  • SQE works on Splunk Enterprise (customer hosted) or Splunk Cloud. However, SQE is not yet available as a public app for Splunk Cloud. To use in a Splunk Cloud environment, you will need to deploy it as a private App.

Getting Started

  1. Set up your provider connections in the 'Connections' tab. See provider-specific details here.
  2. Use the 'Explorer' tab to discover resource names (rn) from your providers.
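With a connection in place, the simplest query pulls recent events from a single resource. Only rn is required; query_type defaults to "search". The connection name and resource path below are illustrative:

```
| siftd rn="aws0://group/accessloggroup/stream/accesslogstream"
```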

Query Syntax

Logs

| siftd <OPTIONS> [FILTER.<provider_type>="<provider-specific-filter>"] [FILTER <generic-filter>]

Metrics

| siftd <OPTIONS> [METRICS_QUERY <provider specific query>]

Options

| Option | Description | Default | Required |
| ------ | ----------- | ------- | -------- |
| rn | Comma-separated list of resource URIs | - | Yes |
| query_type | Type of query (search or metrics) | "search" | No |
| minPageSize | Minimum chunk size for paging events | 100 | No |
| maxPageSize | Maximum chunk size for paging events | 10000 | No |
| limit | Maximum number of events to return | 100000 | No |
| step | Duration for metrics queries (number with an h, m, or s unit suffix) | - | No |
| timeout | Provider connection timeout in seconds | 60 | No |
| earliest | Earliest time in seconds.subseconds | Splunk time picker | No |
| latest | Latest time in seconds.subseconds | Splunk time picker | No |
| maxEventSize | Maximum raw event size (0 for unlimited) | 10K | No |
| conn.<cxn_name>.filter_by | List of tag=value entries to filter metric results by | "" | No |
| conn.<cxn_name>.<agg_fn>_by | List of tag entries to group metric results by, aggregated by <agg_fn> | "" | No |

where:

  • cxn_name is the connection name
  • <agg_fn> is one of sum, min, max, avg.
  • Enclose the conn.<cxn_name>.xxxx options in double quotes to avoid interpretation by Splunk.

Resource URI Format

Resource URIs follow this structure:

<cxn_name>://<resource_path>
  • cxn_name: Connection name from your siftd-connections.conf file
  • resource_path: Path discoverable in the Explorer tab

Note: For query_type="metrics", only a single resource URI is allowed. For query_type="search", multiple comma-separated URIs are supported.
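For example, a Datadog connection named mydog (a hypothetical name, reused in the examples below) exposing a container CPU metric would yield the URI:

```
mydog://metrics/container.cpu.system
```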

Filters

SiftD supports two types of filters:

  1. Provider-specific filters (FILTER.<provider_type>)

    • Passed directly to the provider
    • Must be enclosed in double quotes
    • Must be specified before any generic filters
    • Syntax follows the provider's API documentation
  2. Generic filters (FILTER)

    • Uses SPL-like boolean keyword expressions
    • Supports AND/OR/NOT operators
    • Automatically translated for each provider

Provider specific metric queries

To write more complex metric queries, use the METRICS_QUERY escape hatch.

This is a query in the provider-specific language (PromQL for a p8s connection, or the Datadog query language for a ddog connection) that is passed through as is. Enclose it in double quotes to avoid any interpretation by Splunk.

When METRICS_QUERY is specified, conn.<cxn_name>.filter_by and conn.<cxn_name>.<agg_fn>_by are ignored and the given METRICS_QUERY is passed to the provider.

For convenience, to avoid repeating the metric name (which should already be provided as part of rn="..." option), use __1 to represent the metric name.

See the examples below.

Examples

Search logs across AWS and GCP providers:

| siftd query_type="search" 
rn="aws0://group/accessloggroup/stream/accesslogstream,gcp0://projects/myk8slogs"

Search with keyword filters and result limit:

| siftd query_type=search 
rn="aws0://group/accessloggroup/stream/accesslogstream,gcp0://projects/myk8slogs"
limit=1000
FILTER (ERROR OR WARN) AND sourcetype=access_combined

Mixed Filter Types

Using provider-specific and generic filters:

| siftd query_type=search 
rn="aws0://group/accessloggroup/stream/accesslogstream,gcp0://projects/myk8slogs"
FILTER.aws "(err or warning) and (sourcetype=access_combined)"
FILTER (ERROR OR WARN) AND (sourcetype=access_combined)

Metrics Queries

Query Datadog for k8s container CPU metrics:

| siftd query_type=metrics
rn="mydog://metrics/container.cpu.system"
conn.mydog.avg_by=container_name
conn.mydog.filter_by="container_name:*kube*"

Query Prometheus for Kubernetes cpu usage for a specific container:

| siftd query_type=metrics 
rn="p8s0://metrics/kubernetes_io:container_cpu_core_usage_time"
conn.p8s0.filter_by="pod_name=~.*otel.*,container_name=otel-collector"

Query Prometheus for Kubernetes CPU usage using PromQL:

| siftd query_type=metrics
rn="p8s0://metrics/kubernetes_io:container_cpu_core_usage_time"
METRICS_QUERY "sum by (pod_name) (__1{pod_name=~'.*otel.*'})"

Role Based Resource Filters

  • Administrators of the SQE App can control which connections and subset of resources within those connections are accessible to different Splunk users based on their role(s).
  • To configure, go to the Resource Filters page under the app's "Settings" dropdown menu.
    • On this page, you can create associations between Splunk roles (like admin, power, user, or any Splunk-configured role) and a regex that is applied to resource URIs with the <cxn_name>://<resource_path> pattern.
    • For example, by default, the admin role allows access to every connection and every resource within those connections; this rule is expressed as the regex pattern .*:.*. A pattern of gcp.*:.* would grant access to all resources under connections whose names start with gcp.
  • Because Splunk users can have many roles, the effective filter for any user is a disjunction (logical OR) of the access of all of their roles.
    • I.e., a user will have access to a resource if the URI for that resource matches the configured regex for any of their roles.
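The matching logic described above can be sketched in Python. This is only an illustration of the role-disjunction rule, not the app's actual implementation, and the role-to-pattern mapping is hypothetical:

```python
import re

# Hypothetical role -> resource-URI regex mapping, as would be configured
# on the Resource Filters page.
ROLE_FILTERS = {
    "admin": r".*:.*",          # every connection, every resource
    "gcp_reader": r"gcp.*:.*",  # connections whose names start with "gcp"
}

def can_access(user_roles, resource_uri):
    """A user may access a URI if it matches the regex of ANY of their roles."""
    return any(
        re.fullmatch(ROLE_FILTERS[role], resource_uri)
        for role in user_roles
        if role in ROLE_FILTERS
    )

print(can_access(["gcp_reader"], "gcp0://projects/myk8slogs"))             # True
print(can_access(["gcp_reader"], "aws0://group/accessloggroup"))           # False
print(can_access(["gcp_reader", "admin"], "aws0://group/accessloggroup"))  # True
```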

Additional Information

  • Once events are retrieved, you can use the full power of the SPL search pipeline for further analysis.
  • The siftd command can be used anywhere that a generating Splunk search command can be used, including in subsearches.
  • For authoring queries, it's recommended to use the Explorer tab.
  • The "Event Sampling" option on the Splunk Search page currently has no impact on siftd queries. Additional providers may support this option in the future.
  • Contact us at info@siftd.ai with questions or suggestions for improvement, or to discuss options for enterprise support.
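As an illustration of the first point, siftd output can be piped into ordinary SPL commands. The resource URI here is the same illustrative one used in the examples above:

```
| siftd query_type=search rn="aws0://group/accessloggroup/stream/accesslogstream"
| stats count by sourcetype
```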

Configuring Connections

Most connection types require you to specify credentials such as an API key. Those credentials need to be linked with a user or service account that has sufficient privileges to read or list the resources that you want to be able to search through the siftd command. Details for each provider type are below. Note that you can configure multiple connections of the same connection type with different credentials.

Connection configurations can be set via the SQE App's Connections page, or manually configured in $SPLUNK_HOME/apps/siftd-query-engine/local/siftd-connections.conf. Note that the authentication token setting is stored separately in passwords.conf. For this reason, it is strongly recommended to use the app's UI to edit configurations.


AWS CloudWatch Logs

  • provider = aws
  • Credentials Format: {"Region":<AWS_REGION>,"AwsAccessKeyId":<ACCESS_KEY_ID>,"AwsSecretAccessKey":<SECRET_ACCESS_KEY>}
  • Additional Required Settings: none

Getting your AWS Credentials

  1. Log into your AWS console
  2. Navigate to IAM->Security Credentials
  3. In the "Access Keys" section, click on "Create access key"
  4. Follow the prompts to create an access key. Fill in the region, accessKeyId, and secretAccessKey in the credentials format referenced above and paste it into the Credentials edit box for your connection.
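Putting it together, the pasted credentials would look like the following. All values are placeholders, not real keys:

```json
{
  "Region": "us-east-1",
  "AwsAccessKeyId": "AKIAXXXXXXXXXXXXXXXX",
  "AwsSecretAccessKey": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
```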

Azure Monitor Logs

  • provider = azr
  • Credentials Format: {"TenantId":<TENANT_ID>,"WorkspaceId":<WORKSPACE_ID>,"ClientId":<CLIENT_ID>,"ClientSecret":<CLIENT_SECRET>}
  • Additional Required Settings: none

Getting your Azure Credentials

  1. Follow the instructions to register SQE as an app in Microsoft Entra to treat it as a service principal. Generate a client secret for it.
  2. From the app overview page in Entra, copy ClientId and TenantId. Also copy the ClientSecret that you generated.
  3. Follow the instructions provided here to add required permissions for the app.
  4. In addition to the Data.Read permissions, SQE also requires Reader access to your Log Analytics workspace to properly list available logging resources, and user_impersonation to be able to use the Query API.
  5. Copy the WorkspaceId for the workspace you provided access to above.
  6. Fill in the relevant IDs and Secrets into the credentials format specified above

Datadog

  • provider = ddog
  • Credentials Format: {"api_key": <API_KEY>, "application_key": <APPLICATION_KEY>}
  • Additional Required Settings: api_server (e.g. us5.datadoghq.com)

Getting your Datadog Credentials

  1. Follow the instructions here

GitHub

  • provider = github
  • Credentials Format: <GITHUB_PERSONAL_ACCESS_TOKEN>
  • Additional Required Settings: org_name (e.g. siftd)

Getting your Github Credentials

  1. Follow the instructions here

Google Cloud Logging

  • provider = gcp
  • Credentials Format: {"type":"service_account","project_id":<PROJECT>,"private_key_id":<ID>,"private_key":<PRIVATE_KEY>, ...}
  • Additional Required Settings: none

Getting your GCP Credentials

  1. Go to your Google Cloud Console
  2. Create a new service account that has at least the permissions of the "Logs Viewer" role (e.g. "Logging Admin" also works). See here
  3. Create a new service account JSON key
  4. Paste in the entire contents of the private key JSON file as the Credentials (see format above)

Grafana Server

  • provider = grafana
  • Credentials Format: <GRAFANA_API_KEY>
  • Additional Required Settings: server_url (e.g. http://localhost:3000)

Getting your Grafana Server API Token

  1. In your Grafana UI, navigate to Dashboard Settings -> Administration -> Users and Access -> Service Accounts
  2. Select a service account you would like to use, or create a new one by clicking on Add service account
  3. Open the page for your Service Account and click on Add Service Account Token -> Generate Token
  4. Paste this token into the Credentials field in your connection setup

Jenkins Automation Server

  • provider = jenkins
  • Credentials Format: <JENKINS_API_KEY>
  • Additional Required Settings: server_url (e.g. http://localhost:8280), username (e.g. admin)

Getting your Jenkins API Token

  1. See documentation here

Kubernetes Cluster

  • provider = k8s
  • Credentials Format: <SERVICE_ACCOUNT_TOKEN>
  • Additional Required Settings: api_server (address of your kubernetes cluster's API server, e.g. https://127.0.0.1)

Setting up your Kubernetes Cluster Access Token

  1. This process assumes you have kubectl set up already to manage your cluster. E.g. for a cluster hosted on GKE, the directions are here
  2. Create a service account on your cluster. E.g.
kubectl create serviceaccount pod-log-reader -n default
  3. Create a role with the necessary permissions:
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-log-reader-role
rules:
- apiGroups: [""]
  resources: ["namespaces", "nodes"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
EOF
  4. Attach the role to the service account:
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-log-reader-binding
subjects:
- kind: ServiceAccount
  name: pod-log-reader
  namespace: default
roleRef:
  kind: ClusterRole
  name: pod-log-reader-role
  apiGroup: rbac.authorization.k8s.io
EOF
  5. Create a secret for the service account:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: pod-log-reader-secret
  annotations:
    kubernetes.io/service-account.name: pod-log-reader
type: kubernetes.io/service-account-token
EOF
  6. Retrieve the secret for the service account:
kubectl describe secrets/pod-log-reader-secret

---------------------
Example output:
---------------------

Name: pod-log-reader-secret
Namespace: default
Labels: kubernetes.io/legacy-token-last-used=2024-07-20
Annotations: kubernetes.io/service-account.name: pod-log-reader
             kubernetes.io/service-account.uid: 67e3471d-1be2-48a4-9d92-1fa2f7e96d9d

Type: kubernetes.io/service-account-token

Data
====
ca.crt: 1509 bytes
namespace: 7 bytes
token: ...
  7. Copy the token field in the Data part of the output. This is the bearer token that should be used as your connection Credential.
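Alternatively, the token can be extracted and decoded in one step with standard kubectl jsonpath output (the secret name matches the steps above; token data is stored base64-encoded in the Secret):

```shell
kubectl get secret pod-log-reader-secret -o jsonpath='{.data.token}' | base64 -d
```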

Prometheus

SQE supports either a standalone Prometheus server or GCP's managed Prometheus service.

  • provider = p8s
  • Credentials Format:
    • For a standalone server: {"username": "<username>", "password": "<secret>"}
    • For GCP managed Prometheus, use the downloaded service_account key JSON (see Configuring Google Cloud above)
  • Additional Required Settings:
    • server_url (e.g. https://monitoring.googleapis.com/v1/projects/my-project/location/global/prometheus or http://localhost:9090)
    • type (gcp or server)
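As an illustration, a manually configured stanza in siftd-connections.conf for a standalone Prometheus server might look like the following. The stanza and key names follow the settings described above, but treat the exact file syntax as an assumption; prefer the Connections UI, which also stores the credentials in passwords.conf:

```
[p8s0]
provider = p8s
server_url = http://localhost:9090
type = server
```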