SiftD Query Engine (SQE) App for Splunk

The SQE App for Splunk enables real-time aggregate analysis of logs, metrics, and other operational information across multiple providers, including Datadog, AWS CloudWatch, GCP Logging, Grafana, Kubernetes, GitHub, and more.

Installation

  • You need to have the proper administrative capabilities on your Splunk clusters to install apps.
  • SQE only needs to be installed on Splunk's search head tier. The default app package supports Linux servers running on x86-64. For other platforms, please contact us at info@siftd.ai.
  • SQE works on Splunk Enterprise (customer hosted) or Splunk Cloud. However, SQE is not yet available as a public app for Splunk Cloud. To use in a Splunk Cloud environment, you will need to deploy it as a private App.

Getting Started

  1. Set up your provider connections in the 'Connections' tab. See provider-specific details here.
  2. Use the 'Explorer' tab to discover resource names (rn) from your providers.
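With a connection in place, the simplest query pulls recent events from a single resource. Only rn is required; query_type defaults to "search". The connection name and resource path below are illustrative:

```
| siftd rn="aws0://group/accessloggroup/stream/accesslogstream"
```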

Query Syntax

Logs

| siftd <OPTIONS> [FILTER.<provider_type>="<provider-specific-filter>"] [FILTER <generic-filter>]

Metrics

| siftd <OPTIONS> [METRICS_QUERY <provider specific query>]

Options

| Option | Description | Default | Required |
| ------ | ----------- | ------- | -------- |
| rn | Comma-separated list of resource URIs | - | Yes |
| query_type | Type of query (search or metrics) | "search" | No |
| minPageSize | Minimum chunk size for paging events | 100 | No |
| maxPageSize | Maximum chunk size for paging events | 10000 | No |
| limit | Maximum number of events to return | 100000 | No |
| step | Duration for metrics queries (number with an h, m, or s unit suffix) | - | No |
| timeout | Provider connection timeout in seconds | 60 | No |
| earliest | Earliest time in seconds.subseconds | Splunk time picker | No |
| latest | Latest time in seconds.subseconds | Splunk time picker | No |
| maxEventSize | Maximum raw event size (0 for unlimited) | 10K | No |
| conn.<cxn_name>.filter_by | List of tag=value entries to filter metric results by | "" | No |
| conn.<cxn_name>.<agg_fn>_by | List of tag entries to group metric results by, aggregated by <agg_fn> | "" | No |

where:

  • cxn_name is the connection name
  • <agg_fn> is one of sum, min, max, avg.
  • Enclose the conn.<cxn_name>.xxxx options in double quotes to avoid interpretation by Splunk.

Resource URI Format

Resource URIs follow this structure:

<cxn_name>://<resource_path>
  • cxn_name: Connection name from your siftd-connections.conf file
  • resource_path: Path discoverable in the Explorer tab

Note: For query_type="metrics", only a single resource URI is allowed. For query_type="search", multiple comma-separated URIs are supported.
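For example, a Datadog connection named mydog (a hypothetical name, reused in the examples below) exposing a container CPU metric would yield the URI:

```
mydog://metrics/container.cpu.system
```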

Filters

SiftD supports two types of filters:

  1. Provider-specific filters (FILTER.<provider_type>)

    • Passed directly to the provider
    • Must be enclosed in double quotes
    • Must be specified before any generic filters
    • Syntax follows the provider's API documentation
  2. Generic filters (FILTER)

    • Uses SPL-like boolean keyword expressions
    • Supports AND/OR/NOT operators
    • Automatically translated for each provider

Provider specific metric queries

To write more complex metric queries, use the METRICS_QUERY escape hatch.

This is a query in the provider-specific language (PromQL for a p8s connection, or the Datadog query language for a ddog connection) that is passed through as is. Enclose it in double quotes to avoid any interpretation by Splunk.

When METRICS_QUERY is specified, conn.<cxn_name>.filter_by and conn.<cxn_name>.<agg_fn>_by are ignored and the given METRICS_QUERY is passed to the provider.

For convenience, to avoid repeating the metric name (which should already be provided as part of rn="..." option), use __1 to represent the metric name.

See the examples below.

Examples

Search logs across AWS and GCP providers:

| siftd query_type="search" 
rn="aws0://group/accessloggroup/stream/accesslogstream,gcp0://projects/myk8slogs"

Search with keyword filters and result limit:

| siftd query_type=search 
rn="aws0://group/accessloggroup/stream/accesslogstream,gcp0://projects/myk8slogs"
limit=1000
FILTER (ERROR OR WARN) AND sourcetype=access_combined

Mixed Filter Types

Using provider-specific and generic filters:

| siftd query_type=search 
rn="aws0://group/accessloggroup/stream/accesslogstream,gcp0://projects/myk8slogs"
FILTER.aws "(err or warning) and (sourcetype=access_combined)"
FILTER (ERROR OR WARN) AND (sourcetype=access_combined)

Metrics Queries

Query Datadog for k8s container CPU metrics:

| siftd query_type=metrics
rn="mydog://metrics/container.cpu.system"
conn.mydog.avg_by=container_name
conn.mydog.filter_by="container_name:*kube*"

Query Prometheus for Kubernetes cpu usage for a specific container:

| siftd query_type=metrics 
rn="p8s0://metrics/kubernetes_io:container_cpu_core_usage_time"
conn.p8s0.filter_by="pod_name=~.*otel.*,container_name=otel-collector"

Query Prometheus for Kubernetes CPU usage using PromQL:

| siftd query_type=metrics
rn="p8s0://metrics/kubernetes_io:container_cpu_core_usage_time"
METRICS_QUERY "sum by (pod_name) (__1{pod_name=~'.*otel.*'})"

Role Based Resource Filters

  • Administrators of the SQE App can control which connections and subset of resources within those connections are accessible to different Splunk users based on their role(s).
  • To configure, go to the Resource Filters page under the app's "Settings" dropdown menu.
    • On this page, you can create associations between Splunk roles (like admin, power, user, or any Splunk-configured role) and a regex that is applied to resource URIs with the <cxn_name>://<resource_path> pattern.
    • For example, by default, the admin role allows access to every connection and every resource within those connections; this rule is expressed as the regex pattern .*:.*. A pattern of gcp.*:.* would grant access to all resources under connections whose names start with gcp.
  • Because Splunk users can have many roles, the effective filter for any user is a disjunction (logical OR) of the access of all of their roles.
    • I.e., a user will have access to a resource if the URI for that resource matches the configured regex for any of their roles.
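The matching logic described above can be sketched in Python. This is only an illustration of the role-disjunction rule, not the app's actual implementation, and the role-to-pattern mapping is hypothetical:

```python
import re

# Hypothetical role -> resource-URI regex mapping, as would be configured
# on the Resource Filters page.
ROLE_FILTERS = {
    "admin": r".*:.*",          # every connection, every resource
    "gcp_reader": r"gcp.*:.*",  # connections whose names start with "gcp"
}

def can_access(user_roles, resource_uri):
    """A user may access a URI if it matches the regex of ANY of their roles."""
    return any(
        re.fullmatch(ROLE_FILTERS[role], resource_uri)
        for role in user_roles
        if role in ROLE_FILTERS
    )

print(can_access(["gcp_reader"], "gcp0://projects/myk8slogs"))             # True
print(can_access(["gcp_reader"], "aws0://group/accessloggroup"))           # False
print(can_access(["gcp_reader", "admin"], "aws0://group/accessloggroup"))  # True
```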

Additional Information

  • Once events are retrieved, you can use the full power of the SPL search pipeline for further analysis.
  • The siftd command can be used anywhere that a generating Splunk search command can be used, including in subsearches.
  • For authoring queries, it's recommended to use the Explorer tab.
  • The "Event Sampling" option on the Splunk Search page currently has no impact on siftd queries. Additional providers may support this option in the future.
  • Contact us at info@siftd.ai with questions or suggestions for improvement, or to discuss options for enterprise support.
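As an illustration of the first point, siftd output can be piped into ordinary SPL commands. The resource URI here is the same illustrative one used in the examples above:

```
| siftd query_type=search rn="aws0://group/accessloggroup/stream/accesslogstream"
| stats count by sourcetype
```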

Configuring Connections

Most connection types require you to specify credentials such as an API key. Those credentials need to be linked with a user or service account that has sufficient privileges to read or list the resources that you want to be able to search through the siftd command. Details for each provider type are below. Note that you can configure multiple connections of the same connection type with different credentials.

Connection configurations can be set via the SQE App's Connections page, or manually configured in $SPLUNK_HOME/apps/siftd-query-engine/local/siftd-connections.conf. Note that the authentication token setting is stored separately in passwords.conf. For this reason, it is strongly recommended to use the app's UI to edit configurations.


AWS CloudWatch Logs

  • provider = aws
  • Credentials Format: {"Region":<AWS_REGION>,"AwsAccessKeyId":<ACCESS_KEY_ID>,"AwsSecretAccessKey":<SECRET_ACCESS_KEY>}
  • Additional Required Settings: none

Getting your AWS Credentials

  1. Log into your AWS console
  2. Navigate to IAM->Security Credentials
  3. In the "Access Keys" section, click on "Create access key"
  4. Follow the prompts to create an access key. Fill in the region, accessKeyId, and secretAccessKey in the credentials format referenced above and paste it into the Credentials edit box for your connection.
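Putting it together, the pasted credentials would look like the following. All values are placeholders, not real keys:

```json
{
  "Region": "us-east-1",
  "AwsAccessKeyId": "AKIAXXXXXXXXXXXXXXXX",
  "AwsSecretAccessKey": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
```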

Azure Monitor Logs

  • provider = azr
  • Credentials Format: {"TenantId":<TENANT_ID>,"WorkspaceId":<WORKSPACE_ID>,"ClientId":<CLIENT_ID>,"ClientSecret":<CLIENT_SECRET>}
  • Additional Required Settings: none

Getting your Azure Credentials

  1. Follow the instructions to register SQE as an app in Microsoft Entra to treat it as a service principal. Generate a client secret for it.
  2. From the app overview page in Entra, copy ClientId and TenantId. Also copy the ClientSecret that you generated.
  3. Follow the instructions provided here to add required permissions for the app.
  4. In addition to the Data.Read permissions, SQE also requires Reader access to your Log Analytics workspace to properly list available logging resources, and user_impersonation to be able to use the Query API.
  5. Copy the WorkspaceId for the workspace you provided access to above.
  6. Fill in the relevant IDs and Secrets into the credentials format specified above

Datadog

  • provider = ddog
  • Credentials Format: {"api_key": <API_KEY>, "application_key": <APPLICATION_KEY>}
  • Additional Required Settings: api_server (e.g. us5.datadoghq.com)

Getting your Datadog Credentials

  1. Follow the instructions here

GitHub

  • provider = github
  • Credentials Format: <GITHUB_PERSONAL_ACCESS_TOKEN>
  • Additional Required Settings: org_name (e.g. siftd)

Getting your Github Credentials

  1. Follow the instructions here

Google Cloud Logging

  • provider = gcp
  • Credentials Format: {"type":"service_account","project_id":<PROJECT>,"private_key_id":<ID>,"private_key":<PRIVATE_KEY>, ...}
  • Additional Required Settings: none

Getting your GCP Credentials

  1. Go to your Google Cloud Console
  2. Create a new service account that has at least the permissions of the "Logs Viewer" role (e.g. "Logging Admin" also works). See here
  3. Create a new service account JSON key
  4. Paste in the entire contents of the private key JSON file as the Credentials (see format above)

Grafana Server

  • provider = grafana
  • Credentials Format: <GRAFANA_API_KEY>
  • Additional Required Settings: server_url (e.g. http://localhost:3000)

Getting your Grafana Server API Token

  1. In your Grafana UI, navigate to Dashboard Settings -> Administration -> Users and Access -> Service Accounts
  2. Select a service account you would like to use, or create a new one by clicking on Add service account
  3. Open the page for your Service Account and click on Add Service Account Token -> Generate Token
  4. Paste this token into the Credentials field in your connection setup

Jenkins Automation Server

  • provider = jenkins
  • Credentials Format: <JENKINS_API_KEY>
  • Additional Required Settings: server_url (e.g. http://localhost:8280), username (e.g. admin)

Getting your Jenkins API Token

  1. See documentation here

Kubernetes Cluster

  • provider = k8s
  • Credentials Format: <SERVICE_ACCOUNT_TOKEN>
  • Additional Required Settings: api_server (address of your kubernetes cluster's API server, e.g. https://127.0.0.1)

Setting up your Kubernetes Cluster Access Token

  1. This process assumes you have kubectl set up already to manage your cluster. E.g. for a cluster hosted on GKE, the directions are here
  2. Create a service account on your cluster. E.g.
kubectl create serviceaccount pod-log-reader -n default
  3. Create a role with the necessary permissions:
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-log-reader-role
rules:
- apiGroups: [""]
  resources: ["namespaces", "nodes"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
EOF
  4. Attach the role to the service account:
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-log-reader-binding
subjects:
- kind: ServiceAccount
  name: pod-log-reader
  namespace: default
roleRef:
  kind: ClusterRole
  name: pod-log-reader-role
  apiGroup: rbac.authorization.k8s.io
EOF
  5. Create a secret for the service account:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: pod-log-reader-secret
  annotations:
    kubernetes.io/service-account.name: pod-log-reader
type: kubernetes.io/service-account-token
EOF
  6. Retrieve the secret for the service account:
kubectl describe secrets/pod-log-reader-secret

---------------------
Example output:
---------------------

Name: pod-log-reader-secret
Namespace: default
Labels: kubernetes.io/legacy-token-last-used=2024-07-20
Annotations: kubernetes.io/service-account.name: pod-log-reader
             kubernetes.io/service-account.uid: 67e3471d-1be2-48a4-9d92-1fa2f7e96d9d

Type: kubernetes.io/service-account-token

Data
====
ca.crt: 1509 bytes
namespace: 7 bytes
token: ...
  7. Copy the token field in the Data part of the output. This is the bearer token that should be used as your connection Credential.
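Alternatively, the token can be extracted and decoded in one step with standard kubectl jsonpath output (the secret name matches the steps above; token data is stored base64-encoded in the Secret):

```shell
kubectl get secret pod-log-reader-secret -o jsonpath='{.data.token}' | base64 -d
```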

Prometheus

SQE supports either a standalone Prometheus server or GCP's managed Prometheus service.

  • provider = p8s
  • Credentials Format:
    • For a standalone server: {"username": "<username>", "password": "<secret>"}
    • For GCP managed Prometheus, use the downloaded service_account key JSON (see Configuring Google Cloud above)
  • Additional Required Settings:
    • server_url (e.g. https://monitoring.googleapis.com/v1/projects/my-project/location/global/prometheus or http://localhost:9090)
    • type (gcp or server)
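As an illustration, a manually configured stanza in siftd-connections.conf for a standalone Prometheus server might look like the following. The stanza and key names follow the settings described above, but treat the exact file syntax as an assumption; prefer the Connections UI, which also stores the credentials in passwords.conf:

```
[p8s0]
provider = p8s
server_url = http://localhost:9090
type = server
```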