
Partial Data Fetch

info

The English user guide is currently in beta preview. Most of the documents have been automatically translated from the Japanese version. Should you find any inaccuracies, please reach out to Flatt Security.

Typically, a single workflow contains multiple jobs, and each job inspects all auditable data. As a result, an inspection may take a long time when a workflow contains many jobs or when there is a large amount of auditable data.

Shisho Cloud speeds up inspections with the following two features:

  • The ability to execute only some of the jobs in a workflow
  • The ability to retrieve only some of the auditable data and execute inspection code within a workflow job

This document describes the second feature, the optimization of data fetching. Currently, partial data fetch is applied automatically when certain conditions are met during partial workflow execution. For information on how to execute only specific jobs in a workflow, see this page.

warning

This feature is currently only available for a limited number of resources. We plan to expand support in the future.

info

The term "auditable resource" used repeatedly below is synonymous with auditable data. For the definition of a resource, see this page.

Specifications of Partial Data Fetch

The decide block and notify block of each job in a workflow each contain a GraphQL query definition. By default, all resources matching that definition are retrieved and passed to the policy code. For example, the following query retrieves all Compute Engine instances in a Google Cloud project.

{
  googleCloud {
    projects {
      computeEngine {
        instances {
          metadata {
            id
            displayName
          }
          shieldedInstanceConfiguration {
            enableSecureBoot
          }
        }
      }
    }
  }
}

When a GraphQL query is executed, partial data fetch checks the directives attached to the query and to the GraphQL schema. If the target field meets all of the following conditions and target resource IDs are specified, only a subset of the resources is retrieved.

  1. The @canBePartial directive is attached to the target field in the GraphQL schema.
  2. The resource kind is specified with the @resource directive on the target field in the GraphQL schema.
  3. The @mustBeTotal directive is not attached to the target field in the GraphQL query.

The @mustBeTotal directive in condition 3 is described in detail in the Explicit Fetch of All Resources section.

For example, the instances field used in the query above satisfies conditions 1 and 2 in the schema:

type GoogleCloudComputeEngine {

  ...

  """
  All Google Cloud Compute Engine instances
  """
  instances(
    condition: GoogleCloudComputeEngineCondition
  ): [GoogleCloudComputeEngineInstance!]!
    @resource(kind: "googlecloud-ce-instance")
    @canBePartial

  ...

}

info

For details on the GraphQL directives defined in Shisho Cloud, see this page.
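
The three conditions above can be read as a single predicate. The following Python sketch is only an illustrative model of that rule, not Shisho Cloud's implementation: the `FieldDirectives` class, the `fetches_partially` function, and the example resource ID are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FieldDirectives:
    """Hypothetical summary of the directives that apply to one query field."""

    can_be_partial: bool          # schema: @canBePartial is attached
    resource_kind: Optional[str]  # schema: kind given by @resource, if any
    must_be_total: bool = False   # query: @mustBeTotal is attached


def fetches_partially(field: FieldDirectives, target_ids: Sequence[str]) -> bool:
    """Return True when only a subset of resources would be retrieved."""
    return (
        len(target_ids) > 0                   # target resource IDs are specified
        and field.can_be_partial              # condition 1
        and field.resource_kind is not None   # condition 2
        and not field.must_be_total           # condition 3
    )


# The instances field from the schema excerpt above.
instances = FieldDirectives(
    can_be_partial=True,
    resource_kind="googlecloud-ce-instance",
)
narrowed = fetches_partially(instances, ["example-instance-id"])
```

In this model, dropping any one condition (or specifying no target resource IDs) falls back to retrieving all resources, which matches the default behavior described earlier.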

Further Resource Narrowing by Locator

The previous section described how partial data fetch narrows down auditable resources. Some resources can be narrowed down further using the @locatable directive attached to the target field in the GraphQL schema.

info

For details on the @locatable directive, see this page.

In the following example, the scenarios field is defined as a child resource of "Web Application" (user-web-application) in Shisho Cloud.

type Query {
  """
  All data from web application integration
  """
  webApps: [WebApp!]! @resource(kind: "user-web-application")
}

type WebApp {
  """
  Scenarios that the finding relates to
  """
  scenarios: [WebAppScenario!]! @locatable(parentKind: "user-web-application")

  ...
}

By specifying the locator information for each resource when partially executing a workflow, you can retrieve only the scenarios associated with a specific web application.
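
As a rough mental model of this parent/child narrowing, the Python sketch below filters child items by their parent. It makes no claim about the actual locator format or about how Shisho Cloud resolves locators: the `WEB_APPS` data, the `scenarios_for` function, and all IDs are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical data: each web application (parent) owns its scenarios (children).
WEB_APPS: Dict[str, List[str]] = {
    "app-a": ["login", "checkout"],
    "app-b": ["signup"],
}


def scenarios_for(
    web_apps: Dict[str, List[str]],
    parent_id: Optional[str] = None,
) -> List[str]:
    """Return all scenarios, or only those under one parent web application."""
    if parent_id is None:
        # No locator given: every scenario of every web application.
        return [s for scenarios in web_apps.values() for s in scenarios]
    # Locator given: only the scenarios of that web application.
    return list(web_apps.get(parent_id, []))


all_scenarios = scenarios_for(WEB_APPS)
app_a_scenarios = scenarios_for(WEB_APPS, parent_id="app-a")
```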

Explicit Fetch of All Resources

So far, we have seen how to narrow down resources. Now let's look at the following case.

{
  aws {
    accounts {
      network {
        vpcs {
          routeTables { # Retrieve some of the route tables
            id

            ...

          }
        }
      }
      ec2 {
        instances {
          vpc {
            routeTables { # Retrieve some of the route tables
              metadata {
                id
              }

              ...

            }
          }
        }
      }
    }
  }
}

In this case, if the conditions for narrowing down resources are met for both the aws.accounts.network.vpcs.routeTables field and the aws.accounts.ec2.instances.vpc.routeTables field, auditable resources are narrowed down for each field independently.

However, there may be cases where you want to pass all resources to the policy code. In such cases, you can retrieve all resources by adding the @mustBeTotal directive to the target field in the GraphQL query.

{
  aws {
    accounts {
      network {
        vpcs {
          routeTables @mustBeTotal { # Always retrieve all route tables
            id

            ...

          }
        }
      }
      ec2 {
        instances {
          vpc {
            routeTables { # Retrieve some of the route tables
              metadata {
                id
              }

              ...

            }
          }
        }
      }
    }
  }
}

If you modify an existing workflow or create a custom workflow, and its GraphQL query contains multiple fields with the same resource kind, consider adding this directive where the policy code needs the full set of resources.
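
The per-field effect of @mustBeTotal can be sketched in Python as follows. This is a simplified model, assuming both fields already satisfy the schema-side conditions for partial fetching. The `resources_for_field` function and the route table IDs are hypothetical.

```python
from typing import Dict, List, Sequence


def resources_for_field(
    all_ids: Sequence[str],
    target_ids: Sequence[str],
    must_be_total: bool,
) -> List[str]:
    """Return the resource IDs one query field would receive (illustrative only)."""
    if must_be_total or not target_ids:
        # @mustBeTotal, or no targets specified: retrieve everything.
        return list(all_ids)
    wanted = set(target_ids)
    return [rid for rid in all_ids if rid in wanted]


route_tables = ["rtb-1", "rtb-2", "rtb-3"]  # hypothetical IDs
targets = ["rtb-2"]

# The same resource kind is reached through two different query paths.
per_field: Dict[str, List[str]] = {
    "aws.accounts.network.vpcs.routeTables": resources_for_field(
        route_tables, targets, must_be_total=True
    ),
    "aws.accounts.ec2.instances.vpc.routeTables": resources_for_field(
        route_tables, targets, must_be_total=False
    ),
}
```

In this model the first field receives every route table while the second receives only the targeted one, mirroring the comments in the query above.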

info

For details on the @mustBeTotal directive, see this page.

Summary

Partial data fetch is applied automatically when certain conditions are met during partial workflow execution. You can therefore write workflows as usual, without taking any special steps, and they will be optimized automatically. When the policy code needs every resource, you can explicitly opt a field out of partial data fetch with the @mustBeTotal directive.