Blackbox Engine Improvements

February 4, 2026 · 3 min read

Software Engineer @ GMO Flatt Security Inc.

We've introduced improvements that raise both the recall and the precision of the Takumi Blackbox Engine.

Overview

To improve precision, we've tailored the verification method to each vulnerability type. To improve recall, the information reported by the crawling agent now reaches the pentest agent's context earlier and more efficiently.

Verification Strategies by Vulnerability Type

The Takumi Blackbox Assessment Engine consists of three main phases:

The crawling agent discovers assessment targets
The pentest agent discovers vulnerabilities and attempts exploitation
The verification agent investigates whether discovered issues can be exploited

In the third phase, we have countered false positives (reports of non-existent or non-reproducible vulnerabilities) caused by LLM hallucination by preparing dedicated non-LLM programs for each vulnerability type. However, when the judgment criteria depend on the application's specific threat model or defense implementation, such programs are difficult to write, and even when written they can produce false negatives (missed vulnerabilities that do exist).

In this update, we've expanded the conventional non-LLM verification programs while introducing new verification methods for cases requiring greater flexibility.

Tier 1:

For vulnerabilities where attack success can be determined deterministically from responses, we use conventional automated verification without LLMs. This update adds validators for SQL Injection and Server-Side Template Injection.

Vulnerability	Verification Method
SQLi	Time-based detection using delay payloads and response time measurement
SSTI	Searching proxy logs for evaluated mathematical expressions (e.g., the product of two numbers)
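The two checks in the table can be illustrated with a minimal Python sketch. This is not the engine's actual validator: the function names, the `send` callable, the 0.8 tolerance factor, the `SLEEP`-style payload, and the `{{a*b}}` template syntax are all assumptions chosen for illustration. The first function compares response times with and without a delay payload; the second looks for the evaluated product of two random numbers in logged response bodies.

```python
import random
import statistics
import time
from typing import Callable, Iterable


def timed(send: Callable[[str], object], payload: str) -> float:
    """Return how long one request carrying `payload` takes, in seconds."""
    start = time.perf_counter()
    send(payload)
    return time.perf_counter() - start


def confirms_time_based_sqli(
    send: Callable[[str], object],
    benign: str,
    delayed: str,
    delay_seconds: float,
    trials: int = 3,
) -> bool:
    """Accept only if every delayed request is slower than the baseline
    by most of the injected delay (0.8 is an assumed tolerance)."""
    baseline = statistics.median([timed(send, benign) for _ in range(trials)])
    slowed = [timed(send, delayed) for _ in range(trials)]
    return min(slowed) - baseline >= 0.8 * delay_seconds


def make_ssti_probe(rng: random.Random) -> tuple[str, str]:
    """Build a probe expression and the string its evaluation would produce.
    The double-brace syntax is an assumption; template engines differ."""
    a = rng.randint(10_000, 99_999)
    b = rng.randint(10_000, 99_999)
    return "{{" + f"{a}*{b}" + "}}", str(a * b)


def ssti_confirmed(logged_bodies: Iterable[str], expected: str) -> bool:
    """Accept only if some logged response contains the evaluated product.
    A merely reflected payload contains the factors, not the product."""
    return any(expected in body for body in logged_bodies)
```

Because both functions depend only on timings and string matching, the same input always produces the same verdict, which is the property that makes this tier free of LLM hallucination.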

Tier 2:

For vulnerabilities where deterministic validation is difficult but exploitation conditions stay the same independent of application context, we've introduced verification by LLM agents equipped with checklists. Here are some examples:

Vulnerability	What to Check
CORS	Whether the combination of Access-Control-* headers is vulnerable and the response contains sensitive information
CSRF	Whether the combination of Cookie SameSite attribute, Content-Type, and HTTP method is vulnerable, and the request involves a state transition

These agents do not have access to the test environment; they simply approve or reject the reported vulnerabilities based on these criteria.
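One way to picture checklist-based review is the Python sketch below. It is an illustration under assumptions, not the product's implementation: the `Checklist` type, the `review` function, and the `Judge` callable (which stands in for the LLM agent) are hypothetical names. The criteria strings are taken from the table above, and the reviewer sees only recorded evidence, never the live target.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Checklist:
    vulnerability: str
    criteria: tuple[str, ...]


CORS = Checklist(
    "CORS",
    (
        "The combination of Access-Control-* headers is vulnerable",
        "The response contains sensitive information",
    ),
)

CSRF = Checklist(
    "CSRF",
    (
        "The combination of Cookie SameSite attribute, Content-Type, "
        "and HTTP method is vulnerable",
        "The request involves a state transition",
    ),
)

# Hypothetical stand-in for the LLM agent: (criterion, evidence) -> criterion met?
Judge = Callable[[str, str], bool]


def review(checklist: Checklist, evidence: str, judge: Judge) -> tuple[bool, list[str]]:
    """Approve only when every criterion is met; also return the unmet ones."""
    unmet = [c for c in checklist.criteria if not judge(c, evidence)]
    return (not unmet, unmet)
```

Returning the unmet criteria alongside the verdict keeps each rejection explainable, which is useful when the reviewer cannot re-run the request itself.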

Tier 3: Focused-Perspective Validation

For vulnerabilities where attack methods and exploitation conditions vary by application, neither of the verification methods above is suitable. For example, authentication vulnerabilities have widely varying exploitation conditions depending on the application's specifications.

For such vulnerabilities, we decompose testing perspectives down to a granularity where exploitation conditions are clear (similar to CORS and CSRF), and then verify the following two points:

Whether the discovered vulnerability deviates from the specified perspective
Whether the discovered vulnerability meets the specified exploitation conditions
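The two-point check can be sketched in Python as follows. Everything here is an assumption made for illustration: the `Perspective` and `Verdict` types, the `validate` function, and the two judge callables (which stand in for whatever performs the actual judgment) are hypothetical, and the sketch says nothing about how the real perspectives are defined.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    APPROVED = "approved"
    OFF_PERSPECTIVE = "off_perspective"
    CONDITIONS_UNMET = "conditions_unmet"


@dataclass(frozen=True)
class Perspective:
    name: str
    exploitation_conditions: tuple[str, ...]


def validate(
    finding: str,
    perspective: Perspective,
    in_scope: Callable[[str, str], bool],
    condition_met: Callable[[str, str], bool],
) -> Verdict:
    """Apply the two checks in order: perspective fit, then exploitation conditions."""
    if not in_scope(finding, perspective.name):
        return Verdict.OFF_PERSPECTIVE
    if not all(condition_met(finding, c) for c in perspective.exploitation_conditions):
        return Verdict.CONDITIONS_UNMET
    return Verdict.APPROVED
```

A hypothetical perspective could be constructed as `Perspective("password reset token reuse", ("a used token is accepted again",))`. Distinguishing the two rejection reasons makes it visible whether a finding was dropped for being out of scope or for failing its conditions.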

Improving Recall Through Early Parameter Collection

Previously, the pentest agent collected parameter information on its own during assessment. With this improvement, parameter information contained in URLs and request bodies is systematically collected during the crawling phase and placed in the pentest agent's context up front. The pentest agent can therefore understand targets and enumerate attack vectors more efficiently, which improves recall.
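As a rough illustration of crawl-time parameter collection, the Python sketch below gathers parameter names from query strings, form-encoded bodies, and JSON bodies, grouped by method and path. The function name, the tuple layout of a crawled request, and the grouping key are assumptions; the source does not describe the actual data model.

```python
import json
from collections import defaultdict
from typing import Iterable
from urllib.parse import parse_qsl, urlsplit

# Assumed shape of one crawled request: (method, url, content_type, body)
CrawledRequest = tuple[str, str, str, str]


def collect_parameters(requests: Iterable[CrawledRequest]) -> dict[str, list[str]]:
    """Map 'METHOD /path' to the sorted parameter names seen for it."""
    seen: dict[str, set[str]] = defaultdict(set)
    for method, url, content_type, body in requests:
        parts = urlsplit(url)
        names = seen[f"{method.upper()} {parts.path}"]
        # Query-string parameters, keeping names whose value is empty.
        names.update(name for name, _ in parse_qsl(parts.query, keep_blank_values=True))
        if content_type.startswith("application/x-www-form-urlencoded"):
            names.update(name for name, _ in parse_qsl(body, keep_blank_values=True))
        elif content_type.startswith("application/json") and body:
            try:
                data = json.loads(body)
            except json.JSONDecodeError:
                continue  # Unparseable body: keep only what the URL provided.
            if isinstance(data, dict):
                names.update(data.keys())  # Top-level keys only in this sketch.
    return {endpoint: sorted(names) for endpoint, names in seen.items()}
```

The resulting mapping is compact enough to be serialized into an agent's context before testing starts, so the agent does not have to rediscover each endpoint's inputs.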

Availability

These improvements are already available to all Takumi by GMO users. Simply start a blackbox assessment to use the latest features.