Threat Hunting with Intelligence Requirements: Hypothesis-Driven Sweeps That Actually Find Things

Most hunts are searches for the things you already know how to detect. A PIR-driven hunting program asks the questions your detections cannot answer.

Threat hunting is one of those terms that have come to mean different things to different teams. For some it is whatever the SOC does between alerts. For others it is a rebrand of detection engineering. For a few it is a disciplined practice that produces findings detection cannot produce.

The disciplined version starts from intelligence requirements. If your team has Priority Intelligence Requirements that the business actually uses, those requirements are also the best possible source of hunt hypotheses. The PIR tells you what the business cares about. The hunt tries to falsify the comforting assumption that nothing is happening.

This post is about how to convert PIRs into hunts, how to structure the hunt, and how to measure whether the program is producing value or just consuming analyst time.

What a PIR-driven hunt is

A PIR-driven hunt is a structured search for evidence that would change the answer to a specific intelligence question. It has four required elements.

A hypothesis tied to a PIR. Not a search for badness in general. A search for one specific behavior that, if found, would update what the team tells the business about a named risk.

A defined data scope. The hosts, the data sources, and the time window the hunt will cover. The scope is set in advance and not changed during the hunt.

A defined falsification criterion. The hunt is designed to either find evidence or to confidently say none exists within the scope. The criterion for concluding that none exists is written before the queries run.

A defined output. Either a finding that becomes an incident, a detection requirement that becomes a rule, or a documented null result that retires the hypothesis for a defined period.

A hunt without all four elements is exploration. Exploration is fine. It is not a hunt.
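One way to keep the four elements honest is to record every hunt in a fixed shape and refuse to call it a hunt until each element is filled in. The sketch below is a minimal illustration, assuming a hypothetical `HuntPlan` record whose field names are invented here, not taken from any tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class HuntPlan:
    """Hypothetical record for one PIR-driven hunt; field names are illustrative."""
    pir_id: str                      # the PIR this hypothesis serves
    hypothesis: str                  # one specific behavior to look for
    data_sources: list[str]          # scope: where the hunt looks
    window_start: Optional[date]     # scope: start of the time window
    window_end: Optional[date]       # scope: end of the time window
    falsification_criterion: str     # written before any query runs
    planned_outputs: list[str]       # finding, detection requirement, null result

    def missing_elements(self) -> list[str]:
        """Return the names of required elements that are empty or unset."""
        missing = []
        if not (self.pir_id and self.hypothesis):
            missing.append("hypothesis tied to a PIR")
        if not self.data_sources or self.window_start is None or self.window_end is None:
            missing.append("defined data scope")
        if not self.falsification_criterion:
            missing.append("falsification criterion")
        if not self.planned_outputs:
            missing.append("defined output")
        return missing

    def is_hunt(self) -> bool:
        """Anything missing one of the four elements is exploration, not a hunt."""
        return not self.missing_elements()
```

A plan with an empty scope or no written criterion reports exactly which element is missing, which makes the "exploration, not a hunt" call mechanical rather than a debate.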

A worked example

Suppose the PIR is the following. We need to understand whether adversary groups targeting our sector are abusing the OAuth consent grant flow to gain persistent access to our cloud productivity environment.

The PIR is a question. The hunt is a hypothesis. The hypothesis might be the following. We will find at least one OAuth application granted high-privilege scopes by a user account in the last ninety days where the application publisher is not on our approved publisher list and the grant occurred outside business hours.

The hypothesis is specific. It names the data, the privilege level, the time window, the exclusion list, and the timing condition. Any analyst on the team could pick it up and run it without further guidance.

The data scope is clear. The cloud audit log for consent grant events. The last ninety days. All user accounts. The approved publisher list is a known artifact maintained by the cloud governance team.
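Because the hypothesis names every condition, it translates almost directly into a filter. The sketch below assumes a hypothetical `ConsentGrant` event shape (not any vendor's audit schema), timestamps already converted to local business time, business hours of 08:00 to 18:00 Monday to Friday, and an example set of high-privilege scope names; all of these are placeholders to replace with your own environment's values.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class ConsentGrant:
    """One consent grant event; field names are hypothetical, not a vendor schema."""
    timestamp: datetime   # assumed already converted to the business's local time
    user: str
    app_id: str
    publisher: str
    scopes: frozenset[str]


# Assumption: example scope names; substitute your provider's high-privilege list.
HIGH_PRIVILEGE_SCOPES = frozenset({"Mail.ReadWrite", "Files.ReadWrite.All", "Directory.ReadWrite.All"})

# Assumption: business hours are 08:00-18:00, Monday to Friday.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def outside_business_hours(ts: datetime) -> bool:
    if ts.weekday() >= 5:  # Saturday=5, Sunday=6
        return True
    return not (BUSINESS_START <= ts.time() < BUSINESS_END)


def matching_grants(events, approved_publishers, now, lookback_days=90):
    """Return grants that satisfy every condition in the hypothesis."""
    cutoff = now - timedelta(days=lookback_days)
    return [
        e for e in events
        if e.timestamp >= cutoff                      # inside the ninety-day window
        and e.scopes & HIGH_PRIVILEGE_SCOPES          # at least one high-privilege scope
        and e.publisher not in approved_publishers    # publisher not on the approved list
        and outside_business_hours(e.timestamp)       # timing condition
    ]
```

Keeping each condition on its own line mirrors the written hypothesis, so a reviewer can check the query against the hunt plan line by line.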

The falsification criterion is written in advance. We will conclude there is no evidence if the cloud audit log returns no consent grant events that match the conditions, after we have confirmed that the log is complete for the time window by spot-checking three known grants from the same period.
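The important detail is the order of checks: an empty result only counts as "no evidence" once the log has passed the completeness spot check. A minimal sketch, assuming the hunt's matches, the set of event IDs the log returned for the window, and a list of known grant IDs are already in hand (the function name and return strings are hypothetical):

```python
def conclude(matches, log_event_ids, known_grant_ids, min_spot_checks=3):
    """Apply a pre-written falsification criterion.

    matches: events returned by the hunt query.
    log_event_ids: IDs of every event the log returned for the window.
    known_grant_ids: IDs of grants the team already knows happened in the window.
    """
    if matches:
        return "evidence found"
    if len(known_grant_ids) < min_spot_checks:
        # Not enough known events to trust the log, so a null result is unsupported.
        return "inconclusive: too few spot checks"
    if any(g not in log_event_ids for g in known_grant_ids):
        # A known grant is missing, so the log is incomplete for this window.
        return "inconclusive: log incomplete"
    return "no evidence within scope"
```

The "inconclusive" branches matter as much as the others: they stop a data gap from being reported to the business as a clean bill of health.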

The defined output is one of three things. A finding, in which case we open an incident and pivot. A detection requirement, in which case we write a rule that fires on the same conditions going forward. A null result, in which case we document the hunt with the scope, the queries, and the date, and we retire the hypothesis for the next six months unless the PIR changes.
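The three outputs can be encoded so that a null result always carries a retirement date. This is a sketch under stated assumptions: the `Outcome` names, the action strings, and the calendar-month arithmetic (clamping to the end of shorter months) are choices made here for illustration.

```python
import calendar
from datetime import date
from enum import Enum


class Outcome(Enum):
    FINDING = "finding"
    DETECTION_REQUIREMENT = "detection requirement"
    NULL_RESULT = "null result"


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_action(outcome: Outcome, hunt_date: date, retire_months: int = 6):
    """Map a hunt outcome to (follow-up action, retire-until date or None)."""
    if outcome is Outcome.FINDING:
        return ("open incident and pivot", None)
    if outcome is Outcome.DETECTION_REQUIREMENT:
        return ("write rule for the same conditions", None)
    # Null result: document it and retire the hypothesis unless the PIR changes.
    return ("document scope, queries, and date; retire hypothesis",
            add_months(hunt_date, retire_months))
```

For example, a null result recorded on 15 March 2024 would keep the hypothesis retired until 15 September 2024.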

Hypothesis generation that does not run dry

Teams worry about running out of hypotheses. In practice the bottleneck is structure, not creativity. Three sources reliably produce more hypotheses than any team can run.

The PIR list itself. Every active PIR can be decomposed into three or four hypotheses. A PIR about ransomware can become a hunt for staging directories, a hunt for unusual archive creation, a hunt for shadow copy deletion, and a hunt for backup configuration changes. Each is independent.

Recent intelligence reports. Every finished report should be reviewed for hypotheses. If the report says adversaries are increasingly using a specific living-off-the-land binary, the hunt is whether that binary has been invoked in your environment with the parameters described.

Public incident retrospectives. When a peer organization publishes a postmortem with technical detail, the hunt is whether the same behavior has occurred in your environment under your current detections.

A small backlog board with one card per hypothesis, tagged to the PIR it serves, sized in hours, and assigned an owner, is the only project management instrument the program needs.
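The board needs only four fields per card. The sketch below uses the ransomware decomposition from earlier as sample cards; the `HypothesisCard` type, PIR labels, owners, and hour estimates are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HypothesisCard:
    """One backlog card: a hypothesis tagged to a PIR, sized in hours, with an owner."""
    title: str
    pir_id: str
    size_hours: float
    owner: str


def hours_by_pir(cards):
    """Total estimated hours per PIR, to show where hunting effort is going."""
    totals = defaultdict(float)
    for card in cards:
        totals[card.pir_id] += card.size_hours
    return dict(totals)


cards = [
    HypothesisCard("Staging directories", "PIR-ransomware", 16, "ana"),
    HypothesisCard("Unusual archive creation", "PIR-ransomware", 12, "ben"),
    HypothesisCard("Shadow copy deletion", "PIR-ransomware", 8, "ana"),
    HypothesisCard("Backup configuration changes", "PIR-ransomware", 8, "chen"),
    HypothesisCard("Off-hours OAuth consent grants", "PIR-oauth", 24, "ben"),
]
```

Summing hours per PIR is a quick way to notice when one requirement is absorbing the program while another goes unhunted.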

The structure of a single hunt engagement

A single hunt should fit in a defined time window. A common shape is the following.

Day one is scoping. The hunter confirms the hypothesis, the scope, and the falsification criterion. They review the data sources and confirm coverage. They write the planned queries in advance and validate them against known events to confirm they return what is expected.
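Validating planned queries against known events can be as simple as checking that each query returns the events it must return. A minimal sketch, assuming queries are represented as predicates over hypothetical dict-shaped events; the second query is deliberately wrong to show what a failed validation looks like.

```python
def validate_queries(queries, known_events):
    """Return names of planned queries that fail validation.

    queries: mapping of query name -> predicate over one event (hypothetical shape).
    known_events: mapping of query name -> events that query must return.
    A query with no known events to test against also fails validation.
    """
    failures = []
    for name, predicate in queries.items():
        expected = known_events.get(name, [])
        if not expected or not all(predicate(e) for e in expected):
            failures.append(name)
    return failures


queries = {
    "shadow_copy_delete": lambda e: "vssadmin" in e["cmd"] and "delete shadows" in e["cmd"],
    "archive_creation": lambda e: e["cmd"].endswith(".7z"),  # deliberately too narrow
}
known = {
    "shadow_copy_delete": [{"cmd": "vssadmin delete shadows /all /quiet"}],
    "archive_creation": [{"cmd": "7z a out.zip"}],
}
```

Here `validate_queries(queries, known)` flags `archive_creation`, which is exactly the kind of mistake better caught on day one than after a false null result.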

Days two and three are execution. The hunter runs the queries, reviews the results, pivots into related data when results require it, and either confirms a finding or rules out the hypothesis.

Day four is the writeup. The hunter writes a short report. The hypothesis. The scope. The queries actually used. The findings or the documented null result. The follow-up actions, including any detection rule proposed and any change to the PIR.

A hunt that drags past four working days is usually a hunt without a clear falsification criterion. Pull it back. Resize. Rerun.

Tooling that helps, tooling that does not

The technology debate around threat hunting often inflates the importance of platforms and underweights the importance of disciplined queries. Three things matter operationally.

Centralized log search with adequate retention. The data has to be queryable. Ninety days is a useful minimum for most cloud and identity hunts. Endpoint behavioral telemetry often needs thirty days at a minimum.
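Retention minimums are easy to check before a hunt is scoped rather than discovered mid-hunt. A small sketch, assuming a hypothetical inventory that maps each data source to a category and its retention in days; the minimums are the ones stated above.

```python
# Minimum retention in days per data category, from the guidance above;
# the category labels are hypothetical.
MIN_RETENTION_DAYS = {"cloud": 90, "identity": 90, "endpoint": 30}


def retention_gaps(sources):
    """Return (source, shortfall_days) for sources below their category minimum.

    sources: mapping of source name -> (category, retention_days).
    """
    gaps = []
    for name, (category, days) in sources.items():
        minimum = MIN_RETENTION_DAYS.get(category)
        if minimum is not None and days < minimum:
            gaps.append((name, minimum - days))
    return gaps
```

Any source that appears in the result cannot support a hypothesis whose window is longer than its retention.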

A notebook or query history that survives the engagement. Whatever the hunter ran has to be re-runnable by the next analyst. A copy-paste into a wiki page works. A proper notebook environment works better.

A small library of pivot queries by data source. When a hunt produces a candidate event, the analyst should be able to pivot to all related events from the same host, the same user, and the same time window with two clicks. Build the pivot queries once and reuse them.
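The three standard pivots (same host, same user, same time window) are simple enough to write once and reuse. The sketch below works over in-memory events with hypothetical keys `id`, `host`, `user`, and `ts`, and assumes a one-hour window either side of the candidate; a real library would express the same filters in your log platform's query language.

```python
from datetime import datetime, timedelta


def pivot(candidate, events, window=timedelta(hours=1)):
    """Group events related to a candidate by host, by user, and by time window.

    Events are dicts with hypothetical keys "id", "host", "user", "ts" (datetime).
    The candidate itself is excluded from every group.
    """
    lo, hi = candidate["ts"] - window, candidate["ts"] + window
    others = [e for e in events if e["id"] != candidate["id"]]
    return {
        "same_host": [e for e in others if e["host"] == candidate["host"]],
        "same_user": [e for e in others if e["user"] == candidate["user"]],
        "same_window": [e for e in others if lo <= e["ts"] <= hi],
    }
```

Returning the three groups separately keeps each pivot reviewable on its own, which matches how an analyst usually works through a candidate.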

What does not matter as much as vendor decks suggest. AI-assisted hypothesis generation. Graph visualization platforms. Pre-packaged hunt libraries from feeds the team has not validated. These are nice to have. They are not the difference between a program that finds things and a program that does not.

Measuring whether the program is working

The hardest part of running a hunting program is justifying its budget. The wrong metric is hunts run per quarter. That measures activity, not value.

Three better metrics.

Detection rules produced from hunts. Every null result that confirms an absence is still an opportunity to write a rule that will alert if the absence ever changes. Every hunt that produces a finding yields obvious rules. Count the rules and track their precision in production.

Incidents discovered by hunts. These are the wins that pay the program's salary. A handful per year is normal for a healthy program. Zero in a year is a warning sign about hypothesis quality or data coverage.

Reduction in time to detect for the categories the program covers. Compare the dwell time on incidents discovered through the hunt program with the dwell time on incidents in the same category discovered through alerts. If the hunt program is consistently shortening dwell on its categories, the program is paying for itself even when individual hunts produce null results.
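Two of these metrics reduce to small calculations once incidents and rule alerts are tagged consistently. A sketch, assuming incidents are recorded with hypothetical `category`, `source` ("hunt" or "alert"), and `dwell_days` fields, and using the median rather than the mean so one long-dwell outlier does not dominate (that choice is an assumption, not a rule).

```python
from statistics import median
from typing import Optional


def rule_precision(true_positives: int, total_alerts: int) -> Optional[float]:
    """Share of a hunt-derived rule's production alerts that were true positives."""
    return true_positives / total_alerts if total_alerts else None


def dwell_comparison(incidents, category):
    """Median dwell days for hunt-found vs alert-found incidents in one category."""
    def dwell(source):
        values = [i["dwell_days"] for i in incidents
                  if i["category"] == category and i["source"] == source]
        return median(values) if values else None
    return {"hunt": dwell("hunt"), "alert": dwell("alert")}
```

If the hunt median sits consistently below the alert median for the categories the program covers, that gap is the evidence the budget conversation needs.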

The point

A threat hunting program built on intelligence requirements is a falsification engine. It takes the questions the business cares about, turns them into specific hypotheses, runs disciplined searches, and produces findings, detections, or documented null results. The structure does not require expensive tooling. It requires a backlog tied to PIRs, a four-day engagement shape, a few well-kept pivot queries, and three metrics that map to business outcomes. Done that way, hunting stops being a vague good and becomes a defensible practice.
