Dark Web Monitoring for CTI: Collection Tradecraft for Underground Markets

Dark web monitoring is the most oversold capability in CTI. Done right, it is also one of the most useful. Here is the collection tradecraft that separates the two.

Dark web monitoring is the capability most likely to be overpromised in a sales meeting and underdelivered in production. Most platforms market themselves as providing comprehensive coverage of underground forums. In reality, coverage is partial, the signal-to-noise ratio is low, and the value depends almost entirely on the human tradecraft applied to the results.

Done well, dark web monitoring produces three kinds of value. It detects compromised credentials and data before the rest of the world sees them. It identifies offers to sell or rent access into your environment or your sector. It provides ground truth about which actors are active, what tooling they are advertising, and what services they are buying.

Done poorly, it generates a daily digest of three hundred mentions of your company name that no one reads.

This post is about the collection tradecraft that separates the two outcomes.

What dark web actually means

The term is loose. For collection purposes it covers four distinct surfaces.

Tor hidden services. The marketplaces, forums, and leak sites that run on the onion network. This is what the term traditionally refers to. In practice, activity on Tor has shifted in recent years toward private, access-controlled forums on the same network.

Invite-only forums on the clear web. Many active criminal forums run on regular HTTPS but require an invitation or a reputation gate to access content beyond the front page. These are sometimes called the deep web. The distinction does not matter operationally.

Telegram channels and groups. A significant share of operational underground commerce has moved to Telegram. Some channels are public, many are private, and the most useful are paid-subscription or vouch-only. Telegram is now arguably the largest single source of underground criminal commerce by volume.

Niche platforms. Discord servers for specific tools, paste sites used as drop zones, encrypted notes, and short-lived sites that exist for hours during a campaign. These are the long tail. Most automated platforms miss them.

A monitoring program that covers only Tor hidden services is incomplete. A program that covers all four surfaces is harder to build and far more useful.

The collection plan that drives everything

Like any CTI capability, dark web monitoring should be driven by a written collection plan tied to your priority intelligence requirements (PIRs). Without the plan, the program defaults to keyword alerting on your brand name and your domain, which produces volume without insight.

A useful collection plan has four columns.

The intelligence question. For example: are credentials for our remote access infrastructure being sold on access broker markets?

The surface to collect from. For example, the named access broker forums on Tor and the two Telegram channels known to broker similar listings.

The keywords, sellers, and patterns to watch. For example, the company name, common misspellings, the registered domain, the VPN appliance vendor name, and the names of three known brokers who specialize in this geography.

The escalation rule. For example, if a credible-sounding listing appears, escalate within four hours to the IT operations lead for credential rotation and a forensic check.

A plan with twenty rows is enough to start. The plan is reviewed quarterly. Rows that produce no value are retired. Rows that produce false positives are tuned. Rows that produce missed signals get new keywords.
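The four columns and the quarterly review can be sketched as a small data structure. This is a minimal illustration, not a prescribed schema: the field names, the `hits` and `false_positives` counters, the thresholds in `review_action`, and the `examplecorp` watch terms are all assumptions made for the example. It does not model missed signals, which still need a human to notice and add keywords.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class CollectionPlanRow:
    """One row of the plan: question, surfaces, watch terms, escalation rule."""
    question: str
    surfaces: list
    watch_terms: list
    escalate_to: str
    escalate_within: timedelta
    hits: int = 0             # credible hits since the last review (assumed counter)
    false_positives: int = 0  # hits the reviewer dismissed (assumed counter)


def review_action(row: CollectionPlanRow) -> str:
    """Suggest a quarterly review outcome. The thresholds are illustrative."""
    if row.hits == 0 and row.false_positives == 0:
        return "retire"   # the row produced nothing at all
    if row.false_positives > row.hits:
        return "tune"     # mostly noise: tighten keywords or surfaces
    return "keep"


# Hypothetical row based on the example used in this section.
row = CollectionPlanRow(
    question="Are credentials for our remote access infrastructure "
             "being sold on access broker markets?",
    surfaces=["named access broker forums on Tor", "two Telegram channels"],
    watch_terms=["examplecorp", "exampelcorp", "examplecorp.example"],
    escalate_to="IT operations lead",
    escalate_within=timedelta(hours=4),
)
```

Keeping the plan in a structured form like this, even in a spreadsheet, makes the quarterly review a mechanical pass over the rows rather than a debate from memory.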

Persona hygiene that does not get you burned

Some collection requires presence. A passive scraper can read what is public. Many forums and channels require a registered account with reputation and history.

Persona work has its own discipline. Treat it as a small program of its own.

Each persona has a single purpose. Do not use the same persona for multiple forums, multiple sectors, or multiple operations. Cross-contamination is how personas get burned.

Each persona has separate infrastructure. Dedicated browser profile, dedicated VPN exit, dedicated email address, dedicated phone number where required, dedicated cryptocurrency wallet where required. No infrastructure is reused across personas.

Each persona has a written backstory and a behavior plan. When the persona joined, how, what they have posted, what they are interested in, when they are typically active. The backstory is referenced before every interaction. Inconsistency burns personas faster than anything else.

No active engagement without legal review. Lurking is one thing. Posting, buying samples, or messaging sellers crosses into legal territory that varies by jurisdiction and by company policy. The legal review happens once for each engagement category, not for each interaction.

A program without persona discipline either avoids restricted forums altogether or burns personas every few months. Both outcomes reduce coverage.
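The rule that no infrastructure is reused across personas is easy to state and easy to violate by accident. A minimal sketch of an audit over a persona inventory follows. The `Persona` fields mirror the infrastructure items listed above, but the record layout and the function name are assumptions for illustration, and such an inventory is itself sensitive and would need to be stored accordingly.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Persona:
    """Hypothetical inventory record for one persona."""
    name: str
    browser_profile: str
    vpn_exit: str
    email: str
    phone: Optional[str] = None   # only where a forum requires one
    wallet: Optional[str] = None  # only where a forum requires one


INFRA_FIELDS = ("browser_profile", "vpn_exit", "email", "phone", "wallet")


def find_shared_infrastructure(personas):
    """Return {(field, value): [persona names]} for values used more than once."""
    owners = defaultdict(list)
    for persona in personas:
        for field_name in INFRA_FIELDS:
            value = getattr(persona, field_name)
            if value is not None:
                owners[(field_name, value)].append(persona.name)
    return {key: names for key, names in owners.items() if len(names) > 1}
```

An empty result means no two personas share an item. Any non-empty result is a cross-contamination risk to fix before the next login.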

Filtering for signal in the noise

The volume problem is real. A monitoring stack that surfaces every mention of the company name will produce hundreds of hits per day for a mid-sized enterprise. Most are duplicates, scrapes, or trivial mentions.

A useful filtering pipeline has three stages.

First, deduplicate by post content hash across surfaces. Most underground content is reposted across multiple forums and channels within hours. One canonical copy per post is enough.

Second, classify by intent. Is the post a sale, a rental offer, a leak, a recruitment pitch, a discussion, or a scrape from another source? A simple keyword and pattern classifier gets ninety percent of this right. An LLM classifier can get the rest if the cost is acceptable. The classification drives downstream priority.

Third, score by credibility. Posts from accounts with established reputation in the relevant forum, from accounts with prior verified listings, or with technical detail that matches your environment, score higher. Posts from new accounts with vague claims score lower. The score is not perfect. It is the difference between five hits per day for a human to review and three hundred.
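The three stages above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not a production pipeline: the `Post` fields, the regex patterns, the reputation threshold of 50, and the `min_score` cutoff are all hypothetical and would need tuning against real data. The scrape category is left out because detecting it depends on the source being scraped.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class Post:
    surface: str
    author: str
    text: str
    author_reputation: int = 0      # hypothetical forum reputation points
    author_verified_sales: int = 0  # hypothetical count of prior verified listings


def content_hash(text):
    """Hash normalized text so reposts differing only in case or spacing collide."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(posts):
    """Stage 1: keep the first copy seen of each distinct post body."""
    seen, unique = set(), []
    for post in posts:
        digest = content_hash(post.text)
        if digest not in seen:
            seen.add(digest)
            unique.append(post)
    return unique


# Illustrative patterns only. The first match wins, so order matters.
INTENT_PATTERNS = [
    ("sale", re.compile(r"\b(selling|for sale|wts|price)\b", re.I)),
    ("rent", re.compile(r"\b(rent|rental|subscription)\b", re.I)),
    ("leak", re.compile(r"\b(leak|leaked|dump|database)\b", re.I)),
    ("recruitment", re.compile(r"\b(hiring|recruiting|looking for)\b", re.I)),
]


def classify_intent(post):
    """Stage 2: keyword and pattern classifier; unmatched posts are discussion."""
    for label, pattern in INTENT_PATTERNS:
        if pattern.search(post.text):
            return label
    return "discussion"


def credibility_score(post, environment_terms):
    """Stage 3: one point each for reputation, track record, matching detail."""
    score = 0
    if post.author_reputation >= 50:
        score += 1
    if post.author_verified_sales > 0:
        score += 1
    text = post.text.lower()
    if any(term.lower() in text for term in environment_terms):
        score += 1
    return score


def review_queue(posts, environment_terms, min_score=2):
    """Run all three stages and return (score, intent, post), highest score first."""
    queue = []
    for post in deduplicate(posts):
        intent = classify_intent(post)
        score = credibility_score(post, environment_terms)
        if intent != "discussion" and score >= min_score:
            queue.append((score, intent, post))
    queue.sort(key=lambda item: item[0], reverse=True)
    return queue
```

Exact hashing only catches verbatim reposts. Reposts with edited prices or contact handles need fuzzy matching, which is a separate design choice from the one shown here.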

The reviewer at the end of the pipeline is the most expensive resource in the chain. Optimize for their time, not for completeness of the pipeline output.

What to do with a credible hit

A credible hit follows a small playbook with three steps.

Verify. Use whatever evidence the seller has offered, in the sample they provide or in the screenshots they post, to confirm that the data or the access is real and is yours. Verification often requires creative pivot work: a leaked file's metadata, a screenshot's wallpaper, a sample employee's identifier. Verification is the most important step. Acting on an unverified claim wastes incident response cycles and burns trust with the business owners you call.

Escalate. The escalation target depends on what was found. Compromised credentials go to identity operations for rotation and forensic review. Sold access to a known appliance goes to network operations and the appliance vendor. Sold or leaked data goes to legal and to communications. Each path is written in advance.

Track. Follow the post. If it is taken down, follow the seller. If the seller reposts or sells to a new buyer, the lifecycle continues. The verified hit is the start of an engagement, not the end.
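Writing each escalation path in advance can be as simple as a lookup table that refuses to route anything unverified. The sketch below is illustrative: the finding type keys and the `route` function are hypothetical names, and the targets are the ones named in the playbook above.

```python
# Escalation paths written in advance. Keys are hypothetical finding type labels.
ESCALATION_PATHS = {
    "compromised_credentials": ["identity operations"],
    "sold_appliance_access": ["network operations", "appliance vendor"],
    "sold_or_leaked_data": ["legal", "communications"],
}


def route(finding_type, verified):
    """Return the escalation targets for a verified finding."""
    if not verified:
        # Verification comes first; unverified claims are never escalated.
        raise ValueError("verify the hit before escalating")
    if finding_type not in ESCALATION_PATHS:
        # A missing path is a gap in the playbook, not something to improvise.
        raise LookupError("no written escalation path for " + repr(finding_type))
    return ESCALATION_PATHS[finding_type]
```

Failing loudly on an unknown finding type is a deliberate choice here: it surfaces the gap in the written playbook instead of silently sending the finding to a default inbox.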

What to expect from vendors

Vendor monitoring services have a place. They scale collection in ways that small teams cannot. They also have predictable limitations.

They underperform on access-controlled forums and on Telegram. The places with the highest-value content are the hardest to scrape and the most likely to require persona work that vendors do not do at scale.

They overperform on volume. The daily digest will be larger than what the in-house pipeline produces. Most of the extra volume is noise that an internal classifier would filter out.

They underperform on context. A vendor knows their feed. They do not know your environment, your appliances, your acceptable misspellings, or your sector-specific argot. The contextual filtering has to be done in-house.

A reasonable vendor strategy is to use one vendor for breadth across Tor and clear web forums, and to invest the team's own time in Telegram, in persona work, and in the in-house contextual filter. Buying three vendors that cover overlapping surfaces produces three times the noise, not three times the signal.

The point

Dark web monitoring is a useful CTI capability if the program runs on tradecraft instead of on tools. Define a written collection plan tied to PIRs. Cover all four surfaces, not only Tor. Treat persona work as a small, disciplined program of its own. Filter aggressively for intent and credibility. Act on verified hits with a small written playbook. Use vendors for breadth and your team for depth. The output is a small number of high-value findings per quarter and a clear picture of the actors and offers most relevant to your organization.
