Gotcha · 10 Jun 2026 · 5 min read

SIT confidence levels do not mean what you think

Information Protection · Data Loss Prevention

Setting a pattern to high confidence does not make detection stricter. Confidence is a label you assign, not a quality score Purview calculates. Here is what actually controls detection quality, and how to use confidence properly.

The myth

Most people building a custom sensitive information type assume the confidence setting works like a sensitivity dial: set the pattern to high confidence and Purview detects more carefully, with fewer false positives.

It does not. Confidence is metadata you assign to a pattern, not a quality score Purview calculates.

Define a pattern with a single keyword as the primary element, no supporting elements, no validation, and mark it high confidence. Purview will happily report every stray mention of that keyword as a high confidence match. Nothing checks whether the pattern deserves the label.

Credit where due: Tommy Nielsen's LinkedIn article on this misconception is what prompted the Custom SIT Builder.

What actually controls detection quality

Detection quality comes entirely from pattern design. A SIT pattern has four parts:

Primary element. The trigger: a regex, keyword list, or keyword dictionary. If this does not match, nothing happens.

Supporting elements. Corroborating evidence: context words or a second regex confirming that the primary match means what you think it means.

Character proximity. How close, in characters, the supporting evidence must sit to the primary match. It is a hard window: the entire supporting element must be inside it. A keyword 350 characters away fails a 300 character window, and so does a keyword that starts inside the window but runs past its edge.

Additional checks. Validation on the match itself: Luhn and custom checksums, excluded test values, required prefixes, duplicate digit exclusions.
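To make "validation on the match itself" concrete, here is a minimal Python sketch of the standard Luhn checksum. It illustrates the kind of check meant here, not Purview's own code, and the function name `luhn_valid` is made up for this example.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters (spaces, dashes) are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))  # well-known test card number
    print(luhn_valid("4111 1111 1111 1112"))  # last digit changed
```

A regex alone accepts any 16 digits that fit the shape. A checksum like this rejects most random digit runs, which is why it belongs on the pattern you label high.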

A keyword-only pattern marked high confidence has none of this. It is a noise generator with a reassuring label.
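The hard-window rule for character proximity can be sketched in a few lines of Python. This is a simplified model, not Purview's implementation: it assumes the window extends the same number of characters on either side of the primary match, and `Span` and `within_window` are hypothetical names.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # index of the first character (inclusive)
    end: int    # index one past the last character (exclusive)


def within_window(primary: Span, supporting: Span, window: int) -> bool:
    """True only if the whole supporting element sits inside the window.

    Assumption: the window reaches `window` characters before the start
    and after the end of the primary match.
    """
    lowest = max(0, primary.start - window)
    highest = primary.end + window
    return lowest <= supporting.start and supporting.end <= highest


if __name__ == "__main__":
    card = Span(400, 416)  # a 16 character primary match
    print(within_window(card, Span(110, 121), 300))  # fully inside
    print(within_window(card, Span(95, 106), 300))   # straddles the edge
    print(within_window(card, Span(766, 777), 300))  # 350 characters away
```

The second case is the one that catches people out: most of the keyword is inside the window, and it still does not count.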

And if you need proof that confidence is just a label, look at Microsoft's own SITs: crack one open in the SIT X-Ray and you will find patterns at 55 and 95, values the portal dropdown never offers you. The IP Address SIT matches at 95 when an ip keyword is nearby, and the portal's own Test feature quietly reports it as plain high.

Highest match wins. No averaging.

The second misconception is that confidence somehow aggregates across patterns. It does not. At runtime Purview:

Evaluates every pattern in the SIT independently
Finds which ones matched
Returns one result: the highest confidence of any matched pattern

No averaging, no weighting, no rollup. The practical consequence: one weak high confidence pattern dominates your results. You can build a beautifully validated medium pattern, but if a lazy keyword-only high pattern sits next to it in the same SIT, every document containing that keyword reports as a high confidence match, and your DLP rules act on it.
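The "highest match wins" rule is simple enough to model in Python. The sketch below is an illustration of the behaviour described above, not Purview code. `PatternResult` and `sit_confidence` are hypothetical names, and the numbers 65, 75 and 85 are assumed stand-ins for low, medium and high.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PatternResult:
    name: str
    confidence: int  # the label assigned to the pattern, not a computed score
    matched: bool


def sit_confidence(results: list[PatternResult]) -> Optional[int]:
    """Return the highest confidence among matched patterns, or None."""
    matched = [r.confidence for r in results if r.matched]
    return max(matched) if matched else None


if __name__ == "__main__":
    results = [
        PatternResult("validated regex + keywords", 75, matched=True),
        PatternResult("lazy keyword only", 85, matched=True),
    ]
    # The weak pattern carries the higher label, so it decides the result.
    print(sit_confidence(results))
```

Nothing in that function looks at how the pattern was built. The label is the only input, which is the whole problem.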

Instance count and the cosmetic setting

Two more things people get wrong.

Instance count is the number of distinct complete pattern matches. Not keywords found, not digits found. One instance means the primary matched, the supporting evidence was within proximity, and every check passed. Ten keyword mentions with two validated matches is a count of 2. It also counts unique values: the same card number appearing five times in a document is still one instance.
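As a rough Python model of that counting rule (an illustration under stated assumptions, not Purview's implementation): a candidate only counts if every part of the pattern passed, and repeated values collapse into one. `Candidate` and `instance_count` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    value: str                  # text matched by the primary element
    supporting_in_window: bool  # supporting evidence found within proximity
    checks_passed: bool         # checksum and other validation succeeded


def instance_count(candidates: list[Candidate]) -> int:
    """Count distinct values among complete matches only."""
    complete = {
        c.value
        for c in candidates
        if c.supporting_in_window and c.checks_passed
    }
    return len(complete)


if __name__ == "__main__":
    # The same card number five times is still a single instance.
    repeated = [Candidate("4111111111111111", True, True)] * 5
    print(instance_count(repeated))
```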

The recommended confidence level does nothing for detection. That last question in the SIT wizard only sets which level is preselected when someone adds the SIT to a rule. It does not change how content is scanned, how patterns are evaluated, or what matches.

How to use confidence properly

Stop thinking of confidence as accuracy and start thinking of it as a routing mechanism for policy actions. A well-designed SIT uses graduated patterns:

Low confidence: primary element only, broad detection, expect noise. Useful in simulation to spot false negatives.
Medium confidence: primary plus supporting keywords within a generous window (around 300 characters).
High confidence: primary plus supporting evidence in tight proximity (50 to 100 characters) plus validation checks.

Then your DLP rules route by level: audit on low, notify on medium, block on high. You are not tuning sensitivity, you are designing graduated enforcement.
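Seen as code, the routing idea is just a lookup from level to action. The Python sketch below is a conceptual model only. Real DLP rules are configured in Purview, and `Confidence`, `ACTIONS` and `route` are names invented for this example.

```python
from enum import IntEnum
from typing import Optional


class Confidence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Graduated enforcement: each level maps to a different policy action.
ACTIONS = {
    Confidence.LOW: "audit",
    Confidence.MEDIUM: "notify",
    Confidence.HIGH: "block",
}


def route(confidence: Optional[Confidence]) -> str:
    """Pick the policy action for a SIT result; no match means no action."""
    if confidence is None:
        return "none"
    return ACTIONS[confidence]


if __name__ == "__main__":
    for level in Confidence:
        print(level.name, "->", route(level))
```

The mapping knows nothing about accuracy. Whether "block" fires on real data or on noise depends entirely on what you built behind the high label.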

And keep the low and medium patterns even if your rules only act on high. When a document almost matches high but fails on proximity or a supporting element, the lower tiers still match and show you the near miss, and that is exactly the knowledge you need to tune the pattern.

Build graduated patterns and watch the myths break in the live tester, including a demo for each misconception in this article.

Try the Custom SIT Builder
