Discussion · 14 Mar 2026 · 5 min read

Pay-as-you-go Purview features - are they worth turning on?

Information Protection · Data Loss Prevention · Insider Risk Management

Microsoft Purview has a growing list of features that bill through Azure consumption instead of your E5 licence. On-demand classification, OCR, and collection policies fill real gaps - but they can also generate unexpected costs if you turn them on without scoping carefully.

The PAYG layer most people ignore

Your E5 licence covers a lot. Information protection, DLP, Insider Risk, eDiscovery. But there are gaps - and Microsoft has been filling them with pay-as-you-go features that bill through Azure.

Three matter most right now:

  • On-demand classification
  • OCR
  • Collection policies

Each one closes a genuine blind spot. Each one has consumption-based billing that can surprise you if scoped too broadly.

On-demand classification

The gap: Auto-labelling is event-driven - it only evaluates files when they are created or modified. Anything sitting untouched in SharePoint for years has never been classified. With Copilot now surfacing everything a user has access to, that unclassified data is a live risk.

What it does: Targeted scans against SharePoint, OneDrive, and endpoints. You define the scope, pick your classifiers, and Purview estimates the cost before you commit. Billed per 10,000 assets scanned.
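The per-10,000-asset billing makes rough cost maths easy to do up front. A minimal sketch, assuming billing rounds up to whole 10,000-asset blocks and using a placeholder unit price rather than Microsoft's actual rate (check the current Purview pricing page):

```python
import math

def scan_cost(asset_count: int, price_per_10k: float) -> float:
    """Estimate an on-demand classification scan cost.

    Billing is per block of 10,000 assets scanned; partial blocks are
    assumed to round up. price_per_10k is a placeholder, not the
    published rate - confirm against the Purview pricing page.
    """
    blocks = math.ceil(asset_count / 10_000)
    return blocks * price_per_10k

# A 2.5 million file SharePoint estate at a hypothetical $1 per 10k assets:
print(f"${scan_cost(2_500_000, 1.00):,.2f}")  # → $250.00
```

Even with the real rate substituted in, the shape of the calculation is the same: cost scales linearly with scope, which is exactly why scoping to high-risk sites first matters.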

What to watch out for:

  • This is not a quiet background scan. Every scanned file gets evaluated against all your active DLP, Information Protection, Data Lifecycle, and Insider Risk policies. A broad scan on historical data can trigger thousands of alerts, apply labels, start retention clocks, and generate IRM signals - all on documents users have not touched in years
  • Files in cold storage are invisible. SharePoint moves inactive content to cold storage automatically. On-demand classification cannot reach it, and you have no visibility into what is cold vs hot. Your oldest, most dormant files - the ones most likely to be unclassified - may be silently skipped
  • Scans are capped at 50,000 locations and 20 million files. Large tenants need multiple scans
  • Content Explorer lags up to seven days behind the scan
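Those caps mean large tenants have to partition their estate across multiple scans. One way to sketch that split, assuming you can export a (site, file count) inventory and that no single site exceeds the file cap on its own:

```python
MAX_LOCATIONS = 50_000       # per-scan cap on locations
MAX_FILES = 20_000_000       # per-scan cap on files

def plan_scans(sites):
    """Greedily group (site, file_count) pairs into scan batches that
    stay under both per-scan caps. A planning sketch, not a product
    feature - Purview does not split scans for you."""
    batches, current, files = [], [], 0
    for site, count in sites:
        if len(current) == MAX_LOCATIONS or files + count > MAX_FILES:
            batches.append(current)
            current, files = [], 0
        current.append(site)
        files += count
    if current:
        batches.append(current)
    return batches

# Three large site collections; "hr" alone nearly fills one scan:
plan_scans([("hr", 12_000_000), ("finance", 9_000_000), ("legal", 5_000_000)])
# → [["hr"], ["finance", "legal"]]
```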

Verdict: Worth it if you have years of unclassified content and are preparing for Copilot. Start with your highest-risk sites, not all of SharePoint.

OCR

The gap: Without OCR, someone photographs a screen with credit card numbers and DLP sees nothing. Scanned contracts, ID cards, receipts - all invisible to classification. This is one of the easiest DLP bypasses, and users do not even need to be malicious.

What it does: Extends your existing SITs, trainable classifiers, and document fingerprints to scan text inside images. Works across Exchange, SharePoint, OneDrive, Teams, and endpoints. No new classifiers needed.

What it costs: $1 per 1,000 images. Each PDF page counts separately. 2,500 images per month free. Purview caches results - Exchange logos cached for 5 days, endpoint images for 30 days.
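Those numbers are enough to model a monthly bill. A rough sketch, treating each PDF page as one billable image and assuming pro-rata per-image pricing after the free tier (confirm the rounding behaviour against your Azure invoice):

```python
FREE_IMAGES_PER_MONTH = 2_500
PRICE_PER_1000 = 1.00  # USD per 1,000 images, per the published rate

def ocr_monthly_cost(images: int, pdf_pages: int = 0) -> float:
    """Estimate monthly OCR cost. Each PDF page bills as its own
    image; the first 2,500 scans per month are free. Pro-rata
    per-image pricing is a simplifying assumption."""
    billable = max(0, images + pdf_pages - FREE_IMAGES_PER_MONTH)
    return billable * PRICE_PER_1000 / 1_000

# 40,000 email images plus 12,500 scanned-contract pages in a month:
print(f"${ocr_monthly_cost(40_000, 12_500):.2f}")  # → $50.00
```

Note how quickly PDF page counting dominates: one 50-page scanned contract costs the same as 50 standalone images.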

What to watch out for:

  • The cost estimator runs only once per tenant, ever. You get a single 30-day estimation window, within 90 days of enabling it. Dashboards reset if you restart, and reports are deleted after 90 days. Plan your scope before you enable it - you will not get a second chance
  • The estimator has blind spots. It cannot estimate images in PDFs for SharePoint and OneDrive, only Exchange and Teams. Your actual bill may be higher than the estimate
  • Volume adds up. Enable OCR across all locations, and every image in every email and every screenshot on every device gets scanned

Verdict: Essential if you run DLP on endpoints or Exchange. Start there, use the estimator carefully, expand later.

Collection policies

The gap: Enable endpoint monitoring or expand to non-M365 sources and you start ingesting a firehose of events. Every file copy, print, and cloud upload. Activity Explorer fills with noise. Insider Risk processes signals you do not care about.

What it does: Collection policies filter which events from which data sources get ingested into Purview. A policy is built from three parts:

  • Conditions - what data to detect. Content containing specific classifiers, file extensions, or document size thresholds. If you add no conditions, devices collect everything while other sources default to classifier matches only
  • Activities - what user actions to capture. For devices, there are 19 activities including file copied to USB, printed, uploaded to cloud, transferred by Bluetooth, and file deleted. For cloud and AI sources, it covers text and files sent to or received from unmanaged apps
  • Data sources - where to apply the policy. Devices, Copilot experiences, Enterprise AI, unmanaged cloud apps (via browser or network detection), and adaptive app scopes for AI apps

Multiple collection policies for the same data source are merged on the back end into a single effective policy. All conditions across all policies for that source are combined during evaluation.
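That merge behaviour is worth internalising, because a permissive condition in any one policy widens the effective policy for the whole data source. Conceptually, with illustrative field names rather than the product schema:

```python
def effective_policy(policies):
    """Merge all collection policies targeting one data source into
    the single effective policy used at evaluation time. A conceptual
    sketch - the field names here are illustrative, not Purview's
    actual schema."""
    merged = {"conditions": set(), "activities": set()}
    for policy in policies:
        merged["conditions"] |= set(policy.get("conditions", []))
        merged["activities"] |= set(policy.get("activities", []))
    return merged

# Two teams each scope their own device policy; the union applies:
hr_policy  = {"conditions": ["Credit Card Number"], "activities": ["file_copied_to_usb"]}
sec_policy = {"conditions": ["Passport Number"],    "activities": ["file_printed"]}
effective_policy([hr_policy, sec_policy])
# → both conditions and both activities in one effective policy
```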

Where it matters most:

  • Filtering endpoint noise. If "Always audit file activity" is enabled, all Office, PDF, and CSV activity is collected by default. Collection policies let you narrow that to only events involving sensitive content
  • AI governance. For generative AI data sources, you can capture and store all prompts and responses - or only those containing sensitive information. Without content capture enabled, you only see sensitive information matches, not the full interaction. To capture everything, the "Content contains classifiers" condition must be set to All
  • Network visibility. For unmanaged cloud apps like ChatGPT, Dropbox, or Slack, collection policies work with browser detection (Edge for Business) or network detection (via a SASE provider integration) to monitor data shared outside your trust boundary

What to watch out for:

  • Collection policies override Insider Risk indicators. If your collection policy filters out an activity type, IRM cannot see it - even if the indicator is enabled. This only applies to device indicators, but it catches teams off guard. Audit your IRM indicators against your collection policy filters before deploying
  • No conditions on devices means everything gets collected. Other data sources default to classifier matches only, but devices collect all data regardless. Be explicit with your conditions
  • Some data sources are PAYG. Copilot experiences, Enterprise AI, and unmanaged cloud app activity detected via network data security all require an Azure subscription and bill through consumption

Verdict: Necessary if you have enabled Always Audit on endpoints, are expanding to non-M365 sources, or need to govern AI interactions. Without collection policies, you either collect everything and drown in noise, or collect nothing from sources outside M365.

The bottom line - it comes down to risk appetite

Microsoft knows these gaps exist. Auto-labelling does not touch cold data. DLP cannot read images. Activity Explorer ingests everything without filtering. They have put a consumption cost on closing them.

The FAIR framework offers a useful lens here. Instead of asking "should we turn this on?" ask "what is the financial exposure if we do not?" 500,000 unclassified files in SharePoint with Copilot about to surface them - the cost of an on-demand scan is trivial against a data exposure incident. DLP that cannot see text in scanned documents - the cost of OCR is a rounding error against a regulatory fine.
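The FAIR arithmetic is simple enough to do on a napkin: annualised exposure is how often a loss event occurs per year multiplied by what it costs when it does. With illustrative numbers (these are not benchmarks):

```python
def annualised_loss_exposure(event_frequency: float, loss_magnitude: float) -> float:
    """FAIR in one line: expected annual loss = loss event frequency
    per year x loss magnitude per event. Inputs below are illustrative
    assumptions, not industry figures."""
    return event_frequency * loss_magnitude

# Say a Copilot-driven exposure of unclassified data is a 1-in-5-year
# event costing ~$400k, and a tenant-wide on-demand scan costs ~$4k:
exposure = annualised_loss_exposure(0.2, 400_000)  # 80,000.0 per year
print(exposure > 4_000)  # → True: the control costs a fraction of the exposure
```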

Whether you turn them on comes down to risk appetite. For most organisations deploying Copilot or handling regulated data, the gaps are too large to ignore.

Scope tightly. Estimate first. Set Azure budget alerts. Expand gradually.

Browse all 400+ built-in classifiers and plan which ones you need.

Try the Data Classifier Explorer
