Guide · 10 Jun 2026 · 5 min read

Building a custom trainable classifier: what it actually takes

Information Protection · Data Lifecycle Management

A custom trainable classifier is not a configuration task; it is a small machine learning project. This guide covers the real sample counts, the timeline, and the one-way door at publish time that Microsoft buries in the docs.

When a trainable classifier is the right tool

Sensitive information types detect data that follows a pattern. Trainable classifiers detect content by what it is: a contract, a financial statement, source code, a CV. No regex describes a merger agreement.

Check the [pretrained catalog](/tools/sit-explorer) before building anything. There are dozens of ready-made classifiers covering legal, finance, HR, and IT categories, and they support multiple languages. A custom classifier is only the answer when the category is genuinely specific to your organisation - your deal memos, your pricing sheets, your design documents.

One scoping fact that surprises people: custom trainable classifiers only support English content.

The numbers that define the project

Training needs two sets of examples, chosen by a human:

  • 50 to 500 positive samples that strongly represent the category
  • 150 to 1,500 negative samples that clearly do not belong

The classifier processes up to the 2,000 most recently created samples. For testing, Microsoft recommends at least 200 items: 50 or more positive, 150 or more negative.
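The ranges above are easy to check before any upload. The following is a minimal sketch that only counts files you have already collected. The function names and constants are hypothetical, and the numbers are the ones quoted in this guide, so confirm them against the current Microsoft documentation.

```python
# Ranges as quoted in this guide (assumption: verify against current docs).
POSITIVE_RANGE = (50, 500)
NEGATIVE_RANGE = (150, 1500)
TEST_MIN_TOTAL = 200
TEST_MIN_POSITIVE = 50
TEST_MIN_NEGATIVE = 150


def check_seed_counts(positives: int, negatives: int) -> list[str]:
    """Return a list of problems with the training (seed) sample counts."""
    problems = []
    if not POSITIVE_RANGE[0] <= positives <= POSITIVE_RANGE[1]:
        problems.append(f"positive samples: {positives}, expected {POSITIVE_RANGE}")
    if not NEGATIVE_RANGE[0] <= negatives <= NEGATIVE_RANGE[1]:
        problems.append(f"negative samples: {negatives}, expected {NEGATIVE_RANGE}")
    return problems


def check_test_counts(positives: int, negatives: int) -> list[str]:
    """Return a list of problems with the recommended test set size."""
    problems = []
    if positives < TEST_MIN_POSITIVE:
        problems.append(f"test positives: {positives}, expected >= {TEST_MIN_POSITIVE}")
    if negatives < TEST_MIN_NEGATIVE:
        problems.append(f"test negatives: {negatives}, expected >= {TEST_MIN_NEGATIVE}")
    if positives + negatives < TEST_MIN_TOTAL:
        problems.append(f"test items: {positives + negatives}, expected >= {TEST_MIN_TOTAL}")
    return problems
```

An empty list from both functions means the counts fit the ranges. It says nothing about whether the samples are any good.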

Collecting these samples is most of the project. The best negative samples look superficially similar to the positives - same teams, same templates, same vocabulary - but are not the category. Garbage examples produce a garbage classifier, and you will not find out until weeks in.
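One rough way to shortlist negatives that look like the positives is to rank candidate documents by vocabulary overlap with the positive set. The sketch below assumes you have already extracted plain text from each document. `rank_negative_candidates` and its Jaccard score are an illustrative heuristic, not how the classifier itself works, and a human still has to confirm that each shortlisted file really is outside the category.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic word set; deliberately crude."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_negative_candidates(
    positive_texts: list[str], candidates: dict[str, str]
) -> list[tuple[str, float]]:
    """Rank candidates by Jaccard overlap with the combined positive vocabulary.

    Higher scores mean the candidate shares more vocabulary with the
    positives, which makes it a more useful negative if it is truly
    not in the category.
    """
    vocabulary: set[str] = set()
    for text in positive_texts:
        vocabulary |= tokens(text)

    scored = []
    for name, text in candidates.items():
        words = tokens(text)
        union = vocabulary | words
        score = len(vocabulary & words) / len(union) if union else 0.0
        scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```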

The mechanics

The seed content goes into dedicated SharePoint folders: one for positives, one for negatives, containing nothing else. Use a Communication site or other SharePoint site type, not a Teams folder. If the site or folders are new, allow at least an hour for indexing before you create the classifier.
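Before uploading, it can help to check a local staging copy of the two folders for the same file sitting on both sides, since an identical document labelled both positive and negative is a contradictory example. This is a minimal sketch that assumes the seed files are staged on local disk first. The function names are hypothetical and nothing here talks to SharePoint.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to spot byte-identical copies."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def cross_folder_duplicates(
    positive_dir: Path, negative_dir: Path
) -> list[tuple[Path, Path]]:
    """Return (positive_file, negative_file) pairs with identical content."""
    positives = {
        file_digest(p): p for p in sorted(positive_dir.rglob("*")) if p.is_file()
    }
    clashes = []
    for candidate in sorted(negative_dir.rglob("*")):
        if candidate.is_file():
            match = positives.get(file_digest(candidate))
            if match is not None:
                clashes.append((match, candidate))
    return clashes
```

The check only catches byte-identical files. Renamed copies are found, but a re-saved or lightly edited version of the same document is not.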

Then create the classifier in the Purview portal under Data classification, point it at the positive folder and the negative folder, and wait. Processing takes up to 24 hours. Automated testing (currently in preview) has shortened the end-to-end workflow from around 12 days to about two days, and sometimes to a matter of hours.
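The waits above turn into simple date arithmetic for a project plan. The sketch below uses the one-hour indexing wait and the 24-hour processing ceiling quoted in this guide. `plan_timeline` is a hypothetical helper, and because the real waits are "at least" and "up to" figures, the results are planning bounds rather than guarantees.

```python
from datetime import datetime, timedelta

INDEXING_WAIT = timedelta(hours=1)    # "at least an hour" for new sites or folders
PROCESSING_MAX = timedelta(hours=24)  # processing takes "up to 24 hours"


def plan_timeline(folders_ready_at: datetime) -> dict[str, datetime]:
    """Earliest classifier creation time and the latest expected processing end."""
    earliest_create = folders_ready_at + INDEXING_WAIT
    return {
        "earliest_create": earliest_create,
        "processing_done_by": earliest_create + PROCESSING_MAX,
    }
```

For folders populated at 09:00 on 10 June 2026, this gives 10:00 the same day as the earliest creation time and 10:00 on 11 June as the latest end of processing.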

One quirk to plan around: by default, only the account that creates the classifier can train it and review its predictions. Pick the owner deliberately.

The one-way door at publish

Here is the constraint that should shape your whole project plan: retraining a published custom classifier is not supported.

Before publishing, you can improve a classifier freely - review the test predictions, add more seed data, restart training, repeat. After publishing, that door closes. If production accuracy disappoints, the only path is to remove the classifier and start over with larger sample sets.

So treat the pre-publish review as the real quality gate. Work through the predictions one by one, and do not publish a classifier you have doubts about because 'we can tune it later'. You cannot.
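One way to make that gate explicit is to record each reviewed prediction alongside your own judgement and compute precision and recall before deciding. This is a minimal sketch: the review records are assumed to be kept by hand as (predicted, actual) pairs, and the thresholds are placeholder assumptions to replace with figures your organisation agrees on.

```python
def precision_recall(reviews: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Compute (precision, recall) from (predicted_match, actually_in_category) pairs."""
    true_pos = sum(1 for predicted, actual in reviews if predicted and actual)
    false_pos = sum(1 for predicted, actual in reviews if predicted and not actual)
    false_neg = sum(1 for predicted, actual in reviews if not predicted and actual)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def ready_to_publish(
    reviews: list[tuple[bool, bool]],
    min_precision: float = 0.9,  # placeholder threshold, not a Microsoft figure
    min_recall: float = 0.8,     # placeholder threshold, not a Microsoft figure
) -> bool:
    """True only if both metrics meet the agreed thresholds."""
    precision, recall = precision_recall(reviews)
    return precision >= min_precision and recall >= min_recall
```

Writing the thresholds down before the review starts makes it harder to talk yourself into publishing a classifier that missed them.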

Where you can use it, and where you cannot

A published custom classifier becomes a condition in auto-labelling with sensitivity labels, auto-apply retention label policies, and DLP.

Two gaps to know about:

Communication Compliance only supports the Microsoft-provided classifiers. If the business case was supervising messages for a custom category, stop now.

Classifiers do not evaluate encrypted items. Content already protected with encrypting sensitivity labels is invisible to them - worth remembering when you sequence a labelling rollout.

When the classifier goes live, run the consuming policies in simulation mode first and review what it actually matches. The classifier passed its tests on the content you chose; production will show you the content you did not think of.
