Unico Connect

Built an intelligent PHI detection and redaction platform for DICOM medical imaging that preserves diagnostic quality while removing patient identifiers

An AI-powered platform that automatically identifies and redacts protected health information from DICOM files — including burned-in text and metadata that standard anonymisation tools miss — with regulatory compliance and diagnostic image quality both preserved.

Industry: Healthcare / AI
Country: 🇪🇺 Europe
Diagnostic Quality: Preserved
Deployment: In-network

Key Takeaways

A healthcare technology provider working with medical imaging data came to Unico Connect to solve a problem standard anonymisation tools could not. DICOM files carry protected health information in burned-in text, image regions and metadata, and existing tools missed enough of it to make automated workflows unsafe.

We built an intelligent PHI detection and redaction platform that handles each of those PHI surfaces automatically while preserving the diagnostic information clinicians need — removing manual redaction effort and reducing compliance risk.

DICOM PHI detection key screens

The Challenge

Medical imaging operates under a strict regulatory boundary. Patient data — from a name embedded in a DICOM header to a date of birth burned into an X-ray corner — must be removed before images can be shared, used for research, or fed into downstream AI training pipelines. The boundary is not negotiable, and the penalty for getting it wrong is significant. The problem is that doing it correctly, at scale, is much harder than it looks.

Standard anonymisation tools handle DICOM metadata reasonably well — the structured fields where patient name, ID and dates live. What they tend to miss is everything else: identifiers burned into the image itself as machine overlays or clinician annotations, text in image regions that looks like imagery, and identifiers in private DICOM tags that vary by manufacturer. The manual workaround — someone reviewing each file and redacting what the tool missed — is time-consuming, error-prone and impossible to scale.

01. PHI beyond metadata (burned-in text, private tags)
02. Manual redaction does not scale
03. Diagnostic image quality must be preserved
04. No cloud: must run inside the customer network

What made the engagement interesting was that the problem was not one technique applied well. It was several techniques — computer vision for burned-in text detection, structured anonymisation for metadata, vendor-specific handling for private tags — working together with the right thresholds and the right human override paths. The client had explored building this in-house and decided the work was specialised enough to warrant a dedicated engineering partner.

Our Approach

Designing the DICOM PHI detection pipeline

We took the engagement on as a focused engineering build, with the team structured to combine machine learning depth with medical imaging domain experience. The first phase was understanding the actual PHI surfaces the client’s customers were dealing with, which varied across imaging modalities and manufacturers. We surveyed the real file set rather than the documented one, because the documented one rarely tells the whole story.

Key decisions:

01.

Three layers for three PHI surfaces

Structured anonymisation for metadata and private vendor tags, a computer-vision pipeline for burned-in text, and a human-in-the-loop path for ambiguous annotations.


02.

Diagnostic quality as a hard constraint

The CV pipeline is conservative around regions carrying diagnostic information: when in doubt, it refers the case to a human reviewer rather than making a destructive automatic edit.


03.

Network-isolated deployment

The platform runs within the customer’s network against their storage — medical imaging data never leaves their environment, which cloud-only tools cannot offer.
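To illustrate the second decision above, here is a minimal Python sketch of a conservative rule: a text box already classified as PHI is redacted automatically only when it stays clear of every region marked as diagnostic. The box format, the `diagnostic_regions` input and the function names are assumptions made for illustration, not the platform's actual code.

```python
from typing import List, Tuple

# Assumed box format: (x0, y0, x1, y1) in pixel coordinates, x1/y1 exclusive.
Box = Tuple[int, int, int, int]


def overlaps(a: Box, b: Box) -> bool:
    """Return True when two axis-aligned boxes share any area."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


def action_for_phi_box(text_box: Box, diagnostic_regions: List[Box]) -> str:
    """Prefer human review over a destructive edit near diagnostic content."""
    if any(overlaps(text_box, region) for region in diagnostic_regions):
        return "review"
    return "redact"


# Hypothetical diagnostic region covering the centre of the image.
diagnostic_regions = [(100, 100, 400, 400)]

corner_label = action_for_phi_box((0, 0, 80, 20), diagnostic_regions)
edge_label = action_for_phi_box((90, 90, 150, 120), diagnostic_regions)
```

In this sketch the corner label is redacted automatically, while the label that touches the diagnostic region goes to a reviewer. How diagnostic regions are identified in practice is outside what this example covers.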

The solution we built

The platform processes DICOM files end-to-end across three integrated workflows, with the imaging data never leaving the customer’s environment and a complete decision record captured for every file.

Metadata anonymisation

Standard anonymisation against the public DICOM tag dictionary, plus configurable handling of private vendor-specific tags, with the rules tightened against the private tags actually found in the surveyed files.
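As a rough sketch of how the two kinds of tag can be treated differently, the Python below applies a small rule table to public tags and a default-remove policy to private ones. It relies on one fact from the DICOM standard, that private data elements use odd group numbers. The three-entry rule table stands in for a full confidentiality profile, and `anonymise_header`, `PUBLIC_RULES` and `private_keep` are hypothetical names, not the platform's API.

```python
# A header is modelled as a plain dict: (group, element) -> value.
# This avoids assuming any particular DICOM library.

# Stand-in for a full public-tag profile; real profiles cover far more tags.
PUBLIC_RULES = {
    (0x0010, 0x0010): "blank",  # PatientName
    (0x0010, 0x0020): "blank",  # PatientID
    (0x0010, 0x0030): "blank",  # PatientBirthDate
}


def anonymise_header(header, private_keep=frozenset()):
    """Return (cleaned_header, log) without mutating the input."""
    cleaned, log = {}, []
    for tag, value in header.items():
        group = tag[0]
        if group % 2 == 1:
            # Private tag: removed unless explicitly allow-listed.
            action = "keep" if tag in private_keep else "remove"
        else:
            action = PUBLIC_RULES.get(tag, "keep")

        if action == "blank":
            cleaned[tag] = ""
        elif action == "keep":
            cleaned[tag] = value
        # "remove": the tag is simply not copied across.
        log.append((tag, action))
    return cleaned, log


header = {
    (0x0010, 0x0010): "DOE^JANE",
    (0x0008, 0x0060): "CT",            # Modality, kept as-is
    (0x0029, 0x1010): "vendor blob",   # private, not allow-listed
    (0x0029, 0x1020): "calibration",   # private, allow-listed below
}
cleaned, log = anonymise_header(header, private_keep={(0x0029, 0x1020)})
```

Removing private tags by default and keeping only an explicit allow-list is one conservative choice; the allow-list is where a survey of real files would feed in.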


Computer-vision redaction

Scans each image for burned-in text, classifies it as PHI or diagnostic content using position, format and context, and redacts identifiers while leaving diagnostics untouched.
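The position, format and context signals can be pictured with a small Python sketch. It assumes text detections have already been produced by some OCR step (not shown) and uses simple heuristics in place of a trained classifier: the regular expressions, the margin size and the all-signals-must-agree rule are illustrative assumptions only.

```python
import re

# Format signal: strings shaped like dates (assumed patterns, not exhaustive).
DATE_LIKE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{2}[./-]\d{2}[./-]\d{4})\b")
# Context signal: labels that usually introduce an identifier.
LABEL_LIKE = re.compile(r"\b(name|dob|id|mrn)\b", re.IGNORECASE)


def phi_signals(text, box, width, height, margin=0.15):
    """Return the three signals for one detected text box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    in_margin = (
        y1 <= height * margin or y0 >= height * (1 - margin)
        or x1 <= width * margin or x0 >= width * (1 - margin)
    )
    return {
        "position": in_margin,  # overlays tend to sit near the image border
        "format": bool(DATE_LIKE.search(text)),
        "context": bool(LABEL_LIKE.search(text)),
    }


def redact(pixels, box, fill=0):
    """Overwrite the boxed pixels in place; everything else is untouched."""
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            pixels[y][x] = fill


# Tiny 10x10 "image" with two hypothetical OCR detections.
pixels = [[255] * 10 for _ in range(10)]
detections = [("DOB 12/03/1961", (0, 0, 6, 1)), ("5 mm", (4, 4, 6, 5))]

for text, box in detections:
    signals = phi_signals(text, box, width=10, height=10)
    if all(signals.values()):  # deliberately strict: all three must agree
        redact(pixels, box)
```

Here the date-of-birth overlay in the top margin is blanked, while the "5 mm" measurement in the middle of the image is left alone.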


Human-in-the-loop review

When the classifier is uncertain, the file is routed to a reviewer rather than redacted automatically — the safeguard that keeps diagnostic content safe.
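One simple way to express this kind of routing is an uncertainty band on the classifier score, sketched below in Python. The scores, the 0.10 and 0.90 cut-offs and the `route_file` name are assumptions chosen for the example; real thresholds would need tuning against reviewed files.

```python
def route_file(phi_scores, low=0.10, high=0.90):
    """Route a whole file to review if any detection falls in the uncertain band.

    phi_scores: one score per detected text region, where values near 1.0 mean
    "confidently PHI" and values near 0.0 mean "confidently not PHI".
    """
    uncertain = [score for score in phi_scores if low < score < high]
    return "review" if uncertain else "auto"


confident_file = route_file([0.98, 0.02])
ambiguous_file = route_file([0.98, 0.55])
```

Routing at file level means a single ambiguous annotation holds back the whole file, which trades throughput for safety.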


Audit trail and in-network deployment

A per-file record of what was detected, redacted, reviewed and decided. The platform is packaged as a unit that deploys inside the customer network, with operational interfaces for their teams.
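A per-file record of this kind could be shaped roughly as follows. This is a sketch under assumptions: the field names, the `Finding` and `FileAuditRecord` classes and the JSON serialisation are invented for illustration and do not describe the platform's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Finding:
    surface: str   # "metadata", "private_tag" or "burned_in_text"
    location: str  # where it was found, never the PHI value itself
    action: str    # "redacted", "kept" or "sent_for_review"
    reviewer: Optional[str] = None
    reviewer_decision: Optional[str] = None


@dataclass
class FileAuditRecord:
    file_id: str
    findings: List[Finding] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses to plain dicts recursively.
        return json.dumps(asdict(self), sort_keys=True)


record = FileAuditRecord(file_id="file-001")
record.findings.append(Finding("metadata", "(0010,0010)", "redacted"))
record.findings.append(
    Finding("burned_in_text", "box (0,0,6,1)", "sent_for_review",
            reviewer="reviewer-17", reviewer_decision="redact")
)
serialised = record.to_json()
```

Recording the location of each finding rather than its content keeps the audit trail itself free of PHI.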

DICOM PHI detection — approach and solution
DICOM PHI detection platform

Tech stack

Outcomes & impact

Compliant

Complete per-file audit trail, defensible in a compliance review

Preserved

Diagnostic image quality across processed files

In-network

Deploys within the customer network; imaging data never leaves


Frequently Asked Questions

What did Unico Connect build?

We built an intelligent PHI detection and redaction platform for DICOM medical imaging. It automatically identifies and removes protected health information from DICOM files (header metadata, burned-in text, private vendor tags) while preserving diagnostic image quality and producing a complete audit trail.

How is it different from standard anonymisation tools?

Standard tools handle the structured metadata in DICOM headers, but they miss PHI burned into images, in private vendor tags, or in regions that look like imagery but contain text. Our platform handles each of these surfaces in a single automated workflow with human-in-the-loop review on uncertain cases.

Does redaction preserve diagnostic image quality?

Yes. The computer vision pipeline is tuned to be conservative around image regions that carry diagnostic information, with uncertain cases routed to a human reviewer rather than redacted automatically. Diagnostic quality is preserved across processed files.

Where does the platform run?

Within the customer’s network. Medical imaging data does not leave the customer’s environment. The platform is designed for deployment in compliance-sensitive contexts where cloud-only architectures are not viable.

What does the audit trail record?

A complete per-file record covering what PHI was detected, what was redacted, what was sent for review, who reviewed it and what they decided. This is what makes the platform defensible in a compliance review.

Does it work across imaging modalities and manufacturers?

Yes. The platform handles DICOM files across imaging modalities (radiology, dermatology, ophthalmology and others) and across manufacturer variations in private tags. The CV pipeline was tuned against actual file inventories rather than documented standards.

Does Unico Connect take on similar healthcare AI engagements?

Yes. Healthcare AI, computer vision and compliance-aware platform design are an established part of our portfolio. The engagement covers ML pipeline development, DICOM-native processing and network-isolated deployment.

Is the platform in production?

Yes. The platform is in production use, processing DICOM files across the client’s customer base.
