
Case study · Healthcare privacy

PHI Sanitizer — Glendor

Automatic, on-prem de-identification of Protected Health Information across medical images, pathology reports, video, photos, and audio — for clinical research and AI development.


Background

Glendor builds privacy software for clinical research. Its PHI Sanitizer takes a medical asset (DICOM images, pathology reports, surgical videos, patient photographs, audio recordings of dictations) and automatically redacts the Protected Health Information in it so the asset can leave the originating hospital safely.

What makes this hard

PHI lives in two very different places at once:

Pixel data. Patient stickers stuck onto imaging, scribbled annotations on radiographs, monitor screens visible in surgical video, faces in photos, voices in audio recordings.

Metadata. DICOM headers, EXIF, file names, embedded text, free-text fields buried in study manifests.

Existing solutions handle one or the other, often by asking the operator to define templates or filters per modality. PHI Sanitizer's promise is that none of that is required — drop in raw assets, get back de-identified ones, no tuning.

Engineering work

Multi-modal pipeline. Each modality has its own detection front-end — OCR for stamped text, face detection for photos, ASR plus NER for audio, classical computer-vision techniques for monitor screens visible in surgical recordings — and a unified back-end that performs the actual redaction (pixel masking, blur, audio bleeping, metadata stripping).
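The front-end/back-end split described above can be sketched as a small dispatch table. This is only an illustration of the shape, not Glendor's implementation: the `Finding` and `Asset` types, the `detect_*` stubs, the `SENSITIVE_FIELDS` set, and the action names are all hypothetical, and the detectors return canned findings where real ones would run OCR, face detection, or ASR plus NER.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Finding:
    """One suspected piece of PHI reported by a detection front-end."""
    kind: str      # e.g. "text", "face", "speech", "metadata"
    location: str  # pixel region, time span, or metadata field name
    action: str    # redaction the back-end should apply


@dataclass
class Asset:
    """Hypothetical stand-in for a medical asset and its metadata."""
    modality: str
    metadata: Dict[str, str] = field(default_factory=dict)
    redactions: List[str] = field(default_factory=list)


# Illustrative only: a real system would cover far more fields than these.
SENSITIVE_FIELDS = {"PatientName", "PatientID", "PatientBirthDate"}


def detect_photo(asset: Asset) -> List[Finding]:
    # Stub for a face detector; returns a canned region.
    return [Finding("face", "region(10,10,64,64)", "blur")]


def detect_audio(asset: Asset) -> List[Finding]:
    # Stub for ASR + NER; returns a canned time span.
    return [Finding("speech", "00:03-00:05", "bleep")]


def detect_dicom(asset: Asset) -> List[Finding]:
    # Stub for OCR on burned-in text, plus a header scan.
    findings = [Finding("text", "region(0,0,200,40)", "mask")]
    findings += [
        Finding("metadata", key, "strip")
        for key in asset.metadata
        if key in SENSITIVE_FIELDS
    ]
    return findings


# One detection front-end per modality.
FRONT_ENDS: Dict[str, Callable[[Asset], List[Finding]]] = {
    "photo": detect_photo,
    "audio": detect_audio,
    "dicom": detect_dicom,
}


def redact(asset: Asset, findings: List[Finding]) -> Asset:
    """Unified back-end: applies every finding, whatever its source."""
    for finding in findings:
        if finding.action == "strip":
            asset.metadata.pop(finding.location, None)
        # Masking, blurring, and bleeping are only recorded in this sketch.
        asset.redactions.append(f"{finding.action}:{finding.location}")
    return asset


def sanitize(asset: Asset) -> Asset:
    """Route the asset to its front-end, then hand findings to the back-end."""
    try:
        detect = FRONT_ENDS[asset.modality]
    except KeyError:
        raise ValueError(f"unsupported modality: {asset.modality}") from None
    return redact(asset, detect(asset))
```

Calling `sanitize(Asset("dicom", {"PatientName": "DOE^JANE", "Modality": "CT"}))` in this sketch removes `PatientName` from the metadata and leaves `Modality` in place. Keeping the back-end unaware of which detector produced a finding means adding a modality only requires registering one more front-end.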

On-prem only. No customer data leaves the hospital network. The product ships as a single installer the IT team runs on their own infrastructure — no Business Associate Agreement required from Glendor's side, because Glendor never touches the data.

Robustness over freshness. Healthcare data is messy, old, and inconsistent. The pipeline is conservative: it errs on the side of over-redacting rather than missing PHI, and gives the operator a confidence-scored review queue for borderline cases.
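One way to read the conservative policy above is as two thresholds: a low floor above which everything is redacted, and a higher bar below which a redaction is also queued for operator review. The sketch below assumes exactly that reading. The threshold values, the `Detection` type, and the `triage` function are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds; a real deployment would choose its own values.
REDACT_FLOOR = 0.30  # at or above this, redact (deliberately low: over-redact)
AUTO_ACCEPT = 0.80   # below this, also ask the operator to review


@dataclass
class Detection:
    label: str
    confidence: float  # detector score in [0, 1]


def triage(detections: List[Detection]) -> Tuple[List[Detection], List[Detection]]:
    """Split detections into (redacted, review_queue).

    Everything at or above REDACT_FLOOR is redacted. Redactions below
    AUTO_ACCEPT are additionally queued for review, least certain first.
    """
    redacted = [d for d in detections if d.confidence >= REDACT_FLOOR]
    review_queue = sorted(
        (d for d in redacted if d.confidence < AUTO_ACCEPT),
        key=lambda d: d.confidence,
    )
    return redacted, review_queue
```

Under these assumptions a detection scored 0.45 is redacted and queued for review, one scored 0.95 is redacted without review, and one scored 0.05 is ignored. Lowering `REDACT_FLOOR` trades more false redactions for fewer missed items, which is the direction the paragraph describes.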

One-minute installation. Designed so a hospital IT team can stand it up without a deployment engineer on the call. No external services, no Kubernetes manifests, no secrets manager: just an installer.

Outcome

PHI Sanitizer is used in clinical research workflows that previously required either expensive manual de-identification or lawyer-mediated data-use agreements. Researchers get usable assets in minutes; hospitals get a tool that doesn't expand their compliance surface; Glendor gets a product that scales without scaling its data-handling responsibilities.

Technologies

ML / NLP · OCR · Image processing · Audio processing · DICOM · Python · OpenCV · On-prem deployment · HIPAA workflow