HALO-UNet: How I Built a Lightweight Medical AI That Runs on Almost Nothing
A deep dive into designing HALO-UNet — a multistage deep learning framework for thyroid nodule segmentation that achieves 71% Dice score while running 70% faster than baseline UNet on resource-constrained hardware.
The Problem: Medical AI That Can't Go Anywhere
Most medical AI research is built to impress on benchmark leaderboards, not to run in a Ghanaian district hospital with one aging GPU and inconsistent power supply.
When I started working on thyroid nodule segmentation for my BSc thesis at KNUST, the gap was obvious. State-of-the-art models like TransUNet achieved impressive Dice scores — but demanded compute that doesn't exist in most African clinical settings. A tool that can't deploy isn't a tool. It's a paper.
My goal was simple: build something that works well enough to be clinically useful, and runs fast enough to actually be deployed.
The Architecture: Multistage + Hardware-Aware
HALO-UNet is built around three ideas:
1. Lightweight multistage design. Rather than a single deep encoder-decoder, HALO-UNet processes ultrasound images through staged feature extraction with progressive refinement. Each stage captures coarser-to-finer representations of the nodule boundary without stacking layers until the model can't fit in memory.
2. Hardware-aware optimization pipeline. After training, the model goes through a three-phase optimization:
- Dynamic preprocessing to normalize inconsistent ultrasound acquisition artifacts
- Structured pruning to remove redundant filters
- Post-training quantization (INT8) to cut memory footprint and inference latency
3. Validated on real clinical data. I trained on two public datasets (TN3K and DDTI) covering over 3,000 ultrasound images, then validated on 150+ real patient scans. Most papers stop at the public benchmark, but clinical relevance means testing on the messy, inconsistent data produced by actual imaging equipment.
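HALO-UNet's exact layers aren't reproduced here, but the staged, coarse-to-fine idea can be sketched in a few lines of PyTorch. This is a minimal illustration under my own assumptions (the names `Stage` and `MultiStageSegNet`, the three-stage count, and feeding each stage the image plus the previous stage's mask are illustrative, not the published architecture):

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One lightweight stage: a shallow feature extractor plus a 1x1
    head that predicts mask logits at this level of refinement."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
        )
        self.head = nn.Conv2d(out_ch, 1, 1)  # per-stage mask logits

    def forward(self, x):
        f = self.features(x)
        return f, self.head(f)

class MultiStageSegNet(nn.Module):
    """Three shallow stages refine the mask coarse-to-fine. Each stage sees
    the image concatenated with the previous stage's mask, instead of one
    very deep encoder-decoder that may not fit in memory."""
    def __init__(self, width=16, n_stages=3):
        super().__init__()
        # Input: 1 ultrasound channel + 1 channel for the running mask
        self.stages = nn.ModuleList([Stage(2, width) for _ in range(n_stages)])

    def forward(self, x):
        mask = torch.zeros_like(x[:, :1])  # start from an empty mask
        for stage in self.stages:
            _, logits = stage(torch.cat([x, mask], dim=1))
            mask = torch.sigmoid(logits)   # refined mask feeds the next stage
        return mask
```

The memory win is that each stage stays shallow; the progressive refinement does the work that extra depth would otherwise do.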
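The three optimization phases can likewise be sketched with standard PyTorch utilities. Everything here is a simplified stand-in for the real pipeline: `preprocess` reduces dynamic preprocessing to per-image z-score normalization, structured pruning uses `torch.nn.utils.prune.ln_structured`, and the INT8 step uses dynamic quantization (which covers `nn.Linear`; quantizing conv layers would need static post-training quantization with a calibration pass):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def preprocess(img):
    """Phase 1 (sketch): per-image z-score normalization to tame
    scanner-to-scanner brightness and contrast differences."""
    return (img - img.mean()) / (img.std() + 1e-6)

def optimize_for_deployment(model, amount=0.3):
    # Phase 2: structured pruning -- zero out whole conv output filters
    # with the smallest L1 norm, i.e. the most redundant ones.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            prune.ln_structured(m, name="weight", amount=amount, n=1, dim=0)
            prune.remove(m, "weight")  # bake the pruning mask into the weights
    # Phase 3: post-training INT8 quantization. Dynamic quantization here
    # handles nn.Linear; convs need static PTQ with calibration data.
    return torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```

Running `optimize_for_deployment` on a trained network leaves roughly 30% of each conv layer's filters zeroed, which is what buys the latency reduction on constrained hardware.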
Results
The numbers came out better than I expected:
| Model | Dice Score | IoU | Inference Time |
|---|---|---|---|
| Baseline UNet | 62.3% | 50.1% | 180ms |
| DeepLabV3+ | 66.1% | 53.7% | 210ms |
| TransUNet | 68.9% | 57.2% | 340ms |
| HALO-UNet | 71.0% | 58.4% | 52ms |
The inference time improvement was the real win: 71% faster than baseline UNet and 85% faster than TransUNet, while achieving a better Dice score on the same hardware. That's what makes it deployable.
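The speedup figures fall straight out of the table; a quick check:

```python
# Relative speedups implied by the table above (inference times in ms).
times = {"Baseline UNet": 180, "DeepLabV3+": 210, "TransUNet": 340}
halo = 52  # HALO-UNet

for name, t in times.items():
    speedup = (t - halo) / t
    print(f"vs {name}: {speedup:.0%} faster")
# vs Baseline UNet: 71% faster
# vs DeepLabV3+: 75% faster
# vs TransUNet: 85% faster
```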
Deployment
After defending the thesis, I deployed HALO-UNet to Hugging Face Inference Endpoints:
```python
from huggingface_hub import InferenceClient

# Point the client at the hosted model
client = InferenceClient("ransjnr/halo-unet")

# Segment a local ultrasound image; returns per-region
# results with label, mask, and confidence score
result = client.image_segmentation("thyroid_scan.jpg")
```
The public endpoint means any clinician or researcher can test it without setting up infrastructure. Try it at huggingface.co/spaces/ransjnr/halo-unet.
What I Learned
The biggest lesson wasn't technical — it was about the difference between research and engineering. A model that gets one more percentage point on a benchmark but needs 3x the compute is worse than a simpler model in any context where resources are constrained.
Africa doesn't need to wait for affordable compute to use medical AI. We need AI designed for the compute we have.
The next phase of this work is multi-hospital validation across Ghana and integration with point-of-care ultrasound devices. If you're a radiologist, clinician, or research institution in Africa and want to collaborate, reach out.