Mammography Image Preprocessing
The following describes the preprocessing pipeline applied to all mammography images prior to model training and evaluation.
1. Data Format Conversion
- All mammograms were originally stored in DICOM format
- Converted to 16-bit grayscale PNG images
- This conversion preserves the high intensity resolution (bit depth) of the source images while enabling efficient processing in deep learning pipelines
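The conversion step could be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes pydicom and Pillow are installed, and the function names `to_uint16` and `dicom_to_png` are hypothetical. Stretching intensities to the full 16-bit range and inverting MONOCHROME1 images are also assumptions, since the text does not say how pixel values were mapped.

```python
import numpy as np


def to_uint16(pixels):
    """Linearly rescale an array to the full 16-bit range [0, 65535].

    Assumption: min-max stretching per image. The source only states that
    the output is 16-bit grayscale, not how values were mapped.
    """
    arr = np.asarray(pixels, dtype=np.float64)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # Constant image: nothing to stretch, return all zeros.
        return np.zeros(arr.shape, dtype=np.uint16)
    scaled = (arr - lo) / (hi - lo) * 65535.0
    return np.round(scaled).astype(np.uint16)


def dicom_to_png(src_path, dst_path):
    """Read one DICOM file and write it as a 16-bit grayscale PNG."""
    # Imported here so the scaling helper above works without these packages.
    import pydicom
    from PIL import Image

    ds = pydicom.dcmread(src_path)
    pixels = ds.pixel_array
    # Assumption: MONOCHROME1 images (low values displayed as bright) are
    # inverted so that all outputs share the same polarity.
    if getattr(ds, "PhotometricInterpretation", "") == "MONOCHROME1":
        pixels = pixels.max() - pixels
    # A 2-D uint16 array is saved by Pillow as a 16-bit grayscale PNG.
    Image.fromarray(to_uint16(pixels)).save(dst_path, format="PNG")
```

A call such as `dicom_to_png("scan.dcm", "scan.png")` (both paths are placeholders) would convert a single file.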
2. Image Preprocessing Pipeline
All images were preprocessed following the same protocol:
- Background removal to eliminate irrelevant black borders and artifacts
- Breast region segmentation to isolate the region of interest and remove non-breast content, including annotations and text commonly present in mammograms
- Resizing to a fixed resolution of 1664 × 2048 pixels
- Intensity normalization to standardize pixel value distributions across scans
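The four steps above could look roughly like the sketch below, which uses NumPy only. The specific techniques are assumptions chosen for brevity, not the methods actually used: a fixed intensity threshold for the foreground mask, a bounding-box crop in place of a full breast segmentation, nearest-neighbour resizing, and min-max normalization to [0, 1]. All function names are hypothetical, and 1664 × 2048 is read here as width × height, which the text does not specify.

```python
import numpy as np

# Assumption: 1664 x 2048 is interpreted as width x height.
TARGET_W, TARGET_H = 1664, 2048


def foreground_mask(img, threshold=0.05):
    """Boolean mask of pixels brighter than a fraction of the image maximum."""
    img = np.asarray(img, dtype=np.float64)
    peak = img.max()
    if peak <= 0:
        return np.zeros(img.shape, dtype=bool)
    return img > threshold * peak


def crop_to_mask(img, mask):
    """Crop to the bounding box of the mask (unchanged if the mask is empty)."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0 or cols.size == 0:
        return img
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D image using integer index maps."""
    h, w = img.shape
    row_idx = np.arange(out_h) * h // out_h
    col_idx = np.arange(out_w) * w // out_w
    return img[row_idx[:, None], col_idx[None, :]]


def normalize_min_max(img):
    """Scale intensities to [0, 1] as float32."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)


def preprocess(img, out_h=TARGET_H, out_w=TARGET_W):
    """Background removal, crop, resize, and normalization, in that order."""
    img = np.asarray(img)
    mask = foreground_mask(img)
    img = np.where(mask, img, 0)      # zero out background pixels
    img = crop_to_mask(img, mask)     # drop empty borders
    img = resize_nearest(img, out_h, out_w)
    return normalize_min_max(img)
```

Note that a plain threshold keeps bright annotation text inside the mask. Removing such content would need an extra step, for example keeping only the largest connected component of the mask, which this sketch leaves out.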
3. Dataset Split
- The data were split at the patient level, so that all images from a given patient fall in the same split, to avoid data leakage
- Splits were defined as:
  - Training: 50%
  - Validation: 20%
  - Test: 30%
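A patient-level split with these proportions could be implemented along the following lines. This is a sketch under stated assumptions: the function name `split_by_patient`, the `image_to_patient` mapping, and the fixed seed are all hypothetical, and the percentages are applied to the number of patients rather than the number of images, which the text does not specify.

```python
import random


def split_by_patient(image_to_patient, fractions=(0.5, 0.2, 0.3), seed=0):
    """Assign every image to train/val/test so that no patient spans two splits.

    image_to_patient: dict mapping image ID -> patient ID (hypothetical input).
    fractions: train, validation, and test shares, applied to patients.
    """
    # Shuffle the unique patient IDs reproducibly.
    patients = sorted(set(image_to_patient.values()))
    random.Random(seed).shuffle(patients)

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))

    # The first n_train patients go to train, the next n_val to val,
    # and the remainder to test.
    assignment = {}
    for i, patient in enumerate(patients):
        if i < n_train:
            assignment[patient] = "train"
        elif i < n_train + n_val:
            assignment[patient] = "val"
        else:
            assignment[patient] = "test"

    # Each image follows its patient's assignment.
    splits = {"train": [], "val": [], "test": []}
    for image_id, patient in sorted(image_to_patient.items()):
        splits[assignment[patient]].append(image_id)
    return splits
```

Because assignment happens per patient, the image-level proportions can deviate from 50/20/30 when patients contribute different numbers of images.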
4. Summary
This preprocessing pipeline ensures:
- Consistent image resolution and intensity scaling
- Removal of irrelevant background and annotation artifacts
- Patient-level separation of the training, validation, and test splits
- Compatibility with deep learning-based mammography analysis models