MICCAI 2020 RibFrac Challenge:
Rib Fracture Detection and Classification
1. Dataset Overview
Figure. Illustrations of CT examples
This challenge establishes a large-scale benchmark dataset for automatically detecting and classifying around 5,000 rib fractures in 660 computed tomography (CT) scans, consisting of 420 training CTs (all with fractures), 80 validation CTs (20 of which contain no fractures), and 160 evaluation CTs. Each annotation consists of a pixel-level mask of the rib fracture region (serving the detection task), plus a 4-type classification label. Both detection and classification tasks are involved in this challenge.
Table. Data split
- Training: 420 CTs (all with fractures)
- Validation: 80 CTs (20 without fractures)
- Evaluation: 160 CTs
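For orientation, below is a minimal loading sketch in Python. The file names, NIfTI format, and CSV schema are assumptions for illustration only; consult the official data release for the actual layout and label encoding.

```python
import nibabel as nib   # a common reader for NIfTI medical volumes
import numpy as np
import pandas as pd

# File names and the CSV schema below are illustrative assumptions;
# check the official data release for the actual layout.
image = nib.load("RibFrac1-image.nii.gz").get_fdata()             # CT volume
mask = nib.load("RibFrac1-label.nii.gz").get_fdata().astype(int)  # instance mask

# In an instance mask, 0 is background and each positive id is one fracture.
instance_ids = sorted(k for k in np.unique(mask) if k > 0)

# A table is assumed to map each (scan, instance id) to the 4-type label.
info = pd.read_csv("ribfrac-train-info.csv")
print(f"{len(instance_ids)} fracture instances in this scan")
```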
2. Tasks
Task 1: Detection
In this task, the participants are expected to detect rib fractures in CT scans. However, due to the elongated shape of the targets (rib fractures), the detection task is required to be implemented in an instance segmentation fashion. Instance segmentation masks are provided for training the models. The evaluation is based on FROC analysis for detection.
Figure. Illustrations of annotations
Training and validation cases have instance annotations of rib fractures. Each rib fracture instance has a voxel-level mask of the fracture region annotated by radiologists; an instance segmentation problem is therefore formulated. However, due to the ambiguity of fracture boundaries, the instance masks tend to be noisy. For this reason, the segmentation prediction is only used for computing the overlap in the detection metric.
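Since predictions take the form of instance segmentations, a typical post-processing step turns a predicted probability volume into scored proposals. The sketch below uses thresholding plus 3D connected components; the threshold value and the mean-probability confidence rule are illustrative assumptions, not the official pipeline.

```python
import numpy as np
from scipy import ndimage

def probmap_to_proposals(prob, threshold=0.5):
    """Turn a voxel-wise fracture probability map into scored instance proposals.

    The threshold and the confidence rule are illustrative choices."""
    labeled, n = ndimage.label(prob > threshold)    # 3D connected components
    proposals = []
    for k in range(1, n + 1):
        component = labeled == k                    # boolean mask of one instance
        score = float(prob[component].mean())       # one simple confidence rule
        proposals.append((component, score))        # (binary mask, confidence)
    return proposals
```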
Evaluation of Task 1: Detection
Please refer to the official evaluation code for the details of evaluation.
The evaluation of detection performance is based on Free-Response Receiver Operating Characteristic (FROC) analysis, an approach that balances sensitivity against false positives. FROC analysis reports sensitivity at various false-positive (FP) levels. We use the average sensitivity at FP = 0.5, 1, 2, 4, and 8 per scan as the overall challenge metric for the detection task.
A detection proposal is regarded as a hit when it overlaps any rib fracture annotation with IoU > 0.2. Please note that for objects with elongated shapes, the IoU tends to vary considerably, which is why we chose IoU > 0.2 as the detection hit criterion.
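The sketch below mirrors this procedure: proposals are ranked by confidence, counted as hits under the IoU > 0.2 criterion, and sensitivity is read off at each FP-per-scan level. It is a simplified illustration; the official evaluation code remains authoritative.

```python
import numpy as np

def iou(a, b):
    """IoU between two boolean 3D masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def froc(proposals, num_gt, num_scans, fp_levels=(0.5, 1, 2, 4, 8)):
    """proposals: (confidence, is_hit) pairs pooled over all scans, where a
    proposal is a hit if it overlaps an unmatched ground-truth fracture with
    IoU > 0.2. num_gt is the total number of ground-truth fractures."""
    ranked = sorted(proposals, key=lambda p: -p[0])           # high confidence first
    hits = np.cumsum([1 if h else 0 for _, h in ranked])
    fps = np.cumsum([0 if h else 1 for _, h in ranked])
    sensitivities = []
    for level in fp_levels:
        budget = level * num_scans                            # allowed FPs at this level
        idx = np.searchsorted(fps, budget, side="right") - 1  # last point within budget
        sensitivities.append(hits[idx] / num_gt if idx >= 0 else 0.0)
    return float(np.mean(sensitivities)), sensitivities      # challenge metric, curve
```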
Task 2: Classification
In this task, the participants are expected to classify the detected rib fractures into 4 clinical categories (buckle, nondisplaced, displaced, or segmental rib fractures).
Figure. Illustrations of the 4 categories of rib fracture
Buckle Rib Fracture: Although the buckle fracture is a common entity in pediatric patients across a variety of bones, it is not exclusively a pediatric phenomenon. Because of its importance in forensics, the buckle fracture has been identified in adult trauma patients in at least two case series; these fractures are commonly missed at imaging.
Nondisplaced Rib Fracture: Detection of nondisplaced rib fractures at radiography is difficult, and these injuries may be seen radiographically only at follow-up imaging, after signs of healing have manifested. Because no cortical offset occurs, there may be no direct signs of nondisplaced fracture at radiography, and the radiologist should look for associated injuries.
Displaced Rib Fracture: When cortical disruption and a substantial abnormality in alignment are evident, a rib fracture is classified as displaced. Displacement may be minimal or obvious. Injury to the surrounding tissues and structures can occur, and several lethal complications have been documented in the literature.
Segmental Rib Fracture: Segmental fractures are high-grade injuries with at least two separate complete fractures located in the same rib. Segmental fractures may remain anatomically aligned but often become partially or significantly displaced at one or both fracture sites. Segmental rib fractures affecting three or more contiguous rib levels are associated with increased risk for flail chest, which remains a clinical diagnosis. A true flail segment will reveal paradoxical respiratory motion, wherein the affected chest wall segment retracts inward during inspiration and balloons outward during expiration.
Evaluation of Task 2: Classification
Please refer to the official evaluation code for the details of evaluation.
The evaluation of the classification is based on the Macro-average F1 of the classification predictions and ground truth labels.
The confusion matrix has 5 rows ("Buckle", "Displaced", "Nondisplaced", "Segmental", "FN") x 6 columns ("Buckle", "Displaced", "Nondisplaced", "Segmental", "FP", "Ignore"), that is to say:
- By summing up all the rows, you get the 6 ground-truth columns, where FP denotes false-positive detections and Ignore denotes undefined labels (they are ignored in the calculation);
- By summing up all the columns, you get the 5 prediction rows, where FN denotes false negatives, i.e., missed fractures.
The Macro-average F1 is then computed over the 4 categories ("Buckle", "Displaced", "Nondisplaced", "Segmental").
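The following sketch shows one way such a matrix could be assembled from matched detections; the helper and its input format are illustrative, not the official implementation.

```python
import numpy as np

ROWS = ["Buckle", "Displaced", "Nondisplaced", "Segmental", "FN"]            # predictions
COLS = ["Buckle", "Displaced", "Nondisplaced", "Segmental", "FP", "Ignore"]  # ground truth

def build_confusion(pairs):
    """pairs: (pred_label, gt_label) for every matched detection, plus
    ("FN", gt_label) for each missed fracture and (pred_label, "FP")
    for each false-positive detection."""
    cm = np.zeros((len(ROWS), len(COLS)), dtype=int)
    for pred, gt in pairs:
        cm[ROWS.index(pred), COLS.index(gt)] += 1
    return cm
```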
Please note that there are 3 classification F1 metrics, which measure different aspects of the classification system (see the sketch after this list):
- Overall F1: evaluates the overall classification performance integrated with the detection system.
- Target-Aware F1: evaluates classification performance on the classification targets (excluding all detection FPs).
- Prediction-Aware F1: evaluates classification performance on the classification predictions (excluding all detection FPs and FNs).
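Under this layout, the three variants reduce to macro-average F1 over different slices of the same matrix. A hedged sketch follows; the `macro_f1` helper and the toy matrix are illustrative, and the official evaluation code remains authoritative.

```python
import numpy as np

def macro_f1(cm, include_fp=True, include_fn=True):
    """Macro-average F1 over the 4 fracture categories of the 5x6 matrix.

    include_fp: count the FP column against precision; include_fn: count the
    FN row against recall. The Ignore column is always excluded."""
    f1s = []
    for k in range(4):                                              # the 4 categories
        tp = cm[k, k]
        pred_k = cm[k, :4].sum() + (cm[k, 4] if include_fp else 0)  # row k, no Ignore
        gt_k = cm[:4, k].sum() + (cm[4, k] if include_fn else 0)    # column k
        p = tp / pred_k if pred_k else 0.0
        r = tp / gt_k if gt_k else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return float(np.mean(f1s))

# Toy matrix in the layout above (rows: 4 classes + FN; cols: 4 classes + FP + Ignore).
cm = np.array([[8, 1, 0, 0, 2, 0],
               [0, 5, 1, 0, 1, 0],
               [1, 0, 9, 0, 0, 1],
               [0, 0, 0, 3, 0, 0],
               [1, 1, 2, 0, 0, 0]])
overall = macro_f1(cm)                                               # Overall F1
target_aware = macro_f1(cm, include_fp=False)                        # Target-Aware F1
prediction_aware = macro_f1(cm, include_fp=False, include_fn=False)  # Prediction-Aware F1
```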
Note that although the performance of Task 2 partially depends on that of Task 1, the two tasks are awarded separately. Participants may submit results for Task 1 only, but a well-performing Task 1 algorithm is an advantage for Task 2.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.