Evaluation and Ranking
For the first phase (validation phase), participants are required to submit the output of their algorithms as a single compressed zip file to the organization team. The zip file should be formatted as shown below, and its results must match the validation images one-to-one: 2 predictions per case for Task 1 and 6 predictions per case for Task 2. Otherwise, the submission is considered invalid and no score will be generated. (A sanity-check sketch follows the layout below.)
- results/
  - SegRap_0001
    - GTVnx.nii.gz
    - GTVnd.nii.gz
  - …
    - GTVnx.nii.gz
    - GTVnd.nii.gz
  - SegRap_xxxx
    - GTVnx.nii.gz
    - GTVnd.nii.gz
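Before uploading, the layout can be sanity-checked locally. Below is a minimal sketch assuming the Task 1 layout above; the function name `check_submission`, the `EXPECTED` file set, and the example case IDs are illustrative, not official tooling.

```python
import zipfile

# Task 1 targets named in the layout above; extend this set for Task 2.
EXPECTED = {"GTVnx.nii.gz", "GTVnd.nii.gz"}

def check_submission(zip_path, case_ids):
    """Return a list of missing members; an empty list means the layout matches."""
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
        for case in case_ids:
            for fname in EXPECTED:
                member = f"results/{case}/{fname}"
                if member not in names:
                    problems.append(f"missing: {member}")
    return problems

# Hypothetical usage:
# print(check_submission("submission.zip", ["SegRap_0001", "SegRap_0002"]))
```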
Note: Participants are allowed to submit results once per week, with a maximum of 5 submissions during the validation stage, to ensure fair evaluation and prevent overfitting.
For the second phase (test phase), participants must prepare and submit their Docker containers along with a short paper outlining their method.
Two classical medical image segmentation metrics, the Dice Similarity Coefficient (DSC) and the Normalized Surface Dice (NSD), will be used to assess different aspects of the segmentation methods' performance.
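For reference, both metrics can be computed for binary masks roughly as follows. This is a minimal sketch, not the official evaluation code; in particular, the NSD tolerance `tol_mm` and the voxel `spacing` are parameters the organizers choose, and their values are not specified here.

```python
import numpy as np
from scipy import ndimage

def dsc(pred, gt):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def nsd(pred, gt, tol_mm, spacing):
    """Normalized Surface Dice: fraction of each surface lying within
    tol_mm of the other surface."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    # Surface voxels = mask minus its erosion.
    pred_surf = pred ^ ndimage.binary_erosion(pred)
    gt_surf = gt ^ ndimage.binary_erosion(gt)
    # Distance (in mm) from every voxel to the nearest surface voxel.
    d_to_gt = ndimage.distance_transform_edt(~gt_surf, sampling=spacing)
    d_to_pred = ndimage.distance_transform_edt(~pred_surf, sampling=spacing)
    overlap = (d_to_gt[pred_surf] <= tol_mm).sum() + (d_to_pred[gt_surf] <= tol_mm).sum()
    total = pred_surf.sum() + gt_surf.sum()
    return overlap / total if total else 1.0
```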
First, for each organ we compute the average DSC and NSD across all patients. Second, each participant is ranked on every organ-level DSC and NSD, giving each participant 2 × 2 (Task 1) or 6 × 2 (Task 2) rankings (organs × metrics). Finally, all of these rankings are averaged and normalized by the number of teams. Statistical testing is also applied to the rankings, and teams may share the same rank when there is no significant difference between them.
In addition, if a submission is missing results for some test cases, the corresponding organ's DSC and NSD will both be set to 0 for ranking. For example, if a test case is missing an organ, that organ's average DSC and NSD, and hence its ranking, will degrade. (A simplified sketch of the aggregation follows.)
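Putting the ranking rules together, the aggregation could be sketched as below. It assumes the per-organ mean DSC/NSD values have already been computed (with 0 entered for missing results as described above); exact ties receive the same average rank, but the full significance-based tie handling is omitted.

```python
import numpy as np
from scipy import stats

def aggregate_ranks(scores):
    """scores: (teams, organs x metrics) array of mean DSC/NSD values,
    with missing results already set to 0. Returns one normalized
    rank per team (lower is better)."""
    # Rank teams within each organ-metric column; negate so that a
    # higher score yields a better (smaller) rank.
    ranks = stats.rankdata(-scores, axis=0)
    # Average each team's ranks, then normalize by the number of teams.
    return ranks.mean(axis=1) / scores.shape[0]
```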