Evaluation and Ranking
For the first phase (validation phase), participants are required to submit the output of their algorithms as a single compressed zip file to the organization team (segrap2025@163.com). The submitted zip file should be organized as shown below (a packaging sketch follows the listing). Make sure the results in the submitted zip file match the validation cases one-to-one (2 classes for Task01 and 6 classes for Task02); otherwise, the submission is considered invalid and no score will be generated.
- team_name/
  - Task0*_results/
    - segrap_0001.nii.gz
    - ......
    - segrap_xxxx.nii.gz
Note: A maximum of 5 submissions is allowed during the validation stage to ensure fair evaluation and prevent overfitting to the validation set.
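For convenience, here is a minimal packaging sketch in Python. It assumes your predictions are already saved locally as segrap_*.nii.gz files and that the task folder is named following the listing above (e.g. "Task01_results"); the function name and folder names are illustrative, not part of the official instructions, so adjust them to your own setup.

```python
# Minimal packaging sketch (unofficial). Assumptions: predictions are stored
# locally as segrap_XXXX.nii.gz files, and the task folder follows the listing
# above (e.g. "Task01_results").
import zipfile
from pathlib import Path

def package_submission(pred_dir: str, team_name: str, task_folder: str = "Task01_results") -> Path:
    """Bundle prediction files into team_name/<task_folder>/ inside a single zip."""
    pred_files = sorted(Path(pred_dir).glob("segrap_*.nii.gz"))
    if not pred_files:
        raise FileNotFoundError(f"No segrap_*.nii.gz files found in {pred_dir}")

    zip_path = Path(f"{team_name}.zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in pred_files:
            # Archive path mirrors the required layout: team_name/Task0*_results/segrap_xxxx.nii.gz
            zf.write(f, arcname=f"{team_name}/{task_folder}/{f.name}")
    return zip_path

# Example:
# package_submission("./predictions", "my_team", "Task01_results")
```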
For the second phase (test phase), participants must prepare and submit their Docker containers along with a short paper outlining their method before 31st August 2025.
Docker container:
- The Docker submission tutorial can be found here.
- Memory constraint: GPU memory usage less than 24 GB, CPU memory usage less than 64 GB
- Execution time constraint: no more than 3 minutes per case (a self-check sketch for these limits follows this list)
- Once you have successfully built your Docker container, save it to a zipped file 'Task_{Task_id}_{TeamName}.zip' and upload it to a cloud platform such as Google Drive or Baidu Netdisk. To submit your algorithm, send the download link and step-by-step run commands to segrap2025@163.com with the subject 'SegRap2025 Testing Phase Submission - Task_id TeamName Docker Container'.
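Before submitting, it may help to verify the limits above on your own hardware. The sketch below is a rough self-check assuming a PyTorch-based pipeline; `run_inference` and `case` are placeholders for your own inference entry point and input data. CPU memory is not measured here and would need a separate check (e.g. with a process monitor).

```python
# Rough self-check sketch for the resource limits above (unofficial).
# Assumes a PyTorch pipeline; `run_inference` and `case` are placeholders.
import time
import torch

def check_constraints(run_inference, case, gpu_limit_gb=24, time_limit_s=180):
    """Report peak GPU memory and wall-clock time for one case."""
    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    run_inference(case)                      # your model's per-case prediction
    torch.cuda.synchronize()                 # wait for GPU work to finish
    elapsed = time.perf_counter() - start
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3

    print(f"peak GPU memory: {peak_gb:.1f} GB (limit {gpu_limit_gb} GB)")
    print(f"runtime: {elapsed:.0f} s (limit {time_limit_s} s)")
    return peak_gb <= gpu_limit_gb and elapsed <= time_limit_s
```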
Short paper: please provide a description of your model highlighting its main features, using the provided LaTeX or MS Word template. The basic information and the description of the model must include the following details:
- Team name, Team members (maximum three) (names, emails, affiliations)
- Framework (e.g., MONAI, nnUNet)
- Model architecture
- Number of layers
- Convolution kernel size
- Initialization
- Optimizer
- Cross-validation used?
- Number of epochs
- Number of trainable parameters
- Learning Rate and schedule
- Loss Function
- Dimensionality of input/output (e.g., 2D, 3D, 2.5D)
- Batch Size
- Preprocessing steps used (data normalization, creation of patches, etc.)
- Data Augmentation steps (rotation, flipping, scaling, blur, noise, etc.)
- Pretrained model used? (allowed, but it needs to be publicly available)
- Number of models trained for final submission
- Post-Processing Steps (ensemble network, voting, label fusion, etc.)
- Clearly state which aspects (if any) are original work and which build on existing work
- Include relevant citations, and indicate whether existing code/software libraries/packages were used
- Which cases were included in the training and testing (all cases, only labeled cases, only paired cases, etc.)
- Training/validation/testing data splits
- Hyperparameter tuning performed
- Training time
Two classical medical image segmentation metrics, the Dice Similarity Coefficient (DSC) and the Normalized Surface Dice (NSD), will be used to assess different aspects of the performance of the segmentation methods.
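The official evaluation code is provided by the organizers; purely as an illustration, the sketch below shows how DSC and a tolerance-based NSD are typically computed for binary masks with NumPy/SciPy. The 3 mm tolerance and the surface extraction are assumptions, not the challenge's exact settings.

```python
# Rough sketch of DSC and a tolerance-based NSD for binary masks.
# NOT the official evaluation code; tolerance and surface handling are assumptions.
import numpy as np
from scipy import ndimage

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def surface(mask: np.ndarray) -> np.ndarray:
    """Boundary voxels: mask minus its erosion."""
    return mask & ~ndimage.binary_erosion(mask)

def nsd(pred: np.ndarray, gt: np.ndarray, spacing, tol_mm: float = 3.0) -> float:
    """Fraction of boundary points lying within tol_mm of the other boundary."""
    pred_s, gt_s = surface(pred.astype(bool)), surface(gt.astype(bool))
    # Distance (in mm) from every voxel to the nearest boundary voxel.
    dist_to_gt = ndimage.distance_transform_edt(~gt_s, sampling=spacing)
    dist_to_pred = ndimage.distance_transform_edt(~pred_s, sampling=spacing)
    overlap = (dist_to_gt[pred_s] <= tol_mm).sum() + (dist_to_pred[gt_s] <= tol_mm).sum()
    total = pred_s.sum() + gt_s.sum()
    return overlap / total if total else 1.0
```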
Firstly, for each cohort (set), we calculate the average DSC and NSD across all patients for each class. Secondly, each participant is ranked on every class-level average DSC and NSD, yielding 2 × 2 rankings for Task01 and 6 × 2 rankings for Task02 (classes × metrics). Then, the rankings over all classes are averaged within each cohort (set). Finally, the rankings over all cohorts (sets) are averaged and normalized by the number of teams. In addition, a statistical ranking analysis will be performed: teams may share the same rank if there is no statistically significant difference between them.
In addition, if a submission is missing results for some test cases, the corresponding class's DSC and NSD on those cases will be set to 0 for ranking. For example, if a class is missing in a test case, the average DSC and NSD of that class, and hence its ranking value, will degrade.
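To make the scheme concrete, here is an illustrative (unofficial) aggregation sketch: teams are ranked per class and per metric, the ranks are averaged over classes and metrics within each cohort, then averaged over cohorts and normalized by the number of teams, with missing results entered as 0. The data layout and names are assumptions, not the organizers' script.

```python
# Illustrative ranking aggregation (not the official script).
# Assumed input: scores[cohort][metric] is a (teams x classes) table of
# per-class average DSC/NSD values; missing results are entered as 0.
import numpy as np
from scipy.stats import rankdata

def aggregate_ranking(scores: dict) -> np.ndarray:
    """Return one normalized rank score per team (lower is better)."""
    cohort_ranks = []
    for cohort, metrics in scores.items():
        per_class_ranks = []
        for metric_name, table in metrics.items():      # table: teams x classes
            # Rank teams per class; higher DSC/NSD is better, so negate.
            ranks = np.apply_along_axis(rankdata, 0, -np.asarray(table, dtype=float))
            per_class_ranks.append(ranks)
        # Average the 2 x 2 (Task01) or 6 x 2 (Task02) rankings within the cohort.
        cohort_ranks.append(np.concatenate(per_class_ranks, axis=1).mean(axis=1))
    # Average over cohorts, then normalize by the number of teams.
    final = np.mean(cohort_ranks, axis=0)
    return final / len(final)

# Example with 3 teams, 2 classes, one cohort:
# scores = {"validation": {"DSC": [[0.90, 0.80], [0.85, 0.82], [0.70, 0.60]],
#                          "NSD": [[0.92, 0.81], [0.88, 0.80], [0.65, 0.55]]}}
# print(aggregate_ranking(scores))
```

Note that rankdata assigns tied teams the average of their ranks, which is consistent with allowing equal ranks when differences are not significant.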