Number of Modalities Across the Dataset

|        | Patients | CT  | MRI | WSI | Genomics | Clinical |
| ------ | -------- | --- | --- | --- | -------- | -------- |
| Train  | 497      | 189 | 35  | 497 | 361      | 497      |
| Test   | 121      | 59  | 13  | 121 | 101      | 121      |
| Total  | 618      | 248 | 48  | 618 | 462      | 618      |
Patients' Clinical and Genomic Data
Here you can access the metadata with the train/test split (from CPTAC and TCGA repositories).
Download the Multi-Modal Data
Download WSIs
Below are the links to the original dataset, from both TCGA (XXGB) and CPTAC ('Tissue Slide Images': 190GB), together with the CSV listing the selected files we used. Be aware that for TCGA we provide a manifest file that you can use directly with the GDC Data Transfer Tool.
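If you only want a subset of the slides, the TCGA manifest can be trimmed to the selected files before running `gdc-client download -m manifest.txt`. A minimal sketch, assuming the standard GDC manifest columns (`id`, `filename`, `size`) and using hypothetical file ids:

```python
import csv
import io

def filter_manifest(manifest_tsv, selected_ids):
    """Keep only the manifest rows whose file id is in the selected set."""
    reader = csv.DictReader(io.StringIO(manifest_tsv), delimiter="\t")
    return [row for row in reader if row["id"] in selected_ids]

# Tiny inline example with hypothetical ids and filenames.
manifest = (
    "id\tfilename\tsize\n"
    "abc123\tslide1.svs\t100\n"
    "def456\tslide2.svs\t200\n"
)
selected = {"abc123"}
rows = filter_manifest(manifest, selected)
print([r["filename"] for r in rows])  # -> ['slide1.svs']
```

The filtered rows can be written back out as a TSV (keeping the header) and passed to the GDC Data Transfer Tool as a reduced manifest.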
Download the CT and MRI images
Below you can access the CT and MRI images for the same patients, from both the TCGA ('Images': 91.56GB) and CPTAC ('Radiology Images': 56.58GB) repositories, as well as the CSV with the filenames and IDs used in our repository.
We discarded the localization scans, the pre-contrast scans, and scans acquired with a significant time lapse (years) from the diagnosis. We also removed the coronal- and sagittal-view scans to minimize domain shifts. This reduced the number of volumes to 736 for CT and 552 for MRI.
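The exclusion criteria above can be expressed as a simple predicate over per-scan metadata. This is only an illustrative sketch: the field names (`series_description`, `plane`, `years_from_diagnosis`) are hypothetical, not the actual metadata schema used in our pipeline.

```python
def keep_scan(series_description: str, plane: str, years_from_diagnosis: float) -> bool:
    """Apply the exclusion criteria: no localizers, no pre-contrast scans,
    axial views only, and no scans acquired years after diagnosis."""
    desc = series_description.lower()
    if "localizer" in desc or "scout" in desc:    # localization scans
        return False
    if "pre" in desc and "contrast" in desc:      # pre-contrast acquisitions
        return False
    if plane.lower() in ("coronal", "sagittal"):  # keep axial views only
        return False
    if years_from_diagnosis >= 1.0:               # acquired long after diagnosis
        return False
    return True

print(keep_scan("AX T2 FSE", "axial", 0.2))   # True
print(keep_scan("LOCALIZER", "axial", 0.1))   # False
```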
Use Our Features
If you don't want to download the original images, you can use our precomputed features instead.
Access the CT/MRI/WSI Features
Here you can access the different folders containing the multi-modal features:
Per Patient Files (Selected Using MIL)
Several patients had more than one CT/MRI scan and WSI. To reduce the amount of data used by our multi-modal system, we kept a single CT, MRI, and WSI per patient.
We implemented a novel patient-level MIL framework that automatically selects the best scan or slide for each imaging modality from the available pool.
Here you can access the CSV files listing the selected files per patient that were used in our paper:
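The per-patient CSVs can be turned into a lookup of patient id to selected file per modality. A minimal sketch, assuming hypothetical column names (`patient_id`, `modality`, `filename`) and made-up filenames; the actual CSV schema may differ:

```python
import csv
import io

def selections_by_patient(csv_text):
    """Map patient id -> {modality: selected filename} from the selection CSV."""
    out = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        out.setdefault(row["patient_id"], {})[row["modality"]] = row["filename"]
    return out

# Inline example with hypothetical entries.
csv_text = (
    "patient_id,modality,filename\n"
    "C3L-0001,CT,ct_vol_03.nii.gz\n"
    "C3L-0001,WSI,slide_07.svs\n"
)
sel = selections_by_patient(csv_text)
print(sel["C3L-0001"]["CT"])  # -> ct_vol_03.nii.gz
```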
Model Hyperparameter Settings
The following tables provide a detailed summary of the hyperparameters and configurations used for various model architectures in our study. Each table specifies the parameters applied for different data types and model configurations.
MIL Models
This table summarizes the hyperparameters for the Multiple Instance Learning (MIL) models, including specific settings for CT, MRI, and Pathology data. Key hyperparameters such as epochs, architecture, learning rates, optimizers, and oversampling factors are outlined for each modality.
Table A.1: MIL Models Hyperparameters

| Models        | MIL CT                       | MIL MRI                      | MIL Pathology                |
| ------------- | ---------------------------- | ---------------------------- | ---------------------------- |
| Epochs        | 60                           | 60                           | 100                          |
| Architecture  | 3 FC                         | 3 FC                         | 4 FC                         |
| Hidden Sizes  | [256, 128]                   | [256, 128]                   | [512, 256, 128]              |
| Initial LR    | 1e-3                         | 1e-3                         | 1e-3                         |
| LR Scheduler  | Step                         | Step                         | Step                         |
| LR Settings   | Step size = 30, Gamma = 1e-2 | Step size = 30, Gamma = 1e-2 | Step size = 30, Gamma = 1e-2 |
| Optimizer     | SGD                          | Adam                         | AdamW                        |
| Oversample    | 8x                           | 16x                          | 8x                           |
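As a concrete illustration of the step schedule in Table A.1 (step size = 30, gamma = 1e-2), the learning rate at a given epoch can be computed as below. This is a minimal sketch of the schedule's arithmetic, not the actual training code:

```python
def step_lr(epoch, initial_lr=1e-3, step_size=30, gamma=1e-2):
    """Step decay as in Table A.1: multiply the LR by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)

print(step_lr(0))   # 0.001 for epochs 0-29
print(step_lr(30))  # drops to ~1e-5 for epochs 30-59
```

With 60 training epochs and a step size of 30, the learning rate is therefore reduced exactly once during training.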
Base Models
Table A.2 provides the hyperparameter settings for base models across different modalities and configurations, including CT, MRI, Pathology, Clingen, and multi-modality approaches like Weighted Sum, Learn Weights, Mean, and Concat. Each model has its specific optimizer, learning rate schedule, batch size, and other settings detailed in this table.
Table A.2: Base Models Hyperparameters

| Models        | Base CT | Base MRI | Base Path | Base Clingen | Weighted Sum | Learn Weights | Mean   | Concat |
| ------------- | ------- | -------- | --------- | ------------ | ------------ | ------------- | ------ | ------ |
| Epochs        | 60      | 60       | 60        | 60           | 60           | 60            | 120    | 120    |
| Architecture  | 3L      | 3L       | 3L        | 3L           | 3L           | 3L            | 5L     | 5L     |
| Hidden Sizes  | 128     | 128      | 128       | 128          | 128          | 128           | 128    | 128    |
| Initial LR    | 1e-3    | 1e-3     | 1e-4      | 1e-3         | 1e-3         | 1e-3          | 1e-3   | 1e-3   |
| LR Scheduler  | Cosine  | Cosine   | Const     | Const        | Const        | Cosine        | Cosine | Cosine |
| LR Settings   | Default | Default  | None      | None         | None         | Default       | Default | Default |
| Optimizer     | AdamW   | SGD      | SGD       | AdamW        | AdamW        | AdamW         | AdamW  | SGD    |
| Oversample    | None    | None     | 6x        | 6x           | 6x           | 6x            | 6x     | 6x     |
| Batch Size    | 14      | 14       | 14        | 14           | 14           | 14            | 14     | 14     |
| Batch Norm    | No      | No       | No        | No           | Yes          | Yes           | Yes    | Yes    |
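Several base models in Table A.2 use a cosine scheduler with "Default" settings. Assuming this means PyTorch-style cosine annealing with the default minimum LR of zero (an assumption, since the table does not spell it out), the schedule follows:

```python
import math

def cosine_lr(epoch, total_epochs=60, base_lr=1e-3, eta_min=0.0):
    """Cosine annealing (PyTorch-style defaults): decays from base_lr to eta_min
    over total_epochs following half a cosine period."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / total_epochs))

print(cosine_lr(0))   # 0.001 at the start
print(cosine_lr(60))  # 0.0 at the end
```

Unlike the step schedule, the cosine schedule decays the learning rate smoothly at every epoch rather than in discrete jumps.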
Reconstruction Encoder-Decoder Model
This table outlines the hyperparameter settings for the encoder-decoder model architecture. It includes the number of epochs, learning rate, optimizer, and other relevant hyperparameters used specifically for this model.
Table A.3: Encoder-Decoder Model Hyperparameters

| Model                 | Encoder-Decoder |
| --------------------- | --------------- |
| Epochs                | 600             |
| Architecture Encoder  | 2 FC            |
| Architecture Decoder  | 2 FC            |
| Hidden Sizes          | 128             |
| Initial LR            | 1e-3            |
| LR Scheduler          | Cosine          |
| Optimizer             | AdamW           |
| Oversample            | 6x              |
| Batch Size            | 14              |
| Batch Normalization   | No              |
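Tables A.2 and A.3 list oversampling factors (e.g. 6x). As a sketch of what such a factor means, the snippet below implements simple duplication-based oversampling of the minority class; the exact oversampling scheme used in our experiments may differ:

```python
import random

def oversample(minority, factor=6, seed=0):
    """Repeat minority-class samples `factor` times (the tables' '6x' setting),
    then shuffle so duplicates are not adjacent."""
    rng = random.Random(seed)
    out = minority * factor
    rng.shuffle(out)
    return out

samples = ["p1", "p2"]  # hypothetical minority-class patient ids
print(len(oversample(samples)))  # -> 12
```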
Scientific Paper
If you want more information about this dataset, feel free to read our paper:
Tiago Mota, Maria Rita Fonseca Verdelho, Diogo José Pereira Araújo, Alceu Bissoto, Carlos Santiago, Catarina Barata,
MMIST-ccRCC: A Real-World Medical Dataset for the Development of Multi-Modal Systems, Data Curation and Augmentation in Enhancing Medical Imaging Applications Workshop (archival) @CVPR 2024.