MMIST ccRCC Dataset

Clear cell renal cell carcinoma (ccRCC) is the most common type of kidney cancer, accounting for up to 80% of all renal cell carcinoma cases in adults. Estimating prognosis is critical for patient management, but it remains a very challenging task. Ongoing research on this topic has produced two public studies, CPTAC-CCRCC and TCGA-KIRC, from which we curated MMIST-ccRCC.

Number of Modalities Across the Dataset

Split   Patients   CT    MRI   WSI   Genomics   Clinical
Train   497        189   35    497   361        497
Test    121        59    13    121   101        121
Total   618        248   48    618   462        618

Patients' Clinical and Genomic Data

Here you can access the metadata with the train/test split (from CPTAC and TCGA repositories).
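As a minimal sketch of how the train/test split could be read from the metadata, the snippet below groups patient IDs by split label using only the standard library. The column names `case_id` and `split`, the helper `split_patients`, and the example rows are assumptions for illustration; check the header of the actual CSV and adjust them.

```python
import csv
import io
from collections import defaultdict


def split_patients(csv_text, id_col="case_id", split_col="split"):
    """Group patient IDs by split label.

    The column names are hypothetical; adapt them to the real metadata file.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row[split_col].strip().lower()
        groups[label].append(row[id_col].strip())
    return dict(groups)


# Made-up rows standing in for the real metadata file.
example = """case_id,split
P001,train
P002,test
P003,train
"""

splits = split_patients(example)
print({label: len(ids) for label, ids in splits.items()})
```

With the real file, the counts per split should match the Patients column of the table above (497 train, 121 test).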

Download the Multi-Modal Data

Download WSIs

Below are the links to the original datasets from TCGA (XXGB) and CPTAC ('Tissue Slide Images', 190GB), together with the CSV listing the files that we selected. For TCGA, we provide a manifest file that can be used directly with the GDC Data Transfer Tool.
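For the TCGA slides, the manifest can be passed to the GDC Data Transfer Tool command-line client (`gdc-client download -m <manifest> -d <directory>`). The sketch below wraps that call in Python; the file name `gdc_manifest.txt` and the output folder `wsi_tcga` are placeholders, not names taken from this dataset, and `gdc-client` is assumed to be installed and on the PATH.

```python
import shutil
import subprocess


def gdc_download_command(manifest_path, out_dir):
    """Build the gdc-client call that downloads every file listed in a manifest."""
    return ["gdc-client", "download", "-m", manifest_path, "-d", out_dir]


def run_gdc_download(manifest_path, out_dir):
    """Run the download, failing early if gdc-client is not installed."""
    if shutil.which("gdc-client") is None:
        raise FileNotFoundError("gdc-client was not found on PATH")
    subprocess.run(gdc_download_command(manifest_path, out_dir), check=True)


# Placeholder paths; replace them with the manifest you downloaded.
cmd = gdc_download_command("gdc_manifest.txt", "wsi_tcga")
print(" ".join(cmd))
```

Calling `run_gdc_download(...)` starts the actual transfer, so make sure enough disk space is available first.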

Download the CT and MRI images

Below you can access the CT and MRI images of the same patients from the TCGA ('Images', 91.56GB) and CPTAC ('Radiology Images', 56.58GB) repositories, as well as the CSV with the filenames and IDs that we used in our repository.

We discarded localization scans, pre-contrast scans, and scans acquired with a significant time lapse (years) from the diagnosis. We also removed coronal and sagittal views to minimize domain shifts. After this filtering, 736 CT volumes and 552 MRI volumes remain.
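The exclusion criteria above can be illustrated with a small filter. This is a sketch, not the curation script we used: the `Scan` record, its field names, and the one-year cutoff `MAX_DAYS` are all assumptions (the text only says "years"), and in practice these values would have to be derived from the DICOM metadata.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    # Hypothetical per-volume record; the field names are illustrative only.
    patient_id: str
    modality: str            # "CT" or "MRI"
    plane: str               # "axial", "coronal" or "sagittal"
    is_localizer: bool
    has_contrast: bool
    days_from_diagnosis: int


MAX_DAYS = 365  # assumed cutoff; the actual threshold is not stated
EXCLUDED_PLANES = {"coronal", "sagittal"}


def keep_scan(scan, max_days=MAX_DAYS):
    """Return True if the scan passes all four exclusion criteria."""
    return (
        not scan.is_localizer
        and scan.has_contrast
        and scan.plane not in EXCLUDED_PLANES
        and abs(scan.days_from_diagnosis) <= max_days
    )


# Made-up scans, one per exclusion rule plus one that passes.
scans = [
    Scan("P001", "CT", "axial", False, True, 30),
    Scan("P001", "CT", "coronal", False, True, 30),
    Scan("P001", "CT", "axial", True, True, 30),
    Scan("P002", "MRI", "axial", False, False, 10),
    Scan("P002", "MRI", "axial", False, True, 900),
]
kept = [s for s in scans if keep_scan(s)]
print(len(kept), "of", len(scans), "scans kept")
```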

Use Our Features

If you don't want to download the original images, you can use our features.

Access the CT/MRI/WSI Features

Here you can access the different folders containing the multi-modal features:

Per Patient Files (Selected Using MIL)

Several patients had more than one CT scan, MRI scan, or WSI. We opted to reduce the amount of data used by our multi-modal system to a single CT, MRI, and WSI per patient. To do so, we implemented a novel patient-level MIL framework that automatically selects the best file of each imaging modality from the available pool. Here you can access the different CSV files containing the selected files per patient that were used in our paper:
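To make the idea of per-patient selection concrete, the sketch below keeps the highest-scoring file for each (patient, modality) pair. It is a simplified stand-in, not the MIL framework from the paper: it assumes that some model has already assigned a relevance score to every file, and the function name, file names, and scores are invented for the example.

```python
def select_per_patient(scored_files):
    """Keep the highest-scoring file for each (patient, modality) pair.

    scored_files: iterable of (patient_id, modality, filename, score) tuples.
    The scores are assumed to come from a separately trained model.
    """
    best = {}
    for patient_id, modality, filename, score in scored_files:
        key = (patient_id, modality)
        if key not in best or score > best[key][1]:
            best[key] = (filename, score)
    return {key: filename for key, (filename, _) in best.items()}


# Invented files and scores for illustration.
scored = [
    ("P001", "CT", "ct_a.nii", 0.2),
    ("P001", "CT", "ct_b.nii", 0.7),
    ("P001", "WSI", "slide_1.svs", 0.5),
    ("P002", "CT", "ct_c.nii", 0.4),
]
selected = select_per_patient(scored)
for (patient_id, modality), filename in sorted(selected.items()):
    print(patient_id, modality, filename)
```

The provided CSV files already contain the result of the actual selection, so this step is only needed if you want to rerun the selection with your own scores.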

Scientific Paper

If you want more information about this dataset, feel free to read our paper:

Tiago Mota, Maria Rita Fonseca Verdelho, Diogo José Pereira Araújo, Alceu Bissoto, Carlos Santiago, Catarina Barata, MMIST-ccRCC: A Real-World Medical Dataset for the Development of Multi-Modal Systems, Data Curation and Augmentation in Enhancing Medical Imaging Applications Workshop (archival) @CVPR 2024.

How to cite us?

Get in Touch at