A Pathology Foundation Model for Cancer Diagnosis and Prognosis Prediction
Summary
This research article introduces a new pathology foundation model for cancer diagnosis and prognosis prediction. The model, called CHIEF, employs a weakly supervised machine learning framework to extract pathology imaging features for systematic cancer evaluation. The authors utilized 60,530 whole-slide images and 44 terabytes of high-resolution pathology imaging datasets for pretraining CHIEF.
Article

A pathology foundation model for cancer diagnosis and prognosis prediction

https://doi.org/10.1038/s41586-024-07894-z

Received: 16 November 2023
Accepted: 1 August 2024
Published online: 4 September 2024
970 | Nature | Vol 634 | 24 October 2024

Xiyue Wang1,2,24, Junhan Zhao1,3,24, Eliana Marostica1,4, Wei Yuan5, Jietian Jin6, Jiayu Zhang5, Ruijiang Li2, Hongping Tang7, Kanran Wang8, Yu Li9, Fang Wang10, Yulong Peng11, Junyou Zhu12, Jing Zhang5, Christopher R. Jackson1,13,14, Jun Zhang15, Deborah Dillon16, Nancy U. Lin17, Lynette Sholl16,18, Thomas Denize16,18, David Meredith16, Keith L. Ligon16,18, Sabina Signoretti16,18, Shuji Ogino16,19,20, Jeffrey A. Golden16,21, MacLean P. Nasrallah22, Xiao Han15, Sen Yang1,2 ✉ & Kun-Hsing Yu1,16,23 ✉

1 Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
2 Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA.
3 Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
4 Division of Health Sciences and Technology, Harvard-Massachusetts Institute of Technology, Boston, MA, USA.
5 College of Biomedical Engineering, Sichuan University, Chengdu, China.
6 Department of Pathology, Sun Yat-sen University Cancer Center, Guangzhou, China.
7 Department of Pathology, Shenzhen Maternity & Child Healthcare Hospital, Shenzhen, China.
8 Department of Radiation Oncology, Chongqing University Cancer Hospital, Chongqing, China.
9 Department of Pathology, Chongqing University Cancer Hospital, Chongqing, China.
10 Department of Pathology, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, China.
11 Department of Pathology, The First Affiliated Hospital of Jinan University, Guangzhou, China.
12 Department of Burn, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.
13 Department of Pathology and Laboratory Medicine, Pennsylvania State University, Hummelstown, PA, USA.
14 Department of Pathology, Massachusetts General Hospital, Boston, MA, USA.
15 Tencent AI Lab, Shenzhen, China.
16 Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA.
17 Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
18 Department of Pathology, Dana-Farber Cancer Institute, Boston, MA, USA.
19 Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
20 Broad Institute of MIT and Harvard, Cambridge, MA, USA.
21 Department of Pathology, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
22 Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.
23 Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA.
24 These authors contributed equally: Xiyue Wang, Junhan Zhao.
✉ e-mail: [email protected]; [email protected]

Histopathology image evaluation is indispensable for cancer diagnoses and subtype classification. Standard artificial intelligence methods for histopathology image analyses have focused on optimizing specialized models for each diagnostic task1,2. Although such methods have achieved some success, they often have limited generalizability to images generated by different digitization protocols or samples collected from different populations3. Here, to address this challenge, we devised the Clinical Histopathology Imaging Evaluation Foundation (CHIEF) model, a general-purpose weakly supervised machine learning framework to extract pathology imaging features for systematic cancer evaluation. CHIEF leverages two complementary pretraining methods to extract diverse pathology representations: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition. We developed CHIEF using 60,530 whole-slide images spanning 19 anatomical sites. Through pretraining on 44 terabytes of high-resolution pathology imaging datasets, CHIEF extracted microscopic representations useful for cancer cell detection, tumour origin identification, molecular profile characterization and prognostic prediction. We successfully validated CHIEF using 19,491 whole-slide images from 32 independent slide sets collected from 24 hospitals and cohorts internationally. Overall, CHIEF outperformed the state-of-the-art deep learning methods by up to 36.1%, showing its ability to address domain shifts observed in samples from diverse populations and processed by different slide preparation methods. CHIEF provides a generalizable foundation for efficient digital pathology evaluation for patients with cancer.

Histopathology image evaluation is integral to the diagnosis of cancers and cancer subtype classification. Previous studies on artificial intelligence (AI)-based histopathology image analysis primarily rely on training task-specific models optimized for each use case1,2. For example, specialized deep neural networks have been developed for cancer cell identification4,5, histological and molecular subtype classification6–10, prognosis evaluation11–14 and treatment response prediction using gigapixel whole-slide images (WSIs)15–17. Moreover, state-of-the-art computational pathology analyses have revealed quantitative morphological signals indicative of clinically important molecular markers18,19, demonstrating the potential of AI methods in identifying cellular features imperceptible to the human eyes20. Although these advances offer promising avenues for improving cancer evaluation, several limitations continue to plague quantitative pathology image analyses. To begin with, standard deep learning methods require a large amount of data to train a performing model for each task. As it is difficult to obtain comprehensive pathology representations that cover the heterogeneity of diverse tissue microenvironments, existing approaches mainly focus on solving each narrow diagnostic task individually1,7. In addition, most AI models for pathology imaging analyses are tailored from general computer vision models designed for classifying macroscopic objects (for example, animals, cars and buses)2. These conventional approaches do not leverage the general tissue pathology patterns when training specialized diagnostic models. Furthermore, AI models trained by images from a single source tend to overfit the training data distribution and suffer from substantial performance deterioration when applied to images processed by different pathology laboratories3,21. These limitations have hindered the effective application of state-of-the-art AI models for reliable pathology evaluation.

Self-supervised learning has emerged as a promising approach for obtaining robust image feature representation useful for a wide range of prediction tasks using samples collected in diverse settings22,23. As diverse unlabelled training data are relatively straightforward to collect and the model training process is task-agnostic, self-supervised learning has achieved robust performance across different tasks and data distributions, such as image retrieval24–26 and weakly supervised WSI analysis27. Recent advancements in self-supervised learning for pathology image analyses further utilized both images and their text descriptions to augment the performance of computer vision models28,29. However, these methods have two major limitations. First, they primarily focus on individual image tiles in the WSIs, without considering the interactions of different regions of the same tissue. Second, previous studies focused on narrow diagnostic tasks and did not evaluate the generalizability of the extracted quantitative imaging features in different prediction tasks across cancer types and samples from several sources. As pathologists often face a variety of disease samples and need to assimilate contextual information from the tissue microenvironment, developing a general-purpose pathology AI system capable of accommodating a wide range of tissue types and evaluation tasks is of paramount importance.

To address these pressing clinical needs, we established the CHIEF model, a general-purpose machine learning framework that provides the foundation for various pathology diagnosis and prediction tasks (Fig. 1a). We leveraged two complementary forms of AI model pretraining: self-supervised pretraining using 15 million pathology image tiles for tile-level feature representation and weakly supervised pretraining on 60,530 WSIs across 19 anatomical sites for tissue context representation. In addition, we devised an efficient framework for tile-level feature aggregation in large-scale WSI analysis. We further validated CHIEF’s capability in cancer detection, tumour origin characterization, genomic mutation identification and survival prediction using 32 independent datasets consisting of 19,491 weakly annotated WSIs. Our approach challenges conventional attention-based tile-aggregation methods, offering a holistic representation of WSI features. CHIEF enables systematic microscopic feature identification and lays the groundwork for reliable pathology evaluation.

An overview of CHIEF

We established the CHIEF model, a general-purpose machine learning framework for weakly supervised histopathological image analyses. Unlike commonly used self-supervised feature extractors27,30, CHIEF leveraged two types of pretraining procedure: unsupervised pretraining on 15 million unlabelled tile images and weakly supervised pretraining on more than 60,000 WSIs. Tile-level unsupervised pretraining established a general feature extractor30 for haematoxylin–eosin-stained histopathological images collected from heterogeneous publicly available databases, which captured diverse manifestations of microscopic cellular morphologies. Subsequent WSI-level weakly supervised pretraining constructed a general-purpose model by characterizing the similarities and differences between cancer types. We evaluated the performance of CHIEF in a wide range of pathology evaluation tasks, including cancer detection, tumour origin prediction, genomic profile identification and survival prediction (Fig. 1a). The details of model design and implementation are described in the Methods.

[Figure 1a,b: a, schematic of the CHIEF pipeline (pan-cancer WSIs cropped into tiles; image encoder; text encoder with CLIP embedding of the anatomical site; feature aggregation into WSI-level feature vectors; downstream tasks: cancer detection, tumour origin identification, prediction of prevalent genetic mutations, prediction of genes related to targeted therapy, IDH and MSI status prediction, and overall and disease-specific survival prediction). b, number of slides per anatomical site (brain, thyroid, oesophagus, kidney, prostate, lung, skin, liver, colon, breast, stomach, adrenal gland, pancreas, soft tissue, bladder, uterus, ovary, testis and cervix).]

CHIEF augmented cancer cell detection

Detecting malignant cells from pathological images is crucial for cancer diagnoses4,5. State-of-the-art AI methods for cancer cell detection predominantly concentrate on training models for specific cancer types, without leveraging the commonalities of malignant cell morphology across cancers. The resulting models are not easily extensible to other cancer categories. To address this gap, we built a weakly supervised cancer detection platform using CHIEF and evaluated its generalizability across cancers. We conducted an extensive external validation using 15 independent datasets with a total of 13,661 WSIs. These datasets encompass both public (for example, Clinical Proteomic Tumor Analysis Consortium (CPTAC), Diagset-B31, Dataset-patient-level-test (Dataset-PT)32, the Diagnostic Reference Oncology Imaging Database (DROID)-Breast and TissueNet33 cohorts) and institutional (for example, samples from Shenzhen Maternity & Child Healthcare Hospital (SMCH) and Chongqing University Cancer Hospital (CUCH)) data sources, contain biopsy and surgical resection slides and span 11 different primary cancer sites, including the breast, uterus–endometrium, oesophagus, stomach, cervix, colon, prostate, kidney, skin, pancreas and lung. To better assess the performance of CHIEF, we compared it with three weakly supervised WSI classification methods: clustering-constrained-attention multiple instance learning (CLAM)6, attention-based deep multiple instance learning (ABMIL)34 and dual-stream multiple instance learning networks (DSMIL)35.

CHIEF consistently attained superior performance in a variety of cancer identification tasks using either biopsy or surgical resection slides (Fig. 2a). CHIEF achieved a macro-average area under the receiver operating characteristic curve (AUROC) of 0.9397 across 15 datasets representing 11 cancer types (Fig. 2a), which is approximately 10% higher than that attained by DSMIL (a macro-average AUROC of 0.8409), 12% higher than that of ABMIL (a macro-average AUROC of 0.8233) and 14% higher than that of CLAM (a macro-average AUROC of 0.8016). In all five biopsy datasets collected from independent cohorts, CHIEF possessed AUROCs of greater than 0.96 across several cancer types, including oesophagus (CUCH-Eso), stomach (CUCH-Sto), colon (CUCH-Colon) and prostate (Diagset-B and CUCH-Pros). On independent validation with seven surgical resection slide sets spanning five cancer types (that is, colon (Dataset-PT), breast (DROID-Breast), endometrium (SMCH-Endo and CPTAC-uterine corpus endometrial carcinoma (UCEC)), lung (CPTAC-lung squamous cell carcinoma (LUSC)) and cervix (SMCH-Cervix and TissueNet)), CHIEF attained AUROCs greater than 0.90. Both CHIEF and the set of baseline methods had lower performance in CPTAC. Nonetheless, CHIEF significantly outperformed all other methods in cancer cell identification in these datasets (DeLong test P value < 0.001). These results demonstrated CHIEF’s generalizability across diverse cancer tissues and samples obtained from heterogeneous sources internationally.

We used whole-slide attention visualization to identify diagnostic signals utilized by the CHIEF models. Figure 2b, Extended Data Fig. 2 and Supplementary Fig. 1 show the original WSIs, pixel-level ground truth annotated by pathologists (Methods) and attention maps output by CHIEF. CHIEF directed most of its attention to cancerous regions, exhibiting a remarkable alignment with ground truth annotations at the pixel level despite being trained only on slide-level labels. Notably, tiles receiving high attention from CHIEF contained tissue with typical cytologic and architectural patterns of malignancy (for example, increased nuclear/cytoplasmic ratio, irregularly shaped nuclei, cellular pleomorphism and disorganized architecture), showing the model’s capacity to identify key diagnostic features using a weakly supervised approach.
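To make the attention maps discussed above more concrete, the following is a minimal sketch of attention-based multiple instance pooling in the style of ABMIL34, one of the baselines compared here. It is not CHIEF's own aggregation module, which is described in the Methods. The sizes, the random stand-ins for tile embeddings, the weights `V` and `w`, and the function name `attention_pool` are all illustrative assumptions. Each tile embedding receives a score, the scores are normalized with a softmax over tiles, and the slide-level feature is the attention-weighted sum of tile features.

```python
import math
import random

def attention_pool(tile_features, V, w):
    """ABMIL-style pooling over the tiles of one slide (illustrative, not CHIEF's module).

    tile_features: list of n_tiles vectors, each of length feat_dim
    V: hidden_dim x feat_dim projection; w: scoring vector of length hidden_dim
    Returns (slide_feature, attention); the attention weights sum to 1.
    """
    scores = []
    for h in tile_features:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))
    # Softmax over tiles, shifted by the max score for numerical stability.
    top = max(scores)
    exp_scores = [math.exp(s - top) for s in scores]
    total = sum(exp_scores)
    attention = [e / total for e in exp_scores]
    # Slide-level feature: attention-weighted sum of the tile features.
    feat_dim = len(tile_features[0])
    slide_feature = [
        sum(a * h[j] for a, h in zip(attention, tile_features))
        for j in range(feat_dim)
    ]
    return slide_feature, attention

# Hypothetical sizes; random values stand in for tile embeddings and learned weights.
random.seed(0)
n_tiles, feat_dim, hidden_dim = 50, 16, 8
tiles = [[random.gauss(0, 1) for _ in range(feat_dim)] for _ in range(n_tiles)]
V = [[random.gauss(0, 0.5) for _ in range(feat_dim)] for _ in range(hidden_dim)]
w = [random.gauss(0, 0.5) for _ in range(hidden_dim)]

slide_feature, attention = attention_pool(tiles, V, w)
# Tiles ranked by attention weight; a heat map would colour each tile by its weight.
top_tiles = sorted(range(n_tiles), key=lambda k: attention[k], reverse=True)[:5]
print(len(slide_feature), round(sum(attention), 6), top_tiles)
```

In a trained model of this kind, only the slide-level label supervises the weights, so any agreement between high-attention tiles and annotated tumour regions emerges without region-level labels.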
c Cancer classification and molecular prediction (AUROCs) Survival prediction (c-index) DROID - breast DFCI-BRCA TCGA-LUAD CUCH - colon SMCH - endometrium 0.64 PLCO-LUAD 0.695 Dataset-PT - colon 0.990 0.587 0.981 0.951 CUCH - oesophagus PLCO-BRCA 0.611 0.994 0.973 0.644 0.578 CPTAC - colon 0.976 0.875 0.916 0.913 CUCH - stomach 0.621 0.679 0.987 PAIP2020 - colorectum 0.906 0.895 0.588 0.645 DFCI-LUAD 0.873 SMCH - cervix 0.960 0.973 TCGA-BRCA TCGA - colorectum 0.616 0.769 0.639 0.727 0.858 TissueNet - cervix 0.929 TCGA-LUSC 0.869 0.783 0.653 0.627 0.751 0.935 Diagset B - prostate HMS - GBM 0.939 0.806 PLCO-COADREAD 0.967 0.465 0.723 0.715 0.789 0.821 CUCH - prostate CPTAC-LUSC MUV - GBM 0.845 0.964 0.992 0.545 0.614 0.851 0.794 0.782 TCGA - GBM 0.903 0.737 0.802 CPTAC - kidney 0.831 0.785 0.796 0.663 0.795 TCGA-COADREAD 0.511 0.573 0.488 TCGA-UCEC HMS - LGG 0.878 0.850 0.820 0.969 CPTAC - uterus 0.847 0.852 MUV - LGG 0.910 0.611 0.742 0.646 CPTAC - melanoma CHIEF 0.909 0.827 CPTAC-UCEC Cancer detection CLAM HMS-RCC TCGA - LGG CPTAC - pancreas 0.805 CHIEF MSI status ABMIL 0.698 CPTAC - lung PORPOISE CPTAC-RCC TCGA-RCC IDH status DSMIL DSMIL Fig. 1 | An overview of the CHIEF model. a, CHIEF is a generalizable machine as the foundation for fine-tuning models for each specific task. b, A summary learning framework for weakly supervised histopathological image analysis. of the 60,530 slides for training the CHIEF model. We collected these pathology CHIEF extracts pathology imaging representations useful for cancer slides belonging to 19 anatomical sites from 14 cohorts. c, CHIEF significantly classification, tumour origin prediction, genomic profile prediction and outperformed state-of-the-art methods in cancer classification, genomic prognostic analyses. We pretrained CHIEF in a weakly supervised manner using profile identification and survival prediction tasks. The left panel summarizes 60,530 WSIs representing 19 anatomical sites. 
During the pretraining process, the AUROCs for cancer classification and genomic profile prediction tasks. we cropped the WSIs into non-overlapping imaging tiles, and we encoded the Overall, CHIEF outperformed state-of-the-art deep learning methods by anatomic site information of each WSI using the contrastive language–image up to 36.1% in these tasks. The right panel outlines the c-index of survival pretraining (CLIP) embedding method to obtain a feature vector for each prediction. On average, CHIEF performed 9% better than conventional anatomic site. We merged the text and image embeddings to represent the methods. Supplementary Tables 1–3 show detailed performance comparisons. heterogeneous pathology information from the training data. We then used the DFCI, Dana–Farber Cancer Institute; PAIP, Pathology AI Platform; PLCO, the pathology imaging features extracted by CHIEF to infer cancer types directly. Prostate, Lung, Colorectal and Ovarian study. The graphics of the human and In the genomic profile and prognostic prediction tasks, CHIEF features served DNA in a were created with BioRender.com. comprehensive genomic profiling of patients with cancer is not rou- CHIEF identified tumour origins tinely conducted worldwide owing to the additional cost and time We successfully used CHIEF to predict the tissue origin of cancers and involved18. Identifying quantitative morphological patterns indica- validated the results using independent test sets from CPTAC. Extended tive of genomic profiles from routine haematoxylin–eosin-stained Data Fig. 1 and Supplementary Tables 5–7 show the detailed results. slides offers an instantaneous and cost-effective alternative to genomic sequencing. We examined CHIEF’s capability to systematically predict molecular profiles of cancer samples. 
We focused on four clinically CHIEF predicted genomic profiles important prediction tasks: systematic prediction of prevalent genetic Genomic profiles of cancer samples indicate patients’ treatment mutations across cancer types; identification of mutations related to responses and are crucial for formulating treatment plans19. The targeted therapies; isocitrate dehydrogenase (IDH) status prediction 972 | Nature | Vol 634 | 24 October 2024 a Colon (Dataset-PT) Colon (CUCH-Colon) Oesophagus (CUCH-Eso) Stomach (CUCH-Sto) Cervix (SMCH-Cervix) 1.0 1.0 1.0 1.0 1.0 0.8 0.8 0.8 0.8 0.8 Sensitivity 0.6 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.4 CHIEF AUROC = 0.9943 (0.9858–1.0000) CHIEF AUROC = 0.9904 (0.9825–0.9965) CHIEF AUROC = 0.9730 (0.9564–0.9871) CHIEF AUROC = 0.9870 (0.9777–0.9940) CHIEF AUROC = 0.9726 (0.9509–0.9890) CLAM AUROC = 0.8569 (0.8187–0.8905) CLAM AUROC = 0.8413 (0.8099–0.8716) CLAM AUROC = 0.8592 (0.8175–0.8991) CLAM AUROC = 0.9517 (0.9351–0.9672) 0.2 CLAM AUROC = 0.8338 (0.7825–0.8803) 0.2 0.2 0.2 0.2 ABMIL AUROC = 0.9040 (0.8761–0.9311) ABMIL AUROC = 0.9084 (0.8850–0.9315) ABMIL AUROC = 0.8874 (0.8505–0.9226) ABMIL AUROC = 0.9594 (0.9423–0.9745) ABMIL AUROC = 0.8581 (0.8095–0.8994) DSMIL AUROC = 0.9060 (0.8780–0.9314) DSMIL AUROC = 0.9164 (0.8913–0.9409) DSMIL AUROC = 0.8945 (0.8599–0.9260) DSMIL AUROC = 0.9189 (0.8963–0.9408) 0 DSMIL AUROC = 0.8535 (0.8106–0.8970) 0 0 0 0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 Breast (DROID-Breast) Endometrial (SMCH-Endo) Prostate (Diagset-B) Prostate (CUCH-Pros) Cervix (TissueNet) 1.0 1.0 1.0 1.0 1.0 0.8 0.8 0.8 0.8 0.8 Sensitivity 0.6 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.4 CHIEF AUROC = 0.9810 (0.9970–0.9924) CHIEF AUROC = 0.9512 (0.9129–0.9809) CHIEF AUROC = 0.9671 (0.9616–0.9725) CHIEF AUROC = 0.9916 (0.9856–0.9962) CHIEF AUROC = 0.9289 (0.9030–0.9526) 0.2 CLAM AUROC = 0.9761 (0.9613–0.9880) 0.2 CLAM AUROC = 0.8548 (0.7904–0.9102) 0.2 CLAM AUROC = 0.9274 
(0.9194–0.9355) 0.2 CLAM AUROC = 0.9583 (0.9466–0.9699) 0.2 CLAM AUROC = 0.7127 (0.6626–0.7624) ABMIL AUROC = 0.9660 (0.9481–0.9811) ABMIL AUROC = 0.9126 (0.8622–0.9560) ABMIL AUROC = 0.9354 (0.9279–0.9430) ABMIL AUROC = 0.9537 (0.9405–0.9656) ABMIL AUROC = 0.7249 (0.6766–0.7682) 0 DSMIL AUROC = 0.9739 (0.9567–0.9868) 0 DSMIL AUROC = 0.8730 (0.8145–0.9256) 0 DSMIL AUROC = 0.9244 (0.9163–0.9326) 0 DSMIL AUROC = 0.9636 (0.9519–0.9752) 0 DSMIL AUROC = 0.7829 (0.7397–0.8230) 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 Kidney (CPTAC-CCRCC) Endometrial (CPTAC-UCEC) Melanoma (CPTAC-CM) Pancreas (CPTAC-PDA) Lung (CPTAC-LUSC) 1.0 1.0 1.0 1.0 1.0 0.8 0.8 0.8 0.8 0.8 Sensitivity 0.6 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.4 CHIEF AUROC = 0.8016 (0.7697–0.8327) CHIEF AUROC = 0.9690 (0.9584–0.9782) CHIEF AUROC = 0.8524 (0.8110–0.8917) CHIEF AUROC = 0.8269 (0.7894–0.8606) CHIEF AUROC = 0.9090 (0.8910–0.9260) CLAM AUROC = 0.4409 (0.3951–0.4830) 0.2 CLAM AUROC = 0.6766 (0.6419–0.7146) CLAM AUROC = 0.7701 (0.7241–0.8140) 0.2 CLAM AUROC = 0.6069 (0.5522–0.6644) CLAM AUROC = 0.7569 (0.7291–0.7844) 0.2 0.2 0.2 ABMIL AUROC = 0.4741 (0.4340–0.5154) ABMIL AUROC = 0.6975 (0.6619–0.7334) ABMIL AUROC = 0.7700 (0.7267–0.8148) ABMIL AUROC = 0.6600 (0.6076–0.7113) ABMIL AUROC = 0.7388 (0.7088–0.7688) DSMIL AUROC = 0.5454 (0.5082–0.5891) DSMIL AUROC = 0.7819 (0.7469–0.8155) DSMIL AUROC = 0.7954 (0.7516–0.8362) DSMIL AUROC = 0.6629 (0.6096–0.7196) DSMIL AUROC = 0.8204 (0.7952–0.8450) 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 1 – specificity 1 – specificity 1 – specificity 1 – specificity 1 – specificity b Prostate (Diagset-B) Cervix (TissueNet) Breast (DROID-Breast) Colon (Dataset-PT) Ground truth annotation of cancerous regions Low High Attention level Patches with high attention scores Patches with low attention scores Fig. 
2 | CHIEF outperformed state-of-the-art deep learning methods in of model attention scores showed CHIEF accurately identified cancerous detecting cancer cells using WSIs. a,b, We validated CHIEF’s capability of regions in WSIs. For each cancer type, the left image panel represents the cancer detection using 15 independent datasets collected from several hospitals ground truth annotations labelled by experienced pathologists. As CHIEF uses worldwide. Our test datasets encompassed 13,661 WSIs from 11 sites of origin a weakly supervised approach that requires only slide-level annotations, these (breast, endometrium–uterus, oesophagus, stomach, cervix, colon, prostate, region-level annotations were not revealed to the model during the training kidney, skin, pancreas and lung). a, CHIEF attained up to 0.9943 in the AUROCs phase. The middle panel visualizes the amount of attention CHIEF paid to across 15 independent test datasets and consistently outperformed (two-sided each region in the WSIs. The right panel shows the zoomed-in view of regions Wilcoxon signed-rank test P value = 0.000061) three deep learning methods receiving high (image tiles with red outlines) and low (image tiles with black (that is, CLAM, ABMIL and DSMIL). The receiver operating characteristic curves outlines) attention scores. Extended Data Fig. 2 and Supplementary Fig. 1 show of CHIEF and baseline methods are shown. The mean AUROC and its 95% CIs, additional result visualization of this classification task. The original WSIs and calculated using the non-parametric bootstrapping method (n = 1,000 their corresponding heat maps are available at https://yulab.hms.harvard.edu/ replicates), are presented. The diagonal dashed line in each plot represents projects/CHIEF/CHIEF.htm. Scale bars, 2 mm (left images for Cervix (TissueNet), the performance of a null model. 
CCRCC, clear cell renal cell carcinoma; CM, Prostate (Diagset-B) and Colon (Dataset-PT)), 3 mm (left image for Breast cutaneous melanoma; PDA, pancreatic ductal adenocarcinoma. b, Visualization (DROID-Breast)) and 50 μm (bottom right magnifications). for the new WHO (World Health Organization) classification of glioma; CHIEF predicted the mutation status of nine genes with AUROCs and microsatellite instability (MSI) prediction for assessing the benefits greater than 0.8 in our systematic pan-cancer genetic mutation analyses of immune checkpoint blockade in patients with colorectal cancer (Fig. 3). Consistent with previous studies18,36, pathology images contain (CRC). strong signals related to TP53 mutation across 19 cancer types, with high AUROCs in low-grade glioma (LGG; 0.8756; 95% confidence interval (CI) Prevalent genetic mutations 0.8624–0.8888), adrenal carcinoma (0.8119; 95% CI 0.7488–0.8751) and We conducted a systematic analysis that associated prevalent genetic UCEC (0.8115; 95% CI 0.7971–0.8259). CHIEF also identified mutations in mutations with histopathology images (Fig. 3 and Extended Data Fig. 3). GTF2I, which occur in 43.4% of patients with thymic epithelial tumours37, Our study involved 13,432 WSIs across 30 cancer types and 53 genes with an AUROC of 0.9111 (95% CI 0.8935–0.9287). Furthermore, CHIEF with the top five highest mutation rates in each cancer type. 
predicted BAP1 mutation in uveal melanoma (AUROC = 0.817; 95% CI Nature | Vol 634 | 24 October 2024 | 973 Article THYM 0.9111 GTF2I DLBC 0.88 BTG2 LGG 0.874 CIC UVM 0.7286 EIF1AX BRCA 0.8708 CDH1 UVM 0.7256 SF3B1 DLBC 0.8633 IGLL5 UVM 0.7215 GNA11 THCA 0.8067 NRAS TGCT 0.7654 DLBC 0.7967 IGLV3 PAAD 0.7515 KRAS PCPG 0.7894 HRAS COADREAD 0.622 THCA 0.7885 TGCT 0.7821 KIT UCS 0.7092 PPP2R1A LIHC 0.7732 CTNNB1 BRCA 0.6975 GATA3 UCEC 0.8384 UCEC 0.6975 ARID1A GBM 0.6872 PTEN SARC 0.6888 RB1 PCPG 0.7499 NF1 KIRP 0.6878 MET LGG 0.8529 ATRX UVM 0.6875 GNAQ SARC 0.6313 Genes 0.685 IGHV2 LGG 0.8756 DLBC ACC 0.8119 PRAD 0.6776 SPOP UCEC 0.8115 COADREAD 0.6699 APC BRCA 0.7928 THCA 0.6678 TG OV 0.7871 GBM 0.6676 EGFR HNSC 0.7775 ESCA 0.7445 LUAD 0.753 SYNE1 GBM 0.7517 STAD 0.5897 PRAD 0.7482 KIRC 0.6536 PBRM1 COADREAD 0.7377 TP53 UCS 0.7105 LIHC 0.7262 BRCA 0.6728 STAD 0.7223 COADREAD 0.6655 PIK3CA MESO 0.7207 CESC 0.6486 PAAD 0.7097 UCEC 0.5497 SARC 0.7038 KIRP 0.6518 BLCA 0.6819 KMT2C ESCA 0.681 CESC 0.6428 LUSC 0.6538 KIRC 0.6413 VHL KICH 0.6506 LUAD 0.6786 RYR2 LUSC 0.5781 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 PAAD 0.6399 CDKN2A AUROC HNSC 0.614 GBM 0.7346 ESCA 0.6934 LUAD 0.6775 SARC 0.6675 BLCA 0.6634 DLBC 0.9571 EZH2 STAD 0.6626 LUSC 0.8211 ERBB2 CESC 0.6202 BLCA 0.8161 FGFR3 LUSC 0.6122 MUC16 BLCA 0.7944 FGFR2 Pan-cancer mutation prediction KIRP 0.603 BRCA 0.7853 ESR1 SKCM 0.5961 LUAD 0.7217 MET PRAD 0.5948 LUSC 0.7122 OV 0.5751 HNSC 0.9084 ACC 0.5558 STAD 0.8192 LIHC 0.5549 COADREAD 0.7919 PAAD 0.782 Genes HNSC 0.5293 UCEC 0.7009 NTRK1 BLCA 0.6223 KDM6A LUAD 0.628 PAAD 0.6165 SMAD4 LUSC 0.5293 UVM 0.817 OV 0.5 KIRC 0.5535 BAP1 PRAD 0.8938 MESO 0.4669 BRCA 0.648 LUAD 0.6399 OV 0.6431 BRCA2 Prediction of genes related to targeted therapy ESCA 0.6197 CSMD3 PAAD 0.5286 LUSC 0.5741 LUAD 0.7877 LUSC 0.5599 ALK OV 0.7766 LUAD 0.6759 LUAD 0.7133 LUSC 0.6245 EGFR LGG 0.675 GBM 0.672 BRCA 0.6647 PIK3CA PRAD 0.6581 ESCA 0.8267 COADREAD 0.8127 ESCA 0.6543 PAAD 0.7257 SARC 0.6468 
STAD 0.6658 UCEC 0.6389 HNSC 0.6477 MESO 0.6365 LUAD 0.6075 NTRK3 THCA 0.6173 UCEC 0.5927 Genes STAD 0.6087 THCA 0.5613 BLCA 0.6031 TTN LUSC 0.544 SKCM 0.5986 OV 0.4558 COADREAD 0.5894 THCA 0.8889 CESC 0.5785 COADREAD 0.7317 KIRC 0.5772 LUAD 0.5861 SKCM 0.5736 BRAF HNSC 0.5641 UCS 0.5569 UCEC 0.5438 BRCA 0.5518 LUSC 0.5304 COADREAD 0.7477 LIHC 0.5508 LUAD 0.6581 LUSC 0.5453 BRCA 0.6378 RET PAAD 0.5271 UCEC 0.6067 KIRP 0.497 LUSC 0.5264 PRAD 0.6442 PAAD 0.7478 DLBC 0.5844 KMT2D COADREAD 0.7305 BLCA 0.577 STAD 0.7183 STAD 0.6018 LRP1B HNSC 0.6623 HNSC 0.5943 FAT1 LUSC 0.6554 NTRK2 SKCM 0.5932 DNAH5 UCEC 0.5487 CESC 0.5718 MUC4 LUAD 0.4957 KIRC 0.5717 SETD2 OV 0.4863 UCS 0.556 FBXW7 BRCA 0.6715 SKCM 0.5526 PAAD 0.6257 BRCA1 PCLO OV 0.5568 KIRP 0.5257 BLTP1 LUAD 0.6427 MESO 0.5086 NF2 LUSC 0.5531 ROS1 LIHC 0.5059 ALB LUAD 0.6346 LUSC 0.4028 KRAS 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 AUROC 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 AUROC Fig. 3 | CHIEF successfully predicted genetic mutations across cancer types and endocervical adenocarcinoma; COADREAD, colon adenocarcinoma using histopathology images. CHIEF predicted prevalent somatic mutations and rectum adenocarcinoma; DLBC, diffuse large B cell lymphoma; ESCA, (n = 11,483) and mutations related to targeted therapies (n = 6,013) in several oesophageal carcinoma; HNSC, head and neck squamous cell carcinoma; KICH, cancer types using histopathology images alone. We stratified our analyses by chromophobe renal cell carcinoma; KIRC (also known as CCRCC), clear cell renal cancer types and organized the prediction results by genes. The detailed cell carcinoma; KIRP, papillary renal cell carcinoma; LIHC, liver hepatocellular sample counts for each cancer type can be found in Supplementary Tables 17 carcinoma; MESO, mesothelioma; OV, ovarian serous cystadenocarcinoma; and 18. 
Owing to differences in the tumour microenvironment in different PAAD, pancreatic adenocarcinoma; PCPG, pheochromocytoma and cancer types, variations in the prediction performance were observed. The paraganglioma; PRAD, prostate adenocarcinoma; SARC, sarcoma; SKCM, skin mean ± 95% CI for each prediction task is shown. Error bars represent the 95% cutaneous melanoma; STAD, stomach adenocarcinoma; TGCT, testicular germ CIs estimated by fivefold cross-validation. ACC, adrenocortical carcinoma; cell tumors; THCA, thyroid carcinoma; THYM, thymoma; UCS, uterine BLCA, bladder urothelial carcinoma; CESC, cervical squamous cell carcinoma carcinosarcoma; UVM, uveal melanoma. 0.7668–0.8672), which is observed in approximately 45% of uveal mela- noma cases38. Mutations related to targeted therapies We tested CHIEF in an independent patient cohort from CPTAC. We further used CHIEF to predict genes associated with FDA (Food CHIEF consistently maintained similar AUROCs for various genes in and Drug Administration)-approved targeted therapies presented these new patient cohorts (Extended Data Fig. 4). Compared with the in OncoKB39 (www.oncokb.org) across 18 genes spanning 15 cancer state-of-the-art method for histopathology-based genomic mutation types (Fig. 3). CHIEF predicted the mutation status of all 18 genes prediction (that is, the pan-cancer computational histopathology with AUROCs greater than 0.6 (Fig. 3). Mutations with high predic- (PC-CHiP) method36; Supplementary Fig. 2), CHIEF showed significantly tion performance included EZH2 in diffuse large B-cell lymphoma higher performance (Wilcoxon signed-rank test P value < 0.001), with (AUROC = 0.9571; 95% CI 0.9321–0.9822), NTRK1 in stomach adeno- a macro-average AUROC of 0.7043 (range 0.51–0.89). By contrast, the carcinoma (AUROC = 0.8192; 95% CI 0.7767–0.8618), BRCA2 in prostate PC-CHiP method attained a macro-average AUROC of 0.6523 (range adenocarcinoma (AUROC = 0.8938; 95% CI 0.8310–0.9567), BRAF in 0.39–0.92). 
thyroid carcinoma (AUROC = 0.8889; 95% CI 0.8715–0.9064), ERBB2 in lung squamous cell carcinoma (LUSC; AUROC = 0.8211; 95% CI 0.7597–0.8826) and FGFR3 in bladder urothelial carcinoma (AUROC = 0.8161; 95% CI 0.7921–0.8402). On independent validation, CHIEF achieved a similar level of performance in the CPTAC cohorts (Extended Data Fig. 4). Among these genes, ESR1 in breast cancer (BRCA), EGFR in lung adenocarcinoma (LUAD) and BRAF in colon adenocarcinoma and rectum adenocarcinoma (COADREAD) all exhibited AUROCs greater than 0.7 in both held-out and independent test sets.
974 | Nature | Vol 634 | 24 October 2024
[Fig. 4 plots: receiver operating characteristic curves; x axis: 1 – specificity; y axis: sensitivity. AUROCs from the panel legends:]
a, Held-out partition of TCGA-LGG: CHIEF 0.9098 ± 0.0290; CLAM 0.8355 ± 0.0548; ABMIL 0.8495 ± 0.0462; DSMIL 0.8314 ± 0.0494
a, Independent test set, MUV-LGG: CHIEF 0.8466 ± 0.0183; CLAM 0.7859 ± 0.0168; ABMIL 0.7768 ± 0.0113; DSMIL 0.7959 ± 0.0160
a, Independent test set, HMS-LGG: CHIEF 0.8783 ± 0.0356; CLAM 0.7280 ± 0.0514; ABMIL 0.7372 ± 0.0410; DSMIL 0.6959 ± 0.0414
b, Held-out partition of TCGA-COADREAD: CHIEF 0.8692 ± 0.0324; CLAM 0.7464 ± 0.0528; ABMIL 0.7511 ± 0.0266; DSMIL 0.7489 ± 0.0435
b, Independent test set, PAIP2020: CHIEF 0.8729 ± 0.0276; CLAM 0.6998 ± 0.1024; ABMIL 0.6526 ± 0.0800; DSMIL 0.7266 ± 0.0431
b, Independent test set, CPTAC-COAD: CHIEF 0.8750 ± 0.0199; CLAM 0.5971 ± 0.0220; ABMIL 0.5897 ± 0.0388; DSMIL 0.6158 ± 0.0409
Fig. 4 | CHIEF predicted the IDH status of glioma samples and the MSI status of patients with CRC in several cohorts. a, CHIEF successfully identified IDH mutation status in low histological grade groups (n = 1,289). These results indicated that CHIEF characterized IDH-related morphological signals independent of histological grades. As the fifth edition of the WHO Classification of Tumors of the Central Nervous System40 incorporated IDH mutation status in the definition of GBM and LGG, CHIEF provides molecular profile predictions that enable fast cancer classification based on the new clinical guidelines. The left panels show the mean receiver operating characteristic curves of tenfold cross-validations using the TCGA-LGG (n = 842) dataset. The middle and right panels show the validation results in the independent datasets (MUV-LGG (n = 365) and HMS-LGG (n = 82)). b, CHIEF identified patients with MSI-high status with AUROCs of 0.869–0.875. The left panel represents the MSI prediction performance in the TCGA-COADREAD dataset (n = 437) using fourfold cross-validation. The middle and right panels illustrate the performance of two independent test sets (that is, PAIP2020 (n = 77) and CPTAC-COAD (n = 221)). Results in a,b are presented as mean ± s.d. across cross-validation.
IDH status prediction
The fifth edition of the WHO Classification of Tumors of the Central Nervous System distinguished glioblastoma (GBM) from LGG on the basis of IDH status instead of conventional histological features8,40. Thus, it is crucial to identify patients’ IDH status at the time of diagnosis. To identify IDH mutation-related signals independent of histological grades, we stratified our study cohorts by histological grade and used CHIEF to predict IDH status in each stratum. We conducted IDH status prediction analyses on six datasets: The Cancer Genome Atlas (TCGA)-LGG, TCGA-GBM, Medical University of Vienna (MUV)-LGG41, MUV-GBM41, Harvard Medical School and the University of Pennsylvania (HMS)-LGG and HMS-GBM, including a total of 2,718 WSIs. The CHIEF model demonstrated superior performance compared to other baseline methods in both the held-out and independent test sets (Wilcoxon signed-rank test P value < 0.01; Fig. 4a and Supplementary Fig. 3). To increase interpretability, we visualized the quantitative image feature vectors and examined the distribution of attention scores determined by CHIEF (Extended Data Figs. 5 and 9b). Results showed that necrotic regions received significantly higher attention when identifying gliomas with IDH-wild-type status (Mann–Whitney U-test P < 0.0001; Extended Data Fig. 9b).
MSI status prediction
MSI is a well-established biomarker for responses to immune checkpoint blockade in CRCs27. To enable rapid treatment personalization at the time of diagnosis, we examined the performance of CHIEF in predicting MSI status using histopathological images. CHIEF significantly outperformed the best-performing baseline method (DSMIL) in the TCGA-COADREAD dataset and two independent cohorts (PAIP202042 and CPTAC-COAD), with an AUROC improvement of approximately 12%, 15% and 26%, respectively (Fig. 4b). Attention analyses showed that regions containing solid tumours, luminal necrosis and tumour-infiltrating lymphocytes received high attention from CHIEF (Extended Data Fig. 6).
CHIEF predicted survival outcomes
Owing to differential responses to standard treatments, patients with cancer have varying disease-specific survival outcomes after their initial diagnoses43. Although many clinical and genomic biomarkers have been proposed, they do not fully predict the prognosis of every patient. To address this challenge, we extended our CHIEF framework to establish stage-stratified survival prediction models for each cancer type under study. We used a total of 9,404 WSIs in 17 datasets (from both publicly available and institutional sample sources) and focused on 7 cancer types (COADREAD, LUSC, BRCA, GBM, UCEC, LUAD and renal cell carcinoma (RCC)) with reliable prognostic information in the independent cohorts.
CHIEF successfully predicted patients’ survival outcomes using the histopathology images obtained at the time of initial diagnosis. In all cancer types and all study cohorts, CHIEF distinguished patients with
[Figure panels: a, TCGA-BRCA (held-out); DFCI-BRCA (independent); PLCO-BRCA (independent); TCGA-LUSC (held-out); TCGA-UCEC (held-out); TCGA-COADREAD (held-out)]
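As a quick check on the MSI status comparison, the sketch below recomputes the gain of CHIEF over DSMIL from the mean AUROCs given in the Fig. 4b legend. The helper name `auroc_gains` and the dictionary layout are illustrative assumptions and are not taken from the study's code. Under this reading, the reported improvements of approximately 12%, 15% and 26% match the absolute AUROC differences (percentage points); the gains relative to DSMIL would be larger.

```python
# Mean AUROCs copied from the Fig. 4b legend (MSI status prediction).
# The data layout and the helper name are illustrative assumptions.
CHIEF = {"TCGA-COADREAD": 0.8692, "PAIP2020": 0.8729, "CPTAC-COAD": 0.8750}
DSMIL = {"TCGA-COADREAD": 0.7489, "PAIP2020": 0.7266, "CPTAC-COAD": 0.6158}


def auroc_gains(model, baseline):
    """Return {cohort: (absolute gain in points, relative gain in %)}."""
    gains = {}
    for cohort, auroc in model.items():
        base = baseline[cohort]
        absolute_points = (auroc - base) * 100.0        # difference in AUROC x 100
        relative_percent = (auroc - base) / base * 100.0  # difference relative to baseline
        gains[cohort] = (absolute_points, relative_percent)
    return gains


if __name__ == "__main__":
    for cohort, (points, percent) in auroc_gains(CHIEF, DSMIL).items():
        print(f"{cohort}: +{points:.1f} points absolute, +{percent:.1f}% relative")
```

Reporting both quantities side by side makes it explicit which definition of "improvement" a quoted percentage refers to.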