Breast Cancer Dataset, Classification and Detection Using Deep Learning

Journal of Health & Medical Informatics

ISSN: 2157-7420

Open Access

Perspective - (2022) Volume 13, Issue 11

Akako Kindo*
*Correspondence: Akako Kindo, Department of Pharmaceutical Sciences, Tokyo Women’s Medical University, Tokyo, Japan

Received: 03-Nov-2022, Manuscript No. jhmi-23-86857; Editor assigned: 05-Nov-2022, Pre QC No. P-86857; Reviewed: 16-Nov-2022, QC No. Q-86857; Revised: 22-Nov-2022, Manuscript No. R-86857; Published: 30-Nov-2022, DOI: 10.37421/2157-7420.2022.13.449
Citation: Kindo, Akako. “Breast Cancer Dataset, Classification and Detection Using Deep Learning.” J Health Med Informat 13 (2022): 449.
Copyright: © 2022 Kindo A. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


By integrating information with cutting-edge digital communication networks, computational pathology (CP) has the potential to improve diagnostic quality and efficiency in clinical workflows. CP faces several difficulties, including compliance with ethical practices, efficient data fusion, and limited processing capabilities. In 2018, more than 2 million women worldwide were diagnosed with breast cancer, and approximately 0.6 million died of it. Hormone receptor-positive tumors make up the majority of invasive breast cancers. Hormonal therapies that target the estrogen receptor (ER) signaling pathway frequently benefit patients with hormone receptor-positive cancers. To reach a diagnosis, a patient's specimen is thinly sectioned onto microscope slides and stained with hematoxylin and eosin (H&E), and a pathologist then makes a visual assessment. Molecular marker-specific stains are used for confirmation and subtyping, and immunohistochemistry (IHC) with nuclear staining is used to determine ER status. IHC staining, however, is time-consuming and costly. In addition, the skill of the technician who prepares the tissue sample and the experience of the pathologist can both significantly affect test quality, and pathologists' decisions are themselves subject to error. These factors increase the risk of misdiagnosis. Because about 20% of IHC-based ER and progesterone receptor (PR) test results are incorrect, patients run the risk of receiving suboptimal care. Recent studies have shown that ER status can be determined from morphological stains. However, these studies rely on single-center tissue microarray (TMA) datasets.


This article examines the use of deep learning (DL) in interpreting breast cancer images. We begin by emphasizing the significance of medical imaging and the benefits it provides in clinical settings. The review then discusses DL advances in the diagnosis of breast cancer, examining the capabilities of these frameworks, their problems and potential solutions, and the associated datasets. The primary contributions are as follows: a review of recent articles (2018 to 2022) on the use of DL in the diagnosis of breast cancer; a discussion of open datasets related to breast cancer diagnosis, with their web addresses; a list of web addresses for the publicly accessible source code of existing papers; and a discussion of current obstacles and potential future directions for DL in breast cancer diagnosis.

With slide data held in cloud storage, pathologists can readily analyze it and benefit from AI-based diagnostic tools. Researchers have already developed various AI methods for medical diagnosis along these lines. Breast cancer is the most common malignancy in women worldwide, accounting for roughly one in four cancer diagnoses among them. Breast carcinoma encompasses numerous diseases with differing histological features, prognoses, and clinical outcomes. Metastases, for example to the liver and lungs, affect a large proportion of patients with malignant breast cancer. Comprehensive genomic analyses of patients with breast cancer have identified key driver mutations that carry implications for therapy and for outcome prediction.

Artificial neural networks (ANNs) draw inspiration from the way the human brain operates and use multi-layer structures of neurons to achieve high representation power. Encouraged by the promising results of ANNs, researchers developed convolutional neural networks (CNNs) to handle high-dimensional data such as images. Another method of examination is the mammogram, an X-ray image of the breast. Mammography is useful even for the routine screening of women who show no symptoms of breast cancer, which is especially important for early detection and for preventive measures that lower the risk of breast cancer. To this end, Shen et al. used DL to diagnose breast cancer from mammograms. To reduce the cost of preparing a sufficient quantity of training data, they considered two sets of training data with different kinds of annotation. In the first training phase, only a small number of samples with lesion-level annotation were used. In the second phase, only samples with image-level annotation were used. This is attractive because image-level annotation costs less than lesion-level annotation. Automated diagnostic tools not only make the examination process more efficient but also lighten the load on radiologists. For this reason, a commercial AI diagnostic tool has been used for the detection of breast cancer: patients' mammograms were triaged based on the tool's output in order to reduce the number that require review by a radiologist. In histopathology, H&E-stained slides, with IHC annotations serving as labels, can be used to train models. Multi-instance learning (MIL) is well suited to this setting. Recently, MIL has been applied to ML-driven histopathology prediction [1-5].
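As a rough illustration of the multi-instance learning idea, the sketch below treats one slide as a "bag" of tiles: only the slide carries a label, and per-tile scores are pooled into a single slide-level probability. The function names, the three pooling rules (max, mean, noisy-OR), and the use of plain logits in place of a real tile encoder are assumptions made for illustration; they are not taken from the studies cited here.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bag_probability(tile_logits, pooling="max"):
    """Pool per-tile logits into one slide-level (bag) probability.

    tile_logits: hypothetical outputs of a tile-level model, one per tile.
    pooling:     "max", "mean", or "noisy_or" (illustrative choices).
    """
    if not tile_logits:
        raise ValueError("a bag needs at least one tile")
    probs = [sigmoid(z) for z in tile_logits]
    if pooling == "max":
        # The slide is as suspicious as its most suspicious tile.
        return max(probs)
    if pooling == "mean":
        return sum(probs) / len(probs)
    if pooling == "noisy_or":
        # P(slide positive) = 1 - prod_i (1 - p_i)
        all_negative = 1.0
        for p in probs:
            all_negative *= 1.0 - p
        return 1.0 - all_negative
    raise ValueError(f"unknown pooling: {pooling}")


def bag_loss(tile_logits, slide_label, pooling="max"):
    """Binary cross-entropy between the pooled probability and the slide label."""
    eps = 1e-12
    p = min(max(bag_probability(tile_logits, pooling), eps), 1.0 - eps)
    return -(slide_label * math.log(p) + (1 - slide_label) * math.log(1.0 - p))
```

Because the loss is computed only on the pooled value, no tile-level annotation is needed during training. Max pooling assumes that a single positive tile is enough to make the slide positive, while mean pooling suits labels that reflect the slide as a whole.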


Over the years, the field of DL has made such progress that the representation power of models is seldom the limiting factor. However, these powerful models are of little use without enough training samples. Dealing with limited training data is a growing area of research that can be approached in several ways. The most obvious remedy for a lack of data is to collect high-quality datasets and make them publicly accessible. However, not all data can be gathered in this way. A promising alternative is image composition, in which new training samples are created by combining a variety of background and foreground images. Another approach to data scarcity is transfer learning, and making transfer learning domain-aware is highly desirable. Pretrained models have frequently been trained on general-purpose datasets such as ImageNet, which are very different from medical images. To address this mismatch, it is preferable to pre-train models on datasets that share features with the target dataset. Although DL models are general-purpose learners, it is unwise to rely solely on image data, and combining data from multiple sources is worth investigating as a way to improve performance. A different but related strategy is to use an ensemble of DL models for more robust decision-making. The difficulty lies in improving performance while keeping the computational complexity of ensemble DL models manageable. With knowledge distillation techniques, ensemble methods can be made computationally efficient without sacrificing much performance.
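To make the closing point about knowledge distillation more concrete, here is a minimal sketch of one common formulation, in which a single small "student" model is trained to match the averaged, temperature-softened predictions of an ensemble of "teacher" models. The function names, the default temperature of 2.0, and the weighting factor alpha are illustrative assumptions rather than values from the literature discussed here.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits, temperature):
    """Average the softened class probabilities of all teacher models."""
    per_teacher = [softmax(logits, temperature) for logits in teacher_logits]
    count = len(per_teacher)
    return [sum(column) / count for column in zip(*per_teacher)]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and KL(teachers || student).

    teacher_logits: one list of class logits per ensemble member.
    label:          index of the true class.
    alpha:          weight on the hard-label term (assumed value).
    """
    eps = 1e-12
    targets = ensemble_soft_targets(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / max(s, eps))
             for t, s in zip(targets, student_soft) if t > 0)
    hard = -math.log(max(softmax(student_logits)[label], eps))
    # The T^2 factor keeps the soft-target gradients on a comparable scale.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * kl
```

At inference time only the student is evaluated, so the cost is that of a single model even though the training signal came from the whole ensemble.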


  1. Otálora, Sebastian, Manfredo Atzori, Vincent Andrearczyk and Amjad Khan. "Staining invariant features for improving generalization of deep convolutional neural networks in computational pathology." Front Bioeng Biotechnol (2019): 198.

  2. Griffin, Mark C., Robert A. Robinson and Douglas K. Trask. "Validation of tissue microarrays using p53 immunohistochemical studies of squamous cell carcinoma of the larynx." Mod Pathol 16 (2003): 1181-1188.

  3. Coudray, Nicolas, Paolo Santiago Ocampo, Theodore Sakellaropoulos and Navneet Narula. "Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning." Nat Med 24 (2018): 1559-1567.

  4. Jemal, Ahmedin, Melissa M. Center, Carol DeSantis and Elizabeth M. Ward. "Global patterns of cancer incidence and mortality rates and trends." Cancer Epidemiol Biomark Prev 19 (2010): 1893-1907.

  5. Sparano, Joseph A., Robert J. Gray, Della F. Makower and Kathleen I. Pritchard. "Adjuvant chemotherapy guided by a 21-gene expression assay in breast cancer." N Engl J Med 379 (2018): 111-121.
