Editorial Note on Data Cleaning

George Walker

doi:10.37421/2090-4886.2022.11.152

Editorial - (2022) Volume 11, Issue 3

Editorial Note on Data Cleaning

George Walker^*

^*Correspondence: George Walker, Medical Informatics in Translational Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany, Email:

Author information

Medical Informatics in Translational Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany

Received: 08-Mar-2022, Manuscript No. sndc-22-65180; Editor assigned: 10-Mar-2022, Pre QC No. P-65180; Reviewed: 15-Mar-2022, QC No. Q-65180; Revised: 20-Mar-2022, Manuscript No. R-65180; Published: 25-Mar-2022 , DOI: 10.37421/2090-4886.2022.11.152
Citation: Walker, George. “Editorial Note on Data Cleaning.” J Sens Netw Data Commun 11 (2022): 152.
Copyright: © 2022 Walker G. This is an open-access article distributed under the terms of the creative commons attribution license which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Editorial

Data cleaning has long been recognized as a necessary step in data analysis, but its implementation is not uniform and differs amongst researchers. The purpose of this article is to investigate the impact of various data cleansing aspects and make recommendations. Normality, outliers, and missing values are the stages that are examined. Advances in information technology (social networks, mobile applications, the Internet of Things, and so on) have recently generated a flood of digital data; nevertheless, converting this data into valuable information for business choices is becoming increasingly difficult. Identifying legitimate, new, potentially relevant, and intelligible patterns from a large volume of data is part of the Knowledge Discovery (KD) process [1]. Preparing the data, on the other hand, is a non-trivial refining effort that necessitates technical skill in data cleansing methods and algorithms. As a result, inexperienced users have a difficult time selecting an appropriate data analysis approach.

We present a Case-based Reasoning System (CBR) to offer data cleaning strategies for classification and regression tasks to overcome these issues. The issue space is represented in our method by the dataset's meta-features, attributes, and the target variable. The data cleaning techniques employed for each dataset are stored in the solution space. A Data Cleaning Ontology is used to represent the cases. A filter and similarity stages make up the case retrieval process established of two filter techniques based on clustering and quartile analysis in the first step [2]. These filters return a smaller number of examples that are relevant. The second step uses filter techniques to rank the collected instances and rates the similarity between a new case and the retrieved cases. The suggested retrieval technique was judged by a panel of experts. A panel of judges assigns a score to the similarity of two query cases.

Massive diagnostic data is progressively created as Magnetic Confinement Fusion (MCF) research and diagnostic tools advance. Due to many interference sources and difficult measurement circumstances in MCF devices, including as mechanical vibration, electromagnetic interference, signal saturation, and hardware failures, original diagnostic data may be erroneous. Before further analysis and investigation, inaccurate diagnostic data, called filthy data, that cannot reflect genuine physical qualities of measured objects, should be cleared out to ensure the availability and trustworthiness of data sources [3].

These data cleaning systems and guidelines are frequently ineffective since they only apply to particular types of data. The ever-increasing amount of fusion data cannot be cleansed in a timely manner. Data cleaning technologies that can eliminate dirty data in a short amount of time, such as a few milliseconds, are required for real-time processing and feed-back control. Subjective considerations in manual data cleansing operations, on the other hand, contribute to uneven results [4]. The speed, efficiency, and accuracy of fusion data cleaning must all be increased urgently to fulfil the need of fusion energy research. Machine learning-based automatic data cleaning approaches are a good contender for breaking past the enormous data application bottleneck in fusion research. Machine learning, in turn, provides strong methods for cleansing diagnostic data. Using objective classification models built on original data using supervised machine learning approaches, precise data cleaning may be performed [5]. To fulfil the needs of real-time feedback control, the application speed of a well-trained model may be readily optimized. Massive fusion data may be processed properly with the help of a supercomputer, relieving researchers' data processing demands. The resilience and applicability of categorization models serve as a foundation for machine learning applications in fusion research on a broad scale.

Conflict of Interest

None.

References

Hemingway, Harry, Folkert W. Asselbergs, John Danesh, Richard Dobson, et al. “Big data from electronic health records for early and late translational cardiovascular research: challenges and potential.” Eur Heart J 39 (2018):1481–1495.

Google Scholar, Crossref, Indexed at

Weiskopf, Nicole Gray, and Chunhua Weng. “Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research.” J Am Med Inform Assoc 20(2013):144–151.

Google Scholar, Crossref, Indexed at

Feder, Shelli L. “Data quality in electronic health records research: quality domains and assessment methods.” West J Nurs Res 40 (2018):753–766.

Google Scholar, Crossref, Indexed at

Terry, Amanda L., Moira Stewart, Sonny Cejic and J. Neil Marshall, et al. “A basic model for assessing primary health care electronic medical record data quality.” BMC Med Inform Decis Mak 19 (2019): 1-11.

Google Scholar, Crossref, Indexed at

Mashoufi, Mehrnaz, Haleh Ayatollahi, and Davoud Khorasani-Zavareh. “A review of data quality assessment in emergency medical services.” Open Med Inform J 12 (2018): 19.