Data Quality Assurance in the Internet of Things Using Online Machine Learning
Data quality is crucial for sound decision-making in Internet of Things (IoT) applications, but existing data quality tools often lack flexibility and modularity. This study addresses this gap through three contributions: (1) identifying critical limitations of existing data quality assessment approaches through a systematic scoping review of the literature, (2) developing an open-source, event-driven software tool, the Data Quality Assurance Tool (DQAT), for real-time data quality assessment in diverse IoT applications, and (3) assessing the feasibility of state-of-the-art machine learning methods for improving data quality in streaming data scenarios. DQAT's modularity and scalability allow it to simulate end-to-end scenarios and to integrate with real-world applications. Its effectiveness will be evaluated on an agricultural dataset using metrics such as accuracy, completeness, timeliness, and consistency, together with the overall improvement in data quality. This work aims to improve data quality in IoT and to show how higher-quality data can unlock the full potential of IoT for reliable decision-making.
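
To make two of the metrics named above concrete, the following is a minimal sketch of how completeness and timeliness might be computed over one window of streaming sensor readings. It is an illustration under stated assumptions, not DQAT's implementation: the `Reading` record layout, the function names, the expected reading count, and the delay threshold are all hypothetical, and the definitions used here (share of expected readings present, share of received readings that arrive within a delay bound) are only one common way to operationalize these metrics.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Reading:
    # Hypothetical record layout; the paper does not specify DQAT's schema.
    sensor_id: str
    event_time: float        # seconds, when the value was measured
    arrival_time: float      # seconds, when the value reached the pipeline
    value: Optional[float]   # None models a missing measurement


def completeness(readings: Iterable[Reading], expected_count: int) -> float:
    """Share of expected readings that arrived with a non-missing value."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    present = sum(1 for r in readings if r.value is not None)
    # Cap at 1.0 so duplicate deliveries cannot push the score above 100%.
    return min(present / expected_count, 1.0)


def timeliness(readings: Iterable[Reading], max_delay: float) -> float:
    """Share of received readings whose arrival delay is within max_delay."""
    received = list(readings)
    if not received:
        return 0.0
    on_time = sum(
        1 for r in received if r.arrival_time - r.event_time <= max_delay
    )
    return on_time / len(received)


# Assumed example window: 4 readings were expected, 3 arrived,
# one of them without a value and one of them late.
window = [
    Reading("soil-1", event_time=0.0, arrival_time=1.0, value=21.5),
    Reading("soil-1", event_time=60.0, arrival_time=62.0, value=None),
    Reading("soil-1", event_time=120.0, arrival_time=130.0, value=21.9),
]

print(completeness(window, expected_count=4))
print(timeliness(window, max_delay=5.0))
```

In this assumed window, two of the four expected readings carry a value, and two of the three received readings arrive within the five-second bound. In an event-driven setting, such functions would typically be re-evaluated each time a window of readings closes.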