PAIPETIS, Alexandros (2025). Predicting Missing Data Values Using Formal Concept Analysis. Doctoral, Sheffield Hallam University. [Thesis]
Documents
PDF
Paipetis_2025_PhD_PredictingMissingData.pdf - Accepted Version
Restricted to Repository staff only until 11 September 2026.
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
The aim of this research is to use Formal Concept Analysis (FCA) to predict missing values in
datasets.
Missing values pose a significant challenge to the accuracy and reliability of data-driven
analysis, affecting workflows and compromising outcomes across various domains. Existing
imputation methods often rely on strong statistical assumptions and lack flexibility,
applicability, and interpretability. These constraints reduce their effectiveness in real-world
scenarios.
To address these limitations, this thesis proposes Fault-Tolerant Formal Concept Analysis as a
solution to enhance the prediction of missing values. FCA organises object-attribute relations
into formal concepts under strict closure conditions. The integration of fault tolerance relaxes
these conditions, constructing approximate concepts in which an object or attribute can be
included despite a bounded number of missing associations. These approximate concepts
capture patterns which can be used to predict missing relations. Building on this approach,
this research demonstrates that predicting missing values is indicative of predicting class
labels. This perspective enables Fault-Tolerant FCA to be applied in both imputation and
classification. To develop this technique, the In-Close algorithm was adapted to incorporate
fault tolerance, extending its functionality to these predictive tasks.
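The idea of relaxing the strict closure conditions can be illustrated with a minimal sketch. It is not the adapted In-Close algorithm described in the thesis; the toy context and the names `tolerant_extent`, `predict_missing`, and `max_faults` are hypothetical, chosen only to show how an object that lacks a bounded number of attributes may still join an approximate concept, and how the absent relations then become candidate predictions.

```python
# Illustrative sketch only: a toy formal context and a naive fault-tolerant
# derivation. All names and data here are assumptions, not the thesis's code.

# Formal context: each object maps to the set of attributes it is known to have.
context = {
    "o1": {"a", "b", "c"},
    "o2": {"a", "b", "c"},
    "o3": {"a", "b"},  # the relation (o3, c) is absent and may be a missing value
    "o4": {"d"},
}


def extent(intent, ctx):
    """Strict derivation: objects that have every attribute in the intent."""
    return {o for o, attrs in ctx.items() if intent <= attrs}


def common_attributes(objects, ctx):
    """Strict derivation: attributes shared by every object in a non-empty set."""
    return set.intersection(*(ctx[o] for o in objects))


def tolerant_extent(intent, ctx, max_faults):
    """Relaxed derivation: objects lacking at most max_faults intent attributes."""
    return {o for o, attrs in ctx.items() if len(intent - attrs) <= max_faults}


def predict_missing(intent, ctx, max_faults):
    """Absent (object, attribute) pairs suggested by the approximate concept."""
    admitted = tolerant_extent(intent, ctx, max_faults) - extent(intent, ctx)
    return sorted((o, a) for o in admitted for a in intent - ctx[o])


intent = common_attributes({"o1", "o2"}, context)      # strict intent of o1 and o2
predictions = predict_missing(intent, context, max_faults=1)
print(predictions)
```

With `max_faults=0` the relaxed derivation coincides with the strict one and nothing is predicted; raising the bound admits objects such as `o3`, whose single absent relation is then proposed as a missing value.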
Two case studies were adopted to validate the proposed technique. The first utilised datasets
from the widely recognised UCI Machine Learning Repository. The datasets Mushroom, Adult
Census Income, and Nursery were selected due to their diverse characteristics and extensive
application in data analytics benchmarking. The second case study employed the Edinburgh
Mouse Atlas Gene Expression (EMAGE) database, a specialised biological resource that
presents a critical test case due to the subjective assessments involved in its gene expression
annotations. Evaluation of both case studies demonstrated that the proposed technique is
practically effective and capable of addressing real-world data challenges.
The performance of the proposed technique was rigorously evaluated through a series of
experiments. These experiments were designed both to assess the intrinsic effectiveness of
the FCA-based approach and to benchmark its performance against established machine
learning methods. The results show that the proposed FCA technique achieved high accuracy
in predicting missing values, often matching or outperforming traditional
methods across various contexts. Although the technique may not recover every missing value
in all scenarios, its overall performance remained robust and reliable.
This research contributes to the field of FCA by extending its utility beyond concept discovery.
Although FCA and fault tolerance have previously been explored for handling uncertainty and
noise, their explicit application to missing-value prediction, as undertaken in this study,
represents a novel advancement. The empirical findings, combined with FCA’s inherent
interpretability, demonstrate its potential for missing value restoration and highlight its
broader role in modern data science.