AMEH, Jude, OTEBOLAKU, Abayomi, SHENFIELD, Alex and IKPEHAI, Augustine (2025). C3-VULMAP: A Dataset for Privacy-Aware Vulnerability Detection in Healthcare Systems. Electronics, 14 (13): 2703. [Article]
Documents
electronics-14-02703.pdf - Published Version
Available under License Creative Commons Attribution.
Abstract
The increasing integration of digital technologies in healthcare has expanded the attack surface for privacy violations in critical systems such as electronic health records (EHRs), telehealth platforms, and medical device software. However, current vulnerability detection datasets lack domain-specific privacy annotations essential for compliance with healthcare regulations like HIPAA and GDPR. This study presents C3-VULMAP, a novel and large-scale dataset explicitly designed for privacy-aware vulnerability detection in healthcare software. The dataset comprises over 30,000 vulnerable and 7.8 million non-vulnerable C/C++ functions, annotated with CWE categories and systematically mapped to LINDDUN privacy threat types. The objective is to support the development of automated, privacy-focused detection systems that can identify fine-grained software vulnerabilities in healthcare environments. To achieve this, we developed a hybrid construction methodology combining manual threat modeling, LLM-assisted synthetic generation, and multi-source aggregation. We then conducted comprehensive evaluations using traditional machine learning algorithms (Support Vector Machines, XGBoost), graph neural networks (Devign, Reveal), and transformer-based models (CodeBERT, RoBERTa, CodeT5). The results demonstrate that transformer models, such as RoBERTa, achieve high detection performance (F1 = 0.987), while Reveal leads GNN-based methods (F1 = 0.993), with different models excelling across specific privacy threat categories. These findings validate C3-VULMAP as a powerful benchmarking resource and show its potential to guide the development of privacy-preserving, secure-by-design software in embedded and electronic healthcare systems. The dataset fills a critical gap in privacy threat modeling and vulnerability detection and is positioned to support future research in cybersecurity and intelligent electronic systems for healthcare.
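The abstract reports F1 scores and notes that different models excel on different privacy threat categories. As a rough illustration of what a per-category comparison involves, the sketch below computes F1 separately for each LINDDUN threat type. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the record layout, the helper names `f1_score` and `f1_by_threat`, and the toy labels are all hypothetical and are not drawn from C3-VULMAP.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """Binary F1, where 1 = vulnerable and 0 = non-vulnerable."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_threat(records):
    """Group (threat_type, true_label, predicted_label) records by
    threat type and compute F1 within each group."""
    grouped = defaultdict(lambda: ([], []))
    for threat, true_label, predicted_label in records:
        grouped[threat][0].append(true_label)
        grouped[threat][1].append(predicted_label)
    return {threat: f1_score(ts, ps) for threat, (ts, ps) in grouped.items()}


# Toy, made-up records (hypothetical; not taken from the dataset).
toy_records = [
    ("Linkability", 1, 1),
    ("Linkability", 0, 0),
    ("Linkability", 1, 0),
    ("Disclosure of information", 1, 1),
    ("Disclosure of information", 0, 1),
    ("Disclosure of information", 1, 1),
]

scores = f1_by_threat(toy_records)
for threat, score in sorted(scores.items()):
    print(f"{threat}: F1 = {score:.3f}")
```

Running the same grouping over each model's predictions would show which model leads on which threat type, which is the kind of per-category comparison the abstract describes.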