The UKB and BBR cohorts collectively represent the full spectrum from health to cardiovascular disease. Both cohorts will be analysed using parallel tranSMART data warehouse infrastructures, enabling a comparison between healthy and diseased subjects and the integration of high-level findings from both cohorts. Thus, we will establish a foundation for translational cardiovascular research in the UK, with a detailed data provenance schema and a common analytical pipeline.
To achieve this, Unified Medical Language System (UMLS) coding standards will be used to standardise EHR data into official nomenclature. Extensive data curation will be necessary in order to analyse, and potentially integrate, multiple datasets across tranSMART instances; prior experience from the eTRIKS and AETIONOMY IMI projects will mitigate the risks associated with this process. The data will be made available to other applications via a flexible tranSMART API, including a data constructor that feeds a machine-learning software layer called Ada. We will stratify patients using Ada's machine-learning algorithms and, by adapting the BrainMesh package for heart data, investigate the MR images collected by UKB and BBR.
Docker instances will help to create reproducible and portable data warehouse instances, while Git version control will provide a clear record of changes. Software and datasets will be made available in public repositories where appropriate, or via Zenodo, and referenced by a top-level unique Digital Object Identifier. Application of the FAIR (Findable, Accessible, Interoperable and Reusable) principles, within the broader scope of each resource's access conditions, will support the sustainable long-term use of these tools and datasets by future researchers. This will allow us to build a critical mass of highly skilled scientists dedicated to health data research.
Funding: UKRI – MRC.