Predicting disease-causing variant combinations for oligogenic diseases
Authors: S. Papadimitriou, N. Versbraegen, C. Nachtegael, J. Aerts, Y. Moreau, S. Van Dooren, A. Nowé, G. Smits and T. Lenaerts
Publication Date: Mar. 2018
Despite advances in predicting pathogenic variants in rare monogenic diseases, identifying causative variant combinations in disorders that involve more than one gene remains a challenge. As data on digenic diseases accumulate, it is now possible to develop predictive methods that identify pathogenic variant combinations in gene pairs. We present VarCoPP, the first Variant Combination Pathogenicity Predictor, an innovative tool for variant assessment in oligogenic disorders. VarCoPP was trained on pathogenic digenic variant combinations from the Digenic Diseases Database (DIDA) and neutral digenic combinations from the 1000 Genomes Project. By combining 500 individually trained Random Forest (RF) predictors, VarCoPP differentiates pathogenic from neutral digenic variant combinations. After cross-validation, the predictor reaches a Matthews Correlation Coefficient of 0.74. VarCoPP ranks the results and provides two evaluation scores for each prediction: a Classification Score (CS), which expresses the likelihood that a digenic combination is pathogenic, and a Support Score (SS), which expresses how many RFs agree with that decision. Clinically relevant confidence zones, delimited by minimum CS and SS values, are also provided; they guarantee with 95% or 99% probability that a prediction is indeed a true positive. Validation with 23 gene pairs related to digenic diseases not yet present in DIDA shows that the method correctly identifies 20 cases, all of them with at least 95% confidence. The results of VarCoPP show that the first steps toward oligogenic pathogenicity prediction can be taken, expanding our capability to assist clinicians and geneticists in unraveling oligogenic signatures in rare diseases.
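To make the two evaluation scores and the reported metric concrete, the following is a minimal sketch in Python. It is an illustration only, not the VarCoPP implementation. The abstract does not state how the 500 predictor outputs are aggregated, so the sketch assumes that the CS is the median of the per-predictor pathogenic probabilities, that a combination is called pathogenic when the CS exceeds 0.5, and that the SS is the percentage of predictors whose individual vote matches that call. The function names and the threshold are hypothetical. The Matthews Correlation Coefficient follows its standard definition from confusion-matrix counts.

```python
import math
from statistics import median


def classification_score(probabilities):
    """Aggregate per-predictor pathogenic probabilities into one score.

    Assumption: the aggregate is the median; the abstract does not say.
    """
    return median(probabilities)


def support_score(probabilities, threshold=0.5):
    """Percentage of predictors that agree with the ensemble decision.

    Assumption: a predictor votes 'pathogenic' when its probability
    exceeds `threshold`, and the ensemble decision uses the same rule
    applied to the classification score.
    """
    decision = classification_score(probabilities) > threshold
    agreeing = sum(1 for p in probabilities if (p > threshold) == decision)
    return 100.0 * agreeing / len(probabilities)


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard Matthews Correlation Coefficient from confusion counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventional value when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy example with five predictors instead of 500 (made-up probabilities).
toy_probabilities = [0.9, 0.8, 0.7, 0.4, 0.6]
cs = classification_score(toy_probabilities)  # 0.7
ss = support_score(toy_probabilities)         # 80.0 (4 of 5 predictors agree)
```

Under these assumptions, a combination would fall inside a confidence zone only if both its CS and its SS reach the minimum values that delimit that zone; those minimum values are not given in the abstract and are therefore not encoded here.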