Publication Details
Isel Grau, Dorien Daneels, Sonia Van Dooren, Mary-Louise Bonduelle, Dewan Md Farid, Didier Croes, Ann Nowe, Dipankar Sengupta
 

Chapter in Book/ Report/ Conference proceeding

Abstract 

High-throughput screening (HTS) techniques, such as genome or exome screening, are becoming the norm in conventional clinical analysis. However, classifying the identified variants as pathogenic, potentially pathogenic or non-pathogenic is still a manual, tedious and time-consuming process for clinicians and geneticists. To facilitate variant classification, we have developed GEVACT, a Java-based tool built on an algorithm that draws on the existing literature and the knowledge of clinical geneticists. GEVACT can classify variants annotated by Alamut Batch, and support for input from other annotation software is planned.

INTRODUCTION
With the emergence of new screening techniques, targeted or whole exome and genome screening are becoming standard diagnostic practice in clinical settings for identifying the variants underlying a genetic disease (Ng et al., 2010; Saunders et al., 2012). However, developing bioinformatics solutions for pathogenicity classification of these variants remains a major challenge, which makes the process laborious for geneticists and clinicians. In this work, we describe GEVACT (Genomic Variant Classifier Tool), a tool for classifying genomic single-nucleotide and short insertion/deletion variants. The aim of this study was to design and implement a variant classification algorithm based on a literature review of cardiac arrhythmia syndromes (Hofman et al., 2013; Schulze-Bahr et al., 2000; Wilde & Tan, 2007) and the existing knowledge of clinical geneticists.

METHODS
The algorithm we propose for GEVACT is based on a published variant classification schema for cardiac arrhythmia syndromes. That schema draws on the yield of DNA testing over a time span of 15 years (1996-2011), compared between probands with isolated and familial cases, and between probands with and without clear disease-specific clinical characteristics (Hofman et al., 2013).
The schema proposes two distinct approaches: one to classify missense variants and another to classify nonsense and frameshift variants. The algorithm is implemented in two phases: pre-processing and classification. In the pre-processing phase, the annotated tab-delimited variant file (vcf.ann) from Alamut Batch is refined using the gene list for the disease of interest, so as to reduce the number of variants to analyse. Filters are then applied to look for variants that have already been reported in the Human Gene Mutation Database (Stenson et al., 2003) or in ClinVar (Landrum et al., 2014), or that have previously been detected and classified in an internal patient population. Lastly, the variants are filtered by their location in the genome and their coding effect, followed by a check of the minor allele frequency of each variant in a control population (Sherry et al., 2001). In the classification phase, the filtered variants are divided into missense variants and nonsense or frameshift variants. For missense variants, classification is based on the following parameters: amino acid substitution and its impact on protein function (Adzhubei et al., 2010; Kumar et al., 2009), biochemical variation (Mathe et al., 2006), conservation (Pollard et al., 2010), frequency of variant alleles in a control population (ExAC, 2015), effects on splicing (Desmet et al., 2009), family and phenotype information, and functional analysis. For nonsense and frameshift variants, it is based on effects on splicing, frequency of variant alleles in a control population, family and phenotype information, and functional analysis. Each parameter contributes a score to the variant, and these scores are summed into a cumulative score.
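The two phases described above can be illustrated with a minimal Java sketch. It is not GEVACT's actual code: the class name `GevactPipelineSketch`, the `Variant` fields, the `preprocess` and `cumulativeScore` methods, the frequency cut-off, the per-parameter scores and the example genes are all assumptions made for illustration. The lookups against HGMD, ClinVar and the internal patient population are omitted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: names, fields and values are illustrative only and
// are not taken from the GEVACT source code.
class GevactPipelineSketch {

    // Minimal stand-in for one row of the annotated variant file.
    static final class Variant {
        final String gene;
        final String codingEffect;         // e.g. "missense", "nonsense", "frameshift"
        final double minorAlleleFrequency; // frequency in a control population

        Variant(String gene, String codingEffect, double minorAlleleFrequency) {
            this.gene = gene;
            this.codingEffect = codingEffect;
            this.minorAlleleFrequency = minorAlleleFrequency;
        }
    }

    // Pre-processing: keep variants that lie in the disease gene list, have a
    // coding effect of interest, and do not exceed the frequency cut-off.
    // Keeping the rare variants (<= maxMaf) is an assumption of this sketch.
    static List<Variant> preprocess(List<Variant> variants,
                                    Set<String> diseaseGenes,
                                    Set<String> codingEffects,
                                    double maxMaf) {
        return variants.stream()
                .filter(v -> diseaseGenes.contains(v.gene))
                .filter(v -> codingEffects.contains(v.codingEffect))
                .filter(v -> v.minorAlleleFrequency <= maxMaf)
                .collect(Collectors.toList());
    }

    // Classification: sum the per-parameter scores into one cumulative score.
    static int cumulativeScore(Map<String, Integer> parameterScores) {
        int total = 0;
        for (int score : parameterScores.values()) {
            total += score;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Variant> input = Arrays.asList(
                new Variant("KCNQ1", "missense", 0.0001),
                new Variant("KCNQ1", "synonymous", 0.0001),
                new Variant("BRCA1", "missense", 0.0001),
                new Variant("SCN5A", "frameshift", 0.2));
        Set<String> genes = new HashSet<>(Arrays.asList("KCNQ1", "SCN5A"));
        Set<String> effects = new HashSet<>(Arrays.asList("missense", "nonsense", "frameshift"));

        List<Variant> kept = preprocess(input, genes, effects, 0.01);
        System.out.println("Variants kept: " + kept.size());

        // Placeholder scores for a missense variant; the real values are not given in the text.
        Map<String, Integer> scores = new LinkedHashMap<>();
        scores.put("proteinImpact", 2);
        scores.put("conservation", 1);
        scores.put("controlFrequency", -1);
        System.out.println("Cumulative score: " + cumulativeScore(scores));
    }
}
```

Keeping the filter criteria as parameters rather than constants mirrors the text's point that the gene list depends on the disease of interest.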
Finally, based on the cumulative score, each variant is classified into one of five categories: Class I - Non-pathogenic; Class II - VUS1 (unlikely pathogenic); Class III - VUS2 (unclear); Class IV - VUS3 (likely pathogenic); Class V - Pathogenic (Sharon et al., 2008).

RESULTS & DISCUSSION
In this study, we report a Java-based tool called GEVACT, developed for the classification of genomic variants. The input to the tool is an annotated VCF file, and the output gives the cumulative classification score along with the class label for each variant. The tool was tested on a dataset of 130 cardiac arrhythmia syndrome patients available at UZ Brussel. The classifications made by the tool were cross-checked against manual curation performed by the clinical geneticist. The study indicates that the tool is promising, but it needs further validation on datasets from other diseases. In addition, we are working on adapting the tool to accept input files from other annotation software.
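As an illustration of the final step of the Methods, the mapping from a cumulative score to one of the five classes might look like the sketch below. The class labels follow the text, but the cut-off values in `UPPER_BOUNDS`, the assumption that higher scores indicate greater pathogenicity, and all identifiers (`VariantClassSketch`, `VariantClass`, `classify`) are placeholders; the abstract does not state the actual thresholds.

```java
// Hypothetical sketch: the cut-offs below are placeholders, not GEVACT's thresholds.
class VariantClassSketch {

    // The five categories named in the text.
    enum VariantClass {
        CLASS_I("Non-pathogenic"),
        CLASS_II("VUS1 (unlikely pathogenic)"),
        CLASS_III("VUS2 (unclear)"),
        CLASS_IV("VUS3 (likely pathogenic)"),
        CLASS_V("Pathogenic");

        final String label;

        VariantClass(String label) {
            this.label = label;
        }
    }

    // Exclusive upper bounds for Classes I to IV (assumed values).
    // Any score at or above the last bound falls into Class V.
    static final int[] UPPER_BOUNDS = {0, 3, 6, 9};

    // Return the first class whose upper bound exceeds the cumulative score.
    static VariantClass classify(int cumulativeScore) {
        VariantClass[] classes = VariantClass.values();
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (cumulativeScore < UPPER_BOUNDS[i]) {
                return classes[i];
            }
        }
        return VariantClass.CLASS_V;
    }

    public static void main(String[] args) {
        int score = 7;
        VariantClass result = classify(score);
        System.out.println(score + " -> " + result + ": " + result.label);
    }
}
```

Holding the bounds in a single array keeps the cut-offs in one place, which would make it easy to substitute the published values for a given disease.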

Reference