CHD:Gene hunting example

From CHDWiki

Methods

Prioritization of candidate genes for CHD loci identified by array CGH

Fifteen well-delineated genomic regions found to be imbalanced in CHD patients were selected for further analysis: 10 from a previously published study (Thienpont et al., 2007) and 5 imbalances detected during a follow-up study. Seven regions that contain a known CHD gene served as positive controls (Figure 1). Of the remaining 8 regions, we excluded 2 patients carrying multiple chromosomal imbalances from further analysis, while the 6 other chromosomal imbalances were used to identify novel putative CHD genes. This strategy has previously been successful, for example in studies of the recurrent 22q11 deletion that led to the identification of TBX1 mutations causing CHDs and DiGeorge syndrome (Yagi et al., 2003). The same example also demonstrates the challenges associated with this strategy: many other candidate genes within the deletion interval had to be investigated in model organisms before TBX1 was identified, which required considerable time and resources (Scambler et al., 2000). Therefore, to enhance candidate gene selection in our regions, we applied the gene prioritization strategy (Endeavour) that is incorporated in CHDWiki.

Screening of candidate genes for CHD by expression analysis in zebrafish development

The 6 known CHD genes and 25 high-ranking genes selected from the 6 regions containing no known CHD gene were further analyzed in zebrafish (Figures 1 and 2). Their expression in the developing zebrafish heart was evaluated at 3 to 6 developmental stages that are key for cardiac development:

  • 12 to 18 hours post fertilization (hpf): heart cell specification and initiation of migration to the midline
  • 22 hpf: at the midline, heart cone stage
  • 30 to 36 hpf: heart elongation and looping, initiation of chamber specification and valve formation
  • 48 hpf: valve formation

Expression was analyzed by standard colorimetric whole-mount in situ hybridization using DIG-labeled antisense RNA probes. RNA probes were produced by in vitro transcription from Sp6- or T7-tagged DNA that was PCR amplified from a cDNA pool. This pool was reverse transcribed from an RNA mix extracted using TRIzol (Invitrogen, Paisley, UK) from zebrafish embryos at the 6 defined developmental stages (12, 18, 22, 30, 36 and 48 hpf). Nested PCR was applied when PCR using a single primer pair was unable to produce a single product. Primer pairs, including the external primers used for nested PCR, are listed in supplementary table 2. PCR products were sequenced to verify that the intended probes had been generated. When no product was obtained after nested PCR, we assumed that the transcript was not present in the cDNA pool and that the gene was therefore not expressed at the relevant stages. TBX1 expression in the developing zebrafish has been described repeatedly (Stalmans et al., 2003; Zhang et al., 2006), so we did not reanalyze it in this study. All experiments were carried out using the wild-type AB zebrafish line.
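The decision rule for probe templates described above can be summarised in a small sketch. It is only an illustration of the stated logic: the names `ProbeOutcome` and `probe_outcome` are hypothetical, band counts stand in for what would be read from a gel, and the case of nested PCR giving several products is not covered by the text and is therefore left unresolved.

```python
from enum import Enum
from typing import Optional


class ProbeOutcome(Enum):
    SINGLE_PCR = "template from a single primer pair"
    NESTED_PCR = "template from nested PCR"
    ASSUMED_NOT_EXPRESSED = "no product after nested PCR"
    UNRESOLVED = "case not described in the text"


def probe_outcome(single_pcr_bands: int,
                  nested_pcr_bands: Optional[int] = None) -> ProbeOutcome:
    """Map PCR results (number of distinct products) to an outcome.

    Hypothetical helper that mirrors the rule in the Methods text.
    """
    if single_pcr_bands == 1:
        # A single primer pair gave a single product: use it directly.
        return ProbeOutcome.SINGLE_PCR
    # Otherwise nested PCR is applied, so its result is required.
    if nested_pcr_bands is None:
        raise ValueError("nested PCR result is required")
    if nested_pcr_bands == 0:
        # Transcript assumed absent from the cDNA pool.
        return ProbeOutcome.ASSUMED_NOT_EXPRESSED
    if nested_pcr_bands == 1:
        return ProbeOutcome.NESTED_PCR
    # Several products after nested PCR: not specified in the source.
    return ProbeOutcome.UNRESOLVED


if __name__ == "__main__":
    print(probe_outcome(1).value)
    print(probe_outcome(3, nested_pcr_bands=1).value)
    print(probe_outcome(0, nested_pcr_bands=0).value)
```

In every case where a product is obtained, the text additionally requires sequencing of the product before the probe is used.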

Results

Prioritisation of CHD candidate genes

First, all genes from the selected regions were prioritized. In 6 regions containing a known CHD gene, the gene prioritization tool readily ranked at least one known CHD gene first among all other imbalanced genes (Figure 1). In a second step, genes affected by the six selected imbalances not containing known CHD genes were prioritized in an attempt to find novel CHD genes. For each patient, we analysed whether a known CHD gene was affected by the imbalance, which would explain the CHD observed in that patient. If so, the patient was added to the group that serves as a positive control (Figure 1); otherwise, the patient was added to the group in which we attempted gene hunting (Figure 2).

Imbalances in patients from Figure 2 were further analysed using a more sophisticated version of the Endeavour algorithm than the one recently published. Since different developmental mechanisms contribute to cardiac development, a training set was constructed for each putative mechanism: left-right asymmetry establishment (LR), vascularisation, the primary and secondary heart field, and neural crest cells. A dataset containing genes shown to cause CHDs upon haploinsufficiency (both syndromic and non-syndromic, as of August 2006) was constructed to enrich our results for dosage-sensitive genes implicated in human cardiac defects and to enable scoring of developmental pathways that are not explicitly annotated in CHDs. A dataset containing all genes in the above sets was assembled to accommodate potentially undiscovered crosstalk between the different developmental processes that could generate a significant signal.

As a negative control, we also constructed training sets for processes that are unrelated to cardiac development: Parkinson disease, atherosclerosis and inflammatory bowel disease. All datasets are available here.

Candidate genes were scored using all constructed training sets except for the LR set, which was used only when defects suggestive of LR abnormalities had been reported in patients with similar imbalances, or when no patients with similar imbalances had been described (del22q12.2 and delXp22).
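The rule for including the LR training set can be written as a short sketch. The set labels follow the names used in the text, while the function name `training_sets_for_region` and its two boolean parameters are hypothetical conveniences for illustration.

```python
from typing import List

# Training sets that are always used, named as in the text.
ALWAYS_USED = [
    "primary heart field",
    "secondary heart field",
    "neural crest cells",
    "vascularisation",
    "dosage-sensitive CHD genes",
    "all genes in the above sets",
    # Negative controls, unrelated to cardiac development.
    "Parkinson disease",
    "atherosclerosis",
    "inflammatory bowel disease",
]
LR_SET = "left-right asymmetry establishment"


def training_sets_for_region(lr_defects_reported: bool,
                             similar_patients_described: bool) -> List[str]:
    """Return the training sets used to score genes in one region.

    The LR set is added when LR-like defects were reported in patients
    with similar imbalances, or when no such patients were described.
    """
    sets = list(ALWAYS_USED)
    if lr_defects_reported or not similar_patients_described:
        sets.append(LR_SET)
    return sets


if __name__ == "__main__":
    # A region for which no patients with similar imbalances are known.
    print(training_sets_for_region(False, similar_patients_described=False))
```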

Using leave-one-out cross-validations (LOOCVs), receiver operating characteristic (ROC) curves were constructed. The area under the ROC curve (AUC) was measured to assess the performance of each training set (performance = 1 - AUC of ROC). An AUC around 0.5 is equivalent to random ranking of candidate genes, while an AUC approaching 1 is equivalent to first-place rankings for most training set genes.
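As a rough illustration of the AUC measure, the sketch below computes a rank-based AUC from the positions that left-out genes reach in their validation runs. It is not the Endeavour implementation: the function name `loocv_auc`, the example ranks and the number of genes ranked per run are assumptions, and the AUC is taken here as the fraction of (left-out gene, other gene) pairs in which the left-out gene ranks higher.

```python
from typing import Sequence


def loocv_auc(ranks: Sequence[int], n_candidates: int) -> float:
    """Rank-based AUC over leave-one-out validation runs.

    ranks        -- 1-based rank of the left-out gene in each run
    n_candidates -- number of genes ranked per run (left-out gene
                    plus the other genes it is ranked against)
    """
    if n_candidates < 2:
        raise ValueError("need at least two ranked genes per run")
    if not ranks:
        raise ValueError("need at least one validation run")
    for rank in ranks:
        if not 1 <= rank <= n_candidates:
            raise ValueError(f"rank {rank} outside 1..{n_candidates}")
    # Per run: share of the other genes ranked below the left-out gene.
    per_run = [(n_candidates - rank) / (n_candidates - 1) for rank in ranks]
    return sum(per_run) / len(per_run)


if __name__ == "__main__":
    example_ranks = [1, 2, 1, 5, 40]  # hypothetical LOOCV results
    auc = loocv_auc(example_ranks, n_candidates=100)
    print(f"AUC = {auc:.3f}, 1 - AUC = {1 - auc:.3f}")
```

With this definition, left-out genes that always rank first give an AUC of 1, and ranks spread evenly over the list give an AUC of 0.5, matching the interpretation in the text.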

We arbitrarily set a ROC AUC threshold of 0.6 to eliminate data sources that do not contain sufficient information and merely contribute noise to the predictions.
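A minimal sketch of this filtering step is shown below. The data source names and AUC values are invented for illustration, and treating a source exactly at the threshold as retained is an assumption, since the text does not say whether the cut-off is inclusive.

```python
from typing import Dict


def keep_informative_sources(auc_by_source: Dict[str, float],
                             threshold: float = 0.6) -> Dict[str, float]:
    """Keep data sources whose LOOCV ROC AUC reaches the threshold.

    Inclusive comparison (>=) is an assumption, not taken from the text.
    """
    return {name: auc for name, auc in auc_by_source.items()
            if auc >= threshold}


if __name__ == "__main__":
    # Hypothetical data sources with hypothetical AUC values.
    example = {"source_a": 0.82, "source_b": 0.55, "source_c": 0.60}
    print(sorted(keep_informative_sources(example)))
```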

Extra data sources were added and the algorithm structure was adapted, as described in a separate manuscript.

We demonstrated that all training sets were able to efficiently prioritize genes known to be involved in the related developmental field or process. We also calculated the median of the p-values of the left-out genes from the LOOCV (supplementary table 1).

Supplementary table 1: performance of training sets used to prioritize genes for their involvement in CHDs and cardiac development

Training set | Value
Primary heart field | 0.002
Secondary heart field | 0.003
Dosage-sensitive CHD genes | 0.054
Cardiac neural crest cells | 0.12
Left-right asymmetry establishment | 0.5
Vascularisation | 0.3
All genes in the above sets | 0.008


A mixed set of criteria was used to select genes for further analysis:

  • From each candidate list, the 2 highest ranking genes from the overall prioritization were selected.
  • If the estimated penetrance of CHDs in similar aberrations was high (>25% or >50%), the 3rd-ranking gene or the 3rd- and 4th-ranking genes, respectively, from the overall prioritization were also selected.
  • If the p-value for the highest-ranking gene in any prioritization using an individual training set was higher than the median p-value of the LOOCV of this training set (supplementary table 1), this first-ranked gene was also selected.
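The three selection criteria above can be sketched as a single function. This is an illustration under stated assumptions, not the authors' code: the function name `select_candidates`, its parameters and the example genes are hypothetical, penetrance is expressed as a fraction, and the p-value comparison follows the direction given in the text as written.

```python
from typing import Dict, List, Sequence, Tuple


def select_candidates(overall_ranking: Sequence[str],
                      penetrance: float,
                      top_hits: Dict[str, Tuple[str, float]],
                      loocv_median_p: Dict[str, float]) -> List[str]:
    """Apply the mixed selection criteria to one candidate region.

    overall_ranking -- genes ordered by the overall prioritization
    penetrance      -- estimated CHD penetrance in similar aberrations (0..1)
    top_hits        -- per training set: (first-ranked gene, its p-value)
    loocv_median_p  -- per training set: median LOOCV p-value
    """
    # Criterion 1: always take the 2 highest-ranking genes.
    n_top = 2
    # Criterion 2: extend to 3 or 4 genes for high penetrance.
    if penetrance > 0.50:
        n_top = 4
    elif penetrance > 0.25:
        n_top = 3
    selected = list(overall_ranking[:n_top])
    # Criterion 3: first-ranked gene of an individual training set,
    # compared with the LOOCV median as stated in the text.
    for training_set, (gene, p_value) in top_hits.items():
        if p_value > loocv_median_p[training_set] and gene not in selected:
            selected.append(gene)
    return selected


if __name__ == "__main__":
    # Hypothetical region with placeholder gene names.
    ranking = ["GENE1", "GENE2", "GENE3", "GENE4", "GENE5"]
    hits = {"primary heart field": ("GENE5", 0.01)}
    medians = {"primary heart field": 0.002}
    print(select_candidates(ranking, 0.30, hits, medians))
```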

In this way, we obtained a list of 25 genes. From this list, we eliminated 1 gene (TP73) because it also ranked in the top 10% in every prioritization that was based on the negative control training set. We moreover eliminated 3 genes that were shown not to cause CHDs upon mutation in at least 2 independent reports (CHEK2, RS1 and CDKL5).
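The negative-control filter that removed TP73 can be illustrated as follows. The function name `flagged_by_negative_controls` and the example ranks are hypothetical; the sketch assumes that "top 10%" refers to a gene's rank relative to the number of genes in the prioritized list.

```python
from typing import Dict


def flagged_by_negative_controls(ranks_by_control: Dict[str, int],
                                 n_genes: int,
                                 fraction: float = 0.10) -> bool:
    """True if a gene ranks in the top fraction for every negative control.

    ranks_by_control -- 1-based rank of the gene per negative control set
    n_genes          -- number of genes in the prioritized list
    """
    if not ranks_by_control:
        return False  # nothing to judge the gene on
    cutoff = fraction * n_genes
    return all(rank <= cutoff for rank in ranks_by_control.values())


if __name__ == "__main__":
    # Hypothetical ranks of one gene among 100 prioritized genes.
    ranks = {
        "Parkinson disease": 3,
        "atherosclerosis": 5,
        "inflammatory bowel disease": 2,
    }
    print(flagged_by_negative_controls(ranks, n_genes=100))
```

A gene flagged in this way scores highly regardless of the training set, so its high rank in the cardiac prioritizations carries little specific information.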

In order to further analyze these rankings we investigated the expression of all genes in the developing zebrafish heart.

Supplementary table 2: primer sequences used for probe synthesis, and results of (nested) PCR on the cDNA pool

Region | HGNC name | Human Ensembl ID | Zebrafish Ensembl ID | forward | reverse | external forward | external reverse | probe
1p36.33 | DVL1 | ENSG00000107404 | ENSDARG00000010515 | ATCGCAGGGATGCTAGAAAA | TAATACGACTCACTATAGGGAGCTGAAGCTAAGGCTGTGC | CGGATGTTGTTGACTGGTTG | GTGAGTTCTGGTGGGACGTT | YES
1p36.33 | HES4 | ENSG00000188290 | ENSDARG00000056438 | gacacaaacgtcctcagcaa | TAATACGACTCACTATAGGGcgcaagtctaccagggtctc | GCCAGCCGATAATATGGAGA | AACACTTGCCCAACAAAAGG | YES
1p36.33 | SKI | ENSG00000157933 | ENSDARG00000008034 | GAGGAAATCCAGGTGGACAA | TAATACGACTCACTATAGGGCTCAGACGTCCTGCTTCACA | CCATTGCTGCCTTCAAAAAT | CGTTCCTTCAGCAGGTCTTC | YES
1p36.33 | SKI | ENSG00000157933 | ENSDARG00000042151 | ggtgagaccatctcgtgctt | TAATACGACTCACTATAGGGgtcatcgtgtttgggcttct | GCAACACGATTTGCTCTTCA | GAAGGCTGACGGTCTCTGTC | YES
4q34 | HAND2 | ENSG00000164107 | ENSDARG00000008305 | TACCATGGCACCTTCGTACA | TAATACGACTCACTATAGGGCAGATGGCCTCATTTCGTCT | ATGGGGAGACAGTTGGTGTC | GCGGGAAATTGCACATAAAT | YES
4q34 | HMGB2 | ENSG00000164104 | ENSDARG00000029722 | GGCCGCGGcgggaggaacacaagaagaa | CATTTAGGTGACACTATAGAAatcctcgtcctcctcatcct | - | - | YES
4q34 | HMGB2 | ENSG00000164104 | ENSDARG00000053990 | tatgcgttcttcgtgcagac | TAATACGACTCACTATAGGGatcctcgtcctcctcatcct | AGACGTGAACAAACCCAAGG | CATCTTCCTCGTCCTCCTCA | YES
4q34 | VEGFC | ENSG00000150630 | ENSDARG00000003028 | GGCCGCGGgcaaaccttgctttcgagtc | CATTTAGGTGACACTATAGAAtgatgttcctgcactgaagc | - | - | YES
14q22q23.1 | ARID4A | ENSG00000032219 | ENSDARG00000043873 | GGCCGCGGttcaagctcttccgattggt | CATTTAGGTGACACTATAGAAcgtcctgctcttcatcatca | - | - | YES
14q22q23.1 | BMP4 | ENSG00000125378 | ENSDARG00000019995 | a probe was already available | - | - | - | YES
14q22q23.1 | DLG7 | ENSG00000126787 | ENSDARG00000045167 | CCACCTGTACCCAACAAACC | TAATACGACTCACTATAGGGCTCAGCATCCATCACTCCAA | AGACAAACCCCAGGAGGACT | AACACCATTCTGGGAACAGC | YES
14q22q23.1 | OTX2 | ENSG00000165588 | ENSDARG00000011235 | CCCCACAACCATCTTTAGCA | TAATACGACTCACTATAGGGGAAGTGGAACCAGCATAGCC | - | - | YES
17q25.3 | no ID | ENSG00000187603 | NA | GGCCGCGGtgtgcctggttattccgaac | CATTTAGGTGACACTATAGAAgcattcactggttatgctggt | atctgatcgcaaggaggttg | cgtcagcctttccctgatac | NO
17q25.3 | AATKa | ENSG00000181409 | ENSDARG00000060546 | GGCCGCGGctgctgcaccttcacaaaaa | CATTTAGGTGACACTATAGAAgtcttcctctgcctcactgg | caggtggtggtgaaggaact | cgaagtggtggatgacattg | YES
17q25.3 | AATKb | ENSG00000181409 | ENSDARG00000062815 | gcaattccacagatcatcca | CATTTAGGTGACACTATAGAAcggctgattccctaactcaa | AGCCCTACTGCTCCTCTTCC | TGGAGGTGGGTGTACTGTCA | YES
17q25.3 | ACTG1 | ENSG00000184009 | NA | - | - | - | - | NO
17q25.3 | CSNK1D | ENSG00000141551 | ENSDARG00000006125 | GGCCGCGGactgggatggagcgtgaac | CATTTAGGTGACACTATAGAAgcttcaccatcagaaaggaac | - | - | YES
17q25.3 | FOXK2 | ENSG00000141568 | ENSDARG00000011609 | GGCCGCGGagggctcggtggatgttag | CATTTAGGTGACACTATAGAAtttcagactgcgagttgtcg | - | - | YES
17q25.3 | FOXK2 | ENSG00000141568 | ENSDARG00000030583 | ccgtccggtgtttctctacc | TAATACGACTCACTATAGGGgatggtcagcggggagat | GTGTCCGTTCCGTCAGATTC | CGCAGGTGTGTGATGTTCTC | YES
17q25.3 | MAFG | ENSG00000197063 | ENSDARG00000018109 | ACTGAAGGTGAAGCGAGAGC | TAATACGACTCACTATAGGGTGTGCATTATGACCGTGCTT | CAAACAAGGCAAACAAAGCA | TTACAAAATGTTCTCCGTTTGTG | YES
22q12.2 | THOC5 | ENSG00000100296 | ENSDARG00000038290 | GGCCGCGGcttacagcctggactgcaca | CATTTAGGTGACACTATAGAAcatacggatgtccgaggtct | - | - | YES
22q12.2 | EWSR1 | ENSG00000182944 | ENSDARG00000020258 | GGCCGCGGcgtcaaccagcaacactcag | CATTTAGGTGACACTATAGAAgaagccacctctgtctccag | taatgctgcttcagccacac | aagtcagcgacttcctccaa | YES
22q12.2 | EWSR1 | ENSG00000182944 | ENSDARG00000039180 | cagactcagtacggccaaca | TAATACGACTCACTATAGGGaacctgacaacctgccaatc | CTACCTCAACAGCACCAGCA | CCCATATCACGATCCATTCC | YES
22q12.2 | KREMEN1 | ENSG00000183762 | ENSDARG00000062579 | aacttctgcaggaacccaga | TAATACGACTCACTATAGGGtgcggtgatattcacgatgt | agaccagtctgcaaggagga | acaacagaaaaccccagtgc | YES
Xp22 | RAI2 | ENSG00000131831 | ENSDARG00000060574 | GGCCGCGGcttgtgccctatcctgtggt | CATTTAGGTGACACTATAGAAtgacgatctcgatgtttcca | tccttcaacatgcactgctc | aaaatagcaggcatggcatc | YES
Xp22 | REPS2 | ENSG00000169891 | ENSDARG00000027794 | AGTTCTGCACTGCCTTCCAT | TAATACGACTCACTATAGGGGAGGCAACCAGCCACTTTTA | CGACCTCAATGCTCTCATCA | CAGAGTAGGGCGTGGTCAAT | YES
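The reverse primers in supplementary table 2 start with one of two constant 5' sequences, followed by a gene-specific part. The sketch below splits a primer into these two parts. The constant sequences are copied from the table; labelling them "T7" and "SP6" is an assumption based on the Sp6 or T7 tagging mentioned in the Methods, and the function name `split_promoter_tag` is hypothetical.

```python
from typing import Optional, Tuple

# Constant 5' sequences as they appear in the reverse primer column.
# The T7/SP6 labels are an assumption based on the Methods text.
PROMOTER_TAGS = {
    "T7": "TAATACGACTCACTATAGGG",
    "SP6": "CATTTAGGTGACACTATAGAA",
}


def split_promoter_tag(primer: str) -> Tuple[Optional[str], str]:
    """Return (tag label or None, gene-specific part of the primer)."""
    upper = primer.upper()  # table mixes upper- and lower-case bases
    for label, tag in PROMOTER_TAGS.items():
        if upper.startswith(tag):
            return label, primer[len(tag):]
    return None, primer


if __name__ == "__main__":
    # Reverse primers for DVL1 and HMGB2 from supplementary table 2.
    for primer in (
        "TAATACGACTCACTATAGGGAGCTGAAGCTAAGGCTGTGC",
        "CATTTAGGTGACACTATAGAAatcctcgtcctcctcatcct",
    ):
        label, specific = split_promoter_tag(primer)
        print(label, specific)
```

Separating the tag from the gene-specific part makes it easier to check the gene-specific sequence against the zebrafish transcript.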

Expression screening of highly ranked CHD candidate genes

We next set up a gene expression screen in developing zebrafish, hypothesizing that genes expressed in the developing zebrafish heart are more likely to be involved in human heart developmental disorders. Analyzing the expression of the 8 aforementioned known CHD genes as a positive control demonstrated that three of them, NKX2-5, TBX1 and NOTCH1, have a specific expression in the developing zebrafish heart (i.e., a restricted expression pattern that includes the developing heart). EHMT1, ATRX, CREBBP, NSD1 and FBN2 are genes that cause syndromic CHDs; they appeared to be expressed throughout the developing zebrafish, without specific expression restricted to the developing heart. This pattern is consistent with the multisystem involvement of the syndromes caused by mutations in these genes. Nevertheless, genes that cause syndromic CHDs when mutated (such as TBX1, which causes DiGeorge syndrome) can have specific expression in the developing heart.

In 4 out of the 6 regions, no gene showed specific expression in the developing heart, while some showed ubiquitous expression. This suggests that the unknown genes in these regions are part of the syndromic subset: they cause CHDs as well as the other abnormalities that make up the syndromic phenotype observed in the patients. In the 2 remaining regions, a single gene (the highest-ranking) was found to be specifically expressed in the developing heart. As illustrated on its CHDWiki gene page, HAND2 in 4q34.1 is expressed at 12 hours post fertilization (hpf) in the lateral plate mesoderm, where the cardiac cells become specified; later in development (22-48 hpf) it is expressed in the primitive heart and in the pharyngeal arches that give rise to the outflow tract and large vessels in humans. BMP4 in 14q22 is specifically expressed at 18 hpf in the specified cardiac cells (Figure 2) as they migrate towards the midline, later in the developing heart, and at 36 hpf in the regions where valve development initiates.

Conclusions

In the CHDWiki, advanced analysis capabilities were exploited for gene hunting: genes located in indels found in CHD patients (Figure 1 and Figure 2) were prioritized using the integrated prioritization tool. Genes linked to CHDs and affected by an indel were effectively prioritized. For regions not containing known CHD genes, high-ranking genes were further analyzed for their expression in developing zebrafish embryos. Amongst those genes, BMP4 and HAND2 showed a specific expression in the developing zebrafish heart, suggesting that reduced dosage of these genes causes the CHD observed in the patients. Additionally, both genes are known to be required for cardiac development in a dosage-dependent manner in mice (Goldman et al., 2006; McFadden et al., 2005).

Figure 1: patients carrying imbalances encompassing known CHD genes and expression of these CHD genes in the developing zebrafish heart

Figure 2: patients carrying imbalances not encompassing known CHD genes and expression of high-ranked genes in the developing zebrafish heart