Manuals

A manual is available for the software you are using:

  • Endeavour web: the web server of Endeavour: manual.
  • Endeavour: the Java client: manual.

A manual is also available for the Endeavour batch mode (command line): manual.

  Lectures
  • Gene prioritization through genomic data fusion (18th May 2009 - Institut Curie, Paris, France)
    Link: presentation.

  • How to make the best of your aCGH data? (9th-11th March 2009 - Genomic Disorders - Wellcome Trust Conference Center, Hinxton, United Kingdom)
    Link: presentation.

  • Gene prioritization (18th November 2008 - Master Bioinformatics - Université de Liège, Liège, Belgium)
    Link: presentation.

  • Optimization of a genetic screen (15th-17th September 2008 - EURO-CBBM - Consiglio Nazionale delle Ricerche, Rome, Italy)
    Link: presentation.

  • Flash demonstration of Endeavour (Java application - ISMB 2007 - Vienna, Austria)
    In this demonstration, you will learn how to start the Java client from the website, how to load lists of genes, how to train the model and how to score the candidates. We demonstrate the power of our approach by applying it to Usher syndrome, and more precisely to the recent discovery of DFNB31 as a gene causing Usher syndrome. The demo is 20 minutes long; start it now!
    Link: presentation and demonstration.

  • Video of the lecture by Professor Yves Moreau (MLSB 2007 - Evry, France)
    The first 30 minutes of the talk present the general principles of Endeavour. The last 20 minutes are intended for a machine learning audience and will be less relevant to biologists using Endeavour. The lecture is fully available on the video lecture website.
    Abstract: The overwhelming amount of biological data makes the assignment of candidate genes to diseases and biological pathways a formidable challenge. We present ENDEAVOUR, a generally applicable computational methodology to prioritize candidate genes based on their similarity to case-specific reference gene sets. Unlike previous methods, ENDEAVOUR is capable of flexibly utilizing multiple data sets from diverse sources. It allows the modular incorporation of de novo generated data sets and integrates distinct prioritizations into a global ranking by applying order statistics. We first validate the overall performance in a statistical cross validation of 29 diseases and 3 biological pathways. We validate a novel candidate for DiGeorge syndrome in a zebrafish model and present several new candidates for congenital heart disease. We extend the basic ENDEAVOUR methodology using data from multiple species (human, mouse, rat, drosophila and C. elegans). We also present an alternative machine learning methodology for gene prioritization using kernel methods for novelty detection that outperforms our previous results.
    Link: lecture.

  FAQ

General

  • What is gene prioritization?
    Gene prioritization is a process in which a list of candidate genes is analyzed. The goal is to give priority to the interesting genes while discarding the uninteresting ones. In our case, we are mainly interested in disease-causing genes and therefore favor the genes that are likely to be involved in the disease of interest. In practice, the result of our approach is a ranking of the candidate genes with the most promising candidates at the top.

  • When do I need gene prioritization?
    Basically, any time you have a list of candidate genes from which you want to select the most promising genes for further validation. One example is the comparative analysis of gene expression for a disease tissue and a reference tissue. This gives rise to a list of genes differentially expressed between the two conditions. One way to select the most promising genes among this list is to use Endeavour. A second example is the use of array CGH technology for a patient with a known disorder but without diagnosis (known loci are normal). The result of the analysis, if successful, is a genomic region that is thought to harbor the disease-causing gene. One way to optimize the search is to start by validating the genes highly ranked by Endeavour.

  • What is the difference between the Java client and the web client?
    The two applications share the same core, and a prioritization performed on either application returns the same results. On the one hand, the web server is intended for users with limited knowledge of computer science. It is easy to use and the wizard helps the user to run a prioritization in 4 steps. This tool runs on almost any browser and is platform independent. On the other hand, the Java client is intended for advanced users who want to tune their prioritization. Indeed, several options are only available in this application (e.g., adding your own private expression data). Because of these supplementary options, running a prioritization is less straightforward if you have limited knowledge of computer science, but a manual is there to help you.

  • What is the difference between the academic version and the commercial version?
    By default, there is no conceptual difference between the two versions. Of course, every commercial agreement is different and can contain specific conditions (e.g., the setup of a secured connection between the client and the server to ensure the privacy of the transferred data).

  • Why do I need a training set of genes?
    Our approach relies on the similarity between any candidate and the biological process under study (e.g., a disease). A simple way to model a biological process is to use all the genes known to play a role in that process (i.e., the training genes). Endeavour uses the training genes to build a model of the process of interest and then compares the candidate genes to that model. Without training genes (and therefore without a model), it is impossible to prioritize the candidate genes.

  • How small/large can my training set be?
    The most important property of your training set is homogeneity. It should describe precisely the biological process you are interested in. For example, for our benchmark, we used all the disease genes reported in OMIM, which gave us on average 20 genes per disease. Of course, this approach might sometimes result in a very large set, and one might consider building several training sets from the original large set (each subset would then correspond to a subprocess). In short, the size of the training set does not matter as long as the homogeneity is good; however, (i) a very large training set (more than 100 genes) will probably not be homogeneous and (ii) a tiny set (4 genes or fewer) will likely not contain enough information to produce reliable results.

  • How can I estimate the homogeneity of my training set?
    The simplest way is to perform a leave-one-out cross-validation on that training set. The procedure is simple: each gene from the training set is, in turn, left out and the model is trained using the remaining training genes. Then, the left-out gene is scored against the model together with 99 genes randomly selected from the genome. The position of the left-out gene among the 100 candidates is recorded. As the procedure is repeated for every training gene, we get the positions for all the genes, which gives us an estimate of the quality of the training set and shows which data sources are informative. Unfortunately, this procedure is not yet available through our online tools but only through the batch mode.
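    The leave-one-out procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Endeavour's implementation: the function name leave_one_out_ranks, the score callback and the toy one-dimensional gene features are all hypothetical.

```python
import random


def leave_one_out_ranks(training_genes, genome_genes, score, n_random=99, seed=0):
    """Return {gene: rank}; rank 1 means the left-out gene scored best.

    `score(model_genes, candidate)` is a hypothetical similarity function
    (higher means closer to the model). It stands in for Endeavour's scoring.
    """
    rng = random.Random(seed)
    training = list(training_genes)
    in_training = set(training)
    # Random candidates are drawn from genes outside the training set.
    pool = [g for g in genome_genes if g not in in_training]
    ranks = {}
    for left_out in training:
        # Train on every training gene except the one left out.
        model = [g for g in training if g != left_out]
        # Score the left-out gene together with n_random random genes.
        candidates = rng.sample(pool, n_random) + [left_out]
        ordered = sorted(candidates, key=lambda g: score(model, g), reverse=True)
        ranks[left_out] = ordered.index(left_out) + 1
    return ranks


# Toy data (assumption): each gene is described by a single number.
features = {f"G{i}": float(i) for i in range(200)}
training = ["G0", "G1", "G2", "G3", "G4"]


def toy_score(model, gene):
    # Similarity = negative distance to the mean feature of the model genes.
    centre = sum(features[g] for g in model) / len(model)
    return -abs(features[gene] - centre)


ranks = leave_one_out_ranks(training, features, toy_score)
```

    With a homogeneous training set, as in this toy example, every left-out gene should land near the top of its 100 candidates; a gene that ranks poorly suggests that it does not fit well with the rest of the set.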

  • How small/large can my candidate set be?
    Depending on which tool you are using, the procedure differs, but you can always prioritize the whole genome. For the web client, the 'Full genome' checkbox allows you to prioritize the full genome and to receive the results by e-mail. If you manually enter a large candidate set, only the top 200 genes will be displayed in the graphical user interface. For the Java client, the 'Full genome' option, available from the 'Tools' menu, allows you to score the full genome and, once again, to receive the results by e-mail. Depending on the models you are using, a manually defined candidate set of up to 1500 genes can still be scored; using more than 1500 genes can produce Java memory errors. In that case, you had better score the full genome and apply a filter afterwards.

  • What reasonable threshold can I use for the final p-values?
    We recommend not using the p-values to make decisions since they are not exactly p-values (they do not fulfill all the properties of p-values). They represent the probability that a candidate gene would obtain these ranks by chance, but they depend on the number of candidate genes considered. As a result, there is no reasonable threshold for the p-values, and we advise you to consider that Endeavour produces a ranking of the candidate genes from the most promising ones to the least promising ones. We are well aware of that limitation and we are currently working on that problem.

  • Is this approach limited to human?
    No, from version 2.44 on (January 2008), you can perform gene prioritization for human, mouse, rat, fly and worm. Of interest, a zebrafish version is also under development.

  • How can I use Endeavour in batch mode?
    We also provide a batch mode (meaning that you can use Endeavour via the command line without the GUI). However, please note that you still need an active internet connection. Every action possible from the GUI is also available in batch mode. Furthermore, you can run the cross-validation as we did to benchmark our algorithm. Browse the manual for further information.

  • Can I save my work?
    Saving your work is critical and several options have been made available to help researchers do so. For the web server, the export panel (located at the bottom) allows you to export the final results in 'xml' format (for further reloading) or in 'csv' format (for further analysis). The sprint-plot figure can also be saved in a standard picture format ('png', 'gif' and 'bmp'). For the Java server, you can save your gene sets (training sets or candidate sets) using either the 'xml' or the 'text' format. To do so, use the 'Save' button located below the gene set. The files produced can be loaded in the application using the 'Load' button located next to the 'Save' one. Please note that using the 'xml' format will result in faster loading. You can also save the whole model, again in 'xml' format, for further reloading (menu 'File', option 'Save Model'); the saved model contains the training set, the candidate set, the models and, if available, the results. As in the web server, results can be exported as a 'csv' file for further analysis and the sprint-plot figure as a picture ('png', 'gif' and 'bmp').

  • What do the sprint-plot colors mean?
    For the Java server, colors are given to genes whose p-values are smaller than a given threshold (currently 0.005). For the web server, colors are given to the top 15 genes. Note that the colors are there to make the sprint-plot easier to read; they do not represent the quality of the p-values.

  • Why are some sprint-plot boxes crossed?
    A crossed box can either be crossed by a red line or by a black line. A black line means that this gene has no information for that data source. Genes with missing information are usually located at the end of the rankings. A red line means that the gene has a maximum dissimilarity with the model, i.e., there is some information for that gene but the score obtained is the maximum score possible meaning that the gene has nothing in common with the model.

  • How do I include my own expression data set?
    You can include your own expression data sets or your own probability based data sets. Your own expression data set should consist of the expression profiles over a range of conditions for a large number of genes. The data should be stored in a database, and you have to configure Endeavour so that it can use this data for gene prioritization. At the database level, you have to create two tables. The first one is made of a single column and contains the names of the different conditions. The second table has two columns and contains the expression profiles of the genes. A profile is a pipe-separated array of values, and the gene identifiers should come from EnsEMBL for human, mouse and rat, from FlyBase for fly and from WormBase for worm. You then have to configure Endeavour. To do so, open the 'Global properties' panel (under the menu 'Edit') and browse to the 'Local expression database' tab. Here you can manage all your local expression sets. To add a new set, use the 'Add' button and fill in the database information (url, driver, user and password). You also have to choose a unique name for that model and select the species it is valid for. The last two fields should contain the SQL queries needed to collect the gene expression profiles and the condition names. They should respectively look like 'SELECT * FROM my_table WHERE gene_id IN ();' and 'SELECT * FROM my_table_conditions;'. In order to use that model, you might need to restart Endeavour.
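    As an illustration of the two-table layout described above, here is a minimal Python sketch. It uses an in-memory SQLite database only as a stand-in for whichever database you configure in Endeavour. The table names my_table and my_table_conditions and the gene_id column come from the example queries above; the column names condition_name and profile, the condition labels and the expression values are made up for the example.

```python
import sqlite3

# In-memory SQLite is a stand-in (assumption); use your own database in practice.
conn = sqlite3.connect(":memory:")

# Table 1: a single column holding the names of the conditions.
conn.execute("CREATE TABLE my_table_conditions (condition_name TEXT)")
# Table 2: two columns, the gene identifier and its pipe-separated profile.
conn.execute("CREATE TABLE my_table (gene_id TEXT PRIMARY KEY, profile TEXT)")

conditions = ["control", "treated_1h", "treated_6h"]  # hypothetical labels
conn.executemany(
    "INSERT INTO my_table_conditions VALUES (?)", [(c,) for c in conditions]
)

# Hypothetical expression values, keyed by Ensembl gene identifiers (human).
profiles = {
    "ENSG00000139618": [0.12, 1.5, 2.31],
    "ENSG00000141510": [3.4, 3.1, 0.9],
}
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(gene, "|".join(str(v) for v in values)) for gene, values in profiles.items()],
)
conn.commit()


def fetch_profiles(connection, gene_ids):
    """Collect the profiles of the given genes and split them on the pipe."""
    placeholders = ",".join("?" for _ in gene_ids)
    query = f"SELECT gene_id, profile FROM my_table WHERE gene_id IN ({placeholders})"
    rows = connection.execute(query, list(gene_ids))
    return {gene: [float(x) for x in profile.split("|")] for gene, profile in rows}


stored_conditions = [
    row[0] for row in conn.execute("SELECT * FROM my_table_conditions")
]
one_profile = fetch_profiles(conn, ["ENSG00000139618"])
```

    Each profile should contain as many values as there are rows in the conditions table, which is a cheap consistency check to run before pointing Endeavour at the database.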

  • How do I include my own probability based data set?
    You can include your own expression data sets or your own probability based data sets. Your own probability data is simply the a priori probability for a gene to be a gene of interest. It is a probability, so the higher the better. As for the expression data set, you can store the data in a database, but you can also store it in a simple text file. In both cases, the gene identifiers should come from EnsEMBL for human, mouse and rat, from FlyBase for fly and from WormBase for worm. If you opt for a database solution, the single table has two columns, one for the gene identifier and one for the probability. If you do not want to use a database, you can create a text file with the same data: each line contains one gene identifier, a tab character and the corresponding probability. You then have to configure Endeavour. To do so, open the 'Global properties' panel (under the menu 'Edit') and browse to the 'Local precomputed database' tab. Here you can manage all your local probability based sets. To add a new set, use either the 'Add [DB]' or the 'Add [File]' button. For a database based model, fill in the database information (url, driver, user and password). You also have to choose a unique name for that model and select the species it is valid for. The last field should contain the SQL query needed to collect the probabilities. It should look like 'SELECT * FROM my_table WHERE gene_id IN ();'. For a file based model, you have to choose a unique name, select the species it is designed for and browse to the file to be used. Please note that if you move the corresponding file, the model will no longer work. In order to use that model, you might need to restart Endeavour.
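    The text file variant described above (one gene identifier, a tab and a probability per line) can be produced and checked with a short Python sketch. The helper names, the file name my_probabilities.txt and the probability values are hypothetical; only the line layout comes from the description above.

```python
import os
import tempfile


def write_probability_file(path, probabilities):
    """Write one 'gene_id<TAB>probability' line per gene."""
    with open(path, "w", encoding="utf-8") as handle:
        for gene_id, probability in probabilities.items():
            if not 0.0 <= probability <= 1.0:
                raise ValueError(f"{gene_id}: {probability} is not a probability")
            handle.write(f"{gene_id}\t{probability}\n")


def read_probability_file(path):
    """Read the file back and return {gene_id: probability}."""
    result = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            gene_id, probability = line.split("\t")
            result[gene_id] = float(probability)
    return result


# Hypothetical a priori probabilities, keyed by Ensembl gene identifiers.
example = {"ENSG00000139618": 0.92, "ENSG00000141510": 0.15}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_probabilities.txt")
    write_probability_file(path, example)
    loaded = read_probability_file(path)
```

    Reading the file back before registering it in the 'Local precomputed database' tab is a simple way to catch malformed lines early.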

  • Can I get the Endeavour data?
    Genomic data is the core of our system. Collecting and updating that data is a challenging and time-consuming task. Currently, we collect data from more than 20 different databases for 5 different species. If you want to get access to that data, please contact Prof. Moreau in order to reach an agreement.

  • Why is database xxxxxx not included in the tool?
    If that database contains some specific information not already present via other databases and if that database is freely accessible, you might consider dropping us an email. We will fully consider your request and do our best to include that data source for the community.

  • I have a problem related to Endeavour, what can I do?
    The simplest thing is to send us an e-mail that includes as many details as possible regarding your problem. For instance, you should mention your operating system, the version of Java you are running, the tool you are using (web server or Java server), as well as a complete description of the problem encountered. A helpful piece of information is the content of the Java console, which usually contains many more details than the Endeavour console. If the Java console is not shown, check your Java options panel and force Java to show the console. You might also include the related gene sets in your e-mail if suitable. We will get back to you as soon as possible.

Technical

  • [Java server] Why am I asked to trust a Thawte certificate when Endeavour starts?
    This is perfectly normal. You need to approve this certificate so that Endeavour can start. It certifies that we are the bioinformatics group of the department of electrical engineering of the Katholieke Universiteit Leuven and that our program is not malicious. If you check the box titled 'Always trust', you will not be asked again in the future.

  • [Java server] I am unable to start Endeavour, system says 'Missing signed entry in resource: mysql.jar' (or another jar file).
    Your Java installation is probably too old and cannot handle our Thawte certificate. You need to approve this certificate so that Endeavour can start. It certifies that we are the bioinformatics group of the department of electrical engineering of the Katholieke Universiteit Leuven and that our program is not malicious. You should update your Java installation: from Java 1.5.0 (build 1.5.0_06-b05) on, all versions have been successfully tested.

  • [Java server] When started, the software says that all data sources are not available.
    Endeavour uses the SOAP protocol to exchange data with our server. This error means that, for security reasons, your system does not allow Endeavour to open a SOAP connection through HTTP on TCP port 80. Most frequently this is due to the use of a firewall/proxy. You can either allow such requests by changing the parameters of your firewall/proxy or configure Endeavour so that it uses the proxy (see next question).
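    To check whether your machine can open an outgoing TCP connection on port 80 at all, a small Python sketch such as the following may help. It only tests raw TCP reachability, not SOAP itself, and the function name can_open_tcp is hypothetical. Replace the host in the usage comment with the actual Endeavour server name, which is not given here.

```python
import socket


def can_open_tcp(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Usage (the host name below is a placeholder, not the real server):
#     can_open_tcp("endeavour-server.example.org", 80)
```

    If this returns False for a host you can reach from a web browser, the browser is probably going through a proxy that Endeavour has not been configured to use.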

  • [Java server] How can I configure Endeavour to use my proxy?
    Endeavour can use a proxy to connect to our server only if that proxy is not password protected; if it is, there is no other solution than using the web server. To configure Endeavour, you have to know the name of the proxy (e.g., myproxy.mydomain.com or an IP address) and the port used (e.g., 3128 or 8080). Then you have to edit the configuration file of Endeavour (the 'biovec.conf' file) and add the following lines: HTTP_PROXY_HOST=myproxy.mydomain.com and HTTP_PROXY_PORT=3128. The 'biovec.conf' file is usually located next to the launcher file ('endeavour.jnlp'). Save the configuration file; Endeavour will then use the proxy settings.
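    Editing the file by hand is enough, but the change can also be scripted. The sketch below assumes that 'biovec.conf' is a plain text file made of KEY=value lines, which is suggested by the two lines quoted above but not otherwise confirmed. The function name set_proxy is hypothetical, and the example runs on a temporary copy rather than on a real configuration file.

```python
import tempfile
from pathlib import Path


def set_proxy(conf_path, host, port):
    """Add or replace the two proxy lines, keeping every other line as is."""
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    wanted = {"HTTP_PROXY_HOST": host, "HTTP_PROXY_PORT": str(port)}
    # Drop any previous proxy settings so that each key appears only once.
    kept = [line for line in lines if line.split("=", 1)[0].strip() not in wanted]
    kept += [f"{key}={value}" for key, value in wanted.items()]
    path.write_text("\n".join(kept) + "\n")


# Demonstration on a temporary file holding an outdated port setting.
with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "biovec.conf"
    conf.write_text("HTTP_PROXY_PORT=8080\n")
    set_proxy(conf, "myproxy.mydomain.com", 3128)
    updated = conf.read_text()
```

    Keep a backup copy of the original file before modifying it, since the rest of its content is not documented here.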

  • [Java server] I am unable to save my work, Endeavour says format not accepted.
    Make sure you are using a complete file name with an extension (e.g., 'myWonderfulPrioritization.xml'), even if you have selected the XML filter in the file browsing window. The possible extensions are 'xml', 'bin', 'set', 'dat' and 'txt'.
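    If you generate file names from a script, the rule above can be checked beforehand with a few lines of Python. The function name is hypothetical, and treating the extension as case-insensitive is an assumption; Endeavour itself may be stricter.

```python
from pathlib import Path

# Extensions listed in the answer above.
ACCEPTED_EXTENSIONS = {"xml", "bin", "set", "dat", "txt"}


def has_accepted_extension(file_name):
    """Return True if the file name ends with one of the accepted extensions."""
    # Assumption: upper-case extensions such as '.XML' are treated like '.xml'.
    extension = Path(file_name).suffix.lower().lstrip(".")
    return extension in ACCEPTED_EXTENSIONS


with_extension = has_accepted_extension("myWonderfulPrioritization.xml")
without_extension = has_accepted_extension("myWonderfulPrioritization")
```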

  • [Java server] Endeavour freezes when started.
    Your Java installation is probably outdated. We advise you to update your installation with the latest version of Java from the Sun website.

  • [Java server] Endeavour freezes after several minutes of correct behavior.
    You are probably encountering a Java 'out of memory' error. You might be trying to score too many genes at the same time (in which case you had better use the 'Full genome' service), or you might have too many models open at the same time. In any case, this means that Java is running out of memory; if you have less than 512 MB of RAM, you had better use a more recent machine with at least 1 GB of RAM.

  • [Java server] I am not able to use some of the models.
    If you are able to use the other models, it probably means that there is something wrong with our server. Please contact Léon-Charles Tranchevent.

  • [Java server] Using the 'Full genome' service, I receive an email that links to an empty file.
    It probably means that there is something wrong with our server. Please contact Léon-Charles Tranchevent.

  • [Web server] The buttons are not working properly, nothing is happening when they are clicked.
    Make sure that your web browser allows JavaScript. You can do that by editing the preferences/options of your web browser. We have tested our application with the most popular web browsers on the market, but if you feel that this is browser-specific, please contact Léon-Charles Tranchevent.

  • [Web server] I have launched a prioritization but it takes forever to complete.
    Depending on the load of our server, a prioritization can queue for several hours before being executed. We advise you to wait at least 3 hours before reporting the problem to us. We will do our best in the near future to improve our IT infrastructure so that this kind of problem becomes less frequent.

  • [Web server] The application says I have received the results but the 'Results' panel is empty.
    It probably means that there is something wrong with our server. Please contact Léon-Charles Tranchevent with as many details as possible.