The GenePlexus webserver enables researchers to predict novel genes similar to their genes of interest based on their patterns of connectivity in human genome-scale molecular interaction networks.
Mancuso CA, et al. (2022) GenePlexus: a web-server for gene discovery using network-based machine learning. Nucleic Acids Research, doi.org/10.1093/nar/gkac335
When a user provides a set of genes to GenePlexus, it trains a custom machine learning (ML) model that captures the patterns of network connectivity of the user’s genes in contrast to other genes in the network. Based on this ML model, GenePlexus will:
While many jobs take only a few minutes to complete, some networks and feature types require training machine learning models that use tens of GB of memory, have more than 25,000 features, and use thousands of positively and negatively labeled genes. Such jobs may therefore take 10 minutes or more, and the run time can vary with latency on the cloud server.
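The training step described above can be illustrated with a minimal sketch. It assumes that each gene's row of the network adjacency matrix serves as its feature vector and that the classifier is an L2-regularized logistic regression fit by gradient descent. The toy network, the gene names (g0 to g7), the hyperparameters, and the choice of classifier are all hypothetical and are not taken from the GenePlexus code.

```python
import math

# Hypothetical toy network: two tightly connected modules of four genes each.
GENES = ["g0", "g1", "g2", "g3", "g4", "g5", "g6", "g7"]
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),   # module A
         (4, 5), (4, 6), (5, 6), (5, 7), (6, 7)]   # module B


def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix; row i is gene i's feature vector."""
    mat = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        mat[i][j] = mat[j][i] = 1.0
    return mat


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, l2=0.01, epochs=500):
    """Fit L2-regularized logistic regression with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return the predicted probability that a gene belongs to the positive class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


A = adjacency_matrix(len(GENES), EDGES)
positives = [0, 1, 2]   # stand-in for the user's genes of interest
negatives = [4, 5, 6]   # stand-in for genes chosen as negative examples
train_idx = positives + negatives

X = [A[i] for i in train_idx]
y = [1.0] * len(positives) + [0.0] * len(negatives)
w, b = train_logistic(X, y)

# Score every gene that was not used for training and rank by probability.
scores = {GENES[i]: predict(w, b, A[i])
          for i in range(len(GENES)) if i not in train_idx}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

In this toy setting, the unlabeled gene g3 is connected to the positive genes and g7 to the negative genes, so g3 ranks above g7. Real networks have tens of thousands of genes, which is why the memory use and run times described above can become large.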
You can visit this page to see the results of an example run where GenePlexus was used to train a model and predict novel genes associated with the disease primary ciliary dyskinesia (PCD).
This run used the adjacency matrix representation of the human STRING network. Since the input genes correspond to a disease, the negative genes were selected from genes associated with other diseases (from the DisGeNET database). The predicted genes are under the “Gene Predictions” tab. Notably, the gene DNALI1 (dynein axonemal light intermediate chain 1), which was not included in the original gene list, is strongly predicted to be associated with PCD.
The “Similarities to…” tabs show that the model trained for PCD is similar to models trained for other diseases, such as ciliopathy, and to models trained for biological processes involving the dynein complex and arm assembly.
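As an illustration of how two trained models might be compared, the sketch below computes the cosine similarity between model weight vectors. This metric is an assumption made for the example: the text above does not state how GenePlexus measures model similarity, and the weight vectors and model names here are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero model as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical weights: one value per network feature, same feature order in every model.
user_model = [0.9, 0.8, 0.1, -0.7]
reference_models = {
    "ciliopathy": [0.8, 0.9, 0.0, -0.6],
    "unrelated_trait": [-0.5, 0.1, 0.9, 0.7],
}

similarities = {name: cosine_similarity(user_model, weights)
                for name, weights in reference_models.items()}
ranked_models = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
print(ranked_models)
```

Models that weight the same network features in the same direction score close to 1, while models that rely on different features score near or below 0.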
The “Network Graph” tab shows the network connectivity of up to 50 of the top-ranked genes by prediction probability score (an example is seen below, where blue nodes are previously known associations and orange nodes are novel predicted associations).
The network-based approach implemented in GenePlexus has been extensively benchmarked in
Supervised learning is an accurate method for network-based gene classification
R Liu*, CA Mancuso*, A Yannakopoulos, KA Johnson, A Krishnan
Bioinformatics 36 (11), 3457-3465.