We are a computational biology group interested in developing novel machine learning algorithms for genomics. The driving application for our research is the interpretation of Genome-wide Association Study (GWAS) loci associated with psychiatric traits, in particular drug addiction. We are working on extending and implementing state-of-the-art statistical methods for analyzing massive genomic data sets. On the biological side, our ultimate goal is to develop computational tools to interpret all the non-coding variation in the human population.
There are two major types of functional non-coding elements in the human genome: non-coding RNAs and gene regulatory elements. We have published a number of studies predicting and annotating non-coding RNA genes, including microRNAs, microRNA-stars, Piwi-interacting RNAs and CRISPRs, often in collaboration with experimental groups.
We have also worked on several computational methods for predicting gene regulatory elements from DNA sequence. These methods draw on evolutionary conservation across species and, using computational techniques from population genetics, on selective constraint within species. We are also very interested in novel parameter estimation methods for Hidden Markov Models and related graphical models.
Our current research thrust integrates several technical tools for analyzing massive data sets, including spectral learning, randomized numerical linear algebra, data stream algorithms and regularization methods for high-dimensional statistical problems. Our major application is GWAS of nicotine addiction, in close collaboration with a number of experimental groups at Rutgers.