data and software
In Song and Chen (Genome Biology 2015), we developed Spectacle, a software tool that segments the human genome into chromatin states (enhancer, promoter, etc.). The first link above provides chromatin state annotations for the ENCODE Tier 1 and Tier 2 cell lines. The second link provides chromatin state annotations for all Roadmap Epigenomics Project samples.
In Chen et al. (Genome Biology and Evolution 2010), we collected position weight matrices for yeast transcription factors from ChIP-chip data from the Young lab and protein binding microarray data from the Bulyk lab. The combined data set and the transcription factor binding sites are both available at the link above.
Raw sequencing data are available from Song et al. (Genome Biology and Evolution 2014), on variation in piRNA and transposable element content among strains of D. melanogaster, and from Ha et al. (BMC Genomics 2014), on human testis samples.
Spectacle-Tree is an extension of the Spectacle software that processes multiple samples related by a known tree. Example data sets include cell types related by a known developmental lineage or human individuals related by a known family tree. The software can also analyze a single cell type or a cell type assayed in two conditions. From Zhang, Song, Chen, Chaudhuri (NIPS 2015) and Song, Zhang, Chaudhuri, Chen (manuscript, 2015).
Spectacle is a program for annotating chromatin states from histone mark data (e.g. from the ENCODE project). It uses spectral learning for inference of the parameters of a Hidden Markov Model instead of the more commonly used expectation-maximization algorithm. From Song and Chen (Genome Biology 2015).
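To make the Hidden Markov Model framing concrete, here is a minimal sketch of the decoding step that turns a fitted model into a segmentation. It is not Spectacle's code: the two states, the "mark"/"none" observations, and all probabilities are hypothetical, and the use of Viterbi decoding is an assumption made for illustration. The parameters, which Spectacle would estimate by spectral learning, are simply written in by hand.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely state path for a discrete HMM, computed in log space."""
    # best[t][s] = log-probability of the best path ending in state s at position t
    first = observations[0]
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][first]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Choose the best predecessor state for s.
            prev, score = max(
                ((r, best[-1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical toy model: one histone mark, two chromatin states.
STATES = ("promoter", "background")
START = {"promoter": 0.5, "background": 0.5}
TRANS = {
    "promoter": {"promoter": 0.8, "background": 0.2},
    "background": {"promoter": 0.2, "background": 0.8},
}
EMIT = {
    "promoter": {"mark": 0.9, "none": 0.1},
    "background": {"mark": 0.1, "none": 0.9},
}

segmentation = viterbi(["none", "none", "mark", "mark", "none"], STATES, START, TRANS, EMIT)
```

In this toy setting each genomic bin is labeled with its most likely state, so the two bins carrying the mark are called "promoter" and the rest "background".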
MixMir is a program for finding microRNA motifs using gene expression and 3' UTR sequence data. From Diao et al. (Nucleic Acids Research 2014).
DEEGEP is a simple statistical method for testing for natural selection using a high-dimensional (i.e. multi-population) allele frequency spectrum. The method uses a neutral control defined by the user (e.g. intergenic sites) to learn a background neutral distribution. This is accomplished by approximating the density with a finite expansion of Gegenbauer polynomials, inspired by Kimura's classic result in theoretical population genetics. One can then test for natural selection on a second set of putatively functional sites (e.g. microRNAs) by comparing their multi-population allele frequency spectrum to the neutral control.
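The idea of approximating a density with a finite Gegenbauer expansion can be sketched in one dimension as follows. This is an illustrative orthogonal-series estimate, not DEEGEP's implementation: it handles a single population only, and the choice of alpha = 3/2, the mapping x = 1 - 2p from allele frequency p to [-1, 1], the function names, and the toy frequencies are all assumptions.

```python
ALPHA = 1.5  # assumed Gegenbauer parameter; the weight is then (1 - x^2)

def gegenbauer(n, alpha, x):
    """Evaluate C_n^(alpha)(x) with the standard three-term recurrence."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, 2.0 * alpha * x
    for k in range(2, n + 1):
        nxt = (2.0 * x * (k + alpha - 1.0) * curr - (k + 2.0 * alpha - 2.0) * prev) / k
        prev, curr = curr, nxt
    return curr

def norm(n):
    """Integral of C_n^(3/2)(x)^2 * (1 - x^2) over [-1, 1]."""
    return (n + 1.0) * (n + 2.0) / (n + 1.5)

def fit_coefficients(frequencies, order):
    """Estimate expansion coefficients c_n = E[C_n(X)] / norm(n) from allele frequencies."""
    xs = [1.0 - 2.0 * p for p in frequencies]
    return [
        sum(gegenbauer(n, ALPHA, x) for x in xs) / len(xs) / norm(n)
        for n in range(order + 1)
    ]

def density(x, coeffs):
    """Truncated expansion: (1 - x^2) * sum_n c_n * C_n^(3/2)(x)."""
    return (1.0 - x * x) * sum(c * gegenbauer(n, ALPHA, x) for n, c in enumerate(coeffs))

# Hypothetical allele frequencies at "neutral" sites.
neutral = [0.05, 0.10, 0.10, 0.20, 0.35, 0.50, 0.65, 0.90]
coeffs = fit_coefficients(neutral, order=4)
```

The estimate is a density in x; the corresponding density in p is 2 * density(1 - 2 * p, coeffs). A truncated expansion can dip below zero, so a practical method needs to handle that case.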
In Roy et al. (Journal of Computational Biology, 2012), we presented a simple algorithm for filtering out erroneous mate-pair/paired-end reads and using the remaining pairs to produce scaffolds. The method does not attempt to solve the general genome assembly problem; instead, it takes as input a set of contigs built by another contig assembly program, such as Velvet (Zerbino and Birney) or ABySS (Simpson et al.).
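As a rough illustration of the filter-then-scaffold workflow, the sketch below drops contig links supported by too few read pairs and then groups the remaining contigs. It is not the algorithm from Roy et al.: the support threshold, the function names, and the input format (one pair of contig names per read pair) are assumptions, and contig order and orientation within a scaffold are ignored.

```python
from collections import Counter

def filter_links(pair_links, min_support=2):
    """Keep contig pairs that are linked by at least min_support read pairs."""
    counts = Counter(
        tuple(sorted(link)) for link in pair_links if link[0] != link[1]
    )
    return {pair: n for pair, n in counts.items() if n >= min_support}

def group_scaffolds(contigs, supported_links):
    """Group contigs joined by supported links (connected components via union-find)."""
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in supported_links:
        parent[find(a)] = find(b)
    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical input: each tuple names the contigs hit by the two reads of a pair.
links = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c2", "c3"), ("c1", "c4")]
supported = filter_links(links)
scaffolds = group_scaffolds(["c1", "c2", "c3", "c4"], supported)
```

Here the single read pair joining c1 and c4 is treated as a likely error and discarded, leaving c4 as its own scaffold.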