Rutgers University School of Arts and Sciences Division of Life Sciences

data and software


ENCODE cell line chromatin state annotations

Roadmap Epigenomics Project sample chromatin state annotations

In Song and Chen (Genome Biology 2015) we developed a software tool, Spectacle, to segment the human genome into chromatin states (enhancer, promoter, etc.). The first link above contains chromatin state annotations for the ENCODE Tier 1 and Tier 2 cell lines. The second link contains chromatin state annotations for all Roadmap Epigenomics Project samples.

Yeast Transcription Factor Motifs

Yeast Transcription Factor Binding Sites

In Chen et al. (Genome Biology and Evolution 2010) we collected position weight matrices for yeast transcription factors from ChIP-chip data from the Young lab and protein binding microarray data from the Bulyk lab. The combined data set is available at the first link above, and the transcription factor binding sites are available at the second.

piRNA sequence data from 16 Drosophila melanogaster strains

piRNA sequence data from 3 human testis samples

Raw sequencing data from Song et al. (Genome Biology and Evolution 2014), on variation in piRNA and transposable element content among strains of D. melanogaster, and from Ha et al. (BMC Genomics 2014), on human testis samples.



Spectacle-Tree is an extension to the Spectacle software program that allows us to process multiple samples related by a known tree. Example data sets would be cell types related by a known developmental lineage or human individuals related by a known family tree. The software can also analyze a single cell type or a cell type assayed in two conditions. From Zhang, Song, Chen, Chaudhuri (NIPS 2015) and Song, Zhang, Chaudhuri, Chen (manuscript, 2015).

Spectacle (Spectral learning for annotating chromatin labels and epigenomes)

Spectacle is a program for annotating chromatin states from histone mark data (e.g. from the ENCODE project). It uses spectral learning for inference of the parameters of a Hidden Markov Model instead of the more commonly used expectation-maximization algorithm. From Song and Chen (Genome Biology 2015).

MixMir (Mixed linear models for microRNA motif finding)

MixMir is a program for finding microRNA motifs using gene expression and 3' UTR sequence data. From Diao et al. (Nucleic Acids Research 2014).

DEEGEP (Density Estimation by Expansions of Gegenbauer Polynomials)

DEEGEP is a simple statistical method for testing for natural selection from a high-dimensional (i.e., multi-population) allele frequency spectrum. The method uses a neutral control defined by the user (e.g., intergenic sites) to learn a background neutral distribution. This is accomplished by approximating the density with a finite expansion of Gegenbauer polynomials, inspired by Kimura's classic result in theoretical population genetics. One can then test for natural selection on a second set of putatively functional sites (e.g., microRNAs) by comparing their multi-population allele frequency spectrum to the neutral control.
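
As a rough illustration of the idea of a finite Gegenbauer expansion, the sketch below estimates a one-dimensional density of allele frequencies from a neutral sample. It is not the DEEGEP implementation, which handles multiple populations. The function names (gegenbauer, norm_const, fit_density, density), the choice alpha = 3/2 (an assumption motivated by the Gegenbauer polynomials in Kimura's diffusion solution), the expansion order, and the simulated Beta(2, 2) frequencies are all hypothetical choices made for this example. Frequencies p in [0, 1] are mapped to x = 2p - 1, and each coefficient is the sample mean of C_n(x) divided by the polynomial's normalization constant.

```python
import math
import random


def gegenbauer(n_max, alpha, x):
    """Return [C_0(x), ..., C_n_max(x)] using the three-term recurrence."""
    values = [1.0]
    if n_max >= 1:
        values.append(2.0 * alpha * x)
    for n in range(2, n_max + 1):
        values.append((2.0 * x * (n + alpha - 1.0) * values[n - 1]
                       - (n + 2.0 * alpha - 2.0) * values[n - 2]) / n)
    return values


def norm_const(n, alpha):
    """Integral of (1 - x^2)^(alpha - 1/2) * C_n(x)^2 over [-1, 1]."""
    return (math.pi * 2.0 ** (1.0 - 2.0 * alpha) * math.gamma(n + 2.0 * alpha)
            / (math.factorial(n) * (n + alpha) * math.gamma(alpha) ** 2))


def fit_density(freqs, order, alpha=1.5):
    """Estimate coefficients a_n = mean(C_n(x_i)) / h_n from frequencies in [0, 1]."""
    sums = [0.0] * (order + 1)
    for p in freqs:
        # Map the frequency p in [0, 1] to x in [-1, 1].
        for n, c in enumerate(gegenbauer(order, alpha, 2.0 * p - 1.0)):
            sums[n] += c
    return [s / len(freqs) / norm_const(n, alpha) for n, s in enumerate(sums)]


def density(p, coeffs, alpha=1.5):
    """Evaluate the estimated density (with respect to p) at frequency p."""
    x = 2.0 * p - 1.0
    weight = (1.0 - x * x) ** (alpha - 0.5)
    series = sum(a * c for a, c in
                 zip(coeffs, gegenbauer(len(coeffs) - 1, alpha, x)))
    return 2.0 * weight * series  # factor 2 comes from dx = 2 dp


if __name__ == "__main__":
    random.seed(0)
    # Simulated stand-in for neutral-site allele frequencies (assumption).
    neutral = [random.betavariate(2, 2) for _ in range(5000)]
    coeffs = fit_density(neutral, order=4)
    for p in (0.1, 0.5, 0.9):
        print(p, density(p, coeffs))
```

With alpha = 3/2 the weight function is 1 - x^2, so the estimate vanishes at frequencies 0 and 1 and integrates to one for any sample. A truncated expansion like this can dip below zero when the sample is small or the order is high, so the order should be chosen with care.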

SLIQ (Simple Linear Inequalities for Contig Scaffolding)

In Roy et al. (Journal of Computational Biology 2012) we presented a simple algorithm for filtering out erroneous mate-pair/paired-end reads and using the remaining pairs to produce scaffolds. The method does not attempt to solve the general genome assembly problem; instead, it takes as input a set of contigs built by another contig assembly program, such as Velvet (Zerbino and Birney) or ABySS (Simpson et al.).