Publication Support
For a complete list of my publications, see my HSPH homepage or Google Scholar account.
iBBiG: Iterative Binary Bi-clustering of Gene Sets
Daniel Gusenleitner, Eleanor A Howe, Stefan Bentink, John Quackenbush, and Aedin C Culhane
iBBiG is a bi-clustering algorithm that we apply to meta-gene set analysis of large numbers of gene expression datasets. The iterative algorithm extracts groups of phenotypes from multiple studies that are associated with similar gene sets. iBBiG does not require prior knowledge of the number or scale of clusters and allows discovery of clusters with diverse sizes. It is robust in the presence of noise, and on simulated data it outperformed commonly used clustering methods.
- Supplementary Materials to the manuscript (pdf)
- Contact aedin at jimmy.harvard.edu
Running iBBiG
Prepare a binary matrix. We run GSEAlm on each dataset and then discretize the p-values, coding p<0.05 as 1 and everything else as 0. To run an iBBiG bicluster analysis, load the library and call the iBBiG function:
## Load iBBiG library
library(iBBiG)

## Create simulated dataset
simData <- makeArtificial()
binaryMatrix <- simData@Seeddata

## Run iBBiG biclustering
clusters <- iBBiG(binaryMatrix, nModules=8)  ## nModules is the number of clusters
plot(clusters, reorder=TRUE)
summary(clusters)
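The discretization step described above can be sketched in base R. This is only an illustration under stated assumptions: the `pvals` matrix is made-up toy data standing in for GSEAlm p-values (rows as gene sets, columns as phenotype comparisons), and the row and column names are hypothetical.

```r
## Toy stand-in for GSEAlm p-values (hypothetical data, not real results):
## rows are gene sets, columns are phenotype comparisons from several studies
set.seed(1)
pvals <- matrix(runif(20), nrow = 5,
                dimnames = list(paste0("geneset", 1:5),
                                paste0("study", 1:4)))
pvals[1:2, 1:2] <- 0.001  ## plant a few significant associations

## Code p < 0.05 as 1 and everything else as 0
binaryMatrix <- (pvals < 0.05) * 1

print(binaryMatrix)
```

The resulting `binaryMatrix` has the same shape as the p-value matrix and contains only 0s and 1s, which is the kind of input passed to `iBBiG()` in the example above.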
The iBBiG class is an extension of biclust and contains the slots:
- Clusterscore: Overall score of each cluster.
- RowScorexNumber: Score for each row (gene set) in each module. This can be used to rank the importance of gene sets to each module
- Seeddata: Optional matrix of binary data. It is not normally used except when creating simulated data
- RowxNumber: The RowScorexNumber scores discretized into binary module memberships. The scores are typically more useful than the binary values, but these are included to enable easy comparison with biclust and fabia functions
- NumberxCol: Logical matrix indicating which covariates (columns) belong to each module
- Number: Number of clusters
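As a rough illustration of how the RowScorexNumber scores might be used to rank gene sets within each module, the sketch below works on a small hand-made matrix. The `rowScores` values and the gene set and module names are hypothetical; with a real result the matrix would presumably be taken from the `RowScorexNumber` slot (for example `clusters@RowScorexNumber`).

```r
## Hypothetical scores standing in for the RowScorexNumber slot:
## rows are gene sets, columns are modules
rowScores <- matrix(c(0.9, 0.1, 0.4,   ## module M1
                      0.0, 0.8, 0.3),  ## module M2
                    nrow = 3,
                    dimnames = list(c("setA", "setB", "setC"),
                                    c("M1", "M2")))

## For each module, order gene set names from highest to lowest score
rankedSets <- lapply(colnames(rowScores), function(m) {
  names(sort(rowScores[, m], decreasing = TRUE))
})
names(rankedSets) <- colnames(rowScores)

print(rankedSets)
```

Here `rankedSets$M1` lists setA first and `rankedSets$M2` lists setB first, reflecting the highest score in each column.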
Download
Download iBBiG R package from Bioconductor
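source("http://bioconductor.org/biocLite.R")
biocLite("iBBiG")
## On Bioconductor 3.8 or later, biocLite has been replaced by BiocManager:
## BiocManager::install("iBBiG")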