Our Research and Mission

Today, quantitative ideas are an essential part of cancer research, as many of the opportunities and challenges for the cancer research community involve using complex data to further our understanding of cancer biology and optimize prevention and therapy. The mission of the Department of Data Sciences is twofold: to generate data science knowledge and innovative tools that can enable progress in cancer research, and to provide investigators across DFCI and beyond access to highly qualified and committed quantitative scientists.

The department is home to a complex web of programs, centers, cores, labs, and working groups that pursue complementary missions but come together to form one of the most vibrant computational sciences environments in today’s biomedicine. For example, the department is the home base of the DF/HCC Cancer Data Science program (the only one of its kind among comprehensive cancer centers) and the DF/HCC biostatistics core; it is also home to three DFCI strategic research centers, the Knowledge Systems software development group, and two statistical coordinating centers for clinical trial cooperative groups. Faculty members play a key role in open-source software initiatives such as Bioconductor, the CISTROME database, and cBioPortal, and in online data science education.


The mission of the Center for Functional Cancer Epigenetics (CFCE) is to explore the key role that epigenetic alterations and abnormal transcriptional regulation play in the development and progression of cancer. A better understanding of these alterations will improve cancer diagnosis and may contribute the knowledge required to develop new therapeutics that exploit epigenetic mechanisms.
The Center for Cancer Evolution (CCE) was founded in 2016 and focuses on understanding cancer evolution through a multi-disciplinary approach. Our goal is to understand the mechanisms behind tumor evolution, metastasis formation, and the emergence of drug resistance, in order to ultimately provide more specialized and effective patient care across a variety of cancer types.
The mission of the cBio Center at Dana-Farber is to provide oncologists with tools to mine genomic patient data for research and for guiding treatment decisions, to devise strategies to overcome resistance to targeted cancer drugs, and to create new connections between scientists at Dana-Farber and Harvard Medical School, including collaborative structures for scientists using quantitative sciences to solve biological problems.
The department is the statistical center for ECOG-ACRIN, which designs and conducts biomarker-driven cancer research involving adults who have cancer or are at risk. ECOG-ACRIN has integrated therapeutic and diagnostic imaging-based research disciplines with the latest bioinformatics technologies into a single scientific organization.
The International Breast Cancer Study Group (IBCSG) is a non-profit research organization dedicated to innovative clinical research to improve the prognosis of women with breast cancer. Patients and investigators from over 500 sites on 6 continents (Europe, Australia/New Zealand, Africa, Asia, North and South America) cooperate by participating in extensive clinical trials in breast cancer populations, guided by the highest scientific and ethical standards.

The DF/HCC is an NCI-designated Comprehensive Cancer Center composed of seven Boston institutions. The cancer research and treatment collaborations that the DF/HCC sponsors across participating institutions combine the disciplines of population science, clinical science, and basic research to facilitate the development of novel and improved modalities for the prevention and treatment of cancer.



The BayesMendel working group is dedicated to the development of methodologies, models, and open-source software for predicting who may carry a cancer susceptibility gene. We use statistical ideas that go back to Bayes and genetic models that go back to Mendel.

Scott Carter Lab

Dr. Carter works closely with Boston area physicians to design and execute studies of cancer initiation, drug resistance, and metastasis using genomics technology applied to cancer-tissue specimens collected at various stages of disease progression. Dr. Carter has developed several novel computational methods in order to analyze these datasets and make inferences about clonal evolution underlying cancer progression. He has also developed software tools that are significantly increasing the impact of his work by making those methods available to the broader research community. These tools include HAPSEG, ABSOLUTE, CapSeg, Allelic CapSeg, and Phylogic.

In our lab, we combine experiments and theory to understand the complex dynamics of cell state transitions in development and in disease. To do so, we expand and combine emerging experimental techniques. For example, we use time-lapse microscopy to track the lineage history of individual cells as they divide, followed by single-molecule imaging to read out the expression levels of multiple genes in the same cells. We also use synthetic biology to engineer novel genetic circuits that can retain the history of major transitions in each cell. One of our favorite model systems is mouse embryonic stem (ES) cells in culture, which can rehash slices of development in a dish, providing us with a rich playground.

At the same time, we also develop new theoretical frameworks to decipher the data generated in our experiments: for example, methods for inferring dynamics from single-cell lineage trees with endpoint gene expression measurements. Often inspired by concepts in statistical physics, we have used ideas from geometry, critical phenomena, and at times number theory. Ultimately, we seek to find general theoretical principles to understand the stochastic processes and collective dynamics that emerge in populations of proliferating cells.

The unprecedented advancements in digital technology during the second half of the 20th century have produced a measurement revolution that is transforming science. In biomedical research, the genomics revolution is being driven by new technologies that permit us to observe molecular entities, much as the invention of the microscope permitted the identification of microorganisms and other breakthroughs. Choice examples of these technologies are next-generation sequencing (NGS) and microarrays.

Scientific fields that have traditionally relied upon simple data analysis techniques have been turned on their heads by these technologies. Interpreting information extracted from these massive and complex datasets requires sophisticated statistical methodology, as one can easily be fooled by patterns arising by chance or by systematic errors that are hard to detect.

Rafael Irizarry’s lab is interested in the development of statistical tools that help researchers better interpret their data. The lab disseminates these tools as open-source software that is available for free online. This software has tens of thousands of users, and the scientific publications in which these methods are described are highly cited.

The Knowledge Systems Group is an applied genomics software and data sciences group, focused on enabling cancer genomics research and precision cancer medicine at Dana-Farber Cancer Institute.

We are currently focused on four areas of research and development: 1) we collaborate with MSKCC on the continued development of the cBioPortal for Cancer Genomics, an open-source platform for mining cancer genomics data sets; 2) we are actively collaborating with translational researchers and clinicians to mine genomic data generated by Dana-Farber's enterprise sequencing project, now one of the largest databases of genomic data linked to detailed clinical data; 3) we are building new genomic tools for clinicians, including a new clinical trial matching platform for algorithmically matching genomic profiles to open clinical trials; 4) we are helping to build new cloud-based platforms for securely sharing genomic data across multiple cancer centers, while preserving patient confidentiality.

The general theme of our research is to tackle biological problems with advanced computational and statistical methods. We develop sophisticated sequence alignment, sequence assembly, and variant calling algorithms that are fundamental to the applications of biological sequence data. We analyze single-cell sequence data to investigate mosaicism between cells. We are also interested in questions related to evolution, from species evolution over hundreds of millions of years, to human evolution in the past hundreds of thousands of years, all the way down to cell evolution in hundreds of cell cycles.

We are part of the Department of Biomedical Informatics of Harvard Medical School and the Department of Biostatistics & Computational Biology of Dana-Farber Cancer Institute. Our lab collaborates with the Xie Research Group, the Reich lab, and the MacArthur lab in the Boston area, as well as with others elsewhere. Our work is routinely used by thousands of researchers across the world.

Our research focuses on algorithm development and integrative mining of high-throughput data to understand gene regulation in cancer biology. We have developed a number of widely used algorithms for transcription factor motif finding and for ChIP-chip, ChIP-seq, DNase-seq, and CRISPR screen data analysis. By integrating genome-wide transcription factor binding, chromatin dynamics, gene expression profiles, and chemical and functional screens, we try to model the specificity and function of transcription factors, chromatin regulators, RNA binding proteins, kinases, and lncRNAs in tumor development, progression, drug response, and resistance.

The research of our lab focuses on the evolutionary dynamics of cancer. Cancer emerges due to an evolutionary process in somatic tissue. The fundamental laws of evolution can best be formulated as exact mathematical equations. Therefore, the process of cancer initiation and progression is amenable to mathematical investigation. Current areas of research include cancer stem cells, evolution of drug resistance, and the dynamics of metastasis formation.

Program in Regulatory Science

The mission of the Program in Regulatory Science (PRS) is to generate novel approaches for the development of therapeutics and biomarkers through preclinical and clinical trial designs, pipeline modeling, and scientific contributions in regulatory science.

The development and translation of effective precision medicines require information about both efficacious therapeutics and the putative biomarkers that predict their efficacy. The exponential increase in potential biomarker information that is available for any given patient or disease, including genomic biomarkers, has created challenges with respect to the best methods to support both the scientific development and the regulatory decision-making required to bring new medicines to the clinic. The Program in Regulatory Science develops new approaches and quantitative methods to support the discovery of new treatments in precision medicine and collaborates with groups at Dana-Farber in the development of innovative clinical trials.

The overall goal of our work in the Sander Lab is to solve biological problems using quantitative methods from bioinformatics, statistical physics, data sciences, statistics, computer science, and mathematics. We apply these computational methods to build predictive network models of molecular and cell-cell interactions, to support cancer precision medicine, and to make discoveries in structural and evolutionary biology.

The Singer group focuses on elucidating the function and underlying regulation of the immune system during homeostasis, cancer and autoimmunity. 

We generate high-throughput datasets and develop machine-learning frameworks to characterize in vivo systems and generate testable hypotheses about their regulation.

Location: Center for Life Sciences Building, 11th floor and Longwood Center, 9th Floor
Website: https://www.singerlab.website/

The main research interests of our lab are to develop systems biology approaches for characterizing cell states and understanding their regulatory circuitry. To this end, we take advantage of the large number of existing omics-level datasets and develop computational methods to analyze and integrate various genomic data types, with a focus on chromatin dynamics and single-cell analysis. In collaboration with experimental biologists and clinical scientists, we have applied these computational approaches to study stem cell and cancer biology. We are excited to be involved in the International Human Cell Atlas Initiative.

Cheng-Zhong Zhang Lab

Cheng-Zhong Zhang obtained a Ph.D. in Chemical Engineering with a minor in Physics from Caltech. He received postdoctoral training in single-molecule biophysics in Timothy Springer’s laboratory at Harvard Medical School and subsequently worked on single-cell genomics and cancer biology in Matthew Meyerson’s laboratory at the Broad Institute of Harvard and MIT and at Dana-Farber Cancer Institute. He combines single-cell genomic analysis and cell biology to study genome evolution and phenotypic variation. Currently he focuses on the mechanisms and the transcriptional consequences of chromosomal abnormalities, such as rearrangement and aneuploidy, and how such events may promote tumor development.