JOURNAL OF APPLIED STATISTICS, vol. 42, no. 7, pp. 1556-1571, 2015 (SCI-Expanded)
Next-generation sequencing (NGS) technology has opened new opportunities for discovering genetic information in biomedical research. High-throughput NGS, especially DNA-seq, is particularly useful for profiling a genome to analyze DNA copy number variants (CNVs). The read count (RC) data produced by NGS are massive and information-rich, and exploiting them for accurate CNV detection has become a computational and statistical challenge. In this paper, we provide a statistical online change-point method to help detect CNVs in sequencing RC data. The method searches for change points (or breakpoints) online, under a Markov chain assumption on the breakpoint loci, using an iterative computing process within a Bayesian framework. We illustrate that an online change-point detection method is particularly suitable for identifying CNVs in RC data. The algorithm is applied to the publicly available NCI-H2347 lung cancer cell line sequencing read data to locate the breakpoints. Extensive simulation studies have been carried out, and the results show the good behavior of the proposed algorithm. The algorithm is implemented in R, and the code is available upon request.
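To make the idea of an online, Bayesian search for breakpoints concrete, the following is a minimal sketch of a generic run-length filter for count data. It is not the paper's algorithm and not the authors' R code, neither of which is reproduced here. It assumes, purely for illustration, that read counts within a segment are Poisson with a Gamma prior on the rate, and that a breakpoint occurs before each position with a constant probability `hazard`, which makes the breakpoint locations a Markov chain. The function names, the default parameter values, and the toy data are all hypothetical.

```python
import math


def log_nb_predictive(x, a, b):
    """Log predictive pmf of a Poisson count x when the rate is Gamma(shape=a, rate=b).

    Integrating out the rate gives a negative binomial distribution.
    """
    return (math.lgamma(x + a) - math.lgamma(a) - math.lgamma(x + 1)
            + a * math.log(b / (b + 1.0)) - x * math.log(b + 1.0))


def map_run_lengths(counts, hazard=0.02, a0=1.0, b0=0.05):
    """Process counts one at a time; return the most probable run length after each.

    The run length is the number of earlier observations in the current segment,
    so a value of 0 means "a new segment most likely starts at this position".
    All defaults are illustrative assumptions, not values from the paper.
    """
    log_prior = [0.0]        # before any data, the run length is 0 with probability 1
    params = [(a0, b0)]      # Gamma parameters attached to each run-length hypothesis
    map_runs = []
    for x in counts:
        # Weight every run-length hypothesis by how well it predicts the new count.
        log_joint = [lp + log_nb_predictive(x, a, b)
                     for lp, (a, b) in zip(log_prior, params)]
        # Normalize in log space to obtain the posterior over run lengths.
        peak = max(log_joint)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint))
        log_post = [v - log_norm for v in log_joint]
        map_runs.append(max(range(len(log_post)), key=log_post.__getitem__))
        # Conjugate update for continuing runs; a fresh run restarts from the prior.
        params = [(a0, b0)] + [(a + x, b + 1.0) for a, b in params]
        # Markov transition: restart with probability hazard, otherwise grow by one.
        log_prior = ([math.log(hazard)]
                     + [v + math.log1p(-hazard) for v in log_post])
    return map_runs


def estimated_breakpoints(map_runs):
    """Report the start index of a new segment whenever the MAP run length drops."""
    return [t - map_runs[t]
            for t in range(1, len(map_runs))
            if map_runs[t] < map_runs[t - 1]]


# Synthetic toy counts (not real sequencing data): the level shifts at index 30.
toy_counts = [19, 21] * 15 + [59, 61] * 15
toy_breaks = estimated_breakpoints(map_run_lengths(toy_counts))
```

Each new count is absorbed with a single pass over the current run-length hypotheses, which is what makes the search online. This sketch keeps every hypothesis and therefore costs quadratic time overall; a practical implementation for genome-scale RC data would need to prune unlikely run lengths.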