Rs’ Response] We first wish to thank Dr. Kuznetsov for the detailed comparison he has provided with third-party experimental results. Since our study included 152 transcription factors, it was not possible for us to conduct an exhaustive literature review on each TF to compile all known targets. We relied on public databases and a few large-scale studies where possible. Including the results of large-scale studies was difficult, especially when the results were derived from ChIP-chip or tiling arrays (e.g., Cawley et al. [2]). Tiling arrays interrogate the whole genome and may include exon regions and very large intergenic regions. This is in contrast to our analysis, which included only regions of several kb surrounding transcription start sites. Many of the positive hits identified by tiling array could not be included in our analysis simply because the identified binding site falls outside the promoter regions we examined. Thus our site filtering by statistical significance and gene region is at least part of the reason why not all of the Cawley sites are included. Dr. Kuznetsov further points to the study by Zeller et al. (2006) and calculates that, of the 199 Myc targets identified in our study, only 16 overlap with the 688 targets identified by Zeller et al. using ChIP-PET. Although we would have hoped to see greater correspondence, this does not necessarily indicate that our method is finding poor targets. Given the size of the genome in our study (18660 genes), if we assume that the 688 targets identified by Zeller et al.
are the gold standard set of true positives, we calculate that the p-value for identifying 16 correct targets is 0.0012 (by hypergeometric distribution), indicating that our target set is enriched for true targets in a statistical sense, and that the 199 gene set may represent an interesting group for further study (this calculation may be repeated using the MATLAB function “hygecdf” where “p-value = 1-hygecdf(16,18660,688,199)”). Therefore, while we acknowledge the limitations discussed by our reviewer and agree that these may be best addressed by follow-on experimental studies, we feel that many of the target sets we have identified merit further analysis.

Reviewer Comments 11. The authors used a relatively large training set as well. For example, 4627 targets for CREB1 and HNF4-alpha [no references, V.K.]. They stressed that “In fact, when large sets of known interactions exist, the classifiers make few or no new predictions, perhaps suggesting that a significant subset of the targets for those factors have already been found (most strikingly, HNF4-α classifiers yield only 3 new predictions, and CREB1 yields only 1)”. This conclusion means that all specific gene targets for these TFs are known. However, it contradicts observation. For example, using the SACO method, S. Impey et al. (Cell, 2004) found 32700 potential CREB regulatory regions in the rat genome. These authors also found that 60 CREB regulatory regions are located in 2 kb 5′ upstream promoter regions and in internal gene regions. This estimate assumes that at least 19634 genes could be considered as putative direct targets for CREB. For different TFs, the ChIP-seq method [Johnson et al., Science, 2007; Robertson et al., Nat Meth, 2007] (which samples 10?0 times deeper than was done before in ChIP-based sequencing/cloning experiments) identifies from 2000 to 42000 locations of TF binding sites in the human genome. These findin.