
Pre Processing Data

ay-lab edited this page Mar 14, 2022 · 7 revisions

dcHiC takes HiC-Pro-style sparse matrix/BED file pairs as its default input; other formats can be converted. The preprocess.py script handles .cool/.mcool and .hic files, as described below.
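To make the default input format concrete, here is a minimal sketch of reading a sparse matrix/BED pair. It assumes the usual HiC-Pro layout (BED columns: chrom, start, end, bin ID; matrix columns: bin_i, bin_j, count, whitespace-separated). The function names and the toy data are illustrative only and are not part of dcHiC.

```python
from typing import Dict, Iterable, List, Tuple


def read_bed(lines: Iterable[str]) -> Dict[int, Tuple[str, int, int]]:
    """Map each bin ID to its (chrom, start, end) interval (assumed HiC-Pro BED layout)."""
    bins: Dict[int, Tuple[str, int, int]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, bin_id = line.split()[:4]
        bins[int(bin_id)] = (chrom, int(start), int(end))
    return bins


def read_matrix(lines: Iterable[str]) -> List[Tuple[int, int, float]]:
    """Parse whitespace-separated (bin_i, bin_j, count) triplets."""
    triplets: List[Tuple[int, int, float]] = []
    for line in lines:
        if not line.strip():
            continue
        i, j, count = line.split()[:3]
        triplets.append((int(i), int(j), float(count)))
    return triplets


# Illustrative two-bin example at 100 kb resolution (made-up counts).
bed_lines = ["chr1\t0\t100000\t1", "chr1\t100000\t200000\t2"]
matrix_lines = ["1\t1\t25", "1\t2\t7", "2\t2\t30"]

bins = read_bed(bed_lines)
pixels = read_matrix(matrix_lines)
for i, j, count in pixels:
    # Resolve bin IDs back to genomic coordinates.
    print(bins[i], bins[j], count)
```

Because both functions accept any iterable of lines, an open file handle (for a hypothetical sample.matrix / sample.bed pair) can be passed directly.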

| Option | Meaning |
|---|---|
| -input | [required] Input type: 'cool' for .cool/.mcool files, 'hic' for .hic files. |
| -file | [required] Path to the .cool/.mcool/.hic file. |
| -res | [required] Resolution for analysis, in bp (e.g. 100000). |
| -prefix | [required] Prefix for the output files. |
| -genomeFile | [.cool only] Path to a chromosome sizes file. Remove chromosomes from this file to exclude them from the analysis. |
| -removeChr | [.hic only; optional] Chromosomes to exclude, as a comma-separated list ("A,B,C"). Commonly used to drop chromosome Y. |
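The relationships between these options can be summarized as a parser definition. This is a hypothetical sketch built with Python's argparse; it is not dcHiC's actual preprocess.py code, and the extra check that cool input needs -genomeFile is an assumption based on the option descriptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented options (not dcHiC's own code).
    parser = argparse.ArgumentParser(description="Sketch of dcHiC-style preprocessing options")
    parser.add_argument("-input", required=True, choices=["cool", "hic"],
                        help="'cool' for .cool/.mcool files, 'hic' for .hic files")
    parser.add_argument("-file", required=True, help="path to the .cool/.mcool/.hic file")
    parser.add_argument("-res", required=True, type=int, help="resolution in bp, e.g. 100000")
    parser.add_argument("-prefix", required=True, help="prefix for output files")
    parser.add_argument("-genomeFile", help="chromosome sizes file (cool input only)")
    parser.add_argument("-removeChr", default="",
                        help='comma-separated chromosomes to drop, e.g. "A,B,C" (hic input only)')
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption for this sketch: cool input without a sizes file is rejected up front.
    if args.input == "cool" and args.genomeFile is None:
        parser.error("-genomeFile is required when -input is cool")
    # Turn "2,3,4" into ["2", "3", "4"]; an empty string becomes [].
    args.removeChr = [c for c in args.removeChr.split(",") if c]
    return args


args = parse(["-input", "hic", "-file", "HiCFile.hic", "-res", "1000000",
              "-prefix", "hicfile", "-removeChr", "2,3,4"])
print(args.removeChr)
```

argparse accepts single-dash multi-character options such as -genomeFile and stores them under the same name (args.genomeFile), which keeps the sketch aligned with the documented flag spellings.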

.cool

To process .cool files, dcHiC runs cooler dump to obtain the sparse matrix and uses the provided -genomeFile to build the corresponding BED index file. Both .cool and .mcool files are accepted. The -genomeFile should be a tab-separated file with one chromosome per line: the chromosome name followed by its size in bp.
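The step of turning a chromosome sizes file into a BED index can be sketched as follows. This is a minimal illustration, assuming fixed-width bins, a truncated last bin per chromosome, and 1-based bin IDs numbered consecutively across chromosomes in file order; dcHiC's own implementation may differ in these details.

```python
from typing import Iterable, List, Tuple


def make_bins(sizes_lines: Iterable[str], res: int) -> List[Tuple[str, int, int, int]]:
    """Return (chrom, start, end, bin_id) rows for fixed-width bins (1-based IDs assumed)."""
    rows: List[Tuple[str, int, int, int]] = []
    bin_id = 1
    for line in sizes_lines:
        if not line.strip():
            continue  # ignore blank lines
        chrom, size_str = line.split("\t")[:2]
        size = int(size_str)  # int() tolerates a trailing newline
        for start in range(0, size, res):
            end = min(start + res, size)  # last bin is truncated at the chromosome end
            rows.append((chrom, start, end, bin_id))
            bin_id += 1
    return rows


# Illustrative (not real) chromosome sizes.
sizes = ["chr1\t250000", "chr2\t150000"]
bed = make_bins(sizes, 100000)
for row in bed:
    print(*row, sep="\t")
```

Because bin IDs follow the order of the sizes file, editing that file (as suggested for removing chromosomes) also changes which bins exist and how they are numbered.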

python preprocess.py -input cool -file coolfile.mcool -genomeFile mm10sizes.txt -res 100000 -prefix coolfile

NOTE: If your .cool/.mcool file covers only some chromosomes, edit the -genomeFile so that it lists only those chromosomes.
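Trimming the sizes file can be done by hand or with a short filter like the sketch below. The chromosome sizes and the covered list are illustrative; if the cooler Python package is installed, the chromosomes actually stored in a file should be available via cooler.Cooler(uri).chromnames, but that lookup is left out here so the sketch stays self-contained.

```python
from typing import Iterable, List


def filter_sizes(sizes_lines: Iterable[str], keep: Iterable[str]) -> List[str]:
    """Keep only sizes-file lines whose chromosome is in `keep`, preserving file order."""
    keep_set = set(keep)
    return [line for line in sizes_lines
            if line.strip() and line.split("\t")[0] in keep_set]


# Illustrative sizes; not real genome coordinates.
sizes = ["chr1\t1000000", "chr2\t900000", "chrX\t700000", "chrY\t300000"]
covered = ["chr1", "chrX"]  # hypothetical: chromosomes present in the .cool file
filtered = filter_sizes(sizes, covered)
print("\n".join(filtered))
```

Preserving the original order matters if bin IDs are assigned in file order, as in a fixed-width binning scheme.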

.hic

To process .hic files, dcHiC uses the hic-straw library, and the preprocessing script writes the sparse matrix input that dcHiC needs. Make sure hic-straw is installed in your environment first (e.g. pip install hic-straw). To keep all chromosomes, omit the -removeChr argument.
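One way to picture the conversion is sketched below: contact records with bin start positions in bp are mapped onto genome-wide, 1-based bin IDs to give (bin_i, bin_j, count) triplets. In recent hic-straw versions, records for chromosome 2 would come from something like hicstraw.straw('observed', 'NONE', path, '2', '2', 'BP', res), whose entries expose binX, binY and counts; here plain tuples stand in for them. The bin numbering and the upper-triangle convention are assumptions of this sketch, not a description of dcHiC's code.

```python
from typing import Dict, Iterable, List, Tuple


def bin_offsets(sizes: List[Tuple[str, int]], res: int) -> Dict[str, int]:
    """ID of each chromosome's first bin, numbering bins 1-based across the genome."""
    offsets: Dict[str, int] = {}
    next_id = 1
    for chrom, size in sizes:
        offsets[chrom] = next_id
        next_id += -(-size // res)  # ceil(size / res) bins on this chromosome
    return offsets


def to_triplets(chrom: str, records: Iterable[Tuple[int, int, float]],
                offsets: Dict[str, int], res: int) -> List[Tuple[int, int, float]]:
    """records: (x_start_bp, y_start_bp, count) for one cis chromosome pair."""
    base = offsets[chrom]
    out = []
    for x, y, count in records:
        i, j = base + x // res, base + y // res
        out.append((min(i, j), max(i, j), count))  # store each pair once (upper triangle)
    return sorted(out)


res = 1000000
offsets = bin_offsets([("1", 2500000), ("2", 1500000)], res)
# Hypothetical records standing in for hic-straw output for chromosome "2".
records = [(0, 0, 12.0), (1000000, 0, 3.0), (1000000, 1000000, 9.0)]
print(to_triplets("2", records, offsets, res))
```

Chromosome "1" spans three 1 Mb bins here, so chromosome "2" starts at bin ID 4.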

python preprocess.py -input hic -file HiCFile.hic -res 1000000 -prefix hicfile -removeChr 2,3,4

Note: .hic preprocessing currently extracts only cis (intrachromosomal) interactions.
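Combining the cis-only restriction with -removeChr, the set of chromosome pairs to extract could be computed as in this hypothetical sketch. The chromosome list is illustrative; a real .hic file defines its own chromosome names.

```python
from typing import List, Tuple


def cis_pairs(chroms: List[str], remove_chr: str = "") -> List[Tuple[str, str]]:
    """Pair each kept chromosome with itself; trans pairs are never generated."""
    removed = {c.strip() for c in remove_chr.split(",") if c.strip()}
    return [(c, c) for c in chroms if c not in removed]


# Same removal list as the example command above.
pairs = cis_pairs(["1", "2", "3", "4", "5", "X", "Y"], "2,3,4")
print(pairs)
```

Omitting the second argument keeps every chromosome, matching the advice to leave out -removeChr when all chromosomes are wanted.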
