@bhandsaker
Hi Bob, why is the CNVDiscoveryPipeline so time-consuming? I am testing a single WGS sample (about 30x). It has been running for about 4 days and is still running. This is my script for the CNVDiscoveryPipeline:
#!/bin/bash
# If you adapt this script for your own use, you will need to set these two variables based on your environment.
# SV_DIR is the installation directory for SVToolkit - it must be an exported environment variable.
# SV_TMPDIR is a directory for writing temp files, which may be large if you have a large data set.
export SV_DIR=/work/SoftW/svtoolkit
SV_TMPDIR=2016006L-3-1/tmpdir_CNVDiscovry
runDir=2016006L-3-1
inputFile=/work1/wsh/4.test/1.perl/1.pipetest/WGS/2016006L-3-1.dedupped.bam
sites=2016006L-3-1.discovery.vcf
genotypes=2016006L-3-1.genotypes.vcf
# These executables must be on your path.
which java > /dev/null || exit 1
which Rscript > /dev/null || exit 1
which samtools > /dev/null || exit 1
# For SVAltAlign, you must use the version of bwa compatible with Genome STRiP.
export PATH=${SV_DIR}/bwa:${PATH}
export LD_LIBRARY_PATH=${SV_DIR}/bwa:${LD_LIBRARY_PATH}
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
mkdir -p ${runDir}/logs || exit 1
mkdir -p ${runDir}/metadata || exit 1
java -Xmx4g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-configFile conf/genstrip_parameters.txt \
-R /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.fasta \
-I ${inputFile} \
-md ${runDir}/metadata \
-runDirectory ${runDir} \
-jobLogDir ${runDir}/logs \
-intervalList /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.interval.list \
-genderMapFile /work1/wsh/4.test/1.perl/1.pipetest/WGS/2016006L-3-1_gender.map \
-jobRunner Shell \
--disableJobReport \
-tempDir ${SV_TMPDIR} \
-gatkJobRunner Shell \
-retry 10 \
-tilingWindowSize 1000 \
-tilingWindowOverlap 500 \
-maximumReferenceGapLength 1000 \
-boundaryPrecision 100 \
-minimumRefinedLength 500 \
-genotypingParallelRecords 500 \
-run
Could you help me check whether there are any mistakes in my script? Thank you very much.