Hello,

I am new to Genome STRiP. I installed the program and the installation test passed. With my own data, the SVPreprocess step completed successfully. However, the SVDiscovery step repeatedly fails with the error shown below:

INFO 12:57:47,130 10-Jul-2016 SVDiscovery - Processing clusters ...
INFO 12:57:47,245 10-Jul-2016 ReadCountDiskCache - Initializing read count disk cache [practice1/metadata/rccache.bin] ...
INFO 12:57:47,246 10-Jul-2016 ReadCountDiskCache - Initialized read count disk cache with 1 file.
INFO 12:57:47,259 10-Jul-2016 SVDiscovery - No hapmap snp genotype directory specified
INFO 12:57:47,263 10-Jul-2016 SVDiscovery - No array intensity data specified
INFO 12:57:48,175 10-Jul-2016 SVDiscovery - Clustering: Generating clusters for 252 read pairs.
INFO 12:57:48,350 10-Jul-2016 SVDiscovery - Clustering: LR split size 252 / 252 maximal clique size 226 clique count 1
INFO 12:57:48,352 10-Jul-2016 SVDiscovery - Clustering: LR split size 26 / 252 maximal clique size 21 clique count 1
INFO 12:57:48,352 10-Jul-2016 SVDiscovery - Clustering: LR split size 5 / 252 maximal clique size 3 clique count 2
INFO 12:57:48,353 10-Jul-2016 SVDiscovery - Processing cluster 19:4817787-4818235 19:4820125-4820633 LR 21
Error: Exception processing cluster: null
Cluster: 19:4817787-4818235 19:4820125-4820633 LR 21
INFO 12:57:50,391 10-Jul-2016 GATKRunReport - Uploaded run statistics report to AWS S3

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.NullPointerException
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.writeVCFRecord(DeletionDiscoveryAlgorithm.java:547)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.processCluster(DeletionDiscoveryAlgorithm.java:446)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.processClusters(DeletionDiscoveryAlgorithm.java:353)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.runDiscovery(DeletionDiscoveryAlgorithm.java:197)
at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:107)
at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:40)
at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:133)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:87)
at org.broadinstitute.sv.main.SVDiscovery.main(SVDiscovery.java:21)

##### ERROR ------------------------------------------------------------------------------------------
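
To make the failing region easier to inspect, the cluster named in the error can be pulled out of a saved log with something like the sketch below. It assumes the log contains the "Error: Exception processing cluster" and "Cluster:" lines exactly as they appear above; failing_clusters is a hypothetical helper of my own, not part of Genome STRiP.

```shell
# Hypothetical helper (not part of Genome STRiP): print the left and right
# intervals of every cluster reported right after an
# "Exception processing cluster" error in a discovery log.
failing_clusters() {
    # $1: path to a saved log file (assumed line format as in the log above)
    awk '/^Error: Exception processing cluster/ { want = 1; next }
         want && /^Cluster:/ { print $2, $3; want = 0 }' "$1"
}
```

Each interval it prints could then be given to `samtools view -c` against one of the indexed input BAMs to see how many reads fall in that region.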

Some notes: I am using a non-human genome and omitted the genome mask. For reference, the script I ran is included below. Any advice would be greatly appreciated.

#!/bin/bash

inputType=bam
if [ ! -z "$1" ]; then
inputType="$1"
fi

runDir=practice1
genotypes=practice1.genotypes.vcf
sites=practice1.discovery.vcf

# These executables must be on your path.

which java > /dev/null || exit 1
which Rscript > /dev/null || exit 1
which samtools > /dev/null || exit 1

# For SVAltAlign, you must use the version of bwa compatible with Genome STRiP.

export PATH=${SV_DIR}/bwa:${PATH}
export LD_LIBRARY_PATH=${SV_DIR}/bwa:${LD_LIBRARY_PATH}

mx="-Xmx4g"
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"

mkdir -p ${runDir}/logs || exit 1
mkdir -p ${runDir}/metadata || exit 1

# Display version information.

java -cp ${classpath} ${mx} -jar ${SV_DIR}/lib/SVToolkit.jar

# Run preprocessing.
# For large scale use, you should use -reduceInsertSizeDistributions, but this is too slow for the installation test.
# The method employed by -computeGCProfiles requires a GC mask and is currently only supported for human genomes.

java -cp ${classpath} ${mx} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVPreprocess.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
--disableJobReport \
-cp ${classpath} \
-configFile ${SV_DIR}/conf/genstrip_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R ~/project-zarlab/mouseBAM/chr19_new.fa \
-runDirectory ${runDir} \
-md ${runDir}/metadata \
-ploidyMapFile ~/project-jflint/mouseBAM/chr19.ploidymap.txt \
-reduceInsertSizeDistributions false \
-computeGCProfiles true \
-computeReadCounts true \
-jobLogDir ${runDir}/logs \
-I ~/project-zarlab/mouseBAM/input.list \
-run \
|| exit 1

# Run discovery.

java -cp ${classpath} ${mx} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVDiscovery.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
--disableJobReport \
-cp ${classpath} \
-configFile conf/genstrip_installtest_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R ~/project-zarlab/mouseBAM/chr19_new.fa \
-runDirectory ${runDir} \
-md ${runDir}/metadata \
-disableGATKTraversal \
-genderMapFile /u/home/m/mdistler/project-jflint/genomestrip/svtoolkit/installtest/gender.map \
-jobLogDir ${runDir}/logs \
-minimumSize 100 \
-maximumSize 1000000 \
-suppressVCFCommandLines \
-I ~/project-zarlab/mouseBAM/input.list \
-O ${sites} \
-run \
|| exit 1
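
Because the discovery step reads several files by path (the -configFile argument in particular is relative to the working directory), a pre-flight check along these lines could rule out a missing or unreadable input before the run starts. This is only a sketch: check_inputs is a hypothetical helper, not something Genome STRiP provides, and I am not claiming a missing file is the cause of the NullPointerException.

```shell
# Hypothetical helper (not part of Genome STRiP): report every listed path
# that is missing or unreadable; returns non-zero if at least one is.
check_inputs() {
    local status=0 path
    for path in "$@"; do
        if [ ! -r "$path" ]; then
            echo "missing or unreadable: $path" >&2
            status=1
        fi
    done
    return $status
}
```

In the script above it could be called just before the discovery command, for example as `check_inputs conf/genstrip_installtest_parameters.txt "${runDir}/metadata" || exit 1`.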