Command-Line Parameters
=======================

This page provides a complete reference of all PhiSpy command-line parameters.

Basic Parameters
----------------

Input and Output
^^^^^^^^^^^^^^^^

.. option:: infile

   **Positional argument**

   Input file in GenBank format. This is the annotated genome you want to analyze for prophages.

.. option:: -o OUTPUT_DIR, --output_dir OUTPUT_DIR

   The output directory where all results will be written. This directory will be created if it doesn't exist.

.. option:: -p FILE_PREFIX, --file_prefix FILE_PREFIX

   An optional prefix to prepend to all output files. This is useful when running PhiSpy on multiple genomes and storing outputs in the same directory.

.. option:: -v, --version

   Show the program's version number and exit.

.. option:: -h, --help

   Show the help message and exit.

Training and Prediction
^^^^^^^^^^^^^^^^^^^^^^^

.. option:: -t TRAINING_SET, --training_set TRAINING_SET

   Choose the most closely related training set to your genome. Default is ``data/trainSet_genericAll.txt``.

   Training sets are pre-computed for different bacterial groups and help improve prophage prediction accuracy.

.. option:: -l {short,long}, --list {short,long}

   List the available training sets and exit. Use ``short`` for a brief list or ``long`` for detailed information.

.. option:: -m MAKE_TRAINING_DATA, --make_training_data MAKE_TRAINING_DATA

   Create training data from a set of annotated genome files. Requires ``is_phage=1`` qualifier in prophage CDSs.

Algorithm Parameters
--------------------

Window and Gene Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. option:: -n NUMBER, --number NUMBER

   Number of consecutive genes in a region of window size that must be prophage genes to be called.

   Default: 5

.. option:: -w WINDOW_SIZE, --window_size WINDOW_SIZE

   Window size of consecutive genes to look through to find phages.

   Default: 30

.. option:: -g NONPROPHAGE_GENEGAPS, --nonprophage_genegaps NONPROPHAGE_GENEGAPS

   The number of non-phage genes between prophages that can be tolerated before they are considered separate prophages.

   Default: 10

.. option:: --phage_genes PHAGE_GENES

   The minimum number of genes that must be identified as belonging to a phage for the region to be included.

   Default: 1 or more genes.

   **Important:** Setting this to 0 will cause PhiSpy to identify other mobile elements like plasmids, integrons, and pathogenicity islands.

Filtering Parameters
^^^^^^^^^^^^^^^^^^^^

.. option:: -u MIN_CONTIG_SIZE, --min_contig_size MIN_CONTIG_SIZE

   Minimum contig size (in base pairs) to be included in the analysis. Smaller contigs will be dropped.

   Default: 5000

Metrics and Features
--------------------

.. option:: --metrics METRICS [METRICS ...]

   The set of metrics to consider during classification. If not set, all metrics will be used.

   Available metrics:

   - ``orf_length_med``: Median ORF length
   - ``shannon_slope``: Slope of Shannon's diversity of k-mers
   - ``at_skew``: Normalized AT skew
   - ``gc_skew``: Normalized GC skew
   - ``max_direction``: Maximum number of genes in the same direction

   Examples::

      PhiSpy.py --metrics shannon_slope gc_skew genome.gb -o results
      PhiSpy.py --metrics orf_length_med shannon_slope at_skew genome.gb -o results

.. option:: --expand_slope

   Use the product of the slope of the Shannon scores in making test sets. This can improve predictions in some cases.

.. option:: --kmers_type {all,codon,simple}

   Type of k-mers used for calculating Shannon scores.

   Default: all

   - ``all``: All possible k-mers
   - ``codon``: k-mers with step of 3 nucleotides
   - ``simple``: Simple slicing from the first position

Random Forest Parameters
^^^^^^^^^^^^^^^^^^^^^^^^

.. option:: -r RANDOMFOREST_TREES, --randomforest_trees RANDOMFOREST_TREES

   Number of trees generated by Random Forest classifier.

   Default: 500

HMM Search Parameters
---------------------

.. option:: --phmms PHMMS

   Phage HMM profile database (like pVOGs or VOGdb) to search against the genome. The HMM search results are used as an additional feature to identify prophages.

   Example::

      PhiSpy.py genome.gb -o results --phmms pVOGs.hmm --threads 4

.. option:: --threads THREADS

   Number of threads to use while searching with pHMMs and the random forest.

   Default: 2

.. option:: --skip_search

   When running PhiSpy again on the same input data with ``--phmms`` option, you can skip the search step using this flag.

Annotation Parameters
---------------------

.. option:: --include_annotations

   Use the annotations in the GenBank file for phage predictions. This is the default behavior.

.. option:: --ignore_annotations

   Ignore the annotations in the GenBank file during predictions. Only use compositional and structural features.

.. option:: --color

   If set, CDSs with pHMM hits will be colored in the output GenBank file for viewing in Artemis genome browser.

Output Control
--------------

.. option:: --output_choice OUTPUT_CHOICE

   Sum of codes for files to output.

   Default: 3 (prophage_coordinates.tsv + GenBank output)

   Output codes:

   ====  ==================================================
   Code  File
   ====  ==================================================
   1     prophage_coordinates.tsv
   2     GenBank format output
   4     prophage and bacterial sequences
   8     prophage_information.tsv
   16    prophage.tsv
   32    GFF3 format output of just the prophages
   64    prophage.tbl
   128   test data used in the random forest
   256   GFF3 format output for annotated genomic contigs
   ====  ==================================================

   Add the codes together to get multiple outputs. For example:

   - ``--output_choice 3`` (default): prophage_coordinates.tsv + GenBank output
   - ``--output_choice 11`` (1+2+8): coordinates, GenBank, and information files
   - ``--output_choice 512``: All output files

   See the Output Files page for details on each file.

Repeat Finding
^^^^^^^^^^^^^^

.. option:: --include_all_repeats

   Include all repeats in the GenBank output, not just the best ones.

.. option:: --extra_dna EXTRA_DNA

   Additional DNA (in base pairs) flanking the predicted prophage to test for repeats.

   Default: 2000

.. option:: --min_repeat_len MIN_REPEAT_LEN

   Minimum repeat length to search for.

   Default: 10

Evaluation and Debugging
-------------------------

.. option:: -e [EVALUATE], --evaluate [EVALUATE]

   Run in evaluation mode. Does not generate new data, but reruns the evaluation. Useful for testing different parameters.

.. option:: --keep_dropped_predictions

   Write regions that might be prophages but were filtered out to the output files. Useful for debugging.

.. option:: -k [KEEP], --keep [KEEP]

   Do not delete temporary files. Useful for debugging.

Logging
^^^^^^^

.. option:: --log LOG

   Name of the log file to write details to.

   Default: phispy.log

.. option:: --quiet

   Run in quiet mode with minimal output to the console.

Examples
--------

Basic prophage prediction::

   PhiSpy.py genome.gb -o results

With specific training set::

   PhiSpy.py genome.gb -o results -t data/trainSet_Streptococcus.txt

With HMM search and multiple threads::

   PhiSpy.py genome.gb -o results --phmms pVOGs.hmm --threads 8 --color

Relaxed prophage calling (more sensitive)::

   PhiSpy.py genome.gb -o results --phage_genes 0

Strict prophage calling (more specific)::

   PhiSpy.py genome.gb -o results --phage_genes 5

Get all output files::

   PhiSpy.py genome.gb -o results --output_choice 512

Using specific metrics only::

   PhiSpy.py genome.gb -o results --metrics shannon_slope gc_skew

Multiple genomes with prefixes::

   PhiSpy.py genome1.gb -o results -p genome1_
   PhiSpy.py genome2.gb -o results -p genome2_
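
Working out an ``--output_choice`` value:

The additive codes accepted by ``--output_choice`` (see Output Control) can be worked out with a few lines of Python. This is a minimal sketch and not part of PhiSpy: ``OUTPUT_CODES``, ``encode_output_choice`` and ``decode_output_choice`` are hypothetical helper names, and the code-to-file mapping is copied from the output codes table on this page. It only handles sums of the listed codes (1 through 256) and makes no assumption about how PhiSpy interprets larger values such as 512.

```python
# Hypothetical helpers for illustration; they are not part of PhiSpy.
# The mapping is copied from the output codes table on this page.
OUTPUT_CODES = {
    1: "prophage_coordinates.tsv",
    2: "GenBank format output",
    4: "prophage and bacterial sequences",
    8: "prophage_information.tsv",
    16: "prophage.tsv",
    32: "GFF3 format output of just the prophages",
    64: "prophage.tbl",
    128: "test data used in the random forest",
    256: "GFF3 format output for annotated genomic contigs",
}


def encode_output_choice(codes):
    """Sum a collection of listed codes into one --output_choice value."""
    unknown = [code for code in codes if code not in OUTPUT_CODES]
    if unknown:
        raise ValueError(f"unknown output codes: {unknown}")
    # set() ensures a code given twice is only counted once.
    return sum(set(codes))


def decode_output_choice(value):
    """List the outputs selected by a sum of the listed codes."""
    # Each listed code is a distinct power of two, so a bitwise AND
    # tells us whether that code was part of the sum.
    return [name for code, name in OUTPUT_CODES.items() if value & code]


if __name__ == "__main__":
    print(encode_output_choice([1, 2, 8]))  # 11
    for name in decode_output_choice(11):
        print(name)
```

The resulting integer is what would be passed on the command line, for example ``--output_choice 11`` for the coordinates, GenBank, and information files.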