Command-Line Parameters

This page provides a complete reference of all PhiSpy command-line parameters.

Basic Parameters

Input and Output

infile

Positional argument

Input file in GenBank format. This is the annotated genome you want to analyze for prophages.

-o OUTPUT_DIR, --output_dir OUTPUT_DIR

The output directory where all results will be written. This directory will be created if it doesn’t exist.

-p FILE_PREFIX, --file_prefix FILE_PREFIX

An optional prefix to prepend to all output files. This is useful when running PhiSpy on multiple genomes and storing outputs in the same directory.

-v, --version

Show the program’s version number and exit.

-h, --help

Show the help message and exit.

Training and Prediction

-t TRAINING_SET, --training_set TRAINING_SET

The training set to use. Choose the one most closely related to your genome. Default: data/trainSet_genericAll.txt

Training sets are pre-computed for different bacterial groups and help improve prophage prediction accuracy.

-l {short,long}, --list {short,long}

List the available training sets and exit. Use short for a brief list or long for detailed information.

-m MAKE_TRAINING_DATA, --make_training_data MAKE_TRAINING_DATA

Create training data from a set of annotated genome files. Each prophage CDS in those files must carry an is_phage=1 qualifier.

Algorithm Parameters

Window and Gene Parameters

-n NUMBER, --number NUMBER

Number of consecutive genes within a window that must be classified as prophage genes for the region to be called a prophage. Default: 5

-w WINDOW_SIZE, --window_size WINDOW_SIZE

Number of consecutive genes in each window examined when searching for prophages. Default: 30

-g NONPROPHAGE_GENEGAPS, --nonprophage_genegaps NONPROPHAGE_GENEGAPS

The number of non-phage genes between prophages that can be tolerated before they are considered separate prophages. Default: 10
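The effect of the gap tolerance can be illustrated with a small sketch. This is not PhiSpy's implementation: the function name merge_regions, the 0/1 flag list, and the choice to treat a gap equal to the limit as still tolerated are all assumptions made for illustration.

```python
def merge_regions(flags, max_gap=10):
    """Group phage-like genes into regions.

    flags   -- one 0/1 value per gene, in genome order (hypothetical input)
    max_gap -- number of non-phage genes tolerated inside one region
               (assumption: a gap equal to max_gap is still tolerated)
    Returns a list of (first_gene_index, last_gene_index) tuples.
    """
    regions = []
    for i, is_phage in enumerate(flags):
        if not is_phage:
            continue
        # Count the non-phage genes between this gene and the previous region.
        if regions and i - regions[-1][1] - 1 <= max_gap:
            regions[-1][1] = i          # close enough: extend the region
        else:
            regions.append([i, i])      # too far: start a new region
    return [tuple(r) for r in regions]


flags = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
print(merge_regions(flags, max_gap=2))
print(merge_regions(flags, max_gap=1))
```

A larger tolerance joins neighbouring runs of phage genes into one region, while a smaller one splits them.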

--phage_genes PHAGE_GENES

The minimum number of genes that must be identified as belonging to a phage for the region to be included. Default: 1

Important: Setting this to 0 will cause PhiSpy to identify other mobile elements like plasmids, integrons, and pathogenicity islands.

Filtering Parameters

-u MIN_CONTIG_SIZE, --min_contig_size MIN_CONTIG_SIZE

Minimum contig size (in base pairs) to be included in the analysis. Smaller contigs will be dropped. Default: 5000

Metrics and Features

--metrics METRICS [METRICS ...]

The set of metrics to consider during classification. If not set, all metrics will be used.

Available metrics:

  • orf_length_med: Median ORF length

  • shannon_slope: Slope of Shannon’s diversity of k-mers

  • at_skew: Normalized AT skew

  • gc_skew: Normalized GC skew

  • max_direction: Maximum number of genes in the same direction
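As a rough illustration of the two skew metrics, the sketch below uses the textbook definitions, (G - C) / (G + C) and (A - T) / (A + T). The function names are hypothetical, and the normalization PhiSpy applies may differ from this.

```python
def skew(seq, first, second):
    """Textbook skew: (first - second) / (first + second) over base counts."""
    seq = seq.upper()
    n_first, n_second = seq.count(first), seq.count(second)
    total = n_first + n_second
    # Return 0.0 when neither base occurs, to avoid dividing by zero.
    return (n_first - n_second) / total if total else 0.0


def gc_skew(seq):
    return skew(seq, "G", "C")


def at_skew(seq):
    return skew(seq, "A", "T")


print(gc_skew("GGGC"))
print(at_skew("AATT"))
```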

Examples:

PhiSpy.py genome.gb -o results --metrics shannon_slope gc_skew
PhiSpy.py genome.gb -o results --metrics orf_length_med shannon_slope at_skew

--expand_slope

Use the product of the slope of the Shannon scores in making test sets. This can improve predictions in some cases.

--kmers_type {all,codon,simple}

Type of k-mers used for calculating Shannon scores. Default: all

  • all: All possible k-mers

  • codon: k-mers with step of 3 nucleotides

  • simple: Simple slicing from the first position
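One plausible reading of the three modes is sketched below. The function name get_kmers and the exact slicing rules are assumptions based only on the descriptions above, in particular that simple means non-overlapping k-mers starting at the first position. PhiSpy's own code may differ.

```python
def get_kmers(seq, k, kmers_type="all"):
    """Return the k-mers of seq under one interpretation of each mode."""
    steps = {
        "all": 1,     # every position: overlapping k-mers
        "codon": 3,   # every third position: k-mers starting on a codon boundary
        "simple": k,  # assumed: non-overlapping k-mers from the first position
    }
    if kmers_type not in steps:
        raise ValueError(f"unknown kmers_type: {kmers_type}")
    stop = len(seq) - k + 1  # last start position that still yields a full k-mer
    return [seq[i:i + k] for i in range(0, stop, steps[kmers_type])]


seq = "ATGCATGCAT"
for mode in ("all", "codon", "simple"):
    print(mode, get_kmers(seq, 4, mode))
```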

Random Forest Parameters

-r RANDOMFOREST_TREES, --randomforest_trees RANDOMFOREST_TREES

Number of trees generated by Random Forest classifier. Default: 500

HMM Search Parameters

--phmms PHMMS

Phage HMM profile database (like pVOGs or VOGdb) to search against the genome. The HMM search results are used as an additional feature to identify prophages.

Example:

PhiSpy.py genome.gb -o results --phmms pVOGs.hmm --threads 4

--threads THREADS

Number of threads to use for the pHMM search and the random forest. Default: 2

--skip_search

Skip the pHMM search step. Use this flag when rerunning PhiSpy with the --phmms option on input data that has already been searched.

Annotation Parameters

--include_annotations

Use the annotations in the GenBank file for phage predictions. This is the default behavior.

--ignore_annotations

Ignore the annotations in the GenBank file during predictions. Only use compositional and structural features.

--color

If set, CDSs with pHMM hits will be colored in the output GenBank file for viewing in the Artemis genome browser.

Output Control

--output_choice OUTPUT_CHOICE

Sum of codes for files to output. Default: 3 (prophage_coordinates.tsv + GenBank output)

Output codes:

Code  File
1     prophage_coordinates.tsv
2     GenBank format output
4     prophage and bacterial sequences
8     prophage_information.tsv
16    prophage.tsv
32    GFF3 format output of just the prophages
64    prophage.tbl
128   test data used in the random forest
256   GFF3 format output for annotated genomic contigs

Add the codes together to get multiple outputs. For example:

  • --output_choice 3 (default): prophage_coordinates.tsv + GenBank output

  • --output_choice 11 (1+2+8): coordinates, GenBank, and information files

  • --output_choice 512: All output files

See the Output Files page for details on each file.
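Because each code is a distinct power of two, an --output_choice value can be composed and decomposed with bitwise operations. The sketch below is only an illustration of that arithmetic. The names OUTPUT_CODES, ALL_OUTPUTS, encode and decode are hypothetical, and the special handling of 512 simply mirrors what this page states (the nine listed codes themselves sum to 511).

```python
# Codes and file descriptions as listed on this page.
OUTPUT_CODES = {
    1: "prophage_coordinates.tsv",
    2: "GenBank format output",
    4: "prophage and bacterial sequences",
    8: "prophage_information.tsv",
    16: "prophage.tsv",
    32: "GFF3 format output of just the prophages",
    64: "prophage.tbl",
    128: "test data used in the random forest",
    256: "GFF3 format output for annotated genomic contigs",
}
ALL_OUTPUTS = 512  # documented above as "all output files"


def encode(codes):
    """Combine individual codes into one --output_choice value."""
    unknown = set(codes) - set(OUTPUT_CODES)
    if unknown:
        raise ValueError(f"unknown output codes: {sorted(unknown)}")
    choice = 0
    for code in codes:
        choice |= code  # OR rather than add, so duplicates are harmless
    return choice


def decode(choice):
    """List the outputs selected by an --output_choice value."""
    if choice == ALL_OUTPUTS:
        return list(OUTPUT_CODES.values())
    return [name for code, name in OUTPUT_CODES.items() if choice & code]


print(encode([1, 2, 8]))
print(decode(3))
```

Decoding a value before a long run is a cheap way to confirm that the sum selects the files you expect.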

Repeat Finding

--include_all_repeats

Include all repeats in the GenBank output, not just the best ones.

--extra_dna EXTRA_DNA

Additional DNA (in base pairs) flanking the predicted prophage to test for repeats. Default: 2000

--min_repeat_len MIN_REPEAT_LEN

Minimum repeat length to search for. Default: 10

Evaluation and Debugging

-e [EVALUATE], --evaluate [EVALUATE]

Run in evaluation mode. Does not generate new data, but reruns the evaluation. Useful for testing different parameters.

--keep_dropped_predictions

Write regions that might be prophages but were filtered out to the output files. Useful for debugging.

-k [KEEP], --keep [KEEP]

Do not delete temporary files. Useful for debugging.

Logging

--log LOG

Name of the log file to write details to. Default: phispy.log

--quiet

Run in quiet mode with minimal output to the console.

Examples

Basic prophage prediction:

PhiSpy.py genome.gb -o results

With specific training set:

PhiSpy.py genome.gb -o results -t data/trainSet_Streptococcus.txt

With HMM search and multiple threads:

PhiSpy.py genome.gb -o results --phmms pVOGs.hmm --threads 8 --color

Relaxed prophage calling (more sensitive):

PhiSpy.py genome.gb -o results --phage_genes 0

Strict prophage calling (more specific):

PhiSpy.py genome.gb -o results --phage_genes 5

Get all output files:

PhiSpy.py genome.gb -o results --output_choice 512

Using specific metrics only:

PhiSpy.py genome.gb -o results --metrics shannon_slope gc_skew

Multiple genomes with prefixes:

PhiSpy.py genome1.gb -o results -p genome1_
PhiSpy.py genome2.gb -o results -p genome2_
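For more than a handful of genomes, the per-genome commands can be generated rather than typed. The following is a minimal sketch using only the options documented on this page. The helper names build_commands and run_all are hypothetical, and deriving the prefix from the file name is an assumption about how you want outputs labelled.

```python
import subprocess
from pathlib import Path


def build_commands(genomes, output_dir):
    """Build one PhiSpy.py command per genome, each with its own file prefix."""
    commands = []
    for genome in genomes:
        prefix = Path(genome).stem + "_"  # genome1.gb -> genome1_
        commands.append(
            ["PhiSpy.py", str(genome), "-o", str(output_dir), "-p", prefix]
        )
    return commands


def run_all(genomes, output_dir):
    """Run the commands one after another, stopping at the first failure."""
    for command in build_commands(genomes, output_dir):
        subprocess.run(command, check=True)


for command in build_commands(["genome1.gb", "genome2.gb"], "results"):
    print(" ".join(command))
```

Calling run_all(["genome1.gb", "genome2.gb"], "results") would execute the same two commands shown above, provided PhiSpy.py is on your PATH.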