Command-Line Parameters
This page provides a complete reference of all PhiSpy command-line parameters.
Basic Parameters
Input and Output
- infile
Positional argument
Input file in GenBank format. This is the annotated genome you want to analyze for prophages.
- -o OUTPUT_DIR, --output_dir OUTPUT_DIR
The output directory where all results will be written. This directory will be created if it doesn’t exist.
- -p FILE_PREFIX, --file_prefix FILE_PREFIX
An optional prefix to prepend to all output files. This is useful when running PhiSpy on multiple genomes and storing outputs in the same directory.
- -v, --version
Show the program’s version number and exit.
- -h, --help
Show the help message and exit.
Training and Prediction
- -t TRAINING_SET, --training_set TRAINING_SET
The training set most closely related to your genome. Default: data/trainSet_genericAll.txt. Training sets are pre-computed for different bacterial groups and help improve prophage prediction accuracy.
- -l {short,long}, --list {short,long}
List the available training sets and exit. Use short for a brief list or long for detailed information.
- -m MAKE_TRAINING_DATA, --make_training_data MAKE_TRAINING_DATA
Create training data from a set of annotated genome files. Requires the is_phage=1 qualifier on prophage CDSs.
Algorithm Parameters
Window and Gene Parameters
- -n NUMBER, --number NUMBER
Number of consecutive genes within a window that must be prophage genes for the region to be called a prophage. Default: 5
- -w WINDOW_SIZE, --window_size WINDOW_SIZE
Window size of consecutive genes to look through to find phages. Default: 30
- -g NONPROPHAGE_GENEGAPS, --nonprophage_genegaps NONPROPHAGE_GENEGAPS
The number of non-phage genes between prophages that can be tolerated before they are considered separate prophages. Default: 10
- --phage_genes PHAGE_GENES
The minimum number of genes that must be identified as belonging to a phage for the region to be included. Default: 1 (at least one gene).
Important: Setting this to 0 will cause PhiSpy to identify other mobile elements like plasmids, integrons, and pathogenicity islands.
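To make the effect of -g more concrete, here is a minimal Python sketch of gap-tolerant region merging. It is an illustration only, not PhiSpy's implementation: the function name merge_phage_genes, the use of gene indices as input, and the reading that a gap of exactly max_gap non-phage genes is still tolerated are all assumptions.

```python
from typing import List, Tuple


def merge_phage_genes(phage_idx: List[int], max_gap: int = 10) -> List[Tuple[int, int]]:
    """Group gene indices flagged as phage into (first, last) regions.

    Hypothetical helper: two flagged genes stay in the same region when at
    most `max_gap` non-phage genes lie between them (assumed reading of -g).
    """
    regions: List[Tuple[int, int]] = []
    for idx in sorted(phage_idx):
        if regions and idx - regions[-1][1] - 1 <= max_gap:
            # Few enough non-phage genes in between: extend the current region.
            regions[-1] = (regions[-1][0], idx)
        else:
            # First gene, or the gap is too large: start a new region.
            regions.append((idx, idx))
    return regions


flagged = [3, 4, 5, 8, 30, 31]
print(merge_phage_genes(flagged, max_gap=10))  # two separate regions
print(merge_phage_genes(flagged, max_gap=25))  # one merged region
```

Under these assumptions, raising the gap tolerance merges neighbouring predictions into fewer, longer regions, while lowering it splits them.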
Filtering Parameters
- -u MIN_CONTIG_SIZE, --min_contig_size MIN_CONTIG_SIZE
Minimum contig size (in base pairs) to be included in the analysis. Smaller contigs will be dropped. Default: 5000
Metrics and Features
- --metrics METRICS [METRICS ...]
The set of metrics to consider during classification. If not set, all metrics will be used.
Available metrics:
orf_length_med: Median ORF length
shannon_slope: Slope of Shannon’s diversity of k-mers
at_skew: Normalized AT skew
gc_skew: Normalized GC skew
max_direction: Maximum number of genes in the same direction
Examples:
PhiSpy.py --metrics shannon_slope gc_skew genome.gb -o results
PhiSpy.py --metrics orf_length_med shannon_slope at_skew genome.gb -o results
- --expand_slope
Use the product of the slope of the Shannon scores in making test sets. This can improve predictions in some cases.
- --kmers_type {all,codon,simple}
Type of k-mers used for calculating Shannon scores. Default: all
all: All possible k-mers
codon: k-mers with step of 3 nucleotides
simple: Simple slicing from the first position
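The at_skew and gc_skew metrics listed above are named but not defined on this page. As background, the conventional definition of GC skew is (G - C) / (G + C), and AT skew is the analogous ratio for A and T. The sketch below computes those conventional values; how PhiSpy normalizes them is not stated here, so the functions skew, gc_skew, and at_skew are hypothetical helpers and not PhiSpy's own code.

```python
def skew(seq: str, first: str, second: str) -> float:
    """Return (first - second) / (first + second) over base counts in seq."""
    seq = seq.upper()
    a, b = seq.count(first), seq.count(second)
    # Avoid division by zero when neither base occurs.
    return (a - b) / (a + b) if (a + b) else 0.0


def gc_skew(seq: str) -> float:
    """Conventional GC skew: (G - C) / (G + C)."""
    return skew(seq, "G", "C")


def at_skew(seq: str) -> float:
    """Conventional AT skew: (A - T) / (A + T)."""
    return skew(seq, "A", "T")


print(gc_skew("GGGC"), at_skew("AATT"))
```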
Random Forest Parameters
- -r RANDOMFOREST_TREES, --randomforest_trees RANDOMFOREST_TREES
Number of trees generated by Random Forest classifier. Default: 500
HMM Search Parameters
- --phmms PHMMS
Phage HMM profile database (like pVOGs or VOGdb) to search against the genome. The HMM search results are used as an additional feature to identify prophages.
Example:
PhiSpy.py genome.gb -o results --phmms pVOGs.hmm --threads 4
- --threads THREADS
Number of threads to use for the pHMM search and the random forest classifier. Default: 2
- --skip_search
When rerunning PhiSpy on the same input data with the --phmms option, you can skip the search step using this flag.
Annotation Parameters
- --include_annotations
Use the annotations in the GenBank file for phage predictions. This is the default behavior.
- --ignore_annotations
Ignore the annotations in the GenBank file during predictions. Only use compositional and structural features.
- --color
If set, CDSs with pHMM hits will be colored in the output GenBank file for viewing in Artemis genome browser.
Output Control
- --output_choice OUTPUT_CHOICE
Sum of codes for files to output. Default: 3 (prophage_coordinates.tsv + GenBank output)
Output codes:
Code | File
--- | ---
1 | prophage_coordinates.tsv
2 | GenBank format output
4 | prophage and bacterial sequences
8 | prophage_information.tsv
16 | prophage.tsv
32 | GFF3 format output of just the prophages
64 | prophage.tbl
128 | test data used in the random forest
256 | GFF3 format output for annotated genomic contigs
Add the codes together to get multiple outputs. For example:
--output_choice 3 (default): prophage_coordinates.tsv + GenBank output
--output_choice 11 (1+2+8): coordinates, GenBank, and information files
--output_choice 512: All output files
See the Output Files page for details on each file.
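Because each output code is a distinct power of two, an --output_choice value can be built and decoded with simple bit operations. The following Python sketch shows the arithmetic. The names OUTPUT_CODES, encode, and decode are hypothetical helpers, not part of PhiSpy, and the handling of 512 simply follows this page's statement that 512 selects all output files.

```python
from typing import Iterable, List

# Codes and descriptions copied from the table on this page.
OUTPUT_CODES = {
    1: "prophage_coordinates.tsv",
    2: "GenBank format output",
    4: "prophage and bacterial sequences",
    8: "prophage_information.tsv",
    16: "prophage.tsv",
    32: "GFF3 format output of just the prophages",
    64: "prophage.tbl",
    128: "test data used in the random forest",
    256: "GFF3 format output for annotated genomic contigs",
}


def encode(codes: Iterable[int]) -> int:
    """Combine individual codes into one --output_choice value."""
    total = 0
    for code in codes:
        if code not in OUTPUT_CODES:
            raise ValueError(f"unknown output code: {code}")
        total |= code  # bitwise OR, so repeated codes are not double counted
    return total


def decode(choice: int) -> List[str]:
    """List the outputs selected by an --output_choice value."""
    if choice == 512:
        # Per this page, 512 requests every output file.
        return list(OUTPUT_CODES.values())
    return [name for code, name in OUTPUT_CODES.items() if choice & code]


print(encode([1, 2, 8]))
print(decode(3))
```

For instance, encode([1, 2, 8]) yields the value 11 used in the example above, and decode(3) lists the two default outputs.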
Repeat Finding
- --include_all_repeats
Include all repeats in the GenBank output, not just the best ones.
- --extra_dna EXTRA_DNA
Additional DNA (in base pairs) flanking the predicted prophage to test for repeats. Default: 2000
- --min_repeat_len MIN_REPEAT_LEN
Minimum repeat length to search for. Default: 10
Evaluation and Debugging
- -e [EVALUATE], --evaluate [EVALUATE]
Run in evaluation mode. Does not generate new data, but reruns the evaluation. Useful for testing different parameters.
- --keep_dropped_predictions
Write regions that might be prophages but were filtered out to the output files. Useful for debugging.
- -k [KEEP], --keep [KEEP]
Do not delete temporary files. Useful for debugging.
Logging
- --log LOG
Name of the log file to write details to. Default: phispy.log
- --quiet
Run in quiet mode with minimal output to the console.
Examples
Basic prophage prediction:
PhiSpy.py genome.gb -o results
With specific training set:
PhiSpy.py genome.gb -o results -t data/trainSet_Streptococcus.txt
With HMM search and multiple threads:
PhiSpy.py genome.gb -o results --phmms pVOGs.hmm --threads 8 --color
Relaxed prophage calling (more sensitive):
PhiSpy.py genome.gb -o results --phage_genes 0
Strict prophage calling (more specific):
PhiSpy.py genome.gb -o results --phage_genes 5
Get all output files:
PhiSpy.py genome.gb -o results --output_choice 512
Using specific metrics only:
PhiSpy.py genome.gb -o results --metrics shannon_slope gc_skew
Multiple genomes with prefixes:
PhiSpy.py genome1.gb -o results -p genome1_
PhiSpy.py genome2.gb -o results -p genome2_
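When many genomes share one output directory, the per-genome commands can be generated rather than typed by hand. This Python sketch builds the same command lines as the example above, deriving each prefix from the input file name. The helper phispy_command is hypothetical, and the sketch only prints the commands; it assumes PhiSpy.py is on your PATH if you choose to run them.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def phispy_command(genome: str, output_dir: str, extra: Sequence[str] = ()) -> List[str]:
    """Build a PhiSpy argument list with a prefix derived from the file name."""
    prefix = f"{Path(genome).stem}_"  # genome1.gb -> genome1_
    return ["PhiSpy.py", genome, "-o", output_dir, "-p", prefix, *extra]


for genome in ["genome1.gb", "genome2.gb"]:
    # Print a shell-safe command line; nothing is executed here.
    print(shlex.join(phispy_command(genome, "results")))
```

To execute a command instead of printing it, the argument list can be passed to subprocess.run(cmd, check=True) from the Python standard library.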