Tips and Tricks
This page contains helpful tips, troubleshooting advice, and solutions to common problems.
PATH Issues
Command Not Found
If you get this error:
$ PhiSpy.py -v
-bash: PhiSpy.py: command not found
Solution 1: Use full path
~/.local/bin/PhiSpy.py -v
Solution 2: Add to PATH (recommended)
echo "export PATH=\$HOME/.local/bin:\$PATH" >> ~/.bashrc
source ~/.bashrc
PhiSpy.py -v
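Running the echo line in Solution 2 more than once appends a duplicate entry to ~/.bashrc each time. The following is a minimal sketch of a guard, assuming a bash-like shell; the helper names path_contains and suggest_path_line are hypothetical and not part of PhiSpy.

```shell
# path_contains DIR: succeed if DIR is already an entry in $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Print the export line only when ~/.local/bin is missing from PATH.
# Append it to your shell startup file yourself once you are happy with it:
#   suggest_path_line >> ~/.bashrc
suggest_path_line() {
    if ! path_contains "$HOME/.local/bin"; then
        echo 'export PATH=$HOME/.local/bin:$PATH'
    fi
}
```

If suggest_path_line prints nothing, the directory is already on your PATH and the "command not found" error has another cause.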
Installation Tips
Quick Installation
The simplest installation (requires sudo):
sudo apt install -y python3-pip
python3 -m pip install --user phispy
Note: On Debian and Ubuntu, python3-pip lists build-essential and python3-dev as recommended packages, and apt installs recommended packages by default.
Verify Installation
Check that PhiSpy is installed correctly:
PhiSpy.py --version
PhiSpy.py --list short
Running PhiSpy
Handling Large Genomes
For large or draft genomes:
Filter small contigs:
PhiSpy.py genome.gb -o results --min_contig_size 10000
Process contigs separately if memory is an issue
Use appropriate resources:
Memory: ~2-4 GB per genome typically
CPU: Use --threads for HMM searches
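Before choosing a value for --min_contig_size, it can help to see how many contigs would survive the cutoff. This sketch reads contig names and lengths from the LOCUS lines of an uncompressed GenBank file. It assumes the usual whitespace-separated layout (LOCUS, name, length, "bp"), which can break when a very long contig name runs into the length field. The function names are hypothetical.

```shell
# contig_sizes FILE: print "name<TAB>length" for every record in a GenBank file.
# Assumes the standard LOCUS line layout: LOCUS <name> <length> bp ...
contig_sizes() {
    awk '$1 == "LOCUS" { print $2 "\t" $3 }' "$1"
}

# count_contigs_at_least FILE MIN: how many contigs are MIN bp or longer.
count_contigs_at_least() {
    contig_sizes "$1" |
        awk -F '\t' -v min="$2" '$2 + 0 >= min + 0 { n++ } END { print n + 0 }'
}

# Example (not run here):
#   count_contigs_at_least genome.gb 10000
```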
Working with Draft Genomes
Draft genomes often have prophages spanning contig breaks.
Best practices:
Pay attention to prophages at contig ends
Consider using assembly tools to close gaps
Review contig boundaries manually
Expect some fragmented predictions
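To find predictions that sit close to a contig end, the coordinates file can be joined against a table of contig lengths. This is a sketch under several assumptions: the first four columns of prophage_coordinates.tsv are prophage, contig, start, and stop; the second input is a two-column TSV of contig name and length that you prepare yourself; and the 5000 bp default margin is an arbitrary choice, not a PhiSpy setting.

```shell
# near_contig_ends COORDS SIZES [MARGIN]
#   COORDS: prophage_coordinates.tsv (columns: prophage, contig, start, stop, ...)
#   SIZES:  two-column TSV of contig name and length (hypothetical helper file)
#   MARGIN: distance in bp that counts as "near an end" (default 5000, arbitrary)
near_contig_ends() {
    awk -F '\t' -v margin="${3:-5000}" '
        NR == FNR { len[$1] = $2; next }   # first file: remember contig lengths
        !($2 in len) { next }              # skip rows on contigs of unknown length
        {
            lo = ($3 < $4) ? $3 : $4
            hi = ($3 < $4) ? $4 : $3
            if (lo - 1 <= margin || len[$2] - hi <= margin)
                print $1 "\t" $2 "\t" lo "\t" hi
        }
    ' "$2" "$1"
}

# Example (not run here):
#   near_contig_ends results/prophage_coordinates.tsv contig_sizes.tsv 2000
```

Predictions reported here are the ones most likely to be truncated by a contig break and are worth reviewing by hand.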
Gzip Files
PhiSpy handles gzip automatically:
# These all work
PhiSpy.py genome.gb -o results
PhiSpy.py genome.gb.gz -o results
# Output matches input format
# If input is .gz, output files are also .gz
Multiple Genomes
Process multiple genomes efficiently:
Method 1: Use file prefixes
for genome in *.gb; do
name=$(basename "$genome" .gb)
PhiSpy.py "$genome" -o results -p "${name}_"
done
Method 2: Separate output directories
for genome in *.gb; do
name=$(basename "$genome" .gb)
PhiSpy.py "$genome" -o "results_${name}"
done
Parameter Tuning
Finding the Right Settings
Start with defaults and adjust based on results:
# 1. Run with defaults
PhiSpy.py genome.gb -o results_default
# 2. Review output
less results_default/prophage_coordinates.tsv
# 3. Adjust if needed
# Too many predictions:
PhiSpy.py genome.gb -o results_strict --phage_genes 5
# Too few predictions:
PhiSpy.py genome.gb -o results_relaxed --phage_genes 0
Systematic Testing
Test multiple parameter combinations:
for pg in 0 1 3 5; do
PhiSpy.py genome.gb -o results_pg${pg} --phage_genes $pg
done
# Compare results
wc -l results_pg*/prophage_coordinates.tsv
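Line counts alone do not show whether a stricter setting removed small regions or large ones. As a rough sketch, the total predicted length per run can be reported next to the count. It assumes start and stop are the third and fourth tab-separated columns of prophage_coordinates.tsv; summarise_run is a hypothetical helper.

```shell
# summarise_run FILE: print "<count><TAB><total_bp>" for one prophage_coordinates.tsv.
summarise_run() {
    awk -F '\t' '
        $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {      # ignore any non-numeric line
            n++
            total += ($4 >= $3 ? $4 - $3 : $3 - $4) + 1
        }
        END { printf "%d\t%d\n", n, total }
    ' "$1"
}

# Example (not run here):
#   for d in results_pg*/; do
#       printf '%s\t' "$d"; summarise_run "$d/prophage_coordinates.tsv"
#   done
```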
Using Jupyter Notebook
The interactive notebook is excellent for parameter exploration:
Clone the repository
Install Jupyter:
pip install jupyter
Run:
jupyter notebook jupyter_notebooks/PhiSpy.ipynb
Adjust parameters interactively
Performance
Speed Up Processing
Use multiple threads:
PhiSpy.py genome.gb -o results --threads 8
Skip HMM search on reruns:
PhiSpy.py genome.gb -o results --phmms pVOGs.hmm --skip_search
Use appropriate training sets (faster than generic)
Filter small contigs (reduces processing time)
Parallel Processing
Process multiple genomes in parallel:
# Using GNU parallel
parallel -j 4 'PhiSpy.py {} -o results/{/.} --threads 2' ::: genomes/*.gb
# Or simple background jobs
for genome in genomes/*.gb; do
name=$(basename "$genome" .gb)
PhiSpy.py "$genome" -o "results_${name}" --threads 2 &
done
wait
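The background-job loop starts every genome at once, which can exhaust memory on a large collection. If GNU parallel is not installed, xargs can cap the number of simultaneous jobs. This sketch relies on the -P option, which GNU and BSD xargs provide but POSIX does not require; run_limited is a hypothetical helper.

```shell
# run_limited JOBS CMD...: read one input path per line on stdin and run
# "CMD... <path>" for each, keeping at most JOBS processes running at once.
run_limited() {
    local max_jobs=$1
    shift
    xargs -P "$max_jobs" -I '{}' "$@" '{}'
}

# Example (not run here): 4 genomes at a time, 2 threads each.
#   printf '%s\n' genomes/*.gb |
#       run_limited 4 sh -c 'PhiSpy.py "$1" -o "results_$(basename "$1" .gb)" --threads 2' _
```

Because xargs still interprets quotes and backslashes in its input, this form is only safe for file names without those characters.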
Debugging
Verbose Output
For detailed logging:
PhiSpy.py genome.gb -o results --log phispy_detailed.log
Review the log file for:
Which metrics were used
Training set loaded
Number of genes/contigs processed
Decisions made during prediction
Keep Temporary Files
Preserve intermediate files for inspection:
PhiSpy.py genome.gb -o results --keep
This keeps all temporary files for debugging.
Evaluation Mode
Re-run evaluation without reprocessing:
PhiSpy.py genome.gb -o results --evaluate
Useful when testing different thresholds.
Common Issues
No Prophages Found
Possible causes:
Genome truly has no prophages
Parameters too strict
Poor annotation quality
Contigs too small
Solutions:
# Try relaxed parameters
PhiSpy.py genome.gb -o results --phage_genes 0
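# Check that the file has CDS annotations
# (feature keys start in column 6, so this skips text such as /product="CDS ...")
grep -c "^     CDS " genome.gb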
# Verify contig sizes
PhiSpy.py genome.gb -o results --min_contig_size 1000
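A whole-file CDS count can hide contigs that carry no annotation at all. This sketch reports the number of CDS features per contig for an uncompressed GenBank file. It assumes the standard layout, with feature keys in column 6 and the contig name as the second field of the LOCUS line; cds_per_contig is a hypothetical helper.

```shell
# cds_per_contig FILE: print "contig<TAB>number_of_CDS_features".
# A CDS feature line starts with exactly five spaces, then "CDS" and a space.
cds_per_contig() {
    awk '
        $1 == "LOCUS" { if (name != "") print name "\t" n; name = $2; n = 0 }
        /^     CDS /  { n++ }
        END           { if (name != "") print name "\t" n }
    ' "$1"
}

# Example (not run here): list contigs without any CDS features.
#   cds_per_contig genome.gb | awk -F '\t' '$2 == 0'
```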
Too Many False Positives
Symptoms:
Many small predictions
Ribosomal operons called as prophages
Pathogenicity islands included
Solutions:
# Increase stringency
PhiSpy.py genome.gb -o results --phage_genes 5
# Use specific metrics
PhiSpy.py genome.gb -o results --metrics shannon_slope gc_skew
# Use HMM database
PhiSpy.py genome.gb -o results --phmms pVOGs.hmm
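When the main symptom is many small predictions, an existing coordinates file can also be filtered by length without rerunning PhiSpy. This is a post-processing sketch, not a PhiSpy feature. It assumes start and stop are the third and fourth tab-separated columns; the helper name and the 10000 bp cutoff in the example are illustrative.

```shell
# filter_min_length FILE MIN: print rows whose span (stop - start + 1) is at least MIN bp.
filter_min_length() {
    awk -F '\t' -v min="$2" '
        $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
            span = ($4 >= $3 ? $4 - $3 : $3 - $4) + 1
            if (span >= min + 0) print
        }
    ' "$1"
}

# Example (not run here):
#   filter_min_length results/prophage_coordinates.tsv 10000 > long_prophages.tsv
```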
Fragmented Predictions
Symptom: One prophage split into multiple predictions.
Solution:
# Allow more non-phage genes between regions
PhiSpy.py genome.gb -o results --nonprophage_genegaps 20
Memory Issues
For very large genomes:
Filter contigs:
PhiSpy.py genome.gb -o results --min_contig_size 10000
Reduce threads:
PhiSpy.py genome.gb -o results --threads 1
Process on high-memory machine
Split genome into smaller files
File Format Issues
GenBank Format Requirements
PhiSpy expects:
Valid GenBank format
CDS features with locations
DNA sequence included
Verify format:
# Check for required elements (each count should be at least 1)
grep -c "^LOCUS" genome.gb
grep -c "^     CDS " genome.gb
grep -c "^ORIGIN" genome.gb
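For a batch of files, the same three checks can be wrapped in a function that reports what is missing and returns a non-zero status. This is a sketch for uncompressed files only. It tests for the presence of the elements, not for full GenBank validity, and check_genbank is a hypothetical helper.

```shell
# check_genbank FILE: report which required elements are absent.
# Returns 0 when LOCUS, CDS features, and ORIGIN are all present, 1 otherwise.
check_genbank() {
    local file=$1 status=0
    grep -q '^LOCUS '    "$file" || { echo "missing: LOCUS line"; status=1; }
    grep -q '^     CDS ' "$file" || { echo "missing: CDS features"; status=1; }
    grep -q '^ORIGIN'    "$file" || { echo "missing: ORIGIN (sequence) section"; status=1; }
    return "$status"
}

# Example (not run here):
#   for genome in *.gb; do
#       check_genbank "$genome" > /dev/null || echo "check $genome"
#   done
```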
Converting Formats
From GFF + FASTA to GenBank:
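# Using Biopython plus the bcbio-gff package (pip install bcbio-gff)
python -c "from BCBio import GFF; from Bio import SeqIO; \
seqs = SeqIO.to_dict(SeqIO.parse('genome.fasta', 'fasta')); \
recs = list(GFF.parse('genome.gff', base_dict=seqs)); \
[r.annotations.setdefault('molecule_type', 'DNA') for r in recs]; \
SeqIO.write(recs, 'genome.gb', 'genbank')"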
From EMBL to GenBank:
# Using BioPython
python -c "from Bio import SeqIO; \
SeqIO.convert('genome.embl', 'embl', 'genome.gb', 'genbank')"
Quality Control
Validating Predictions
Always validate key predictions:
BLAST search prophage genes
Check for att sites (strong evidence)
Verify integration sites (often at tRNAs)
Compare to known prophages in related species
Check GC content (often different from host)
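The GC check can be done with standard tools once the prophage region and the host genome are available as FASTA files. This sketch computes the GC percentage over unambiguous bases only. The function name and the file names in the example are hypothetical, and how you extract the prophage sequence is up to you.

```shell
# gc_percent FASTA: print the GC percentage across all sequences in a FASTA file.
# Only A, C, G, and T are counted; N and other ambiguity codes are ignored.
gc_percent() {
    awk '
        /^>/ { next }                             # skip header lines
        {
            seq = toupper($0)
            total += gsub(/[ACGT]/, "&", seq)     # gsub returns the match count
            gc    += gsub(/[GC]/, "&", seq)
        }
        END {
            if (total > 0) printf "%.1f\n", 100 * gc / total
            else print "NA"
        }
    ' "$1"
}

# Example (not run here):
#   gc_percent prophage_1.fasta
#   gc_percent host_genome.fasta
```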
Comparing Versions
When testing parameters:
# Generate comparable outputs
PhiSpy.py genome.gb -o v1 --phage_genes 1
PhiSpy.py genome.gb -o v2 --phage_genes 3
# Compare
diff v1/prophage_coordinates.tsv v2/prophage_coordinates.tsv
Batch Analysis
Create Summary Statistics
# Count prophages per genome
for dir in results_*/; do
count=$(wc -l < "$dir/prophage_coordinates.tsv")
echo "$(basename "$dir"): $count prophages"
done
Merge Results
# Combine all prophage coordinates
echo -e "Genome\tProphage\tContig\tStart\tStop" > all_prophages.tsv
for dir in results_*/; do
genome=$(basename "$dir")
awk -v g="$genome" '{print g"\t"$0}' "$dir/prophage_coordinates.tsv" >> all_prophages.tsv
done
Best Practices Summary
Start simple: Use default parameters first
Review carefully: Always manually inspect results
Use HMM databases: Improves accuracy significantly
Choose appropriate training sets: Use organism-specific when available
Document parameters: Record what you used for reproducibility
Validate predictions: Use independent evidence
Iterate: Adjust parameters based on results
Keep logs: They’re invaluable for troubleshooting
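One lightweight way to document parameters is to record each command line before running it. This is a sketch; run_logged is a hypothetical wrapper, not a PhiSpy feature. Joining the arguments with spaces drops their original quoting, so treat the record as a note for humans, not a replayable script.

```shell
# run_logged LOGFILE CMD...: append a UTC timestamp and the command line to
# LOGFILE, then run the command unchanged.
run_logged() {
    local logfile=$1
    shift
    printf '%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$logfile"
    "$@"
}

# Example (not run here):
#   run_logged commands.log PhiSpy.py genome.gb -o results --phage_genes 3
```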
Getting Help
If you encounter issues:
Check this documentation thoroughly
Review the log file for error messages
Search GitHub issues: https://github.com/linsalrob/PhiSpy/issues
Open a new issue if your problem is novel:
Include PhiSpy version
Describe the problem clearly
Provide example data if possible
Include command used and error message
Join the community: Participate in discussions on GitHub
Useful Resources
GitHub Repository: https://github.com/linsalrob/PhiSpy
Original Paper: https://doi.org/10.1093/nar/gks406
RAST Annotation: http://rast.nmpdr.org/
PROKKA Annotation: https://github.com/tseemann/prokka
pVOG Database: http://dmk-brain.ecn.uiowa.edu/pVOGs
VOGdb Database: http://vogdb.org/
Artemis Genome Browser: https://sanger-pathogens.github.io/Artemis/