6. Genome annotation
As mentioned during the whole workshop, also here there are many options, however, in the last few years a group of smart people put a nice package together specifically for analyzing fungal genomes named funannotate (https://funannotate.readthedocs.io/).
Genome annotation benefits from RNAseq results but this is not necessary. We can still get good predictions of genes with just a genome assembly.
Masking repeats
Funannotate has various steps and the first one is to softmask the repeat sections in the genome (it will basically put the repeat sections in the fasta file in small letters and the rest in capital letters):
funannotate mask -i PATH/TO/ASSEMBLY --cpus 12 -o PATH/TO/MASKED_ASSEMBLY.fasta
Gene prediction
Now the repeats are masked we can predict genes in the assembly. Funannotate will use AUGUSTUS for this which has pre-trained models for a range of species. We will have to select the species closest to the species we are analyzing. To check the species available we can call up a table with funannotate species
Once we have selected the species closest to our sample we can perform the prediction, e.g., coprinus:
funannotate predict -i PATH/TO/MASKED_ASSEMBLY.fasta -o PATH/TO/OUTPUT --species "YOUR SPECIES NAME" --strain OPTIONAL --busco_seed_species coprinus --cpus 12
Gene identification
This will create a set of gene models from our assembly that we can use to compare to various databases. Using funannotate we will compare the gene models with interproscan, eggnog, antismash, and phobius. We will start with interproscan:
#run using docker
funannotate iprscan -i PATH/TO/PREDICT_OUTPUT/predict_results/ -m docker --cpus 12
#run locally (Linux only)
funannotate iprscan -i PATH/TO/PREDICT_OUTPUT/predict_results/ -m local --iprscan_path /my/path/to/interproscan.sh -o OUTPUT
Next we will analyse the data with antismash and phobius. If they are not installed locally, we can do both at the same time remotely:
funannotate remote -i PATH/TO/PREDICT_OUTPUT -m phobius antismash -e your-email@domain.edu
NOTE: This doesn’t work properly and is quite slow. We should have both
phobiusandantismashinstalled on the server, so we can run it locally.
NOTE: THE BELOW PART IT STILL IN PROGRESS!
For phobius locally:
phobius -short -o annotate_misc/phobius.results.txt -l logfiles/phobius.log PATH/TO/annotate_misc/genome.proteins.fasta
For antismash locally:
antismash -t fungi -c 64 --databases PATH/TO/ANTISMASH/DATABASES --output-dir PATH/TO/OUTPUT --output-basename PREFIX PATH/TO/PREDICT_OUTPUT.gbk
the last input file could also be PATH/TO/PREDICT_OUT.fasta.
If funannotate predict generates a gff3 file you can use the extra option --genefinding-gff3 PATH/TO/GFF3-FILE.
Finally, we can combine all the analyses and do the actual annotation. If eggnog is installed locally, it will automatically add this analysis as well:
funannotate annotate -i PATH/TO/PREDICT_OUTPUT --cpus 64
Alternatively, when running the antismash and phobius locally:
funannotate annotate -i PATH/TO/PREDICT_OUTPUT --antismash PATH/TO/ANTISMASH_RESULTS --cpus 64