# 6. Genome annotation

As with every other step in this workshop, there are many options here too. In the last few years, however, a group of smart people has put together a nice package specifically for annotating fungal genomes, named funannotate (https://funannotate.readthedocs.io/). Genome annotation benefits from RNA-seq data, but this is not required: we can still get good gene predictions from a genome assembly alone.

## Masking repeats

Funannotate works in several steps. The first is to softmask the repeats in the genome: repeat regions in the FASTA file are written in lowercase letters and everything else in uppercase letters:

```
funannotate mask -i PATH/TO/ASSEMBLY --cpus 12 -o PATH/TO/MASKED_ASSEMBLY.fasta
```

## Gene prediction

Now that the repeats are masked, we can predict genes in the assembly. Funannotate uses AUGUSTUS for this, which has pre-trained models for a range of species. We have to select the species closest to the one we are analysing. To list the available species, we can call up a table with `funannotate species`.

Once we have selected the species closest to our sample (e.g., `coprinus`), we can run the prediction:

```
funannotate predict -i PATH/TO/MASKED_ASSEMBLY.fasta -o PATH/TO/OUTPUT --species "YOUR SPECIES NAME" --strain OPTIONAL --busco_seed_species coprinus --cpus 12
```

## Gene identification

The prediction step creates a set of gene models from our assembly, which we can now compare against various databases. Using `funannotate`, we will run the gene models through `interproscan`, `eggnog`, `antismash`, and `phobius`. We will start with `interproscan`:

```
#run using docker
funannotate iprscan -i PATH/TO/PREDICT_OUTPUT/predict_results/ -m docker --cpus 12

#run locally (Linux only)
funannotate iprscan -i PATH/TO/PREDICT_OUTPUT/predict_results/ -m local --iprscan_path /my/path/to/interproscan.sh -o OUTPUT
```

Next we will analyse the data with `antismash` and `phobius`.
If they are not installed locally, we can run both remotely in a single command:

```
funannotate remote -i PATH/TO/PREDICT_OUTPUT -m phobius antismash -e your-email@domain.edu
```

>**NOTE:**
>This doesn't work properly and is quite slow. We should have both `phobius` and `antismash` installed on the server, so that we can run them locally.

>**NOTE:**
>The part below is still in progress!

For `phobius` locally:

```
phobius -short -o annotate_misc/phobius.results.txt -l logfiles/phobius.log PATH/TO/annotate_misc/genome.proteins.fasta
```

For `antismash` locally:

```
antismash -t fungi -c 64 --databases PATH/TO/ANTISMASH/DATABASES --output-dir PATH/TO/OUTPUT --output-basename PREFIX PATH/TO/PREDICT_OUTPUT.gbk
```

The last input file could also be `PATH/TO/PREDICT_OUT.fasta`. If `funannotate predict` generated a GFF3 file, you can pass it with the extra option `--genefinding-gff3 PATH/TO/GFF3-FILE`.

Finally, we can combine all the analyses and do the actual annotation. If `eggnog` is installed locally, this analysis is added automatically as well:

```
funannotate annotate -i PATH/TO/PREDICT_OUTPUT --cpus 64
```

Alternatively, when `antismash` and `phobius` were run locally:

```
funannotate annotate -i PATH/TO/PREDICT_OUTPUT --antismash PATH/TO/ANTISMASH_RESULTS --cpus 64
```
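
As a quick sanity check on the masking step at the start of this section, it can help to see how much of the assembly actually ended up in lowercase. The sketch below is not part of funannotate; `softmask_fraction` is a hypothetical helper name, and it assumes a plain (uncompressed) FASTA file in which softmasked bases are lowercase, as described above.

```shell
# Report what fraction of an assembly is softmasked (lowercase bases).
# Hypothetical helper, not a funannotate command.
# Usage: softmask_fraction PATH/TO/MASKED_ASSEMBLY.fasta
softmask_fraction() {
    fasta="$1"
    # Drop the header lines, then strip line endings so only sequence characters are counted.
    total=$(grep -v '^>' "$fasta" | tr -d '\n\r' | wc -c)
    # Keep only lowercase letters (tr -cd deletes everything else) and count them.
    masked=$(grep -v '^>' "$fasta" | tr -cd '[:lower:]' | wc -c)
    # awk does the floating-point division; bail out if the file holds no sequence.
    awk -v m="$masked" -v t="$total" 'BEGIN {
        if (t == 0) { print "no sequence found"; exit 1 }
        printf "%d of %d bases masked (%.2f%%)\n", m, t, 100 * m / t
    }'
}
```

If this reports 0 bases masked, the file passed to `funannotate predict` is probably the unmasked assembly rather than the output of `funannotate mask`.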