Reference Data

We keep a number of reference data sets on Katana under /data/ so that we don’t end up with, for instance, 150 copies of the human genome in users’ home directories.

As these are reference data, they don’t change often and we can update them as necessary.

Katana Data Sets

| Directory | Description | Update Schedule | URL |
|---|---|---|---|
| annovar | Reference datasets that come with the software installation. | Installed when software is installed. | annovar.openbioinformatics.org |
| antismash | Reference files and commands for antismash version 4.2.0. | Version-specific database installed when software is installed. | antismash.secondarymetabolites.org |
| blast | NCBI nr, nt, refseq_genomic and refseq datasets. | Updated on release, 6 times a year. | www.ncbi.nlm.nih.gov/refseq |
| blastv5 | Version 5 of NCBI nr, nt, refseq_genomic and refseq datasets. | Updated on release, 6 times a year. | www.ncbi.nlm.nih.gov/refseq |
| diamond | Diamond reference databases for versions 0.8.38, 0.9.10, 0.9.22 and 0.9.24. The database format changes periodically. | Updated when NCBI nr databases are updated. | ab.inf.uni-tuebingen.de/software/diamond |
| gtdbtk | | Version-specific database installed when software is installed. | |
| gtex | Genotype-Tissue Expression project, a comprehensive resource for studying tissue-specific gene expression and regulation. | Please contact the Oates lab (e.oates@unsw.edu.au) for access to a large set of GTEx data that is not publicly available. | https://gtexportal.org/home/ |
| hapcol | Reference datasets that come with the software installation. | Installed when software is installed. | hapcol.algolab.eu |
| hg19 | Human reference genome hg19 (GRCh37). | Fixed reference. Never updated. | www.ncbi.nlm.nih.gov/grc |
| interproscan | Reference datasets for InterProScan versions 5.20-59.0 and 5.35-74.0. | Version-specific database installed when software is installed. | www.ebi.ac.uk/interpro |
| itasser | Reference datasets for I-TASSER plus a link to the current nr database. | Version-specific databases installed when software is installed, plus a link to the nr database (see blast above). | zhanglab.ccmb.med.umich.edu/I-TASSER |
| kaiju | Reference databases for all versions of Kaiju. Same databases for all versions. | Databases installed when software is installed. | kaiju.binf.ku.dk |
| matam | Reference databases for all MATAM versions. | Version-specific database installed when software is installed. | github.com/bonsai-team/matam |
| megan | Reference databases for all MEGAN versions. | Version-specific database installed when software is installed. | ab.inf.uni-tuebingen.de/software/megan6 |
| repeatmasker | Reference datasets for RepeatMasker version 4.0.7. | Version-specific database installed when software is installed. | www.repeatmasker.org |
| sra | Sequence Read Archive, a repository of high-throughput sequencing data. | | https://www.ncbi.nlm.nih.gov/sra |
| trinotate | Reference databases for all versions of Trinotate. Same databases for all versions. | Databases installed when software is installed. | trinotate.github.io |
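
Before downloading a reference data set into your own space, it can help to check whether a shared copy already exists. The following is a minimal sketch, assuming each data set sits in a subdirectory of /data/ named as in the Directory column above. The function name `find_reference_dir` and the constant `DATA_ROOT` are hypothetical and used only for illustration; the layout inside each directory is not described here and may differ between data sets.

```python
from pathlib import Path
from typing import Optional

# Assumption: reference data sets live in subdirectories of /data/,
# named as in the Directory column (for example "hg19" or "blast").
DATA_ROOT = Path("/data")


def find_reference_dir(name: str, root: Path = DATA_ROOT) -> Optional[Path]:
    """Return the shared directory for a data set, or None if it is absent."""
    candidate = root / name
    # Only accept real directories; a plain file with the same name is ignored.
    return candidate if candidate.is_dir() else None


if __name__ == "__main__":
    shared = find_reference_dir("hg19")
    if shared is None:
        print("hg19 not found under /data/; ask before downloading a private copy")
    else:
        print(f"Using shared copy at {shared}")
```

Pointing your job scripts at the returned path, rather than copying the files, keeps a single copy of each data set on the cluster.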