
Reference Data

We keep a number of reference datasets on Katana under /data/ so that we don't, for instance, end up with 150 copies of the human genome in users' home directories.

Because these are reference datasets, they change infrequently, and we update them as necessary.
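If a workflow expects a reference dataset to appear inside your own project directory, a symbolic link gives it that path without making another copy. The Python sketch below assumes each dataset lives at `/data/<directory>`, using the directory names in the table; `link_reference` is a hypothetical helper for illustration, not a tool provided on Katana.

```python
from pathlib import Path

# Assumption: shared reference data lives at /data/<directory>, e.g. /data/hg19.
DATA_ROOT = Path("/data")


def link_reference(name: str, dest_dir: Path, root: Path = DATA_ROOT) -> Path:
    """Symlink a shared reference dataset into dest_dir instead of copying it.

    Hypothetical helper: returns the path of the new link.
    """
    source = root / name
    if not source.exists():
        raise FileNotFoundError(f"no reference dataset at {source}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    link = dest_dir / name
    # Refuse to overwrite anything already present in the working directory
    # (including a dangling symlink, which exists() alone would miss).
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    link.symlink_to(source, target_is_directory=source.is_dir())
    return link


# Example (not executed here):
# link_reference("hg19", Path.home() / "project" / "refs")
```

The link occupies essentially no space in your home directory and always points at the current shared copy, so it picks up updates to the reference data without any action on your part.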

| Directory | Description | Update schedule | URL |
|---|---|---|---|
| annovar | Reference datasets that come with the software installation. | Installed when the software is installed. | annovar.openbioinformatics.org |
| antismash | Reference files and commands for antiSMASH version 4.2.0. | Version-specific database installed when the software is installed. | antismash.secondarymetabolites.org |
| blast | NCBI nr, nt, refseq_genomic and refseq datasets. | Updated on release, 6 times a year. | www.ncbi.nlm.nih.gov/refseq |
| blastv5 | Version 5 of the NCBI nr, nt, refseq_genomic and refseq datasets. | Updated on release, 6 times a year. | www.ncbi.nlm.nih.gov/refseq |
| diamond | DIAMOND reference databases for versions 0.8.38, 0.9.10, 0.9.22 and 0.9.24 (the database format changes periodically). | Updated when the NCBI nr databases are updated. | ab.inf.uni-tuebingen.de/software/diamond |
| gtdbtk | Reference data for GTDB-Tk. | Version-specific database installed when the software is installed. | |
| gtex | Genotype-Tissue Expression (GTEx) project, a comprehensive resource for studying tissue-specific gene expression and regulation. Please contact the [Oates lab](mailto:e.oates@unsw.edu.au) for access to a large set of GTEx data that is not publicly available. | | https://gtexportal.org/home/ |
| hapcol | Reference datasets that come with the software installation. | Installed when the software is installed. | hapcol.algolab.eu |
| hg19 | Human reference genome hg19 (GRCh37). | Fixed reference; never updated. | www.ncbi.nlm.nih.gov/grc |
| interproscan | Reference datasets for InterProScan versions 5.20-59.0 and 5.35-74.0. | Version-specific database installed when the software is installed. | www.ebi.ac.uk/interpro |
| itasser | Reference datasets for I-TASSER, plus a link to the current nr database. | Version-specific databases installed when the software is installed, plus a link to the nr database (see blast above). | zhanglab.ccmb.med.umich.edu/I-TASSER |
| kaiju | Reference databases for all versions of Kaiju. | The same databases serve all versions; installed when the software is installed. | kaiju.binf.ku.dk |
| matam | Reference databases for all MATAM versions. | Version-specific database installed when the software is installed. | github.com/bonsai-team/matam |
| megan | Reference databases for all MEGAN versions. | Version-specific database installed when the software is installed. | ab.inf.uni-tuebingen.de/software/megan6 |
| repeatmasker | Reference datasets for RepeatMasker version 4.0.7. | Version-specific database installed when the software is installed. | www.repeatmasker.org |
| sra | Sequence Read Archive, a repository of high-throughput sequencing data. | | https://www.ncbi.nlm.nih.gov/sra |
| trinotate | Reference databases for all versions of Trinotate. | The same databases serve all versions; installed when the software is installed. | trinotate.github.io |
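BLAST+ looks up databases named with `-db` in the directory given by the `BLASTDB` environment variable, so a job can use the shared NCBI databases in place rather than copying them. The Python sketch below assumes the version 5 databases sit directly under `/data/blastv5` (for example as `nt.*` files); check the actual layout on Katana before relying on it. The function names are hypothetical, and `run_blastn` needs BLAST+ available on your `PATH`.

```python
import os
import subprocess
from pathlib import Path

# Assumption: version 5 NCBI databases live directly under /data/blastv5.
BLASTDB_DIR = Path("/data/blastv5")


def blastn_command(query: Path, out: Path, db: str = "nt") -> list[str]:
    """Build a blastn command that names the shared database, not a copy of it."""
    return ["blastn", "-query", str(query), "-db", db, "-out", str(out)]


def blast_env(db_dir: Path = BLASTDB_DIR) -> dict[str, str]:
    """Return a copy of the current environment with BLASTDB set to db_dir."""
    env = dict(os.environ)  # copy, so the calling process's environment is untouched
    env["BLASTDB"] = str(db_dir)
    return env


def run_blastn(query: Path, out: Path, db: str = "nt") -> None:
    """Run blastn against the shared database; raises CalledProcessError on failure."""
    subprocess.run(blastn_command(query, out, db), env=blast_env(), check=True)
```

Setting `BLASTDB` per process, rather than exporting it globally, keeps the choice between `/data/blast` and `/data/blastv5` explicit for each job.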