Sciverse

Best Bioinformatics Databases List

sciverse.in / 15 Jan 2026 05:12 PM

1. Genome & Sequence Databases

NCBI (GenBank / RefSeq / dbSNP)
Comprehensive resource for DNA and RNA sequence data.
Provides reference genomes, genetic variants, and annotations.
Widely used for genomics research and literature integration.

Website Link: NCBI
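
Records in these NCBI databases can also be retrieved programmatically through the Entrez E-utilities service. The sketch below only builds an efetch request URL and does not send it. The helper name efetch_url and the accession NM_000546 are illustrative choices, not part of the list above, and heavier use would normally also pass an API key and contact details as described in NCBI's usage guidelines.

```python
from urllib.parse import urlencode

# Base address of the NCBI Entrez E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build (but do not send) an efetch URL for a single record.

    efetch_url is a hypothetical helper written for this example.
    """
    params = {
        "db": db,            # Entrez database, e.g. nuccore or protein
        "id": accession,     # accession or UID of the record
        "rettype": rettype,  # requested record format
        "retmode": "text",   # plain-text response
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# NM_000546 is used purely as an example RefSeq accession.
print(efetch_url("NM_000546"))
```

The resulting URL can be opened in a browser or passed to any HTTP client to download the FASTA record.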

EMBL-EBI / ENA (European Nucleotide Archive)
European archive for nucleotide sequences and genome assemblies.
Stores raw sequencing data and annotated genomes.
Part of the International Nucleotide Sequence Database Collaboration.

Website Link: ENA

DDBJ (DNA Data Bank of Japan)
Japan’s primary nucleotide sequence database.
Synchronizes data with GenBank and ENA.
Supports large-scale sequencing submissions.

Website Link: DDBJ

CNGBdb (China National GeneBank Database)
Large-scale genomic and multi-omics data platform.
Provides cloud-based data analysis and visualization tools.
Supports global genomics research initiatives.

Website Link: CNGBdb

2. Protein Databases

UniProt (Swiss-Prot / TrEMBL)
Central resource for protein sequence and functional annotation.
Swiss-Prot contains manually curated protein data.
TrEMBL contains computationally annotated entries that have not been manually reviewed.

Website Link: UniProt
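
The Swiss-Prot / TrEMBL split is visible in UniProtKB FASTA headers, which begin with "sp" for Swiss-Prot entries and "tr" for TrEMBL entries. The following is a minimal sketch of reading that prefix. The function name parse_uniprot_header is hypothetical, and the second header in the example is a made-up placeholder rather than a real entry.

```python
# Prefixes used at the start of UniProtKB FASTA headers.
SECTIONS = {"sp": "Swiss-Prot", "tr": "TrEMBL"}


def parse_uniprot_header(header):
    """Parse a header of the form '>db|ACCESSION|ENTRY_NAME description'."""
    if not header.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    token = header[1:].split(None, 1)[0]  # text before the first whitespace
    parts = token.split("|")
    if len(parts) != 3 or parts[0] not in SECTIONS:
        raise ValueError(f"not a UniProtKB-style header: {header!r}")
    db, accession, entry_name = parts
    return {
        "section": SECTIONS[db],
        "reviewed": db == "sp",  # Swiss-Prot entries are manually reviewed
        "accession": accession,
        "entry_name": entry_name,
    }


headers = [
    ">sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53 PE=1 SV=4",
    ">tr|X0X0X0|X0X0X0_HUMAN Placeholder protein (made-up example)",
]
for h in headers:
    info = parse_uniprot_header(h)
    print(info["accession"], info["section"], info["reviewed"])
```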

PDB (Protein Data Bank)
Archive of three-dimensional protein and nucleic acid structures.
Structures are determined by X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy (cryo-EM).
Essential for structural biology and drug discovery.

Website Link: PDB
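
In the legacy PDB flat-file format, the experimental technique behind an entry is recorded on EXPDTA lines, for example "X-RAY DIFFRACTION", "SOLUTION NMR", or "ELECTRON MICROSCOPY". The sketch below extracts those values from file text. It assumes the legacy column layout (technique text starting at column 11), does not handle mmCIF files, and uses a hand-written snippet rather than a real entry.

```python
from typing import List


def experimental_methods(pdb_text: str) -> List[str]:
    """Return the techniques listed on EXPDTA records of a legacy PDB file."""
    methods = []
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            # Columns 1-6 hold the record name, 9-10 an optional
            # continuation number; the technique text starts at column 11.
            for method in line[10:].split(";"):
                method = method.strip()
                if method:
                    methods.append(method)
    return methods


# Hand-written snippet for illustration only, not copied from a real entry.
sample = (
    "HEADER    EXAMPLE ENTRY\n"
    "EXPDTA    X-RAY DIFFRACTION\n"
    "REMARK   2 RESOLUTION.    2.20 ANGSTROMS.\n"
)
print(experimental_methods(sample))
```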

InterPro
Database of protein families, domains, and functional sites.
Integrates multiple protein signature databases.
Helps predict protein function and structure.

Website Link: InterPro

SCOP / CATH
Protein classification systems based on structure and evolution.
SCOP relies mainly on manual, expert-driven structural classification.
CATH combines automated methods with manual curation in a hierarchical classification.

Website Link:
SCOP
CATH

3. Transcriptomics & RNA

GEO (Gene Expression Omnibus)
Repository for gene expression and functional genomics data.
Includes microarray and sequencing-based experiments.
Supports data reuse and meta-analysis.

Website Link: GEO

SRA (Sequence Read Archive)
Archive of raw high-throughput sequencing data.
Includes RNA-seq, WGS, metagenomics, and more.
Supports reproducibility by keeping the underlying reads available for reanalysis.

Website Link: SRA
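
Run accessions in the read archives carry a prefix that indicates which partner archive accepted the submission: SRR for NCBI SRA, ERR for ENA, and DRR for DDBJ. A minimal sketch of classifying an accession by that prefix follows. The function name archive_for_run is hypothetical, and the pattern assumes at least six digits after the prefix.

```python
import re

# Run-accession prefixes and the archive that issued them.
RUN_PREFIXES = {"SRR": "NCBI SRA", "ERR": "ENA", "DRR": "DDBJ"}

# Assumed shape: three-letter prefix followed by six or more digits.
RUN_PATTERN = re.compile(r"^(SRR|ERR|DRR)\d{6,}$")


def archive_for_run(accession: str) -> str:
    """Return the archive that issued a run accession, based on its prefix."""
    match = RUN_PATTERN.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a run accession: {accession!r}")
    return RUN_PREFIXES[match.group(1)]


for acc in ("SRR000001", "ERR000001", "DRR000001"):
    print(acc, "->", archive_for_run(acc))
```

Because the three archives exchange data, a run issued by one of them is generally retrievable from the others as well, so the prefix indicates origin rather than the only place to download from.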

Ensembl / GENCODE
Provides genome-wide gene and transcript annotations.
GENCODE focuses on high-quality human and mouse data.
Ensembl supports multiple species with genome browsers.

Website Link:
Ensembl
GENCODE

miRBase / RNAcentral
miRBase catalogs published microRNA sequences and their annotations.
RNAcentral aggregates non-coding RNA sequences.
Provides unified access to ncRNA resources.

Website Link:
miRBase
RNAcentral

4. Pathways & Function

KEGG (Kyoto Encyclopedia of Genes and Genomes)
Links genes with pathways, metabolites, and diseases.
Used for pathway mapping and functional analysis.
Supports systems biology research.

Website Link: KEGG

Gene Ontology (GO)
Standardized vocabulary for gene function annotation.
Covers biological processes, molecular functions, and cellular components.
Widely used in functional enrichment analysis.

Website Link: GO
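
GO annotations are commonly distributed as tab-separated GAF files, in which column 9 holds a one-letter aspect code: P for biological process, F for molecular function, and C for cellular component. The sketch below reads one such line. The function name parse_gaf_line is hypothetical, and the reference and date fields in the example line are placeholders rather than a real annotation record.

```python
# One-letter aspect codes used in column 9 of a GAF file.
ASPECTS = {
    "P": "biological_process",
    "F": "molecular_function",
    "C": "cellular_component",
}


def parse_gaf_line(line: str) -> dict:
    """Extract a few fields from one annotation line of a GAF file."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError("expected at least 15 tab-separated columns")
    return {
        "symbol": cols[2],           # column 3: gene product symbol
        "go_id": cols[4],            # column 5: GO term identifier
        "evidence": cols[6],         # column 7: evidence code
        "aspect": ASPECTS[cols[8]],  # column 9: P, F, or C
    }


# Illustrative line; the PMID and date are placeholders, not real values.
example = "\t".join([
    "UniProtKB", "P04637", "TP53", "enables", "GO:0003677",
    "PMID:0000000", "IDA", "", "F", "Cellular tumor antigen p53",
    "", "protein", "taxon:9606", "20000101", "UniProt", "", "",
])
print(parse_gaf_line(example))
```

Grouping annotations by the aspect field is a common first step before running an enrichment analysis on one of the three ontologies.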

Reactome / EcoCyc
Curated databases of biological and metabolic pathways.
Reactome focuses on human pathways, with pathways for other species inferred by orthology.
EcoCyc specializes in the genes and metabolism of Escherichia coli K-12.

Website Link:
Reactome
EcoCyc

5. Disease & Genetic Variation

OMIM (Online Mendelian Inheritance in Man)
Catalog of human genes and inherited disorders.
Provides detailed genotype–phenotype relationships.
Widely used in medical genetics.

Website Link: OMIM

ClinVar
Public archive of clinically relevant genetic variants.
Includes variant interpretations with supporting evidence.
Maintained by NCBI.

Website Link: ClinVar

COSMIC
Catalogue of somatic mutations in cancer.
Focuses on tumor samples and cancer genomics.
Widely used in oncology research.

Website Link: COSMIC

TCGA / HGMD
TCGA provides large-scale cancer genomics datasets.
HGMD catalogs germline mutations associated with human inherited disease.
Used in clinical and research studies.

Website Link:
TCGA
HGMD

6. Model Organism Databases

MGD (Mouse Genome Database)
Comprehensive database for mouse genetics and genomics.
Links mouse genes to human disease models.
Maintained by The Jackson Laboratory.

Website Link: MGD

ZFIN
Central database for zebrafish genetics and development.
Includes gene expression and phenotype data.
Supports biomedical research.

Website Link: ZFIN

FlyBase
Database for Drosophila genetics and genomics.
Provides functional and phenotypic information.
Widely used in developmental biology.

Website Link: FlyBase

WormBase
Resource for Caenorhabditis elegans biology and genomics.
Includes genes, pathways, and phenotypes.
Supports nematode research worldwide.

Website Link: WormBase

SGD (Saccharomyces Genome Database)
Reference genome database for budding yeast.
Provides gene function and pathway information.
Widely used in molecular biology research.

Website Link: SGD

7. Plant Genomics

TAIR (The Arabidopsis Information Resource)
Primary database for Arabidopsis thaliana genetics.
Contains genome sequences and functional annotations.
Widely used in plant molecular biology research.

Website Link: TAIR