1. Genome & Sequence Databases
NCBI (GenBank / RefSeq / dbSNP)
Comprehensive resource for DNA and RNA sequence data.
Provides reference genomes, genetic variants, and annotations.
Widely used for genomics research and literature integration.
Website Link: NCBI
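Records in these databases can also be retrieved programmatically through NCBI's E-utilities. The sketch below only builds an efetch request URL and sends nothing over the network. The helper name build_efetch_url and the example accession are illustrative assumptions, not part of the original listing.

```python
from urllib.parse import urlencode

# Base endpoint of the NCBI E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Return an efetch URL for one record (hypothetical helper; no request is sent)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH_BASE + "?" + urlencode(params)


# Example accession chosen only for illustration.
url = build_efetch_url("NM_000518.5")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client. For repeated requests, NCBI's usage guidelines on request rates and API keys apply.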
EMBL-EBI / ENA (European Nucleotide Archive)
European archive for nucleotide sequences and genome assemblies.
Stores raw sequencing data and annotated genomes.
Part of the International Nucleotide Sequence Database Collaboration.
Website Link: ENA
DDBJ (DNA Data Bank of Japan)
Japan’s primary nucleotide sequence database.
Exchanges data daily with GenBank and ENA as part of the International Nucleotide Sequence Database Collaboration.
Supports large-scale sequencing submissions.
Website Link: DDBJ
CNGBdb (China National GeneBank Database)
Large-scale genomic and multi-omics data platform.
Provides cloud-based data analysis and visualization tools.
Supports global genomics research initiatives.
Website Link: CNGBdb
2. Protein Databases
UniProt (Swiss-Prot / TrEMBL)
Central resource for protein sequence and functional annotation.
Swiss-Prot contains manually curated (reviewed) protein entries.
TrEMBL contains automatically annotated (unreviewed) protein entries.
Website Link: UniProt
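The Swiss-Prot / TrEMBL distinction is visible in the FASTA headers that UniProt distributes: the header starts with sp for Swiss-Prot and tr for TrEMBL. The following is a minimal sketch of reading that prefix, where the function name parse_uniprot_header is an assumption made for illustration and the header string is just an example in the usual layout.

```python
def parse_uniprot_header(header):
    """Split a UniProtKB FASTA header into section, accession, and entry name."""
    # Expected layout: >db|ACCESSION|ENTRY_NAME free-text description
    db, accession, rest = header.lstrip(">").split("|", 2)
    entry_name = rest.split(" ", 1)[0]
    section = {"sp": "Swiss-Prot", "tr": "TrEMBL"}.get(db, "unknown")
    return {"section": section, "accession": accession, "entry_name": entry_name}


# Example header, used here only to exercise the parser.
example = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
info = parse_uniprot_header(example)
print(info)
```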
PDB (Protein Data Bank)
Archive of three-dimensional protein and nucleic acid structures.
Structures are determined by X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy.
Essential for structural biology and drug discovery.
Website Link: PDB
InterPro
Database of protein families, domains, and functional sites.
Integrates multiple protein signature databases.
Helps predict protein function and structure.
Website Link: InterPro
SCOP / CATH
Protein classification systems based on structure and evolution.
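SCOP relies largely on manual, expert-driven structural classification.
CATH combines automated methods with manual curation in a four-level hierarchy (Class, Architecture, Topology, Homologous superfamily).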
3. Transcriptomics & RNA
GEO (Gene Expression Omnibus)
Repository for gene expression and functional genomics data.
Includes microarray and sequencing-based experiments.
Supports data reuse and meta-analysis.
Website Link: GEO
SRA (Sequence Read Archive)
Archive of raw high-throughput sequencing data.
Includes RNA-seq, WGS, metagenomics, and more.
Supports reproducibility of sequencing studies by preserving the underlying reads.
Website Link: SRA
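Run accessions carry a prefix that indicates which archive accepted the submission: SRR for NCBI SRA, ERR for ENA, and DRR for DDBJ. A small sketch of checking that prefix is shown below. The helper name run_origin and the sample accessions are hypothetical.

```python
import re

# Prefix of a run accession -> archive that accepted the submission.
RUN_PREFIXES = {"SRR": "NCBI SRA", "ERR": "ENA", "DRR": "DDBJ"}


def run_origin(accession):
    """Return the originating archive for a run accession, or None if malformed."""
    match = re.fullmatch(r"(SRR|ERR|DRR)(\d+)", accession)
    return RUN_PREFIXES[match.group(1)] if match else None


# Sample accessions are made up for illustration.
for acc in ["SRR000001", "ERR000002", "DRR000003", "XYZ123"]:
    print(acc, "->", run_origin(acc))
```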
Ensembl / GENCODE
Provides genome-wide gene and transcript annotations.
GENCODE provides high-quality reference annotation for the human and mouse genomes.
Ensembl supports multiple species with genome browsers.
miRBase / RNAcentral
miRBase catalogs published microRNA sequences and their annotations.
RNAcentral aggregates non-coding RNA sequences.
Provides unified access to ncRNA resources.
Website Link: miRBase, RNAcentral
4. Pathways & Function
KEGG (Kyoto Encyclopedia of Genes and Genomes)
Links genes with pathways, metabolites, and diseases.
Used for pathway mapping and functional analysis.
Supports systems biology research.
Website Link: KEGG
Gene Ontology (GO)
Standardized vocabulary for gene function annotation.
Covers biological processes, molecular functions, and cellular components.
Widely used in functional enrichment analysis.
Website Link: GO
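Functional enrichment analysis typically asks whether a GO term appears in a gene list more often than chance would predict. One common formulation is the upper tail of the hypergeometric distribution, sketched below with the standard library only. The function name and the example numbers are assumptions, and real tools also correct for multiple testing, which this sketch omits.

```python
from math import comb


def enrichment_p_value(population, annotated, study, overlap):
    """Hypergeometric upper tail: P(X >= overlap).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    study:      number of genes in the list being tested
    overlap:    study genes carrying the GO term
    """
    total = comb(population, study)
    upper = min(annotated, study)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, study - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


# Illustrative numbers only: 20000 background genes, 200 annotated,
# 100 genes in the study list, 5 of them annotated.
p = enrichment_p_value(20000, 200, 100, 5)
print(f"p = {p:.4g}")
```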
Reactome / EcoCyc
Curated databases of biological and metabolic pathways.
Reactome focuses on human pathways, with orthology-based inference for other species.
EcoCyc specializes in the genome and metabolism of Escherichia coli K-12.
5. Disease & Genetic Variation
OMIM (Online Mendelian Inheritance in Man)
Catalog of human genes and inherited disorders.
Provides detailed genotype–phenotype relationships.
Widely used in medical genetics.
Website Link: OMIM
ClinVar
Public archive of clinically relevant genetic variants.
Includes variant interpretations with supporting evidence.
Maintained by NCBI.
Website Link: ClinVar
COSMIC
Catalogue of somatic mutations in cancer.
Focuses on tumor samples and cancer genomics.
Widely used in oncology research.
Website Link: COSMIC
TCGA / HGMD
TCGA provides large-scale cancer genomics datasets.
HGMD catalogs disease-causing human gene mutations.
Used in clinical and research studies.
6. Model Organism Databases
MGD (Mouse Genome Database)
Comprehensive database for mouse genetics and genomics.
Links mouse genes to human disease models.
Maintained by The Jackson Laboratory.
Website Link: MGD
ZFIN
Central database for zebrafish genetics and development.
Includes gene expression and phenotype data.
Supports biomedical research.
Website Link: ZFIN
FlyBase
Database for Drosophila genetics and genomics.
Provides functional and phenotypic information.
Widely used in developmental biology.
Website Link: FlyBase
WormBase
Resource for Caenorhabditis elegans biology and genomics.
Includes genes, pathways, and phenotypes.
Supports nematode research worldwide.
Website Link: WormBase
SGD (Saccharomyces Genome Database)
Reference genome database for budding yeast (Saccharomyces cerevisiae).
Provides gene function and pathway information.
Widely used in molecular biology research.
Website Link: SGD
7. Plant Genomics
TAIR (The Arabidopsis Information Resource)
Primary database for Arabidopsis thaliana genetics.
Contains genome sequences and functional annotations.
Widely used in plant molecular biology research.
Website Link: TAIR