SeqHub API
The SeqHub API gives you programmatic access to protein search and annotation tools built on top of Tatta Bio's genomic language models.
Endpoints
Protein Context Search
Search our database of 130,000+ microbial genomes for contigs containing proteins most similar (by embedding distance) to your query protein(s). A contig is a contiguous stretch of sequenced DNA encoding multiple proteins. Each result is the genomic neighborhood surrounding your match, along with functional annotations and taxonomic information.
- Single query: POST /api/v1/protein-contexts/search searches with one protein sequence.
- Multi-query: POST /api/v1/protein-contexts/search/multi-query searches for contigs where all query proteins co-occur in the same genomic neighborhood.
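As a rough illustration of how the two search endpoints differ, the sketch below picks the path from the number of query proteins and prepares the URL and JSON body without sending anything. Only the two paths come from this page. The host https://seqhub.example, the body field name "sequences", and the helper build_search_request are placeholders assumed for illustration, so check the actual request schema before relying on it.

```python
import json

# Assumption: placeholder host; substitute the real SeqHub base URL.
BASE_URL = "https://seqhub.example"

# Paths as documented above.
SINGLE_PATH = "/api/v1/protein-contexts/search"
MULTI_PATH = "/api/v1/protein-contexts/search/multi-query"


def build_search_request(sequences):
    """Return (url, body_bytes) for a protein context search.

    One sequence goes to the single-query endpoint; two or more go to
    the multi-query endpoint, which looks for co-occurring proteins.
    """
    sequences = list(sequences)
    if not sequences:
        raise ValueError("at least one protein sequence is required")
    path = SINGLE_PATH if len(sequences) == 1 else MULTI_PATH
    # Assumption: the field name "sequences" is hypothetical.
    body = json.dumps({"sequences": sequences}).encode("utf-8")
    return BASE_URL + path, body
```

The returned pair can be handed to any HTTP client as a POST request, together with the Authorization header described under Authentication.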
Protein Annotation
Annotate a batch of protein sequences with biological function by finding their closest match in SwissProt (UniProt's curated database). Returns a functional description, accession ID, similarity score, percent identity, and query coverage for each input sequence.
- Annotate: POST /api/v1/protein-annotations annotates one or more protein sequences.
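One way to work with the annotation results is to map each returned record onto a small typed structure holding the five values listed above. The sketch below does that and keeps only hits above a caller-chosen percent identity. The JSON key names ("description", "accession", "similarity", "percent_identity", "query_coverage") and both helpers are hypothetical, since this page does not specify the response schema.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    description: str         # functional description
    accession: str           # SwissProt accession ID
    similarity: float        # similarity score
    percent_identity: float  # percent identity
    query_coverage: float    # query coverage


def parse_annotation(record: dict) -> Annotation:
    """Convert one response record into an Annotation.

    Assumption: the key names used here are hypothetical placeholders.
    """
    return Annotation(
        description=record["description"],
        accession=record["accession"],
        similarity=float(record["similarity"]),
        percent_identity=float(record["percent_identity"]),
        query_coverage=float(record["query_coverage"]),
    )


def filter_by_identity(annotations, min_identity):
    """Keep annotations whose percent identity is at least min_identity."""
    return [a for a in annotations if a.percent_identity >= min_identity]
```

The threshold is left to the caller because a suitable cutoff depends on the analysis.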
Authentication
All requests require a personal access token (PAT). Generate one from the API Tokens section of your SeqHub profile.
Pass the token in the Authorization header of every request.
Your token is shown only once, when you create it. It starts with seqhub_, so copy it exactly as shown.
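A minimal sketch of building the header in Python follows. It assumes the common Bearer scheme, which this page does not confirm, and the helper name auth_headers is hypothetical. The prefix check only catches obvious copy-paste mistakes and does not prove that a token is valid.

```python
def auth_headers(token: str) -> dict:
    """Return HTTP headers carrying a SeqHub personal access token."""
    # Guard against stray whitespace picked up when copying the token.
    token = token.strip()
    if not token.startswith("seqhub_"):
        raise ValueError("SeqHub tokens are expected to start with 'seqhub_'")
    # Assumption: the Bearer scheme; this page only names the header.
    return {"Authorization": f"Bearer {token}"}
```

Because the token is displayed only once, it is usually safer to load it from an environment variable or a secrets store than to hard-code it in scripts.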
Limits
Rate limits: 1,000 requests per month for the protein search endpoints, and 1,000 requests per month for the annotation endpoint.
Batch size: The annotation endpoint accepts up to 128 sequences per request.
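To stay within the batch size, a larger set of sequences can be split into chunks of at most 128 before calling the annotation endpoint. The sketch below shows one way to do this and to estimate how many requests a job will use against the monthly quota. The helper names are hypothetical, and the constants are taken from the limits stated above.

```python
import math

MAX_BATCH = 128          # sequences per annotation request (stated above)
MONTHLY_REQUESTS = 1000  # annotation requests per month (stated above)


def batches(sequences, size=MAX_BATCH):
    """Yield consecutive slices containing at most `size` sequences."""
    for start in range(0, len(sequences), size):
        yield sequences[start:start + size]


def requests_needed(n_sequences, size=MAX_BATCH):
    """Number of annotation requests required for n_sequences."""
    return math.ceil(n_sequences / size)
```

For example, 300 sequences split into batches of 128, 128, and 44, which uses 3 requests. With every batch full, the monthly quota covers at most 1,000 × 128 = 128,000 sequences.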
If you need higher limits, reach out to us at team@tatta.bio.
Versioning
The SeqHub API is currently in beta and is subject to breaking changes during this period.