Doryteuthis pealeii gene pages Help
Annotation: Dpe Pita v0.3, release December 22, 2023
Find genes based on gene symbols, JGI IDs (Dpe v2.1), or matches to human or fly (Drosophila melanogaster) proteins (gene symbols). Pita has combined JGI gene symbols if multiple transcripts were merged into a single gene model.
Search is case-sensitive. Human gene symbols are in upper case; squid JGI and Pita gene symbols are based on human gene symbols. Fly gene symbols can be capitalized, but often are not (cf. Wikipedia).
Partial search strings may return more matches; regular expressions (regex special characters) can be used for more elaborate searches. See the release notes (below) for more information on Pita gene annotation.
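As an illustration of how case-sensitive partial and regex matching behaves, here is a minimal sketch using Python's re module. The list of symbols and the search() helper are hypothetical; it is an assumption that the gene pages use a regex flavor comparable to Python's.

```python
import re

# Hypothetical symbols for illustration only; the real search runs on the gene pages.
symbols = ["ISL2_ISL1", "ZNF559-ZNF177", "ZNF559_ZNF177", "NA_12", "MT-NA_3"]

def search(pattern, names):
    """Return the names that contain a case-sensitive match for the regex."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]

partial = search("ISL", symbols)              # partial string, matches inside a name
exact = search(r"^ZNF559_ZNF177$", symbols)   # anchors restrict the match to the whole name
lower = search("isl", symbols)                # lower case finds nothing: search is case-sensitive
```

Anchoring a pattern with ^ and $ is a simple way to avoid the extra matches that a partial string would return.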
Search results include the following fields:
- Pita v0.3 "combined_symbols" (see release notes)
- JGI v2.1 Dopeav ID
- JGI v2.1 gene symbol
- Best hit in the human proteome (of all open reading frames tested)
- Reading frames with relevant matches in the human proteome, with the corresponding BLASTP E-value.
- Best hit in the fly (D. melanogaster) proteome (of all open reading frames tested)
- Reading frames with relevant matches in the fly proteome, with the corresponding BLASTP E-value.
The search may produce results based on matches in any of these fields. In some cases, more than one reading frame produces good hits in the human or fly proteome. This can be due to problems with the gene annotation (for example wrong splice sites, or read-through transcripts) or the genome assembly (sequencing errors, missing genomic fragments), resulting in apparent frame shifts.
In columns 5 and 7 (the reading-frame fields for human and fly), only frames whose best match has an E-value smaller than 1e-04 are included. The frame is a number from 1 to 3. If it is followed by "M", the sequence with the best hit (usually a substring of the full sequence) starts with a start codon. A start codon is not always present, potentially due to an imperfect genome assembly or imperfect gene models (missing exons).
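The frame notation and the E-value cutoff can be sketched as follows. This is only an illustration: the exact layout of the reading-frame fields on the gene pages is not specified here, so the label format ("2M", "3") and the (label, E-value) pairs are assumptions, and parse_frame() and keep_hits() are hypothetical helpers.

```python
import re

E_VALUE_CUTOFF = 1e-04  # only frames with a best match below this value are shown

def parse_frame(label):
    """Split a frame label such as '2M' into (frame number, starts with start codon)."""
    match = re.fullmatch(r"([1-3])(M?)", label)
    if match is None:
        raise ValueError(f"unexpected frame label: {label!r}")
    return int(match.group(1)), match.group(2) == "M"

def keep_hits(hits):
    """Keep (label, e_value) pairs whose E-value is smaller than the cutoff."""
    return [(label, e_value) for label, e_value in hits if e_value < E_VALUE_CUTOFF]

# Hypothetical hits for one gene: frame 1 is a good hit, frame 2 is not.
kept = keep_hits([("1M", 1e-30), ("2", 0.5)])
```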
Release notes Dpe Pita v0.3
- Annotation: pita_genes_v0.3.bed, pita_genes_v0.3.bed_12to.gtf
- Number of gene models: 24967
- Release date: December 22, 2023
- Associated genome: Dpe_v2.1
- Comments, questions: gertjan.veenstra.{at}.ru.nl
Usage
- Annotation available in two commonly used formats (bed, gtf).
- Additional information on gene IDs: pita_all_ids.txt. This file links different types of IDs associated with genes in other annotations. Note that more than one id / gene from previous annotations can be associated with a gene in the current gene annotation. The file pita_all_ids.txt contains the following columns: uniqid, pita_id, JGI_mRNA_id, JGI_Dopeav_id, JGI_gene_symbol, combined_symbols.
- Use with genome assembly Dpe_v2.1, which, unlike UCB_Dpea_1, has named chromosomes.
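A minimal sketch of how pita_all_ids.txt could be used to look up the previous JGI IDs for each gene. It assumes the file is tab-separated and starts with a header row containing the column names listed above; neither is confirmed here. The sample rows are made up so that the example runs without the real file.

```python
import csv
import io
from collections import defaultdict

# Made-up rows with the documented column names; the real IDs look different.
SAMPLE = (
    "uniqid\tpita_id\tJGI_mRNA_id\tJGI_Dopeav_id\tJGI_gene_symbol\tcombined_symbols\n"
    "u1\tgeneA\tmrna1\tdopeav1\tISL2\tISL2_ISL1\n"
    "u2\tgeneA\tmrna2\tdopeav2\tISL1\tISL2_ISL1\n"
    "u3\tgeneB\tmrna3\tdopeav3\tNA\tNA_7\n"
)

def jgi_ids_per_gene(handle):
    """Map each pita_id to the list of JGI Dopeav IDs associated with it."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        grouped[row["pita_id"]].append(row["JGI_Dopeav_id"])
    return dict(grouped)

result = jgi_ids_per_gene(io.StringIO(SAMPLE))
```

With the real file, the handle would come from open("pita_all_ids.txt", newline="") instead of io.StringIO.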
More information
- This annotation used Pita ("Pita Improves Transcript Annotation"; https://github.com/simonvh/pita), with the following input data: short read RNA-seq data (different tissues and stages of development) and H3K4me3 ChIP-seq data for improving the 5' ends of genes (branchial and systemic heart and two developmental stages).
- pita_genes_v0.3 associates gene IDs and gene symbols of JGI annotation with the pita gene models and complements pita annotation with non-overlapping JGI models (non-overlapping at exon level). This improves mappability (1.38 compared to 1.31, see numbers below).
- For this release the gene IDs in previous gene annotations are documented in a separate file (pita_all_ids.txt), allowing for a complete and easy comparison with previous (JGI) gene annotation or analyses. The information on IDs and gene symbols is also available on the Dory gene pages: https://veenstralab.nl/dpe.
- Gene names in v0.3 correspond to the "combined_symbols" column in pita_all_ids.txt. JGI gene symbols are combined (joined with underscores) if more than one gene symbol is associated with (parts of) the gene in the new gene annotation. For example, two (partial) ISL genes in previous annotations have been combined to ISL2_ISL1. Gene names have been made unique by adding a suffix _[number] if more than one instance exists. For example, there are almost 5900 genes without a proper gene symbol (named NA and numbered from NA through NA_5888 in the current annotation).
- The mapping efficiency of RNA-seq data has improved compared to previous releases, including the JGI gff3 and original pita annotation. Relative mapping efficiencies of 1, 1.31, and 1.38 were observed for JGI gff3, pita (Chana, 2021), and pita_genes_v0.3, respectively.
- Non-chromosomal scaffolds (scDpe) have been left out, except for two scaffolds that may correspond to mitochondrial DNA (blast match with mitochondrial DNA of a related species, Doryteuthis opalescens). This means the gene models on these scaffolds (from the original JGI annotation) can be used for calculating QC metrics in single cell data. None of the mitochondrial genes has a proper gene symbol in the JGI annotation ('NA'); they have therefore been renamed to MT-NA through MT-NA_22 in the current annotation.
- Contributors: Chana van der Heijden, Simon van Heeringen, Saskia Heffener, Caroline Albertin, Gert Jan Veenstra
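The naming scheme described above (joining symbols with underscores, then adding a _[number] suffix to repeated names) can be sketched as follows. This is not the code used by Pita; combine() and make_unique() are hypothetical helpers, and the order in which symbols are joined and the use of NA for genes without a symbol are assumptions based on the examples in these notes.

```python
from collections import Counter

def combine(symbols):
    """Join the distinct symbols of one gene model with underscores, keeping input order."""
    distinct = list(dict.fromkeys(symbols))  # drops repeats, preserves order
    return "_".join(distinct) if distinct else "NA"

def make_unique(names):
    """Leave the first instance of a name as-is; later instances get _1, _2, ..."""
    seen = Counter()
    unique = []
    for name in names:
        count = seen[name]
        unique.append(name if count == 0 else f"{name}_{count}")
        seen[name] += 1
    return unique

combined = combine(["ISL2", "ISL1"])
numbered = make_unique(["NA", "NA", "ISL2_ISL1", "NA"])
```

This simple sketch does not guard against an input name that already ends in a numeric suffix (for example an existing NA_1), which a real implementation would need to handle.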
Known issues
- Pita combines gene symbols with an underscore, based on JGI v2.1 gene symbols, but in JGI v2.1 some gene symbols were already combined with a hyphen. This leads to different combinations (for example ZNF112_ZNF559-ZNF177, ZNF559-ZNF177, ZNF559_ZNF177), which are all different genes.
- Underscores are not accepted in gene names in Seurat. This can be solved by substituting the underscore with a period.
- Some genes / transcripts newly identified by Pita (with unique IDs such as Dpe30:64556374-64557498_) have not yet received an NA_[number] or gene symbol. The Dory gene pages also do not show additional information on these loci.
- Gene annotation is complicated ... a work in progress ... for which we will need resources.
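The underscore substitution suggested above for Seurat can be applied to a list of gene names before the data are loaded. The sketch below is a hypothetical preprocessing helper, not part of Seurat or Pita. It also checks that the renaming does not merge two distinct names, since hyphenated and underscored combinations refer to different genes.

```python
def seurat_safe(names):
    """Replace underscores with periods and fail if distinct names would collide."""
    renamed = [name.replace("_", ".") for name in names]
    if len(set(renamed)) != len(set(names)):
        raise ValueError("renaming would merge distinct gene names")
    return renamed

# Gene names taken from the examples in these notes.
genes = ["ZNF112_ZNF559-ZNF177", "ZNF559-ZNF177", "ZNF559_ZNF177", "NA_12"]
safe = seurat_safe(genes)
```

Hyphens are left untouched, so ZNF559-ZNF177 and ZNF559.ZNF177 remain distinguishable after renaming.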