InBase, The Intein Database: Submitting Intein Data

Inteins may be submitted confidentially or publicly. You will be notified if another person has submitted the same intein.

The Intein Registry Curator adds new information to intein records as it is submitted and attempts to fill in any fields not provided by the contributor. The 'Initially Contributed by' field indicates only the person who made the initial submission, or the name of the scientist associated with the intein if the submission is made by the curator. The 'Independently Found by' field indicates that another person has independently identified and submitted information about an intein prior to publication or release in the database. Updates are acknowledged in the 'Reference' and 'Comments' fields.

Researchers should be aware that once an intein sequence is available in a public database, anyone searching databases for inteins may identify and submit it. I therefore urge initial discoverers to submit their entries as soon as possible to avoid seeing their intein submitted by another party. Remember that you can ask to have your submission held for as long as you like or until another party submits it (at which time I will contact you).

Submitting Intein Data
InBase now includes intein amino acid sequences.
I. Send comments and submit data to the Intein Registry Curator, Francine Perler, at:
Dr. Francine Perler
New England Biolabs
32 Tozer Road
Beverly, MA 01915, USA
Phone: 1-978-929-5054 FAX: 1-978-921-1350
II. Submitting a new intein:

Step 1: The preferred method of submission is electronic, using the form below or by email. Email the curator to receive an email or Word (Mac) copy of the submission form. Alternatively, send the information as hard copy by mail or by Fax (please also mail a copy of the Fax to ensure accurate reading of your entry).

Step 2: Send a Bestfit- or BLAST-type amino acid lineup of your new intein (complete precursor) against an inteinless homolog, showing the insertion of the intein sequence.

Step 3: Fill in the following questionnaire and hit the submit button. Full explanations of each section are listed after the form, with examples for each section based on the Tli Pol-1 intein. Clicking on a query name will automatically shift to its explanation.



(Sample Response)

A. Intein Name:

(Tli Pol-1)

B. Prototype Allele:

(Tli Pol-1)

C. Extein name:

(DNA polymerase)

D. Intein Class:


E. Organism Name:

(Thermococcus litoralis)

F. Organism Description:


G. Domain of Life:


H. Endo Activity:


I. Endo Motif:


J. Location in extein (aa preceding intein):


K. Insert Site:

(pol-b, Pol Motif B)

L. Intein size (aa):


M. N-terminal Splice Junction:


N. C-terminal Splice Junction:


O. Accession No.:

(S42459 in NCBI/protein)

P. Intein aa Sequence:

(In FASTA format)

Q. Initially Contributed by:

(Francine B. Perler)

R. Contributor's Address (optional):

S. Contributor's Phone No. (optional):


T. Contributor's Fax No. (optional):


U. Contributor's Email address:



V. Independently Found by:

(To be filled in by Curator)

W. Comments:

X. Block A:


Y. Block B:


Z. Block C:


AA. Block D:


AB. Block E:


AC. Block H:


AD. Block F:


AE. Block G:


AF. Date Submitted:

AG. References:

(Perler, PNAS, 1992, (in press))

AH. Allele Group:

(Tli Pol-1, pol-b)


A. Intein Name: Tli Pol-1. The intein name should begin with a 3-letter organism abbreviation, in which the first letter is the first letter of the genus (Thermococcus) and the second and third letters designate the species (litoralis). The organism abbreviation is followed by an abbreviation for the extein gene (Pol). If more than one intein is present in a gene, number them in order of appearance from the N- to the C-terminus. For organisms with only a genus designation and an isolate code, such as the Psp Pol or Tsp Pol inteins, include a strain designation. Example: Psp-KOD Pol.
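
The naming convention can be sketched in a few lines of Python. This is only an illustration, not part of InBase: the function name `intein_name` and its parameters are hypothetical, and it assumes that the species designation is simply the first two letters of the species name and that "sp" stands in when no species is known.

```python
def intein_name(genus, species, extein_abbrev, index=None, strain=None):
    """Compose an intein name such as 'Tli Pol-1' (illustrative helper only)."""
    # First letter of the genus, then two letters for the species.
    # Assumption: the species letters are the first two letters of its name,
    # and 'sp' is used when only a genus is known.
    if species:
        organism = genus[0].upper() + species[:2].lower()
    else:
        organism = genus[0].upper() + "sp"
    # Organisms known only by genus and isolate code carry a strain designation.
    if strain:
        organism += "-" + strain
    name = f"{organism} {extein_abbrev}"
    # Number the inteins only when a gene contains more than one.
    if index is not None:
        name += f"-{index}"
    return name


print(intein_name("Thermococcus", "litoralis", "Pol", index=1))
print(intein_name("Pyrococcus", None, "Pol", strain="KOD"))
```

The two calls reproduce the examples given above, Tli Pol-1 and Psp-KOD Pol.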

B. Prototype Allele: List the prototype intein, which is the first intein identified at the identical extein insertion site in an extein homolog. See the Intein Allele page for a current list of intein alleles and their prototype intein. If this intein is a member of a new allele group, list the first example of this allele as the prototype and, in the 'Allele Group' field (see AH. below), indicate 'New' followed by the intein homolog.

C. Extein name: Full extein gene or protein product name.

D. Intein Class: List as EXPERIMENTAL if you have demonstrated the presence of the spliced product either by Western blot or by staining of an SDS-PAGE gel. List as THEORETICAL if you have not demonstrated the presence of the spliced product.

E. Organism Name: List full genus and species name including isolate or strain designation.

F. Organism Description: Include a descriptive term for the organism such as Thermophile, Red alga, phage, Cyanobacterium, human pathogen, etc.

G. Domain of Life: Choose from: Eucarya, Eubacteria or Archaea

H. Endo Activity: If endonuclease activity has been demonstrated, list endonuclease name. Convention dictates that the endonuclease name is preceded by the 'PI-' prefix.

I. Endo Motif: List: 'DOD' if a member of the LAGLIDADG or dodecapeptide motif family; 'HNH' if a member of the HNH family; 'none' if no large insert is present between intein Blocks B and F; and 'unknown' if a large insert (>100 aa) is present that doesn't have any of the known homing endonuclease signature motifs. Leave blank if you are not sure.

J. Location in extein (aa preceding intein): List the amino acid preceding the intein (using the single letter code) and its position in the extein, with amino acid 1 being the initiating Met of the extein gene.

K. Insert Site: Include the insertion site name consisting of the abbreviated extein name and an alphabetic designation for insertion sites in that extein, taking into account the previously identified insertion sites in extein homologs. Also list notable landmarks like motifs or active site residues.

L. Intein size (aa): List the number of amino acids in the intein, not including the C-extein S/T/C.

M. N-terminal Splice Junction: Last 10 N-extein residues/intein N-terminus (single letter code).

N. C-terminal Splice Junction: Last 2 amino acids of the intein/first 10 amino acids of the C-extein (single letter code).
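
Fields J, L, M and N can all be derived from the precursor sequence once the intein boundaries are known. The sketch below is a hedged illustration: `describe_insertion`, its parameters and the output layout are hypothetical, the sequences are made-up toy strings rather than real inteins, and the number of intein residues shown at the N-terminal junction (`n_intein_shown`) is an assumption, since the instructions above do not specify it.

```python
def describe_insertion(precursor, intein_start, intein_size, n_intein_shown=5):
    """Derive fields J, L, M and N from a precursor sequence (illustrative only).

    precursor     -- full precursor protein, single letter code
    intein_start  -- 1-based position of the first intein residue
    intein_size   -- number of intein residues, excluding the C-extein S/T/C
    """
    i = intein_start - 1          # 0-based index of the first intein residue
    j = i + intein_size           # 0-based index of the first C-extein residue
    if i < 1 or j >= len(precursor):
        raise ValueError("intein must be flanked by N- and C-extein residues")
    n_extein, intein, c_extein = precursor[:i], precursor[i:j], precursor[j:]
    return {
        # J: residue preceding the intein and its position (initiating Met = 1)
        "location": f"{n_extein[-1]}{len(n_extein)}",
        # L: intein size in amino acids
        "intein_size": len(intein),
        # M: last 10 N-extein residues / start of the intein
        "n_junction": f"{n_extein[-10:]}/{intein[:n_intein_shown]}",
        # N: last 2 intein residues / first 10 C-extein residues
        "c_junction": f"{intein[-2:]}/{c_extein[:10]}",
    }


# Toy sequences for illustration only; they are not real extein or intein data.
toy_precursor = "MKTAYIAKQRGD" + "CLAEGTRIFDPVTGHN" + "SGVLKDEWQRPLY"
print(describe_insertion(toy_precursor, intein_start=13, intein_size=16))
```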

O. Accession No.: Indicate the database associated with the accession number. If the accession number refers to a large, multi-gene nucleotide sequence, also give the ORF designation if possible. An accession number for the amino acid sequence of the full extein is preferred.

P. Intein aa sequence: Please use the single letter aa code and submit in FASTA format. Any comment line should begin with ">". You can choose to include the +1 and -1 extein residues, just the intein sequence, or both sequences. Please indicate the accession number of the submitted sequence.
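
As a rough guide to what a FASTA-formatted entry looks like, the following sketch builds one from a name, an accession number and a sequence. The function `to_fasta`, the 60-character line width, the placement of the accession number on the comment line and the restriction to the 20 standard amino acid letters are all assumptions made for this example; the name and accession in the usage line are placeholders.

```python
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard residues only


def to_fasta(name, accession, sequence, width=60):
    """Format a sequence as a FASTA record (illustrative helper only)."""
    # Drop whitespace and normalise to upper-case single letter code.
    seq = "".join(sequence.split()).upper()
    bad = sorted(set(seq) - VALID_AA)
    if bad:
        raise ValueError(f"unexpected characters in sequence: {bad}")
    # The comment line begins with '>' and here carries name and accession.
    header = f">{name} {accession}"
    # Wrap the sequence into fixed-width lines.
    lines = [seq[k:k + width] for k in range(0, len(seq), width)]
    return "\n".join([header] + lines)


# Placeholder name, accession and toy sequence, for illustration only.
print(to_fasta("Xyz Pol-1", "ACC0000", "CLAEGTRIFDPVTGHN"))
```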

Q. Initially Contributed by: Your name

R. Contributor's Address: Optional

S. Contributor's Phone No.: Optional

T. Contributor's Fax No.: Optional

U. Contributor's Email address: Mandatory

V. Independently Found by: To be filled in by the Curator

W. Comments: Any comment that you think people will find useful, especially missing conserved residues or motifs. Try to be brief.

X. Block A: List the 13 amino acids in Block A and indicate the number of the last residue in the block when the first intein residue is #1. Instructions for identifying intein motifs are found on the 'Conserved Intein Features - Do you Have an Intein?' page.

Y. Block B: List the 14 amino acids in Block B and indicate the number of the last residue in the block when the first intein residue is #1.

Z. Block C: List the 9 amino acids in Block C and indicate the number of the last residue in the block when the first intein residue is #1.

AA. Block D: List the 8 amino acids in Block D and indicate the number of the last residue in the block when the first intein residue is #1.

AB. Block E: List the 10 amino acids in Block E and indicate the number of the last residue in the block when the first intein residue is #1.

AC. Block H: List the 19 amino acids in Block H and indicate the number of the last residue in the block when the first intein residue is #1.

AD. Block F: List the 14 amino acids in Block F and indicate the number of the last residue in the block when the first intein residue is #1.

AE. Block G: List the 8 amino acids in Block G and indicate the number of the last residue in the block when the first intein residue is #1.
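
The block entries above all follow the same pattern: a fixed number of residues plus the number of the last residue, counted from the first intein residue. A minimal sketch of that bookkeeping follows. The block lengths are taken from the instructions above; the function `block_entry`, the output layout and the toy sequence are hypothetical, and the sketch assumes the sequence passed in contains the whole block.

```python
# Block lengths as stated in the field explanations above.
BLOCK_LENGTHS = {"A": 13, "B": 14, "C": 9, "D": 8, "E": 10, "H": 19, "F": 14, "G": 8}


def block_entry(sequence, block, last_residue):
    """Return the residues of a block and its last residue number.

    sequence      -- sequence numbered so that the first intein residue is #1
    block         -- block letter, e.g. 'A'
    last_residue  -- 1-based number of the last residue in the block
    """
    length = BLOCK_LENGTHS[block]
    start = last_residue - length  # 0-based index of the first block residue
    if start < 0 or last_residue > len(sequence):
        raise ValueError(f"Block {block} does not fit at residue {last_residue}")
    return f"{sequence[start:last_residue]} ({last_residue})"


# Toy sequence for illustration only; it is not a real intein.
toy_intein = "CLAEGTRIFDPVTGHNAAAA"
print(block_entry(toy_intein, "A", 13))
```

A check like this catches the most common slip, a block whose listed length does not match its stated end position.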

AF. Date Submitted: Date submitted

AG. References: List all references associated with this intein, including papers in press.

AH. Allele Group: If this intein is a member of an already identified allele group, use that allele group name. If this is a new allele group, indicate 'New' followed by the other intein homolog. The Allele Group is similar to the prototype allele designation, but it includes the extein insertion site designation and is listed on the "Intein Alleles" page.

Last database update: 11/05/10

