PDBSite - a database on protein active sites and their spatial environment

 

1. Introduction

2. Methods and materials

3. Field description

4. Examples of querying

4.1 Searching for proteins by data on their functional sites

4.1.1 Ligand binding sites: Zinc, Calcium, Phosphate, Sugar binding sites, etc.

4.1.2 Sites subjected to biochemical modifications: phosphorylation, glycosylation sites, etc.

4.1.3 Active sites: cysteine protease active site, ribonucleolytic active site, etc.

4.2 Searching for proteins by structural characteristics of their sites

4.3 Searching for sites by the data on proteins

 

 

 

1. Introduction

Data on biologically active protein sites are of great importance for solving many problems in molecular biology, biotechnology, and medicine. The high specificity of biological activity in proteins arises from the unique structure of their active sites, which are often organised in a very complicated pattern. In particular, biologically active sites are often composed of amino acid residues that are remote in the primary structure but form compact clusters with a strictly ordered conformation in the spatial structure. The specific structure and conformational parameters of these sites are determined by the structure of their spatial amino acid surroundings. For example, the spatial amino acid surroundings of enzyme catalytic centres determine the relief of the hollows in the substrate-binding regions of these centres [1], whereas the residues of antigenic determinants shape their structure by forming prominent parts of the protein surface [2]. For many natural and mutant proteins, relationships have been found between protein activity and the physico-chemical properties of the amino acid residues composing the local surroundings of a functional site [3]. The spatial surroundings of biologically active sites can be determined only if data on tertiary protein structures are available. The Protein Data Bank (PDB) [4] contains data on the spatial structures of proteins and on their biologically active sites (i.e., ligand binding regions, enzyme catalytic centres, regions subjected to biochemical modifications, etc.). However, none of the well-known systems for searching PDB allows the user to make queries related to active sites. PDBSite was developed as a daughter database that accumulates data on the functional and structural characteristics of the functional sites stored in PDB, as well as on their spatial surroundings.

2. Methods and materials

The PDBSite database contains data obtained from the following fields of the PDB database: HEADER, TITLE, KEYWDS, REMARK 800, SITE, ATOM. To use the information contained in PDB, we have developed parsing software. If a single PDB entry contains data on several sites, an individual PDBSite entry is created for each site. Internet access to the PDBSite database is provided through the Sequence Retrieval System (SRS).

The spatial surroundings of a site were calculated in the following way. Using the atom co-ordinates, a parallelepiped enclosing all the atoms of the amino acid residues of the site was constructed. The atoms of the site residues themselves were then excluded from consideration, and the surroundings were determined by analysing all the remaining residues of the protein: a residue was assigned to the spatial surroundings of the site if at least one of its atoms fell inside the parallelepiped. A schematic representation of a site enclosed in a parallelepiped, illustrating how its surroundings are formed, is shown in Figure 1.

Figure 1. Schematic representation of a site and its surroundings enclosed in a parallelepiped. Site residues are marked in blue, surrounding residues in green.
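The procedure for finding the surroundings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parallelepiped is taken to be axis-aligned with no extra margin, and the `Atom` class, the residue labels and the function names are hypothetical and are not part of PDBSite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    residue_id: str  # hypothetical residue label, e.g. "A:HIS57"
    x: float
    y: float
    z: float


def bounding_box(atoms):
    """Return the (min corner, max corner) of an axis-aligned box around atoms."""
    xs = [a.x for a in atoms]
    ys = [a.y for a in atoms]
    zs = [a.z for a in atoms]
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def site_surroundings(all_atoms, site_residue_ids):
    """Return residues that are not in the site but have an atom inside its box.

    Assumption: the box is axis-aligned and exactly encloses the site atoms.
    """
    site_ids = set(site_residue_ids)
    site_atoms = [a for a in all_atoms if a.residue_id in site_ids]
    if not site_atoms:
        raise ValueError("no atoms found for the given site residues")
    lo, hi = bounding_box(site_atoms)

    surround = set()
    for a in all_atoms:
        if a.residue_id in site_ids:
            continue  # atoms of the site itself are excluded
        inside = (lo[0] <= a.x <= hi[0]
                  and lo[1] <= a.y <= hi[1]
                  and lo[2] <= a.z <= hi[2])
        if inside:
            surround.add(a.residue_id)  # one atom inside is enough
    return sorted(surround)
```

A residue is reported once even if several of its atoms fall inside the box, which matches the "at least one atom" rule described above.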

 

To characterise a site and its surroundings, we use (1) the exposure of each residue; (2) the average value, (3) the sum, and (4) the spatial moment of the physico-chemical parameters of the amino acids; (5) the centre-of-mass co-ordinates of each residue; and (6) the pairwise distances between the centres of mass of the residues.
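As an illustration of characteristics (5) and (6), the following Python sketch computes a centre of mass and the pairwise distances between residue centres. It assumes that each atom is given as a `(mass, x, y, z)` tuple with ordinary atomic masses as weights; the function names and input layout are hypothetical and chosen only for this example.

```python
import math
from itertools import combinations


def centre_of_mass(atoms):
    """Mass-weighted mean position of a residue.

    atoms: list of (mass, x, y, z) tuples (assumed input layout).
    """
    total_mass = sum(a[0] for a in atoms)
    return tuple(sum(a[0] * a[k] for a in atoms) / total_mass
                 for k in (1, 2, 3))


def pairwise_distances(centres):
    """Euclidean distance for every unordered pair of residues.

    centres: dict mapping a residue label to its (x, y, z) centre.
    Returns a dict keyed by (label_i, label_j) with label_i < label_j.
    """
    return {(a, b): math.dist(centres[a], centres[b])
            for a, b in combinations(sorted(centres), 2)}
```

For a site of N residues this yields N(N-1)/2 distances, one per unordered pair.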

Spatial moment is calculated by the formula:

where p_i, i = 1, 2, ..., N, is the value of a certain property of the i-th residue of the three-dimensional site comprising N amino acids, and x_i, y_i, z_i are the co-ordinates of the C-alpha atom of the i-th residue, taken relative to the geometrical centre of the three-dimensional site.
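The average and the sum of a property over a site follow directly from the values p_i. The expression for the spatial moment is not reproduced in this text, so the sketch below rests on an assumption: the moment is taken to be the vector sum of p_i times the residue position, by analogy with a first moment, and its modulus is reported. The geometrical centre is likewise assumed to be the mean of the C-alpha co-ordinates. The function name and the returned keys are hypothetical.

```python
import math


def site_property_summary(values, ca_coords):
    """Average, sum and (assumed) spatial-moment modulus of one property.

    values:    p_i for each of the N residues of the site.
    ca_coords: (x, y, z) of the C-alpha atom of each residue, same order.
    """
    n = len(values)
    if n == 0 or n != len(ca_coords):
        raise ValueError("values and ca_coords must be non-empty and equal in length")

    # Assumption: geometrical centre = mean of the C-alpha co-ordinates.
    centre = tuple(sum(c[k] for c in ca_coords) / n for k in range(3))

    total = sum(values)

    # ASSUMED form of the moment: vector sum of p_i * (r_i - centre).
    moment = tuple(
        sum(p * (c[k] - centre[k]) for p, c in zip(values, ca_coords))
        for k in range(3)
    )
    modulus = math.sqrt(sum(m * m for m in moment))

    return {"average": total / n, "sum": total, "moment_modulus": modulus}
```

Under this assumed form, a site whose property values are all equal has a moment of zero, because the positions are measured from the geometrical centre.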

 

Additionally, among the site characteristics, we have calculated (7) an indicator of the discontinuity of the site in the primary structure. This indicator is defined in terms of N, the number of residues in the site, and P_i, the position of the i-th residue of the site in the protein sequence. To calculate the exposure of amino acid residues, we immersed the molecule in a cubic lattice (displacement of volume in a cubic lattice).

 

3. Field description

 

The PDBSite database contains fields of two types: those that can be queried and those that cannot. The fields that cannot be queried contain additional information on the structure of the site and its surroundings.

The fields that can be queried are described below.

ID - Entry identifier. This identifier is unique within PDBSite.

PDBID - PDB ID code. This identifier is unique within PDB.

Header - contains PDB classification for the entry. The field content corresponds to that in PDB.

Title - contains the title of the experiment or analysis described in the entry. The field content corresponds to that in PDB.

Keyword - contains keywords describing the macromolecule. The content corresponds to that of KEYWDS of PDB.

Molecule - contains names of macromolecules from the COMPND of PDB and is designed to search for entries by the names of macromolecules.

NumSiteChains - contains the number of different chains to which the residues of the site belong.

SiteDescr - contains the description of the site. The content corresponds to that of the SITE_DESCRIPTION sub-field of the REMARK 800 field of PDB.

ResidueNotAA - contains the names of residues that are contained in the site but are not amino acids.

LenSite - contains the number of residues in the site.

LenSurround - contains the number of residues in the site environment.

ExposureSite - contains the average exposure of the residues of the site.

ExposureSurround - contains the average exposure of the residues of the site environment.

Discontinuity - characterises the discontinuity of the site in the primary structure.

 

Below is the list of fields that cannot be queried but contain additional information on the structure of the site and its surroundings.

MolChains - contains the identifiers of the chains of a macromolecule.

CHAIN_ID - contains the chain identifiers for the residues of the site and its environment.

POS - indicates the positions of the residues of the site and of its surroundings in the protein sequence.

RESNAME - contains the names of the residues of the site and its environment.

EXPOSE - contains the exposure of each residue of the site and its environment.

 

The physico-chemical parameters of the site and its surroundings are listed in a special table. The columns of the table correspond to the types of physico-chemical characteristics; their order is given in the line ORDER. Each row indicates whether the values refer to the site or to its surroundings, and the rows are grouped by the way in which the parameter was calculated. Three kinds of values are listed in the table: the average, the sum, and the modulus of the spatial moment.

The lines SITE contain the physico-chemical parameters calculated for the site. The lines FULL_SURROUND contain the parameters calculated for all residues of the site surroundings. The lines EXPOSED_SURROUND contain the parameters calculated for the exposed residues of the site surroundings. The lines BURIED_SURROUND contain the parameters calculated for the buried residues of the site surroundings.

The table is organised in the following way.

The first line is ORDER.

It is followed by the line AVERAGE, which indicates that the lines SITE, FULL_SURROUND, EXPOSED_SURROUND and BURIED_SURROUND placed below it contain the average values of the physico-chemical parameters.

Next comes the line SUM, which indicates that the physico-chemical parameters listed in the lines below it were calculated as sums. It is followed by the line SPATIAL MOMENT, which indicates that the parameters listed in the subsequent lines SITE, FULL_SURROUND, EXPOSED_SURROUND and BURIED_SURROUND were calculated as the modulus of the spatial moment. This is the end of the table.
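To make the table organisation concrete, here is a minimal Python sketch that reads such a table into nested dictionaries. The exact line syntax of a PDBSite entry is not specified here, so the sketch assumes whitespace-separated tokens, with property names following the word ORDER and numeric values following each row label. The function name is hypothetical.

```python
SECTIONS = ("AVERAGE", "SUM", "SPATIAL MOMENT")
ROWS = ("SITE", "FULL_SURROUND", "EXPOSED_SURROUND", "BURIED_SURROUND")


def parse_property_table(lines):
    """Return {section: {row: {property_name: value}}}.

    Assumed layout (not confirmed by the database documentation):
      ORDER <name1> <name2> ...
      AVERAGE
      SITE <v1> <v2> ...
      ...
    """
    order = []
    table = {}
    section = None
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        joined = " ".join(tokens)
        if tokens[0] == "ORDER":
            order = tokens[1:]  # column names, in table order
        elif joined in SECTIONS:
            section = joined  # start of a new group of rows
            table[section] = {}
        elif tokens[0] in ROWS:
            if section is None:
                raise ValueError("row found before any section line")
            values = [float(t) for t in tokens[1:]]
            table[section][tokens[0]] = dict(zip(order, values))
    return table
```

With this structure, a value is addressed by section, row and property name, for example `table["SUM"]["FULL_SURROUND"][name]` for some property `name` listed in ORDER.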

 

PAIRWISE - contains the pairwise distances between the residues of the site.

COORDINATES CA_ATOMS - contains the C-alpha atom co-ordinates of the site residues.

COORDINATES CENTRE_MASS - contains the centre-of-mass co-ordinates of the site residues.

4. Examples of querying

To make a query, do the following:

  1. Load the ‘Protein’ page of the GeneNetWorks system.
  2. Click the hyperlink PDBSITE.
  3. Select SRS ACCESS: PDBSITE. See an example in Figure 2.

Figure 2. PDBSITE page. SRS ACCESS is marked by a red arrow.

  4. Click the button ‘Search’. See an example in Figure 3.

Figure 3. PDBSITE starting page. A red arrow indicates the button ‘Search’.

 

 

4.1 Searching for proteins by data on their functional sites

4.1.1 Ligand binding sites: Zinc, Calcium, Phosphate, Sugar binding sites, etc.

  1. Select the field ‘SiteDescr’ from the drop-down menu. To query Zinc-binding sites, enter ‘Zinc’ in the text-box located to the right. Click the button "Submit Query". See an example in Figure 4.

Figure 4. Query form for searching for Zinc-binding sites. The fields to be filled in are marked in red. The button "Submit Query" is indicated by an arrow.

  2. The result of the query is a list of PDBSITE entries. To display the complete text of an entry, click the hyperlink with the entry name. An example is illustrated in Figure 5.

Figure 5. A list of entries found. The entry name is indicated by an arrow.

  3. The complete text of the entry found is shown in Figure 6.

Figure 6. Complete text of the entry found.

 

4.1.2 Sites subjected to biochemical modifications: phosphorylation, glycosylation sites, etc.

 

  1. Select the field ‘SiteDescr’ from the drop-down menu. To search for phosphorylation sites, enter ‘phosphorylation’ in the text-box to the right. Click the button "Submit Query". See an example in Figure 7.

Figure 7. Query form for searching for phosphorylation sites. The fields to be filled in are marked in red. The button "Submit Query" is indicated by a red arrow.

  2. Submitting the query brings up a list of PDBSITE entries displayed in the ‘Query Results’ page. To display the complete text of an entry, click the hyperlink with the entry name. An example is shown in Figure 8.

Figure 8. The list of entries found. An arrow shows the entry name.

  3. The complete text of the ‘phosphorylation site’ entry is shown in Figure 9.

Figure 9. Complete text of a phosphorylation site found.

 

4.1.3 Active sites: cysteine protease active site, ribonucleolytic active site, etc.

 

4.2 Searching for proteins by structural characteristics of their sites

 

  1. To find discontinuous sites with a length of 3 amino acid residues, do the following. Select the field ‘Discontinuity’ from the drop-down menu. In the text-box located to the right, enter ‘0’. In the menu below, select the field ‘LenSite’. In the field to the right, enter the value ‘3’. Click the button "Submit Query". See an example in Figure 10.

Figure 10. Query form for searching for discontinuous sites with a length of 3 amino acid residues. The fields to be filled in are marked in red. An arrow indicates the button "Submit Query".

  2. The result of the query is a list of PDBSITE entries. To display the complete text of an entry, click the hyperlink with the entry name. See an example in Figure 11.

Figure 11. The list of entries found. The entry name is marked by an arrow.

  3. The complete text of the entry found is shown in Figure 12.

Figure 12. Complete text of one of the entries found.

 

4.3 Searching for sites by the data on proteins

 

  1. To find sites by using the data describing the biological classification of proteins, select the option ‘Header’ from the drop-down menu. In the text-box, enter ‘HISTIDINE&KINASE’. Click the button ‘Submit Query’. See an example in Figure 13.

Figure 13. Query form for searching for sites by the data on the classification of proteins. The fields to be filled in are marked in red. An arrow marks the button "Submit Query".

  2. Submitting the query brings up the list of PDBSITE entries found. To display the complete text of an entry, click the hyperlink with the entry name. See an example in Figure 14.

Figure 14. The results of the query as a list of entries found. An arrow indicates the entry name.

  3. The complete text of the entry found is shown in Figure 15.

Figure 15. Complete text of one of the entries found.
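The logic of the example queries in section 4 can be pictured as filters over a list of entries. The Python sketch below is only an analogy and does not call SRS or reproduce its implementation: entries are plain dictionaries keyed by the field names from section 3, ‘&’ is treated as "all terms must be present" (the assumed meaning of ‘HISTIDINE&KINASE’), text is matched by whole words without regard to case, and numeric fields are compared for equality. The function names are hypothetical.

```python
import re


def matches(record, field, query):
    """Check one field of one entry against a query string.

    Numeric fields are compared for equality; text fields must contain
    every '&'-separated term as a whole word (case-insensitive).
    """
    value = record.get(field)
    if isinstance(value, (int, float)):
        return value == float(query)
    words = set(re.findall(r"\w+", str(value or "").upper()))
    return all(term.strip().upper() in words for term in query.split("&"))


def search(records, **conditions):
    """Return the entries that satisfy all field conditions at once."""
    return [r for r in records
            if all(matches(r, field, query)
                   for field, query in conditions.items())]
```

In this picture the query of section 4.2 corresponds to `search(entries, Discontinuity="0", LenSite="3")`, in which both conditions must hold for an entry to be returned.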

 

 

REFERENCES

 

1. Chothia C. The nature of the accessible and buried surfaces in proteins. // J. Mol. Biol. 1976. V.105. P.1-14.

2. Davies D.R., Cohen G.H. Interactions of protein antigens with antibodies. // Proc. Natl. Acad. Sci. USA. 1996. V.93. P.7-12.

3. Ivanisenko V.A., Eroshkin A.M. Search for sites containing functionally important substitutions in series of related or mutant proteins. // Mol. Biol. (Mosk). 1997. V.31. P.880-887. (Russian).

4. Bernstein F.C., Koetzle T.F., Williams G.J.B., Meyer E.F., Brice M.D., Rodgers J.R., Kennard O., Shimanouchi T., Tasumi M. The Protein Data Bank: a computer based archival file for macromolecular structures. // J. Mol. Biol. 1977. V.112. P.535-542.

 
