PowerPoint Presentation

Published on May 20, 2022

Scene 1 (0s)

V.I. Vernadsky Crimean Federal University.

Scene 2 (11s)

Group members: 1. Shah Alam Khan 2. Md-Shadab Alam 3. Rukhsar Bano 4. Varunraj Baskaran 5. Sandeep Bhardawaj 6. Tejaswini Rangaswamy

Discipline: Big Data in Biology

Topic: Structural Classification of Proteins (SCOP).

Scene 3 (28s)

The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. A motivation for this classification is to determine the evolutionary relationships between proteins. Proteins with the same shape but little sequence or functional similarity are placed in "superfamilies" and are assumed to have only a very distant common ancestor. Proteins with the same shape and some similarity of sequence and/or function are placed in "families" and are assumed to have a closer common ancestor. Like the CATH and Pfam databases, SCOP classifies individual structural domains of proteins rather than entire proteins, which may contain a significant number of different domains.

Scene 4 (54s)

The SCOP database clusters proteins that perform a similar biological function and are evolutionarily related through a common structural organization. This common organization may span the full protein or only the active-center region. The function of an uncharacterized protein can therefore be predicted by comparing its structure with those of known proteins, and SCOP supports this kind of prediction.

SCOP classifies proteins of known structure hierarchically into three major groups: families, superfamilies, and folds. Families contain proteins with a clear evolutionary relationship, generally sharing at least 30% sequence identity over their length. A protein below this threshold can still be assigned to a family if similar function and structure establish the relationship. Proteins with very low sequence identity whose structural and functional similarities nonetheless suggest a relationship are placed in superfamilies. Proteins with the same arrangement of secondary structures are grouped into folds; at this level the similarity need not reflect common ancestry and may instead arise from physicochemical principles.

Scene 5 (1m 46s)

The SCOP database is freely accessible on the internet. SCOP was created in 1994 in the Centre for Protein Engineering and the Laboratory of Molecular Biology. It was maintained by Alexey G. Murzin and his colleagues in the Centre for Protein Engineering until its closure in 2010 and subsequently at the Laboratory of Molecular Biology in Cambridge, England.

Hierarchical organization

The source of protein structures is the Protein Data Bank. The unit of classification of structure in SCOP is the protein domain. What the SCOP authors mean by "domain" is suggested by their statement that small proteins and most medium-sized ones have just one domain, and by the observation that human hemoglobin, which has an α2β2 structure, is assigned two SCOP domains, one for the α and one for the β subunit.

Scene 6 (2m 14s)

2. All beta proteins: All-β proteins are a class of structural domains in which the secondary structure is composed entirely of β-sheets, with the possible exception of a few isolated α-helices on the periphery. Common examples include the SH3 domain, the beta-propeller domain, the immunoglobulin fold and the B3 DNA binding domain.

3. Alpha and beta proteins (α+β): α+β proteins are a class of structural domains in which the secondary structure is composed of α-helices and β-strands that occur separately along the backbone. The β-strands are therefore mostly antiparallel.

Scene 7 (2m 40s)

4. Alpha and beta proteins (α/β): α/β proteins are a class of structural domains in which the secondary structure is composed of alternating α-helices and β-strands along the backbone. The β-strands are therefore mostly parallel. Common examples include the flavodoxin fold, the TIM barrel and leucine-rich-repeat (LRR) proteins such as ribonuclease inhibitor.

5. Multi-domain proteins: the multi-domain class comprises members with domains of different folds. In a multi-domain protein, each domain may fulfil its own function independently or in a concerted manner with its neighbours.

6. Small proteins: this class includes several disulfide-rich and metal-binding proteins with few or almost no regular secondary structures.
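The class descriptions above can be illustrated with a toy heuristic. The sketch below takes a per-residue secondary-structure string (`H` for helix, `E` for strand, anything else for loop), splits it into segments, and guesses a class from how helices and strands are distributed along the chain. The function names, the string encoding and the 0.5 alternation cut-off are all assumptions made for illustration. SCOP classes are assigned by expert curation, not by a rule like this, and the sketch ignores refinements such as the isolated peripheral helices tolerated in all-β domains.

```python
from itertools import groupby


def segments(ss: str) -> list:
    """Ordered list of helix ('H') and strand ('E') segments; loops are dropped."""
    return [label for label, _ in groupby(ss) if label in ("H", "E")]


def guess_class(ss: str, alternation_cutoff: float = 0.5) -> str:
    """Toy guess at a structural class from a secondary-structure string.

    alternation_cutoff is a hypothetical threshold, not a SCOP parameter.
    """
    segs = segments(ss)
    helices = segs.count("H")
    strands = segs.count("E")
    if helices == 0 and strands == 0:
        return "little regular secondary structure"
    if strands == 0:
        return "all-alpha"
    if helices == 0:
        return "all-beta"
    # Fraction of neighbouring segment pairs that switch between helix and strand.
    switches = sum(1 for a, b in zip(segs, segs[1:]) if a != b)
    alternation = switches / (len(segs) - 1)
    return "alpha/beta" if alternation >= alternation_cutoff else "alpha+beta"


# Made-up strings, not real proteins.
print(guess_class("EE-HHH-EE-HHH-EE"))   # helices and strands alternate
print(guess_class("EE-EE-EE-HHH-HHH"))   # helices and strands are segregated
```

The alternation fraction is only a stand-in for the distinction drawn above: alternating elements along the backbone (α/β) versus elements that occur separately (α+β).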

Scene 8 (3m 23s)

Fold (gross structural similarity): members of different superfamilies are grouped into one fold if the arrangement of their major secondary structure elements (SSEs), along with their topological connections, is the same. Structural similarity among members of the same fold arises from physicochemical properties favoring certain packing arrangements and chain topologies. Each class contains a number of distinct folds. This classification level indicates similar tertiary structure, but not necessarily evolutionary relatedness. For example, the "All-α proteins" class contains >280 distinct folds, including: Globin-like (core: 6 helices; folded leaf, partly opened), long alpha-hairpin (2 helices; antiparallel hairpin, left-handed twist) and Type I dockerin domains (tandem repeat of two calcium-binding loop-helix motifs, distinct from the EF-hand).

Scene 9 (3m 52s)

Families: Proteins with a close common evolutionary origin are clustered together in families. The members of a family have significant sequence similarity, leading to related structures and functions, and show clear homology. A family can be defined as a collection of related protein regions that share high sequence identity and usually good functional and structural similarity. Most members of a family show more than 30% sequence identity with each other. However, SCOP contains a few families whose members have low sequence similarity, such as the globin family, where sequence identity between members can be as low as 15%. Even so, all members show the same overall structure and have critical functional residues in topologically equivalent positions, implying divergence from a common ancestor.
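As a rough illustration of the 30% guideline above, the sketch below computes percent identity for two sequences that are assumed to be already aligned (equal length, `-` for gaps). The function names, the toy sequences and the convention of dividing by the number of gap-free columns are assumptions made for illustration. Family assignment in SCOP is an expert judgement that also weighs structure and function, as the globin example shows, so this is not how SCOP itself decides.

```python
FAMILY_IDENTITY_GUIDELINE = 30.0  # percent; the rule of thumb quoted above


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_family_guideline(aligned_a: str, aligned_b: str) -> bool:
    """True if the pair reaches the 30% sequence identity rule of thumb."""
    return percent_identity(aligned_a, aligned_b) >= FAMILY_IDENTITY_GUIDELINE


# Made-up aligned fragments, not real protein sequences.
seq_a = "MKT-AYIAKQR"
seq_b = "MKTLAYLAK-R"
print(round(percent_identity(seq_a, seq_b), 1))  # 8 matches in 9 gap-free columns -> 88.9
print(meets_family_guideline(seq_a, seq_b))      # True
```

Other definitions of percent identity divide by the full alignment length or by the shorter sequence, so a pair near the threshold can fall on either side depending on the convention.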

Scene 10 (4m 29s)

Superfamilies: If the proteins of different families have only low sequence similarity, but their structures are similar (which indicates a possible common evolutionary origin), then these families are clustered together as a superfamily. Families showing overall structural similarity and, in many cases, gross functional similarity, thus indicating a potential common evolutionary origin, are categorized into one superfamily. Sometimes, although the functions of the families within a superfamily are not the same, the nature of those functions, together with the topological and chemical equivalence of the functional sites, implies a potential evolutionary relationship between the families.

Scene 11 (4m 58s)

SCOP successors: By 2009, the original SCOP database had manually classified 38,000 PDB entries into a strictly hierarchical structure. With the accelerating pace of protein structure publications, the limited automation of classification could not keep up, leading to a non-comprehensive dataset. The Structural Classification of Proteins extended (SCOPe) database was released in 2012 with far greater automation of the same hierarchical system and is fully backward compatible with SCOP version 1.75. In 2014, manual curation was reintroduced into SCOPe to maintain accurate structure assignment. As of February 2015, SCOPe 2.05 classified 71,000 of the 110,000 total PDB entries.

Scene 12 (5m 28s)

The SCOP2 prototype was a beta version of the Structural Classification of Proteins and a classification system that aimed to capture more of the evolutionary complexity inherent in protein structure evolution. It is therefore not a simple hierarchy, but a directed acyclic graph network connecting protein superfamilies and representing structural and evolutionary relationships such as circular permutations, domain fusion and domain decay. Consequently, domains are not separated by strict fixed boundaries, but rather are defined by their relationships to the most similar other structures. The prototype was used for the development of the SCOP version 2 database. SCOP version 2, released in January 2020, contains 5134 families and 2485 superfamilies, compared to 3902 families and 1962 superfamilies in SCOP 1.75. The classification levels organise more than 41,000 non-redundant domains that represent more than 504,000 protein structures.
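The difference between a strict hierarchy and a directed acyclic graph can be sketched in a few lines. In a tree every node has exactly one parent; in the graph below a node may point to several parents, each edge carrying a relationship label. The class name `ClassificationGraph`, the node names and the relationship labels are hypothetical and chosen only for illustration; they do not reflect the actual SCOP2 data model or its files.

```python
from collections import defaultdict


class ClassificationGraph:
    """Minimal directed acyclic graph: child -> set of (parent, relation) edges."""

    def __init__(self):
        self.parents = defaultdict(set)

    def ancestors(self, node):
        """All nodes reachable by following parent edges upward from node."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent, _relation in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_relation(self, child, parent, relation):
        """Add an edge, refusing any edge that would create a cycle."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError("edge would create a cycle")
        self.parents[child].add((parent, relation))


graph = ClassificationGraph()
# Unlike in a tree, one node may have more than one parent.
graph.add_relation("domain_X", "superfamily_A", "structural similarity")
graph.add_relation("domain_X", "superfamily_B", "circular permutation")
graph.add_relation("superfamily_A", "fold_1", "structural similarity")

print(sorted(graph.ancestors("domain_X")))
# ['fold_1', 'superfamily_A', 'superfamily_B']
```

Storing parent edges rather than a single parent pointer is what lets relationships such as circular permutation or domain fusion sit alongside the ordinary structural grouping.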

Scene 13 (6m 4s)

ORGANISATION AND FACILITIES OF SCOP

The SCOP database is available as a set of tightly coupled hypertext pages on the WWW via the URL: http://scop.mrc-lmb.cam.ac.uk/scop/

The interface to SCOP has been designed to facilitate both detailed searching of particular families and browsing of the whole database. To this end, there are a variety of different techniques for navigation:

Browsing through the SCOP hierarchy. SCOP is organised as a tree structure. Entering at the top of the hierarchy, the user can navigate through the levels of Class, Fold, Superfamily, Family and Species to the leaves of the tree, which are structural domains of individual PDB entries. An alternative hierarchy of Folds, Superfamilies and Families by the date of solution of the first representative structure is also provided.
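Browsing the tree described above amounts to descending one level at a time and then collecting the leaves. The sketch below models this with nested dictionaries. The level names come from the text, and "All-alpha proteins" and "Globin-like" are taken from the fold example earlier; every other entry, the function names `descend` and `leaf_domains`, and the dictionary layout are placeholders assumed for illustration, not SCOP's actual storage format or identifiers.

```python
LEVELS = ["Class", "Fold", "Superfamily", "Family", "Species"]

# Toy tree: keys are node names, the innermost lists are the leaf domains.
toy_tree = {
    "All-alpha proteins": {
        "Globin-like": {
            "example superfamily": {
                "example family": {
                    "example species A": ["domain-1", "domain-2"],
                    "example species B": ["domain-3"],
                }
            }
        }
    }
}


def descend(tree, path):
    """Follow a list of names from the top of the hierarchy, one level per name."""
    node = tree
    for level, name in zip(LEVELS, path):
        if not isinstance(node, dict) or name not in node:
            raise KeyError(f"no {level} named {name!r}")
        node = node[name]
    return node


def leaf_domains(node):
    """Collect every domain found below a node, in insertion order."""
    if isinstance(node, list):
        return list(node)
    domains = []
    for child in node.values():
        domains.extend(leaf_domains(child))
    return domains


fold = descend(toy_tree, ["All-alpha proteins", "Globin-like"])
print(list(fold))            # ['example superfamily']
print(leaf_domains(fold))    # ['domain-1', 'domain-2', 'domain-3']
```

Entering with a shorter path stops at a higher level, which mirrors browsing from the top of the hierarchy and drilling down only as far as needed.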

Scene 14 (6m 42s)

From an amino acid sequence. The sequence similarity search facility allows any sequence of interest to be searched against databases of protein sequences classified in SCOP using the algorithms BLAST, FASTA or SSEARCH. SCOP can then be entered from the list of PDB chains found to be similar, and the similarity can be displayed visually.

From a keyword. The keyword search facility returns a list of SCOP pages containing the word entered or combinations of words separated by a series of boolean operators.

From a PDB identifier. The PDB entry viewer links PDB entries to various graphical views, external databases and SCOP itself.

By history. Pages are provided that order folds, superfamilies and families by date of entry into PDB or publication. This is both for interest and to make it easier to keep up to date with the appearance of new folds or significant new members of existing folds.

Scene 15 (7m 20s)

To facilitate rapid and effective access to SCOP, a number of mirrors have been established, a full current list of which can be found via the above URL. The facilities provided by the various sites are always the same, so you will lose nothing by accessing your nearest mirror. The implementation does differ: for example, sequence similarity searching is currently always carried out at the main scop.mrc-lmb.cam.ac.uk site. This is transparent to the user, who is always returned a search results page marked up with links to pages on the mirror they started from.

OTHER USES OF SCOP

Non-redundant sequence databases and the evaluation of sequence alignment methods

The clustering of sequences of protein chains of known structure at different levels of sequence similarity gives a series of non-redundant sequence databases known as PDB40, PDB90, PDB95, etc. (the number refers to the maximum percentage sequence identity of any pair of sequences in the database), and these are available from SCOP. The current versions are produced by the ASTRAL procedure.
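The idea behind a set such as PDB40 (no pair of sequences above 40% identity) can be sketched with a greedy filter: walk through the sequences and keep one only if it is not too similar to anything already kept. This is not the ASTRAL procedure, whose details are not given here. The function names, the toy sequences and in particular `naive_identity`, a crude ungapped comparison standing in for a real alignment-based identity, are assumptions made for illustration.

```python
def naive_identity(a: str, b: str) -> float:
    """Crude percent identity: position-by-position matches over the longer length.

    A stand-in for a proper alignment-based measure.
    """
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / max(len(a), len(b))


def non_redundant_subset(sequences: dict, max_identity: float, identity=naive_identity) -> dict:
    """Greedily keep sequences so that no kept pair exceeds max_identity percent."""
    kept = {}
    for name, seq in sequences.items():
        if all(identity(seq, rep) <= max_identity for rep in kept.values()):
            kept[name] = seq
    return kept


# Made-up chains, not real PDB entries.
toy_chains = {
    "chain_1": "ACDEFGHIKL",
    "chain_2": "ACDEFGHIKV",  # 90% identical to chain_1
    "chain_3": "ACDEWWWWWW",  # 40% identical to chain_1
    "chain_4": "MNPQRSTVWY",  # essentially unrelated
}

print(list(non_redundant_subset(toy_chains, 40)))  # ['chain_1', 'chain_3', 'chain_4']
print(list(non_redundant_subset(toy_chains, 90)))  # all four chains are kept
```

A greedy filter depends on the order in which sequences are visited, so a real procedure needs some rule for choosing which member of a similar group becomes the representative.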

Scene 16 (8m 5s)

Assignment of protein structures to sequences using the intermediate sequence library PDB-ISL

Two homologous sequences, which have diverged beyond the point where their homology can be recognised by a simple direct comparison, can be related through one or more other sequences that are suitably intermediate between the two. A library containing potential intermediate sequences for proteins of known structure (PDB-ISL) has been constructed and can be accessed directly or through SCOP. The sequences in the library were collected from a large sequence database using the sequences of the domains of proteins of known structure as the query sequences and the program PSI-BLAST. Sequences of proteins of unknown structure can be matched to distantly related proteins of known structure by using pairwise sequence comparison methods to find homologues in PDB-ISL. For a given error rate, the number of correct matches found is the same as that found using PSI-BLAST and a large sequence database. The advantage of this library is that, because it uses pairwise sequence comparison methods such as FASTA or BLAST, it can be searched easily and, in most cases, much more quickly.
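The intermediate-sequence idea above can be illustrated as a path search: if the query is detectably similar to some intermediate, and that intermediate is detectably similar to a protein of known structure, the two ends are linked even though a direct comparison fails. The sketch below runs a breadth-first search over a hand-written table of "directly detectable" similarities. The table, the sequence names and the function `find_link` are hypothetical; in practice the pairwise relationships would come from a search program, and PDB-ISL itself is a precomputed library rather than a graph search.

```python
from collections import deque


def find_link(query, known_structures, detectable):
    """Shortest chain of directly detectable similarities from query to a known structure.

    detectable maps each sequence name to the names it matches by direct comparison.
    Returns the chain as a list of names, or None if no chain exists.
    """
    queue = deque([[query]])
    visited = {query}
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last in known_structures and last != query:
            return path
        for neighbour in sorted(detectable.get(last, ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Made-up similarity table: the query and the known structure are not directly linked.
detectable = {
    "query_protein": {"intermediate_1"},
    "intermediate_1": {"query_protein", "known_structure"},
    "known_structure": {"intermediate_1"},
    "unrelated_protein": set(),
}

print(find_link("query_protein", {"known_structure"}, detectable))
# ['query_protein', 'intermediate_1', 'known_structure']
print(find_link("unrelated_protein", {"known_structure"}, detectable))
# None
```

Each extra intermediate step is another chance for a false link to creep in, which is why the text above compares methods at a given error rate.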

Scene 17 (8m 51s)