A system is described for finding and assembling the most highly conserved regions of related proteins for database searching. First, an automated version of Smith's algorithm for finding motifs is used for sensitive detection of multiple local alignments. Next, the local alignments are converted to blocks and the best set of non-overlapping blocks is determined. When the automated system was applied successively to all 437 groups of related proteins in the PROSITE catalog, 1764 blocks resulted; these could be used for very sensitive searches of sequence databases. Each block was calibrated by searching the SWISS-PROT database to obtain a measure of the chance distribution of matches, and the calibrated blocks were concatenated into a database that could itself be searched. Examples are provided in which distant relationships are detected either using a set of blocks to search a sequence database or using sequences to search the database of blocks. The practical use of the blocks database is demonstrated by detecting previously unknown relationships between oxidoreductases and by evaluating a proposed relationship between HIV Vif protein and thiol proteases.

An important area of research in biochemistry and molecular biology focuses on characterization of enzyme mutants. However, synthesis and analysis of experimental mutants is time consuming and expensive. We describe a machine-learning approach for inferring the activity levels of all unexplored single point mutants of an enzyme, based on a training set of such mutants with experimentally measured activity. Based on a Delaunay tessellation-derived four-body statistical potential function, a perturbation vector measuring environmental changes relative to wild type (wt) at every residue position uniquely characterizes each enzyme mutant for model development and prediction. First, a measure of model performance utilizing area (AUC) under the receiver operating characteristic (ROC) curve surpasses 0.83 and 0.77 for data sets of experimental HIV-1 protease and T4 lysozyme mutants, respectively. Additionally, a novel method is introduced for evaluating statistical significance associated with the number of correct test set predictions obtained from a trained model. Third, 100 stratified random splits of the protease and T4 lysozyme mutant data sets into training and test sets achieve 77.0% and 80.8% mean accuracy, respectively. Next, protease and T4 lysozyme models trained with experimental mutants are used to predict activity levels for all remaining mutants; a subsequent search for publications reporting on dozens of these test mutants reveals that experimental results are matched by 79% and 86% of predictions, respectively. Finally, learning curves for each mutant enzyme system indicate the influence of training set size on model performance.
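
As a rough illustration of two evaluation ideas named in the enzyme-mutant abstract (area under the ROC curve and stratified random train/test splits), here is a minimal Python sketch using only the standard library. It is not the authors' code: the function names roc_auc and stratified_split, the binary active/inactive labels, and the toy scores are all assumptions made up for this example.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_split(labels, test_fraction, rng):
    """Split example indices into train/test so that each class keeps
    roughly the same share in both parts."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        k = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:k])
        train.extend(shuffled[k:])
    return sorted(train), sorted(test)


# Hypothetical toy data: 1 = active mutant, 0 = inactive mutant.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
train_idx, test_idx = stratified_split(labels, 1 / 3, random.Random(0))
```

Repeating stratified_split with different seeds and averaging test accuracy would mimic the "100 stratified random splits" protocol in spirit; the pairwise AUC loop is quadratic in the number of examples and is meant only for small illustrative data sets.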