An Algorithmic Approach to Determining the Spatial Configuration of a Protein
Grade Level at Time of Presentation
Senior
Major
Computer Science & Mathematics (premed)
Minor
-
Institution
Eastern Kentucky University
KY House District #
81
KY Senate District #
34
Faculty Advisor/Mentor
Atilla Sit, PhD
Department
Mathematics & Statistics
Abstract
Determining the three-dimensional structure of a protein remains a significant and challenging problem. The ability to determine a protein's spatial configuration is highly desirable because it offers insight into the protein's function. Current mainstream experimental methods include X-ray crystallography and NMR spectroscopy. Given NMR results, the primary objective is to compute a structure in which the distances between atoms are as close as possible to the experimentally measured distances; this is referred to as the "Molecular Distance Geometry Problem." Once a structure has been determined, it is deposited in an online database, the Protein Data Bank (PDB).
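One common way to write the "as close as possible" objective is as a least-squares problem. The notation below is an assumption introduced for illustration and does not come from the abstract: $x_i \in \mathbb{R}^3$ is the computed position of atom $i$, $E$ is the set of atom pairs with a known distance, and $d_{ij}$ is the experimental distance for the pair $(i, j)$.

```latex
\min_{x_1, \dots, x_n \in \mathbb{R}^3}
  \sum_{(i,j) \in E} \bigl( \lVert x_i - x_j \rVert - d_{ij} \bigr)^2
```

An objective value of zero means every known distance is reproduced exactly; with noisy experimental data one generally expects only a small positive value.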
Our approach began by collecting statistical data from the PDB and coupling it with NOE data (distances between atoms that are not chemically bonded) in order to define distances within the target protein more accurately. We then implemented a "Branch-and-Prune" artificial intelligence (AI) algorithm that checks various physical and chemical assumptions while constructing a solution. Without these checks the computational time is exponential; by requiring that the assumptions be validated, however, the algorithm can decide when to stop constructing a potential solution and discard it. The novelty of our approach lies in developing a tolerance for violations of these assumptions. This tolerance is controlled by assigning weights of importance to the assumptions and "balancing" these weights to produce optimal solutions. Allowing this kind of tolerance let us produce a larger number of solutions, on the premise that a solution containing some degree of violation is still a candidate for being a true solution. The algorithm will be presented, and a previously determined protein structure from the PDB will be used as an example to demonstrate the algorithm's accuracy and performance.
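The idea of pruning with a weighted violation tolerance can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the six made-up atom coordinates, the choice of one extra distance per atom as the "assumption" being checked, and the weights are all hypothetical. The sketch assumes each new atom has exact distances to its three predecessors, so it can occupy at most two positions (the intersection of three spheres), and it discards a branch once the accumulated weighted violation exceeds the tolerance.

```python
import numpy as np

def sphere_intersection(p1, p2, p3, r1, r2, r3):
    """Return the (up to two) points at distances r1, r2, r3 from p1, p2, p3."""
    d = np.linalg.norm(p2 - p1)
    ex = (p2 - p1) / d
    i = ex @ (p3 - p1)
    ey = p3 - p1 - i * ex
    ey /= np.linalg.norm(ey)
    ez = np.cross(ex, ey)
    j = ey @ (p3 - p1)
    x = (r1**2 - r2**2 + d**2) / (2 * d)
    y = (r1**2 - r3**2 + i**2 + j**2) / (2 * j) - (i / j) * x
    z2 = r1**2 - x**2 - y**2
    if z2 < -1e-9:          # the three spheres do not meet
        return []
    z = np.sqrt(max(z2, 0.0))
    base = p1 + x * ex + y * ey
    return [base + z * ez, base - z * ez]

def branch_and_prune(n, anchors, backbone, extra, tolerance):
    """Enumerate structures whose accumulated weighted violation <= tolerance.

    anchors  : positions of the first three atoms (assumed known)
    backbone : {(i, j): distance} for j in {i-1, i-2, i-3}
    extra    : {i: [(j, distance, weight), ...]}, hypothetical pruning data
    """
    solutions = []

    def extend(partial, penalty):
        i = len(partial)
        if i == n:
            solutions.append((np.array(partial), penalty))
            return
        candidates = sphere_intersection(
            partial[i - 1], partial[i - 2], partial[i - 3],
            backbone[i, i - 1], backbone[i, i - 2], backbone[i, i - 3],
        )
        for candidate in candidates:            # branch: at most two positions
            cost = penalty
            for j, target, weight in extra.get(i, []):
                error = abs(np.linalg.norm(candidate - partial[j]) - target)
                cost += weight * error
            if cost <= tolerance:               # prune once over budget
                extend(partial + [candidate], cost)

    extend([np.asarray(p, dtype=float) for p in anchors], 0.0)
    return solutions

# Made-up reference structure used only to generate consistent distances.
coords = np.array([
    [0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0],
    [3.4, 1.6, 0.6], [3.9, 2.9, 1.2], [5.3, 3.0, 0.7],
])
n = len(coords)

def dist(i, j):
    return float(np.linalg.norm(coords[i] - coords[j]))

backbone = {(i, i - k): dist(i, i - k) for i in range(3, n) for k in (1, 2, 3)}
extra = {i: [(i - 4, dist(i, i - 4), 1.0)] for i in range(4, n)}

strict = branch_and_prune(n, coords[:3], backbone, extra, tolerance=1e-6)
relaxed = branch_and_prune(n, coords[:3], backbone, extra, tolerance=0.2)
unpruned = branch_and_prune(n, coords[:3], backbone, extra, tolerance=float("inf"))
print(len(strict), len(relaxed), len(unpruned))
```

In this toy setting the near-zero tolerance keeps only the reference structure and its mirror image, while an infinite tolerance keeps all 2^(n-3) = 8 leaves of the search tree. Intermediate tolerances keep at least as many structures as the strict run, which mirrors the trade-off described above: more candidate solutions in exchange for some degree of violation.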