The numerous methods for engineering proteins are limited by how tightly function can be associated with sequence and how frugally that sequence space can be sampled. Structural data and computational approaches can narrow the search space and, in turn, reduce the amount of downstream characterization. These tools become increasingly important for proteins whose desired properties are difficult to measure at scale.
Machine learning is attractive because it requires neither prior knowledge of particular protein features nor time-consuming manual inspection of individual structural features. Given that the evolved proteome is a compromise between folding, stability, and function, we hypothesize that structural outliers at locations away from the active site might affect folding and stability but not function. Building on this framework, we have established a new protein engineering paradigm that uses artificial intelligence to learn the consensus microenvironment for each amino acid and to scan entire structures for residues with very low wild-type probabilities, that is, residues that deviate from the structural consensus.
Compared to parental proteins, stabilized variants often exhibit desirable properties such as increased expression yield, thermal tolerance, and shelf life. Our technology enables researchers to quickly identify residues within a protein that are not well suited to their local environment. We identify these residues by learning the canonical microenvironment for each amino acid with a convolutional neural network, which lets scientists quickly see when a particular residue sits in a noncanonical microenvironment.
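To make the residue scan described above concrete, here is a minimal Python sketch of the final filtering step only. It assumes a model has already produced, for each position, a probability for each amino acid given that residue's microenvironment. The function name flag_noncanonical, the 0.05 cutoff, and the toy probabilities are all hypothetical and chosen for illustration; they are not the service's actual model output, file format, or threshold.

```python
from typing import Dict, List, Tuple


def flag_noncanonical(
    wild_type: str,
    probs: List[Dict[str, float]],
    threshold: float = 0.05,  # assumed cutoff, not a published value
) -> List[Tuple[int, str, float, str]]:
    """Flag residues whose wild-type amino acid has a low predicted probability.

    Returns (position, wild_type_residue, wild_type_probability, top_prediction)
    tuples, sorted so the least probable wild-type residues come first.
    Positions are 1-indexed.
    """
    if len(wild_type) != len(probs):
        raise ValueError("sequence and probability list must have equal length")

    flagged = []
    for position, (residue, distribution) in enumerate(zip(wild_type, probs), start=1):
        # Probability the model assigns to the residue actually present.
        p_wild_type = distribution.get(residue, 0.0)
        if p_wild_type < threshold:
            # Amino acid the model considers the best fit for this environment.
            top_prediction = max(distribution, key=distribution.get)
            flagged.append((position, residue, p_wild_type, top_prediction))

    flagged.sort(key=lambda hit: hit[2])
    return flagged


# Toy input: a three-residue sequence with made-up probabilities.
sequence = "MKV"
predicted = [
    {"M": 0.90, "L": 0.10},
    {"K": 0.02, "R": 0.80, "E": 0.18},
    {"V": 0.04, "I": 0.90, "L": 0.06},
]

hits = flag_noncanonical(sequence, predicted)
for position, residue, p_wild_type, top_prediction in hits:
    print(f"{residue}{position}: p(wild type)={p_wild_type:.2f}, suggested {top_prediction}")
```

In this toy case, positions 2 and 3 are flagged because the residue present there receives a probability below the assumed cutoff, while position 1 is left alone. In practice, flagged positions would be candidates for experimental testing rather than guaranteed improvements.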
So far, our deep learning model has engineered in vivo stability across three diverse proteins, each representing a distinct protein engineering challenge.
Blue Fluorescent Protein (BFP)
Mannose Phosphate Isomerase
We have made it easy for academic institutions to use our technology. To register for an account, please use an email address with a .edu top-level domain. Otherwise, please email us at firstname.lastname@example.org.
Users may provide a PDB code or upload a PDB file. We will email the user a prediction file within 30 minutes of submission.