IMProve: A Data-Driven Approach to Predicting Integral Membrane Protein Expression
Integral membrane proteins (IMPs) are of both fundamental scientific and medical importance: they govern the flow of nutrients and information across cell membranes, and they comprise the largest class of therapeutic drug targets. However, the structural and biophysical characterization of IMPs lags far behind that of other protein classes because IMPs cannot yet be reliably produced in quantities sufficient for experimental studies. To address this problem, we are developing methods to predict and improve IMP expression, combining computational and experimental approaches to close fundamental gaps in knowledge and technology.
We have developed a data-driven statistical model that predicts IMP expression in E. coli directly from sequence. The model, trained on experimental data, combines a set of sequence-derived variables into a score that predicts the likelihood of expression. We tested the model against independent datasets from the literature that contain a variety of experimental outcomes, demonstrating that it significantly enriches for expressed proteins. We then used the model to score expression across membrane proteomes and protein families, highlighting areas where it excels. Surprisingly, analysis of the underlying features reveals that parameters derived from the nucleotide sequence are important for expression. The model can be used immediately to identify favorable targets for characterization.
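The general idea of combining sequence-derived variables into a single score can be illustrated with a minimal Python sketch. This is not the IMProve model: the three features, the weight values, and every function name below are hypothetical and chosen only to show the shape of such a calculation, with one feature group taken from the nucleotide sequence and one from the protein sequence.

```python
from typing import Dict

# Hypothetical weights for illustration only; these are NOT the
# parameters of the published model.
ILLUSTRATIVE_WEIGHTS: Dict[str, float] = {
    "gc_content": -1.0,
    "hydrophobic_fraction": 0.5,
    "length_kb": -0.2,
}

# A simple, assumed set of hydrophobic residues (one-letter codes).
HYDROPHOBIC = set("AVILMFWC")


def gc_content(nt_seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    if not nt_seq:
        raise ValueError("nucleotide sequence is empty")
    seq = nt_seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def hydrophobic_fraction(aa_seq: str) -> float:
    """Fraction of residues that belong to the HYDROPHOBIC set."""
    if not aa_seq:
        raise ValueError("protein sequence is empty")
    seq = aa_seq.upper()
    return sum(res in HYDROPHOBIC for res in seq) / len(seq)


def extract_features(nt_seq: str, aa_seq: str) -> Dict[str, float]:
    """Compute nucleotide-derived and protein-derived variables."""
    return {
        "gc_content": gc_content(nt_seq),
        "hydrophobic_fraction": hydrophobic_fraction(aa_seq),
        "length_kb": len(nt_seq) / 1000.0,
    }


def expression_score(features: Dict[str, float],
                     weights: Dict[str, float]) -> float:
    """Weighted sum of features; a higher score ranks a target higher."""
    return sum(weights[name] * features[name] for name in weights)


if __name__ == "__main__":
    feats = extract_features("ATGGCTGTTAAAAAA", "MAVKK")
    print(feats)
    print(expression_score(feats, ILLUSTRATIVE_WEIGHTS))
```

In a sketch like this, candidate targets would be ranked by score, and the ranking would then be compared with experimental outcomes to check whether high-scoring proteins are enriched for successful expression.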
We are working to make this method available to the broader community as a web service upon publication, to accelerate progress in biochemical and biophysical studies of membrane proteins.
Please feel free to contact us to discuss immediate access or to be kept abreast of its release.
Shyam M. Saladi, Nauman Javed, Axel Müller, William M. Clemons, Jr., “Decoding sequence-level information to predict membrane protein expression,” bioRxiv (pre-print).