Overview
At its core, SAC aims to create a sustainable biomaterial platform that combines AI sequence generation with carbon-sequestering protein production. Both accurate sequence generation and higher yields of the generated proteins depend on careful measurements that produce usable data. We therefore focused our efforts on carefully measuring the relevant metrics, which can be used to:
- Increase the overall spider silk protein yield
- Improve the mechanical property accuracy of our AI generated sequences
- Revalidate already established spider silk protein sequences
The bottleneck in biomaterial usage is the lack of established production processes that can be scaled effectively, especially for high-molecular-weight proteins such as spider silk proteins. To increase yield and produce spider silk proteins in quantities suited to real-life applications, we used a variety of measurement techniques whose results we can build on in the future to streamline our platform.
Hardware-Assisted Real-Time Monitoring & Data Collection of Small Batch Cell Cultures
In the NTU-Taiwan SAC project, precise control of culture conditions during small batch cell growth is essential for optimizing the production of spider silk proteins. Our innovative Hardware-Assisted Real-Time Monitoring system enables us to collect continuous data and make timely adjustments, ensuring optimal protein yield and plasmid integrity before scaling up to larger bioreactors.
Importance of Real-Time Monitoring
Bacterial cultures, especially those harboring large plasmids that encode complex proteins, are susceptible to plasmid loss. Maintaining the right conditions from the outset is crucial, because even minor fluctuations in pH, temperature, or nutrient availability can drastically reduce protein yield. The conditions in the flask culture therefore matter, which is why we monitor them in real time.
By employing real-time monitoring, we can detect and correct deviations in culture conditions before the culture enters the bioreactor, allowing us to optimize conditions at an early stage. For example, as nutrients are depleted during fermentation, the medium typically becomes more acidic, which can negatively impact plasmid retention and protein expression.
Key Metrics Monitored
- pH: As fermentation progresses, the medium often becomes more acidic, which can negatively impact plasmid retention and protein expression. Continuous pH monitoring enables us to intervene and adjust conditions as needed.
- Temperature (°C): Maintaining an optimal temperature is critical for bacterial growth and proper protein folding. Our system continuously tracks temperature to ensure it remains within the ideal range for E. coli.
- Humidity: Environmental humidity affects the evaporation rate, influencing the concentration of the culture medium. Monitoring this metric helps us fine-tune the physical conditions for cell growth.
- Light Intensity (Lux): While not directly influencing E. coli growth, consistent light conditions are essential for experimental reliability. Light intensity may also play a role in applications involving light-induced optogenetic systems, enhancing specific protein functionalities.
- Turbidity/Optical Density (OD): Currently, real-time turbidity detection via cameras is not feasible, requiring us to take periodic samples and use a spectrophotometer for accurate OD measurements. This is crucial for determining bacterial growth phases and ensuring optimal density for spider silk production.
Materials:
- Microchip APP-All MCU2023 and pH probe
- Custom cap
- Computer with Python
Figure 1 Microchip APP-All MCU2023 (left), pH probe with cap (right)
Steps:
- Connect the chip to the computer and calibrate the system with a pH 7 standard solution using one-click calibration.
- Place the pH probe into the shaking flask and position both the flask and chip into the incubator.
- Launch the program to record environmental parameters.
- Mode 1: With real-time imaging, users can visualize environmental values without removing the samples. The program also outputs CSV files at regular intervals for data recording.
- Mode 2: No visual display; only environmental parameter values are recorded for future research and analysis.
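The recording step in Mode 2 can be sketched as a small logging loop. This is a minimal illustration rather than the program we ran: the function `read_sensors`, the column names in `FIELDS`, and the stand-in `fake_sensors` are all hypothetical, and the actual chip interface is not shown here.

```python
import csv
import io
import time
from typing import Callable, Dict, TextIO

# Hypothetical column names; the real program's CSV layout may differ.
FIELDS = ["timestamp", "pH", "temperature_c", "humidity_pct", "light_lux"]


def log_readings(read_sensors: Callable[[], Dict[str, float]],
                 out: TextIO,
                 n_samples: int,
                 interval_s: float = 0.0) -> None:
    """Write n_samples timestamped sensor readings to `out` as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for _ in range(n_samples):
        # read_sensors is supplied by the caller (hypothetical chip driver).
        row = {"timestamp": time.time(), **read_sensors()}
        writer.writerow(row)
        if interval_s > 0:
            time.sleep(interval_s)


def fake_sensors() -> Dict[str, float]:
    """Stand-in for the chip: returns constant, made-up readings."""
    return {"pH": 7.0, "temperature_c": 25.0,
            "humidity_pct": 60.0, "light_lux": 120.0}


# Example: record three readings into an in-memory buffer.
buffer = io.StringIO()
log_readings(fake_sensors, buffer, n_samples=3)
```

Passing the reader in as a function keeps the logging logic independent of the hardware, so the same loop can be exercised with made-up values before the chip is connected.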
Figure 2 Chip OLED displays values (left), FLEX real-use scenario (right)
Figure 3 Real-time monitoring of each parameter
Early-Stage Optimization and Monitoring GPSS Production
Before scaling to larger bioreactors, we conduct small batch trials in 100 mL cultures to identify the optimal conditions for spider silk protein expression. In these trials, we run five variations, comparing final protein yields while analyzing real-time data from metrics like pH and temperature to pinpoint the ideal environment.
During the production of T9-E9-TS9-S1 spider silk proteins, we ensure that environmental conditions remain stable. Using mEGFP-tagged (green fluorescent) spider silk proteins allows us to measure fluorescence intensity, which provides valuable insight before we start the fermentation process. Because operating a fermentation tank requires significant resources, real-time observation of the bacterial production environment helps minimize losses from failed test runs. By detecting changes in conditions, such as pH fluctuations due to nutrient depletion, we can intervene so that the bacteria do not divert amino acids to energy production, thereby preserving their capacity to synthesize proteins.
We tested three temperature settings (20°C, 25°C, and 30°C) overnight and explored pH levels (6, 7, and 8) to identify the most productive conditions. Using data from our hardware-assisted monitoring system, we deduced that optimal conditions for spider silk production (fluorescence) were achieved at 25°C and pH 7. We adopted these induction conditions as the basis for subsequent bioreactor production.
Impact on Bioreactor Performance
After optimizing small batch conditions, we apply this knowledge to 2.5L bioreactors, where we also monitor additional metrics such as dissolved oxygen (DO) and pH. The insights gained from small batch trials inform necessary adjustments to maintain optimal fermentation conditions at scale, ultimately leading to higher production yields and more efficient resource utilization.
This integrated approach—combining hardware-assisted monitoring with data-driven optimization—allows us to fine-tune the entire production process, ensuring the highest quality and quantity of spider silk proteins.
- Changes in OD600 after overnight cultivation at 20°C, 25°C, and 30°C.
- Changes in OD600 after overnight cultivation at pH levels 6, 7, and 8.
- Variations in fluorescence readings (excitation at 488 nm and emission at 507 nm) after overnight cultivation at 20°C, 25°C, and 30°C.
- Variations in fluorescence readings (excitation at 488 nm and emission at 507 nm) after overnight cultivation at pH levels 6, 7, and 8.
Induction time [hr] | 20°C, Sample 1 | 20°C, Sample 2 | 20°C, Sample 3 | 25°C, Sample 1 | 25°C, Sample 2 | 25°C, Sample 3 | 30°C, Sample 1 | 30°C, Sample 2 | 30°C, Sample 3 |
---|---|---|---|---|---|---|---|---|---|
0 | 0.42 | 0.41 | 0.42 | 0.4 | 0.41 | 0.42 | 0.4 | 0.41 | 0.42 |
3 | 0.55 | 0.56 | 0.57 | 0.62 | 0.69 | 0.66 | 1.01 | 0.99 | 1.09 |
6 | 0.75 | 0.72 | 0.67 | 1.09 | 1.1 | 1.14 | 1.63 | 1.71 | 1.52 |
9 | 1.03 | 1.13 | 0.99 | 1.6 | 1.59 | 1.64 | 2.01 | 2.1 | 2.05 |
12 | 1.42 | 1.33 | 1.24 | 1.78 | 1.83 | 1.86 | 2.41 | 2.35 | 2.39 |
15 | 1.92 | 1.89 | 1.81 | 2.01 | 1.93 | 1.99 | 2.8 | 2.74 | 2.85 |
18 | 2.01 | 2.13 | 2.05 | 2.67 | 2.78 | 2.77 | 2.99 | 2.91 | 2.95 |
21 | 2.51 | 2.48 | 2.39 | 3.13 | 3.15 | 3.01 | 4.31 | 4.22 | 4.23 |
24 | 2.67 | 2.78 | 2.77 | 3.72 | 3.78 | 3.68 | 4.56 | 4.61 | 4.75 |
Table 1 OD600 at different temperatures
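One way to summarize Table 1 is an average specific growth rate over the 24 h induction period, μ = ln(OD_end / OD_start) / t, computed from the replicate means. The sketch below is our illustration of that arithmetic, not part of the original analysis; it assumes growth is summarized over the whole period rather than fitted to the exponential phase only.

```python
import math

# OD600 replicates at 0 h and 24 h, copied from Table 1.
OD_START = {
    "20°C": [0.42, 0.41, 0.42],
    "25°C": [0.40, 0.41, 0.42],
    "30°C": [0.40, 0.41, 0.42],
}
OD_END = {
    "20°C": [2.67, 2.78, 2.77],
    "25°C": [3.72, 3.78, 3.68],
    "30°C": [4.56, 4.61, 4.75],
}
HOURS = 24.0


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def average_growth_rate(start, end, hours):
    """Average specific growth rate (per hour) between two time points."""
    return math.log(mean(end) / mean(start)) / hours


# Average growth rate per temperature, in 1/h.
RATES = {
    temp: average_growth_rate(OD_START[temp], OD_END[temp], HOURS)
    for temp in OD_START
}

for temp, rate in RATES.items():
    print(f"{temp}: {rate:.3f} 1/h")
```

By this measure cell density increases fastest at 30°C, so the growth ranking and the fluorescence ranking are worth reading side by side when choosing induction conditions.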
Figure 4 OD600 at different temperatures
Induction time [hr] | pH 6, Sample 1 | pH 6, Sample 2 | pH 6, Sample 3 | pH 7, Sample 1 | pH 7, Sample 2 | pH 7, Sample 3 | pH 8, Sample 1 | pH 8, Sample 2 | pH 8, Sample 3 |
---|---|---|---|---|---|---|---|---|---|
0 | 0.4 | 0.41 | 0.42 | 0.43 | 0.51 | 0.46 | 0.43 | 0.4 | 0.41 |
3 | 0.63 | 0.51 | 0.55 | 0.91 | 0.94 | 0.89 | 0.73 | 0.75 | 0.71 |
6 | 0.76 | 0.82 | 0.81 | 1.42 | 1.38 | 1.31 | 1.22 | 1.13 | 1.1 |
9 | 1.21 | 1.18 | 1.15 | 1.73 | 1.72 | 1.69 | 1.76 | 1.79 | 1.83 |
12 | 1.59 | 1.63 | 1.67 | 2.01 | 2.12 | 2.15 | 2.01 | 2.03 | 2.04 |
15 | 2.02 | 1.98 | 2.09 | 2.81 | 2.87 | 2.67 | 2.41 | 2.45 | 2.39 |
18 | 2.78 | 2.98 | 2.81 | 3.21 | 3.51 | 3.64 | 2.89 | 2.97 | 2.95 |
21 | 3.04 | 3.03 | 3.19 | 3.91 | 3.87 | 3.95 | 3.5 | 3.68 | 3.64 |
24 | 3.81 | 3.92 | 3.99 | 4.28 | 4.12 | 4.21 | 4 | 3.91 | 3.93 |
Table 2 OD600 at different pH values
Figure 5 OD600 at different pH values
Total Fluorescence Level (Excitation: ~488 nm; Emission: ~509 nm)
20°C | 25°C | 30°C |
---|---|---|
22314.35 | 28540.58 | 25832.14 |
23149.21 | 29038.47 | 26171.28 |
20321.41 | 29835.23 | 24693.32 |
Table 3 Fluorescence levels at different temperatures.
Figure 6 Fluorescence levels at different temperatures.
Figure 7 Fluorescence levels at different pH values.
Total Fluorescence Level (Excitation: ~488 nm; Emission: ~509 nm)
pH 6 | pH 7 | pH 8 |
---|---|---|
23413.75 | 27414.21 | 22357.41 |
21988.63 | 26171.28 | 24231.42 |
21728.65 | 27314.42 | 23134.51 |
Table 4 Fluorescence levels at different pH values.
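The choice of 25°C and pH 7 can be checked directly from Tables 3 and 4 by averaging the three fluorescence readings per condition and taking the highest mean. The snippet below is a minimal sketch of that comparison; the helper name `best_condition` is ours, and the sketch treats the three rows of each table as replicates.

```python
from statistics import mean, stdev

# Fluorescence readings copied from Table 3 (temperature) and Table 4 (pH).
FLUOR_BY_TEMP = {
    "20°C": [22314.35, 23149.21, 20321.41],
    "25°C": [28540.58, 29038.47, 29835.23],
    "30°C": [25832.14, 26171.28, 24693.32],
}
FLUOR_BY_PH = {
    "pH 6": [23413.75, 21988.63, 21728.65],
    "pH 7": [27414.21, 26171.28, 27314.42],
    "pH 8": [22357.41, 24231.42, 23134.51],
}


def best_condition(readings):
    """Return the condition whose replicates have the highest mean."""
    return max(readings, key=lambda condition: mean(readings[condition]))


for table in (FLUOR_BY_TEMP, FLUOR_BY_PH):
    for condition, values in table.items():
        # Report mean and sample standard deviation per condition.
        print(f"{condition}: mean={mean(values):.2f}, sd={stdev(values):.2f}")
    print("highest mean:", best_condition(table))
```

With only three readings per condition, the standard deviations are a rough guide to spread rather than a formal significance test.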
Establishing LLM “Ground Truths” with Mechanical Property Measurements of AI Generated Proteins
Fundamentally, the GPSS AI model improves its spider silk sequence predictions by incorporating physical property testing. Our wet lab team tested the mechanical properties of the T9-E9-TS9-S1 spider silk proteins and fed the results back into the GPSS system so that the AI has access to reliable "ground truths" from actual trials. Metrics such as toughness, strain at break, and tensile strength serve both as benchmarks and as direct inputs that improve the AI's forecasting accuracy.
This feedback loop enables us to continuously refine the GPSS AI, allowing it to generate spider silk protein sequences that align more closely with the actual mechanical performance required for real-world applications. The iterative process of prediction, validation, and feedback ensures that GPSS is constantly evolving, improving both the accuracy of sequence predictions and the quality of the proteins produced.
Looking forward, we plan to capitalize on GPSS's capacity to generate diverse spider silk sequences by incorporating cell-free synthesis technology. This high-throughput approach will allow us to scale up the production and testing of spider silk proteins efficiently, driving both faster data collection and more precise refinement of our AI model. In doing so, we aim to significantly advance the development of spider silk as a sustainable, high-performance biomaterial.
Figure 8 Feedback Loop Representation of NTU-Taiwan SAC Project: Design, Build, Test, Learn, and AI Optimization
Validating AI Predictions with Real-World Spider Silk and Film-to-Fiber Correlations
Our Generative Personalized Spider Silk (GPSS) LLM AI model, while powerful, currently operates with a limited database of spider silk protein sequences. Crucially, we have not yet incorporated feedback from real-world synthesized spider silk data into its system, which presents a challenge for accurately refining its predictions. To address this, we are collaborating with synthetic biology labs that specialize in the cloning of MaSp1 and MaSp2, the primary proteins found in Nephila pilipes spider silk. These proteins, which we label as R1 (MaSp1) and R2 (MaSp2), are the dominant components of the major ampullate spidroins (MaSp) and are responsible for the silk’s strength and extensibility.
As part of our validation process for the GPSS system, our wet lab team has produced both films and fibers from the recombinant R1/R2 proteins, conducting tensile tests to measure mechanical properties such as strength, strain, and toughness. This step is critical, as not all predicted protein sequences may be suitable for fiber spinning due to the complex technical challenges involved in handling and solvent conditions. In contrast, producing films is a more straightforward process, allowing us to test the material’s properties without the need for advanced spinning techniques.
Mechanical Property Comparison:
We obtained tensile test data from spider silk films and fibers, using proteins R1 and R2, derived from MaSp1 and MaSp2 cloning. By comparing the mechanical properties of the films and fibers, we established key parameters—strength, strain (extensibility), and toughness—that highlight a linear relationship between the two forms. This correlation enables us to predict the physical properties of spider silk fibers based on the simpler tensile tests performed on films, offering a valuable workaround for cases where fiber spinning may not be feasible.
Establishing a Linear Relationship:
By plotting the mechanical data from the films against that of the fibers, we observe a strong linear correlation. Specifically, for each mechanical property—strength, extensibility, and toughness—the data points from the films align closely with those from the fibers, suggesting a direct relationship between the film form of the protein and the drawn silk fibers.
For instance, tensile strength data indicates that for a given protein composition, the films consistently exhibit proportional strength values relative to the fibers. Similarly, elongation at break and toughness values for the films follow a linear trend with those of the fibers, maintaining a predictable ratio. This suggests that the mechanical behavior of the protein in film form can reliably predict its performance in fiber form. The mathematical representation of this linear relationship allows us to express the fibers’ mechanical properties as a function of those from the films.
Predicting Mechanical Properties Based on Protein Combinations:
Using various combinations of recombinant spider silk proteins, specifically R1 and R2, we explored the segments R1-32, R2-32, and R2-16, which represent the repetitive unit sequences of the spider silk protein repeated 32 or 16 times, respectively. Tensile tests were conducted on both fibers and films produced from protein ratios of R1-32 and R2-32, specifically at ratios of 0:1, 1:1, and 1:2. The tensile testing results (Figures a.-c.) for these combinations revealed distinct mechanical properties in terms of strength, strain, and toughness.
A strong linear relationship was established for each key property:
- Strength: y = 14.146x
- Strain: y = 3.2222x
- Toughness: y = 67.314x
In these equations, x represents the film's mechanical property, while y represents the corresponding property in the fibers. This linear relationship allows us to predict the mechanical properties of fibers based on the simpler tensile tests performed on the spider silk films.
By applying these equations, we can calculate the expected mechanical properties of spider silk fibers spun from different protein ratios, such as those from R1 and R2. For example, knowing the strength or strain of a film made from a specific protein mixture enables us to predict the tensile strength, extensibility, and toughness of the fibers spun from the same proteins.
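Applying the fitted lines is a single multiplication per property. The following sketch uses the slopes reported above; the function name `predict_fiber_property` and the example film value are hypothetical, and the result carries whatever units the film measurement was in.

```python
# Slopes of the film-to-fiber fits (y = slope * x) reported above.
SLOPES = {
    "strength": 14.146,
    "strain": 3.2222,
    "toughness": 67.314,
}


def predict_fiber_property(prop: str, film_value: float) -> float:
    """Estimate a fiber property from the matching film measurement."""
    if prop not in SLOPES:
        raise ValueError(f"unknown property: {prop!r}")
    return SLOPES[prop] * film_value


# Example with a made-up film strength of 2.0 (same units as the fit).
print(predict_fiber_property("strength", 2.0))
```

Because the fits pass through the origin, the prediction scales proportionally with the film value; it should only be trusted within the range of film values used to build the fits.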
Furthermore, we compared the experimentally derived mechanical properties from different protein combinations (R1 and R2) with properties predicted by our AI model (Figures d.-f.). The AI system, designed to forecast the mechanical performance of various protein mixtures, provided predictions that closely aligned with the experimentally measured values. The comparative results shown in the graphs indicate a consistent trend between the observed values and the AI-predicted data, reinforcing the validity of both our experimental approach and the AI model.
This consistency in trend lines suggests that our AI predictions can reliably estimate the mechanical properties of novel spider silk proteins prior to physical experiments. By validating the AI predictions with empirical data, we confirm the robustness of the model, enabling its application for further design and optimization of spider silk materials with targeted mechanical characteristics for biomedical and industrial applications.
Figure 9 Linear relationship between spider silk films and fibers
Reverse Engineering Software Tool to Generate Mechanical Property from Established Protein Sequences
Spider dragline silk is highly valued for its outstanding strength and durability, making it essential to decipher the relationship between its protein sequence and mechanical properties. To address this challenge, we developed a pipeline of software tools designed to explore this intricate link. By leveraging both protein sequence and mechanical property data, we can capture key factors that influence the silk's performance.
Due to the robust predictive capabilities of the framework provided by this study, we employed this architecture for reverse validation. The model predicts key mechanical characteristics—such as tensile strength, strain, Young's modulus, and toughness—helping to eliminate underperforming sequences early in the process. This filtering mechanism focuses our efforts on sequences that are more likely to meet the desired performance metrics, streamlining the search for optimal candidates.
To further improve prediction precision and expand the range of mechanical properties, we are also developing a new model architecture. This updated approach aims to solve the limitations observed in the previous model, enhancing prediction accuracy across a broader spectrum of properties. Specifically, we are constructing a transformer-encoder-based model to advance our predictive capabilities. This model utilizes protein embeddings provided by NVIDIA BioNeMo to represent sequences and employs a convolutional neural network (CNN) to generate multidimensional vectors. These vectors are then used to predict a comprehensive set of mechanical properties, including toughness, toughness_sd, Young’s modulus, Young’s modulus_sd, tensile strength, tensile_strength_sd, strain at break, strain_at_break_sd, one_percent_weightloss, five_percent_weightloss, ten_percent_weightloss, crystallinity, birefringence, birefringence_sd, diameter, diameter_sd, water content, supercontraction, and supercontraction_sd.
We hope that this pipeline will reduce the burden of developing spider silk materials. Ultimately, this work lays the foundation for designing innovative biomaterials with superior mechanical properties, tailored for diverse applications.
Protocol Sequence Acquisition
Once the target sequence is obtained, various properties can be predicted using software tools.
Given sequence:
MNWEKSCLPLLLLTAFCISVSAAQNVDSPWSSTEKADLFIRSFIDAISRSPAFTPSQLDDMSAIGDTLINSLDSMAQSGKSSRKTLQALNMAFASSMAEIAVAEQGGQSIDVKTSAIIDALNEAFIRTSGRVNNEFVNEIRQLILMFGRVSMNNIASESTATASAGVPAGSYVSSAPAASATSSGGYTSQSNYQGESQGISPAQSGYPGQQGYSSSSSAIAISLGYGQNGYGPGSGGSGTGSGAGQGGSGGDLGGPGASSAASAAATGQGYGTGQGQQGGPSGSSSAALSGDSQGYGPGQSGYPDQQGYSSSSSAIAISLGYGQNGYGPGSGGSGTGSGAGQGGSGGDLGGPGASSAASAAATGQGYGTGQGQQGGPSGSPSAALSGDSQGYGPGQSGYPDQQGYSSSSSAIAISLGYGQNGYGPGSGGSGTGSGAGQGGSGGDLGGPGASSAASAAATGQGYGTGQGQQGGPSGSPSAALSGDSQGYGPGQSGYPDQQGYSSSSSAIAISLGYGQNGYGPGSGGSGTGSGAGQGGSGGDLGGPGASSAASAAATGQGYG
- Predicted strain at break: 20.59 (Normalization: 0.32)
- Predicted tensile strength: 1.08 (Normalization: 0.29)
- Predicted elastic modulus: 3.97 (Normalization: 0.10)
- Molecular Weight: 52,545.87 Da
- Isoelectric Point: 4.06
Initially, the expected properties were as follows:
- Toughness: 0.5
- Elastic Modulus: 0.4
- Tensile Strength: 0.8
- Strain at Break: 0.9
With a standard deviation of 0.3 for each value, the accuracy of each prediction can be estimated using the Z-score:
Z = (predicted value - expected value) / standard deviation
Z-scores close to zero indicate higher prediction accuracy. For this sequence, the toughness prediction (Z = -0.4) aligns relatively well with expectations. Overall, the properties are within acceptable ranges.
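The Z-score calculation can be sketched as follows. This is an illustration under a stated assumption: it compares the normalized predictions listed above (0.32, 0.29, 0.10) with the expected values, which the text does not spell out explicitly. Toughness is left out because no predicted toughness value is listed.

```python
# Expected (target) values and the shared standard deviation from the text.
EXPECTED = {
    "elastic_modulus": 0.4,
    "tensile_strength": 0.8,
    "strain_at_break": 0.9,
}
SIGMA = 0.3

# Assumption: the normalized predictions are the values being compared.
PREDICTED_NORMALIZED = {
    "elastic_modulus": 0.10,
    "tensile_strength": 0.29,
    "strain_at_break": 0.32,
}


def z_score(predicted: float, expected: float, sigma: float) -> float:
    """Standardized deviation of a prediction from its expected value."""
    return (predicted - expected) / sigma


Z = {
    name: z_score(PREDICTED_NORMALIZED[name], EXPECTED[name], SIGMA)
    for name in EXPECTED
}

for name, value in Z.items():
    print(f"{name}: Z = {value:.2f}")
```

A negative Z-score means the prediction falls below the expected value, and its magnitude gives the distance in standard deviations.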
Amino Acid Composition
The amino acid composition for the provided sequence is as follows:
Amino Acid | Percentage |
---|---|
A | 12.32% |
C | 0.36% |
D | 3.39% |
E | 1.61% |
F | 1.43% |
G | 23.57% |
H | 0.00% |
I | 4.11% |
K | 0.89% |
L | 4.46% |
M | 1.25% |
N | 2.68% |
P | 4.82% |
Q | 8.21% |
R | 1.25% |
S | 19.11% |
T | 3.93% |
V | 1.61% |
W | 0.36% |
Y | 4.64% |
Table 5 Amino acid composition and their percentages.
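A composition table like Table 5 can be produced by counting residues and dividing by the sequence length. The sketch below shows the calculation on a short toy sequence; the function name `amino_acid_composition` is ours, and the tool actually used to generate Table 5 is not specified.

```python
from collections import Counter

# The 20 standard amino acids, in the same order as Table 5.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict:
    """Return the percentage of each standard amino acid in a sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    total = len(sequence)
    # Residues absent from the sequence get 0.0, as histidine does in Table 5.
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


# Toy example, not the project sequence.
composition = amino_acid_composition("GGAS")
for aa, pct in composition.items():
    if pct > 0:
        print(f"{aa}: {pct:.2f}%")
```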
DNA Sequence Analysis
The corresponding DNA sequence of the protein is analyzed for confirmation and additional insights. The DNA sequence is as follows:
ATGAATTGGGAAAAATCTTGTCTTCCTCTTCTTCTTCTTACTGCTTTTTGTATTTCTGTTTCTGCTGCTCAAAATGTTGATTCTCCTTGGTCTTCTACTGAAAAAGCTGATCTTTTTATTCGTTCTTTTATTGATGCTATTTCTCGTTCTCCTGCTTTTACTCCTTCTCAACTTGATGATATGTCTGCTATTGGTGATACTCTTATTAATTCTCTTGATTCTATGGCTCAATCTGGTAAATCTTCTCGTAAAACTCTTCAAGCTCTTAATATGGCTTTTGCTTCTTCTATGGCTGAAATTGCTGTTGCTGAACAAGGTGGTCAATCTATTGATGTTAAAACTTCTGCTATTATTGATGCTCTTAATGAAGCTTTTATTCGTACTTCTGGTCGTGTTAATAATGAATTTGTTAATGAAATTCGTCAACTTATTCTTATGTTTGGTCGTGTTTCTATGAATAATATTGCTTCTGAATCTACTGCTACTGCTTCTGCTGGTGTTCCTGCTGGTTCTTATGTTTCTTCTGCTCCTGCTGCTTCTGCTACTTCTTCTGGTGGTTATACTTCTCAATCTAATTATCAAGGTGAATCTCAAGGTATTTCTCCTGCTCAATCTGGTTATCCTGGTCAACAAGGTTATTCTTCTTCTTCTTCTGCTATTGCTATTTCTCTTGGTTATGGTCAAAATGGTTATGGTCCTGGTTCTGGTGGTTCTGGTACTGGTTCTGGTGCTGGTCAAGGTGGTTCTGGTGGTGATCTTGGTGGTCCTGGTGCTTCTTCTGCTGCTTCTGCTGCTGCTACTGGTCAAGGTTATGGTACTGGT
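One simple confirmation step is to translate the DNA back into protein with the standard genetic code and compare the result with the amino acid sequence. The sketch below does this for the first seven codons of the sequence above; the function name `translate` is ours, and it assumes the reading frame starts at the first base and that the input contains only A, C, G, and T.

```python
# Standard genetic code in the compact TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA (frame 1) into one-letter amino acids; '*' marks stop."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 21 bases of the DNA sequence above.
print(translate("ATGAATTGGGAAAAATCTTGT"))
```

The translated prefix can then be compared against the start of the given protein sequence (MNWEKSC...), and the same call can be run on the full DNA string.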
Entropy Analysis
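The Shannon entropy of the sequence, calculated from the amino acid frequencies, is 3.467 bits per residue, compared with a maximum of log2(20) ≈ 4.32 bits for a sequence that uses all 20 amino acids equally. This indicates moderate complexity. Higher entropy generally corresponds to a more diverse sequence, which may affect the protein's stability and flexibility.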
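The entropy value can be recomputed from the percentages in Table 5 using H = -Σ p·log2(p), skipping residues with zero frequency. The sketch below is our illustration of that formula; because it starts from the rounded percentages in the table, it agrees with the reported value only to within rounding.

```python
import math

# Amino acid percentages copied from Table 5.
COMPOSITION_PCT = {
    "A": 12.32, "C": 0.36, "D": 3.39, "E": 1.61, "F": 1.43,
    "G": 23.57, "H": 0.00, "I": 4.11, "K": 0.89, "L": 4.46,
    "M": 1.25, "N": 2.68, "P": 4.82, "Q": 8.21, "R": 1.25,
    "S": 19.11, "T": 3.93, "V": 1.61, "W": 0.36, "Y": 4.64,
}


def shannon_entropy_bits(percentages: dict) -> float:
    """Shannon entropy in bits from a mapping of symbol -> percentage."""
    # Zero-frequency symbols contribute nothing and are skipped.
    probabilities = [pct / 100.0 for pct in percentages.values() if pct > 0]
    return -sum(p * math.log2(p) for p in probabilities)


print(f"{shannon_entropy_bits(COMPOSITION_PCT):.3f} bits per residue")
```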
Protein Structure Prediction
Using the sequence, a 3D structure prediction can be made, allowing us to hypothesize the structural features and their alignment with the protein's function. Once the structure is predicted, it can be compared with known functional domains for further validation.
Figure 10 Linear relationship between spider silk films and fibers
Conclusion
While the ultimate determination of the protein's function should be made by biologists, this analysis provides a valuable head start. By predicting key properties and structural features, this approach can significantly reduce time spent on experimental validation.