Software

A user-friendly interface for simulating genome-scale metabolic models, making complex simulations accessible without programming expertise.

Overview

Our software team focused on simulating and analysing the genome-scale metabolic model for our E. coli culture system, which degrades textiles (keratin and cellulose) and produces spider silk. We set up a program that runs flux balance analysis (FBA), which gives insight into how metabolites, genes and reactions change throughout the system. We also created a user interface that lets researchers without programming experience use this program without writing any code. Alongside the simulation program and user interface, we looked at incorporating visualisation tools that display our models both as small-scale pathways and as entire genome-scale metabolic flux trees. These tools let us see changes in metabolic flux directly and are expected to help us identify bottlenecks in our systems. We successfully incorporated Fluxer into our user interface to limit the number of programs that must be run separately; however, due to time constraints, we were unable to incorporate Escher to render our model as a complete metabolic pathway diagram.

Our end goal was to simulate knockouts of genes, metabolites or reactions, as well as changes in growth media, in order to optimise our experimental design. Unfortunately, we were unable to incorporate experimental data as FBA parameters in our program to validate our model. Future work on this project would allow students to refine and improve the model’s accuracy, and to design experimental conditions that are likely to give near-optimal results for keratin and cellulose degradation and spider silk production.

Background

Systems Biology

Systems biology is an interdisciplinary field that focuses on the study of complex interactions within biological systems [5]. Rather than examining individual components like genes or proteins in isolation, systems biology seeks to understand how these components work together to drive cellular functions and processes. By integrating data from genomics, proteomics, and metabolomics, it builds comprehensive models of biological systems, providing insights into how networks of genes, proteins, and metabolites interact to regulate life processes. This approach allows researchers to predict how changes at the genetic or metabolic level can affect an entire system, making it particularly valuable in fields like biotechnology, medicine, and environmental science.

CobraPy

CobraPy is a powerful Python package designed for constraint-based modelling of metabolic networks, particularly in bacterial systems. It provides tools for building and analysing metabolic models, using a mathematical framework known as flux balance analysis (FBA) [1]. CobraPy allows researchers to simulate cell metabolism by representing the biochemical reactions encoded in an organism’s genome and optimising the fluxes through these reactions to maximise growth or the production of metabolites. With CobraPy, users can construct models from genome-scale metabolic networks, load pre-existing models, or customise models based on experimental needs. It supports adding or removing reactions, genes, and metabolites, enabling precise modifications to fit various experimental setups. This flexibility makes it a crucial tool for tasks like predicting the effects of gene knockouts, testing different environmental conditions, or optimising metabolic pathways for metabolic engineering applications. CobraPy's suite of algorithms also provides powerful optimisation routines to help simulate and interpret metabolic shifts in response to changes in the environment, such as nutrient availability. Additionally, it can be used to design synthetic metabolic pathways and optimise conditions for large-scale fermentation processes in bioreactors, making it invaluable for applications in biotechnology and systems biology. In short, CobraPy facilitates a systems-level understanding of metabolism by enabling rigorous computational analysis of how metabolic networks function and respond to genetic or environmental changes.

SBML Files

SBML (Systems Biology Markup Language) files are a standard format for representing computational models of biological processes, especially those used in systems biology [4]. SBML files provide a structured way to define components such as reactions, metabolites, and regulatory networks, making it easier for researchers to exchange, analyse, and simulate models across different software platforms. SBML files are essential for CobraPy use, as this Python package can import and export models using the SBML format. This allows users to build metabolic models in CobraPy, save them as SBML files, and share them with other scientists or tools for further analysis, ensuring compatibility and reproducibility across various systems biology applications.

Fluxer

Fluxer is a web tool designed specifically for visualising and analysing metabolic fluxes, often used in the context of flux balance analysis (FBA) for bacterial metabolism [2, 3]. It provides an intuitive way to visualise a full-scale metabolic tree and interpret the distribution of fluxes in metabolic networks, helping researchers understand how metabolic reactions behave under different conditions or constraints. Developed by the Lobo Lab at UMBC, this tool allows scientists to enable real-time data visualisation within custom applications, using either its pre-loaded models or their own model uploaded as an SBML file. The tool supports the manipulation of flux values in response to different experimental setups, giving users control over how metabolic fluxes are represented and interpreted. We acknowledge and thank the Lobo Lab for granting us access to their source code and assisting with its integration into our software UI. By embedding Fluxer, we've enhanced our ability to visualise flux distributions, making it easier to optimise metabolic models based on experimental conditions. This integration allows for more dynamic and tailored interpretations of metabolic behaviour, directly supporting experimental decision-making and biotechnological optimisation.

Implementation

The Problem

To understand the mechanisms behind a synthetic biology project, scientists often need tools to model or simulate processes in silico. However, a significant challenge is the lack of accessible software and frameworks. Many biologists who want to model their experiments face the hurdle of learning to code, sometimes in multiple programming languages, before they can effectively run their simulations. The complexity of the data can vary, from small bioinformatics analyses to large-scale computational projects, and can require substantial time and effort. In some cases, researchers are forced to outsource this work, which can be costly and time-consuming. This led us to ask: "How can we reduce these inefficiencies in the context of our project?" Our solution was to develop a more user-friendly interface for existing metabolic modelling software, making it more accessible to those without extensive programming knowledge.

Explanation of Our Tool

The tool we've developed is designed as an interactive webpage, allowing users to manipulate various parameters based on their specific modelling requirements. Given the nature of uAlberta 2024's ReneWool synthetic biology project, our tool is built on software that visualises genome-scale metabolic models (GEMs). This platform enables users to predict bioreactor concentrations over time through dynamic flux balance analysis (dFBA), simulate gene knockouts, and model gene additions. Additionally, it can optimise the biomass objective function alongside metabolite production, helping researchers predict the most favourable experimental conditions for their work.

User and Installation Guide



Optimize Function: This function performs Flux Balance Analysis (FBA) on the metabolic model to find an optimal set of fluxes (rates of reactions) that maximises or minimises a specified objective function. For metabolic engineering, the objective defaults to biomass production but can be changed to include production of a target metabolite.



Reaction Parameters

  1. reaction.name: The name of the reaction.
  2. reaction.id: A unique identifier for the reaction.
  3. reaction.equation: The stoichiometric equation representing the reaction (e.g., the balance of substrates and products).
  4. reaction.flux: The current flux value for this reaction after optimization (if the model has been optimised).
  5. reaction.lower_bound: The lower bound for the flux of this reaction (minimum flux allowed).
  6. reaction.upper_bound: The upper bound for the flux of this reaction (maximum flux allowed).
  7. reaction.subsystem: The metabolic subsystem to which the reaction belongs (e.g., glycolysis).
  8. reaction.gene_reaction_rule: The gene-protein-reaction rule, which defines the gene associations that enable this reaction.
  9. reaction.metabolites: A dictionary in which each key is a metabolite ID and each value is that metabolite's stoichiometric coefficient in the reaction, defining how much of each metabolite participates.


Metabolite Parameters

  1. metabolite.name: The name of the metabolite.
  2. metabolite.id: A unique identifier for the metabolite.
  3. metabolite.formula: The chemical formula of the metabolite.
  4. metabolite.charge: The charge associated with the metabolite.
  5. metabolite.compartment: The cellular compartment where the metabolite is located (e.g., cytosol, extracellular).
  6. metabolite.reactions: A list of the IDs of the reactions in which the metabolite participates.


Gene Parameters

  1. gene.name: The name of the gene in the model.
  2. gene.id: A unique identifier for the gene.
  3. gene.functional: Whether the gene is functional in the context of the model (a boolean that determines whether the gene is on or off).
  4. gene.reactions: A list of reaction IDs associated with the gene, i.e., reactions this gene is linked to based on gene-protein-reaction associations.


SBML Files: Here is an example of an SBML file, a widely-used XML-based format that encodes all the information of our model, including reactions, metabolites, genes, and their interactions. It is highly compatible across various systems biology tools, enabling easy exchange, simulation, and analysis of metabolic models.

Future Considerations

Implementations of Escher

One goal we were unable to complete this year was integrating the program Escher into our webpage. Escher is a versatile tool for visualising metabolic maps, available both as a website and as a Python library. It specialises in creating traditional metabolic pathway diagrams, making it easier to interpret complex metabolic data. By uploading a CobraPy model exported as a JSON file, users can generate detailed visual representations of metabolic pathways that incorporate flux data for each reaction. Escher’s visualisations are particularly useful for presentations and publications, offering a more intuitive and digestible output than the raw, quantitative data from CobraPy.

Model Validation

AI has become a significant topic of discussion in iGEM software practices, and while our current tool doesn’t yet include AI, it has the potential for machine learning integration. By incorporating AI, we aim to improve the accuracy of our bioreactor simulations, allowing for more precise adjustments to experimental conditions. Currently, our models simulate ideal experimental scenarios, but machine learning could enhance their accuracy by learning from real-world data. Although we weren’t able to incorporate experimental results for validation this year, the tool is designed to integrate such data in the future. With each iteration of validation, the model would become increasingly precise, enabling us to predict and design better experiments in the lab. This integration is a key factor to consider as the project evolves.

References

  1. Ebrahim, A.; Lerman, J. A.; Palsson, B. O.; Hyduke, D. R. COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Systems Biology 2013, 7, 74. https://doi.org/10.1186/1752-0509-7-74
  2. Hari, A.; Lobo, D. Fluxer: A Web Application to Compute, Analyze and Visualize Genome-Scale Metabolic Flux Networks. Nucleic Acids Research 2020, 48 (W1), W427–W435. https://doi.org/10.1093/nar/gkaa409
  3. Hari, A.; Zarrabi, A.; Lobo, D. Mergem: Merging, Comparing, and Translating Genome-Scale Metabolic Models Using Universal Identifiers. NAR Genomics and Bioinformatics 2024, 6. https://doi.org/10.1093/nargab/lqae010
  4. SBML.org: What is SBML? SBML.org. https://sbml.org/documents/what-is-sbml/.
  5. Galas, D. Systems Biology. Encyclopædia Britannica; 2018.