Engineering Success

Demonstrate engineering success in a technical aspect of your project by going through at least one iteration of the engineering design cycle. This achievement should be distinct from your Contribution for Bronze.

Engineering Success: ClusterControl


The idea for our project originated from the need to simplify the complex process of regulating biosynthetic gene clusters. With over 100,000 biosynthetic gene clusters identified and only a fraction explored, it became clear that their full potential remained largely untapped. To address this, we developed ClusterControl—an integrated, scalable web tool that empowers researchers to optimize the regulatory networks within these clusters. By significantly reducing the time spent on trial-and-error testing of rewired regulator proteins, ClusterControl streamlines the discovery and manipulation of biosynthetic gene clusters.

Design Phase

Drylab Design

Our primary objective was to create a flexible platform capable of precisely controlling gene cluster expression. Upon analyzing existing approaches, we found that the main obstacle was the unpredictability of native regulatory elements, which often respond to unknown signals. To overcome this, we focused on developing a modular framework that leverages rewired regulatory proteins capable of responding to well-characterized and customizable signals.

We began by building a robust database of regulatory domains from the Helix-Turn-Helix (HTH) family, known for their modularity and flexibility. Each domain was annotated with its binding preferences and signal-sensing capabilities.

A major challenge we faced in designing functional rewired bacterial regulators was identifying the optimal rewiring points. This issue became the foundation for developing ClusterControl as the culmination of our prior research.

Evolutionary and Structural Analysis

We explored several approaches to predicting optimal rewiring points, including AlphaFold2 predictions of structural complexes between regulatory proteins and their associated promoters, as well as evolutionary analyses.

Using a chimeric regulator built from NarL (E. coli) and YdfI (B. subtilis) as a model, we tested several rewiring points. The DDIZ score (Disrupted Direct Information, Z-scored; derived in the steps below) correlated significantly with the experimentally measured regulatory activity across these rewiring points. This validation demonstrated the predictive power of ClusterControl's metrics and supports their use in designing functional chimeric regulators.

1. We conducted BLAST searches against a custom database of LuxR-family HTH regulators, clustered at 50% sequence similarity.
2. Using multiple sequence alignments for regulators with 20-80% sequence similarity, we constructed a Direct Information (DI) matrix, quantifying evolutionary couplings between amino acids.
3. To improve accuracy, we filtered the data to Z-scores within three standard deviations, highlighting Disrupted Direct Information (DDI) points: positions where rewiring would disrupt evolutionary couplings.
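The steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the ClusterControl implementation: it assumes the DDI of a candidate rewiring point is the sum of DI values over all residue pairs that a fusion at that position would separate, and it Z-scores the resulting profile across positions. The function names and the toy DI matrix are hypothetical.

```python
from statistics import mean, pstdev

def disrupted_di(di, cut):
    """Sum DI over residue pairs (i, j) split by a fusion at `cut` (i < cut <= j).

    Assumption: this is one plausible reading of "disrupted" couplings.
    """
    n = len(di)
    return sum(di[i][j] for i in range(cut) for j in range(cut, n))

def ddi_profile(di):
    """DDI for every internal cut position 1 .. n-1."""
    return {cut: disrupted_di(di, cut) for cut in range(1, len(di))}

def z_scores(profile):
    """Z-score a {position: value} profile using the population standard deviation."""
    values = list(profile.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {pos: 0.0 for pos in profile}
    return {pos: (value - mu) / sigma for pos, value in profile.items()}

# Toy symmetric DI matrix (made-up values): residues 0-1 and 2-3 are strongly coupled.
toy_di = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.1, 0.0],
    [0.1, 0.1, 0.0, 0.8],
    [0.0, 0.0, 0.8, 0.0],
]

ddi = ddi_profile(toy_di)
ddiz = z_scores(ddi)
# Keep only positions whose Z-score lies within three standard deviations.
kept = {pos: z for pos, z in ddiz.items() if abs(z) <= 3.0}
# The lowest score marks the cut that breaks the fewest couplings.
best_cut = min(kept, key=kept.get)
```

In this toy matrix the cut between the two coupled residue pairs receives the lowest score, because a fusion there separates only weakly coupled residues.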

We developed a new metric called DDIZ (Disrupted Direct Information Z-scored). A lower score indicates less disruption of evolutionary couplings and therefore a more promising rewiring point. The correlation between DDIZ scores and experimental data provides strong validation for the tool.
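One way to quantify such a relationship is a rank correlation between DDIZ scores and measured activity. The sketch below uses Spearman's coefficient, which is our assumption; the text does not state which correlation measure was used. The numbers are made-up placeholders, not experimental results, and the function names are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))

# Placeholder values for illustration only (not measured data).
placeholder_ddiz = [-1.2, -0.4, 0.3, 1.5]
placeholder_activity = [0.9, 0.7, 0.4, 0.1]
rho = spearman(placeholder_ddiz, placeholder_activity)
```

Because lower DDIZ scores are expected to mark better rewiring points, a negative coefficient is the direction that would agree with the metric's intent.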

Figure 1: Visual representation of DDI points

Figure 2: Graph showing correlation between DDIZ scores and activity

Structural Validation

To further validate our predictions, we performed structural analysis using PyMOL. It showed that optimal rewiring points align with the start of stable beta-sheet regions in winged Helix-Turn-Helix regulators and of stable alpha-helix regions in tetra-helical bundle regulators. This indicated that modular swaps can be made without compromising protein integrity. Additionally, we incorporated ESM-based contact maps to speed up initial screening.

Figure 3: Structural visualization of DDIZ scores

ClusterControl Development


ClusterControl is the result of these engineering insights and provides researchers with an intuitive, scalable platform. It combines a Wiki and Database, where researchers can share and access information on regulators. This community-driven aspect of ClusterControl will grow into a valuable resource on bacterial Helix-Turn-Helix regulators. Additionally, users can upload their own protein sequences and use the tool to predict optimal rewiring points.

ClusterControl is built with MongoDB and Flask and can be deployed via Docker, which keeps deployment simple and scalable. By allowing researchers to design rewired regulators with customizable signals, it greatly accelerates the research process in synthetic biology.
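As an illustration of the upload step, the sketch below shows how a submitted protein sequence in FASTA format might be parsed and checked before analysis. It is a hypothetical helper under our own assumptions (FASTA input, the 20 standard amino acids), not ClusterControl's actual code, and it deliberately uses only the Python standard library rather than any Flask or MongoDB calls.

```python
# The 20 standard amino acids in one-letter code.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def parse_fasta(text):
    """Parse FASTA text into {record name: uppercase sequence}."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Fall back to a generated name when the header is empty.
            name = line[1:].strip() or f"record_{len(records) + 1}"
            if name in records:
                raise ValueError(f"duplicate record name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}

def invalid_residues(sequence):
    """Return the sorted characters that are not standard amino acids."""
    return sorted(set(sequence) - VALID_RESIDUES)

# Hypothetical upload: a made-up peptide split across two lines.
uploaded = parse_fasta(">demo\nmkt\nAYI\n")
problems = invalid_residues(uploaded["demo"])
```

Rejecting unexpected characters at upload time keeps malformed input out of the scoring step.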

Figure 4: ClusterControl workflow

Figure 5: Web interface showing the process of visualizing DDIZ scores

Wet Lab Validation

Build

The ClusterControl interface was designed to be clean and user-friendly, guiding researchers through the design process. Users can upload sequences, select from a library of regulatory domains, and use our predictive models to simulate regulatory interactions. During the build phase, users receive visual representations of DDIZ scores, aiding their decision-making process.

Test

To validate ClusterControl, we tested it against experimental data from other researchers. Initial benchmarking results were encouraging, and we are currently conducting our own experimental validations to provide direct feedback. These tests will offer deeper insights into how the tool performs with novel constructs.

The final stage is to validate these predictions experimentally. We are currently testing the predicted fusion points in the lab, focusing on whether the chimeric constructs retain their functionality. The resulting data will show how well the modularity holds up in practice.

Learn

As with any engineering project, we encountered setbacks. The most significant learning points included the need to integrate both computational predictions and structural analysis to define more accurate modular swaps. The DDIZ metric, combined with fluorescence data, provided a clearer view of how rewired proteins perform in practice.

Figure 6: Structural modelling of OmpR in complex with EnvZ, with functional and nonfunctional points marked

One of the key takeaways was the necessity for a user-friendly interface. Early users struggled with the complexity of the design options, prompting us to simplify and streamline the interface to accommodate researchers of varying experience levels.

Future Directions


Moving forward, we will continue refining ClusterControl by addressing discrepancies between DDIZ scores and functional outcomes, expanding the tool’s capabilities, and incorporating user feedback from researchers across diverse fields. Through these iterations, we aim to create a comprehensive tool that empowers scientists to unlock the full potential of biosynthetic gene clusters.