Software | BUCT-China

Software

Introduction

After smFRET detection experiments, we obtained fluorescence images, but few existing programs analyze such images well enough to count the fluorescent spots. To solve this problem, we designed a tool based on Python and the PyQt5 framework. The tool decides which points in an image are fluorescent spots by finding pixels whose gray values exceed a threshold (the upper bound of a normal-distribution confidence interval) and counts them, so that the final count can be extrapolated back to the concentration of the target. Additionally, to make sure the confidence interval is chosen sensibly, we developed a machine-learning-style training-testing tool that selects a suitable confidence interval.

Prerequisites

First, Open a Terminal:

On Linux or macOS: Search for "Terminal". On Windows: Search for "Windows PowerShell". You will only need to use a few command-line commands:

Change Directory with cd: For example, type cd BUCT-China/ to navigate to the "BUCT-China" directory, or cd .. to move back to the parent directory.
List Directory Contents with ls: This command lists the contents of the current directory.
Show Current Path with pwd: This command displays the path to your current location.


Ensure Python is Installed on Your Computer:

On Windows:

Open the terminal and type py. This starts the Python interpreter, whose first line shows the installed Python version. You should have at least version 3.8. Then press Ctrl+Z (you will see ^Z) and hit Enter to exit the interpreter and return to the normal command prompt (>).

On Linux/macOS:

Open the terminal and type python --version (or python3 --version). This command prints the installed Python version and returns to the prompt immediately, so no further key press is needed. You should have at least version 3.8.

If Python is Not Installed:

You can download it from the Python Download Page (version 3.8 or higher).

Ensure You Have the Correct Version of pip:

On Windows:

In your terminal, type:

py -m pip install --upgrade pip

On Linux/macOS:

pip install --upgrade pip

Installation

Access the code:

Navigate to your chosen directory using the terminal and execute the following command:

git clone https://gitlab.igem.org/2024/software-tools/buct-china

Or download it from our GitLab: 2024 Competition / Software Tools / BUCT-China · GitLab

Use the “cd” command to enter the “buct-china/” folder:

cd buct-china

Install the necessary dependencies:

Type the following command in the terminal:

Windows:

py -m pip install -r requirements.txt

macOS/Linux:

pip install -r requirements.txt

Launch the tool by typing one of the following commands in the terminal (use whichever Python command is available on your system):

python3 findLight.py

python findLight.py

How We Developed It

First Attempt

Initially, we tried Otsu's thresholding method to identify the bright spots. It is one of the most commonly used methods in threshold segmentation and is widely regarded as a strong general-purpose approach. However, this first attempt did not give promising results: the number of detected bright spots exploded.

Figure 1. Result of the first attempt
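For reference, Otsu's method picks the threshold that maximizes the between-class variance of the grayscale histogram. The sketch below is a minimal NumPy version written for illustration only. It assumes an 8-bit grayscale image, the helper name otsu_threshold and the synthetic image are hypothetical, and it is not necessarily how the first attempt was implemented in our tool.

```python
import numpy as np


def otsu_threshold(image):
    """Return the Otsu threshold of an 8-bit grayscale image (illustrative helper)."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)

    omega = np.cumsum(prob)           # probability of the class "gray value <= t"
    mu = np.cumsum(prob * levels)     # cumulative mean up to gray value t
    mu_total = mu[-1]

    # Between-class variance for every candidate threshold t.
    denom = omega * (1.0 - omega)
    between = np.zeros(256)
    valid = denom > 0
    between[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(between))


# Synthetic example: dark background (50) with one small bright patch (200).
image = np.full((64, 64), 50, dtype=np.uint8)
image[10:14, 10:14] = 200

t = otsu_threshold(image)
bright_pixels = int((image > t).sum())
print(t, bright_pixels)
```

On a clean two-level image like this one the split is unambiguous. Real fluorescence images have a broad, noisy background, which is where the method over-counted in our tests.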

Second Attempt

We then tried morphological erosion and dilation operations together with the Sobel and Canny edge detection operators, changing one variable at a time (the control variable method).

Figure 2. Result of the second attempt

By varying the size of the Gaussian kernel with the control variable method, we found that slightly better results could be achieved when the Canny edge detection operator was selected and the kernel size was set to 7. In contrast, the morphological operations had minimal effect and were sometimes even counterproductive. Even so, the results still showed an excessive number of bright spots.

Based on this, we began to consider a fundamental issue: Is there a problem with Otsu's calculation method?

Core Issue

To investigate the root of the problem, we held multiple discussions with members of the wet-lab group. The conclusion was that Otsu's method is indeed a poor fit here; the algorithm should adapt to the characteristics of the images and of the experiment. The reasons are as follows:

Otsu's method is very sensitive to noise, and what counts as "noise" differs between the experimental field and the data-processing field. Some slightly gray points in the image are actually clusters of bright spots, but Otsu's method may treat them as noise and classify them as background.
Otsu's division of the image into foreground and background does not match our experimental concept, and such a classification does not apply to our samples. Simply splitting the image brightness levels into background brightness and bright-spot brightness would lose bright spots with uneven brightness.
Fluorescent probe images come from experiments run by different experimenters each time. Slight differences in sample conditions cause the imaging results, and especially their luminance, to vary from run to run.

The core issue is that the method should adapt to the image characteristics and experimental features. To verify this, we ran an experiment that used only the grayscale value as the filtering criterion: we binarized the image with a fixed grayscale threshold and marked points with grayscale values above the threshold as "bright spots." We then used connected components to identify complete "bright spot clusters," which are our desired positive points. We used two batches of samples (each batch was a single TIFF file) and took the first 10 images from each batch. For each image, we tried thresholds from 250 down to 160.

Figure 3. Result of the third attempt

The results were encouraging! The first batch of samples gave good results with a threshold of around 170, and the second batch with a threshold of around 200. This suggests that the solution might be simpler than we initially anticipated.
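The fixed-threshold experiment can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact script we ran: it uses scipy.ndimage.label for the connected components, treats diagonal neighbours as connected, sweeps the threshold in steps of 10, and runs on a synthetic image. The helper name count_bright_clusters is hypothetical.

```python
import numpy as np
from scipy import ndimage


def count_bright_clusters(image, threshold):
    """Count connected groups of pixels whose gray value exceeds `threshold`."""
    mask = image > threshold
    # 8-connectivity: diagonal neighbours join the same cluster (an assumption).
    structure = np.ones((3, 3), dtype=int)
    _, n_clusters = ndimage.label(mask, structure=structure)
    return n_clusters


# Synthetic 8-bit image: background 100, one bright cluster (240), one dimmer cluster (180).
image = np.full((32, 32), 100, dtype=np.uint8)
image[5:8, 5:8] = 240
image[20:23, 20:23] = 180

# Sweep the threshold from 250 down to 160 (step of 10 chosen for this example).
counts = {t: count_bright_clusters(image, t) for t in range(250, 150, -10)}
for t, n in counts.items():
    print(t, n)
```

Lowering the threshold reveals the dimmer cluster as well, which mirrors why the two batches needed different thresholds.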

However, manually set thresholds are far from sufficient; such a method is neither elegant nor scientific. We needed a calculation method specific to fluorescent probe images that is simple, relies on a single judgment criterion, and is constrained by statistically meaningful conditions.

Solution: Statistical Method

We made a bold move: we abandoned common image thresholding algorithms such as Otsu's method and turned to statistics instead. Initially, we treated the grayscale values at all coordinates of the entire image as one set and calculated its standard deviation and variance. At this step, our thinking was still within the old Otsu framework.

The results improved somewhat but were still not what we wanted. At this point, we wondered whether we could fit a statistical distribution to our samples. We used the fit method of SciPy's distribution objects in Python to fit the grayscale values of the pixels in our images. We fitted several samples, and the vast majority conformed to a log-normal distribution, so we tried using a log-normal distribution for threshold segmentation. However, calculating the upper threshold of the log-normal distribution easily produces infinite values, which makes it impossible to capture bright spots.

import warnings

import numpy as np
from scipy import stats


def find_best_distribution(pixel_values):
    """Fit candidate distributions to the gray values and keep the lowest-SSE one."""
    distributions = [
        stats.norm,
        stats.gamma,
        stats.lognorm,
        stats.beta,
        stats.weibull_min,
    ]

    best_distribution = None
    best_params = None
    best_sse = np.inf

    # Empirical density of the gray values, compared with each PDF at the bin centers.
    hist, bin_edges = np.histogram(pixel_values, bins=100, density=True)
    x = (bin_edges[:-1] + bin_edges[1:]) / 2

    for distribution in distributions:
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                # fit() returns (shape parameters..., loc, scale).
                params = distribution.fit(pixel_values)
                arg = params[:-2]
                loc = params[-2]
                scale = params[-1]

                if scale == 0:
                    continue

                pdf = distribution.pdf(x, *arg, loc=loc, scale=scale)

                # Sum of squared errors between the histogram and the fitted PDF.
                sse = np.sum((hist - pdf) ** 2)

                if sse < best_sse:
                    best_distribution = distribution
                    best_params = params
                    best_sse = sse
        except Exception as e:
            print(f"Error fitting {distribution.name}: {str(e)}")
            continue

    if best_distribution is not None:
        print(f"Best fitting distribution: {best_distribution.name}")
    else:
        print("No distribution could be fitted successfully.")
    return best_distribution, best_params

We consulted relevant materials and found that the log-normal distribution resembles the normal distribution in shape; the main difference is that it is skewed to the right. We therefore tried using the confidence interval of the normal distribution to identify the bright spots, and the results were very good. In particular, a confidence level of 99.994% generalized very well across uniformly illuminated images, so we set 99.994% as the default confidence interval for processing images.
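As a rough illustration of the confidence-interval threshold, the sketch below takes the upper bound of a normal interval around the mean gray value and marks every pixel above it as bright. It assumes a two-sided interval, in which case 99.994% corresponds to roughly four standard deviations above the mean; the tool may define the interval differently. The helper name normal_threshold and the synthetic image are hypothetical.

```python
import numpy as np
from scipy import stats


def normal_threshold(pixel_values, confidence=0.99994):
    """Upper bound of a two-sided normal interval around the mean gray value."""
    mean = np.mean(pixel_values)
    std = np.std(pixel_values)
    # Two-sided interval (assumption): (1 - confidence) / 2 lies in each tail.
    z = stats.norm.ppf(1.0 - (1.0 - confidence) / 2.0)
    return mean + z * std


rng = np.random.default_rng(0)
# Synthetic background: gray values around 100 with standard deviation 5.
image = rng.normal(100.0, 5.0, size=(256, 256))
image[50:53, 50:53] = 200.0  # one artificial bright spot

threshold = normal_threshold(image.ravel())
bright_mask = image > threshold
print(round(float(threshold), 2), int(bright_mask.sum()))
```

Because the threshold is derived from each image's own mean and standard deviation, it moves with the overall brightness of the image instead of being fixed by hand.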

However, this confidence interval only applies to uniformly illuminated images, is very sensitive to noise, and demands very high experimental precision. We need a reasonable confidence interval that can be chosen independently for each experimental design.

Complementary Solution: Machine Learning-Based Methods

To avoid repetitive manual adjustment of confidence intervals and to automate the analysis, we designed a tool based on the machine learning training-testing paradigm. This tool searches for the optimal confidence interval for the given experimental and control group images, and is better able to analyze individual fluorescent molecules without counting aggregates or background impurities. The training-testing process is illustrated in the diagram below:

Figure 4. Training-Testing Scheme

Within the given confidence interval range, adjust the confidence level in increments of 0.001 to find the optimal confidence interval, that is, the one that maximizes the difference in the number of highlights between the experimental and control groups.
Record the maximum difference and the corresponding confidence interval.
Select the first image from the experimental group and the control group for display.
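The search described in the steps above can be sketched as a simple scan over confidence levels. This is a hedged illustration and not the tool's actual code: it assumes the spot count comes from a two-sided normal-interval threshold followed by connected-component labelling, that the score is the mean experimental count minus the mean control count, and that the images are synthetic. All function names here are hypothetical.

```python
import numpy as np
from scipy import ndimage, stats


def count_spots(image, confidence):
    """Count connected regions above the upper bound of a two-sided normal interval."""
    z = stats.norm.ppf(1.0 - (1.0 - confidence) / 2.0)
    threshold = image.mean() + z * image.std()
    _, n_spots = ndimage.label(image > threshold)
    return n_spots


def search_confidence(experimental, control, lower, upper, step=0.001):
    """Scan [lower, upper] in increments of `step`; return (best_confidence, max_difference)."""
    best_confidence, max_difference = None, -np.inf
    n_steps = int(round((upper - lower) / step))
    for i in range(n_steps + 1):
        confidence = min(lower + i * step, upper)
        exp_mean = np.mean([count_spots(img, confidence) for img in experimental])
        ctl_mean = np.mean([count_spots(img, confidence) for img in control])
        difference = exp_mean - ctl_mean  # assumed score: experimental minus control
        if difference > max_difference:
            best_confidence, max_difference = confidence, difference
    return best_confidence, max_difference


def synthetic_image(seed, with_spots):
    """Noisy background (mean 100, sd 5), optionally with 30 bright 2x2 spots."""
    rng = np.random.default_rng(seed)
    image = rng.normal(100.0, 5.0, size=(128, 128))
    if with_spots:
        for row in range(10, 120, 20):
            for col in range(10, 110, 20):
                image[row:row + 2, col:col + 2] = 160.0
    return image


experimental = [synthetic_image(0, True), synthetic_image(1, True)]
control = [synthetic_image(2, False), synthetic_image(3, False)]
best_confidence, max_difference = search_confidence(experimental, control, 0.98, 0.999)
print(best_confidence, max_difference)
```

The number of evaluations grows with the width of the range divided by the step, which is why a very wide range leads to long training times.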

Usage Example

Training/Testing Tool

To process images, you should first determine an appropriate confidence interval.

First, open the training-testing tool page (default to the first page).
In the experimental group section, select the experimental group image files.
In the control group section, select the control group image files.
Enter the upper limit of the desired confidence interval. A range that is too large may lead to prolonged training time; based on extensive sample testing, we recommend starting with a confidence level above 98%.
Enter the lower limit of the desired confidence interval.
Click “Start Training” to begin training.
To switch to the image processing tool, click "Process Page".

Figure 5. Training-Testing Page

Figure 6. Show Training-Testing images

Image Processing Tool

Click “Import Images (.tif, .tiff)” to import .tif and .tiff files.
Analyze the number of image highlights: You can choose between two methods of analysis.

Click “Process Images (Default Confidence Level: 99.994%)” to analyze using the default confidence interval (99.994%).
Enter the confidence interval obtained from training-testing above and click "Use Custom Confidence Interval Processing" to analyze using a custom confidence interval.

Click "Export (.tif)" to export the current single .tif file on the page, or click "Export All (.tiff)" to export the current batch of .tiff files.
Click "Export Distribution Plot" to export the distribution plot image (.png).
Click "Export CSV" to export highlight data (.csv). The first column is the page number corresponding to the image in the .tiff file, the second column is the number of highlights in the image, and the third column is the average number of highlights across all images in the .tiff file.
Click “Previous” and "Next" to view the previous or next image.
To return to the training-testing tool, click "Switch to the Training Testing Page"

Figure 7. Process Page
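To make the three-column CSV layout described above concrete, here is a small sketch that writes rows in that shape. Several details are assumptions for illustration only: the absence of a header row, 1-based page numbers, the average being repeated on every row, and the two-decimal formatting. The function name write_highlight_csv and the sample counts are hypothetical, so check an actual export before relying on this layout.

```python
import csv
import io


def write_highlight_csv(counts, stream):
    """Write one row per image: page number, highlight count, average over all images."""
    average = sum(counts) / len(counts)
    writer = csv.writer(stream)
    for page, count in enumerate(counts, start=1):
        # Assumed layout: the average is repeated in the third column of each row.
        writer.writerow([page, count, f"{average:.2f}"])


# Example with made-up highlight counts for a three-page .tiff file.
buffer = io.StringIO()
write_highlight_csv([12, 15, 9], buffer)
rows = buffer.getvalue().splitlines()
for row in rows:
    print(row)
```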

Future improvements

Our goal is to create cancer diagnostic software that is quickly accessible to everyone. In this direction, we have implemented auxiliary diagnostic functions and can make the following improvements:

Enhance the graphical user interface
Optimize running speed
Develop a mobile version for quick result uploads and diagnostics
Optimize the point selection algorithm

References

Ross, S. M. (1996). Introduction to probability and statistics for engineers and scientists (3rd ed.). Academic Press.