Thousands of Rare CRISPR Systems Discovered Through New Search Algorithm

Category Science

Wednesday, November 29, 2023, 07:42 UTC

tldr #

Researchers have discovered thousands of rare new CRISPR systems in bacterial genomes that have a range of functions and could enable gene editing, diagnostics, and more. The team used their algorithm, called Fast Locality-Sensitive Hashing-based clustering (FLSHclust), to mine three major public databases. The new systems could potentially be harnessed to edit mammalian cells with fewer off-target effects than current Cas9 systems, and could one day also be used as diagnostics or serve as molecular records of activity inside cells.

content #

By analyzing bacterial data, researchers have discovered thousands of rare new CRISPR systems that have a range of functions and could enable gene editing, diagnostics, and more.

Microbial sequence databases contain a wealth of information about enzymes and other molecules that could be adapted for biotechnology. But these databases have grown so large in recent years that they’ve become difficult to search efficiently for enzymes of interest.

Now, scientists at the McGovern Institute for Brain Research at MIT, the Broad Institute of MIT and Harvard, and the National Center for Biotechnology Information (NCBI) at the National Institutes of Health have developed a new search algorithm that has identified 188 kinds of new rare CRISPR systems in bacterial genomes, encompassing thousands of individual systems. The work was published on November 23 in the journal Science.

The algorithm, which comes from the lab of pioneering CRISPR researcher Professor Feng Zhang, uses big-data clustering approaches to rapidly search massive amounts of genomic data. The team used their algorithm, called Fast Locality-Sensitive Hashing-based clustering (FLSHclust), to mine three major public databases that contain data from a wide range of unusual bacteria, including ones found in coal mines, breweries, Antarctic lakes, and dog saliva. The scientists found a surprising number and diversity of CRISPR systems, including ones that could make edits to DNA in human cells, others that can target RNA, and many with a variety of other functions.

The new systems could potentially be harnessed to edit mammalian cells with fewer off-target effects than current Cas9 systems. They could also one day be used as diagnostics or serve as molecular records of activity inside cells.

The researchers say their search highlights an unprecedented level of diversity and flexibility of CRISPR and that there are likely many more rare systems yet to be discovered as databases continue to grow.

"Biodiversity is such a treasure trove, and as we continue to sequence more genomes and metagenomic samples, there is a growing need for better tools, like FLSHclust, to search that sequence space to find the molecular gems," says Zhang, a co-senior author on the study and the James and Patricia Poitras Professor of Neuroscience at MIT with joint appointments in the departments of Brain and Cognitive Sciences and Biological Engineering. Zhang is also an investigator at the McGovern Institute for Brain Research at MIT, a core institute member at the Broad, and an investigator at the Howard Hughes Medical Institute. Eugene Koonin, a distinguished investigator at the NCBI, is co-senior author on the study as well.

CRISPR, which stands for clustered regularly interspaced short palindromic repeats, is a bacterial defense system that has been engineered into many tools for genome editing and diagnostics.

To mine databases of protein and nucleic acid sequences for novel CRISPR systems, the researchers developed an algorithm based on an approach borrowed from the big data community. This technique, called locality-sensitive hashing, clusters together objects that are similar, and it is faster and more memory-efficient than other clustering algorithms.
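The general idea behind locality-sensitive hashing can be illustrated with a small sketch. This is not the FLSHclust implementation, whose details the article does not describe; it assumes one common LSH variant (MinHash over k-mers with banding), and the function names, parameters, and toy sequences are hypothetical. Similar sequences are likely to land in the same hash bucket, so only sequences that share a bucket need to be compared, rather than every possible pair.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, salted with a seed."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum hash value under each seeded hash."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def lsh_candidate_pairs(sequences, k=3, num_hashes=32, bands=16):
    """Return pairs of sequence names that share at least one LSH bucket.

    The signature is split into `bands` equal slices; two sequences become
    candidates if any slice matches exactly. Illustrative assumption only,
    not the method used in the study.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, seq in sequences.items():
        tokens = kmers(seq, k)
        if not tokens:  # sequence shorter than k: nothing to hash
            continue
        sig = minhash_signature(tokens, num_hashes)
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(name)

    pairs = set()
    for names in buckets.values():
        ordered = sorted(names)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs


# Toy protein-like sequences (made up for illustration).
example = {
    "a": "MKTAYIAKQRQISFVK",
    "b": "MKTAYIAKQRQISFVK",
    "c": "GGGGGGGGGGWWWWWW",
}
candidates = lsh_candidate_pairs(example)
print(sorted(candidates))
```

In this sketch, raising `bands` (with fewer rows per band) makes it easier for loosely similar sequences to become candidates, while lowering it makes the match stricter. A real pipeline would then verify candidates with a proper alignment or similarity score.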

The study was published on November 23, 2023, in the journal Science.

hashtags #

crispr geneediting biodiversity biotechnology moleculargems fastlocalitysensitivehashing
