Cas9 is the most widely used gene editor of the CRISPR family and the first CRISPR nuclease with demonstrated programmable RNA-guided activity 1. The conventional classification divides type II CRISPR-Cas (Cas9) systems into subtypes II-A, II-B, and II-C. Although multiple criteria are considered, this classification rests primarily on the presence of signature proteins in the CRISPR-Cas9 locus and on the locus organization 2,3,4,5,6. A finer division of type II proteins into clades I-X based on Cas9 phylogeny, which reproduces the separation of subtypes II-A, II-B, and II-C as groups of clades, was previously published 7. More recently, studies have reported the functional characterization and gene-editing ability of compact RNA-guided ancestors of Cas9 proteins, referred to as IscBs and HEAROs. In addition, a new subtype II-D that includes compact Cas9s homologous to these ancestors has been proposed 8,9,10.

For convenience, the CasPEDIA phylogenetic tree integrates two classifications of Cas9, by subtype (II-A, II-B, II-C) and by phylogenetic clade (I-X), and additionally includes clade U with type II-D as well as the IscB and HEARO RNA-guided nucleases.

The phylogenetic tree was generated from Cas9 sequences acquired from Gasiunas et al. (2020) with minor modifications, from sequences in public databases, from selected studies with in vivo editing data 11,12,13,14,15,16,17,18,19,8,9, and from the literature on published Cas9 and IscB structures 20,21,22,23,24,25,26,27,28,29.

Cas12 enzymes are RNA-guided nucleases and are the defining component of Type V CRISPR systems 30. These enzymes appear to have evolved from TnpB proteins encoded within IS200/IS605 family mobile genetic elements, potentially on multiple occasions 30,31,8. Of the Class 2 CRISPR enzymes, Cas12 proteins appear to be the most biochemically diverse by Cas enzymatic classification and are thus suited to a wide variety of applications including, but also beyond, gene editing 32,33,34. With the notable exception of Cas12k, which directs transposition by a Tn7-like transposase, the activity of these enzymes tends to be self-contained and not reliant on additional proteins 34. In addition to being biochemically diverse, the family exhibits great genetic diversity, with some subtypes forming polyphyletic clades (e.g., Cas12f) 35. As clades within subfamilies continue to be characterized biochemically, researchers have found, and will likely continue to find, divisions of enzymatic activity, such as those recently reported for Cas12c2 and Cas12a2 36,37,38.

The above tree was produced using data from the most recent phylogenetic analysis of Cas12 proteins 30, as well as reports of Cas12 enzymes published since then, including Cas12j (previously known as CasPhi), CasLambda, and Cas12L 39,40,41, and the Cas12 ancestor, TnpB 31. In general, Cas12 proteins are labeled a to n in order of discovery or characterization, except for CasPhi and CasLambda, owing to their viral origin 39,40. Some widespread Cas12 proteins previously without known function have only recently been characterized: Cas12k (Cas12 U5), Cas12m (Cas12 U1), and Cas12n (Cas12 U4) 34,42,43. Two remaining Cas12 families, Cas12 U2 and Cas12 U3, are uncharacterized to date. Only complete Cas12 sequences (those with non-truncated RuvC domains) were used. These sequences were aligned using MUSCLE, and trees were constructed with IQ-TREE, rooted against TnpB, and visualized with iTOL 44,45,46. Trees were manually curated by pruning errant Cas12 leaves.
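The align-then-build steps described above can be sketched as command lines assembled in Python. This is only an illustrative sketch, not the commands actually used for this tree: the file names, the outgroup label `TnpB`, and the helper functions are hypothetical, the flags assume a MUSCLE v3-style command line (`-in`/`-out`) and IQ-TREE 1.x (`-s`, `-pre`, `-o`, `-nt`), and no substitution model or other settings are implied by the text. Note that `-o` only names an outgroup taxon for the written tree; the rooting reported here may equally have been applied at the visualization stage.

```python
from typing import List


def muscle_cmd(fasta_in: str, aln_out: str) -> List[str]:
    """Build a MUSCLE v3-style alignment command (flag style is an assumption)."""
    return ["muscle", "-in", fasta_in, "-out", aln_out]


def iqtree_cmd(aln: str, prefix: str, outgroup: str) -> List[str]:
    """Build an IQ-TREE 1.x command; `outgroup` is a hypothetical sequence label."""
    return ["iqtree", "-s", aln, "-pre", prefix, "-o", outgroup, "-nt", "AUTO"]


def pipeline(fasta_in: str, prefix: str, outgroup: str) -> List[List[str]]:
    """Return the commands in the order they would be run: align, then build the tree."""
    aln = prefix + ".aln.faa"
    return [muscle_cmd(fasta_in, aln), iqtree_cmd(aln, prefix, outgroup)]


if __name__ == "__main__":
    # Print the commands rather than executing them, so the sketch runs
    # without MUSCLE or IQ-TREE installed.
    for cmd in pipeline("cas12_complete.faa", "cas12", "TnpB"):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed; the resulting tree file would then be uploaded to iTOL for visualization and manual pruning.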

All known Cas13 enzymes are RNA-guided RNA endonucleases and are the defining component of Type VI CRISPR systems 30. In contrast to Cas9 and Cas12, the primary activity of Cas13 proteins appears to be their trans-RNase activity, conferred by HEPN domains 47,48. These enzymes vary in the extent and bias of HEPN activity, which strongly influences how they are used in applications ranging from RNA targeting to RNA diagnostics 49,50,51,52. Unlike those of Cas9 and Cas12 enzymes, the evolutionary origins of Cas13 enzymes remain unclear 30. These enzymes are organized into four subtypes, Cas13a to Cas13d, of which Cas13c remains uncharacterized (although it is assumed to target RNA). While these enzymes are generally capable of processing their own CRISPR array and of functioning as standalone enzymes, they occasionally co-occur with additional proteins that appear to be co-functional 53,54,55.

The reported Cas13 tree was recreated from a recent report on Cas13 phylogeny 56. Briefly, Cas13-annotated sequences from subtypes a-d were gathered from NCBI and GTDB r95 57 using Hidden Markov Models. Sequences lacking two R/Q/N/K/H****H sequence motifs were assumed to be truncated proteins and were removed. Sequences were clustered with CD-HIT v4.8.1 using length and sequence similarity cutoffs of 0.9 58. Sequences were aligned with MUSCLE v3.8.31, and trees were built with IQ-TREE v1.6.12 and visualized with iTOL 44,45,46. Trees were manually curated by pruning errant Cas13 leaves.
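The truncation filter described above can be illustrated with a short Python sketch. It assumes that each `*` in the motif stands for any single residue and that matches are counted without overlap; the function names and toy sequences are hypothetical and are not taken from the original analysis.

```python
import re
from typing import Dict

# Assumption: R/Q/N/K/H****H means one of R, Q, N, K, or H, then any four
# residues, then H.
HEPN_MOTIF = re.compile(r"[RQNKH].{4}H")


def count_hepn_motifs(seq: str) -> int:
    """Count non-overlapping motif matches in a protein sequence."""
    return len(HEPN_MOTIF.findall(seq.upper()))


def drop_truncated(records: Dict[str, str], min_motifs: int = 2) -> Dict[str, str]:
    """Keep only sequences carrying at least `min_motifs` motif matches."""
    return {
        name: seq
        for name, seq in records.items()
        if count_hepn_motifs(seq) >= min_motifs
    }


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real Cas13 proteins.
    toy = {
        "two_motifs": "MKRAAAAHGGGNLLLLHTT",
        "one_motif": "MKRAAAAHGGG",
    }
    print(sorted(drop_truncated(toy)))
```

Because the motif is short, it can also match by chance outside HEPN domains, so a filter of this kind is a coarse screen for truncated proteins rather than a domain annotation.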

