CasPEDIA, or the Cas Protein Effector Database of Information and Assessment, is a community project that organizes Class 2 CRISPR-associated proteins by their functional properties. As an encyclopedia of single-enzyme effectors, CasPEDIA provides comprehensive descriptions of enzyme activities, structures, and sequences, complete with a literature review covering each nuclease's discovery, experimental considerations, and applications. Information presented in CasPEDIA includes citations for further research.
If you would like to contribute to CasPEDIA (e.g., moderate or update content, create new entries, etc.), please see the FAQ section entitled "Contribute to CasPEDIA" or the Contact page of the website for details.
While CRISPR researchers are CasPEDIA's primary audience, we also seek to support those new to CRISPR. Individuals interested in learning CRISPR fundamentals are encouraged to read CRISPRpedia, created by the Innovative Genomics Institute.
As a centralized repository of Class 2 Cas systems, CasPEDIA is organized in wiki format, with detailed summaries of each nuclease. Additional resources, such as "Advanced Tool Search" and a DELTA-BLAST sequence search (default parameters), help users find enzymes relevant to their experimental and discovery needs. The website also details the phylogenetic classifications of Class 2 systems, complementing similar efforts in the CRISPR community. When information is available, wiki entries contain the following sections:
- Quick Links: Three summary panels are available at the top of the website to enable rapid access to information about the nuclease. The three panels include: Classification (CasID information), Properties (essential details, including: protospacer length, PAM/PFS sequence, CDS Length, Number of Amino acids, etc.) and Resources (external links to: RefSeq Genome Assembly for the species of origin, RefSeq Gene ID, RefSeq Protein ID, UniProtKB ID, Conserved Domains Database ID(s), etc.)
- Summary: Brief summary of the nuclease, including its species of origin, context of its discovery, common uses, novel properties, etc.
- Applications: Description of how the Cas nuclease is utilized by scientists. This section contains literature reviews and sub-headers for Gene Editing, Tools and Diagnostics, and Engineered Variants, associated with the effector.
- Experimental Considerations: A high-level introduction to designing and performing experiments with a specific Cas system. This section contains subheadings for "Delivery" (how to deliver a Cas nuclease into a system or model organism using different modalities) and "Design" (how to design guides against a target sequence with established rules or algorithms, create an expression construct for the nuclease, and other relevant notes to kickstart experimentation).
- Nucleotide Sequence: A genome-browser view of the nuclease's sequence and architecture of its CRISPR array. Both the nucleotide sequence and genomic coordinates are available for download.
- Protein Structure: A summary of domain and residue information from UniProt and Pfam, with structures from PDB or generated with AlphaFold2.
- References: All information in CasPEDIA should be accompanied by a citation. The citations for each wiki are indexed at the bottom of the page in APA style.
All wiki content was curated by an expert panel of CRISPR enthusiasts. KOLs (Key Opinion Leaders) for each nuclease were contacted to contribute content and moderate entries. However, CasPEDIA is a community resource, and we are seeking volunteers! If you would like to contribute to CasPEDIA (e.g. moderate/update content, suggest new entries and features, etc.), please see the FAQ section entitled "Update or contribute to CasPEDIA" as well as the "Contact Us" page of the website for more information.
Yes, the paper is currently under review at Nucleic Acids Research (DOI pending).
CasPEDIA is in beta and the manuscript is currently under review. If you would like to cite CasPEDIA, please reference our preprint.
CasID: Classifying enzymes by function with identification numbers
The homepage of the website provides a detailed description of the CasID system. CasIDs are identification numbers that denote the functional properties of each nuclease. Inspired by the EC system (Enzyme Commission numbers), CasIDs consist of three digits. Briefly, the first digit represents Nuclease Activity, the second digit indicates the Targeting Requirements of the RNA-guided enzyme, and the third digit summarizes the Guide RNA (gRNA) Design and Multiplexing properties of a nuclease. Across Class 2 systems, there are 9 distinct forms of Nuclease Activity, 7 different Targeting Requirements and 5 unique classes of gRNA Design and Multiplexing considerations.
CasIDs can be interpreted using the lookup table on the homepage of the website. Assessing the three constituent digits of an ID reveals the nuclease's function. The first digit describes the Nuclease Activity of the enzyme, of which there are 9 distinct types, covering both cis and trans properties. The second digit indicates the Targeting Requirements of the RNA-guided enzyme, spanning 7 different types. The third digit summarizes 5 unique classes of Guide RNA (gRNA) Design and Multiplexing properties for Class 2 systems.
To better understand CasIDs, consider SpyCas9a, with a CasID of 1.1.1.
- The Nuclease Activity for this enzyme falls under category 1, meaning it has blunt double-strand cis nuclease activity and no trans nuclease activity.1
- The Targeting Requirements for SpyCas9a fit category 1, meaning it requires a 3' protospacer-adjacent motif (PAM).
- Finally, the Guide RNA (gRNA) Design and Multiplexing properties for this enzyme fall under category 1, such that the native CRISPR array for SpyCas9a requires a CRISPR RNA (crRNA) for targeting + a trans-activating crRNA (tracrRNA) for multiplexing to the protein backbone of the effector + additional factors for processing the array into mature guides. SpyCas9a can also be engineered to utilize a minimal array containing different single-guide RNAs (sgRNAs), each represented as a contiguous, all-in-one crRNA + tracrRNA sequence.2,3
To construct a CasID, refer to the homepage for detailed instructions. Briefly, CasIDs are constructed from three digits that represent Nuclease Activity, Targeting Requirements and gRNA Design and Multiplexing. Use the key on the homepage to determine which integer corresponds to the properties of your nuclease.
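The three-digit structure described above can be illustrated with a short Python sketch. It is not CasPEDIA's own code: the names `CasID` and `parse_casid` are hypothetical, and the sketch assumes that each digit is numbered from 1 up to the category counts given above (9, 7 and 5). The meaning of each integer still has to be read from the key on the homepage.

```python
from dataclasses import dataclass

# Category counts stated in the text for Class 2 systems.
# Assumption: categories are numbered 1..N for each position.
MAX_NUCLEASE_ACTIVITY = 9
MAX_TARGETING_REQUIREMENTS = 7
MAX_GRNA_DESIGN = 5


@dataclass(frozen=True)
class CasID:
    """Hypothetical container for the three CasID digits."""
    nuclease_activity: int
    targeting_requirements: int
    grna_design: int

    def __str__(self) -> str:
        return (f"{self.nuclease_activity}."
                f"{self.targeting_requirements}.{self.grna_design}")


def parse_casid(text: str) -> CasID:
    """Parse a string such as '1.1.1' and check each digit's range."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected three integers separated by periods: {text!r}")
    values = [int(p) for p in parts]
    limits = [MAX_NUCLEASE_ACTIVITY, MAX_TARGETING_REQUIREMENTS, MAX_GRNA_DESIGN]
    for value, upper in zip(values, limits):
        if not 1 <= value <= upper:
            raise ValueError(f"digit {value} is outside the range 1..{upper}")
    return CasID(*values)


# SpyCas9a is the worked example from the text.
spycas9a = parse_casid("1.1.1")
print(spycas9a, spycas9a.targeting_requirements)
```

Keeping the three digits as separate fields makes it straightforward to compare nucleases on a single property, for example all entries that share a targeting requirement.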
NAVIGATION: Searching for enzymes and tools
Using the website navigation menu, select "Advanced Tool Search". This redirects to a new page that guides users through a series of drop-down menus, which ultimately generate a table of all Class 2 systems that possess the experimentalist's properties of choice. Specifically, Advanced Tool Search requires four user selections: Cis-Activity Substrate, Trans-Activity Substrate, Targeting Requirements and gRNA Design and Multiplexability. A drop-down menu exists for each of the four categories, and its contents mirror the functional categories allotted to CasIDs. From each drop-down menu, users select the properties they desire, or select "Any" for categories where the enzyme's function is flexible.
If no entries in CasPEDIA fit the user's criteria, it returns "No tools fit query." A lack of results is intended to signal innovation opportunities where CasPEDIA users can engineer or discover new classes of editing tools.
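The filtering behaviour described above, including the "Any" option and the empty-result message, can be sketched in Python as follows. This is only an illustration, not the site's implementation. The function names, the dictionary keys and the property labels are assumptions, and the second entry is a made-up placeholder rather than a real CasPEDIA record.

```python
from typing import Dict, List

ANY = "Any"

# The four drop-down categories named in the text (key names are assumptions).
CATEGORIES = ("cis_activity", "trans_activity", "targeting", "grna_design")

Entry = Dict[str, str]


def advanced_tool_search(entries: List[Entry], selection: Dict[str, str]) -> List[Entry]:
    """Keep entries that match every selection not left as "Any"."""
    return [
        entry for entry in entries
        if all(selection.get(cat, ANY) in (ANY, entry[cat]) for cat in CATEGORIES)
    ]


def render(results: List[Entry]) -> str:
    """Show matching names, or the message quoted in the text when empty."""
    if not results:
        return "No tools fit query."
    return ", ".join(entry["name"] for entry in results)


ENTRIES = [
    # Simplified labels based on the SpyCas9a description in this FAQ.
    {"name": "SpyCas9a", "cis_activity": "blunt double-strand",
     "trans_activity": "none", "targeting": "3' PAM",
     "grna_design": "crRNA + tracrRNA"},
    # Hypothetical placeholder entry, for illustration only.
    {"name": "HypotheticalCasX", "cis_activity": "single-strand RNA",
     "trans_activity": "single-strand RNA", "targeting": "PFS",
     "grna_design": "crRNA only"},
]

print(render(advanced_tool_search(ENTRIES, {"targeting": "3' PAM"})))
print(render(advanced_tool_search(
    ENTRIES, {"cis_activity": "blunt double-strand",
              "trans_activity": "single-strand RNA"})))
```

Treating "Any" as a wildcard for a category means that leaving all four menus on "Any" returns every entry.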
The CasPEDIA search bar can be used to search for Cas enzymes by name (e.g. SpyCas9a, Cas9, Cas13, etc.) or RefSeq Protein ID (e.g. WP_010922251.1). Searches are not case-sensitive, but names must be spelled correctly to return accurate results. Additionally, only one enzyme name can be searched at a time.
For instance, typing "SpyCas9a" into the search bar and clicking "Search" (note: "Enter" is not a keyboard shortcut) returns a results table containing SpyCas9a and the following columns: Type Protein, CasID, Type Protein Accession (RefSeq protein ID, when available), Nuclease Activity, Targeting Requirement, gRNA Design and Multiplexability, and PAM. Click the text under the "Type Protein" column (i.e. SpyCas9a) to redirect to the wiki page. Broad and ambiguous searches, like "Cas13", are also allowed and return a results table with the same 7 columns, but with multiple type proteins for the user to choose from.
Search results can be downloaded as a table by clicking the "Download" button, such that users can store results or continue processing data for other means.
The CasPEDIA search bar can be used to search for Cas enzymes by CasID (e.g. 1.1.1), and thereby by function. Although error-handling for whitespace is provided, please follow the required syntax for CasIDs (3 digits separated by periods). Additionally, only one CasID can be searched at a time.
For example, typing "1.1.1" into the search bar and clicking "Search" (note: "Enter" is not a keyboard shortcut) returns a results table containing all proteins with a functional characterization of 1.1.1 (e.g. SpyCas9a, Cas9b, Cas9c, SauCas9, etc.) as rows and 7 columns of metadata: Type Protein, CasID, Type Protein Accession (RefSeq protein ID, when available), Nuclease Activity, Targeting Requirement, gRNA Design and Multiplexability, and PAM. Click the text under the "Type Protein" column (e.g. SpyCas9a) to redirect to the wiki page associated with any row/entry in the table.
Search results can be downloaded as a table by clicking the "Download" button, such that users can store results or continue processing data for other means.
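The search bar accepts enzyme names, RefSeq Protein IDs and CasIDs, so a query has to be recognized before it is looked up. The Python sketch below shows one way this could be done. It is an assumption about the general approach, not the website's actual logic. The function name `classify_query` and the returned labels are hypothetical, and the RefSeq pattern only covers the common two-letter-prefix form such as WP_010922251.1.

```python
import re
from typing import Tuple

# Three integers separated by periods, e.g. 1.1.1
CASID_RE = re.compile(r"\d+\.\d+\.\d+")
# Two-letter prefix, underscore, digits, optional version, e.g. WP_010922251.1
REFSEQ_PROTEIN_RE = re.compile(r"[A-Z]{2}_\d+(\.\d+)?")


def classify_query(raw: str) -> Tuple[str, str]:
    """Return a (kind, normalized_query) pair for a search-bar string."""
    compact = "".join(raw.split())  # drop all whitespace, e.g. "1. 1.1"
    if not compact:
        # An empty search browses all entries, as described in this FAQ.
        return ("browse_all", "")
    if CASID_RE.fullmatch(compact):
        return ("casid", compact)
    if REFSEQ_PROTEIN_RE.fullmatch(compact.upper()):
        return ("refseq_protein", compact.upper())
    # Names are matched without regard to case.
    return ("name", raw.strip().lower())


for query in ("SpyCas9a", " 1. 1.1 ", "wp_010922251.1", ""):
    print(repr(query), "->", classify_query(query))
```

Removing whitespace before matching mirrors the whitespace tolerance mentioned above, while a query such as "1.1" still falls through to a name search because it does not have three digits.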
CasPEDIA wiki entries are organized by Type Protein, that is, by nuclease name and its corresponding species. However, the CRISPR community is rapidly engineering sequences from nature into new variants with novel or optimized properties. All fusion proteins and engineered variants are discussed in the "Engineered Variants" section of the parental Type Protein. While a verbatim term search for engineered variants is unsupported at this time, a broad term search [[link to term-search FAQ]] for the parental Cas enzyme can be used to reach the parental sequence's wiki entry, where engineered variants are enumerated in the "Engineered Variants" section. Furthermore, a designated page on fusion proteins is also available [[future link]]. If you engineered a new variant or tool and would like to add it to CasPEDIA, please see the FAQ section entitled "Update or contribute to CasPEDIA" as well as the "Contact Us" page of the website for more information.
The CasPEDIA search bar can be used to search for Cas enzymes by protein sequence (e.g. SpyCas9a, Cas9, Cas13, etc.), using an underlying DELTA-BLAST function that queries against all sequences deposited in CasPEDIA. All sequence searches must follow FASTA syntax (i.e. >sequenceName\nsequenceString). Searches are not case-sensitive and, aside from the > that must precede each sequence name, search strings should exclude special characters (e.g. *, white space, line-breaks and punctuation marks). Currently, only one sequence can be entered at a time. The DELTA-BLAST parameters for the search tool are also currently fixed (they cannot be edited at runtime) and set to the tool's default settings. If you would like to perform extended BLAST searches, all protein sequences in CasPEDIA can be [[downloaded here]] for external searches.
Sequence searches redirect to a search results table, where each row is a unique Type Protein (i.e. nuclease and species) with 6 columns: Hit Name (type protein), Query Length, Hit Length, BLAST Alignment Length, Query Coverage (%) and PAM. The table is sortable (both ascending and descending) by all fields in the table, including E-value, to assist users in finding their desired nuclease.
If a BLAST search does not yield any hits, either no significant hits against the CasPEDIA database were found or the input did not adhere to FASTA format (please check the input). If the formatting for a failed search was correct and you discovered an entry you'd like to add to CasPEDIA, we'd be happy to help. Please see the FAQ section entitled "Update or contribute to CasPEDIA" as well as the Contact page of the website for more information.
Valid search examples: FASTA format
Invalid search examples:
- Incorrect FASTA formatting (no ">" character or sequence name followed by a line-break before the sequence string):
- Incorrect FASTA formatting (no sequence name followed by a line-break before the sequence string):
- Nucleotide Sequences:
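The formatting rules above can be checked before submitting a sequence. The following Python sketch is a rough pre-check under stated assumptions, not the validation CasPEDIA performs: `fasta_problem` is a hypothetical helper, the sequence is expected on a single line after the header, and the nucleotide test is a simple heuristic (a sequence built only from A, C, G, T, U and N is assumed to be DNA or RNA). The toy sequences are arbitrary strings used for illustration.

```python
from typing import Optional

# One-letter amino-acid codes, including ambiguity codes and U/O.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBXZUO")
# Heuristic: sequences made only of these letters are treated as nucleotides.
NUCLEOTIDES = set("ACGTUN")


def fasta_problem(query: str) -> Optional[str]:
    """Return a description of the first problem found, or None if none."""
    lines = query.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        return "missing '>' header line"
    if not lines[0][1:].strip():
        return "missing sequence name after '>'"
    if len(lines) < 2:
        return "missing line-break and sequence after the header"
    if len(lines) > 2:
        return "only one sequence on a single line is accepted"
    sequence = lines[1].upper()  # searches are not case-sensitive
    if set(sequence) - AMINO_ACIDS:
        return "sequence contains characters other than amino-acid letters"
    if set(sequence) <= NUCLEOTIDES:
        return "sequence looks like nucleotides, not protein"
    return None


examples = [
    ">toyProtein\nMKTAYIAKQR",   # well-formed
    "MKTAYIAKQR",                # no header line
    ">MKTAYIAKQR",               # no line-break between name and sequence
    ">toyGene\nATGGATAAGAAA",    # nucleotide sequence
]
for example in examples:
    print(repr(example), "->", fasta_problem(example))
```

Short peptides composed only of A, C, G, T, U or N would be misclassified by the heuristic, so a real check would need a more careful rule.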
Scientists with novel Cas nucleases can utilize CasPEDIA, even if their enzymes are undocumented. While sequence similarity between proteins does not guarantee equivalent enzymatic function, it is a reasonable proxy and serves as a putative indicator of a novel enzyme's properties. To this end, scientists can leverage the DELTA-BLAST search engine to identify the CasPEDIA entry with the highest sequence similarity (lowest E-value) to their novel protein. The top DELTA-BLAST hit may have a similar function to the candidate protein, and the functional information embedded in its CasID can be used to inform a preliminary set of experiments to efficiently characterize the true nature of the novel nuclease. In addition to DELTA-BLAST, sequence lists from all phylogenetic trees in CasPEDIA may be downloaded [[link]] for further analysis: including the novel nuclease in a new phylogenetic reconstruction can identify the most closely related entry in CasPEDIA for functional clues.
Please visit the Tool Finder page to download all entries in CasPEDIA. To browse all entries in CasPEDIA, perform an empty search (press "Search") in the homepage search bar.
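The "lowest E-value" selection described above can be sketched in Python as follows. It assumes the search results have already been loaded into simple records. The `Hit` class, the `best_hit` helper, the cutoff of 1e-5 and the E-values are all illustrative assumptions, and the second hit is a made-up placeholder. Only the SpyCas9a CasID of 1.1.1 comes from this FAQ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """Hypothetical record for one row of a sequence-search result."""
    hit_name: str
    e_value: float
    casid: str


def best_hit(hits: List[Hit], max_e_value: float = 1e-5) -> Optional[Hit]:
    """Return the hit with the lowest E-value, if any passes the cutoff."""
    significant = [hit for hit in hits if hit.e_value <= max_e_value]
    if not significant:
        return None
    return min(significant, key=lambda hit: hit.e_value)


# Illustrative values only; these are not real search results.
hits = [
    Hit("PlaceholderCas", 3e-12, "2.3.4"),
    Hit("SpyCas9a", 1e-80, "1.1.1"),
]

top = best_hit(hits)
if top is None:
    print("No significant hit; no functional hypothesis can be drawn.")
else:
    print(f"Closest entry: {top.hit_name}; CasID {top.casid} is a starting hypothesis.")
```

The CasID of the top hit is only a hypothesis to test experimentally, since sequence similarity does not guarantee equivalent function.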
CURATION AND FUTURE DEVELOPMENTS: New entries and becoming a curator
We are grateful for your interest. CasPEDIA is a community resource and your participation is welcome! If you would like to contribute to CasPEDIA (e.g. moderate/update content, suggest new entries and features, etc.), please visit the "Contact" page of the website for detailed information.
Briefly, anyone can contribute to the database. CasPEDIA content is managed through a series of forms, which are shared amongst curators/moderators for completion. All information in CasPEDIA must be citable through peer-reviewed publications and/or external databases with rigorous curation standards.
If you would like to suggest a new CasPEDIA entry, contact email@example.com with the subject line "Request for new CasPEDIA Entry: New Entry Name." A novel Cas nuclease will be considered for the wiki if its function has been described in 1 or more peer-reviewed publications and there is sufficient information to complete the webpage. We are also happy to work with CRISPR pioneers who have papers under submission/revision and are willing to contribute to the curation process themselves. If you are unable to curate content for the entry you suggest, please identify a minimum of 5 scientists who would be able to assist with the entry. The CasPEDIA Consortium will review all potential wiki content to maintain the database's curation standards.
If you are interested in updating a preexisting entry, contact firstname.lastname@example.org with the subject line "Request to update Entry Name Wiki." Please describe the reason for the update and include any relevant citations from the literature. Our team will review your message and reach out with details regarding next steps or the results of the update. If you find an error in an entry, please include a description, a citation supporting your requested amendment and relevant screenshots of the issue.
Bugs can be reported by contacting the team with the subject header "Bug Report." A screenshot of any errors is strongly encouraged, along with a description of the user activity that triggered the error.
Other inquiries, like new feature requests or feature updates, can also be sent to the CasPEDIA team.
At minimum, CasPEDIA is updated quarterly (every 3 months) to keep the database accurate and to include newly published work and tools. However, errors or typos identified by users will be corrected immediately.
CasPEDIA is a living database. The wiki contents and functionality of the site will evolve over time. Users can expect new wiki entries as community engagement increases and novel editing tools emerge. We also anticipate an expansion to Class 1 CRISPR systems and fusion proteins in the future. If you have recommendations for improving CasPEDIA, please see the FAQ section entitled "Contribute to CasPEDIA" as well as the "Contact Us" page of the website for more information.