Challenges in Cannabis Genome Sequencing for Genetic Tracking and Traceability

By Khyrrah-Cymone Shepard
Genome sequencing has made remarkable strides since the initiation of the Human Genome Project in 1990. Still, many challenges must be overcome before this methodology can reach its fullest potential and serve as a method of Cannabis sativa genetics verification and tracking throughout the cannabis supply chain. Several major milestones must be realized, including end-to-end, haploid-type (a single, unpaired set of chromosomes instead of the complete paired, or “diploid”, set), long-read, resolved genome sequences produced at a reasonable cost, within a reasonable timeframe and with confidence in their accuracy (Mostovoy et al.). Genomes are typically generated as shorter reads that are then scaffolded (Fig. 1) or matched to reference genomes in order to build a longer continuous sequence. While shorter sequencing reads do lower the cost barrier for producing more genomic data, short-read technology has created problems of its own.

Figure 1: Four sets of sequencing data (long-read WGS, Hi-C, optical mapping, and short-read WGS) were produced to generate the goat reference genome. A tiered scaffolding approach using optical mapping data followed by Hi-C proximity-guided assembly produced the highest-quality genome assembly. (Bickhart et al.)

There are two main issues with the more affordable short read sequencing methodology. The first is that sequential variants are typically not detected, especially if they involve many repeats or inverted repeats, due to the limitations of the current Cannabis reference genomes and of the process of mapping short-read sequences. This is especially unfortunate because larger variants can account for up to a 13% variance within a diploid, multichromosomal genome such as that of Cannabis sativa, and this variance is thought to contribute largely to disease in various species, and perhaps to terpene profile in Cannabis sativa. Not being able to detect these variances with more affordable sequencing methodologies is particularly problematic, and reference genomes produced with short read sequences are typically highly fragmented. The second limitation is the inherent errors, gaps and other ambiguities associated with taking large numbers of short read sequences and combining them, like a jigsaw puzzle, in order to draft the larger genomic picture. While there is software with algorithms to assist in deciphering raw sequences, much more work remains to be done on this challenge, considering that cannabis genome sequencing is new genomics territory. Unfortunately, as researchers seek higher and higher levels of data quality, the shortcomings of this type of sequencing technology become apparent. This sort of sequencing methodology relies heavily on reference sequences. That isn’t much of an issue with microbial genomes, which tend to be rather short and typically have one chromosome; however, problems arise when seeking to analyze much longer genomes with multiple diploid chromosomes and many mono- and dinucleotide repeats (English et al.).

Figure 2: Blockchain Digital Stamping Certificate which publicly documents the date and time of the completion of this work. (McKernan – Crypto Funded Public Genomics)

The other category of sequencing is long read sequencing. Long read sequencing is as it sounds: the deciphering of much longer DNA strands. Of course, the technology is limited by the quality of the DNA captured; therefore, special high molecular weight DNA extraction protocols must be deployed in order to obtain the proper DNA quality (Fig. 3). Once this initial limitation is overcome, there is the stark cost of long read sequencing technology. PacBio without a doubt makes one of the highest quality long read sequencing instruments that has ever graced the field of biotechnology, but the steep price tag of the machine has stifled progress in this field simply because it isn’t affordable, and the read depth required for mammalian and plant genomes is currently almost completely prohibitive until read lengths for this instrumentation double. In order to produce what is considered to be a “validated genome”, short read and long read sequencing methodologies are combined. Long read sequencing data is used to produce the reference contigs because long reads are much easier to assemble; short read sequencing is then scaffolded against the reference contigs as a sort of “consensus validation” of the long read contigs.

Figure 3: Depiction of the varying quality of high molecular weight DNA captured during the cannabis genome submission project. (McKernan – Crypto Funded Public Genomics)

Despite the shortcomings of short read sequencing technology for analysis of the cannabis genome, it is still useful, especially when combined with longer read sequencing technologies or optical mapping technologies. Kevin McKernan, chief scientific officer of Medicinal Genomics, has been working feverishly to bridge the information gap between the cannabis genome and other widely studied plant genomes. As a scientist who worked on the Human Genome Project in 2001, McKernan has a demonstrated history of brilliance in the field of genomics. This paved the way for him to coordinate the first crypto funded and blockchain notarized sequencing project (DASH DAO funded) (Fig. 2), which was completed in 60 days and, surprisingly, showed that the cannabis genome is over 1 billion bases long, 30% larger than any cannabis genome submitted prior to his work. By reaching the standard of a 500kb N50 set forth by the Human Genome Project, McKernan was able to see new aspects of the cannabis genome that were not visible in the fragmented genomic data previously generated. Information such as a possible linkage of the THCA synthase and CBDA synthase genes is crucial when seeking to use the cannabis genome for verification and tracking purposes, because such linkages can be considered a type of “genetic marker” that may be used to differentiate cannabis cultivars and lineages. There are many types of genetic markers, including SNPs (single nucleotide polymorphisms), VNTRs (variable number tandem repeats) and even patterns of gene expression. Funding and recording of cannabis genomics must be further developed in order for potential markers to be identified and validated via larger scale genome-wide association studies.
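For readers unfamiliar with the N50 statistic cited above, it is the contig length at which contigs of that size or longer contain at least half of all assembled bases, so a more fragmented assembly has a lower N50. The following is a minimal Python sketch of that calculation; the function name and the contig lengths are hypothetical and are not taken from any cannabis assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bases.

    Contigs are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the assembly size; the
    length of the contig that crosses that threshold is the N50.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical assemblies with the same total size (1,000 kb):
fragmented = [50_000] * 20            # many short contigs
contiguous = [600_000, 300_000, 100_000]

print("fragmented N50:", n50(fragmented))
print("contiguous N50:", n50(contiguous))
```

Both toy assemblies cover the same number of bases, but only the second would clear a 500kb N50 threshold, which is why the statistic is used as a shorthand for contiguity.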

These technologies, when combined, often reduce the number of scaffolds while increasing the percentage of the genome that is resolved by filling in gaps within the drafted genome. Nanopore sequencing is an especially interesting and innovative sequencing technology that is useful in many ways. One of its most powerful uses is upgrading the quality of draft and pushed genomes by resolving poorly organized genomes and genomic structure for a fraction of the time and cost of other long read sequencing platforms (Jian et al.), making it an excellent candidate for solving cost and time constraints. Nanopore’s portability and convenience make it a real-time solution to genetics-based problems and questions. Notable uses of this technology include its deployment during an epidemiological outbreak in Africa, a proof of concept for pathogen detection in space, and the detection of base modifications during the sequencing process. There are still more uses for this exciting technology, and it has the potential to elevate cannabis genomics and the field of genomics as a whole while remaining portable and expeditious. A shortcoming of the Nanopore sequencing platform is its low sequencing coverage, which makes it inefficient for applications like haplotype phasing and single nucleotide variant detection, because the number of variants to be detected is smaller than the published variant-detection error rates of algorithms using MinION data. Single nucleotide variants can be considered genetic markers, especially markers for disease, so this is what keeps Nanopore from resolving our cannabis genome sequencing problems as of today.

Many algorithmic problems seem to stem from input data quality. Input data quality typically suffers as reads get longer and sequencing depth gets shallower, so that the sequencing does not generate enough data to provide confidence in the genome assembly. To mitigate this, scientists may fractionate a genome and sequence the fractions, or they may clone a difficult-to-sequence, highly repetitive region in order to produce reads with greater depth and thus resolve the region. They can then perform single molecule sequencing to resolve genome structure and to determine and confirm the place of the cloned region. Thus, it seems that the best solution to the limitation of algorithms is to be aware of sequencing platform limitations and to compensate for them by using more than one sequencing platform to obtain enough pertinent data to confidently produce authentic, “validated” genome assemblies (Huddleston et al.). With input data being critical in producing accurate sequencing data, standardization of DNA isolation protocols, extraction reagents and any enzymes utilized may be deemed necessary.

To conclude, the field of cannabis genomics is teeming with opportunities. There are genetic markers to discover, molecular biology protocols to optimize, and industry wide potential for exciting collaboration. More states will need to take into account the lack of federal government research grant availability and begin to think of creative ways to get cannabis science funds to continue the development of this industry. Specifically speaking, developing a feasible method for genetic tracking of cannabis plants will require improvements within the availability of sequencing technology, improvements in deploying the resources to these projects in order for them to be completed expeditiously, and standardization/validation of methods and SOPs used in order to increase confidence in the accuracy of the data generated.

A special thank you to all of my cannabis industry mentors who have molded and elevated my understanding of current needs and applied technologies within the cannabis industry; without you there would be no career within this industry for me. You are immensely appreciated.


The Ever-Growing Importance of Protecting Cannabis Extraction Innovations

By Alison J. Baldwin, Brittany R. Butler, Ph.D., Nicole E. Grimm
With legalization of cannabis for medicinal and adult use occurring rapidly at the state level, the industry is seeing a sharp increase in innovative technologies, particularly in the area of cannabis extraction. Companies are developing novel extraction methods that are capable of not only separating and recovering high yields of specific cannabinoids, but also removing harmful chemicals (such as pesticides) from the concentrate. While some extraction methods utilize solvents, such as hydrocarbons, the industry is starting to see a shift to completely non-solvent based techniques or environmentally friendly solvents that rely on, for example, CO2, heat and pressure to create a concentrate. The resulting cannabis concentrate can then be consumed directly, or infused in edibles, vape pens, topicals and other non-plant based consumption products. With companies continually seeking to improve existing extraction equipment, methods and products, it is critical for companies working in this area to secure their niche in the industry by protecting their intellectual property (IP).

Extraction can be an effective form of remediating contaminated cannabis

Comprehensive IP protection for a business can include obtaining patents for innovations, trademarks to establish brand protection of goods and services, copyrights to protect logos and original works, trade dress to protect product packaging, as well as a combination of trade secret and confidentiality agreements to protect proprietary information and company “know-how” from leaking into the hands of competitors. IP protection in the cannabis space presents unique challenges due to conflicting state and federal law, but for the most part is available to cannabis companies like any other company.

Federal trademark protection is currently one of the biggest challenges facing cannabis companies in the United States. A trademark or service mark is a word, phrase, symbol or design that distinguishes the source of goods or services of one company from another company. Registering a mark with the U.S. Patent and Trademark Office (USPTO) provides companies with nationwide protection against another company operating in the same space from also using the mark.

As many in the industry have come to discover, the USPTO currently will not grant a trademark or service mark on cannabis goods or services. According to the USPTO, since cannabis is illegal federally, marks on cannabis goods and services cannot satisfy the lawful use in commerce requirement of the Lanham Act, the statute governing federal trademark rights. Extraction companies that only manufacture cannabis-specific equipment or use cannabis-exclusive processes will likely be unable to obtain a federal trademark registration and will need to rely on state trademark registration, which provides protection only at the state-level. However, extractors may be able to obtain a federal trademark on their extraction machines and processes that can legitimately be applied to non-cannabis plants. Likewise, companies that sell cannabis-infused edibles may be able to obtain a federal trademark on a mark for non-cannabis containing edibles if that company has such a product line.

Since the USPTO will not grant marks on cannabis goods and services, a common misconception in the industry is that the USPTO will also not grant patents on cannabis inventions. But, in fact, the USPTO will grant patents on a seemingly endless range of new and nonobvious cannabis inventions, including the plant itself. (For more information on how breeders can patent their strains, see Alison J. Baldwin et al., Protecting Cannabis – Are Plant Patents Cool Now? Snippets, Vol. 15, Issue 4, Fall 2017, at 6). Unlike the Lanham Act, the patent statute does not exclude inventions on the basis of illegal activity, and it states at 35 U.S.C. § 101 that a patent may be obtained for “any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof.”

For inventions related to extraction equipment, extraction processes, infused products and even methods of treatment with concentrated formulations, utility patents are available to companies. Utility patents offer broad protection because all aspects related to cannabis extraction could potentially be described and claimed in the same patent. Indeed, there are already a number of granted patents and published patent applications related to cannabis extraction. For example, the recently granted U.S. Patent No. 9,730,911 (the ‘911 patent), entitled “Cannabis extracts and methods of preparing and using same” and granted to United Cannabis Corp., covers various liquid cannabinoid formulations containing very high concentrations of tetrahydrocannabinolic acid (THCa), tetrahydrocannabinol (THC), cannabidiol (CBD), THCa and cannabidiolic acid, THC and CBD, and CBD, cannabinol (CBN), and THC. Claim 1 of the ‘911 patent recites:

A liquid cannabinoid formulation, wherein at least 95% of the total cannabinoids is tetrahydrocannabinolic acid (THCa).

Although the ‘911 patent only covers the formulations, United Cannabis Corp. has filed a continuation application that published as US2017/0360745 on methods for relieving symptoms associated with a variety of illnesses by administering one or more of the cannabinoid formulations claimed in the ‘911 patent. This continuation application contains the exact same information as the ‘911 patent and is an example of how the same information can be used to seek complete protection of an invention via multiple patents.

An example of a patent application directed to solvent-based extraction methods and equipment is found in US20130079531, entitled “Process for the Rapid Extraction of Active Ingredients from Herbal Materials.” Claim 1 of the originally filed application recites:

A method for the extraction of active ingredients from herbal material comprising: (i) introducing the herbal material to a non-polar or mildly polar solvent at or below a temperature of 10 degrees centigrade and (ii) rapidly separating the herbal material from the solvent after a latency period not to exceed 15 minutes.

Claim 12 covered any equipment designed to utilize the process defined in claim 1.

Although now abandoned, the claims of this application were not necessarily limited to cannabis, as the claims were directed to extracting active ingredients from “herbal materials.”

Other patents involve non-toxic extraction methods utilizing CO2, such as Bionorica Ethics GMBH’s U.S. Patent No. 8,895,078, entitled “Method for producing an extract from cannabis plant matter, containing a tetrahydrocannabinol and a cannabidiol and cannabis extracts.” This patent covers processes for producing cannabidiol from a primary extract from industrial hemp plant material.

There have also been patents granted to cannabis-infused products, such as U.S. Patent No. 9,888,703, entitled “Method for making coffee products containing cannabis ingredients.” Claim 1 of this patent recites:

A coffee pod consisting essentially of carbon dioxide extracted THC oil from cannabis, coffee beans and maltodextrin.

Despite the USPTO’s willingness to grant cannabis patents, there is currently an open question regarding whether they can be enforced in federal court (the only courts that have jurisdiction to hear patent cases). However, since utility patents have a 20-year term, extractors are still wise to seek patent protection for their innovations now.

Another consideration in seeking patent protection for novel extraction methods and formulations is that the information becomes public knowledge once the patent application publishes. As this space becomes increasingly crowded, the ability to obtain broader patents will decline. Therefore, some extraction companies may benefit from keeping their innovations a trade secret, which means that the information is not known to the public, is properly maintained as confidential and creates economic value by virtue of being secret. Properly crafted non-disclosure agreements can help further ensure that trade secrets remain secret indefinitely.

Regardless of the IP strategy extractors choose, IP protection should be a primary consideration for companies in the cannabis industry to ensure the strongest protection possible both now and in the future.

Cannabis and the Environment: Navigating the Interplay Between Genetics and Transcriptomics

By Dr. Zacariah Hildenbrand
It is that time of year when the holidays afford us an opportunity for rest, recuperation and introspection. Becoming a new father to a healthy baby girl and having the privilege to make a living as a scientist fills me with an immeasurable sense of appreciation and indebtedness. I’ve also been extremely fortunate this year to spend significant time with world-renowned cannabis experts, such as Christian West, Adam Jacques and Elton Prince, who have shared with me a tremendous wealth of their knowledge about cannabis cultivation and the development of unique cannabis genetics. None of these gentlemen has formal scientific training in plant genetics; however, through decades of experimentation, observation and implementation, they’ve very elegantly used alchemy and the principles of Mendelian genetics to push the boundaries of cannabis genetics, ultimately modulating the expression of specific cannabinoids and terpenes. Hearing of their successes (and failures) has triggered significant wonderment and curiosity with respect to what can be done beyond the genetic level to keep pushing the equilibrium in this new frontier of medicine.

Of course genetics are the foundation for the production of premium cannabis. Without the proper genetic code, one cannot expect the cannabis plant to express the target constituents of interest. However, what happens when you have an elite genetic code, the holy grail of cannabis nucleotides if you will, and yet your plant does not produce the therapeutic compounds that you want and/or that are reflective of that elite genetic code? This ‘loss in translation’ can be explained by transcriptomics, and more specifically, epigenetics. In order for the genetic code (DNA) to be expressed as a gene product (RNA), it must be transcribed, a process that is modulated by epigenetic processes like DNA methylation and histone modification. In other words, the methylation of the genetic code can dictate whether or not a particular segment of DNA is transcribed into RNA, and ultimately expressed in the plant. To put this into context, if the DNA code for the enzyme THCA synthase is epigenetically silenced, then no THCA synthase is produced, your cannabis cannot convert CBGA into THCA, and now you have hemp that is devoid of THC.

With all of that being said, how do we ensure that our plants thrive under favorable epigenetic conditions? The answer is the environment, and the expression of terpenes is an ideal indicator of favorable environmental conditions. While amazing anti-inflammatories, anti-oxidants and metabolic regulators for humans, terpenes are also extremely powerful anti-microbial agents that act as a robust line of defense for the plant against bacteria and pests. So, if the threat of microbes can induce the expression of terpenes, then what about other environmental factors? I am of the opinion that the combination of increased exposure to bacteria and natural sunlight enhances the expression of terpenes in outdoor-grown cannabis compared to indoor-grown cannabis. This is strictly my opinion based on my own qualitative observations, but the point is that lighting conditions can greatly impact the expression of terpenes (and cannabinoids) in cannabis.

A plant in flowering under an LED fixture

So what is the best lighting technology to enhance the expression of terpenes? Do I use full spectrum lighting or specific frequencies? The answer to these questions is that we don’t fully know at this point. Thanks to the McCree curve we have a fundamental understanding of the various wavelengths within the visible light spectrum (400-700 nm) that are beneficial to plants, also known as Photosynthetically Active Radiation (PAR). However, little-to-no research has been conducted to determine the impacts that the rest of the electromagnetic spectrum (also categorized as ‘light’) may have on plants. As such, we do not know with 100% certainty what frequencies should be applied, and at what times in the growth cycle, to completely optimize terpene concentrations. This is not to disparage the lighting professionals out there who have significant expertise in this field; however, I’m calling for the execution of peer-reviewed experiments that would transcend the boundaries of company white papers and anecdotal claims. In my opinion, this lack of environmental data provides a real opportunity for the cannabis industry to initiate the required collaborations between cannabis geneticists, technology companies and environmental scientists. This is one field of research that I wish to pursue with tenacity, and I welcome other interested parties to join me in this data quest. Together we can better understand the environmental factors, such as lighting, that are acting as the molecular light switches at the interface of genetics and transcriptomics in cannabis.

Quality From Canada

Near Infrared, GC and HPLC Applications in Cannabis Testing

By Tegan Adams, Michael Bertone

When a cannabis sample is submitted to a lab for testing, it goes through a four-step process:

  1. It is ground at a low temperature into a fine powder;
  2. A solution is added to the ground powder;
  3. An extraction is repeated 6 times to ensure all cannabinoids are transferred into a common solution to be used in testing instrumentation;
  4. Once the cannabinoid solution is extracted from the plant matter, it is analyzed using High Pressure Liquid Chromatography (HPLC). HPLC is the key piece of instrumentation in cannabis potency testing procedures.

While there are many ways to test cannabis potency, HPLC is the most widely accepted and recognized testing technique. Other techniques include gas chromatography (GC) and thin layer chromatography (TLC). HPLC is preferred over GC because it does not apply heat in the testing process, so cannabinoids can be measured in their naturally occurring forms. With GC, heat is applied as part of the testing process and cannabinoids such as THCA or CBDA can change form, depending on the level of heat applied. CBDA and THCA have been observed to change form at temperatures as low as 40-50 °C, whereas GC uses anywhere between 150 and 200 °C. Using HPLC, free of any high-heat environments, acidic (CBDA and THCA) and neutral cannabinoids (CBD, THC, CBG, CBN and others) can be differentiated in a sample for quantification purposes.

Near Infrared

Near infrared (NIR) has been used with cannabis for rapid identification of active pharmaceutical ingredients by measuring how much light different substances reflect. Cannabis is typically composed of 5-30% cannabinoids (mainly THC and CBD) and 5-15% water. Cannabinoid content can vary by over 5% (e.g. 13-18%) on a single plant, and even more if grown indoors. Multiple NIR measurements can be cost effective for R&D purposes. NIR does not use solvents and has a speed advantage of at least 50 times over traditional methods.

The main downfall of NIR techniques is that they are generally less accurate than HPLC or GC for potency analyses. NIR can be programmed to detect different compounds. To obtain accuracy in its detection methods, samples must be tested by HPLC on an ongoing basis. 100 samples or more will provide enough information to improve an NIR software’s accuracy if it is programmed by the manufacturer or user using chemometrics. Chemometrics sorts through the often complex and broad overlapping NIR absorption bands from the chemical, physical, and structural properties of all species present in a sample that influence the measured spectra. Any variation in the strain tested or the water quantity observed, however, can affect the results. Consistency is the key to obtaining precision with NIR equipment programming. The downfall of the NIR technique is that it must constantly be compared to HPLC data to ensure accuracy.

At Eurofins Experchem, we work with both HPLC and NIR equipment simultaneously for different cannabis testing purposes. Running both simultaneously means we are able to continually monitor the accuracy of our NIR equipment against our HPLC. If a company is using NIR alone, however, it can be more difficult to maintain the equipment’s accuracy without ongoing monitoring.
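The idea of checking NIR readings against HPLC reference values can be illustrated with a deliberately simplified sketch. Real chemometric models work on full spectra with multivariate methods; the Python example below only fits a straight line between a single NIR potency reading and the paired HPLC result, then uses it to correct a new reading. The function names and all sample values are hypothetical and are not data from any laboratory.

```python
def fit_linear_calibration(nir_values, hplc_values):
    """Least-squares fit of hplc = slope * nir + intercept.

    Both lists hold paired potency results (% cannabinoid) for the
    same samples, measured once by NIR and once by HPLC.
    """
    n = len(nir_values)
    if n != len(hplc_values) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_x = sum(nir_values) / n
    mean_y = sum(hplc_values) / n
    sxx = sum((x - mean_x) ** 2 for x in nir_values)
    if sxx == 0:
        raise ValueError("NIR values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(nir_values, hplc_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(nir_value, slope, intercept):
    """Convert a raw NIR reading into an HPLC-equivalent estimate."""
    return slope * nir_value + intercept


# Hypothetical paired results for three samples (% total cannabinoids).
nir = [10.0, 15.0, 20.0]
hplc = [10.5, 16.0, 21.5]

slope, intercept = fit_linear_calibration(nir, hplc)
print("corrected reading:", apply_calibration(18.0, slope, intercept))
```

In practice the fit would be refreshed as new paired HPLC results arrive, which is the ongoing monitoring the text describes.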

What about Terpenes?

Terpenes are the primary aromatic constituents of cannabis resin and essential oils. Terpene compounds vary in type and concentration among different genetic lineages of cannabis and have been shown to modulate and modify the therapeutic and psychoactive effects of cannabinoids. Terpenes can be analyzed using different methods, including separation by GC or HPLC and identification by mass spectrometry. The high-heat environment of GC analysis can again cause problems in the accuracy and interpretation of results, because it can degrade terpenes and make them difficult to detect in their original form. We find HPLC is the best instrument to test for terpenes and can now test for six of the key terpenes: α-Pinene, Caryophyllene, Limonene, Myrcene, β-Pinene and Terpineol.

Quality Systems

Quality systems between different labs are never one and the same. Some labs are testing cannabis under good manufacturing practices (GMP), others follow ISO accreditation and some labs have no accreditation at all.

From a quality systems perspective, some labs have only one quality system employee, or none at all. In a GMP lab, to meet the requirements of Health Canada and the FDA, our operations are staffed at a 1:4 ratio of quality assurance staff to analysts. GMP labs have stringent quality standards that set them apart from other labs testing cannabis. Quality standards we work with include, but are not limited to: monthly internal blind audits, extensive GMP training, yearly exams and ongoing tests demonstrating competencies.

Maintaining and adhering to strict quality standards necessary for a Drug Establishment License for pharmaceutical testing ensures accuracy of results in cannabis testing otherwise difficult to find in the testing marketplace.

Important things to know about testing

  1. HPLC is the most recommended instrument used for product release in a regulated environment.
  2. NIR is the best instrument to use for monitoring growth and curing processes for R&D purposes, only if validated with an HPLC on an ongoing basis.
  3. Quality Systems between labs are different. Regardless of instrumentation used, if quality systems are not in place and maintained, integrity of results may be compromised.
  4. GMP compliance accounts for 25% of our labour costs, which go to our quality department. Quality systems necessary for a GMP environment include internal audits, out of specification investigations, qualification and maintenance of instruments, systems controls and stringent data integrity standards.

An Introduction to Cannabis Genetics, Part II

By Dr. CJ Schwartz
Plants and animals have roughly 25,000 to 30,000 genes. The genes provide the information needed to make a protein, and proteins are the building blocks for all biological organisms. An ideal analogy is a blueprint (DNA) for an alternator (the protein) in a car (the plant). Proteins are the ‘parts’ for living things. Some proteins will work better than others, leading to visible differences that we call phenotypes.

Many traits, and the genes controlling them, are of interest to the cannabis industry. For hemp seed oil, quality, quantity and content can be manipulated through breeding natural genetic variants. Hemp fibers are already some of the best in nature, due to their length and strength. Finding the genes and proteins responsible for elongating the fibers can allow for the breeding of hemp for even longer fibers. In cannabis, the two most popular genes are the THCA and CBDA synthases. There are currently over 100 sequences of the THCAS/CBDAS genes, and many natural DNA variations are known. We can make a family tree using just the THCAS gene data and identify ‘branches’ that result in high, low or intermediate THCA levels. Generally, most of the DNA changes have little to no effect on the gene, but some of the changes can have profound effects.

In fact, CBDAS and THCAS are related; in other words, they have a common ancestor. At some point the gene went through changes that resulted in the protein producing CBDA, THCA or both. This is further supported by the fact that certain CBDAS can produce some THCA, and vice-versa. Studies into the THCAS and CBDAS family are ongoing and extensive, with terpene synthase genes following close behind.

Identifying gene (genetic) variants and characterizing their biological function allows us to combine certain genes in specific combinations to maximize yield, but determining which genes are important (gene discovery) is the first step to utilizing marker-assisted breeding.

Gene Discovery & Manipulation

The term genetics is often misused in the cannabis industry. Genetics is actually “the study of heredity and the variation of inherited characteristics.” When people say they have good genetics, what they really mean is that they have good strains, presumably with good gene variants. When people begin to cross or stabilize strains, they are performing genetic manipulation.

A geneticist will observe or measure traits of interest in two strains, for example plant branching and myrcene production. Suppose parent A branches well but produces little myrcene, while parent B, the high-myrcene plant, is tall and skinny with no branching, which reduces its yield. Crossing the two strains will produce F1 hybrid seeds. In some cases, F1 hybrids create unique desirable phenotypes (synergy) and the breeder’s work is completed. More often, traits act additively, so we would expect the F1 to show medium branching and medium myrcene production, values between those recorded for the parents. Crossing F1 plants will produce an F2 population. An F2 population is composed of the genes from both parents all mixed up, so we would expect the F2 progeny to have many different phenotypes. In our example, 25% of the F2 plants would branch like parent A, and 25% of the F2 plants would have high myrcene like parent B. To get a plant with good branching and high myrcene, we predict that 6.25% (25% x 25%) of the F2 plants would have the correct combination.

The scenario described above is how geneticists assign gene function, a process generally called gene discovery. Once the gene for height or branching is identified, it can be tracked at the DNA level rather than the phenotype level. In the above example, 93.75% of your F2 plants can be discarded; there is no need to grow them all to maturity and measure all of their phenotypes.
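The F2 arithmetic in this example can be written out explicitly. The Python sketch below assumes, as the example does, that each desired trait appears in 25% of F2 plants and that the two traits are inherited independently (unlinked genes); the function names and the 95% confidence target are illustrative choices, not part of the original example.

```python
import math
from fractions import Fraction


def fraction_with_all_traits(per_trait_fractions):
    """Multiply per-trait F2 frequencies, assuming independent inheritance."""
    combined = Fraction(1)
    for fraction in per_trait_fractions:
        combined *= fraction
    return combined


def plants_needed(keep_fraction, confidence=0.95):
    """Smallest F2 population giving at least `confidence` probability
    of containing one or more plants with the desired combination."""
    if not 0 < keep_fraction < 1:
        raise ValueError("keep_fraction must be strictly between 0 and 1")
    miss = 1 - float(keep_fraction)
    return math.ceil(math.log(1 - confidence) / math.log(miss))


# Good branching (25%) combined with high myrcene (25%).
keep = fraction_with_all_traits([Fraction(1, 4), Fraction(1, 4)])
discard = 1 - keep

print(f"keep {float(keep):.2%}, discard {float(discard):.2%}")
print("F2 plants to screen:", plants_needed(keep))
```

One in sixteen plants carries the desired combination, so fifteen in sixteen can be discarded once a DNA marker identifies them as seedlings. If the two genes were linked on the same chromosome, the independence assumption would no longer hold and the fractions would differ.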

The most widely used method for gene discovery using natural genetic variation is quantitative trait loci (QTL) mapping. For these types of experiments, hundreds of plants are grown, phenotyped and genotyped, and the data is statistically analyzed for correlations between genes (genotype) and traits (phenotype; figure). For example, all high-myrcene F2 plants will have one gene in common that is responsible for high myrcene, while all the other genes in those F2 plants will be randomly distributed, which explains the need for robust statistics. In this scenario, a gene conferring increased myrcene production has been discovered and can now be incorporated into an efficient marker-assisted breeding program to rapidly increase myrcene production in other desirable strains.
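The statistical idea behind this can be shown with a toy simulation. The Python sketch below is a heavily simplified single-marker association test, not a full QTL mapping method (which would typically use genetic maps and interval mapping). It simulates a hypothetical F2 population in which one marker influences myrcene and another does not, then compares how strongly each marker's genotype correlates with the trait. All effect sizes, the noise level and the population size are invented for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_f2(n_plants, seed=0):
    """Simulate F2 plants as (causal_marker, neutral_marker, myrcene).

    Each marker genotype is the number of parent-B alleles (0, 1 or 2),
    drawn as two independent coin flips. Only the causal marker adds to
    the myrcene phenotype; the rest is random noise.
    """
    rng = random.Random(seed)
    plants = []
    for _ in range(n_plants):
        causal = rng.randint(0, 1) + rng.randint(0, 1)
        neutral = rng.randint(0, 1) + rng.randint(0, 1)
        myrcene = 1.0 + 2.0 * causal + rng.gauss(0, 0.5)
        plants.append((causal, neutral, myrcene))
    return plants


plants = simulate_f2(300)
causal = [p[0] for p in plants]
neutral = [p[1] for p in plants]
myrcene = [p[2] for p in plants]

print("causal marker r: ", round(pearson_r(causal, myrcene), 3))
print("neutral marker r:", round(pearson_r(neutral, myrcene), 3))
```

The causal marker tracks the phenotype closely while the neutral marker shows only chance-level correlation, which is the pattern a real experiment looks for across thousands of markers.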