Expansion of the Catalytic Repertoire of Alcohol Dehydrogenases in Plant Metabolism

Abstract Medium‐chain alcohol dehydrogenases (ADHs) comprise a highly conserved enzyme family that catalyses the reversible reduction of aldehydes. However, recent discoveries in plant natural product biosynthesis suggest that the catalytic repertoire of ADHs has been expanded. Here we report the crystal structure of dihydroprecondylocarpine acetate synthase (DPAS), an ADH that catalyses the non‐canonical 1,4‐reduction of an α,β‐unsaturated iminium moiety. Comparison with structures of plant‐derived ADHs suggests that the 1,4‐iminium reduction does not require a proton relay or the presence of a catalytic zinc ion, in contrast to canonical 1,2‐aldehyde‐reducing ADHs, which require both the catalytic zinc and a proton relay. Furthermore, ADHs that catalysed 1,2‐iminium reduction required the presence of the catalytic zinc and the loss of the proton relay. This suggests how the ADH active site can be modified to perform atypical carbonyl reductions, providing insight into how chemical reactions are diversified in plant metabolism.

Table S8. GenBank accessions for sequences used to construct the maximum likelihood tree in Figure 4A.

Chemicals and molecular biology reagents
All solvents used for extractions, chemical synthesis and preparative HPLC were HPLC grade, and solvents used for UPLC/MS were MS grade. All solvents were purchased from Sigma Aldrich. Carbenicillin, kanamycin sulfate and isopropyl β-D-thiogalactoside (IPTG) were purchased from Sigma. Synthetic genes were purchased from IDT. All gene amplifications and mutations were performed using Platinum II Superfi DNA Polymerase (Thermo Fisher). Constructs were cloned into vectors using the In-Fusion kit (Clontech Takara) and colony PCR was performed using Phire II mastermix (Thermo Fisher) according to the manufacturer's instructions. PCR product purification was performed using the Zymoclean Gel DNA Recovery kit (Zymo). Plasmid purification was performed using the Wizard Miniprep kit (Promega). Strictosidine, precondylocarpine acetate, stemmadenine acetate, angryline, vincadifformine, 19-E-geissoschizine and tetrahydroalstonine were enzymatically prepared and purified as previously described [1-4].

Cloning and mutagenesis
Cloning of CrDPAS, TiDPAS1, TiDPAS2, CrGS and CrTHAS has been previously reported [1,2,4,5]. Full-length CrDPAS, TiDPAS2, GS and THAS were amplified by PCR from the codon-optimized synthetic genes listed in Table S2 using the corresponding primers listed in Table S1. The Thermoanaerobacter brockii alcohol dehydrogenase (TbADH) synthetic gene (Table S2) was cloned into the pOPINF vector. DPAS, GS and THAS mutants were generated by overlap extension PCR as previously reported [6]. PCR products were purified from 1% agarose gel and ligated into the BamHI and KpnI restriction sites of pOPINK vectors for small-scale expression of GS and GS mutants. All other ADHs were cloned into the pOPINF vector. pOPINF and pOPINK were a gift from Ray Owens (Addgene plasmids #26042 and #41143 [7]). Constructs were ligated into vectors using the In-Fusion kit (Clontech Takara).

Expression and purification of proteins in E. coli
Constructs were transformed into chemically competent E. coli Stellar cells (Clontech Takara) by heat shock at 42°C for 30 seconds and selected on LB agar containing 50 μg/mL carbenicillin or kanamycin for pOPINF or pOPINK constructs, respectively. Positive colonies were screened by colony PCR using primers listed in Table S1 and grown overnight at 37°C with shaking at 200 r.p.m. Plasmids were then isolated and constructs were sequence verified. Plasmids were transformed into chemically competent E. coli SoluBL21 cells by heat shock for 30 seconds at 42°C and selected on LB agar containing 50 μg/mL carbenicillin or kanamycin for pOPINF or pOPINK constructs, respectively. For small-scale protein purification, 10 mL starter cultures of LB containing 50 μg/mL of the respective antibiotic were inoculated with a colony of SoluBL21 cells carrying the construct and grown overnight at 37°C and 200 r.p.m. Media (100 mL 2xYT) containing 50 μg/mL antibiotic was inoculated with 1 mL of the starter culture and grown until an OD600 of 0.6 was reached. For large-scale purification, 20 mL starter cultures of LB containing antibiotic were inoculated with a colony of SoluBL21 cells carrying the construct and grown overnight at 37°C and 200 r.p.m. Media (1 L 2xYT) containing 50 μg/mL carbenicillin was inoculated with 10 mL of starter culture and grown until an OD600 of 0.6 was reached. Once cultures had reached the desired OD600, they were transferred to a shaking incubator at 18°C and 200 r.p.m. for 30 minutes before protein expression was induced by addition of 300 μM IPTG, after which cultures were grown for an additional 16 hours.

CrPAS insect cell expression
N-terminal His6-tagged CrPAS was expressed in Sf9 insect cells as previously described [1] . Cells were harvested by centrifugation and the pellets frozen at -80°C until large-scale purification.

CrDPAS, CrGS and CrTHAS small-scale protein expression and purification
Cells were harvested by centrifugation at 4000 x g for 15 minutes and re-suspended in 10 mL buffer A1 (50 mM Tris-HCl pH 8, 50 mM glycine, 500 mM NaCl, 5% glycerol, 20 mM imidazole) with addition of EDTA-free protease inhibitor cocktail (Roche Diagnostics Ltd.) and 10 mg lysozyme (Sigma). Cells were lysed at 4°C using a sonicator (40% amplitude, cycles of 2 seconds on and 3 seconds off for 2 minutes) and centrifuged at 35000 x g to remove insoluble cell debris. The supernatant was collected, filtered through a 0.2 μm PES syringe filter (Sartorius) and purified by addition of 150 μL washed Ni-NTA agarose beads (QIAGEN). Samples were incubated on a rocking incubator at 4°C for 1 hour. Beads were washed by centrifuging at 1000 x g for 1 minute to remove the supernatant and then resuspending the beads in 10 mL of buffer A1. This step was performed a total of three times. Protein was eluted by resuspending the beads in 600 μL of buffer B1 (50 mM Tris-HCl pH 8.0, 50 mM glycine, 500 mM NaCl, 5% glycerol, 500 mM imidazole), centrifuging at 1000 x g for 1 minute and collecting the supernatant. This elution step was repeated to elute all Ni-NTA-bound protein. Proteins were buffer exchanged into buffer A4 (20 mM HEPES pH 7.5, 150 mM NaCl), concentrated using a 10 kDa molecular weight cut-off centrifugal filter (Merck) and stored at -80°C.

CrDPAS, TiDPAS2, CrGS, CrSGD, CrPAS and TbADH large-scale purification
Cells were harvested by centrifugation at 3200 x g for 15 minutes and re-suspended in 50 mL buffer A1 (50 mM Tris-HCl pH 8, 50 mM glycine, 500 mM NaCl, 5% glycerol, 20 mM imidazole) with addition of EDTA-free protease inhibitor cocktail (Roche Diagnostics Ltd.) and 10 mg lysozyme (Sigma). Dithiothreitol (Sigma) was additionally added to all buffers, to a final concentration of 0.05 mM, when purifying CrDPAS, TiDPAS2 and CrGS for crystallisation. Cells were lysed at 4°C using a cell disruptor at 30 kpsi and centrifuged (35000 x g) to remove insoluble cell debris. The supernatant was collected, filtered through a 0.2 μm PES syringe filter (Sartorius) and purified using an AKTA pure FPLC (Cytiva). The sample was applied at 2 mL/min onto a HisTrap HP 5 mL column (Cytiva) and washed with 5 column volumes (CV) of buffer A1 before being eluted with 5 CV of buffer B1. Protein was detected and collected using the UV 280 nm signal and then further purified on a Superdex HiLoad 16/60 S200 gel filtration column (Cytiva) at a flow rate of 1 mL/min using buffer A4. Proteins were finally buffer exchanged into buffer A4 and concentrated using a 10 kDa molecular weight cut-off centrifugal filter (Merck) before being snap frozen in liquid nitrogen and stored at -80°C.
For crystallisation of CrDPAS, TiDPAS2 and CrGS, protein after gel filtration was incubated on a rocker overnight at 4°C with 3C protease to cleave the 6xHis-tag. Proteins were then passed through a 1 mL HisTrap column (Cytiva) to remove the cleaved tag. Proteins were then buffer exchanged into buffer A4 (20 mM HEPES pH 7.5, 150 mM NaCl) containing 0.05 mM tris(2-carboxyethyl)phosphine (Sigma), concentrated using a 10 kDa molecular weight cut-off centrifugal filter (Merck) and stored at -80°C.

Synthesis of NADPD
Deuterated pro-R-NADPD was produced in vitro as previously described [8] with minor modifications. A 20 mL reaction mixture containing 2 mM NADP+, 4 mM d8-isopropanol, 1 mM semicarbazide and 5 μM TbADH in 50 mM ammonium bicarbonate buffer at pH 7.5 was incubated at 30°C. The progression of the reaction was monitored spectrophotometrically at 340 nm. When no significant increase in absorbance was observed (approximately 3 hours), 300 μL of Ni-NTA agarose beads (QIAGEN) were added and the sample was incubated with rocking at room temperature for 30 minutes. The reaction was centrifuged to remove the Ni-NTA beads bound to TbADH, and the supernatant was filtered through a 45 μm glass filter and lyophilized to remove the unreacted d8-isopropanol, the acetone that forms during the reaction and the buffer. The residue, containing primarily NADPD, was stored at -20°C until use.
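
The absorbance readings used to follow this reaction can be related to cofactor concentration through the Beer-Lambert law. The sketch below is illustrative only: it assumes the commonly quoted extinction coefficient of 6220 M^-1 cm^-1 for reduced nicotinamide cofactors at 340 nm (taken here to apply to NADPD as well), a 1 cm path length, a blank-corrected reading and a hypothetical dilution factor. None of these values or function names come from the protocol.

```python
# Assumptions (not from the protocol): molar extinction coefficient of the
# reduced nicotinamide cofactor at 340 nm, and a standard 1 cm cuvette.
EPSILON_340 = 6220.0   # M^-1 cm^-1
PATH_LENGTH_CM = 1.0


def reduced_cofactor_mM(a340: float, dilution_factor: float = 1.0) -> float:
    """Beer-Lambert estimate of reduced cofactor (mM) in the undiluted reaction."""
    molar = a340 / (EPSILON_340 * PATH_LENGTH_CM)
    return molar * dilution_factor * 1000.0


def fraction_converted(a340: float, dilution_factor: float = 1.0,
                       initial_nadp_mM: float = 2.0) -> float:
    """Fraction of the starting NADP+ (2 mM in the text) that has been reduced."""
    return reduced_cofactor_mM(a340, dilution_factor) / initial_nadp_mM


if __name__ == "__main__":
    # Hypothetical blank-corrected reading after a 20-fold dilution.
    print(f"{reduced_cofactor_mM(0.56, 20):.2f} mM reduced cofactor")
    print(f"{fraction_converted(0.56, 20):.0%} of NADP+ converted")
```

Because a fully converted 2 mM reaction would exceed the linear range of most spectrophotometers, a dilution step of this kind would normally be needed before reading.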

In vitro enzyme assays
Enzymatic assays with precondylocarpine acetate were performed in 50 mM HEPES buffer (pH 7.5) with 50 μM precondylocarpine acetate in MeOH (not exceeding 5% of the reaction volume), 250 μM NADPH cofactor (Sigma) and 150 nM enzyme in a final reaction volume of 100 μL. Reactions were incubated for 30 minutes at 30°C with shaking at 60 r.p.m. before being quenched with 1 volume of 70% MeOH with 1% H2CO2. Enzymatic assays with strictosidine aglycone were performed in 50 mM HEPES buffer (pH 7.5) with 100 μM strictosidine and 1 mM SGD in a final reaction volume of 100 μL. Assays were incubated for 30 minutes at 30°C with shaking at 60 r.p.m. before 500 nM ADH enzyme and 250 μM NADPH were added. As a control, the reactions were performed without the addition of ADH enzyme. Reactions were incubated for a further 30 minutes at 30°C with shaking at 60 r.p.m. before being quenched with 1 volume of 70% MeOH with 0.1% H2CO2. All enzymatic assays were centrifuged at 14000 x g for 15 minutes and the supernatant was analysed by UPLC-MS.
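
As an illustration of how the precondylocarpine acetate assay could be assembled, the following sketch converts the final concentrations given above into pipetting volumes using C1·V1 = C2·V2 and checks the 5% methanol limit. The stock concentrations and all names in the code are hypothetical and were chosen only to make the arithmetic concrete.

```python
from typing import Dict, Tuple

# name -> (final concentration, stock concentration), same unit within a pair.
# Final concentrations follow the text; stock concentrations are hypothetical.
COMPONENTS: Dict[str, Tuple[float, float]] = {
    "HEPES pH 7.5 (mM)": (50.0, 500.0),
    "precondylocarpine acetate in MeOH (uM)": (50.0, 2000.0),
    "NADPH (uM)": (250.0, 10000.0),
    "enzyme (nM)": (150.0, 15000.0),
}
METHANOL_COMPONENT = "precondylocarpine acetate in MeOH (uM)"


def plan_assay(final_volume_uL: float = 100.0) -> Dict[str, float]:
    """Return the volume (uL) of each stock plus water, using C1*V1 = C2*V2."""
    volumes = {name: final * final_volume_uL / stock
               for name, (final, stock) in COMPONENTS.items()}
    water = final_volume_uL - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested volume")
    volumes["water"] = water
    return volumes


def methanol_fraction(volumes: Dict[str, float]) -> float:
    """Fraction of the reaction volume contributed by the methanolic stock."""
    return volumes[METHANOL_COMPONENT] / sum(volumes.values())


if __name__ == "__main__":
    plan = plan_assay()
    for name, volume in plan.items():
        print(f"{name}: {volume:.1f} uL")
    # The text limits MeOH to 5% of the reaction volume.
    assert methanol_fraction(plan) <= 0.05
```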

UPLC-MS
All assays were analysed using a Thermo Scientific Vanquish UPLC coupled to a Thermo Q Exactive Plus orbitrap MS. For assays using precondylocarpine acetate, chromatographic separation was performed using a Phenomenex Kinetex C18 2.6 μm (2.1 x 100 mm) column using water with 1% H2CO2 as mobile phase A and acetonitrile with 1% H2CO2 as mobile phase B. Compounds were separated using a linear gradient of 10-30% B in 5 minutes followed by 1.5 minutes isocratic at 100% B. The column was then re-equilibrated at 10% B for 1.5 minutes. The column was heated to 40°C and the flow rate was set to 0.6 mL/min. For assays using strictosidine aglycone, separation was carried out using a Waters Acquity BEH C18 1.7 μm (2.1 x 50 mm) column using 0.1% NH4OH in water as mobile phase A and acetonitrile as mobile phase B. Compounds were separated using a linear gradient of 10-90% B in 9 minutes followed by 2 minutes isocratic at 90% B. The column was re-equilibrated at 10% B for 3 minutes. The column was heated to 50°C and the flow rate was set to 0.4 mL/min. MS detection was performed in positive ESI mode under the following conditions: spray voltage 3.5 kV ~ 67.4 µA, capillary temperature 275°C, vaporizer temperature 475°C, sheath gas flow rate 65, sweep gas flow rate 3, aux gas flow rate 15 and S-lens RF level 55 V. The scan range was set to 200-1000 m/z and the resolution to 17500.
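
The two gradient programmes described above can be written down as piecewise-linear segments, which makes it easy to look up the mobile phase composition at any time point. This is a minimal sketch: the segment tables transcribe the values in the text, the changes between segments are assumed to be instantaneous steps, and the names are illustrative.

```python
from typing import List, Tuple

# Each segment: (start min, end min, %B at start, %B at end).
Segment = Tuple[float, float, float, float]

PRECONDYLOCARPINE_METHOD: List[Segment] = [
    (0.0, 5.0, 10.0, 30.0),     # linear gradient 10-30% B
    (5.0, 6.5, 100.0, 100.0),   # isocratic wash at 100% B
    (6.5, 8.0, 10.0, 10.0),     # re-equilibration at 10% B
]
AGLYCONE_METHOD: List[Segment] = [
    (0.0, 9.0, 10.0, 90.0),     # linear gradient 10-90% B
    (9.0, 11.0, 90.0, 90.0),    # isocratic hold at 90% B
    (11.0, 14.0, 10.0, 10.0),   # re-equilibration at 10% B
]


def percent_b(method: List[Segment], t_min: float) -> float:
    """Programmed %B at time t_min, by linear interpolation within a segment."""
    for start, end, b_start, b_end in method:
        if start <= t_min < end:
            return b_start + (b_end - b_start) * (t_min - start) / (end - start)
    raise ValueError("time is outside the programmed run")


def run_time(method: List[Segment]) -> float:
    """Total programmed run time in minutes."""
    return method[-1][1]


if __name__ == "__main__":
    for t in (0.0, 2.5, 5.5, 7.0):
        print(t, percent_b(PRECONDYLOCARPINE_METHOD, t))
```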

Production and isolation of d-angryline and d2-vincadifformine
d-angryline was produced enzymatically from stemmadenine acetate using the same protocol previously described for the synthesis of angryline but replacing NADPH with NADPD [6]. Briefly, 0.25 mg of stemmadenine acetate, 40 μM flavin adenine dinucleotide (FAD) and 5 μg of CrPAS were combined in a total volume of 500 μL in 50 mM Tris-HCl buffer pH 8.5 and incubated at 37°C to form precondylocarpine acetate (reaction progress was monitored by LC-MS, m/z 395.19). After 2 hours, 1 mg of NADPD and 9 μg of CrDPAS were added to the reaction, which was incubated for 20 minutes at 37 °C to obtain d-angryline (m/z 338.19). Multiple reactions were prepared to obtain sufficient product for NMR characterization. After completion, the reactions were snap frozen in liquid nitrogen and stored at -80 °C.
d2-vincadifformine was also produced enzymatically, but in this case NADPD was generated directly in the reaction mixture using an alcohol dehydrogenase from E. coli (Merck product 49854). Multiple 500 μL reactions were prepared to obtain sufficient product for NMR characterization. Each reaction contained 400 μM NADP+, 0.89 μg d8-isopropanol, 1 μg of TbADH, 10 μg stemmadenine acetate, 0.8 μM CrPAS and 0.8 μM TiDPAS1 in 50 mM HEPES buffer pH 7.5. The reactions were incubated at 30 °C for 1 hour, snap frozen in liquid nitrogen and stored at -80 °C until purification of the final product.
d-angryline and d2-vincadifformine were purified by semi-preparative HPLC on an Agilent 1260 Infinity II HPLC system. The reactions were thawed and 500 μL of 90:9:1 MeOH:H2O:H2CO2 was added to the deuterated samples. The samples were filtered through 0.2 μm PTFE disc filters (Sartorius) to remove the precipitated enzymes and injected onto a Phenomenex Kinetex XB-C18 5 μm (250 x 10 mm) column. Chromatographic separation was performed using 0.1% H2CO2 in water as mobile phase A and acetonitrile as mobile phase B. A linear gradient from 10% B to 40% B in 15 minutes was used for chromatographic separation of the compounds, followed by a wash at 40% B for 5 minutes and a re-equilibration step at 10% B for 5 minutes. The flow rate was 6 mL/min. Elution of d-angryline and d2-vincadifformine was monitored at two wavelengths, 330 and 254 nm. Fractions containing the compounds of interest were collected, dried under reduced pressure and stored at -80 °C until further analysis.
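
When checking deuterium incorporation by LC-MS, the expected m/z of a labelled product follows from the mass difference between 2H and 1H (about 1.0063 u per exchanged hydrogen). The sketch below shows that arithmetic; the isotope masses are standard monoisotopic values, while the function names and the example input of m/z 400.0 are hypothetical placeholders rather than values from this study.

```python
MASS_H = 1.00782503   # monoisotopic mass of 1H (u)
MASS_D = 2.01410178   # monoisotopic mass of 2H (u)
DELTA_D = MASS_D - MASS_H


def labelled_mz(unlabelled_mz: float, n_deuterium: int, charge: int = 1) -> float:
    """Expected m/z after replacing n_deuterium hydrogens with deuterium."""
    return unlabelled_mz + n_deuterium * DELTA_D / charge


def deuterium_count(unlabelled_mz: float, observed_mz: float, charge: int = 1) -> int:
    """Nearest whole number of deuterium atoms explaining an observed shift."""
    return round((observed_mz - unlabelled_mz) * charge / DELTA_D)


if __name__ == "__main__":
    # Placeholder m/z for an unlabelled ion; replace with a measured value.
    print(f"{labelled_mz(400.0, 1):.4f}")
    print(f"{labelled_mz(400.0, 2):.4f}")
```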

Production and isolation of 19,20-dihydrovallesiachotamine
19,20-dihydrovallesiachotamine was produced enzymatically from 100 μM strictosidine reacted with 100 μM CrSGD in 50 mM HEPES buffer pH 7.5 in a 100 mL reaction at 30°C. After 90 minutes, 500 nM CrDPAS and 250 μM NADPH were added and the reaction was monitored. After 2 hours a further 500 nM CrDPAS was added, giving a final concentration of 1 μM, and the reaction was left for a further 3 hours until it reached completion. The sample was snap frozen in liquid nitrogen and stored at -80 °C. For purification, the sample was thawed on ice, filtered through a 0.2 μm PTFE disc filter (Sartorius) to remove the precipitated enzymes, passed through a Supelco DSC-18 column (MilliporeSigma) and eluted with methanol. The eluent was dried down in a rotovap and resuspended in 1.5 mL methanol. The product was purified on an Agilent 1290 Infinity II semi-preparative HPLC system using a Waters XBridge BEH C18 5 μm (10 x 250 mm) column, with 0.1% NH4OH in water as mobile phase A and acetonitrile as mobile phase B. Compounds were separated using a linear gradient of 10-65% B in 25 minutes followed by 10 minutes of column re-equilibration at 10% B. The flow rate was set to 7 mL/min. The compound was detected by monitoring the UV signal at 290 nm and 254 nm. Fractions containing the compound of interest were collected, dried down using a rotovap and stored at -20 °C until NMR analysis.

ECD measurement
ECD spectra were measured at 25 °C on a JASCO J-810 spectropolarimeter (JASCO Corporation, Tokyo, Japan) using a 350 µL cell. Spectrometer control and data processing were accomplished using JASCO Spectra Manager II.

ECD spectral calculations for (-)-vincadifformine
Based on the structure determined from NMR analysis, a molecular model was created in GaussView ver. 6 (Semichem Inc., Shawnee, Kansas, USA) and optimized using the semi-empirical method PM6 in Gaussian (Gaussian Inc., Wallingford, Connecticut, USA). The resulting structure was used for conformer variation with the GMMX processor of the Gaussian program package. The resulting structures were DFT-optimized with Gaussian ver. 16 (APFD/6-31G(d)). A cut-off level of 4 kcal/mol was used to select conformers, which were subjected to another DFT optimization at a higher level (APFD/6-311+G(2d,p)). All structures within 2.5 kcal/mol of the lowest energy conformer were used to determine the ECD frequencies in a TD-SCF calculation at the same level as the preceding DFT optimization. The ECD curve was calculated from the Boltzmann-weighted contributions of all conformers with a cut-off level of two percent.
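
The final averaging step can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used: populations are computed at 298.15 K (matching the 25 °C measurement), the two percent cut-off is applied to the Boltzmann population, the retained populations are renormalised, and each conformer spectrum is a list of intensities sampled on a common wavelength grid. All names are hypothetical.

```python
import math
from typing import List, Sequence

R_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_weights(rel_energies_kcal: Sequence[float],
                      temperature_k: float = 298.15) -> List[float]:
    """Normalised Boltzmann populations from relative energies in kcal/mol."""
    rt = R_KCAL_PER_MOL_K * temperature_k
    factors = [math.exp(-energy / rt) for energy in rel_energies_kcal]
    total = sum(factors)
    return [factor / total for factor in factors]


def weighted_spectrum(spectra: Sequence[Sequence[float]],
                      rel_energies_kcal: Sequence[float],
                      min_population: float = 0.02,
                      temperature_k: float = 298.15) -> List[float]:
    """Population-weighted average of conformer spectra on a shared grid.

    Conformers below min_population are dropped and the remaining
    populations are renormalised (an assumption of this sketch).
    """
    weights = boltzmann_weights(rel_energies_kcal, temperature_k)
    kept = [(w, s) for w, s in zip(weights, spectra) if w >= min_population]
    norm = sum(w for w, _ in kept)
    n_points = len(kept[0][1])
    return [sum(w * s[i] for w, s in kept) / norm for i in range(n_points)]


if __name__ == "__main__":
    # Hypothetical relative energies (kcal/mol) for three conformers.
    print([round(w, 3) for w in boltzmann_weights([0.0, 0.5, 2.5])])
```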

X-ray data collection, processing and structure solution
X-ray data sets for the CrDPAS and TiDPAS2 structures were recorded on the 10SA (PX II) beamline at the Paul Scherrer Institute (Villigen, Switzerland) at a wavelength of 1.0 Å using a Dectris Eiger3 16M detector, with the crystals maintained at 100 K by a cryocooler. Diffraction data were integrated using XDS [12] and scaled and merged using AIMLESS [14]; data collection statistics are summarized in Tables S3-7. Structure solution was automatically obtained by molecular replacement using the structure of tetrahydroalstonine synthase from C. roseus (PDB accession code 5FI3) as the template, with which CrDPAS and TiDPAS2 share 54% and 56% amino acid identity, respectively. In all cases the map was of sufficient quality to enable 90% of the residues expected for a homodimer to be automatically fitted using Phenix autobuild [10]. The models were finalized by manual rebuilding in COOT [11] and refined using Phenix refine.
X-ray data for CrGS were recorded on beamline I03 at the Diamond Light Source (Oxfordshire, UK) using a Pilatus3 6M hybrid photon counting detector (Dectris), with crystals maintained at 100 K by a Cryojet cryocooler (Oxford Instruments). Diffraction data were integrated and scaled using XDS [12] via the XIA2 expert system [13] and then merged using AIMLESS [14]. A summary of the data processing is presented in Table S7. A template for molecular replacement was prepared with CHAINSAW [15] from the structure of tetrahydroalstonine synthase from C. roseus (PDB accession code 5FI3), with which CrGS shares 57% amino acid sequence identity. The structure was solved by molecular replacement using PHASER [16], giving two copies of the subunit in the asymmetric unit, which formed the homodimeric assembly expected for this class of enzyme. After restrained refinement in REFMAC5 [17] at 2.0 Å resolution, the protein component of the model was completely rebuilt using BUCCANEER [18]. The model was finalized after several iterations of manual editing in COOT [11] and further refinement in REFMAC5 incorporating TLS restraints. The model statistics are reported in Table S8.

Docking simulations
Ligands were docked into the active sites of TiDPAS and CrGS using AutoDock Vina on the Webina webserver with default parameters [19,20]. Coordinates of ligands were generated by PDBQTConvert. When assessing the results, we selected ligand orientations in which the 4-pro-R hydride of NADPH was in close proximity to the carbon being reduced; this orientation was not always the lowest-energy solution. Results were visualised using PyMOL. Cavity pocket size was estimated using CASTp 3.0 with default parameters [21]. Results were visualised using Chimera.
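
The pose selection criterion described above (hydride proximity first, docking score second) could be expressed as in the sketch below. It is not the procedure used in this work, where poses were assessed by inspection: the 4.0 Å cut-off, the `Pose` record and the function name are hypothetical, and coordinates are assumed to have been extracted from the docking output beforehand.

```python
import math
from typing import NamedTuple, Optional, Sequence, Tuple

Coordinate = Tuple[float, float, float]


class Pose(NamedTuple):
    affinity_kcal: float          # docking score; more negative is better
    target_carbon_xyz: Coordinate  # position of the carbon to be reduced


def select_pose(poses: Sequence[Pose],
                hydride_donor_xyz: Coordinate,
                max_distance_A: float = 4.0) -> Optional[Pose]:
    """Best-scoring pose whose target carbon lies near the hydride donor.

    Returns None when no pose satisfies the distance criterion.
    """
    close = [pose for pose in poses
             if math.dist(pose.target_carbon_xyz, hydride_donor_xyz) <= max_distance_A]
    if not close:
        return None
    return min(close, key=lambda pose: pose.affinity_kcal)


if __name__ == "__main__":
    # Hypothetical poses: the top-scoring one sits too far from the cofactor.
    poses = [
        Pose(-9.0, (10.0, 0.0, 0.0)),
        Pose(-8.0, (3.0, 0.0, 0.0)),
        Pose(-7.5, (2.0, 0.0, 0.0)),
    ]
    print(select_pose(poses, (0.0, 0.0, 0.0)))
```

Filtering before ranking reflects the point made in the text that the catalytically sensible orientation is not necessarily the lowest-energy docking solution.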

Phylogenetic analysis
Nucleic acid sequences of ADH genes were aligned using MUSCLE5 [22]. A maximum likelihood phylogenetic tree was constructed using IQ-TREE with a best-fit substitution model, followed by tree reconstruction using 1000 bootstrap alignments; all remaining parameters were left at their default settings [23]. Tree visualisation and figures were made using iTOL version 6.5.2 [24].

Figure S1. MUSCLE amino acid sequence alignment of ADHs highlighting key residues. Catalytic zinc coordinating residues are labelled in green, structural zinc coordinating residues in blue and proton relay residues in orange. Protein names and UniProt accessions: Equus caballus alcohol dehydrogenase (EcADH1), P00327; Saccharomyces cerevisiae alcohol dehydrogenase 1 (ScADH1), P00330; Arabidopsis thaliana cinnamyl alcohol dehydrogenase 5 (AtCAD5), O49482; Catharanthus roseus 8-hydroxygeraniol dehydrogenase (Cr8HGO), Q6V4H0; Catharanthus roseus geissoschizine synthase (CrGS), W8JWW7; Catharanthus roseus tetrahydroalstonine synthase (CrTHAS), A0A0F6SD02; Rauwolfia tetraphylla vomilenine reductase 2 (RtVR2), A0A0U4BHM2; Rauwolfia serpentina vomilenine reductase 2 (RsVR2), A0A0U3S9Q3; Tabernanthe iboga dihydroprecondylocarpine acetate synthase 1 (TiDPAS1), A0A5B8XAH0; Catharanthus roseus dihydroprecondylocarpine acetate synthase (CrDPAS), A0A1B1FHP3; Tabernanthe iboga dihydroprecondylocarpine acetate synthase 2 (TiDPAS2), A0A5B8X8Z0.

Angryline was characterised by NMR in a previous study [6].

NMR data of (-)-vincadifformine in chloroform-d has been previously reported [25,26].

[27]; B. Cinchona pubescens dihydrocorynantheine aldehyde synthase (DCS) [28]; C. Catharanthus roseus tetrahydroalstonine synthase (THAS) [4]; D. Catharanthus roseus tabersonine-3-reductase (T3R) [29]; E. Catharanthus roseus geissoschizine synthase (GS) [5]; F. Catharanthus roseus heteroyohimbine synthase (HYS) [30]; G. Catharanthus roseus and Tabernanthe iboga dihydroprecondylocarpine acetate synthase (DPAS), this study; H. Rauwolfia serpentina vomilenine reductase 2 (VR2) [31]; I. Catharanthus roseus and Tabernanthe iboga dihydroprecondylocarpine acetate synthase (DPAS) [1,2].

Tables
Table S1. Primer sequences used in this study. Cloning overhangs are underlined. Mutated codons are in bold.

Statistics for the highest-resolution shell are shown in parentheses.
a $R_{\mathrm{merge}} = \sum_{hkl} \sum_{i} \left| I_i(hkl) - \langle I(hkl) \rangle \right| / \sum_{hkl} \sum_{i} I_i(hkl)$.
b $R_{\mathrm{meas}} = \sum_{hkl} [N/(N-1)]^{1/2} \sum_{i} \left| I_i(hkl) - \langle I(hkl) \rangle \right| / \sum_{hkl} \sum_{i} I_i(hkl)$, where $I_i(hkl)$ is the ith observation of reflection hkl, $\langle I(hkl) \rangle$ is the weighted average intensity for all observations i of reflection hkl and N is the number of observations of reflection hkl.
c CC½ is the correlation coefficient between symmetry equivalent intensities from random halves of the dataset.
d The data set was split into "working" and "free" sets consisting of 95 and 5% of the data respectively. The free set was not used for refinement.
e The R-factors Rwork and Rfree are calculated as follows: $R = \sum_{hkl} \left| F_{\mathrm{obs}} - F_{\mathrm{calc}} \right| / \sum_{hkl} \left| F_{\mathrm{obs}} \right|$, where $F_{\mathrm{obs}}$ and $F_{\mathrm{calc}}$ are the observed and calculated structure factor amplitudes, respectively.
f As calculated using MolProbity [32].
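
The merging statistics defined in the table footnotes can be illustrated with a short calculation. The sketch below is not how AIMLESS computes them: it uses an unweighted mean intensity in place of the weighted average, skips reflections measured only once, and takes a hypothetical dictionary of observations as input.

```python
import math
from typing import Dict, Sequence, Tuple

Hkl = Tuple[int, int, int]


def merging_r_factors(observations: Dict[Hkl, Sequence[float]]) -> Tuple[float, float]:
    """Return (Rmerge, Rmeas) for intensities grouped by unique reflection.

    Simplifications of this sketch: unweighted means, and reflections with
    a single observation are excluded from both sums.
    """
    merge_numerator = 0.0
    meas_numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean_intensity = sum(intensities) / n
        abs_deviation = sum(abs(i - mean_intensity) for i in intensities)
        merge_numerator += abs_deviation
        # Rmeas rescales each reflection by sqrt(N / (N - 1)) to correct
        # for the dependence of Rmerge on multiplicity.
        meas_numerator += math.sqrt(n / (n - 1)) * abs_deviation
        denominator += sum(intensities)
    return merge_numerator / denominator, meas_numerator / denominator


if __name__ == "__main__":
    # Hypothetical intensities for two reflections.
    data = {(1, 0, 0): [90.0, 110.0], (0, 1, 0): [100.0, 100.0, 100.0]}
    r_merge, r_meas = merging_r_factors(data)
    print(f"Rmerge = {r_merge:.4f}, Rmeas = {r_meas:.4f}")
```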