Supporting Information - PNAS Information Méheust et al. 10.1073/pnas.1517551113 S-Gene Expression...

Supporting Information
Méheust et al. 10.1073/pnas.1517551113

S-Gene Expression Analysis

Given that the analysis using dozens of algal and protist genomic datasets showed that homologs of all of the S genes are expressed, we asked whether some of them might be differentially expressed in response to stress. This question was motivated by two observations: (i) The composite genes entered eukaryote nuclear genomes via primary endosymbiosis, and therefore they may still retain ancient cyanobacterial functions even when fused to novel domains, and (ii) many of the S genes are redox enzymes or encode domains involved in redox regulation, and therefore their roles may involve sensing and/or responding to cellular stress resulting from the oxygen-evolving photosynthetic organelle. To address this issue, we inspected RNA-seq data from organisms that encode particular S genes; these data had either been generated under conditions of cellular stress or spanned the light–dark transition.

For genes present in Chlamydomonas reinhardtii, we used data from ref. 53 that compared triplicate transcript data from this alga grown in standard Tris-acetate-phosphate (TAP) medium (54) or TAP with the addition of 200 mM NaCl. The composite gene families 10, 24, and 18 showed significant differential expression (DE) under salt stress [up-regulation in all three cases (P = 4.14e-4, P = 0.0268, and P = 0.0077, respectively)]. These genes encode bacterial–cyanobacterial domain fusions that are involved in stress responses (i.e., rhodanese domain in 10) and in preventing protein misfolding (i.e., DnaJ/Hsp40 domain at the N terminus of 24). Interestingly, the DnaJ domain is fused to an upstream SCP superfamily region that likely forms an extracellular domain. This novel protein is found only in green algae and plants, and may play a role in responding to salt stress.

For S genes present in the diatom Phaeodactylum tricornutum, we used RNA-seq data from ref. 30 that compared cultures grown under control and nitrogen (N)-depleted conditions. This analysis showed that of the six families present in this alga (2, 1, 28, 20, 35, 61), three are differentially expressed under N stress (28, 35, 61). One of these is family 28, which encodes an N-terminal bacterium-derived, calcium-sensing EF-hand domain fused, intriguingly, to a cyanobacterium-derived region with similarity to the plastid inner-membrane protein import component Tic20, which acts as a translocon channel. This fused protein may use Ca2+ as a signal for protein import, and is found in several stramenopile (brown algal) species. The homolog in P. tricornutum (NCBI gi:219117465) is significantly down-regulated (P = 2.57e-23) under N depletion.

Finally, for S genes present in Arabidopsis thaliana, we used RNA-seq data reflecting light-dependent DE in seedlings, cotyledons, and roots (14). This analysis showed that four distinct S-gene families show DE in plant tissues in the presence of light (14, 19, 1, 18). One of these is family 1, which is broadly distributed in algae and plants and is composed of a cyanobacterium-derived N-terminal hydrolase domain fused to a non-cyanobacterium-derived LPLAT family (lysophospholipid acyltransferases) domain. This gene was significantly up-regulated in all three plant tissues in the presence of light.
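The per-family calls above can be made explicit by tabulating the P values quoted in the text. This is a minimal sketch and not the authors' pipeline: the 0.05 cutoff, and the names `REPORTED`, `ALPHA`, and `significant_families`, are assumptions introduced for illustration, and only P values actually stated above are included.

```python
# P values copied from the text: condition -> {S-gene family: P value}.
REPORTED = {
    "Chlamydomonas reinhardtii (salt stress)": {10: 4.14e-4, 24: 0.0268, 18: 0.0077},
    "Phaeodactylum tricornutum (N depletion)": {28: 2.57e-23},
}

ALPHA = 0.05  # assumed significance cutoff; the source does not state one


def significant_families(p_values, alpha=ALPHA):
    """Return family numbers with P < alpha, ordered from smallest to largest P."""
    hits = [(p, family) for family, p in p_values.items() if p < alpha]
    return [family for _, family in sorted(hits)]


if __name__ == "__main__":
    for condition, p_values in REPORTED.items():
        print(condition, significant_families(p_values))
```

Tightening the assumed cutoff (for example `alpha=0.01`) shows which calls are robust to the choice of threshold; family 24 in C. reinhardtii is the only reported value that lies between 0.01 and 0.05.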


Méheust et al. www.pnas.org/cgi/content/short/1517551113 1 of 5


[Figure S1 image: presence/absence matrix titled "Taxonomic distribution of the 67 S-gene families". Axes: S-gene families (1 to 67) and samples from Transcriptomes + Genome dataset + NCBI RefSeq (349 samples). Taxon legend: Rhodophyta, Cercozoa, Unknown, Cryptophyta, Chlorophyta, Ochrophyta, Dinophyta, Euglenozoa, Glaucophyta, Bacillariophyta, Haptophyta, Streptophyta.]

Fig. S1. Taxonomic distribution of the 67 S genes discovered in our study. The taxonomic distribution of the data is shown with black boxes indicating a presence and white boxes indicating presumed absence in the genome or transcriptome data from the taxon.


Fig. S2. Mapping data for S genes in Picochlorum and A. thaliana. (A) S-gene family 31, which encodes RIBR + DUF1768 (PyrR) and is involved in riboflavin biosynthesis. The CDS of this 1,848-nt S gene from Picochlorum oklahomensis CCMP2329 derives from the MMETSP database. The 6,609 RNA-seq reads mapped to it are from the closely related species Picochlorum SE3 (55). The homologous SE3 gene is encoded on Picochlorum contig 185.g609.t1. (B) S-gene family 23, which encodes a TPR repeat/RING and an ATP-dependent protease domain. This CDS also derives from the MMETSP database and corresponds to the 1,554-nt S gene from P. oklahomensis CCMP2329. The 9,253 RNA-seq reads mapped to it are from Picochlorum SE3. The homologous SE3 gene is encoded on Picochlorum contig 43.g198.t1. (C) S-gene family 14, which encodes GIY-YIG superfamily and thioredoxin superfamily domains. This genic region is from A. thaliana (AtGrxS16), and the 16,906 unique reads mapped to the exons are also from this species [see Table S2 for Sequence Read Archive (SRA) run accession numbers]. Thin blue lines indicate a spliceosomal intron. The mapping in all cases is colored in green, red, and blue for forward, reverse, and paired-end reads, respectively. The fused domains and their putative annotations are shown. These data are typical for all of the Picochlorum and Arabidopsis mappings and for many of the other plant and algal S genes when sufficient RNA-seq data are available (Table S2). This unambiguous, "deep" mapping across the region that spans the domain fusion argues strongly against misassembly of this (and other) S genes.


Fig. S3. Genomic PCRs that targeted S genes identified in Picochlorum species. The contig number in the Picochlorum SE3 assembly is given for each S gene, as is the S-gene family number (see Table S2 for details). The PCR primers were complementary to regions at the 5′ and 3′ termini of the S genes to span the domain-fusion region. The sizes of these S-gene CDS fragments matched the fragment sizes resulting from PCR amplification as follows: S gene 11 (1,015 nt), S gene 23 (1,419 nt), S gene 4 (1,214 nt), S gene 34 (924 nt), and S gene 12 (1,119 nt). Sanger sequencing of these PCR fragments showed identity to the genomic region in Picochlorum SE3, and BLASTX analysis using the fragments showed that each spanned the domain-fusion region in the respective S gene. The match of the CDS size to the genomic region is explained by the paucity of spliceosomal introns in Picochlorum SE3. These data demonstrate that the tested S genes exist as intact fragments in this green alga.

Fig. S4. Maximum-likelihood (RAxML) (57) tree of species encoding S-gene family 31. This composite gene is limited to the Viridiplantae (Fig. S1) and encodes fused RIBR + DUF1768 (PyrR) domains that are involved in riboflavin biosynthesis. This manually trimmed alignment includes a selection of taxonomically diverse green lineage species and is 524 amino acids in length. The intact gene (see also mapping evidence in Fig. S2) was analyzed using the LG + Γ + I evolutionary model; the results of 100 bootstrap replicates are shown at the branches when ≥50%. The topology of this tree is consistent with the expected phylogeny of Viridiplantae (e.g., ref. 56), indicating an ancient origin of this S gene. The NCBI gi numbers are shown after each species name, when available.
