Microbial Community Composition and Diversity via 16S rRNA ...
Abstract

As new sequencing technologies become cheaper and older ones disappear, laboratories switch vendors and platforms. Validating the new setups is a crucial part of conducting rigorous scientific research. Here we report on the reliability and biases of performing bacterial 16S rRNA gene amplicon paired-end sequencing on the MiSeq Illumina platform. We designed a protocol using 50 barcode pairs to run samples in parallel and coded a pipeline to process the data. Sequencing the same sediment sample in 248 replicates as well as 70 samples from alkaline soda lakes, we evaluated the performance of the method with regards to estimates of alpha and beta diversity. Using different purification and DNA quantification procedures we always found up to 5-fold differences in the yield of sequences between individually barcoded samples. Using either a one-step or a two-step PCR preparation resulted in significantly different estimates in both alpha and beta diversity. Comparing with a previous method based on 454 pyrosequencing, we found that our Illumina protocol performed in a similar manner, with the exception of evenness estimates, where correspondence between the methods was low. We further quantified the data loss at every processing step, eventually accumulating to 50% of the raw reads. When evaluating different OTU clustering methods, we observed a stark contrast between the results of QIIME with default settings and the more recent UPARSE algorithm when it comes to the number of OTUs generated. Still, overall trends in alpha and beta diversity corresponded highly using both clustering methods. Our procedure performed well considering the precisions of alpha and beta diversity estimates, with insignificant effects of individual barcodes. Comparative analyses suggest that 454 and Illumina sequence data can be combined if the same PCR protocol and bioinformatic workflows are used for describing patterns in richness, beta-diversity and taxonomic composition.

Citation: Sinclair L, Osman OA, Bertilsson S, Eiler A (2015) Microbial Community Composition and Diversity via 16S rRNA Gene Amplicons: Evaluating the Illumina Platform. PLoS ONE 10(2): e0116955. https://doi.org/10.1371/journal.pone.0116955

Academic Editor: Ludovic Orlando, Natural History Museum of Denmark, University of Copenhagen, DENMARK

Received: September 8, 2014; Accepted: December 17, 2014; Published: February 3, 2015

Copyright: © 2015 Sinclair et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The raw sequencing data for the sediment sample and the soda lakes are available at Genbank accession numbers SRP044363 and SRP044627, respectively.

Funding: The Swedish Foundation for Strategic Research Grant Number “ICA10-0015” (to AE). Carl Tryggers Foundation (to OO). Swedish Research Council (to SB). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: One of the authors (SB) is also an editor at PLOS ONE.

Introduction

As the dropping price of DNA sequencing pushes laboratories and core facilities to switch from one established method to another, it is critical to check the validity and effects of the newly adopted processes. Researchers are often forced by this fast-paced evolution of technology to use the latest machines while abandoning the old ones, all too often neglecting that the correct setup and validation of such methodology is a crucial step in conducting rigorous scientific research. We would like to share our experience and the lessons learned from using the Illumina MiSeq platform as our solution for performing amplicon sequencing, replacing the established 454 pyrosequencing protocol previously used in our laboratory.
Amplicon sequencing, in particular that of the small subunit rRNA gene (16S rRNA gene in Bacteria and Archaea or 18S rRNA gene in Eukarya), is a widely applied approach to study the composition, organization and spatiotemporal patterns of microbial communities, due to its ubiquity across all domains of life [1]. In the last decades, 16S rRNA gene amplicons were analyzed using fingerprinting techniques such as TRFLP [2] and ARISA [3] in combination with clone library construction and Sanger sequencing. However, this often provided insufficient coverage to describe and compare microbial communities [4]. Now, high-throughput sequencing (HTS) technology and the application of barcode indexing are allowing the collection of thousands of sequences from a large number of samples simultaneously [5][6]. These approaches have revealed deeper insights into the diversity of microbial communities [7][8] and, by increasing sample numbers, have expanded the possibilities to study community and population dynamics over much finer temporal [9] and spatial scales [8].

Still, the presence of sequencing errors and base miscalling together with PCR errors, chimera formation and pseudogenes introduce noise, thus biasing estimates of diversity and taxon abundances. These concerns have been elaborated in great detail previously [10][11][12][13][14][15][16], and together, these studies suggest that HTS based surveys require substantial data denoising.

Amongst these HTS technologies, Illumina is currently the state of the art when it comes to 16S rRNA gene amplicons [17][18][19][20][21][22][23]. It surpasses the previous 454 technology most notably by its price difference, coming in at about 0.5 USD/Mb for a MiSeq installation against 31 USD/Mb for a 454 GS Junior installation [24].
Here, we evaluated the methods suggested in these publications and provide our view on best practices for following the dynamics of microbial taxa as well as for estimating alpha and beta diversity. Alpha diversity estimates tested include Chao1 and ACE for richness, and Pielou's, Shannon-Wiener's and Simpson's evenness estimates. Beta diversity estimates were based on weighted UniFrac distances and Bray-Curtis dissimilarities. We describe a procedure starting with PCR amplification of bacterial 16S rRNA genes, followed by paired-end Illumina sequencing and ending with bioinformatic analyses.

Materials and Methods

Collection

A sediment sample was collected from the highly phosphorus-saturated sediments of Lake Vallentunasjön (N 59°29’24” E 18°01’42”) in Sweden between 1–2 cm of depth. The sediment core was incubated at 21°C prior to sampling [25]. In addition, 9 samples from 3 Austrian soda lakes located on the eastern bank of the Neusiedlersee (N 47°44’ W 16°49’) were obtained from the water column during the winter of 2012 through summer of 2013 by capturing cells on 0.2 μm filters. Samples were kept frozen at −80°C. Nucleic acids were extracted using the Powersoil DNA isolation Kit (MO BIO Laboratories Inc, CA, USA). No specific permissions were required for sampling in either location nor were any endangered or protected species involved.

Primer and barcodes design

HPLC-purified primers targeting the V3 and V4 regions of the ribosomal RNA gene originally designed for pyrosequencing [8] were adapted to Illumina sequencing by complementing both Bakt_341F (CCTACGGGNGGCWGCAG) and Bakt_805R (GACTACHVGGGTATCTAATCC) with sample-specific barcodes. This region of the rRNA gene appears optimal for interrogating bacterial communities [26].

The 7 bp long barcodes were designed to avoid homopolymers and so that any two barcodes differ by at least 2 bp. We tested the sequences for self-complementarity using Primer3 (http://primer3.sourceforge.net/) and BLASTN. Sequences with hairpin loops and self-annealing bases were discarded, leaving us with 324 acceptable pairs, of which 50 pairs were randomly selected. In total, 50 barcoded forward primers and 50 barcoded reverse primers were ordered from Eurofins (Germany). A detailed list containing the sequences of the barcodes can be found in S1 Table.
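The two barcode design constraints stated above (no homopolymers, at least 2 bp difference between any two barcodes) can be illustrated with a minimal Python sketch. This is not the authors' design script: the candidate sequences are made up, a homopolymer is assumed to mean any base repeated twice or more in a row, and the greedy selection order is an arbitrary choice. The self-complementarity screening with Primer3 and BLASTN is not reproduced here.

```python
def has_homopolymer(barcode, run_length=2):
    """Return True if any base occurs run_length or more times in a row.

    The threshold of 2 is an assumption; the article only says that
    homopolymers were avoided.
    """
    return any(base * run_length in barcode for base in "ACGT")


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def select_barcodes(candidates, min_distance=2):
    """Greedily keep barcodes that are homopolymer-free and differ from
    every previously kept barcode by at least min_distance positions."""
    kept = []
    for barcode in candidates:
        if has_homopolymer(barcode):
            continue
        if all(hamming(barcode, other) >= min_distance for other in kept):
            kept.append(barcode)
    return kept


# Hypothetical 7 bp candidates, for illustration only.
candidates = ["ACGTACG", "ACGTACT", "AACGTAC", "TGCATGC", "ACGTCAG"]
selected = select_barcodes(candidates)
print(selected)
```

In this toy input, the second candidate is rejected because it differs from the first by a single base, and the third because it contains a repeated base.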
Amplification and barcoding

The variable regions 3 and 4 of the 16S rRNA gene from the single sediment sample were amplified using four different procedures.

i) The starting DNA template material was amplified using non-barcoded PCR primers for 20 cycles (in duplicate and subsequently pooled) followed by a 100 times dilution of the resulting PCR product. Next, in triplicate and for each of the fifty barcode pairs, 1 μl of the diluted PCR product was used for 10 additional cycles of amplification with the respective barcoded primers. This batch is referred to as the “two-step PCR” and included 149 samples in three replicate pools run on an Illumina MiSeq.

ii) In a single replicate, the DNA material was amplified for 25 cycles directly with the barcoded primers, for each of the fifty barcode pairs. This batch is referred to as the “single-step PCR” and included 49 samples in one pool run on an Illumina MiSeq.

iii) In addition, the amplicons from the first pool of the two-step PCR treatment were sequenced a second time in a different run with an updated version of the Illumina chemistry and software. In particular, less random PhiX DNA (5% instead of 50%) was spiked in the sample. This additional dataset is referred to as the “updated chemistry run” and included another 50 samples in one pool run on an Illumina MiSeq.

iv) Finally, the sample was also sequenced on the 454 Life Sciences sequencer with the same primer pair 341F-805R as described in [27]. Sequencing was performed at the SciLifeLab SNP/SEQ sequencing facility at Uppsala University using standard Titanium chemistry.

The soda lakes included nine samples which were sequenced in parallel on two technologies according to (iii) and (iv).

All PCRs were conducted in 20 μl of volume using 1.0 U Phusion high fidelity DNA polymerase (NEB, UK), 0.25 μM primers, 200 μM dNTP mix and 0.4 μg bovine serum albumin. The thermal program consisted of an initial 95°C denaturation step for 5 min, a cycling program of 95°C for 40 seconds, 53°C for 40 seconds, and 72°C for 60 seconds and a final elongation step at 72°C for 7 minutes.
To prepare the 248 replicates of the sediment samples for sequencing, the concentration of PCR amplicons was estimated with the GelPro analyzer program prior to pooling. Three microliters of each PCR mixture were run on a 1% agarose gel (Invitrogen, Life Technologies Europe BV) in 1X Tris-acetate-EDTA (TAE) buffer stained with gel red dye (0.0001%) and visualized under UV transillumination using a Spectronics variable intensity UV source with diffusor plates and a cooled 12-bit CCD camera (Coolsnap Pro, Media Cybernetics, Silver Springs, MD). The concentration of PCR product was estimated by GelPro analyzer 3.1 using 100 bp DNA ladder (Invitrogen) as a molecular weight standard. In the case of the soda lake samples, this quantification was performed using a fluorescent stain-based kit (PicoGreen, Invitrogen) after purification using the Agencourt AMPure XP purification system (Beckman Coulter, Danvers, MA, USA).

Once combined, every pool of samples was purified by Qiagen PCR purification kit (Qiagen, Germany) and quantified using the PicoGreen kit (Invitrogen). This resulted in a final amount of approximately 300–400 ng per pool which was submitted for library preparation.

See S1 Fig. for an outline of the experimental design.

Illumina sequencing

The samples were submitted to the SciLifeLab SNP/SEQ sequencing facility hosted by Uppsala University where the routine TruSeq Sample Preparation Kit V2 protocol (EUC 15026489 Rev C, Illumina) [28] was applied with the exception that the initial fragmentation or size select step was not performed. This involves the binding of the standard sequencing adapters in combination with separate Illumina-specific MID barcodes that enable the combination of different pools on the same sequencing run. This procedure includes an additional PCR totaling 10 more cycles of amplification. As our different barcodes are mixed at this stage, this could potentially cause the formation of chimeric sequences. This protocol also includes the addition of random PhiX DNA to the solution (50%) to provide calibration and help with the cluster generation on the MiSeq's flowcell. As detailed above, less PhiX (5%) was used for some of the samples.
Data processing

The data was preprocessed with version 2.1.13 of the Illumina instrument control software for the two-step PCR pools and for the single-step PCR pool. The updated chemistry samples were preprocessed with version 2.2.0. This step includes the separation of pools according to their MID sequence tag and the computation of a few statistics such as GC distribution and quality score distribution. Further statistics were produced using FastQC [29].

Barcodes demultiplexing. Using our custom-made pipeline, every read-pair produced was parsed and checked for recognizable barcodes on both the forward and reverse sequences. Depending on the outcome of this procedure, read-pairs were classified into one of five categories:

1) Reads with two different recognizable barcodes at the start of both of the sequences in the pair for which these barcodes congruently matched to the same sample.
2) Reads without any identifiable barcodes on either of the sequences in the pair.
3) Reads with a known barcode on only one of the two sequences of the pair.
4) Reads with two recognizable barcodes but for which both barcodes belonged to the same family (i.e. we find two forward barcodes or two reverse barcodes).
5) Reads with two recognizable barcodes but for which the barcodes belonged to different samples (e.g. we find barcode #6 forward paired with barcode #18 reverse).

All categories underwent further processing steps but only the matching barcodes were used in the final results.

Assembly. The Illumina technology used does not produce a single character string for every DNA polony on the MiSeq flowcell, but instead produces two strings of fixed size each starting at one end of the original fragment. Fortunately, the length of the 16S rRNA gene region targeted by our primers is short enough to ensure that both sequence ends have to overlap. To reconstruct the complete nucleotide sequence we used the PANDASeq algorithm [30] at version 2.4. At this step, the overlapping regions were aligned and scored. Alignments that obtained low scores (<0.6) such as those with short alignment length or high proportion of mismatches were discarded, providing a first step of quality control.
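The five demultiplexing categories described above can be sketched in a few lines of Python. This is an illustrative sketch and not the code of the published pipeline: the barcode tables are hypothetical, only exact matches at the very start of each read are recognized, and the category names are invented labels.

```python
# Hypothetical barcode tables mapping a 7 bp barcode to a sample number.
FORWARD = {"ACGTACG": 1, "TGCATGC": 2}
REVERSE = {"GATCGAT": 1, "CTAGCTA": 2}
BARCODE_LENGTH = 7


def lookup(read):
    """Return (barcode set, sample number) for the barcode at the start
    of a read, or None when no known barcode is found there."""
    prefix = read[:BARCODE_LENGTH]
    if prefix in FORWARD:
        return ("forward", FORWARD[prefix])
    if prefix in REVERSE:
        return ("reverse", REVERSE[prefix])
    return None


def classify_pair(read1, read2):
    """Assign a read pair to one of the five categories in the text."""
    first, second = lookup(read1), lookup(read2)
    if first is None and second is None:
        return "no_barcodes"      # category 2
    if first is None or second is None:
        return "one_barcode"      # category 3
    if first[0] == second[0]:
        return "same_set"         # category 4: two forward or two reverse
    if first[1] != second[1]:
        return "mismatch"         # category 5: different samples
    return "match"                # category 1: used in the final results


print(classify_pair("ACGTACGCCTACGGG", "GATCGATGACTAC"))
print(classify_pair("ACGTACGCCTACGGG", "CTAGCTAGACTAC"))
```

Because the lookup checks both tables for each read, the sketch does not assume which read of the pair carries the forward barcode.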
Quality control. To further check for erroneous reads, we searched for the forward primer and reverse primer at the start and end of each read, respectively, and discarded those that did not contain them. Following this, we discarded any sequences containing undetermined base pairs (represented by the letter ‘N’). Furthermore, we scanned every sequence with a sliding window of 10 base pairs and discarded all those that fell below a PHRED score of 5. Finally, we applied a length cutoff and discarded any sequence having an overlap region greater than 100 base pairs.

Chimeras. To check for chimeric sequences amongst the different categories of sequences, the UCHIME algorithm [31] included in the free version 6.0.307 of USEARCH was used. Two variations of the program were run and compared. First, the de novo mode, in which the varying abundances of sequences in the input were exploited. Secondly, we used the reference mode, in which decisions are made using a database of chimera-free sequences. To limit computation time, the de novo algorithm was run on 50’000 randomly sampled sequences, while the reference algorithm was run on 100’000 sequences.

Clustering. An exact clustering algorithm that computes the difference between every pair of sequences scales with the square of the number of input sequences and hence cannot be used on a dataset of this size. Instead, we used the CD-HIT-OTU package [32] and its variant tailored for Illumina reads (version 4.5.5-2011-03-31). Another heuristic, the UCLUST greedy algorithm [33] (included in the free 32-bit version 6.0.307 of USEARCH) as implemented in the QIIME [34] script “pick_otus.py” (v1.8.0) was also tested with default parameters. Thirdly, the latest product from Robert C. Edgar titled UPARSE [16] was applied to our data (included in the free version 7.0.1001 of USEARCH). Although some algorithms such as Swarm [35] and SumaClust [36] are capable of applying exact clustering methods on fairly large datasets, these are not published yet and were not tested in our study.
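The first three quality control filters described above (primer presence, undetermined bases, sliding quality window) can be sketched as follows. This is a hedged illustration rather than the pipeline's actual implementation. It assumes that barcodes have already been trimmed, that the assembled read ends with the reverse complement of Bakt_805R, and that a read is rejected when the mean PHRED score of any 10 bp window drops below 5; the text does not say whether the mean or the minimum is used. The overlap length cutoff is left out.

```python
import re

# IUPAC codes needed for the two primers and their reverse complements.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "W": "[AT]", "H": "[ACT]", "V": "[ACG]", "B": "[CGT]", "D": "[AGT]"}
COMPLEMENT = str.maketrans("ACGTNWHVBD", "TGCANWDBVH")


def to_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


FORWARD_PRIMER = "CCTACGGGNGGCWGCAG"      # Bakt_341F
REVERSE_PRIMER = "GACTACHVGGGTATCTAATCC"  # Bakt_805R
START = re.compile("^" + to_regex(FORWARD_PRIMER))
END = re.compile(to_regex(reverse_complement(REVERSE_PRIMER)) + "$")


def passes_quality_control(seq, quals, window=10, threshold=5):
    """Apply the primer, N-base and sliding-window filters in order."""
    if not START.search(seq) or not END.search(seq):
        return False              # a primer is missing
    if "N" in seq:
        return False              # undetermined base present
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < threshold:
            return False          # low-quality window (mean is an assumption)
    return True


# Toy assembled read: forward primer, filler, reverse-complemented primer.
read = "CCTACGGGAGGCAGCAG" + "ACGT" * 5 + "GGATTAGATACCCTGGTAGTC"
qualities = [30] * len(read)
print(passes_quality_control(read, qualities))
```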
Taxonomic assignment. For every OTU, the representative sequence of the cluster was used as a query against the quality checked SILVAMOD database using the CREST software version 2.0 [37]. This algorithm uses MEGABLAST to quickly search through a hierarchical database of 16S rRNA gene sequences and makes use of a lowest common ancestor strategy to assign each sequence to a particular level of taxonomy in the tree of life. The SILVAMOD database is based on a manual curation of the taxonomical information found in the version 106 of the SILVA SSURef non-redundant release [38]. An exception to this procedure was the creation of Fig. 3 where, unlike for Figs. 1 and 2, sequences were not previously clustered into OTUs, which resulted in much larger quantities of data. In this particular case, the RDP naive Bayesian classifier version 2.2 was used in combination with its own associated taxonomical database [39].

Figure 1. Distribution of barcodes matching and mismatching. To assign an Illumina read to a particular sample, one examines both of the barcodes at each end of the sequence. In green, the two barcodes agree on which sample the read is coming from. In black, no barcodes are found on either end. In yellow, only one barcode is present. In orange, the two barcodes come from the same directional set and should not be found together (e.g. two forward barcodes). In red, the two barcodes each indicate a different sample. Overall, with this setup, about half of the raw data is discarded.
https://doi.org/10.1371/journal.pone.0116955.g001

Figure 2. Distribution of sequence lengths for matching barcodes. Once all the forward and reverse reads from the Illumina sequencer are joined, one can see (in gray here) a pattern in the size of the fragments produced. Superposing in red the abundance of length variation found in the SILVAMOD database, one can see that the variation we uncovered follows closely the natural variation of the V3–V4 region of the 16S rRNA gene. As shown in Fig. 3, each of the three peaks is composed of characteristic phyla.
https://doi.org/10.1371/journal.pone.0116955.g002

Figure 3. Composition of different lengths fractions. Every one of our sediment sample replicates, i.e. the triplicate two-step PCR, the one-step PCR and the updated chemistry, is separated into three size fractions (low, medium, high) according to the peaks identified in Fig. 2. Fragments below 446 bp originate exclusively from archaea. The second peak between 447 and 464 bp contains, for instance, a majority of the Chloroflexi. The last group above 465 bp holds most of the Bacteroidetes. Other taxa such as Proteobacteria or Firmicutes are found spanning a size range in the environmental sample.
https://doi.org/10.1371/journal.pone.0116955.g003

Statistics. Statistical analyses were performed using version 2.0-7 of the VEGAN package [40] and the R statistical framework version 2.11. In particular, these included NMDS ordination plots (metaMDS()), beta-dispersion (betadisper()), PERMANOVA (adonis()), permutational ANOVA (aovp()) and the estimation of diversity indices. The Bray-Curtis distances are calculated with the usual square transformation and Wisconsin standardization using rarefied datasets.

Comparison with 454. The data originating from Roche's pyrosequencing machines included the sediment as well as 9 soda lakes. In order to maintain comparability, the 454 reads were processed identically to the Illumina data (albeit without the assembly step) through all the steps of demultiplexing, primer presence check, exclusion of undetermined bases, quality filtering and removal of primer sequences. For comparison, the 454 data was combined with that of the sediment sample and soda lakes sequenced on the Illumina machine. Once combined, both Illumina and 454 reads were trimmed to 400 bp, clustered using UPARSE and assigned using CREST as described above.
Phylogenetic distance. The UniFrac [41] distance was calculated by aligning the representative sequence of every OTU against a 97% clustered version of the Silva SSURef non-redundant database. This database is distributed by the QIIME group and is based on release 111. The alignment was performed by mothur's [42] v.1.30.2 align() function with the kmer search strategy and Needleman-Wunsch scoring method. Following this, from the multiple sequence alignment produced, a phylogenetic tree is built with the default settings of FastTree [43] v2.1.7. Finally, the tree and the OTU table are fed into the weighted UniFrac implementation of PyCogent [44] v1.5.3 which computes the final distance matrix.

See S2 Fig. for an outline of the data processing. The code produced for the development of this pipeline was written in python and is available at http://github.com/limno/illumitag/ under an MIT license.

The raw sequencing data for the sediment sample and the soda lakes are available at accession numbers SRP044363 and SRP044627, respectively.

Results

General performance

As a test sample that would be sufficiently complex for our tests, we used material from the upper sediment layer of a Swedish lake. We then ran the sample in multiple replicates and tried to measure the reproducibility of a sequencing experiment on the Illumina MiSeq platform. The total number of paired sequences produced from the dummy sediment sample reached 10’338’568. Each pair contained two sequences of 250 nucleotides each. The overall PHRED quality score average was 36 for the forward reads and 33 for the reverse reads. As is expected with this technology, the reverse reads were always tagged with a lower quality than the forward reads, especially in their terminal region (see S3 Fig. for more detailed distributions of the quality scoring).
Applying the demultiplexing procedure to all reads revealed that a large proportion of the sequence pairs did not possess matching forward and reverse barcodes. Some sequences did not bear any recognizable barcode at all (6%). Others had only one barcode (14%) placed on either end. Yet another group had two identifiable barcodes but belonging to different samples (22%), and thus were not expected to be found together (Fig. 1). Within the remaining 58% that had matching barcodes, proportions of reads were unevenly distributed amongst the barcodes, with a relative standard deviation up to 55%. S4, S11, S12 Figs. detail the distributions of barcodes within each pool. Naturally, only the matching barcode sequences were used in our final analysis and the rest were discarded.

Further sequencing runs not presented in this study were performed using different laboratory quantification methods. Instead of quantifying with GelPro, the PicoGreen method was used for every barcode individually. This did not significantly reduce the unevenness of the numbers of sequences per sample (relative standard deviation with PicoGreen 22%, 26%, 88% against 21%, 39%, 49%, 55%, 21% for GelPro).

The assembly procedure, representing the next step in the analytical chain, discarded less than 2% of the matching barcode sequences. In the mismatching barcode group, we found that an equal proportion of mate pairs could not be assembled. However, for reads that had none or only one barcode, the unassembled proportions were over 19%. S6 Fig. details the distributions of assembled sequences within each group of each pool.

Taking only the matching barcodes group, the length of the overlapping region varied between 20 and 70 bp for the majority of the sequences. A few sequences (0.5%) assembled with an overlap greater than 100 bp and were discarded when applying the length cutoff. Fig. 2 shows the distribution of sequence lengths produced, which agreed well with the natural variation in the length of the 16S rRNA gene. Three main size fractions appeared, each containing a characteristic species composition. For instance, the fraction between 430 and 446 bp was composed exclusively of Archaea (Fig. 3).
After assembly, missing primers accounted for an average loss of 4.3% of the remaining sequences. Next, the undetermined base pair filtering eliminated on average 0.2%. Following this, the quality control discarded a further 6.0%. This brings the amount of quality filtered reads from our sediment sample to 5’212’432.

Overall, for the reads with matching barcodes, 2.5% were identified as chimeric when using UCHIME in reference mode while up to 23% were identified as chimeric using UCHIME in de novo mode. Taking the fraction of reads with mismatching barcodes, these numbers became 5.2% and 36.35%, respectively.

OTU Clustering

Using the 248 replicates of the sediment sample and applying the CD-HIT-OTU algorithm set with a cutoff at 97% sequence identity resulted in 12’942 OTUs after excluding chimeric sequences. Running the UPARSE algorithm with a minimum difference of 3% between cluster centers produced 14’107 OTUs. In sharp contrast, using the UCLUST algorithm run via QIIME on the same reads and using a sequence similarity threshold at 97% resulted in 189’391 OTUs, which is 13 times more than using the default settings in UPARSE.

Once the centroid sequence of each UPARSE OTU was annotated against the SILVAMOD database, all sequences pertaining to the phyla of plastids, mitochondria, Thaumarchaeota, Crenarchaeota and Euryarchaeota were removed. This revealed that 2’152 OTUs representing 16% of the reads were identified as being of non-bacterial origin, which was expected considering the characteristics of the primer pair.
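As a rough consistency check on the read filtering reported above, the per-step retention rates can be multiplied together. The sketch below is only an approximation: the percentages are averages quoted in the text, the assembly loss is taken at its stated upper bound of 2%, and the 0.5% overlap length cutoff is assumed to be part of the 6.0% quality control figure. The product is therefore not expected to reproduce the reported 5’212’432 reads exactly.

```python
raw_pairs = 10_338_568  # total read pairs from the sediment sample

# (step, fraction of reads retained), using the averages from the text.
steps = [
    ("matching barcodes", 0.58),
    ("assembly", 1 - 0.02),               # "less than 2%" discarded
    ("primers present", 1 - 0.043),
    ("no undetermined bases", 1 - 0.002),
    ("quality control", 1 - 0.060),
]

remaining = float(raw_pairs)
for name, retained in steps:
    remaining *= retained
    print(f"{name:<24}{remaining:>14,.0f} reads left")

estimated_fraction = remaining / raw_pairs
reported_fraction = 5_212_432 / raw_pairs
print(f"estimated retention: {estimated_fraction:.1%}")
print(f"reported retention:  {reported_fraction:.1%}")
```

Both figures land close to one half, in line with the statement that about 50% of the raw reads are lost.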
Keeping the bacterial OTUs and rarefying the number of reads to that of the lowest sample, we can compare the three clustering methods again. After this rarefaction, the UCLUST method resulted in about twice the number of OTUs when compared to UPARSE and CD-HIT-OTU. Indeed, processing the 248 replicated sediment sample in such a manner resulted in, on average, 1’235 OTUs when using UCLUST and, on average, 605 and 830 OTUs when applying CD-HIT-OTU and UPARSE, respectively. The pattern where UCLUST showed much higher numbers of OTUs than the other two methods was also observed when using a separate group of 70 soda lake samples. Yet, plotting OTU accumulation curves (see S9 Fig.) revealed that CD-HIT-OTU and UPARSE followed expected asymptotic trends while UCLUST behaved atypically, showing a rise in the amount of rare OTUs. Examining the landscape of assignments on the phyla level revealed that patterns in phylum composition were strongly conserved.

Moreover, beta diversity measures based on Bray-Curtis distances (with sequence numbers rarefaction) applied to a collection of 70 soda lake samples showed highly corresponding trends when comparing the three clustering procedures. Additionally, a pair-wise Procrustes test among the three OTU tables resulted in coefficients greater than 0.98 and p-values of less than 0.001. Also, trends in evenness were similar and linear models between all three clustering methods resulted in R2 values greater than 0.96 and p-values of less than 0.001. Slopes of these linear models ranged from 0.92 to 1.08, revealing that UCLUST resulted in the most even OTU table followed by UPARSE, with CD-HIT-OTU providing the most uneven OTU table. Differences in richness estimated from the three OTU tables corresponded well (R2 ranging from 0.74 to 0.92 and p-values of less than 0.001). The slopes revealed that richness estimates were approximately 40 percent higher with UCLUST as compared to the other two clustering methods. The estimated richness was very similar between CD-HIT-OTU and UPARSE with a slope of 1.03 and an intercept of 27.
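Rarefying every sample to the read count of the lowest sample, as done above before comparing the clustering methods, amounts to subsampling reads without replacement. The following Python sketch shows one way to do it with the standard library; the OTU table, the sample names and the fixed seed are hypothetical, and the article's own analysis relied on R and the VEGAN package rather than on this code.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's OTU counts."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    return Counter(rng.sample(pool, depth))


def rarefy_table(table, seed=0):
    """Rarefy every sample to the read count of the smallest sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    depth = min(sum(counts.values()) for counts in table.values())
    return {name: rarefy(counts, depth, rng) for name, counts in table.items()}


# Hypothetical OTU table: sample name -> {OTU identifier: read count}.
table = {
    "sediment_rep1": {"otu1": 50, "otu2": 30, "otu3": 20},
    "sediment_rep2": {"otu1": 10, "otu2": 5},
}
rarefied = rarefy_table(table)
for name, counts in rarefied.items():
    print(name, sum(counts.values()), "reads,", len(counts), "OTUs")
```

Expanding each sample into a list of reads is simple but memory hungry; for tables with millions of reads, a sampler that works directly on the count vector would be a better fit.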
Precision

Choosing the UPARSE clustering method for its speed and simplicity of use, we proceeded to evaluate the precision of the method. The reproducibility of the method and the effects on alpha and beta diversity were determined by comparing results from 248 technical replicates of a single environmental sample run with 50 barcodes in 5 different pools. Permutational MANOVAs on Bray-Curtis dissimilarities revealed significant differences in beta diversity between replicates run in different pools and with different methods. In particular, beta diversity differed significantly between the 1-step and 2-step PCR methods (R2 = 0.028, p-value < 0.001).

Similarly, using a phylogenetic measure such as UniFrac distances, significant pool and method effects were observed. The variances in alpha diversity estimates amongst the 248 replicates and the results from the permutational analysis of variance are given in Table 1. Here, no significant difference in Chao1 estimates among replicates run in different pools or with different methods or barcodes was observed. However, evenness estimates revealed significant pool and method effects. Applying post-hoc tests revealed that in these latter cases the “single-step PCR” procedure was different from the other four pools. When considering only the two-step PCR pools, no significant difference was found.

Table 1. Precision of the Illumina method.
https://doi.org/10.1371/journal.pone.0116955.t001

Comparisons with 454

Next, we compared 454 pyrosequencing and Illumina sequencing strategies. We evaluated the agreement between the two methods by using 10 samples for which both 454 and Illumina data was available. Reads from the two sequencing machines underwent method-specific quality filtering before being pooled and trimmed to a length of 400 bp. After performing OTU clustering using UPARSE, the consistency in alpha and beta diversity as well as the taxonomic composition was determined. Using Procrustes and Mantel tests, a significant correspondence between beta-diversity estimates was revealed when using Bray-Curtis distances (R = 0.995, p < 0.001 and R = 0.954, p < 0.001, respectively). The concordance in beta diversity is also well represented in the dendrogram (Fig. 4) and the NMDS plot (S8 Fig.).

Figure 4. Correspondence of phylogenetic distance between 454 and Illumina. This dendrogram includes 24 replicates of our test sediment sample sequenced on the Illumina platform as well as the same sample sequenced on a 454 machine. In addition, 9 soda lake samples that were also sequenced on both technologies are included. For the soda lakes, the letters correspond to different systems whereas the numbers correspond to different sampling dates. Using the UniFrac metric to compute a distance matrix, a hierarchical clustering is performed following the UPGMA method. The detail of the taxonomic composition can be seen in S7 Fig.
https://doi.org/10.1371/journal.pone.0116955.g004

Accordingly, patterns in phylogenetic composition as determined by UniFrac distances also agreed between the two approaches, as shown by Procrustes and Mantel tests (R = 0.993, p < 0.001 and R = 0.968, p < 0.001, respectively). We also observed matching results for Chao1 and ACE richness estimates, whereas correspondence was rather low for Pielou's evenness, Shannon-Wiener and Simpson's index between the two sequencing approaches (Table 2).

Table 2. Correspondence between 454 and Illumina.
https://doi.org/10.1371/journal.pone.0116955.t002

On a taxonomic level, there was substantial overlap in the detected phyla (S7 Fig.). However, the relative phyla contribution was not well conserved between the two methods. The highest discrepancies were observed in samples with substantial proportions of Cyanobacteria.

Performing a paired Wilcoxon test to identify inconsistent OTU abundances between the methods revealed 18 OTUs with a significant difference (p < 0.05). However, false discovery rates indicate that these discoveries are most likely due to chance.

Discussion

There are several approaches when it comes to amplicon sequencing of the 16S rRNA gene based on the Illumina technology [17][18][19][20][21][22][23]. These have been applied to investigate the microbial diversity in numerous environments to great success, even revealing the dynamics of rare taxa.

Here we introduce our own protocol starting with PCR amplification of bacterial 16S rRNA genes, followed by paired-end Illumina sequencing and ending with bioinformatic analyses. Our experimental Illumina tag sequencing design used barcoded primers flanking the V3-V4 segment of the 16S rRNA gene, a region commonly amplified in pyrotag experiments [8][45][27][46]. To construct a standard Illumina paired-end library with an individual MID, 50 individual samples were amplified, mixed and then used as templates each time. Usually 4–6 of these MID coded libraries were then simultaneously sequenced on an Illumina MiSeq. With the current read length of twice 250 bp, the V3-V4 region of the rRNA gene presents an optimal target for sequencing [26] as it provides an adequate overlap of the forward and reverse paired-end reads. Moreover, assembling these reads increases the quality and confidence in the overlapping region [20][22].

Where did all the sequences go?
Applying our bioinformatic pipeline, over half of the paired-end reads from each Illumina run were subsequently discarded. This was due to either (i) low quality score, (ii) unassembled pairs, (iii) assembled pairs that contained mismatched barcodes, (iv) sequencing errors in one or both of the primer regions, or (v) archaeal or eukaryotic sequences. Other Illumina-based 16S rRNA gene studies have encountered similarly high error rates, resulting in such extreme read filtering (approximately by a factor 2).

Similar to our results, a high incidence of mismatching barcodes [19] has been previously reported as the main loss factor. In this earlier study, over-clustering and sequence chimeras were ruled out, and instead primer contamination during the initial sample amplifications was given as the most likely cause. Laboratory contamination is always a possibility that can explain the mismatches. However, in our case, this is unlikely as the high proportion of barcode mismatches remained unchanged in every experiment conducted for this study as well as in all the following sequencing runs that are not presented here. Indeed, multiple individuals have reproduced the protocol presented in this article and all obtained similar results with regards to sequence loss.

A technical issue within the Illumina machine could be another source, such as erroneous identification of the DNA clusters on the flowcell by the imaging software. Comparing matching and mismatching barcodes showed that assembly performed consistently in both cases, permitting us to refute the hypothesis that the mismatching barcodes are due to the paired-end sequencing. Furthermore, reports of such problems are not prevalent and the metrics generated show that at least 90% of the clusters passed the flowcell filtering algorithm for the first four pools and at least 85% for the last pool.
Our present interpretation and argument is that mismatched barcode sequences are most likely produced in the library preparation. Our experimental design amplifies each sample with its corresponding barcodes separately. However, there is a supplementary PCR performed by the sequencing facility, occurring after all samples are pooled together and adaptors are added. We hypothesize that this causes the chimeras, which is supported by the chimera detection results. Indeed, both algorithms identified that the mismatching barcodes group had a slightly larger proportion of chimeras than the group with matching barcodes. Yet, the increased proportion of chimeras in the mismatching barcodes group is rather low and is not sufficient to entirely explain the proportion of mismatches. Most likely, chimeric sequences are formed during the Illumina library preparation by highly similar amplicons that originate from differently barcoded samples. It is also worth mentioning that, as a proofreading polymerase is used in the library preparation, there is a risk of unstable amplification. Such polymerases can fall off from time to time, creating partial amplicons which will be used in “false” priming to produce chimeras. The chimeras formed through such a process will more or less resemble the original amplicon and thus will go unnoticed by the chimera detection algorithms but will have a mismatching barcode. Still, the proportions of chimeric sequences varied greatly depending on the algorithm used, which casts doubt on the specificity of the chimera detection algorithms.

The Earth microbiome project circumvents this problem by using a primer construct including standard Illumina handles, index barcodes, Illumina adapters and 16S rRNA gene specific sequence. Although PCR amplification and library preparation are combined, the purchase of forty long primers to multiplex 400 samples requires a major investment and is not suitable for small labs processing fewer than 1000 samples. A possible solution could be provided by using a primer construct including Illumina adapters and 16S rRNA gene specific primers in the first step PCR, combined with the attachment of standard Illumina handles and index barcodes in the second step PCR [47]. This represents the next generation procedure, already applied in our lab.
A secondary issue is the unevenness of the read coverage produced per sample. If every barcode represented a unique environmental sample, unlike in our current evaluation experiment, one would typically prefer the quantity of data produced for each sample to be equal. This is for example the case if one wants to compute statistical measurements that are sensitive to sample size, where one is forced to rarefy the read counts to the lowest sequence group or exclude samples with low numbers of sequences. Sources of the unevenness are pipetting errors and uncertainties in quantification, for which we have no solution as the different procedures all performed equally “unsatisfactory”.
Another interesting observation following our study is that the length distribution agrees with the natural variation in the lengths of the 16S rRNA gene. Such length polymorphisms affect quantification, as shorter reads are known to be preferentially amplified and sequenced, which also suggests that observations of multiple bands on electrophoresis gels of bacterial community 16S rRNA gene amplicons are not an artifact of PCR.
OTU making and diversity estimates biases
We observed major differences in absolute richness estimates, and minor differences in evenness, which can be explained by the heuristics of the clustering algorithms. Thus, alpha diversity estimates based on OTU tables obtained by different clustering methods should not be compared without correcting for general discrepancies. Such corrections can be performed using linear models as obtained in our study. The applicability of such linear models is supported by our set of 9 soda lake samples, but the universality of the models requires further evaluation.
Besides the differences in absolute estimates, the high R² values and the p-values of the linear regression analyses reveal that general trends in alpha and beta diversity were conserved and corresponded well regardless of the clustering algorithm. While the clustering methods tested revealed conserved and corresponding trends, we use UPARSE because this algorithm requires the lowest CPU allocation.
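The idea of correcting richness estimates between clustering methods with a linear model can be sketched as follows. This is a minimal ordinary least-squares fit in plain Python under stated assumptions: the richness values are invented for illustration and are not results from this study, and the function and variable names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical richness estimates for the same four samples,
# obtained with two different clustering methods (not data from the study).
richness_method_a = [1000, 1500, 2000, 2500]
richness_method_b = [450, 650, 850, 1050]

slope, intercept, r_squared = fit_line(richness_method_a, richness_method_b)


def convert_a_to_b(value):
    """Map a method-A richness estimate onto the method-B scale."""
    return slope * value + intercept
```

A high coefficient of determination in such a fit indicates that the two methods rank the samples consistently even when their absolute OTU counts differ, which is the sense in which trends are said to be conserved.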
Taking a closer look at the alpha diversity in the 248 replicates, relative standard deviation never exceeded 5% among replicates, while average estimates of richness determined from all replicates were approximately 3.6 times lower than the numbers of OTUs detected in all replicates. This underestimation of richness as well as the variations among replicates are most likely due to sampling artifacts associated with random sampling [48], as well as the performance of the technology per se. Many steps in the Illumina tag sequencing procedure are associated with random sampling, including PCR amplification of target genes, ligation of amplified PCR products to sequencing adaptors, amplification of ligation products and immobilization to flow cells, as well as identification of the DNA clusters on the flow cell by the imaging software. One way to improve reproducibility and quantitation is to use biological replicates that also cover upstream procedures such as environmental sampling and DNA extraction.
Can 454 and Illumina data be combined?
Picking 24 Illumina sequenced replicates of our sediment sample and its corresponding 454 run, as well as the nine soda lake samples which were run with both Illumina and 454, we obtained highly corresponding trends in richness and beta diversity but not in evenness. A possible explanation for the lower evenness produced by the Illumina technology could be the presence of the additional PCR step in the library preparation.
Computing the UniFrac metric on all the samples, we note that the average distance amongst samples increases as one moves from a series of replicates using the same technology (mean distance amongst Illumina sediment replicates 0.404) to a replication of the same sample with two different technologies (mean distance between 1x 454 sediment and 24x Illumina sediment 0.499) to a group of samples taken from similar environments (mean distance between all soda lakes 0.686) and finally to the comparison of two totally different environments (mean distance between Illumina sediment and Illumina soda lakes 0.812). These trends hold up when using the Bray-Curtis distance metric.
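The comparison of mean distances within and between groups of samples can be illustrated with the Bray-Curtis dissimilarity, which needs no phylogenetic tree. The sketch below is a plain Python illustration under stated assumptions: the OTU count vectors are invented, the function names are hypothetical, and in practice counts would usually be rarefied or otherwise normalized first.

```python
from itertools import combinations, product


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two OTU count vectors.

    Both vectors must list counts for the same OTUs in the same order.
    """
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator


def mean_within(samples):
    """Mean pairwise dissimilarity among the samples of one group."""
    pairs = list(combinations(samples, 2))
    return sum(bray_curtis(a, b) for a, b in pairs) / len(pairs)


def mean_between(group_1, group_2):
    """Mean dissimilarity over all pairs drawn from two different groups."""
    pairs = list(product(group_1, group_2))
    return sum(bray_curtis(a, b) for a, b in pairs) / len(pairs)


# Hypothetical count vectors over three OTUs (not data from the study).
sediment_replicates = [[10, 10, 0], [12, 8, 0], [9, 11, 0]]
other_environment = [[0, 2, 18], [1, 0, 19]]

within = mean_within(sediment_replicates)
between = mean_between(sediment_replicates, other_environment)
```

With such toy data, replicates of one sample yield a much smaller mean dissimilarity than comparisons across environments, mirroring the ordering of mean distances reported above.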
Comparing 454 and Illumina by using the same PCR primers and bioinformatic analyses resulted in corresponding trends in richness and beta diversity. Nonetheless, taxonomic composition was proportionally biased, which is also reflected in the non-corresponding patterns in evenness.
Conclusion
The main conclusions of this report are: (i) Mismatching barcodes between forward and reverse reads, indicative of chimeras, are most likely introduced in the amplification step of the library preparation. Thus, reverse and forward primers need to be complemented with unique barcodes to avoid misassignment of reads to samples when employing the standard TruSeq library preparation protocol. (ii) Although different clustering algorithms result in different numbers of OTUs, trends in alpha and beta diversity are conserved. (iii) For those switching sequencing technologies, 454 and Illumina sequence data can be combined provided the following conditions are respected: the same PCR primers and bioinformatic workflows must be applied, and the variations between the methods must be quantified and accounted for in the interpretation of the results.
Supporting Information
S1 Fig. Experiment outline. This figure details the outline of the experimental design. The extra fifth pool where the updated protocol was used is not represented. Other samples run with picogreen individual quantification are not shown either. The 454 sample is not shown.
https://doi.org/10.1371/journal.pone.0116955.s001 (PDF)
S2 Fig. Pipeline outline. This figure summarizes as a flowchart the bioinformatics pipeline that was applied to the data. The code was written in Python and is available at http://github.com/limno/illumitag/ under an MIT license.
https://doi.org/10.1371/journal.pone.0116955.s002 (PDF)
S3 Fig. Quality distribution. Quality distribution plots for each of the five Illumina pools of sediment data as generated by FastQC and based on the raw FASTQ files. Forward and reverse reads are shown separately.
https://doi.org/10.1371/journal.pone.0116955.s003 (PDF)
S4 Fig. Barcode distribution. Distribution of sequences that have matching barcodes for every barcode identifier in each of the five Illumina pools of sediment data.
https://doi.org/10.1371/journal.pone.0116955.s004 (PDF)
S5 Fig. Match and mismatch regression. This figure shows the relative proportions of mismatched barcode pairs for every pool. For every one of 100 barcodes, we compute a comparison with every one of the 100 barcodes, excluding the two cases where the two barcodes are correctly matching (e.g. barcode2F-barcode2R) and where the two barcodes are the same (e.g. barcode16R-barcode16R). This amounts to a total of 9800 points on the graph. Note that the pair “barcode1R-barcode15F” will not produce the same point as the pair “barcode15F-barcode1R”. Hence, for every distinct sequence of two mismatching barcodes (e.g. barcode1R-barcode15F) the position on the X axis is calculated by dividing the number of correct matches containing the first barcode (e.g. barcode1R) by the number of all correct matches. The position on the Y axis is computed by dividing the number of reads containing these two particular barcodes in this specific order by the number of mismatches involving the second barcode (e.g. barcode15F). Every point deviating significantly from the regression has its label drawn and is potentially suspicious.
https://doi.org/10.1371/journal.pone.0116955.s005 (PDF)
S6 Fig. Assembled against unassembled. Proportions of assembled and unassembled sequence pairs. Reads that are marked as “Low quality” did assemble, but the assembly was not sufficiently good and fell below the threshold set at 0.6. Sequences that are marked as “Unassembled” could not be assembled at all and no overlap could be determined.
https://doi.org/10.1371/journal.pone.0116955.s006 (PDF)
S7 Fig. Taxa relative abundance comparing 454 and Illumina samples. Represented here are 24 replicates of our test sediment sample sequenced on the Illumina platform (“SedimentILL”) as well as the same sample sequenced on a 454 machine (“Sediment454”). In addition, 9 soda lake samples were independently sequenced on Illumina and 454. The relative abundances of species per sample, classified at the phylum level, are shown in a bar stack.
https://doi.org/10.1371/journal.pone.0116955.s007 (PDF)
S8 Fig. NMDS comparing 454 and Illumina samples. Represented here are 24 replicates of our test sediment sample sequenced on the Illumina platform (red points without labels) as well as the same sample sequenced on a 454 machine (blue point labeled “Sediment454”). In addition, 9 soda lakes were independently sequenced on Illumina and 454 (green labels “Soda454” and “SodaILL”). The UniFrac metric is used to compute the distance matrix from which the NMDS is calculated.
https://doi.org/10.1371/journal.pone.0116955.s008 (PDF)
S9 Fig. OTU accumulation curves. OTU accumulation curves for each of the three OTU clustering methods. To draw these graphs, the OTUs appearing in at least X% of samples are considered and all reads pertaining to these OTUs are summed up to one number. X is varied from 0% to 100% on the horizontal axis and the vertical axis represents the sum computed.
https://doi.org/10.1371/journal.pone.0116955.s009 (PDF)
S10 Fig. Regressions for comparing 454 and Illumina samples. A comparison is made between the diversity measures obtained for each sample that was sequenced on both Illumina and 454 platforms. Each line shows the fit of a linear model to one of five different alpha-diversity estimates (A to E). Each dot represents one of the seven samples with more than 4391 reads (others not included).
https://doi.org/10.1371/journal.pone.0116955.s010 (PDF)
S11 Fig. Production run 2. Statistics on the second Illumina run, where the method is further used and put in production for various other projects.
https://doi.org/10.1371/journal.pone.0116955.s011 (PDF)
S12 Fig. Production run 3. Statistics on the third Illumina run, where the method is further used and put in production for various other projects.
https://doi.org/10.1371/journal.pone.0116955.s012 (PDF)
S1 Table. The 50 barcode pairs. The full list of 50 barcodes used along with their nucleotide sequence.
https://doi.org/10.1371/journal.pone.0116955.s013 (PDF)
Acknowledgments
The computations and bioinformatics were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under Project “b2011032”. We also would like to acknowledge the support for the 454 sequencing made by the SciLifeLab SNP/SEQ facility hosted by Uppsala University. Finally, we want to thank Alexander KT Kirschner and Stefan Jakwerth for their assistance with sampling of the soda lakes in Austria, as well as Sainur Samad for his assistance with DNA extraction and processing in the laboratory.
Author Contributions
Conceived and designed the experiments: LS SB AE. Performed the experiments: LS OO. Analyzed the data: LS AE. Contributed reagents/materials/analysis tools: OO SB AE. Wrote the paper: LS SB AE. Designed the software used in analysis: LS.
References
1. Olsen GJ, Lane DJ, Giovannoni SJ, Pace NR, Stahl DA (1986) Microbial ecology and evolution: a ribosomal RNA approach. Annual review of microbiology 40: 337–365. pmid:2430518
2. Liu WT, Marsh TL, Cheng H, Forney LJ (1997) Characterization of microbial diversity by determining terminal restriction fragment length polymorphisms of genes encoding 16S rRNA. Applied and Environmental Microbiology 63: 4516–4522. pmid:9361437
3. Fisher MM, Triplett EW (1999) Automated Approach for Ribosomal Intergenic Spacer Analysis of Microbial Diversity and Its Application to Freshwater Bacterial Communities. Applied and Environmental Microbiology 65: 4630–4636. pmid:10508099
4. Curtis TP, Head IM, Lunn M, Woodcock S, Schloss PD, et al. (2006) What is the extent of prokaryotic diversity? Philosophical Transactions of the Royal Society B: Biological Sciences 361: 2023–2037. pmid:17028084
5. Andersson AF, Lindberg M, Jakobsson H, et al. (2008) Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS ONE 3: e2836. pmid:18665274
6. Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R (2008) Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature Methods 5: 235–237. pmid:18264105
7. Sogin ML, Morrison HG, Huber JA, Mark Welch D, Huse SM, et al. (2006) Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proceedings of the National Academy of Sciences of the United States of America 103: 12115–12120. pmid:16880384
8. Herlemann DP, Labrenz M, Jürgens K, Bertilsson S, Waniek JJ, et al. (2011) Transitions in bacterial communities along the 2000 km salinity gradient of the Baltic Sea. The ISME Journal 5: 1571–1579. pmid:21472016
9. Eiler A, Heinrich F, Bertilsson S (2011) Coherent dynamics and association networks among lake bacterioplankton taxa. The ISME Journal 6: 330–342. pmid:21881616
10. Quince C, Lanzén A, Curtis TP, Davenport RJ, Hall N, et al. (2009) Accurate determination of microbial diversity from 454 pyrosequencing data. Nature Methods 6: 639–641. pmid:19668203
11. Reeder J, Knight R (2009) The ‘rare biosphere’: a reality check. Nature Methods 6: 636–637. pmid:19718016
12. Galand PE, Casamayor EO, Kirchman DL, Lovejoy C (2009) Ecology of the rare microbial biosphere of the Arctic Ocean. Proceedings of the National Academy of Sciences of the United States of America 106: 22427–22432. pmid:20018741
13. Huse SM, Welch DM, Morrison HG, Sogin ML (2010) Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environmental Microbiology 12: 1889–1898. pmid:20236171
14. Engelbrektson A, Kunin V, Wrighton KC, Zvenigorodsky N, Chen F, et al. (2010) Experimental factors affecting PCR-based estimates of microbial species richness and evenness. The ISME Journal 4: 642–647. pmid:20090784
15. Tedersoo L, Nilsson RH, Abarenkov K, Jairus T, Sadam A, et al. (2010) 454 Pyrosequencing and Sanger sequencing of tropical mycorrhizal fungi provide similar results but reveal substantial methodological biases. The New phytologist 188: 291–301. pmid:20636324
16. Edgar RC (2013) UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nature Methods 10: 996–998. pmid:23955772
17. Claesson MJ, Wang Q, O’Sullivan O, Greene-Diniz R, Cole JR, et al. (2010) Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions. Nucleic Acids Research 38: e200–e200. pmid:20880993
18. Bartram AK, Lynch MDJ, Stearns JC, Moreno-Hagelsieb G, Neufeld JD (2011) Generation of multimillion-sequence 16S rRNA gene libraries from complex microbial communities by assembling paired-end illumina reads. Applied and Environmental Microbiology 77: 3846–3852. pmid:21460107
19. Degnan PH, Ochman H (2011) Illumina-based analysis of microbial community diversity. The ISME Journal 6: 183–194. pmid:21677692
20. Gloor GB, Hummelen R, Macklaim JM, Dickson RJ, Fernandes AD, et al. (2010) Microbiome profiling by illumina sequencing of combinatorial sequence-tagged PCR products. PLoS ONE 5: e15406. pmid:21048977
21. Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, et al. (2012) Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. The ISME Journal 6: 1621–1624. pmid:22402401
22. Zhou HW, Li DF, Tam NFY, Jiang XT, Zhang H, et al. (2010) BIPES, a cost-effective high-throughput method for assessing microbial diversity. The ISME Journal 5: 741–749. pmid:20962877
23. Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD (2013) Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Applied and Environmental Microbiology 79: 5112–5120. pmid:23793624
24. Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, et al. (2012) Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotechnology 30: 434–439. pmid:22522955
25. Osman OA, Gudasz C, Bertilsson S (2014) Diversity and abundance of aromatic catabolic genes in lake sediments in response to temperature change. FEMS Microbiology Ecology 88: 468–481. pmid:24597511
26. Mizrahi-Man O, Davenport ER, Gilad Y (2013) Taxonomic Classification of Bacterial 16S rRNA Genes Using Short Sequencing Reads: Evaluation of Effective Study Designs. PLoS ONE 8: e53608. pmid:23308262
27. Eiler A, Drakare S, Bertilsson S, Pernthaler J, Peura S, et al. (2013) Unveiling Distribution Patterns of Freshwater Phytoplankton by a Next Generation Sequencing Based Approach. PLoS ONE 8: e53516. pmid:23349714
28. Illumina I (2013) TruSeq DNA Sample Preparation Guide. Available: http://support.illumina.com/sequencing/sequencing_kits/truseq_rna_sample_prep_kit_v2.ilmn.
29. Andrews S (2012) FastQC: A Quality Control tool for High Throughput Sequence Data. Available: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
30. Masella AP, Bartram AK, Truszkowski JM, Brown DG, Neufeld JD (2012) PANDAseq: PAired-eND Assembler for Illumina sequences. BMC Bioinformatics 13: 31. pmid:22333067
31. Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27: 2194–2200. pmid:21700674
32. Li W, Fu L, Niu B, Wu S, Wooley J (2012) Ultrafast clustering algorithms for metagenomic sequence analysis. Briefings in Bioinformatics 13: 656–668. pmid:22772836
33. Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460–2461. pmid:20709691
34. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, et al. (2010) QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7: 335–336. pmid:20383131
35. Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M (2014) Swarm: robust and fast clustering method for amplicon-based studies. PeerJ PrePrints: e386v1.
36. Mercier C, Boyer F, Bonin A, Coissac E (2013) SUMATRA and SUMACLUST: fast and exact comparison and clustering of sequences. Available: http://metabarcoding.org/sumatra.
37. Lanzén A, Jørgensen SL, Huson DH, Gorfer M, Grindhaug SH, et al. (2012) CREST – Classification Resources for Environmental Sequence Tags. PLoS ONE 7: e49334. pmid:23145153
38. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, et al. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research 41: D590–6. pmid:23193283
39. Wang Q, Garrity GM, Tiedje JM, Cole JR (2007) Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Applied and Environmental Microbiology 73: 5261–5267. pmid:17586664
40. Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, et al. (2013) vegan: Community Ecology Package. Available: http://CRAN.R-project.org/package=vegan. R package version 2.0–7.
41. Lozupone C, Knight R (2005) UniFrac: a new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology 71: 8228–8235. pmid:16332807
42. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, et al. (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 75: 7537–7541. pmid:19801464
43. Price MN, Dehal PS, Arkin AP (2010) FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE 5: e9490. pmid:20224823
44. Knight R, Maxwell P, Birmingham A, Carnes J, Caporaso JG, et al. (2007) PyCogent: a toolkit for making sense from sequence. Genome Biology 8: R171. pmid:17708774
45. Peura S, Eiler A, Bertilsson S, Nykänen H, Tiirola M, et al. (2012) Distinct and diverse anaerobic bacterial communities in boreal lakes dominated by candidate division OD1. The ISME Journal: 1–13.
46. Andersson AF, Riemann L, Bertilsson S (2009) Pyrosequencing reveals contrasting seasonal dynamics of taxa within Baltic Sea bacterioplankton communities. The ISME Journal 4: 171–181. pmid:19829318
47. Hugerth LW, Muller EEL, Hu YOO, Lebrun LAM, Roume H, et al. (2014) Systematic Design of 18S rRNA Gene Primers for Determining Eukaryotic Diversity in Microbial Consortia. PLoS ONE 9: e95567. pmid:24755918
48. Zhou J, Kang S, Schadt CW, Garten CT (2008) Spatial scaling of functional gene diversity across various microbial taxa. PNAS 105: 7768–7773. pmid:18509054