The effect of 16S rRNA region choice on bacterial community ...

文章推薦指數: 80 %
投票人數:10人

In this work, we compare the resolution of V2-V3 and V3-V4 16S rRNA regions for the purposes of estimating microbial community diversity ... Skiptomaincontent Thankyouforvisitingnature.com.YouareusingabrowserversionwithlimitedsupportforCSS.Toobtain thebestexperience,werecommendyouuseamoreuptodatebrowser(orturnoffcompatibilitymodein InternetExplorer).Inthemeantime,toensurecontinuedsupport,wearedisplayingthesitewithoutstyles andJavaScript. Advertisement nature scientificdata datadescriptors article Theeffectof16SrRNAregionchoiceonbacterialcommunitymetabarcodingresults DownloadPDF DownloadPDF Subjects MetagenomicsWatermicrobiology AnAuthorCorrectiontothisarticlewaspublishedon17March2022 Thisarticlehasbeenupdated AbstractInthiswork,wecomparetheresolutionofV2-V3andV3-V416SrRNAregionsforthepurposesofestimatingmicrobialcommunitydiversityusingpaired-endIlluminaMiSeqreads,andshowthatthefragment,includingV2andV3regions,hashigherresolutionforlower-ranktaxa(generaandspecies).Itallowsforamoreprecisedistance-basedclusteringofreadsintospecies-levelOTUs.Statisticallyconvergentestimatesofthediversityofmajorspecies(definedasthosethattogetherarecoveredby95%ofreads)canbeachievedatthesamplesizesof10000to15000reads.TherelativeerroroftheShannonindexestimateforthisconditionislowerthan4%. DesignType(s) sequenceanalysisobjective•biodiversityassessmentobjective MeasurementType(s) rRNA_16S TechnologyType(s) DNAsequencingassay FactorType(s) aquaticnaturalenvironment•sequence_variant SampleCharacteristic(s) soilmetagenome•freshwatermetagenome•LakeBaikal•lakesediment•oilseep•mudvolcano Machine-accessiblemetadatafiledescribingthereporteddata(ISA-Tabformat) Background&SummaryModernmicrobiomestudiesoftenrelyontheanalysisof16SribosomalRNAsequencesforthetaxonomicidentificationofbacterialandarchaealstrains1.Thiskindofanalysishasbecomeadefactostandardforprokaryotictaxonomy.The16SrRNAgeneisapproximately1600basepairslongandincludesninehypervariableregionsofvaryingconservation(V1-V9)1–3.Moreconservativeregionsareusefulfordeterminingthehigher-rankingtaxa,whereasmorequicklyevolvingonescanhelpidentifygenusorspecies.Metabarcodingusing16SrRNAmarkeriswidespreadinthestudiesofvariousmicrobialcommunities4–6.Theintroductionofthenext-generationsequencingtechniques7–9hasledtonovelapplicationsofmetabarcodingmethods.Inparticular,increasedreadcountshaveallowedforquantitativeestimatesofthemicrobialcommunitycomposition.AnotheradvantageofNGS-basedmetabarcodingisthatquantitativeanalysishasbecomeavailableforcommunitiesofunculturedmicrobes.Yet,usingNGStechnologieshasitslimitations,causedchieflybyshorterreadlength.Itsmostimportantimpactisthedecreasedprecisionofspeciesidentification.Earliermetabarcodingworkswereperformedusing454LifeSciencessequencer10whichproducesreadsupto800 bplong,butthisplatformwasdiscontinuedbyRochein2015.MostcurrentworkisbasedontheIlluminaplatform11,whichproducessingle-endreadsonlyupto350 bpandpaired-endreadsupto2 × 300–350 bp.AsNGSreadsareaboutoneandahalftimesshorterthanSangeronesatbest,theyrequireamuchmorerigorouschoiceofthe16SrRNAregiontopreciselyandcomprehensivelydescribethediversityofabacterialcommunity.Clusteringofashortconservativeregionhasinsufficientresolutiontodetectthefinedifferencesbetweenstrainsthatoccupyslightlydifferentniches.Inotherwords,separatingthesampleintotoomanyverysmallOTUsdoesnotdecreaseanalysisquality,butseparatingitintoOTUsthataretoocrudedoespreciselythatbylumpingseveralspecieswithpotentiallydifferentecologyintoasingleOTU.Therefore,usingamorevariableregioncandetectfinertaxonomicdifferencesbetweencommunities,whichinturncanbeusedtodescribefinerdifferencesintheirfunctioning.ItshouldbealsobementionedthatincreasingreadlengthforIlluminaand454LifeSciencessequencersleadstodecreasingthequality,i.e.,accuracy,ofreadsequences.Basedonallthesefactors,researchersmustdesigntheirtaxonomicexperimentsusingcorrectregionandsequencingcoveragesothatitcouldaproducestatisticallysounddescriptionofboththebacterialspeciespresentinthesampleandtheirrelativenumbers.AseriesofexperimentalworksonartificialmicrobialcommunitiesusingNGSmetabarcodingmethods12–14hasshownthatachoiceof16SrRNAregioncansignificantlyaffecttheestimatesoftaxonomicdiversity.Inparticular,usingdifferentregionsleadstoestimatedproportionsoftaxadifferentfromeachotherandfromtheknowntruecomposition.Oneofthepossibleapproachestosolvingthisproblemistousetheexperienceofstudyingthesamecommunitiesusingdifferent16SrRNAregions.ThemicrobiomeoflakeBaikaliswell-studiedbymicrobiologists.Specifically,metabarcodingworkshavebeenperformedforthecommunitiesofthewatercolumn15–17,bottomsediments18–22,andcommunitiesassociatedwithcertainbaikalianorganisms23,24.Theseworkshaveshownthatthebacterialbiodiversityinthelakeisextremelyhigh.HighbiodiversityinLakeBaikalcanbeexplainedbythepresenceofmultipledistinctnichesvaryingintermsofenvironmentalconditions.OurmainaimwasacomparativeanalysisofthebacterialcommunitiesfromdistinctecotopesinLakeBaikalasrevealedbymetabarcodinganalysisusingV2-V3andV3-V4fragmentsofthe16SrRNAgene.ThetotalDNAwasextractedfrombiologicalsamples,amplifiedindependentlyusingprimerpairsdesignedforV2-V3andV3-V4fragments,andsequenced,afterwhichcommunitycompositionandbacterialdiversitywereestimatedusingbioinformaticsmethods.Weassumedthattheregionthatproducesthehighestdiversitywillbemostusefulforfurthermetabarcodingstudies.AnalysiswasperformedonthecommunitiesinhabitingcontrastingbiotopesinLakeBaikal.Thebottomsedimentisanorganicallyrichsubstrate,whereasthewatercolumnisrelativelypoorinthisregard.MethodsSamplingInthiswork,westudiedthebacterialcommunitiesofLakeBaikal’swatercolumnandbottomsediment.WatercolumnsamplesweretakeninJuly2013near«GorevoyUtes»underwateroilseep(centralbasin,53.3045170°N108.3919330°E,depth855 m)–zoneI(R-6)andnearthe«Bolshoy»mudvolcano(southernbasin,51.877900°N105.550517°E,depth1370 m)–zoneII(R-9).ThesamplesofthebottomsedimentsassociatedwithsurfacemethanehydratesedimentsweretakeninJuly2015atthe«Akademicheskiykhrebet»station(centralbasin,53.399782°N107.891370°E,depth536 m).Allthesamplesweretakenfromaboardthe«AkademikVereschagin»researchvesselusingSBE32CarouselWaterSamplerbathometersystem(USA)andgravitycorer.FulldescriptionandsampleidentifiersarefoundinTable1.Table1Informationontheanalyzedsamples.FullsizetableDNAextractionEightsamplesfromzoneI(takenatdepthsof0,50,100,200,300,500,700,and855 m)andfivesamplesfromzomeII(takenatdepthsof0,50,100,700,and1370 m)wereusedforDNAextraction.Fivelitersofwaterfromeachsamplewerefilteredusingnitrocellulosefilters(25 mmdiameter,0.2micronpores,«Millipore»,Germany)usingasqueezepump.FilterswereplacedinTEbuffer(10 mMTris-HCl,pH7.4;1 mMEDTA,pH8.0),frozenat−20°Candtransportedtothelaboratory.Sedimentsample(100 g)wasasepticallytakenaboardthevesselfromthe150–185 cmcorelayer,homogenized,frozeninliquidnitrogenat−196 °Candtransportedtothelaboratory.DNAextractionfromallthesampleswasperformedbylysozymetreatment.SedimentsampleswerehomogenizedinanagatemortarwithSiCpriorlysis.ForDNAextraction,phenol-chlorophormtechnique25wasusedwithseveralmodifications26.FourindependentDNAextractionswerecarriedoutforeachsample.Inaddition,anegativecontrol(DNAextractionwithsterileTE-buffer)wasperformedforeachindependentDNAextractionstoensurethatnocontaminationwithexogenousamplifiableDNAoccurredduringthedifferentstagesofsampletreatment.ConcentrationandqualityofextractedDNAweremeasuredwithaspectrophotometerSmartSpecPlus(BioRad,USA).DNAwasstoredat−70 °Cuntilfurtheranalysis.AmplificationandsequencingB_V23andPro_V34bacterial16rRNAgenefragmentswereamplifiedusinguniversalprimers(Table2).PhusionHotStartIIHigh-FidelityDNApolymerase(ThermoScientific#F-549S)withHigh-FidelityBufferwasusedfortheamplification.Afteroptimizingthepolymerasechainreaction(PCR)conditions(thermalprofileandMg2+concentration),therequiredminimumofPCRcycleswasadjustedforeveryDNAsample,therebypreventingtheplateaueffectinconcentrationsoftheproducts.Forthispurpose,concentrationofthePCRproductswascontrolledviacapillaryelectrophoresisonaShimadzuMulti-NAinstrument(DNA-12000reagentkit).LibrariesforIlluminaMiSeqanalysiswerepreparedwithNEBNextUltraIIDNALibraryPrepKit(NewEnglandBiolabs).ThelibrarieswereanalysedusingtheIlluminaMiSeqStandardKitv.3(Illumina)attheGenomicsCoreFacility,InstituteofChemicalBiologyandFundamentalMedicine,SiberianBranchoftheRussianAcademyofSciences(Novosibirsk).Alldata16SrRNAfragmentsweredepositedintheNCBISequenceReadArchive,bottomsediment(DataCitation1)andwater(DataCitation2).Table2Lociselectedfortheanalysisandstructureofoligonucleotideprimersfortheiramplification.FullsizetableReadanalysisReadanalysiswasconductedinMothurv.1.34.4softwareaccordingtoMiSeqSOPrecommendations27.R1andR2sequencescorrespondingtoribosomalRNAampliconsweremergedintocontigswiththemothurmerge.contigscommand,andresultingfragmentswerefilteredbyqualityinnon-overlappingregionsashavingnomorethanfivesiteswithaPhred-value <= 15.ScriptsusedtofilterMiSeqdatainaqualitymannerareavailableat:https://github.com/yuragal/mothur-scripts.Filteredsequenceswerealigned,clustered,andidentifiedtaxonomicallyusingtheSILVA123databases(http://arb-silva.de).ReadswereclusteredintoOTUatgeneticdistancesof0.03,whichisatypicalinterspeciesdistancewithinagenus.Theinformationonthespecies-rankOTUabundance(numberofreadsperOTU)wascollatedinasingletable.StatisticcomparisonsofspeciesdiversityStatisticalconvergenceofspeciesdiversityestimateswasmeasuredusingthebootstrapindex28whichshowsthepotentialamountandproportionofundetectedspeciesinthecommunity(underestimatedα-diversity).Species-rankOTUabundancewasusedtocalculateShannon29andSimpsonindices29ofcommunitybiodiversity.Theirconfidenceintervalsandrelativeerrorswerecalculatedusingthebootstrapalgorithmproposedin30.Correlationsbetweensamplesizes,valuesofindices,andrelativeerrorsweremeasuredbynon-parametricSpearmancoefficients31andcorrelationsignificancewastestedusingSpearmanstatistics.Potentialnumberofspeciesincommunities(hiddenα-diversity-hiddenspeciesrichness)wasevaluatedusingChao132andACE33indices.Standarderrorsfortheseindicesweredeterminedbymethods32,33.ThesignificanceofdifferencesbetweenaveragevalueofShannon,Simpson,Chao1andACEindicesidentifiedbyV2-V3andV3-V4fragmentswasestimatedusingpairedmodificationoftheWilkinson-Mann-Whitneynonparametriccriterion34.Qualitativeandquantitativecomparisonofthecommunitycompositionatdifferentlevelsoftaxonomicorganization(phylum,class,order,family)werecarriedoutusingNonmetricMultidimensionalScaling-NMDS35withBray-Curtisdistancemetric36,Gowerdistancemetric37andJaccarddistancemetric36.Beforeanalysis,alldatawerenormalizedbytheaveragenumberofreadspersample.Foranalysis,V2-V3andV3-V4OTUswerecombinedintoonedataset(taxonomyassociation).ThedegreeofdifferencesinthetaxonomiccompositionofcommunitiesbythefactorV2-V3andV3-V4fragmentsforBray-Curtis,GowerandJaccarddistancemetricwasestimatedusingtheR2-squaredcovariationcoefficient(R2=1-ss_w/ss_t,wheress_wandss_tarewithin-groupandtotalsumsofsquares).R2reliabilitywasestimatedusingpermutationtest(1000permutation)38.Estimatesoftaxonrepresentationinsamples(numberofreadspertaxon)werevisualizedintheformofheatmaps,wheretherowsandcolumnswereclusteredusingthe«average»methodbasedonthedistancematrixcalculatedwithBray-Curtis,GowerorJaccarddistancemetrics.StatisticcomparisonsofthegeneticdiversityestimatesTheproportionofmismatchednucleotidesbetweensequences(p-distance)wasusedasageneticdistancemetric.ForeachOTU,thesequencesthathaveaminimalsumofdistancestootherswithinthesameOTUwereselectedasrepresentative.Thepairwisep-distancedistributionsandnucleotidediversity(averagep-distance)estimates39werebuiltforV2-V3andV3-V4representativesequencesamples.Thesignificanceofdifferencesbetweenaveragep-distanceswasestimatedusingoftheWilkinson-Mann-Whitneynonparametriccriterion34.Theclosestfull-length16SrRNAsequencesforeachrepresentativesequencefrombothregionswerefoundintheSILVA123databaseusingmothursoftware.Inthisway,wehavecreatedtwosamplesoffull-lengthrRNAsdescribingOTUsdetectedintheanalysisofV2-V3andV3-V4fragments.Thesetwosampleswerepooledintoasingledatasetforwhichwethenbuiltthep-distancematrixandperformedNMDS35todetectthesimilaritybetweenOTUandspeciessetsdetectedintheindependentanalysesofV2-V3andV3-V4fragments.Statisticalanalyseswereperformedusingthe«ape»40,«pegas»41,«gplots»and«vegan»42Rpackages.CodeavailabilityScriptsusedtofilterMiSeqdatainaqualitymannerareavailableat:https://github.com/yuragal/mothur-scripts.ScriptsforRprogramminglanguageusedforstatisticcomparisonsofspeciesdiversityandstatisticcomparisonsofthegeneticdiversityareavailableat:https://github.com/barnsys/16S_rRNA_bacterial_communities_analysis.Thesescriptswerecreatedonthebasisoftutorialstotheveganpackage43,44.DataRecordsAlldataonV2-V3andV3-V4fragment16SrRNAweredepositedintheNCBISequenceReadArchive,bottomsediment(DataCitation1)andwater(DataCitation2).TechnicalValidationThefiltereddatasetfortheV2-V316SrRNAfragmentincluded118232readswiththeaveragelengthof205basepairs.Itwasclusteredinto2716species-levelOTUs,1070ofwhichincludedmorethanoneread.SingletonOTUsincluded1.4%ofthereads.TheV3-V4datasetincluded22191readswithanaveragelengthof443basepairsclusteredinto1615species-levelOTUs,509ofthemincludingmorethanoneread.SingletonOTUsincluded5.2%ofthereads.Thesenumberssuggestanacceptablequalityofsequencingandinitialdatafilteringstages.SingletonOTUswereexcludedfromfurtheranalyses.Foranalyzingtheconvergenceofα-diversityestimates,theOTUsweresortedfrommosttoleastabundant(i.e.,intheorderofdecreasingreadcounts).Analysisoftheα-diversityconvergenceusingthebootstrapindexhasshownthattheproportionoftheunderestimatedOTUs(thatis,thosethatcouldbepresentinthebiologicalsample,butwentundetectedbecauseofincompletesequencing)amongtop-rankingOTUs,including95%ofreads,doesnotexceed16%,whichisunderthe20%levelconsideredacceptableforbiologicalstudies.FurtheranalyseswereperformedonthispooloftopOTUscovering95%ofreads.Itincluded251OTUsfortheV2-V316SrRNAregionand171OTUsfortheV3-V4region.ThevaluesforShannonbiodiversityindicesinallsampleswithbothregionsrangedfrom1.5to4(Fig.1).Ineightoutof14samples,theseindiceswerehigherwhenestimatedusingtheV2-V3fragment,whileinsixoutof14samples,theV3-V4fragmentproducedhighervalues.Shannonindicesaveraged2.79fortheV2-V3regionand2.72fortheV3-V4region.TestingwiththeWilkinson-Mann-Whitneyindexshowed(Table3)thattheaveragesoftheShannonindicesdidnotdiffersignificantlybetweentheregions.Thus,metabarcodingwitheitherV2-V3orV3-V416SrRNAfragmentsyieldsimilarspeciesdiversityestimates.Figure1:ComparativeanalysisofindicesShannonandSimpson.(a)DependenceoftheShannonindexfromthesamplesizeforV2-V3andV3-V416SrRNAfragments.(b)DependenceoftheShannonrelativeerrorfromthesamplesizeforV2-V3andV3-V416SrRNAfragments.(c)Theper-sampleShannonindices,pointbarsindicate95%CI,redline–averagevalueforV2-V3fragments,blueline–averagevalueforV3-V4fragments.(d)DependenceoftheSimpsonindexfromthesamplesizeforV2-V3andV3-V416SrRNAfragments.(e)DependenceoftheSimpsonrelativeerrorfromthesamplesizeforV2-V3andV3-V416SrRNAfragments.(f)Theper-sampleSimpsonindices,pointbarsindicate95%CI,redline–averagevalueforV2-V3fragments,blueline–averagevalueforV3-V4fragments.V2-V3fragments16SrRNA–redpointer,V3-V4fragments–bluepointer.TheencodingofthesamplesisindicatedinTable1.FullsizeimageTable3Comparisonofcommunitydiversityindices.FullsizetableComparisonofShannonindicesfromdifferentsamples(Fig.1)demonstratedthatthebacterialdiversityinwatercolumnsvarywithdepth.Atsomedepths,thevalueswereclosetotheminimum(1.5),whereasatothers,bacterialcommunitieswerehighlydiverse(4.0).Despitethevarietyinprimarycarbonsources,Shannonindicesinbottomsedimentcommunitieswereclosetotheaverageat3.24and2.7fortheV2-V3andV3-V4regions,respectively.Correlationanalysis(r = 0.098,P-value = 0.61 > 0.05)showsthatthereisnocorrelationbetweenShannonindicesandreadcounts.Itmeansthatreadcountsaresufficientforcharacterizingcommunitydiversityinthestudiedsamplesinupto95%ofthemajorOTUs.Werethecoverageofthecommunityinsufficient,Shannonindiceswouldincreasewiththereadcountbasedonsamplingmoreandrarerspecies,andthustherewouldbeasignificantpositive(r > 0)correlationbetweenthesetwovalues.RelativeerroroftheShannonindexvalues,estimatedfor95%confidenceintervals,didnotexceed12%,whichis,again,undertheacceptablelevelof20%(Fig.1).CorrelationbetweentheShannonindexrelativeerrorandreadcountissignificantandnegative(r = −0.697,P-value = 0.00 <0.05).ThismeansthatincreasingreadcountsleadstoincreasingprecisionoftheShannonindexestimate.Usingsamplesofatleast5000reads,itdidnotexceed8%,whileincreasingthesamplesizeto10000readsfurtherdecreasedittolessthan4%,whichisapositiveresultforcomplexnaturalsamples.Itisimportanttonotethattheconvergentestimatesofspeciesdiversityinspecies-richcommunitiesusuallyrequirehigherreadcounts.Inourwork,anumberofthesampleswithShannonindexvaluesofmorethan3.5(whichispracticallyashighaspossibleforabiologicalsample)hadrelativeerrorsofroughly3%(Fig.1).Thesecommunitieswerecharacterizedusingdatasetsof10000–15000reads.Thus,wecanconcludethatthesereadcountsaresufficientforthemetabarcodinganalysisofthe95%topOTUsofabacterialcommunity,andhighersamplesizesareunnecessary.Inallsamples,theSimpsonbiodiversityindicesrangedfrom0.42to0.94forbothV2-V3andV3-V4regions(Fig.1).Foreightoutof14samples,theseindiceswerehigherforV3-V4fragment,whileinsixoutof14samples,theV2-V3regionproducedhighervalues(meanvalue0.79fortheV2-V3regionand0.82fortheV3-V4region).ShannonbiodiversityindicesrevealadifferentpatterncomparedtoSimpsonindex.HoweverthetestingwiththepairedWilkinson-Mann-Whitneytestdemonstrates(Table3)thattheaveragesoftheSimpsonindicesdidnotdiffersignificantlyfromShannonones.Thus,metabarcodingwitheitherV2-V3orV3-V416SrRNAfragmentsyieldsroughlythesamespeciesdiversityestimatesbothwithSimpsonandShannonindices.ThedependenceoftheSimpsonindexvaluesonthesamplesissimilartothepatternsobservedfortheShannonindex(Fig.1).Correlationanalysis(r = 0.015,p = 0.93 > 0.05)showsnocorrelationbetweenSimpsonindicesandreadcounts.ThesameisobservedfortheShannonindicesandthisisexplainedbythesimilarreasons.RelativeerroroftheSimpsonindexvalues,estimatedfor95%confidenceintervalsdidnotexceed11%(Fig.1).CorrelationbetweentheSimpsonindexrelativeerrorandreadcountissignificantandnegative(r = −0.59,p ~ 105 



請為這篇文章評分?