Sensitivity and correlation of hypervariable regions in 16s rrna genes in phylogenetic analysis

16S Metagenomics

V4 region

The effect of 16S rRNA region choice on bacterial community ...

2024-11-06

文章推薦指數： 80 %

投票人數：10人

In this work, we compare the resolution of V2-V3 and V3-V4 16S rRNA regions for the purposes of estimating microbial community diversity ... Skiptomaincontent Thankyouforvisitingnature.com.YouareusingabrowserversionwithlimitedsupportforCSS.Toobtain thebestexperience,werecommendyouuseamoreuptodatebrowser(orturnoffcompatibilitymodein InternetExplorer).Inthemeantime,toensurecontinuedsupport,wearedisplayingthesitewithoutstyles andJavaScript. Advertisement nature scientificdata datadescriptors article Theeffectof16SrRNAregionchoiceonbacterialcommunitymetabarcodingresults DownloadPDF DownloadPDF Subjects MetagenomicsWatermicrobiology AnAuthorCorrectiontothisarticlewaspublishedon17March2022 Thisarticlehasbeenupdated AbstractInthiswork,wecomparetheresolutionofV2-V3andV3-V416SrRNAregionsforthepurposesofestimatingmicrobialcommunitydiversityusingpaired-endIlluminaMiSeqreads,andshowthatthefragment,includingV2andV3regions,hashigherresolutionforlower-ranktaxa(generaandspecies).Itallowsforamoreprecisedistance-basedclusteringofreadsintospecies-levelOTUs.Statisticallyconvergentestimatesofthediversityofmajorspecies(definedasthosethattogetherarecoveredby95%ofreads)canbeachievedatthesamplesizesof10000to15000reads.TherelativeerroroftheShannonindexestimateforthisconditionislowerthan4%. DesignType(s) sequenceanalysisobjective•biodiversityassessmentobjective MeasurementType(s) rRNA_16S TechnologyType(s) DNAsequencingassay FactorType(s) aquaticnaturalenvironment•sequence_variant SampleCharacteristic(s) soilmetagenome•freshwatermetagenome•LakeBaikal•lakesediment•oilseep•mudvolcano Machine-accessiblemetadatafiledescribingthereporteddata(ISA-Tabformat) Background&SummaryModernmicrobiomestudiesoftenrelyontheanalysisof16SribosomalRNAsequencesforthetaxonomicidentificationofbacterialandarchaealstrains1.Thiskindofanalysishasbecomeadefactostandardforprokaryotictaxonomy.The16SrRNAgeneisapproximately1600basepairslongandincludesninehypervariableregionsofvaryingconservation(V1-V9)1–3.Moreconservativeregionsareusefulfordeterminingthehigher-rankingtaxa,whereasmorequicklyevolvingonescanhelpidentifygenusorspecies.Metabarcodingusing16SrRNAmarkeriswidespreadinthestudiesofvariousmicrobialcommunities4–6.Theintroductionofthenext-generationsequencingtechniques7–9hasledtonovelapplicationsofmetabarcodingmethods.Inparticular,increasedreadcountshaveallowedforquantitativeestimatesofthemicrobialcommunitycomposition.AnotheradvantageofNGS-basedmetabarcodingisthatquantitativeanalysishasbecomeavailableforcommunitiesofunculturedmicrobes.Yet,usingNGStechnologieshasitslimitations,causedchieflybyshorterreadlength.Itsmostimportantimpactisthedecreasedprecisionofspeciesidentification.Earliermetabarcodingworkswereperformedusing454LifeSciencessequencer10whichproducesreadsupto800 bplong,butthisplatformwasdiscontinuedbyRochein2015.MostcurrentworkisbasedontheIlluminaplatform11,whichproducessingle-endreadsonlyupto350 bpandpaired-endreadsupto2 × 300–350 bp.AsNGSreadsareaboutoneandahalftimesshorterthanSangeronesatbest,theyrequireamuchmorerigorouschoiceofthe16SrRNAregiontopreciselyandcomprehensivelydescribethediversityofabacterialcommunity.Clusteringofashortconservativeregionhasinsufficientresolutiontodetectthefinedifferencesbetweenstrainsthatoccupyslightlydifferentniches.Inotherwords,separatingthesampleintotoomanyverysmallOTUsdoesnotdecreaseanalysisquality,butseparatingitintoOTUsthataretoocrudedoespreciselythatbylumpingseveralspecieswithpotentiallydifferentecologyintoasingleOTU.Therefore,usingamorevariableregioncandetectfinertaxonomicdifferencesbetweencommunities,whichinturncanbeusedtodescribefinerdifferencesintheirfunctioning.ItshouldbealsobementionedthatincreasingreadlengthforIlluminaand454LifeSciencessequencersleadstodecreasingthequality,i.e.,accuracy,ofreadsequences.Basedonallthesefactors,researchersmustdesigntheirtaxonomicexperimentsusingcorrectregionandsequencingcoveragesothatitcouldaproducestatisticallysounddescriptionofboththebacterialspeciespresentinthesampleandtheirrelativenumbers.AseriesofexperimentalworksonartificialmicrobialcommunitiesusingNGSmetabarcodingmethods12–14hasshownthatachoiceof16SrRNAregioncansignificantlyaffecttheestimatesoftaxonomicdiversity.Inparticular,usingdifferentregionsleadstoestimatedproportionsoftaxadifferentfromeachotherandfromtheknowntruecomposition.Oneofthepossibleapproachestosolvingthisproblemistousetheexperienceofstudyingthesamecommunitiesusingdifferent16SrRNAregions.ThemicrobiomeoflakeBaikaliswell-studiedbymicrobiologists.Specifically,metabarcodingworkshavebeenperformedforthecommunitiesofthewatercolumn15–17,bottomsediments18–22,andcommunitiesassociatedwithcertainbaikalianorganisms23,24.Theseworkshaveshownthatthebacterialbiodiversityinthelakeisextremelyhigh.HighbiodiversityinLakeBaikalcanbeexplainedbythepresenceofmultipledistinctnichesvaryingintermsofenvironmentalconditions.OurmainaimwasacomparativeanalysisofthebacterialcommunitiesfromdistinctecotopesinLakeBaikalasrevealedbymetabarcodinganalysisusingV2-V3andV3-V4fragmentsofthe16SrRNAgene.ThetotalDNAwasextractedfrombiologicalsamples,amplifiedindependentlyusingprimerpairsdesignedforV2-V3andV3-V4fragments,andsequenced,afterwhichcommunitycompositionandbacterialdiversitywereestimatedusingbioinformaticsmethods.Weassumedthattheregionthatproducesthehighestdiversitywillbemostusefulforfurthermetabarcodingstudies.AnalysiswasperformedonthecommunitiesinhabitingcontrastingbiotopesinLakeBaikal.Thebottomsedimentisanorganicallyrichsubstrate,whereasthewatercolumnisrelativelypoorinthisregard.MethodsSamplingInthiswork,westudiedthebacterialcommunitiesofLakeBaikal’swatercolumnandbottomsediment.WatercolumnsamplesweretakeninJuly2013near«GorevoyUtes»underwateroilseep(centralbasin,53.3045170°N108.3919330°E,depth855 m)–zoneI(R-6)andnearthe«Bolshoy»mudvolcano(southernbasin,51.877900°N105.550517°E,depth1370 m)–zoneII(R-9).ThesamplesofthebottomsedimentsassociatedwithsurfacemethanehydratesedimentsweretakeninJuly2015atthe«Akademicheskiykhrebet»station(centralbasin,53.399782°N107.891370°E,depth536 m).Allthesamplesweretakenfromaboardthe«AkademikVereschagin»researchvesselusingSBE32CarouselWaterSamplerbathometersystem(USA)andgravitycorer.FulldescriptionandsampleidentifiersarefoundinTable1.Table1Informationontheanalyzedsamples.FullsizetableDNAextractionEightsamplesfromzoneI(takenatdepthsof0,50,100,200,300,500,700,and855 m)andfivesamplesfromzomeII(takenatdepthsof0,50,100,700,and1370 m)wereusedforDNAextraction.Fivelitersofwaterfromeachsamplewerefilteredusingnitrocellulosefilters(25 mmdiameter,0.2micronpores,«Millipore»,Germany)usingasqueezepump.FilterswereplacedinTEbuffer(10 mMTris-HCl,pH7.4;1 mMEDTA,pH8.0),frozenat−20°Candtransportedtothelaboratory.Sedimentsample(100 g)wasasepticallytakenaboardthevesselfromthe150–185 cmcorelayer,homogenized,frozeninliquidnitrogenat−196 °Candtransportedtothelaboratory.DNAextractionfromallthesampleswasperformedbylysozymetreatment.SedimentsampleswerehomogenizedinanagatemortarwithSiCpriorlysis.ForDNAextraction,phenol-chlorophormtechnique25wasusedwithseveralmodifications26.FourindependentDNAextractionswerecarriedoutforeachsample.Inaddition,anegativecontrol(DNAextractionwithsterileTE-buffer)wasperformedforeachindependentDNAextractionstoensurethatnocontaminationwithexogenousamplifiableDNAoccurredduringthedifferentstagesofsampletreatment.ConcentrationandqualityofextractedDNAweremeasuredwithaspectrophotometerSmartSpecPlus(BioRad,USA).DNAwasstoredat−70 °Cuntilfurtheranalysis.AmplificationandsequencingB_V23andPro_V34bacterial16rRNAgenefragmentswereamplifiedusinguniversalprimers(Table2).PhusionHotStartIIHigh-FidelityDNApolymerase(ThermoScientific#F-549S)withHigh-FidelityBufferwasusedfortheamplification.Afteroptimizingthepolymerasechainreaction(PCR)conditions(thermalprofileandMg2+concentration),therequiredminimumofPCRcycleswasadjustedforeveryDNAsample,therebypreventingtheplateaueffectinconcentrationsoftheproducts.Forthispurpose,concentrationofthePCRproductswascontrolledviacapillaryelectrophoresisonaShimadzuMulti-NAinstrument(DNA-12000reagentkit).LibrariesforIlluminaMiSeqanalysiswerepreparedwithNEBNextUltraIIDNALibraryPrepKit(NewEnglandBiolabs).ThelibrarieswereanalysedusingtheIlluminaMiSeqStandardKitv.3(Illumina)attheGenomicsCoreFacility,InstituteofChemicalBiologyandFundamentalMedicine,SiberianBranchoftheRussianAcademyofSciences(Novosibirsk).Alldata16SrRNAfragmentsweredepositedintheNCBISequenceReadArchive,bottomsediment(DataCitation1)andwater(DataCitation2).Table2Lociselectedfortheanalysisandstructureofoligonucleotideprimersfortheiramplification.FullsizetableReadanalysisReadanalysiswasconductedinMothurv.1.34.4softwareaccordingtoMiSeqSOPrecommendations27.R1andR2sequencescorrespondingtoribosomalRNAampliconsweremergedintocontigswiththemothurmerge.contigscommand,andresultingfragmentswerefilteredbyqualityinnon-overlappingregionsashavingnomorethanfivesiteswithaPhred-value <= 15.ScriptsusedtofilterMiSeqdatainaqualitymannerareavailableat:https://github.com/yuragal/mothur-scripts.Filteredsequenceswerealigned,clustered,andidentifiedtaxonomicallyusingtheSILVA123databases(http://arb-silva.de).ReadswereclusteredintoOTUatgeneticdistancesof0.03,whichisatypicalinterspeciesdistancewithinagenus.Theinformationonthespecies-rankOTUabundance(numberofreadsperOTU)wascollatedinasingletable.StatisticcomparisonsofspeciesdiversityStatisticalconvergenceofspeciesdiversityestimateswasmeasuredusingthebootstrapindex28whichshowsthepotentialamountandproportionofundetectedspeciesinthecommunity(underestimatedα-diversity).Species-rankOTUabundancewasusedtocalculateShannon29andSimpsonindices29ofcommunitybiodiversity.Theirconfidenceintervalsandrelativeerrorswerecalculatedusingthebootstrapalgorithmproposedin30.Correlationsbetweensamplesizes,valuesofindices,andrelativeerrorsweremeasuredbynon-parametricSpearmancoefficients31andcorrelationsignificancewastestedusingSpearmanstatistics.Potentialnumberofspeciesincommunities(hiddenα-diversity-hiddenspeciesrichness)wasevaluatedusingChao132andACE33indices.Standarderrorsfortheseindicesweredeterminedbymethods32,33.ThesignificanceofdifferencesbetweenaveragevalueofShannon,Simpson,Chao1andACEindicesidentifiedbyV2-V3andV3-V4fragmentswasestimatedusingpairedmodificationoftheWilkinson-Mann-Whitneynonparametriccriterion34.Qualitativeandquantitativecomparisonofthecommunitycompositionatdifferentlevelsoftaxonomicorganization(phylum,class,order,family)werecarriedoutusingNonmetricMultidimensionalScaling-NMDS35withBray-Curtisdistancemetric36,Gowerdistancemetric37andJaccarddistancemetric36.Beforeanalysis,alldatawerenormalizedbytheaveragenumberofreadspersample.Foranalysis,V2-V3andV3-V4OTUswerecombinedintoonedataset(taxonomyassociation).ThedegreeofdifferencesinthetaxonomiccompositionofcommunitiesbythefactorV2-V3andV3-V4fragmentsforBray-Curtis,GowerandJaccarddistancemetricwasestimatedusingtheR2-squaredcovariationcoefficient(R2=1-ss_w/ss_t,wheress_wandss_tarewithin-groupandtotalsumsofsquares).R2reliabilitywasestimatedusingpermutationtest(1000permutation)38.Estimatesoftaxonrepresentationinsamples(numberofreadspertaxon)werevisualizedintheformofheatmaps,wheretherowsandcolumnswereclusteredusingthe«average»methodbasedonthedistancematrixcalculatedwithBray-Curtis,GowerorJaccarddistancemetrics.StatisticcomparisonsofthegeneticdiversityestimatesTheproportionofmismatchednucleotidesbetweensequences(p-distance)wasusedasageneticdistancemetric.ForeachOTU,thesequencesthathaveaminimalsumofdistancestootherswithinthesameOTUwereselectedasrepresentative.Thepairwisep-distancedistributionsandnucleotidediversity(averagep-distance)estimates39werebuiltforV2-V3andV3-V4representativesequencesamples.Thesignificanceofdifferencesbetweenaveragep-distanceswasestimatedusingoftheWilkinson-Mann-Whitneynonparametriccriterion34.Theclosestfull-length16SrRNAsequencesforeachrepresentativesequencefrombothregionswerefoundintheSILVA123databaseusingmothursoftware.Inthisway,wehavecreatedtwosamplesoffull-lengthrRNAsdescribingOTUsdetectedintheanalysisofV2-V3andV3-V4fragments.Thesetwosampleswerepooledintoasingledatasetforwhichwethenbuiltthep-distancematrixandperformedNMDS35todetectthesimilaritybetweenOTUandspeciessetsdetectedintheindependentanalysesofV2-V3andV3-V4fragments.Statisticalanalyseswereperformedusingthe«ape»40,«pegas»41,«gplots»and«vegan»42Rpackages.CodeavailabilityScriptsusedtofilterMiSeqdatainaqualitymannerareavailableat:https://github.com/yuragal/mothur-scripts.ScriptsforRprogramminglanguageusedforstatisticcomparisonsofspeciesdiversityandstatisticcomparisonsofthegeneticdiversityareavailableat:https://github.com/barnsys/16S_rRNA_bacterial_communities_analysis.Thesescriptswerecreatedonthebasisoftutorialstotheveganpackage43,44.DataRecordsAlldataonV2-V3andV3-V4fragment16SrRNAweredepositedintheNCBISequenceReadArchive,bottomsediment(DataCitation1)andwater(DataCitation2).TechnicalValidationThefiltereddatasetfortheV2-V316SrRNAfragmentincluded118232readswiththeaveragelengthof205basepairs.Itwasclusteredinto2716species-levelOTUs,1070ofwhichincludedmorethanoneread.SingletonOTUsincluded1.4%ofthereads.TheV3-V4datasetincluded22191readswithanaveragelengthof443basepairsclusteredinto1615species-levelOTUs,509ofthemincludingmorethanoneread.SingletonOTUsincluded5.2%ofthereads.Thesenumberssuggestanacceptablequalityofsequencingandinitialdatafilteringstages.SingletonOTUswereexcludedfromfurtheranalyses.Foranalyzingtheconvergenceofα-diversityestimates,theOTUsweresortedfrommosttoleastabundant(i.e.,intheorderofdecreasingreadcounts).Analysisoftheα-diversityconvergenceusingthebootstrapindexhasshownthattheproportionoftheunderestimatedOTUs(thatis,thosethatcouldbepresentinthebiologicalsample,butwentundetectedbecauseofincompletesequencing)amongtop-rankingOTUs,including95%ofreads,doesnotexceed16%,whichisunderthe20%levelconsideredacceptableforbiologicalstudies.FurtheranalyseswereperformedonthispooloftopOTUscovering95%ofreads.Itincluded251OTUsfortheV2-V316SrRNAregionand171OTUsfortheV3-V4region.ThevaluesforShannonbiodiversityindicesinallsampleswithbothregionsrangedfrom1.5to4(Fig.1).Ineightoutof14samples,theseindiceswerehigherwhenestimatedusingtheV2-V3fragment,whileinsixoutof14samples,theV3-V4fragmentproducedhighervalues.Shannonindicesaveraged2.79fortheV2-V3regionand2.72fortheV3-V4region.TestingwiththeWilkinson-Mann-Whitneyindexshowed(Table3)thattheaveragesoftheShannonindicesdidnotdiffersignificantlybetweentheregions.Thus,metabarcodingwitheitherV2-V3orV3-V416SrRNAfragmentsyieldsimilarspeciesdiversityestimates.Figure1:ComparativeanalysisofindicesShannonandSimpson.(a)DependenceoftheShannonindexfromthesamplesizeforV2-V3andV3-V416SrRNAfragments.(b)DependenceoftheShannonrelativeerrorfromthesamplesizeforV2-V3andV3-V416SrRNAfragments.(c)Theper-sampleShannonindices,pointbarsindicate95%CI,redline–averagevalueforV2-V3fragments,blueline–averagevalueforV3-V4fragments.(d)DependenceoftheSimpsonindexfromthesamplesizeforV2-V3andV3-V416SrRNAfragments.(e)DependenceoftheSimpsonrelativeerrorfromthesamplesizeforV2-V3andV3-V416SrRNAfragments.(f)Theper-sampleSimpsonindices,pointbarsindicate95%CI,redline–averagevalueforV2-V3fragments,blueline–averagevalueforV3-V4fragments.V2-V3fragments16SrRNA–redpointer,V3-V4fragments–bluepointer.TheencodingofthesamplesisindicatedinTable1.FullsizeimageTable3Comparisonofcommunitydiversityindices.FullsizetableComparisonofShannonindicesfromdifferentsamples(Fig.1)demonstratedthatthebacterialdiversityinwatercolumnsvarywithdepth.Atsomedepths,thevalueswereclosetotheminimum(1.5),whereasatothers,bacterialcommunitieswerehighlydiverse(4.0).Despitethevarietyinprimarycarbonsources,Shannonindicesinbottomsedimentcommunitieswereclosetotheaverageat3.24and2.7fortheV2-V3andV3-V4regions,respectively.Correlationanalysis(r = 0.098,P-value = 0.61 > 0.05)showsthatthereisnocorrelationbetweenShannonindicesandreadcounts.Itmeansthatreadcountsaresufficientforcharacterizingcommunitydiversityinthestudiedsamplesinupto95%ofthemajorOTUs.Werethecoverageofthecommunityinsufficient,Shannonindiceswouldincreasewiththereadcountbasedonsamplingmoreandrarerspecies,andthustherewouldbeasignificantpositive(r > 0)correlationbetweenthesetwovalues.RelativeerroroftheShannonindexvalues,estimatedfor95%confidenceintervals,didnotexceed12%,whichis,again,undertheacceptablelevelof20%(Fig.1).CorrelationbetweentheShannonindexrelativeerrorandreadcountissignificantandnegative(r = −0.697,P-value = 0.00 <0.05).ThismeansthatincreasingreadcountsleadstoincreasingprecisionoftheShannonindexestimate.Usingsamplesofatleast5000reads,itdidnotexceed8%,whileincreasingthesamplesizeto10000readsfurtherdecreasedittolessthan4%,whichisapositiveresultforcomplexnaturalsamples.Itisimportanttonotethattheconvergentestimatesofspeciesdiversityinspecies-richcommunitiesusuallyrequirehigherreadcounts.Inourwork,anumberofthesampleswithShannonindexvaluesofmorethan3.5(whichispracticallyashighaspossibleforabiologicalsample)hadrelativeerrorsofroughly3%(Fig.1).Thesecommunitieswerecharacterizedusingdatasetsof10000–15000reads.Thus,wecanconcludethatthesereadcountsaresufficientforthemetabarcodinganalysisofthe95%topOTUsofabacterialcommunity,andhighersamplesizesareunnecessary.Inallsamples,theSimpsonbiodiversityindicesrangedfrom0.42to0.94forbothV2-V3andV3-V4regions(Fig.1).Foreightoutof14samples,theseindiceswerehigherforV3-V4fragment,whileinsixoutof14samples,theV2-V3regionproducedhighervalues(meanvalue0.79fortheV2-V3regionand0.82fortheV3-V4region).ShannonbiodiversityindicesrevealadifferentpatterncomparedtoSimpsonindex.HoweverthetestingwiththepairedWilkinson-Mann-Whitneytestdemonstrates(Table3)thattheaveragesoftheSimpsonindicesdidnotdiffersignificantlyfromShannonones.Thus,metabarcodingwitheitherV2-V3orV3-V416SrRNAfragmentsyieldsroughlythesamespeciesdiversityestimatesbothwithSimpsonandShannonindices.ThedependenceoftheSimpsonindexvaluesonthesamplesissimilartothepatternsobservedfortheShannonindex(Fig.1).Correlationanalysis(r = 0.015,p = 0.93 > 0.05)showsnocorrelationbetweenSimpsonindicesandreadcounts.ThesameisobservedfortheShannonindicesandthisisexplainedbythesimilarreasons.RelativeerroroftheSimpsonindexvalues,estimatedfor95%confidenceintervalsdidnotexceed11%(Fig.1).CorrelationbetweentheSimpsonindexrelativeerrorandreadcountissignificantandnegative(r = −0.59,p ~ 105

請為這篇文章評分？

延伸文章資訊

The effect of 16S rRNA region choice on bacterial community ...

In this work, we compare the resolution of V2-V3 and V3-V4 16S rRNA regions for the purposes of e...

Frontiers | Comparison of Two 16S rRNA Primers (V3–V4 and ...

Two different hypervariable regions of the bacterial 16S rRNA gene were amplified using aliquots ...

High-Speed, Multiplexed 16S Microbial Sequencing ... - Illumina

Sequencing the V4 region of 16S. Sequence variation in the 16S ribosomal RNA (rRNA) gene is widel...

Why use V3-region of the 16S rRNA encoding gene for ...

The main reason for using V3/V4 region in majority of the cases that these regions contain the ma...

Benchmark of 16S rRNA gene amplicon sequencing using ...

As the official Illumina protocol adopted the V3–V4 (V34) region, these two regions are widely us...

The effect of 16S rRNA region choice on bacterial community ...

文章推薦指數： 80 %

請為這篇文章評分？

延伸文章資訊

最新文章

相關網站資訊

更年期食療

更年期情緒

更年期症状

胰島素肥胖

胰島素阻抗

皮膚紅腫硬硬的

親子天下寶寶生活

毒品危害防制條例修法

抑鬱症安慰

應用寶

攝護腺

恐慌症睡覺

The effect of 16S rRNA region choice on bacterial community ...

文章推薦指數： 80 %

請為這篇文章評分？

延伸文章資訊

最新文章

相關網站資訊

更年期食療

更年期情緒

更年期症状

胰島素肥胖

胰島素阻抗

皮膚紅腫硬硬的

親子天下 寶寶生活

毒品危害防制條例修法

抑鬱症安慰

應用寶

攝護腺

恐慌症睡覺

親子天下寶寶生活