Optimizing PCR primers targeting the bacterial 16S ribosomal ...


Optimizing PCR primers targeting the bacterial 16S ribosomal RNA gene

Methodology article, Open Access. Published: 29 September 2018

Francesco Sambo (1), Francesca Finotello (2), Enrico Lavezzo (3), Giacomo Baruzzo (1), Giulia Masi (3), Elektra Peta (3), Marco Falda (3), Stefano Toppo (3), Luisa Barzon (3) & Barbara Di Camillo (1) (ORCID: orcid.org/0000-0001-8415-4688)

BMC Bioinformatics volume 19, Article number: 343 (2018)

Abstract

Background
Targeted amplicon sequencing of the 16S ribosomal RNA gene is one of the key tools for studying microbial diversity. The accuracy of this approach strongly depends on the choice of primer pairs and, in particular, on the balance between efficiency, specificity and sensitivity in the amplification of the different bacterial 16S sequences contained in a sample. There is thus the need for computational methods to design optimal bacterial 16S primers able to take into account the knowledge provided by the new sequencing technologies.

Results
We propose here a computational method for optimizing the choice of primer sets, based on multi-objective optimization, which simultaneously: 1) maximizes efficiency and specificity of target amplification; 2) maximizes the number of different bacterial 16S sequences matched by at least one primer; 3) minimizes the differences in the number of primers matching each bacterial 16S sequence. Our algorithm can be applied to any desired amplicon length without affecting computational performance. The source code of the developed algorithm is released as the mopo16S software tool (Multi-Objective Primer Optimization for 16S experiments) under the GNU General Public License and is available at http://sysbiobig.dei.unipd.it/?q=Software#mopo16S.

Conclusions
Results show that our strategy is able to find better primer pairs than the ones available in the literature according to all three optimization criteria. We also experimentally validated three of the primer pairs identified by our method on multiple bacterial species, belonging to different genera and phyla. Results confirm the predicted efficiency and the ability to maximize the number of different bacterial 16S sequences matched by primers.

Background

Targeted amplicon sequencing of the ribosomal small subunit, 16S ribosomal RNA gene (16S rRNA) is a common approach to investigate the diversity of microbial communities in a site [1, 2]. The 16S rRNA gene is present in all prokaryotes and contains both fast-evolving regions, which can be used to classify organisms at different taxonomic levels, and slowly-evolving regions, which are relatively conserved throughout different species. The slowly-evolving regions can be used to design broad-spectrum primer pairs for polymerase chain reaction (PCR) amplification, which in turn can be used to isolate species-specific fast-evolving regions. A primer pair is composed of a forward and a reverse primer: the former is meant to match the sense sequence of the bacterial 16S, while the latter should match the antisense sequence [1].

The accuracy of 16S rRNA sequencing strongly depends on the choice of the primer pairs. Many of the current bacterial 16S primers have been designed from sequence data obtained from in vitro cultured species, even though environmental microbiologists estimate that less than 2% of bacteria can be cultured in the laboratory. However, our knowledge over unculturable bacterial sequences is rapidly growing thanks to Next-Generation Sequencing (NGS), a technology that is continuously evolving and improving [3]. As a consequence, several 16S sequence databases have been created and are being maintained up to date by the scientific community [4, 5, 6]. There is thus the need for automated methods that leverage such newly available information in the design and update of bacterial 16S primers.

Since the 16S gene sequence is similar but not identical in different organisms, degenerate primers are used for 16S rRNA sequencing. A primer set is called degenerate when it is used as a mixture of oligonucleotide molecules that contain different nucleotides in defined positions. A pair of degenerate primers can be naturally expanded into a set of non-degenerate primer pairs, whose elements are obtained by assigning all possible combinations of values to the degenerate nucleotides of the original pair. We define such a set of non-degenerate primer pairs a primer-set-pair (Table 1).

Table 1. Example of the mapping from a pair of degenerate primers to a primer-set-pair

An optimal primer-set-pair should exhibit several properties:

- Maximize experimental efficiency and specificity, in terms of how much a primer pair is able to amplify the selected DNA sequence, and not others, during PCR amplification. Efficiency and specificity depend on a number of parameters intrinsic to the PCR method, which need to be set in order to guarantee the success of the reaction. Key parameters are the primer length, the amplicon length, the number and position of mismatches with respect to the template, the primer GC-content, and the ability of primers to produce secondary structures by inter- or intra-molecular interactions [7]. In the following, for the sake of conciseness, we refer to this objective with the term efficiency.
- Maximize coverage, in terms of the fraction of all bacterial 16S sequences from different species that are targeted by at least one forward and one reverse primer from the primer-set-pair.
- Minimize primer matching-bias, in terms of differences in the number of combinations of primers from the forward and reverse sets matching each bacterial 16S.
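The expansion of a degenerate primer pair into a primer-set-pair, as described above, can be sketched in a few lines. This is only an illustration, not code from mopo16S: the function names `expand_degenerate` and `primer_set_pair` and the two example primers are hypothetical, and representing a primer-set-pair as a (forward set, reverse set) tuple is an assumption. The ambiguity table is the standard IUPAC nucleotide code.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every non-degenerate primer encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    # One concrete primer per combination of values at the degenerate positions.
    return ["".join(combo) for combo in product(*choices)]


def primer_set_pair(forward, reverse):
    """Expand a degenerate pair into a (forward set, reverse set) tuple."""
    return expand_degenerate(forward), expand_degenerate(reverse)


# Hypothetical primers, chosen only to keep the output small.
fwd_set, rev_set = primer_set_pair("ACGYTN", "GGWTC")
print(len(fwd_set), len(rev_set))  # 8 2
```

The forward primer has one two-fold (Y) and one four-fold (N) position, hence 2 x 4 = 8 concrete primers; the reverse primer has a single two-fold position (W). The number of forward/reverse combinations grows as the product of the two set sizes.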
In the literature, the majority of the approaches for automated primer design for a set of reference sequences are based on multiple alignment of the set of sequences. Among these, Linhart and Shamir [8] formulate the problem as the Degenerate Primer Design problem and propose a dynamic programming solution, implemented in the HYDEN software. An improvement of the HYDEN software is proposed by Hugerth et al. [9] as the DegePrime software. None of these approaches accounts for primer efficiency, which instead is taken into account by Brodin et al. [10] in the PrimerDesign software, as a set of constraints on admissible primer pairs. Multiple alignment, however, is based on heuristic approaches [11] and is inherently ineffective in producing a correct final alignment when thousands of sequences are involved in the process, especially when sequences show a certain degree of heterogeneity as in the case of 16S. Multiple alignment of the 16S bacteria sequences from the Ribosomal Database Project (RDP) [5] is used by Wang and Qian [12] to identify conserved fragments useful for primer design, but the approach focuses just on single primers and does not extend the analysis to primer pairs. Finally, the SPYDER software for 16S primer design and assessment [13] exploits the RDP Probe Match tool to quickly assess coverage of candidate primer pairs, but the primer design has to be manually carried out by the user, rather than automated by the software.

In this work, we propose an algorithm for optimizing the primer choice, which searches within the set of all possible primer-set-pairs for those simultaneously exhibiting high efficiency and coverage and low matching-bias. The novelty of our approach is manifold. First, by formulating coverage, efficiency and matching-bias as optimization criteria, we allow the user to explicitly model the trade-off between the three competing objectives. Second, we consider for the first time minimal matching-bias among the characteristics that a good primer-set-pair must exhibit. While efficiency and coverage are usually taken into account when designing a primer set, matching-bias is seldom considered in the literature. However, it should be taken into account in quantitative studies, where the objective is to quantify the relative abundance of the different species, and the presence of species matched by more combinations of forward and reverse primers may lead to unwanted amplification biases. Third, by relying on primer-to-sequence alignment, rather than on multiple alignment, we avoid potential artefacts in the results due to incorrect final alignment when thousands of sequences are involved in the process. Fourth, we remove the constraint that the sets of forward and reverse primers should be summarizable as a pair of degenerate primers: indeed, the inclusion of degenerate base sites in primer design may lead to inefficient target amplification, due to the presence of mismatches between primers and target sequences [14]. In addition, the use of degenerate primers might lead to low reproducibility in primer synthesis and thus biases among different primer batches. By avoiding degenerate primers, we thus provide the user with more control over what is actually amplified and over possible biases.

Our approach exploits the bacterial sequence knowledge available in public databases such as GreenGenes [4], the probeBase 16S primers database [15], recently updated after a comprehensive literature survey [16], and SILVA [6]. As an example of application, we present the optimization of primer choice for amplicons in the range 700–800 bp, but the procedure is general and can be applied to any desired amplicon length and representative bacteria population. In silico results show that our strategy is able to find better primer-set-pairs than the ones available in the literature according to all three optimization criteria. Furthermore, experimental validation demonstrates that the optimal primer-set-pairs are suitable for the amplification of 16S rRNA from a variety of bacterial species belonging to different genera, thus confirming the predicted efficiency, wide coverage and low matching-bias.

Methods

Problem constraints

As stated in the previous paragraph, an optimal primer-set-pair should simultaneously maximize efficiency and coverage and minimize matching-bias. In the following, we describe how we quantitatively encoded these constraints.

Efficiency

The perfect primer-set-pairs should satisfy several constraints, aimed at improving PCR efficiency and specificity [7]. However, concurrently satisfying all constraints is often impractical and most state-of-the-art primers violate one or more constraints [16]. We thus decided to introduce efficiency as an optimization score, encoding many of the constraints as fuzzy score functions. More precisely, we defined our efficiency score as the sum of ten score terms: seven fuzzy score terms related to single-primer efficiency constraints, averaged across all primers in the primer-set-pairs, plus three score terms related to the efficiency of the primer-set-pairs as a whole. Since all terms are meant to vary between 0 and 1, the optimization score ranges from 0 (minimal efficiency) to 10 (maximal efficiency).

Broadly speaking, our fuzzy score counts 1 for each constraint that is perfectly satisfied, or, alternatively, a value between 0 and 1 depending on how close the primer is to the constraint limit. As an example, consider the primer melting temperature, Tm. Tm should be greater than or equal to 52 degrees in a perfect primer [7], but 51 is still tolerable, albeit not ideal. In this case, our fuzzy scoring function assigns 1 to temperatures of 52 degrees or greater, 0 to temperatures of 50 degrees or less and considers a linearly increasing function between 50 and 52 degrees. Each term is precisely described in what follows.

The 7 single-primer score terms are:

1. Melting temperature: the melting temperature Tm of a primer is computed with the nearest-neighbour formula [17]. The score term is 1 if Tm ≥ 52, 0 if Tm ≤ 50 and (Tm − 50)/2 if 50 < Tm < 52.

[…] 0.7 or fGC […]

  […] score_best
      p_best = p_new
      score_best = score_new
    p_curr = p_best
  return p_curr

State-of-the-art primer pairs as initial solutions

We selected the online database probeBase [15, 16] as a source of candidate primer-set-pairs to be used as initial solutions by mopo16S. The database contains more than 500 pairs of (possibly degenerate) primers and reports for each primer its sequence, the strand and position at which it matches the reference 16S Escherichia coli gene, and the target domain for which it is designed (being either Bacteria, Archaea or Universal). Given a desired range for the target amplicon length as input of mopo16S, we selected all primer pairs from the probeBase database satisfying all the following properties:

- Amplicon length in the desired range;
- Length of both primers greater than or equal to 17 nt and smaller than or equal to 21 nt;
- Bacteria or Universal target domain of both primers.

Since our approach is to work with sets of non-degenerate primers, in case of degeneracies in either the forward or the reverse primer, we substitute the degenerate primer with a corresponding set of non-degenerate primers, obtained by assigning all possible combinations of values to the degenerate nucleotides in the primer. An example of this procedure is given in Table 1. We computed the three scores for each of the primer-set-pairs and identified, among these, the primer-set-pairs forming the initial Pareto front.

Results

We present a case study of the optimal primer choice procedure targeting amplicons in the range of 700–800 bp. From the set of initial primer-set-pairs in the probeBase database, we identified 37 set pairs satisfying all the required properties and having reference amplicons in the desired range. Exploiting the 4573 16S sequences of the GreenGenes bacterial OTUs as representative set (see the Methods section), we computed the three scores for each of the primer-set-pairs and identified three primer-set-pairs forming the initial Pareto front, represented as squares in Fig. 1.

Fig. 1. Representation of the efficiency, coverage and matching-bias optimization criteria for the Pareto front. Efficiency is represented on the y-axis, coverage on the x-axis and matching-bias using color shading. The initial primer-set-pairs are represented as squares; the primer-set-pairs after multi-objective optimization are represented as circles.

We then executed mopo16S, launching 20 runs of the MULTI-OBJECTIVE-SEARCH algorithm, each with 20 restarts, for a total of more than 33,000,000 function evaluations. The lists of solutions returned by the 20 runs are quite heterogeneous, having a mean Jaccard index (size of intersection over size of union) between each pair of lists equal to 0.007. The software collected the results of all the 20 runs in a single archive and computed the new Pareto front, represented as circles in Fig. 1 (note that the ideal points should be bright yellow and located in the top right corner of the figure). mopo16S completed its execution in less than 9 min, using less than 900 MB of RAM and up to 4 threads on a desktop workstation equipped with a 3.3 GHz Intel® Core™ i5-2500.

The initial primer-set-pairs, chosen as the best-performing primer-set-pairs extracted from the probeBase database (indicated as squares in Fig. 1), are outperformed by all primer-set-pairs obtained by our approach (circles in Fig. 1) according to at least two criteria and, for some of them, according to all three criteria. In particular, one of the initial primer-set-pairs considered in the probeBase database (cyan square in Fig. 1) has maximum efficiency (score 10), but the lowest coverage and the highest matching-bias compared to all the other solutions. The other two initial primer-set-pairs, instead, are outperformed by all the new solutions according to all three criteria, with the single exception of a solution with equal matching-bias (purple square and purple circle in Fig. 1).
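Three ingredients of the procedure reported so far are simple enough to sketch: the fuzzy melting-temperature term from the Methods (1 at or above 52 degrees, 0 at or below 50, linear in between), the Pareto-front filtering over the three criteria, and the Jaccard index used to compare the solution lists of different runs. The sketch below is an illustration under stated assumptions and not the mopo16S implementation: the function names, the tuple layout (efficiency, coverage, matching-bias) and the candidate scores are hypothetical.

```python
def fuzzy_tm_score(tm, lower=50.0, upper=52.0):
    """Fuzzy score term for melting temperature: 0 below `lower`,
    1 above `upper`, linear in between."""
    if tm >= upper:
        return 1.0
    if tm <= lower:
        return 0.0
    return (tm - lower) / (upper - lower)


def dominates(a, b):
    """True if solution a dominates b. Each solution is a tuple
    (efficiency, coverage, matching_bias): the first two are maximized,
    the third is minimized."""
    no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and better


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


def jaccard(a, b):
    """Size of intersection over size of union of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up scores, only to exercise the functions.
candidates = [
    (10.0, 0.60, 0.9),
    (9.0, 0.80, 0.5),
    (8.5, 0.75, 0.6),
    (9.2, 0.85, 0.4),
]
print(fuzzy_tm_score(51.0))      # 0.5
print(pareto_front(candidates))  # [(10.0, 0.6, 0.9), (9.2, 0.85, 0.4)]
```

In this toy example the first candidate survives only because of its maximal efficiency, mirroring the situation described above for the cyan square in Fig. 1: it is worst on coverage and matching-bias, yet no other solution beats it on all three criteria at once.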
In-silico validation

From the optimal primer-set-pair solutions (circles) in Fig. 1, we selected the three set pairs marked with arrows for further inspection. The forward primers of all three pairs align to the reference 16S sequence of the Escherichia coli bacterium between hypervariable regions V2 and V3, at positions 355–358, and all three reverse primers align between regions V6 and V7, at positions 1059–1063, thus resulting in amplicon lengths between 701 and 708 nucleotides. The complete sequence of each forward and reverse primer is reported in Table 2.

Each primer-set-pair was compared to the human genome to exclude nonspecific amplification of human sequences. Primer sequences were compared to the GRCh38 human genome with ssearch36 [29], allowing no gaps and up to 2 mismatches, consistently with the Coverage constraints. None of the possible primer pairs amplifies a region of the human genome shorter than 4000 nt, which is 5.6-fold the length of the amplicons generated in the bacterial 16S rRNA.

Table 2. Complete sequence of each forward and reverse primer of the three selected primer-set-pairs

The efficiency, coverage and matching-bias of the three primer-set-pairs computed on the representative set are reported in the first three rows of Table 3. In order to assess how our new primer-set-pairs perform on much broader and more complete datasets, we computed coverage and matching-bias of the three primer-set-pairs on the 195,279 16S sequences of the GreenGenes 99% bacterial OTUs and on the 464,618 bacterial 16S sequences of the SILVA SSU Ref 119 Non Redundant (NR) set, obtained by applying a 99% identity criterion to remove highly similar sequences. Results are shown in Table 3 (efficiency is not reported since it does not depend on the considered dataset) and confirm the performance obtained on the representative set. Slightly improved results might depend on the numerosity of the clusters associated to highly representative reference sequences (see paragraph "Reference set of 16S sequences, preparation and annotation").

Table 3. Numerical values of the efficiency, coverage and matching-bias scores for the three selected primers assessed on GreenGenes and SILVA reference sequences

Experimental validation

The three primer-set-pairs identified by mopo16S were also evaluated on a panel of bacteria isolated from clinical specimens, including representatives of different phyla within the Bacteria domain (Additional file 1: Table S1), and compared with three non-optimized primer sets, used as controls, selected among those used to initialize mopo16S and reported by Klindworth et al. [16] (Forward: Bact-0008-b-S-20, Reverse: Bact-0785-a-A-21; Forward: Bact-0347-a-S-19, Reverse: Bact-1028-b-A-19; Forward: Bact-0337-a-S-20, Reverse: Bact-1046-a-A-19). Bacteria were isolated as pure culture in standard culture media and identified by automated biochemical testing and MALDI-TOF analysis on Vitek 2 and Vitek MS Systems, respectively (BioMerieux, Marcy l'Etoile, France). Nucleic acids were purified from bacteria by using MP96 DNA SV kits on a MagNA Pure 96 System workstation (Roche, Basel, Switzerland), quantified and diluted in order to achieve approximately the same final concentration.

Primer efficiency was evaluated by real-time PCR using SYBR Green I reagent on a 7900HT Fast Real-Time PCR System (Thermo Fisher Scientific, Carlsbad, CA, USA) with the following steps: 10 min at 95 °C, 35 cycles of denaturation for 30 s at 95 °C, annealing at the selected target temperature for 60 s (60 °C for set 1 and control 3, 56 °C for sets 2 and 3 and controls 1 and 2), and extension at 72 °C for 90 s. The specificity of the amplification product was checked by melting curve analysis, which showed no non-specific amplification of human genomic DNA with any of the primer sets under evaluation (Additional file 1: Figure S1). Amplification efficiency and correlation between threshold cycle and target quantity in the sample were demonstrated by amplification of serial dilutions of reference samples.

Results of real-time PCR amplification of the panel of bacteria isolates demonstrated that the three PCR primer sets are suitable for the amplification of 16S rRNA from a variety of bacterial genera from different families and phyla, thus confirming the predicted efficiency and wide coverage. Figure 2 shows the boxplots of the ΔCt values calculated as the difference between the mean of threshold cycle (Ct) values calculated across different primer pairs on a specific sample and the Ct value on the same sample obtained with a specific primer-pair. Since Ct levels are inversely proportional to the amount of target nucleic acid in the sample, positive ΔCt values indicate higher efficiency than average; negative ΔCt values indicate lower efficiency than average. Comparison of amplification efficiency based on threshold cycle values showed that optimal primer-set-pairs 2 and 3 outperform literature primers (two-sided paired t-test p-value lower than 0.05 for all comparisons with literature primer-sets) with primer-set-pair 3 as the best performer (Fig. 2). Optimal primer-set-pair 1 shows comparable experimental efficiency with literature primers.

Cycle sequencing of PCR products obtained with primer-set-pair 3, followed by phylogenetic analysis on the leBIBI-PPF web server (Jean-Pierre Flandrois, Guy Perrière, Simon Penel, Bénédicte Lafay and Manolo Gouy, University of Lyon 1, http://umr5558-bibiserv.univ-lyon1.fr/lebibi/PPF-in.cgi) was performed to check the ability to identify bacteria at genus and species levels. All the samples under evaluation were classified at genus level with scores > 0.99 according to the Shimodaira and Hasegawa test [30, 31], while classification at species level was achieved in > 50% of cases.

Fig. 2. Boxplots of values demonstrating amplification of 16S DNA from bacteria isolates. Primer sets 1, 2 and 3 (Table 2) and three primer pairs from the literature (Forward: Bact-0008-b-S-20, Reverse: Bact-0785-a-A-21; Forward: Bact-0347-a-S-19, Reverse: Bact-1028-b-A-19; Forward: Bact-0337-a-S-20, Reverse: Bact-1046-a-A-19) were used as real-time PCR primer sets on a panel of bacteria isolated from clinical specimens, including representatives of common Gram-positive and Gram-negative human pathogens belonging to different genera and phyla (Additional file 1: Table S2). ΔCt values were calculated as the difference between the mean of threshold cycle (Ct) values calculated for each sample using different primer-pairs and the Ct value obtained using a specific primer-pair. Positive ΔCt values indicate higher efficiency than average; negative ΔCt values indicate lower efficiency than average.

Discussion

In this paper, we presented a novel algorithm, mopo16S, for optimal primer design in 16S metagenomics experiments. Primers are optimized according to three criteria, namely efficiency of the primer sets, coverage of the representative set and coverage bias across the representative set. Both the representative set of sequences to be covered and the initial set of state-of-the-art primers are drawn from publicly available and up-to-date databases. Thus, new solutions can always be aligned with the current knowledge on the 16S gene.

In our study, we selected primers that could generate relatively long amplicons because we wanted to include several variable regions of the 16S rRNA gene, in order to improve the ability to taxonomically classify bacterial sequences (OTU) at genus or even species level [32]. Please note, however, that mopo16S is general enough to be applicable to any desired amplicon length. Of note, amplicon length does not affect the computational performance of the algorithm, as the search for the optimal solution is performed in the space of primers. Only the parameters related to the amount of effort in searching for the optimal solution (i.e. the number of runs of the MULTI-OBJECTIVE-SEARCH algorithm and the number of restarts n_rest of each run) can affect the execution time of the software. To mitigate this effect, mopo16S executes each run of the MULTI-OBJECTIVE-SEARCH algorithm on a different thread, resulting in an execution time speed-up that is almost linear in the number of threads used.

To solve the multi-objective optimization problem we chose to use a local search approach rather than a population-based search algorithm, such as a multi-objective evolutionary algorithm, for several reasons. First, the nature of our search space, i.e. the space of all possible primer pairs, lends itself naturally to a local search paradigm, where changing, adding or dropping one base at a time starting from an initially good solution (the literature primer pairs) often does not greatly compromise primer feasibility. On the other hand, we reckon that the recombination operator typical of genetic algorithms used to combine two parent solutions [33, 34] would very often result in unfeasible primers, slowing down the search for the optimum. Second, the strength of local search is the scarcity of parameters to be tuned. In particular, for single-objective local search we chose the iterated best improvement local search approach, which is parameter-less and terminates when no further improvement is found. In contrast, evolutionary algorithms have many more parameters that need to be accurately tuned and that, even when optimally tuned for a set of instances, are not guaranteed to remain optimal for unseen data.

Conclusions

Many of the current bacterial 16S primers have been designed from sequence data obtained from in vitro cultured species, even though only a minority of bacterial species can be cultured in the laboratory. However, our knowledge of unculturable bacteria sequences is rapidly growing thanks to NGS and several 16S sequence databases have been created and are being maintained up to date by the scientific community. There is thus the need for automated methods to design and update bacterial 16S primers able to take into account such newly available information. In this work, we contribute to the field by presenting a method for optimal multi-objective primer choice, which exploits publicly available databases such as GreenGenes [4], probeBase [15, 16] and SILVA [6]. mopo16S can be applied to any desired amplicon length and representative bacteria population. Our approach:

- Maximizes experimental efficiency and specificity, in terms of how much a primer pair is able to amplify the selected DNA sequence during PCR.
- Maximizes coverage, in terms of the fraction of all bacterial 16S sequences from different species that are matched by at least one forward and one reverse primer from the set pair.
- Minimizes matching-bias, in terms of differences in the number of combinations of primers from the forward and reverse sets matching each bacterial 16S.

We developed a software tool implementing our approach and released it under the GNU General Public License as the mopo16S software tool (Multi-Objective Primer Optimization for 16S experiments) at http://sysbiobig.dei.unipd.it/?q=Software#mopo16S. We tested mopo16S on an example problem: the optimal primer choice for bacterial 16S and amplicons in the range of 700–800 bp. The three resulting primer-set-pairs, when assessed in silico, outperformed state-of-the-art primers according to all three optimization criteria. Experimentally, the three PCR primer sets were demonstrated to be suitable for the amplification of 16S rRNAs from a variety of bacterial species belonging to different genera, thus confirming the predicted efficiency, wide coverage and low matching-bias.
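As a small worked illustration of the ΔCt statistic used in the experimental validation (mean Ct across primer pairs on one sample, minus the Ct obtained with a specific primer pair), the following sketch may help. The function name `delta_ct` and all Ct values are hypothetical and were chosen only to make the arithmetic easy to follow; they are not measurements from the study.

```python
from statistics import mean


def delta_ct(ct_by_primer):
    """Given the Ct values measured on ONE sample with several primer pairs
    (dict: primer-pair name -> Ct), return the ΔCt of each primer pair,
    defined as mean Ct across primer pairs minus that pair's own Ct."""
    average = mean(ct_by_primer.values())
    return {name: average - ct for name, ct in ct_by_primer.items()}


# Made-up Ct values for a single sample.
sample = {"set1": 20.0, "set2": 18.0, "set3": 17.0, "control1": 21.0}
print(delta_ct(sample))
# {'set1': -1.0, 'set2': 1.0, 'set3': 2.0, 'control1': -2.0}
```

A primer pair that reaches the threshold earlier than average (lower Ct) gets a positive ΔCt, which is why positive values are read as higher-than-average efficiency.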
Abbreviations

NGS: Next Generation Sequencing
OTUs: Operational Taxonomic Units
PCR: Polymerase Chain Reaction
rRNA: Ribosomal RNA

References

1. Kuczynski J, Lauber CL, Walters WA, Parfrey LW, Clemente JC, Gevers D, Knight R. Experimental and analytical tools for studying the human microbiome. Nat Rev Genet. 2012;13(1):47–58.
2. Finotello F, Mastrorilli E, Di Camillo B. Measuring the diversity of the human microbiota with targeted next-generation sequencing. Brief Bioinform. 2016. https://doi.org/10.1093/bib/bbw119.
3. Kircher M, Kelso J. High-throughput DNA sequencing – concepts and limitations. BioEssays. 2010;32(6):524–36.
4. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol. 2006;72(7):5069–72.
5. Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y, Brown CT, Porras-Alfaro A, Kuske CR, Tiedje JM. Ribosomal database project: data and tools for high throughput rRNA analysis. Nucleic Acids Res. 2014;42(D1):D633–42.
6. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41(D1):D590–6.
7. Dieffenbach CW, Lowe TM, Dveksler GS. General concepts for PCR primer design. Genome Res. 1993;3(3):S30–7.
8. Linhart C, Shamir R. The degenerate primer design problem. Bioinformatics. 2002;18(1):S172–81.
9. Hugerth LW, Wefer HA, Lundin S, Jakobsson HE, Lindberg M, Rodin S, Engstrand L, Andersson AF. DegePrime, a program for degenerate primer design for broad-taxonomic-range PCR in microbial ecology studies. Appl Environ Microbiol. 2014;80(16):5116–23.
10. Brodin J, Krishnamoorthy M, Athreya G, Fischer W, Hraber P, Gleasner C, Green L, Korber B, Leitner T. A multiple-alignment based primer design algorithm for genetically highly variable DNA targets. BMC Bioinformatics. 2013;14(1):255.
11. Feng D, Doolittle R. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol. 1987;25(4):351–60.
12. Wang Y, Qian PY. Conservative fragments in bacterial 16S rRNA genes and primer design for 16S ribosomal DNA amplicons in metagenomic studies. PLoS One. 2009;4(10):e7401.
13. Thomas MC, Thomas DK, Selinger LB, Inglis GD. SPYDER, a new method for in silico design and assessment of 16S rRNA gene primers for molecular microbial ecology. FEMS Microbiol Lett. 2011;320(2):152–9.
14. Gravitt PE, Peyton CL, Alessi TQ, Wheeler CM, Coutlee F, Hildesheim A, Schiffman MH, Scott DR, Apple RJ. Improved amplification of genital human papillomaviruses. J Clin Microbiol. 2000;38(1):357–61.
15. Loy A, Maixner F, Wagner M, Horn M. probeBase - an online resource for rRNA-targeted oligonucleotide probes: new features 2007. Nucleic Acids Res. 2007;35(1):D800–4.
16. Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, Horn M, Glöckner FO. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res. 2013;41(1):e1.
17. SantaLucia J, Allawi HT, Seneviratne PA. Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry. 1996;35(11):3555–62.
18. Apte A, Daniel S. PCR primer design. Cold Spring Harb Protoc. 2009;(3):pdb.ip65. https://doi.org/10.1101/pdb.ip65.
19. Robertson JM, Walsh-Weller J. An introduction to PCR primer design and optimization of amplification reactions. Methods Mol Biol. 1998;98:121–54.
20. Lefever S, Pattyn F, Hellemans J, Vandesompele J. Single-nucleotide polymorphisms and other mismatches reduce performance of quantitative PCR assays. Clin Chem. 2013;59(10):1470–80.
21. Acland A, Agarwala R, Barrett T, Beck J, Benson DA, Bollin C, Bolton E, Bryant SH, Canese K, Church DM, Clark K. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2014;42(D1):D7–D17.
22. Paquete L, Stützle T. Stochastic local search algorithms for multiobjective combinatorial optimization. In: Gonzalez TF, editor. Handbook of Approximation Algorithms and Metaheuristics. New York: Chapman & Hall/CRC; 2007. p. 29-1–29-15.
23. Dubois-Lacoste J, López-Ibáñez M, Stützle T. Improving the anytime behavior of two-phase local search. Ann Math Artif Intell. 2011;61(2):125–54.
24. Sambo F, Borrotti M, Mylona K. A coordinate-exchange two-phase local search algorithm for the D- and I-optimal designs of split-plot experiments. Comput Stat Data Anal. 2014;71:1193–207.
25. Borrotti M, Sambo F, Mylona K, Gilmour S. A multi-objective coordinate-exchange two-phase local search algorithm for multi-stratum experiments. Stat Comput. 2017;27(2):469–81.
26. Hoos HH, Stützle T. Stochastic Local Search: Foundations & Applications. San Francisco: Elsevier; 2004.
27. Döring A, Weese D, Rausch T, Reinert K. SeqAn an efficient, generic C++ library for sequence analysis. BMC Bioinformatics. 2008;9(1):11.
28. The OpenMP API specification for parallel programming. 2013. https://www.openmp.org/. Accessed 1 Mar 2017.
29. Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc Natl Acad Sci USA. 1988;85(8):2444–8.
30. Shimodaira H. An application of multiple comparison techniques to model selection. Ann Inst Stat Math. 1998;50(1):1–13.
31. Shimodaira H, Hasegawa M. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol. 1999;16(8):1114.
32. Franzén O, Hu J, Bao X, Itzkowitz SH, Peter I, Bashir A. Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering. Microbiome. 2015;3:43.
33. Zhang Q, Li H. MOEA/D: a multi-objective evolutionary algorithm based on decomposition. IEEE Trans Evol Comput. 2007;11(6):712–31.
34. Knowles JD, Corne DW. Approximating the nondominated front using the Pareto archived evolution strategy. Evol Comput. 2000;8(2):149–72.

Funding

This work was supported by the University of Padova (ex 60%).

Availability of data and materials

The data that support the findings of this study are available from the publicly available databases GreenGenes (http://greengenes.lbl.gov and http://greengenes.secondgenome.com/) [4], probeBase (http://probebase.csb.univie.ac.at/) [15, 16] and SILVA (https://www.arb-silva.de/) [6].

Author information

Authors and Affiliations

(1) Department of Information Engineering, University of Padova, Padova, Italy: Francesco Sambo, Giacomo Baruzzo & Barbara Di Camillo
(2) Biocenter, Division of Bioinformatics, Medical University of Innsbruck, Innsbruck, Austria: Francesca Finotello
(3) Department of Molecular Medicine, University of Padova, Padova, Italy: Enrico Lavezzo, Giulia Masi, Elektra Peta, Marco Falda, Stefano Toppo & Luisa Barzon

Contributions

FS implemented the algorithm and analyzed the data. FS and GB developed and tested the software tool. BDC coordinated and supervised the study. FF performed bacterial 16S sequence re-annotation. FS, FF, EL, LB, ST, BDC contributed to the study conception and design. GM, EP and LB conducted the experiment. EL, MF, LB, ST helped with the definition of the score terms for primer efficiency. FS and BDC wrote the manuscript. GB helped in writing the manuscript. All authors read and approved the manuscript.

Corresponding author

Correspondence to Barbara Di Camillo.

Ethics declarations

Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Competing interests: The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1: Table S1. Supplementary information on the identification of bacterial 16S sequences and experimental performance. (DOCX 610 kb)

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article: Sambo, F., Finotello, F., Lavezzo, E. et al. Optimizing PCR primers targeting the bacterial 16S ribosomal RNA gene. BMC Bioinformatics 19, 343 (2018). https://doi.org/10.1186/s12859-018-2360-6

Received: 22 July 2017
Accepted: 09 September 2018
Published: 29 September 2018
DOI: https://doi.org/10.1186/s12859-018-2360-6

Keywords

16S rRNA sequencing; Primer design; Multiobjective optimization

BMC Bioinformatics, ISSN: 1471-2105


