Knowledge Graphs and Machine Learning | by Nicola Rohrseitz


Published in Towards Data Science

A powerful combination for the semi-automatic generation of insights

A Knowledge Graph is a set of datapoints linked by relations that describe a domain, for instance a business, an organization, or a field of study. It is a powerful way of representing data because Knowledge Graphs can be built automatically and can then be explored to reveal new insights about the domain.

Knowledge Graphs are secondary or derived datasets: they are obtained by analyzing and filtering the original data. More specifically, the relations between datapoints are pre-calculated and become an important part of the dataset. This means that not only each datapoint but also each relation can be analyzed fast and at scale. The choice of how to describe the relations, together with the ability to analyze them fast and at scale, is the key to new insights.

From data to information, from information to knowledge, from knowledge to insight, from insight to wisdom.

For instance, while a geographical map contains the names and coordinates of cities, a simple Knowledge Graph would also include the distances between them. Hence, instead of having to calculate all distances when a query is made, I can immediately ask: what is the shortest route between point A and point R? Pre-calculating the distances is a simple step, but it makes the geographical analysis much faster and also makes it easy to test different scenarios. For instance, what’s the shortest route between A and R, knowing that point B is suddenly unreachable?

Graphs as analysis tools have been around for centuries, but the concept of a “Knowledge Graph” has emerged only more recently. Its formal definition is given by Paulheim (2016), according to which a Knowledge Graph: describes real-world entities and their interrelations; defines possible classes and relations of entities in a schema; allows for potentially interrelating arbitrary entities with each other; covers various topics.

Beyond the definition, “Knowledge Graph” has great marketing appeal: it implies a technological artifact that encapsulates all relations of a company or another domain, leading to a better understanding. And that is becoming more and more true, also thanks to Machine Learning.

Describing new relations using Machine Learning

The workhorse of the Machine Learning revolution is data classification by means of Deep Learning. By classifying data, we create subsets of datapoints that are related by belonging to the same class. This relation didn’t exist before the classification and can now be used to create a Knowledge Graph.

The power of Deep Learning is that it can classify complex data without being given an explicit description, only examples. Images, speech, documents, spreadsheets, presentations, videos, … Deep Learning can classify a lot of different kinds of data, giving an unprecedented opportunity to describe a domain from multiple perspectives.

Describing millions of datapoints by hand is largely not viable. Imagine having to read and classify millions of precise but dry legal documents. Not the best use of human time.

Making the most of one’s time is also one simple example of an ML-generated Knowledge Graph for a company: by analyzing documents it is possible to know that both the A-team and team ∆ are separately working on the same subject, giving the opportunity to improve the collaboration.

There are many possible use cases for the powerful combination of graphs and ML. The development requires working on two challenging aspects: access to data and finding the classes that will lead to the desired outcome. While the first is mostly an organizational, legal, and often ethical issue, the second requires domain knowledge. Before the ML revolution this knowledge was typically provided only by subject-matter experts; now ML systems can support this work, reducing the barrier to entry.

ML also supports the definition of the classes of relations

Machine Learning can support the creation of relations using classification, but also the definition of the classes. For instance, Natural Language Processing of documents can model topics and recognize named entities. With their statistical representation, a human can then make data-informed decisions about which elements should constitute new types of relations. These then become the labels for the classification.

This means that to create a Knowledge Graph, any database that might contain relevant information is crawled and scanned. Files, directories, activity logs, … anything can be statistically analyzed to create taxonomies and ontologies, which are the terms used to define the classes, properties, and relations between datapoints, as well as how new ones are created. They are the blueprint and instructions by which all datapoints considered are classified and described.

This is at the core of why Knowledge Graphs are also sometimes called semantic networks. Semantic emphasizes the fact that the meaning is encoded together with the corresponding data. This is done through the taxonomies and ontologies (their general terms have some overlap, partially due to their origin. Taxonomy comes from biology, while ontology has its roots in philosophy, from the Greek Ontologia, “the study of being”. More formal definitions exist, like those used in computer science, but they are not completely relevant in this context).

Human judgement in defining the taxonomy and ontology is important because data could be described in an infinite number of ways. Machines are still unable to consider the broader context to make the appropriate decisions. The taxonomies and ontologies provide a perspective from which to observe and manipulate the data. If the element of interest is not being considered, then the Knowledge Graph won’t provide any insight. Choosing the right perspective is how value is created.

Typically this task is carried out iteratively, learning also from what doesn’t work.

Once you have defined the rules, you can apply them to new data, creating metadata and thus the Knowledge Graph. In an appropriate database system, it can then be easily queried and analyzed. For instance, how many relations does a particular entity have? What is the shortest route from A to Z? How similar are the subgraphs?

One of the powers of Knowledge Graphs is also being able to relate different types of data and provenances. This is very useful for extracting value by combining information from different sources, for instance across corporate silos.

Creating a Knowledge Graph is a significant endeavor because it requires access to data, significant domain and Machine Learning expertise, as well as appropriate technical infrastructure. However, once these requirements have been established for one Knowledge Graph, more can be created for further domains and use cases. Given that new insights can be found, Knowledge Graphs are a transformative way of extracting value from existing unstructured data. Use them wisely.

[All photos and images by author]
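
The geographical example in the article (shortest route between A and R, then the same question with B unreachable) can be illustrated with a small sketch. The article does not name an algorithm or a database, so this is only one possible reading: it assumes the pre-calculated distances are stored as weighted edges and uses Dijkstra's algorithm from the Python standard library. The point names other than A, B and R, the distance values, and the helper names build_graph and shortest_route are all hypothetical.

```python
import heapq

# Hypothetical pre-calculated distances between points (illustrative values only).
DISTANCES = {
    ("A", "B"): 4,
    ("B", "R"): 3,
    ("A", "C"): 2,
    ("C", "D"): 5,
    ("D", "R"): 4,
}


def build_graph(distances):
    """Turn pairwise distances into an undirected adjacency mapping."""
    graph = {}
    for (u, v), d in distances.items():
        graph.setdefault(u, {})[v] = d
        graph.setdefault(v, {})[u] = d
    return graph


def shortest_route(graph, start, goal, blocked=frozenset()):
    """Dijkstra's algorithm; nodes in `blocked` are treated as unreachable.

    Returns (total_distance, path) or None if no route exists.
    """
    if start in blocked or goal in blocked:
        return None
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited and neighbor not in blocked:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return None


GRAPH = build_graph(DISTANCES)

# Baseline query: shortest route between A and R.
print(shortest_route(GRAPH, "A", "R"))
# Scenario test: the same query when point B is suddenly unreachable.
print(shortest_route(GRAPH, "A", "R", blocked={"B"}))
# "How many relations does a particular entity have?" is the node's degree.
print(len(GRAPH["A"]))
```

Because the distances are already part of the dataset, testing a new scenario only means changing the blocked set; nothing has to be recomputed from coordinates.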
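
The idea that classification creates relations that did not exist before can also be sketched in a few lines. This example assumes a classifier has already assigned a topic to each document; the classifier itself is not shown, and the documents, team names, topics, relation label and the function same_topic_relations are hypothetical stand-ins. It derives the kind of "two teams are working on the same subject" relation the article describes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical classifier output: (document id, owning team, predicted topic).
CLASSIFIED_DOCS = [
    ("doc1", "A-team", "battery design"),
    ("doc2", "team Delta", "battery design"),
    ("doc3", "team Gamma", "pricing"),
    ("doc4", "A-team", "pricing"),
    ("doc5", "A-team", "battery design"),
]


def same_topic_relations(classified_docs):
    """Derive team-to-team relations from documents that share a predicted topic."""
    teams_by_topic = defaultdict(set)
    for _doc_id, team, topic in classified_docs:
        teams_by_topic[topic].add(team)

    relations = set()
    for topic, teams in teams_by_topic.items():
        # Every pair of distinct teams with documents in the same class gets one edge.
        for team_a, team_b in combinations(sorted(teams), 2):
            relations.add((team_a, "works_on_same_topic_as", team_b, topic))
    return relations


RELATIONS = same_topic_relations(CLASSIFIED_DOCS)
for relation in sorted(RELATIONS):
    print(relation)
```

Each resulting tuple is an edge of the Knowledge Graph. The choice of which class labels to turn into relations is the human, domain-informed decision the article emphasizes.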


