Department of Computer Science, University of York
Heslington, York, YO10 5DD, UK
jc@cs.york.ac.uk
Abstract
Algorithms for exact and approximate inference in stochastic logic programs (SLPs) are presented, based, respectively, on variable elimination and importance sampling. We then show how SLPs can be used to represent prior distributions for machine learning, using (i) logic programs and (ii) Bayes net structures as examples. Drawing on existing work in statistics, we apply the Metropolis-Hastings algorithm to construct a Markov chain which samples from the posterior distribution. A Prolog implementation for this is described. We also discuss the possibility of constructing explicit representations of the posterior.
1 Introduction

A stochastic logic program (SLP) is a probabilistic extension of a normal logic program that has been proposed as a flexible way of representing complex probabilistic knowledge; generalising, for example, Hidden Markov Models, Stochastic Context-Free Grammars and Markov nets (Muggleton, 1996; Cussens, 1999). However, we need to ask (i) whether this increase in flexibility is needed for any real problems and (ii) whether reasonable algorithms exist for inference and learning in SLPs.

In this paper we give a number of approaches to approximate and exact inference in SLPs, focusing mostly on sampling. More importantly, we apply SLPs to an important problem — Bayesian machine learning — which would be difficult to handle with simpler representations.

The paper is organised as follows. Section 2 contains essential definitions from logic programming. Section 3 shows how we can use SLPs to define three different sorts of distributions, but focuses on the appealingly simple loglinear model. Sections 4 and 5 give two quite different ways of improving sampling for the loglinear model, based on Prolog and importance sampling, respectively. Section 6 briefly shows how variable elimination can be used for exact inference in SLPs. After showing how to extend SLPs with non-equational constraints in Section 7, we can finally bring much of the preceding work together in Sections 8 and 9 to show how SLPs can be used to represent distributions over complex model spaces, such as is required for 'really Bayesian' machine learning. Section 10 contains conclusions and pointers to possible future work.

2 Logic Programming Essentials

An overview of logic programming can be found in (Cussens, 1999). Here, lack of space means that only the most important definitions are flagged. An SLD-derivation of a goal $G$ using a logic program $P$ is a (finite or infinite) sequence, each element of which is either a 4-tuple $(G_i, A_i, C_i, \theta_i)$, fail, or the empty goal $\square$, where

- $A_i$ is the selected atom of goal $G_i$;

- $C_i$ is the selected input clause in $P$, with its variables renamed so that $C_i$ and $G_i$ have no variables in common;

- $\theta_i$ is the most general unifier of $A_i$ and $head(C_i)$ (the head of $C_i$), or fail if they cannot be unified.

If $\theta_i$ is fail then $G_{i+1} =$ fail. Otherwise, $G_{i+1}$ is the result of replacing $A_i$ by $body(C_i)$ (the body of $C_i$) in $G_i$ and then applying $\theta_i$ to the result. If no atoms remain then $G_{i+1} = \square$.
An SLD-refutation is a finite SLD-derivation ending in the empty goal $\square$. The SLD-tree for a goal $G$ is a tree of goals, with $G$ as root node, and such that the children of any goal are the goals produced by one resolution step using it ($\square$ and fail have no children). Fig 6 shows an SLD-tree. A computed answer for a goal $G$ is a substitution for the variables in $G$ produced by an SLD-refutation of $G$.
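As a small illustration, consider the toy program below (purely illustrative, not from any figure): the SLD-tree for the goal :- r(X) has two refutations, one using the first clause and one resolving through t/1, giving the computed answers {X/a} and {X/b}.

r(a).
r(X) :- t(X).
t(b).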
3 Defining Distributions with SLPs
A stochastic logic program (SLP) is a logic program where some of the predicates (the distribution-defining or probabilistic predicates) have non-negative numbers attached to the clauses which make up their definitions. We denote the label for a clause $C_i$ by $l_i$. We also write $l_i : C_i$, and we denote the parameter vector containing all the $l_i$ by $\lambda$. In this paper we will consider only normalised SLPs, where the labels for the clauses making up the definition of a predicate sum to one.
The basic idea is that clause labels probabilistically influence which input clause is chosen as a derivation proceeds, thus defining a distribution over derivations from which distributions over (i) refutations and (ii) variable bindings may be derived. We begin by restricting attention to pure SLPs, where all predicates have labelled definitions, postponing the impure case until Section 7.

3.1 Loglinear Model
Given an SLP $S$ with parameters $\lambda$ and a goal $G$, we can sample SLD-derivations for $G$ using loglinear sampling as follows:

Loglinear sampling: Any computation rule may be used to select the atom $A_i$ from the current goal $G_i$. The next input clause $C_i$ is chosen with probability $l_i$ from those clauses in $S$ with the same predicate symbol in the head as $A_i$. We stop when we produce either fail or $\square$.
Let $R(G)$ denote the set of refutations of the goal $G$ (i.e. derivations that end in $\square$). Let $\nu_i(r)$ be the number of times the clause labelled $l_i$ is used in refutation $r$. The loglinear distribution over refutations of the goal $G$ is

$$P_{\mathrm{loglin}}(r) = \frac{f_{\mathrm{loglin}}(r)}{\sum_{r' \in R(G)} f_{\mathrm{loglin}}(r')}, \qquad \text{where } f_{\mathrm{loglin}}(r) = \prod_i l_i^{\nu_i(r)}.$$
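To make this concrete, here is a minimal Prolog sketch of loglinear sampling over goals represented as lists of atoms. The clause store slp_clause(Label, Head, BodyList) and the helpers choose/2 and pick/3 are hypothetical, not part of any implementation described in this paper; the sketch assumes a normalised SLP.

:- use_module(library(random)).      % for random/1 (SWI/Yap)

% Loglinear sampling: succeed with a refutation, or fail if the
% derivation produces fail (no renormalisation, no backtracking).
sample([]).                          % empty goal: a refutation is found
sample([A|G]) :-
    functor(A, F, N),
    functor(H, F, N),                % template with A's predicate symbol
    findall(L-(H :- B), slp_clause(L, H, B), Cs),
    choose(Cs, (H1 :- B1)),          % pick clause i with probability l_i
    A = H1,                          % unify selected atom with the head...
    append(B1, G, G1),               % ...and replace it by the body
    sample(G1).

% choose(+LabelledItems, -Item): pick Item with probability equal to
% its label (labels are assumed to sum to one).
choose(Items, Item) :-
    random(R),                       % uniform random number in [0,1)
    pick(Items, R, Item).

pick([L-C|T], R, Item) :-
    (   R < L -> Item = C
    ;   R1 is R - L, pick(T, R1, Item)
    ).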
3.2 Unification-Constrained Model

In unification-constrained sampling, the next input clause is chosen from $\mathrm{unif}_i$, the set of clauses whose heads unify with the selected atom $A_i$, with probability proportional to the clause labels. This is the original sampling mechanism defined by (Muggleton, 1996). It is a theorem of logic programming that normal SLD-refutation is essentially unaltered by changing the atom selection rule (although drastic changes in efficiency can occur). If we delay selecting an atom $A_i$, all that changes is that $A_i$ may be more instantiated when we eventually do select it. Unfortunately, this means that delaying the selection of $A_i$ will alter $\mathrm{unif}_i$, and so we are obliged to fix the selection rule in unification-constrained sampling so that a single distribution is defined.

The unification-constrained distribution over refutations $r$ for a goal $G$ is

$$P_{\mathrm{unif}}(r) = \frac{f_{\mathrm{unif}}(r)}{\sum_{r' \in R(G)} f_{\mathrm{unif}}(r')}, \qquad \text{where } f_{\mathrm{unif}}(r) = \prod_i \frac{l_i}{\sum_{C \in \mathrm{unif}_i} l(C)}.$$

Computing the unnormalised probability (potential) of a refutation is only slightly less easy than with the loglinear model, since all the values $l(C)$, for $C \in \mathrm{unif}_i$, can be quickly found as the refutation proceeds.
3.3 Backtrackable Model
Whereas unification-constrained sampling stops as soon as it reaches fail, backtrackable sampling backtracks upon failure, and so will generally be a more efficient way of sampling refutations. Backtrackable sampling is essentially Prolog with a probabilistic clause selector.

Backtrackable sampling: The selected atom is always the leftmost atom in the current goal. The next input clause $C_i$ is chosen with probability $l_i$. If we produce fail then we backtrack to the most recent choice-point, delete the choice of input clause that led to failure, and choose from amongst the surviving choices with probability proportional to clause labels. If no choices remain then we backtrack further until a choice can be made, or return fail. We stop if we produce $\square$ or fail.
Let $\mathrm{succ}_i$ be the set of input clauses which lead to at least one refutation. The backtrackable distribution over refutations $r$ for a goal $G$ is

$$P_{\mathrm{back}}(r) = \prod_i \frac{l_i}{\sum_{C \in \mathrm{succ}_i} l(C)}.$$

Computing the unnormalised probability (potential) of a refutation is, in general, hard, since we may have to explore a very large SLD-tree rooted at $G$ to identify the sets $\mathrm{succ}_i$.
Comparing the loglinear, unification-constrained and backtrackable models, we see there is a trade-off between ease of sampling a refutation and ease of computing a potential for a given refutation. If only sampling is required and we are happy that the order of literals in clauses matters, then the backtrackable model makes sense. However, the loglinear model has attractively simple mathematical properties which we will exploit in the MCMC application. Fortunately, loglinear sampling can be sped up, as the next section shows.
4 Improving Loglinear Sampling
(Muggleton, 1996) explicitly introduced SLPs as generalisations of Hidden Markov Models (HMMs) and Stochastic Context-Free Grammars (SCFGs). Comparing SLPs to HMMs we see that in SLPs: (i) the states of an HMM are replaced by goals; (ii) the outputs of an HMM are replaced by substitutions; (iii) concatenation of outputs is replaced by composition of substitutions; (iv) outputs (substitutions) are generated deterministically; and (v) state transition probabilities are given by clause labels. It is also more natural in SLPs to associate outputs (substitutions) with transitions between states (goals) rather than with states themselves. The connection between SLPs and SCFGs is even closer. Consider the SCFG in Fig 1, which has been implemented as an SLP. We can generate sentences using loglinear sampling with the goal :- s(A,[]). Since it is an SCFG, the query will always succeed, even though we do not allow backtracking in the loglinear model.
1:   s(A,B) :- n(A,C), v(C,D), n(D,B).
0.4: n([joe|T],T).
0.6: n([kim|T],T).
0.3: v([sees|T],T).
0.7: v([likes|T],T).
Figure 1: An SCFG
Suppose now that we are interested only in reflexive sentences. We then apply a constraint to the SCFG, replacing

1: s(A,B) :- n(A,C), v(C,D), n(D,B).

with

1: s(A,B) :- n(A,C), v(C,D), n(D,B),
             A=[N|T1], D=[N|T2].

or more concisely:

1: s([N|T1],B) :- n([N|T1],C),
                  v(C,[N|T2]), n([N|T2],B).
Now we cannot guarantee that :- s(A,[]). will always succeed: the grammar is no longer context-free. This means that sampling sentences from the new conditional distribution (conditional on the constraint being satisfied) is less efficient. We have to throw away derivations that are inconsistent with the constraint, just as with forward sampling in Bayes nets in the presence of evidence.

In the loglinear model we may select any atom from the current goal, which means that the order of literals in the bodies of clauses does not affect the distribution. However, since we will use Prolog to implement SLPs, we can exploit Prolog's leftmost atom selection rule to force constraints to be effected as early as possible. We do this by simply moving the constraints leftwards so that Prolog encounters them earlier. This has the effect of producing fail as soon as our choice of input clauses has ensured that a derivation cannot succeed. Fig 2 has an ordering of body literals for s/2 that ensures that a derivation fails as soon as we pick a second noun which is not the same as the first — we don't waste time choosing the verb. The moral is: it is better to fail sooner than later.
1:   s([N|T1],B) :- n([N|T1],C),
                    n([N|T2],B), v(C,[N|T2]).
0.4: n([joe|T],T).
0.6: n([kim|T],T).
0.3: v([sees|T],T).
0.7: v([likes|T],T).
Figure 2: A simple grammar
5 Importance Sampling for SLPs
SLPs are only required for complex distributions, where it is optimistic to depend on exact inference. Approximate inference can be based on sampling, where e.g. to estimate the probability of some event, we sample from the SLP and obtain the event's relative frequency. Unfortunately, even with the Prolog-based speedup given above, pure loglinear sampling can still be slow. However, since $P_{\mathrm{unif}}$ is easier to sample from than the loglinear distribution, the obvious solution is to use importance sampling (Gelman et al., 1995). We produce samples from $P_{\mathrm{unif}}$ and then weight them with the importance weights $w(r)$. We have:

$$w(r) = \frac{f_{\mathrm{loglin}}(r)}{f_{\mathrm{unif}}(r)} = \prod_i \left( \sum_{C \in \mathrm{unif}_i} l(C) \right).$$

We can update $w(r)$ as we go, so it is easy to compute.
This gives us weighted samples for a particular goal, where the weights are known up to a normalising constant. For approximate inference, the unknown normalising constant will often cancel out. For example, it is frequently enough to estimate the ratio between probabilities.
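A sketch of how the weight can be accumulated during unification-constrained sampling follows, reusing the hypothetical slp_clause/3, choose/2 and pick/3 helpers from the sketch in Section 3. Each resolution step multiplies the running weight by the label mass of the unifiable clauses, so that a successful call returns $w(r)$ along with the bindings.

% usample(+Goal, +W0, -W): unification-constrained sampling of a
% refutation of Goal (a list of atoms), accumulating the weight
% w(r) = prod_i sum_{C in unif_i} l(C).
usample([], W, W).                   % refutation found: W is the weight
usample([A|G], W0, W) :-
    findall(L-(H :- B),
            ( slp_clause(L, H, B),
              \+ \+ (A = H) ),       % keep clauses whose head unifies with A
            Unif),
    Unif \= [],                      % no unifiable clause: derivation fails
    sum_labels(Unif, S),
    rescale(Unif, S, Scaled),        % renormalise labels over unif_i
    choose(Scaled, (A :- B)),        % choosing also unifies A with the head
    W1 is W0 * S,                    % multiply in this step's label mass
    append(B, G, G1),
    usample(G1, W1, W).

sum_labels([], 0.0).
sum_labels([L-_|T], S) :- sum_labels(T, S0), S is S0 + L.

rescale([], _, []).
rescale([L-C|T], S, [L1-C|T1]) :- L1 is L / S, rescale(T, S, T1).

With the constrained grammar of Fig 2 stored via slp_clause/3, a call such as usample([s(Sent,[])], 1.0, W) would return a sampled reflexive sentence together with its weight.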
6 Exact Inference in SLPs
Each refutation of a goal $G$ produces a computed answer — variable bindings for the variables in $G$. We can define a distribution over the computed answers for $G$ by marginalisation — we sum over all refutations that produce a given computed answer. It is convenient to represent computed answers by atoms: define the yield $y(r)$ of a refutation $r$ of a unit goal $\leftarrow A$ to be $A\theta$, where $\theta$ is the computed answer for $\leftarrow A$ using $r$. Let $\theta$ be a computed answer for the goal $\leftarrow A$; then the corresponding yield is $y = A\theta$ and:

$$P(y) = \sum_{r : y(r) = y} P(r).$$

If a goal can be decomposed into subgoals $G_1$ and $G_2$ which do not share variables, then

$$P(G_1, G_2) = P(G_1)\,P(G_2).$$

For goals, such as $\leftarrow p(X), q(X)$, which cannot be decomposed into subgoals without common variables, we are forced to find splitting substitutions. A substitution $\theta$ splits two goals $G_1$ and $G_2$ if $G_1\theta$ and $G_2\theta$ do not share variables. Let $\Theta$ be a set of splitting substitutions for $G_1$ and $G_2$ which includes all computed answers for the goal $\leftarrow G_1, G_2$ restricted to the common variables of $G_1$ and $G_2$. Then:

$$P(G_1, G_2) = \sum_{\theta \in \Theta} P(G_1\theta)\,P(G_2\theta).$$
We would like to find a small $\Theta$ fairly quickly. If each variable can only take a fairly small number of discrete values, as is often the case in Bayesian nets, we can just go through each of these values. We end this section by noting that these computations need to be vectorised to return a (finite) distribution over bindings for a variable, rather than a single probability.
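As a small worked instance (with hypothetical predicates p/1 and q/1), take the goal $\leftarrow p(X), q(X)$, where $X$ ranges over $\{a, b\}$. The set $\Theta = \{\{X/a\}, \{X/b\}\}$ splits the two subgoals, and

$$P(p(X), q(X)) = P(p(a))\,P(q(a)) + P(p(b))\,P(q(b)).$$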
7 Impure SLPs
We can extend the definition of SLPs by going beyond conjunctions of equational constraints such as those used in Section 4. Fig 3 shows an SLP for a fragment of French, where there is a constraint that adjectives and nouns agree on gender. The g/2 predicate that defines the constraint is unlabelled, since it plays no part in defining the distribution except to cut out derivations which are inconsistent with it. When an unlabelled predicate is encountered in a derivation we only consider the first variable binding it produces (if any). Backtracking may be used to produce this first binding, but we may not return to seek another binding if we hit failure later on.
Equational constraints can be placed as early as we like. For other constraints, placing them too early can prevent permissible derivations from being found. The 2nd (commented out) version of s/2 in Fig 3 has the g(A,G) too early. Since backtracking is banned, only the first, il, binding will be found, overly constraining the value of A, so e.g. elle sera vieille will never be produced. In the correct version we probabilistically (partially) instantiate the variable A by our choice of input clause, and only then effect the constraint on G. Since we are allowed to backtrack within the call to g/2, this call will always succeed. The key is that it must be the probabilistic predicates that choose the variable bindings that matter.
Effecting negated constraints too early can let through more derivations than is safe. The third version of s/2 starts with a negated goal that will succeed and produce no variable bindings — so no constraint is effected. Had this double negation been at the end of the clause, it would have effected the desired constraint.
% Constraint too late - inefficient
% 1: s(A,B) :- n(A,C), v(C,D), a(D,B),
%              g(A,G), g(D,G).
% Constraint too early - overconstrained
% 1: s(A,B) :- g(A,G), n(A,C),
%              a(D,B), g(D,G), v(C,D).
% Constraint too early - underconstrained
% 1: s(A,B) :- \+ \+ (g(A,G), g(D,G)),
%              n(A,C), v(C,D), a(D,B).
% Constraint at the right time.
1:   s(A,B) :- n(A,C), g(A,G),
               a(D,B), g(D,G), v(C,D).
0.4: n([il|T],T).
0.6: n([elle|T],T).
0.3: v([est|T],T).
0.7: v([sera|T],T).
0.2: a([vieux|T],T).
0.8: a([vieille|T],T).
g([il|_],m).
g([elle|_],f).
g([vieux|_],m).
g([vieille|_],f).
Figure 3: Gender agreement constraint
8 Machine Learning for Dogmatic Bayesians
Finally, never forget that the goal of Bayesian computation is not the posterior mode, not the posterior mean, but a representation of the entire distribution, or summaries of that distribution such as 95% intervals for estimands of interest. (Gelman et al., 1995, p. 301) (italics in the original)
‘Bayesian’ approaches in machine learning do not live up to this exacting demand to represent the entire posterior, usually settling for just the posterior mode (MAP algorithms) or particular expectations (Bayes optimal classification). In this paper, we show how SLPs can be used to define priors representing a wide range of biases and constraints, and also show how to sample from posteriors. Although we fall short of constructing (usable) explicit representations of the posterior, such a possibility cannot be ruled out.
Our approach is based on the process prior approach for decision trees developed independently by (Chipman et al., 1998) and (Denison et al., 1998). Our presentation will follow that given by (Chipman et al., 1998). In short:
Instead of specifying a closed-form expression for the tree prior, $p(T)$, we specify $p(T)$ implicitly by a tree-generating stochastic process. Each realization of such a process can simply be considered a random draw from this prior. Furthermore, many specifications allow for straightforward evaluation of $p(T)$ for any $T$ and can be effectively coupled with efficient Metropolis-Hastings search algorithms (Denison et al., 1998)
We can use Metropolis-Hastings to sample from the posterior distribution over trees, by choosing an initial tree $T^0$ and producing new trees as follows, where $p(Y|T)$ is just the likelihood with tree $T$ (Denison et al., 1998):

1. Generate a candidate value $T^*$ with probability distribution $q(T^i, T^*)$.

2. Set $T^{i+1} = T^*$ with probability

$$\alpha(T^i, T^*) = \min\left(1,\; \frac{q(T^*, T^i)}{q(T^i, T^*)} \cdot \frac{p(Y|T^*)\,p(T^*)}{p(Y|T^i)\,p(T^i)}\right),$$

else set $T^{i+1} = T^i$.
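In Prolog, one step of this sampler might look as follows. This is only a sketch: propose/2, qratio/3 (computing $q(T^*, T^i)/q(T^i, T^*)$), likelihood/2 and prior_potential/2 are hypothetical predicates standing in for the machinery developed below.

:- use_module(library(random)).      % for random/1

% One Metropolis-Hastings step: propose a candidate, accept it with
% probability alpha, otherwise keep the current model.
mh_step(Mi, Mnext) :-
    propose(Mi, Mstar),
    qratio(Mi, Mstar, Q),            % q(Mstar,Mi) / q(Mi,Mstar)
    likelihood(Mstar, Lstar),
    likelihood(Mi, Li),
    prior_potential(Mstar, Pstar),   % unnormalised prior: the normalising
    prior_potential(Mi, Pi),         % factor cancels in the ratio
    Ratio is Q * (Lstar * Pstar) / (Li * Pi),
    (   Ratio >= 1.0 -> Alpha = 1.0 ; Alpha = Ratio ),
    random(R),
    (   R < Alpha -> Mnext = Mstar ; Mnext = Mi ).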
Because SLPs define distributions over first-order atomic formulae — the yields of refutations — they can easily represent distributions over model structures such as decision trees, Bayesian nets and logic programs. We will denote models using $M$, possibly superscripted. The simplest, but possibly very inefficient, approach to defining priors over the model space with SLPs is as follows:

model(M) :- gen(M), ok(M).

gen/1 generates possible models just like an SCFG generates sentences: there are no constraints, so we never hit fail. ok/1 is then a constraint which filters out models which we do not wish to include in the model space.
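A toy instance of this pattern (purely illustrative; these predicates do not appear in the paper's SLPs) generates models that are lists of literals and rejects any list containing duplicates:

0.5: gen([]).
0.5: gen([L|T]) :- lit(L), gen(T).
0.3: lit(p).
0.7: lit(q).
ok(M) :- sort(M, S),                 % sort/2 removes duplicates, so
         length(M, N), length(S, N). % equal lengths means no duplicates

Here gen/1 never fails, while ok/1 silently discards, say, [p,p], so sampling from model/1 by rejection wastes work in proportion to how picky ok/1 is.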
If ok/1 rejects many of the models generated by gen/1, then it will be inefficient to sample from the prior, and this inefficiency translates to inefficiency when running the Metropolis-Hastings algorithm. The solution, as explained in Section 4, is to move constraints as early as possible without altering the distribution. This has been done in the SLPs of which fragments can be found in Fig 4 and Fig 5. These define priors over logic programs and Bayesian networks, respectively. In each case, we simply define what a model is, using first-order logic, and then add labels to define a distribution over this model space. In the logic-program prior we have constraints that we do not want an empty theory and that any generated rules have not previously been generated (newrule/2). We also have a 'utility' constraint make_nice/2 which always succeeds and just rewrites the generated logic program into a more convenient form. In the Bayes-net prior we assume that the variable RVs is always instantiated to a list of names of random variables, so that it is used to define distributions of the form P(Net|RVs), where all the probabilistic information is associated with parents/3. A more efficient version would push the acyclic/1 constraint earlier.
model(LP) :- theory([],LP).
0.1: theory(Done,NicelyDone) :-
         \+ Done = [],
         make_nice(Done,NicelyDone).
0.9: theory(RulesSoFar,Done) :-
         rule(Rule),
         newrule(RulesSoFar,Rule),
         theory([Rule|RulesSoFar],Done).
Figure 4: Fragment of an SLP defining a prior over logic programs
model(RVs,Net) :-
    net(RVs,RVs,Net),
    acyclic(Net).
net([],_,[]).
net([H|T],RVs,Net) :-
    parents(H,RVs,Ps),
    append(Ps,TNet,Net),
    net(T,RVs,TNet).
Figure 5: Fragment of an SLP defining a prior over Bayesian nets (for a fixed set of random variables)
8.1 Imaginary Models
Having carefully filtered out unwanted models, we find that it is convenient to re-admit them to the model space when we implement our posterior sampling algorithm. However, all these imaginary models, which previously did not have a probability defined for them, will now get probability zero. Doing this ensures that generating a new proposed model $M^*$ using $q(M^i, M^*)$ is simple. If the proposed model is imaginary then we will never accept it, since $p(M^*) = 0$ gives $\alpha = 0$. An analogous approach exists in analysis, where it is often easier to do real analysis within the space of complex numbers.
Recall that the distribution over atoms (yields) is generated by marginalisation from a distribution of the same name over refutations of the goal. Extending our definition to include zero-probability imaginary models amounts to extending this underlying distribution on refutations to also include zero-probability SLD-derivations that end in fail. It turns out that this last distribution on derivations is the most convenient to work with. Note that each derivation corresponds to a leaf in an SLD-tree. We will associate a leaf corresponding to a failure derivation with fail, and a leaf corresponding to a refutation with the model yielded by that refutation. Non-leaf nodes of the SLD-tree will be associated with goals (see Fig 6).
8.2 The Transition Kernel
We can generate a new derivation (yielding a new proposed model $M^*$) from the derivation which yielded the current model $M^i$ as follows.

1. Backtrack one step to the most recent choice point in the SLD-tree.

2. We then probabilistically backtrack as follows: if at the top of the tree, stop. Otherwise backtrack one more step to the next choice point with probability $p_b$.

3. Use loglinear sampling to generate a derivation from the choice point chosen in the previous backtracking step. However, in the first step of sampling we may not choose the branch that leads back to $M^i$.
If the derivation so found ends in fail then $M^*$ is an imaginary model, so $\alpha = 0$ and we stay at $M^i$. The parameter $p_b$ controls the size of steps; if $p_b = 1$, we always restart from the top of the tree.
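The backtracking step is easy to implement. The sketch below (with a hypothetical fact backtrack_prob/1 holding $p_b$) returns the number of choice points J to backtrack through, given that Depth choice points lie above the current leaf; it realises exactly the distribution used in the analysis below.

:- use_module(library(random)).      % for random/1

% backtracks(+Depth, -J): always backtrack one step, then keep going
% with probability p_b, stopping for certain at the top of the tree.
backtracks(Depth, J) :-
    backtracks_from(1, Depth, J).

backtracks_from(J0, Depth, J) :-
    (   J0 =:= Depth -> J = J0       % top-level choice point: must stop
    ;   backtrack_prob(Pb),
        random(R),
        (   R < Pb                   % backtrack one more step...
        ->  J1 is J0 + 1,
            backtracks_from(J1, Depth, J)
        ;   J = J0                   % ...or stop at this choice point
        )
    ).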
Now we must calculate $\alpha(M^i, M^*)$ when $M^*$ turns out to be a real model. First let $G^*$ be the deepest common parent goal for $M^i$ and $M^*$. (Strictly, we should say "for the derivations that yield $M^i$ and $M^*$", but from now on we will abbreviate in this way.) There is only one way we can get from $M^i$ to $M^*$: backtracking to $G^*$ and then reaching $M^*$ from $G^*$. We cannot go via some parent of $G^*$ (such as $G$ in Fig 6), since we have prohibited going down the tree the same way we backtracked up it.
Suppose that $G^*$ is $d_i$ choice points above $M^i$ in the SLD-tree; then the probability of backtracking from $M^i$ through exactly $d_i$ choice points is $p_b^{d_i - 1}(1 - p_b)$ if $G^*$ is not the top-level choice point, and $p_b^{d_i - 1}$ if it is (since at the top we must stop). Let $B_i$ be the random variable that gives the number of backtracks from $M^i$. Similarly, we have $B^*$ and $d^*$ for $M^*$. Then we find that

$$\frac{P(B^* = d^*)}{P(B_i = d_i)} = p_b^{d^* - d_i},$$

whether $G^*$ is at the top-level choice point or not.
Figure 6: Jumping from $M^i$ to $M^*$ in the SLD-tree
The probability of reaching $M^*$ starting from $G^*$ is

$$\frac{f(M^*)/f(G^*)}{1 - l(C^i)},$$

where $C^i$ is the clause which is used at $G^*$ to get to $M^i$ (the branch we are forbidden to re-enter, hence the renormalisation), $f$ abbreviates $f_{\mathrm{loglin}}$, and $f(G^*)$ is the product of the labels of the clauses used to reach $G^*$, so that $f(M^*)/f(G^*)$ is the product of the labels used below $G^*$. So

$$q(M^i, M^*) = P(B_i = d_i)\,\frac{f(M^*)/f(G^*)}{1 - l(C^i)}.$$

Swapping the $i$ and $*$ symbols gives us $q(M^*, M^i)$. The SLD-tree in Fig 6 shows an example. Note the imaginary model under the top-level goal. Next, note that in the ratio $q(M^*, M^i)\,p(M^*)/(q(M^i, M^*)\,p(M^i))$ the labels of the clauses that get us below $G^*$ cancel out, since the labels of the clauses that get us as far as $G^*$ are common to $f(M^i)$ and $f(M^*)$. Finally, note that $p(M) \propto f(M)$, since we are actually dealing with derivations that yield models, and the normalising factor cancels out. Putting all this together:

$$\alpha(M^i, M^*) = \min\left(1,\; p_b^{d^* - d_i}\,\frac{1 - l(C^i)}{1 - l(C^*)}\,\frac{p(Y|M^*)}{p(Y|M^i)}\right),$$

where $C^*$ is the clause used at $G^*$ to get to $M^*$.
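Under the hypothetical bookkeeping that records, while generating the proposal, the depths $d_i$ and $d^*$ and the labels of the two clauses used at $G^*$, the data-independent part of this ratio is a one-liner:

% Acceptance factor p_b^(d*-d_i) * (1-l(C^i)) / (1-l(C*)); multiply by
% the likelihood ratio p(Y|Mstar)/p(Y|Mi) and cap at 1 to get alpha.
accept_factor(Di, Dstar, LCi, LCstar, A) :-
    backtrack_prob(Pb),
    A is Pb**(Dstar - Di) * (1 - LCi) / (1 - LCstar).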
9 Implementation

Fig 7 shows how the SLP of Fig 4 is written down for our Prolog implementation: probabilistic predicates become DCG rules indexed by clause identifiers, with the labels stored separately in labels/2 facts.

theory(main,In,Out) -->
    select_clause(theory_2,ClauseNum),
    theory(ClauseNum,In,Out).
theory(theory_2_1,Done,NicelyDone) -->
    { \+ Done = [] },
    { nice_all(Done,NicelyDone) }.
theory(theory_2_2,RulesSoFar,Done) -->
    rule(Rule),
    { newrule(RulesSoFar,Rule) }, !,
    theory(main,[Rule|RulesSoFar],Done).

labels(theory_2,[cdp(theory_2_1,0.1,0.1),
                 cdp(theory_2_2,0.9,1)]).
Figure 7: An SLP in Prolog
We have yet to do real experiments to test whether our sampling algorithm converges on the true posterior, but at least we have a working implementation that is reasonably efficient, and which can be used to explore the consequences of altering various parameters. Running the algorithm with no data (so the likelihoods do not need calculating) using the prior over logic programs took a little over 9 seconds of CPU time to produce (and write out) 10000 samples on a Pentium 233MHz running Yap Prolog. This involved 546 acceptances of a proposed $M^*$, an acceptance rate of only 5.46%, and involved 337 distinct logic programs. When run using a dataset of 5 positive and 5 negative examples and a simple 10% classification noise likelihood function, 10000 samples took 11.5 seconds, involving 451 jumps and 465 distinct logic programs. These runs were done with a fixed backtrack probability $p_b$; reducing $p_b$ to 0.3 produced only 39 jumps out of 10000.
10 Conclusions and Future Directions
We have defined a number of algorithms for SLPs, together with relevant mathematical analysis. This goes some way to establishing that SLPs can be a useful framework for representing and reasoning with complex probability distributions. We view the application to Bayesian machine learning as being the most promising area for future research. The definition of a general-purpose and practical transition kernel is probably the paper's major contribution. However, it remains to be proven by rigorous experimentation that our posterior sampler produces better results than more conventional search-based approaches. We have also yet to give a proper account of termination when sampling from SLPs.

In this paper, we have only considered priors over structures, not parameters; but it is easy to embed built-in predicates in the Prolog code to generate e.g. samples from a Dirichlet. More interestingly, there is the possibility of combining the likelihood with the prior to generate a posterior in the same form as the prior. It is easy to construct an impractical SLP for the posterior of the form:

posterior(Model) :- prior(Model), likelihood(Model).

The interesting question is whether this impractical definition can be transformed into a usable representation. One problem here is that the size of an efficient representation of the posterior is likely to explode, given that the posterior is generally more complex than the manually-derived prior. We conclude by pointing to (Cussens, 2000), where a much more detailed account of SLPs is given, and where the EM algorithm is applied to estimate SLP parameters from data.

Acknowledgments

Special thanks to Blaise Egan for preventing me from re-inventing the wheel and Gillian Higgins for putting up with me. Thanks also to Stephen Muggleton and Suresh Manandhar for useful discussions.
References

Chipman, H. A., George, E. I., and McCulloch, R. E. (1998). Bayesian CART model search. Journal of the American Statistical Association, 93(443):935–960. With discussion.

Cussens, J. (1999). Loglinear models for first-order probabilistic reasoning. In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 126–133, San Francisco, CA. Morgan Kaufmann.

Cussens, J. (2000). Parameter estimation in stochastic logic programs. Submitted to Machine Learning.

Denison, D. G. T., Mallick, B. K., and Smith, A. F. M. (1998). A Bayesian CART algorithm. Biometrika, 85(2):363–377.

Gelman, A., Carlin, J., Stern, H., and Rubin, D. (1995). Bayesian Data Analysis. Chapman & Hall, London.

Muggleton, S. (1996). Stochastic logic programs. In de Raedt, L., editor, Advances in Inductive Logic Programming, pages 254–264. IOS Press.

Muggleton, S. (2000). Semantics and derivation for stochastic logic programs. In UAI-2000 Workshop on Fusion of Domain Knowledge with Data for Decision Support.

Riezler, S. (1998). Probabilistic Constraint Logic Programming. PhD thesis, Universität Tübingen. AIMS Report 5(1), 1999, IMS, Universität Stuttgart.