搜索
您的当前位置:首页正文

M. A View-based Consistency model based on transparent data selection in distributed shared

来源:尚车旅游网
Department of Computer Science,

University of Otago

Technical Report OUCS-2004-03

A View-based Consistency Model based on Transparent Data Selection in Distributed Shared

Memory

Authors: Z. Huang

Department of Computer Science, University of Otago

C. Sun

School of Computing & Information Technology Griffith University, Brisbane, Australia

S. Cranefield, and M. Purvis

Department of Information Science, University of Otago

Status: Not yet submitted

Department of Computer Science,

University of Otago, PO Box 56, Dunedin, Otago, New Zealand

http://www.cs.otago.ac.nz/trseries/

AView-basedConsistencyModelbasedonTransparentDataSelectioninDistributedShared

Memory

Z.Huang,C.Sun,S.Cranefield,andM.Purvis

DepartmentsofComputer&InformationScienceUniversityofOtago,Dunedin,NewZealand

SchoolofComputing&InformationTechnology

GriffithUniversity,Brisbane,AustraliaEmail:hzy@cs.otago.ac.nz,scz@cit.gu.edu.au,

mpurvis,scranefield@infoscience.otago.ac.nz

Abstract

ThispaperproposesanovelView-basedConsistencymodelforDistributedSharedMemory,inwhichanewconcept,view,iscoined.Aviewisasetofdataobjectsthatapro-cessorhastherighttoaccessinthesharedmemory.TheView-basedConsistencymodelonlyrequiresthatthedataobjectsofaprocessor’sviewareupdatedbeforeaprocessoraccessesthem.Inthisway,itcanachievethemaximumrelaxationofconstraintsonmod-ificationpropagationandexecutionindata-race-freeprograms.Thispaperfirstbrieflyreviewsanumberofrelatedconsistencymodelsintermsoftheiruseofthreetechniques–time,processoranddataselection–whicheacheliminatesomeunnecessarypropaga-tionofmemorymodificationswhileguaranteeingsequentialconsistencyfordata-race-freeprograms.Then,wepresenttheView-basedConsistencymodelanditsimplementation.Incontrastwithothermodels,theView-basedConsistencymodelcanachievetransparentdataselectionwithoutprogrammerannotationandcanofferthemaximumperformanceadvantage.Differencesamongrelatedworkarediscussedthroughillustrativeexamples.

1

PerformanceevaluationhasshownthatourimplementationoftheView-basedConsistencymodeloutperformstheLazyReleaseConsistencymodel,andthattheView-basedConsis-tencymodelhasadvantagesoveroptimalconsistencyprotocolssuchastheAffinityEntryConsistencyprotocol.Finallywesummarisesourcontributionsandpointsoutourfuturedirectionofimplementationeffortfordistributedsharedmemorysystems.

KeyWords:DistributedSharedMemory,SequentialConsistency,RelaxedSequentialConsistency,EntryConsistency,ScopeConsistency,LazyReleaseConsistency,Timese-lection,Processorselection,Dataselection,View-basedConsistency,Viewdetection,Viewtransition,FalseSharing

1Introduction

ADistributedSharedMemory(DSM)systemcanprovideapplicationprogrammerstheillusionofsharedmemoryontopofmessage-passingdistributedsystems,whichfacilitatesthetaskofparallelprogrammingindistributedsystems.DistributedSharedMemoryhasbecomeanactiveareaofresearchinparallelanddistributedcomputingwiththegoalsofmakingDSMsystemsmoreconvenienttoprogramandmoreefficienttoimplement[21,12,20,9,7,6,3,14,15].TheconsistencymodelofaDSMsystemspecifiesorderingconstraintsonconcurrentmem-oryaccessesbymultipleprocessors,andhencehasfundamentalimpactonDSMsystems’programmingconvenienceandimplementationefficiency[22].TheSequentialConsistency(SC)model[19]hasbeenrecognizedasthemostnaturalanduser-friendlyDSMconsistencymodel.TheSCmodelguaranteesthattheresultofanyexecutionisthesameasiftheoper-ationsofallprocessorswereexecutedinsomeglobalsequentialorder,andtheoperationsofeachindividualprocessorappearinthissequenceintheorderspecifiedbyitsownprogram[19].ThismeansthatinanSC-basedDSMsystem,memoryaccessesfromdifferentproces-sorsmaybeinterleavedinanysequentialorderthatisconsistentwitheachprocessor’sorderofmemoryaccesses,andtheordersofmemoryaccessesobservedbydifferentprocessorsarethesame.OnewaytostrictlyimplementtheSCmodelistoensureallmemorymodificationsbetotallyorderedandmemorymodificationsgeneratedandexecutedatoneprocessorbepropa-gatedtoandexecutedinthatorderatotherprocessorsinstantaneously.Thisimplementationiscorrectbutitsuffersfromseriousperformanceproblems[25].

2

Inpractice,notallparallelapplicationsrequireeachprocessortoseeallmemorymodifi-cationsmadebyotherprocessors,letalonetoseetheminorder.Manyparallelapplicationsregulatetheiraccessestoshareddatabysynchronization,sonotallvalidinterleavingsoftheirmemoryaccessesarerelevanttotheirrealexecutions.Therefore,itisnotnecessaryfortheDSMsystemtoforceaprocessortopropagateallitsmodificationstoeveryotherprocessor(withacopyoftheshareddata)ateverymemorymodificationtime.Undercertainconditions,theDSMsystemcanselectthetime,theprocessor,andthedataforpropagatingsharedmem-orymodificationsinordertoimprovetheperformancewhilestillappearingtobesequentiallyconsistent[24].Forexample,consideraDSMsystemwithfourprocessors

,

,

,and

,where

,

,and

shareadataobject,

,and

shareadataobject

,and

and

shareadataobject,asshowninFig.1.Thedataobjectsand

aresharedamong

processorsatalatertimenotshowninthisscenario.

w(x)P1w(z)w(y)w(x1)w(x2)P2

r(x) w(x)r(x) r(x1)program order

P3

P4

r(y)w: write r: read

Figure1:AscenarioofaDSMprogram

Supposeallmemoryaccessestoshareddataobjectsareserializedamongcompetingproces-sorsbymeansofsynchronizationoperationstoavoiddataraces.Underthesecircumstances,thefollowingthreebasictechniquescanbeused[24].Timeselection:Modificationsonashareddataobjectbyoneprocessorarepropagatedtootherprocessorsonlyatthetimewhenthedataobjectistobereadbythem.Forexample,modificationsonby

maybepropagated

outwardonlyatthetimewheneither

or

isabouttoread.Processorselection:Modifi-cationsonashareddataobjectarepropagatedfromoneprocessortoonlyoneotherprocessorwhichisthenextoneinsequencetoreadtheshareddataobject.Forexample,modificationson

by

maybepropagatedto

(butnotto

,

)if

isthenextoneinsequencetoread

.Dataselection:Processorspropagatetoeachotheronlythoseshareddataobjectsthatare

reallysharedamongthem.Forexample,object

,and

maypropagatetoeachotheronlydata

(not),and

and

propagatetoeachotheronlydataobject(not).

ToimprovetheperformanceofthestrictSCmodel,anumberofRelaxedSequentialConsis-tency(RSC)modelshavebeenproposed[10,13,18,5,17],whichperformoneormoreofthe

3

abovethreeselectiontechniques.RSCmodelscanbealsocalledconditionalSequentialCon-sistencymodelsbecausetheyguaranteeSequentialConsistencyforsomeclassofprogramsthatsatisfytheconditionsimposedbythemodels.Thesemodelstakeadvantageofthesyn-chronizationsindata-race-freeprogramsandrelaxtheconstraintsonmodificationpropagationandexecution.Thatmeansmodificationsgeneratedandexecutedbyaprocessormaynotbepropagatedtoandexecutedatotherprocessorsimmediately.MostRSCmodelscanguaranteeSequentialConsistencyfordata-race-freeprogramsthatareproperlylabelled[13](i.e.,explicitprimitives,providedbythesystem,shouldbeusedforsynchronizationintheprograms).However,intheseRSCmodelsdataselectionisachievedbyprogrammerannotation.Theprogrammerisrequiredtoprovideannotationsoftheassociationbetweendataobjectsandsynchronizationobjects,suchaslocksandbarriers,sothattheDSMsystemcanknowwhichdataobjectsshouldbepropagatedwhenasynchronizationobjectisaccessed.Thistypeofannotationisanextraburdenontheprogrammerandcausesanincreaseofthecomplexityofparallelprogramming.Theeaseofprogrammingwouldbeimprovediftheneedforhumanannotationcouldbereplacedwithautomaticdetectionoftheassociation.Automaticdetectioncanbedoneatruntimeand/orcompiletime.Compile-timedetectionmaycauseaslowercompilingprocess,buthaslittlerun-timeoverhead.Run-timedetectionmayhavemorerun-timeoverhead,butcangetmoreaccurateinformation.ThispaperproposesanovelView-basedConsistency(VC)modelwhichcanachieve,withrun-timeautomaticviewdetection,dataselectiontransparentlyinadditiontotimeandprocessorselection.

Therestofthispaperisorganisedasfollows.Section2brieflypresentsthebackgroundtoourwork,especiallytheclassificationofRSCmodelsintermsoftime,processor,anddataselection.Section3presentstheVCmodel[16],whosepropertiesaredescribedindetail.Section4discussesimplementationissuesandpresentsourimplementationoftheVCmodel.Section5describesthedifferencesamongrelatedworkthroughillustrativeexamples.Section6presentsandevaluatesperformanceresults.Finally,themajorcontributionsofthispaperandsomesuggestionsforfutureworkaresummarizedinSection7.

4

2RelaxedSequentialConsistencyModels

DuringtheexecutionofaDSMparallelprogram,multipleprocessorscommunicatewitheachotherthroughthevirtualsharedmemory.Insharedmemorysomedataobjectsareread-only,andsomeareread/write.Topreventdataraces(wheremultipleprocessorsreadandwritethesamedataobjectconcurrently),aparallelprogramhastoguaranteethataprocessorhasgainedexclusiveaccessbeforeaccessingaread/writedataobject.

RSCmodelsdistinguishsynchronizationdataobjectsfromordinarydataobjectsinsharedmemory.Synchronizationdataobjects,suchaslocksandbarriers1,arethosethatareexplic-itlyusedtoenforceexclusiveaccesstootherdataobjects.Therestofthedataobjectsinsharedmemoryarecalledordinarydataobjects.RSCmodelsrequiresequentialconsistencyforthesynchronizationdataobjectsinsynchronizationprimitives,suchasacquireandrelease.RSCmodelsassumethereisonlyonememorycopyforthesynchronizationdataobjectsandaccessestothesamedataobjectareexecutedsequentially.Forthestorageofordinarydataobjects,RSCmodelsadoptareplicatedarchitecturetoimproveperformanceunderthedis-tributedenvironment.Copiesarestoredinthelocalmemoryofeachparticipatingprocessor.Modificationsgeneratedandexecutedlocallyatsometimebyoneoftheprocessorsmaybepropagatedtoandexecutedbyotherprocessorsatalatertime.Thisdelaymaycausetheviola-tionofsequentialconsistency,but,ifhandledproperly,canbeusedtoimprovetheperformanceoftheDSMsystem.Inordertoachievesequentialconsistencywhiletakingadvantageofthedelay,RSCmodelsdonotallowanydataraceonordinarydataobjectsintheprogram.Exclu-siveaccesstoordinarydataobjectshastobeguaranteedbyexplicitlyusingsynchronizationprimitives.Ifdataracesonordinarydataobjectsarepreventedbyusingthesynchronizationprimitivesintheprogram,mostRSCmodelscanguaranteesequentialconsistencyfortheor-dinarydataobjects.Insummary,RSCmodelsstrictlyimplementsequentialconsistencyforsynchronizationdataobjects,whiletheyimplementsequentialconsistencyforordinarydataobjectsundertheconditionthatthereisnodataraceonthem.Thiscanbeguaranteedbyproperlylabellingtheprogramwiththesynchronizationprimitives.Aprogramissaidtobeproperlylabelledifthereisnodataraceintheprogramduetotheuseofthesynchronizationprimitivessuchasacquire,releaseandbarrier[13].AnexecutionofsuchaDSMprogramcan

beviewedasasequenceofbarriersessionsasshowninFig.2.

barrier session

non−criticalnon−criticalBcriticalAregionRcriticalAregionRregionregionB: barrier A: acquire R:release

barrier

sessionBBprogram order

Figure2:Aviewofaprogramexecutionbasedontheconceptofregion

Abarriersessionbeginswithabarrierandendswithanotherbarrier.Insideabarriersessionthereisasequenceofregionscomprisingcriticalregionsandnon-criticalregions.Theseregionsaredelimitedbyacquire,releaseandbarrierprimitives.Acriticalregionisasectionofcodethatensuresthatonlyoneprocessorexecutesthesectionofcodeatanytime.Itbeginswithanacquireandendswitharelease.Anon-criticalregionbeginswitharelease(theoutermostonewhentherearenestedcriticalregions)orabarrierandendswithanacquire(theoutermostonewhentherearenestedcriticalregions)orabarrier.Anon-criticalregiondoesnotoverlapwithanycriticalregion,butacriticalregionmaybenestedwithinanothercriticalregion.

Inanexecutionofaproperlylabelledprogram,accessestodataobjectsmayhavesomecausalordering.Thecausalorderingisessentialtothedefinitionofdataracesanddata-race-freeprograms.TheRSCmodelstakeadvantageoftheassumptionthatprogramsaredata-racefreetooptimizemodificationpropagationandexecution.Definition1CausalOrderingRelation“

AtahighlevelwecanmodelaDSMprogramasasetofsequencesofactions(onesequenceforeachprocessor).Theseactionsincludememoryaccessesandsynchronizationprimitivessuchasacquire,releaseandbarrier.Giventwoactions

and

,generatedbyprocessors

and,respectively,then

ifandonlyif(1)

,andthegenerationof

happened,suchthat

beforethegenerationof

inprogramorder;or(2)

,

isarelease,

isanacquireon

thesamelockas

,and

releasesthelockto

;or(3)thereexistsanaccess

and

.

Definition2PreviousandconcurrentactionsGivenanytwoactions

and

,(1)

issaidtobepreviousto

ifandonlyif

;(2)

and

aresaidtobeconcurrent(written)ifandonlyifneithernor

.

6

Definition3Conflictingaccesses,dataraceanddata-race-freeprogramsTwomemoryaccesses

and

conflictifandonlyif

and

accessthesamememory

locationandatleastoneisawrite.Iftherearetwoconflictingaccessesexecutionofaprogram,and

and

inan

,thenwesayadataraceoccursintheexecution.Aprogram

isdataracefreeifandonlyifeverypossibleexecutionoftheprogramhasnodatarace.Fromtheabovedefinitionsweknowthatifnodataraceoccurs,thenwhenaprocessormodifiesadataobjectinsideacriticalregion,otherprocessorswillnotaccessthatdataobject.Thusitisnotnecessarytopropagatemodificationsduringtheexecutionofacriticalregion.TheRSCmodelshavetakenadvantageofthisrelaxationofmodificationpropagationtoachievetime,processor,anddataselection.Theycanguaranteesequentialconsistencyfordata-race-freeprogramswhileimprovingtheirperformance.InthefollowingwebrieflydescribetheRSCmodelsintermsofthethreeselectiontechniques.

TheWeakConsistency(WC)model[10]isanRSCmodelwhichachievestimeselection.TheWCmodelrequiresaprocessortopropagateallitsmodificationstootherprocessorsonlyatsynchronizationtime,ratherthanateverymemorymodificationtime.Withtimeselection,modificationsonshareddataobjectscanbeaccumulatedandonlythefinalresultsareprop-agatedinbatchesatsynchronizationtime.Inthisway,thenumberofmessagesintheWCsystemscanbegreatlyreducedcomparedtothatinstrictSCsystems.

TheEagerReleaseConsistency(ERC)model[13]takestimeselectiononestepfurtherthantheWCmodelbydistinguishingtwodifferentsynchronizationprimitives:acquireandrelease,whicharetheentryandexitofacriticalregion,respectively.TheERCmodelrequiresthatsharedmemorymodificationsbepropagatedoutwardonlyatreleasetime.Inotherwords,theERCmodelismoretimeselectivethantheWCmodelbypropagatingmodificationsoutwardonlyattheexitofacriticalregion,insteadofatboththeentryandexitofacriticalregionasintheWCmodel,thusfurtherreducingthenumberofmessagesinthememorysystem.TheLazyReleaseConsistency(LRC)model[18]improvestheERCmodelbyperformingbothtimeandprocessorselection.InsteadofpropagatingmodificationstoallotherprocessorsatreleasetimeasinERC,LRCpostponesthepropagationofmodificationsuntilanotherpro-cessorhassuccessfullyperformedanacquire.Atasuccessfulacquire,theDSMsystemisabletoknowpreciselywhichprocessoristhenextonetoaccesstheshareddata,somodificationscanbepropagatedonlytothatparticularprocessor(ornopropagationatallisrequiredifthe

7

nextprocessoristhecurrentprocessor),thusachievingprocessorselectionintheLRCmodel.Bysendingmodificationsonlytotheprocessorthatisenteringacriticalregion,agreaterre-ductioninthenumberofmessagescanbeachievedintheLRCmodel.

TheEntryConsistency(EC)model[5]isverysimilartoLRCinpropagatingmodificationsonlytothenextprocessorenteringacriticalregion.Inadditiontotimeandprocessorselection,theECmodelalsoperformsdataselectionbypropagatingonlythoseshareddataobjectsthatareassociatedwithasynchronizationdataobjectsuchasalock.Theseassociationsareanno-tatedbytheprogrammer.Duetotheinclusionofdataselection,theECmodelcanbemoreefficientthantheLRCmodel[6].

TheScopeConsistency(ScC)model[17]isabletooffermostofthepotentialperformanceadvantagesoftheECmodelbymeansoftime,processor,anddataselection,anditalsoim-provestheprogrammabilityoftheECmodelbyrequiringprogrammerstoattachconsistencyscopeswithcodesections,insteadofdata.Forcriticalregions,dataobjectscanbeautomat-icallyassociatedwithscopes;buttheprogrammerhastoannotatethescopesexplicitlyfornon-criticalregions.Iftheannotationisnotprovidedorisincorrect,ScCdoesnotguaranteesequentialconsistencyfortheprogram.

Fromtheabovediscussionwecanseethatconstraintsonmodificationpropagationandex-ecutionhavebecomemoreandmorerelaxed.ThisrelaxationallowsDSMsystemstoperformtime,processor,anddataselection.ToachievetheseselectionsRSCmodelsrequireprogram-merstoannotatetheprogramsmanuallysothatselectionscanbecombinedwithsynchro-nizationprimitives.Forexample,theERCmodelrequiresprogramstobeproperlylabelledwithsystem-providedsynchronizationprimitives,sothattheDSMsystemisexplicitlynotifiedofaprocessor’sentrytoandexitfromacriticalregionandcantherebyselecttheexittimetopropagatemodifications.TheECmodel,furthermore,requirestheprogrammertoassociatesynchronizationdataobjectsexplicitlywithordinarydataobjectstoachievedataselection.TheScCmodelmadeonesteptoward(partially)transparentdataselectionbytakingadvantageoftheconsistencyscopesimplicitlydefinedbysynchronizationprimitives,butprogrammersmaystillhavetodefineadditionalconsistencyscopesexplicitlyinprogramsinordertoguaranteesequentialconsistency.Weshouldbeverycautiousofrequiringprogrammerannotationwhichimposesanextraburdenonprogrammersandincreasesthecomplexityofparallelprogram-ming.

8

Wedistinguishtwotypesofprogrammerannotations:synchronizationannotationswhicharerequiredtoensureboththecorrectnessofparallelprograms(toavoiddataraces)andthecorrectnessofmemoryconsistency;andannotationswhicharerequiredonlyforthecorrect-nessofmemoryconsistency.Forthefirsttypeofannotations,suchasacquireandreleaseprimitivesintheERC,LRC,ECandScCmodels,theDSMsystemcantakeadvantageofthemtoachievetimeand/orprocessorselectionwithoutimposinganadditionalburdenonprogrammers.However,forthesecondtypeofannotations,suchastheassociationbetweensynchronizationobjectsanddataobjectsfordataselectionintheECmodel,andtheaddi-tionalconsistencyscopesintheScCmodel,theyaretrulyanextraburdenonprogrammersandincreasethecomplexityofparallelprogramming.Theyshouldbereplacedbyautomaticassociationsviarun-timedetectionand/orcompile-timeanalysis[11].

InthefollowingsectionwepresenttheView-basedConsistencymodel,whichachievestime,processor,anddataselectionandremovestheextraburdenonprogrammersrequiredbyprevi-ousdataselectiontechniques.

3View-basedConsistency

TheView-basedConsistency(VC)model[16]hasbeenproposedtoachievedataselectiontransparentlywithoutprogrammerannotation.SimilartootherRSCmodels,itguaranteesSequentialConsistencyforproperlylabelleddata-race-freeprograms.

Aviewisasetofordinarydataobjectsthataprocessorhastherighttoaccessinthesharedmemoryataparticularpointintime.Wesayaprocessorhastherighttoaccesssomedataobjectifandonlyifithasgainedexclusiveaccesstothedataobjectorthedataobjectisread-only.

Ingeneralwesaythattheviewofaprocessorisasetofdataobjectsthattheprocessorhastherighttoaccess.Indata-race-freeprograms,wecansaythattheviewcomprisesdataobjectsthattheprocessorwillaccessinsomefollowingperiodofexecution,sincetheprocessorshouldhavegottheaccessrighttothedataobjects.

Atanytimepointofanexecutioninadata-race-freeprogram,supposeanytwoprocessors

and

haveviews

and

,respectively.Then

mustonlycontainread-onlydata

objects,otherwiseadataracewouldoccur.ThisisillustratedinFig.3.

9

read−only datashared memoryP1P2P3Figure3:Asnapshotofprocessors’viewsinadata-race-freeprogram

InaDSMprogram,exclusiveaccesstoadataobjectcanonlybegainedinthefollowingthreeways:

1.implicitassignmentbytheprogrammerbymakinguseofabarriersession.Exclusiveaccessisguaranteedbybarriers.

2.explicitacquisitionbycallingtheacquireprimitive.Exclusiveaccessisguaranteedbythelockmechanismofcriticalregions.

3.implicitacquisitionguaranteedbytheprogrammerwhocontrolsaccesstothedataobjectbyassociatingitwiththestatusofanothercritical-region-protectedobject.Forexample,exclusiveaccesstoataskcanbeguaranteedbyremovingthetaskfromalock-protectedtaskqueue(andthusthestatusofthetaskqueuehasbeenchanged).

Therefore,inanexecutionofaDSMprogram,onlywhenaprocessorcallssynchronizationprimitives,e.g.,barrier,acquire,andrelease,doesitsviewchange.Aprocessor’sviewisconstantinsideacriticalornon-criticalregion.Onlywhenaprocessormovesfromoneregiontoanotherdoesitgainorloseexclusiveaccesstosomedataobjects,whichcausesachangetotheviewoftheprocessor.

Accordingtotheaboveobservation,viewscanbehandledandprocessedbasedonregions(criticalandnon-critical).Forconvenienceofdescriptioninthispaper,weclassifyviewsintoCriticalRegionViews(CRVs)andNon-criticalRegionViews(NRVs).ACRVistheviewofaprocessorwhileitexecutesacriticalregion,andanNRVistheviewofaprocessorwhileitexecutesanon-criticalregion.Indata-race-freeprograms,aCRVconsistsofthedataobjectsaccessedinthecriticalregion,andanNRVconsistsofthedataobjectsaccessedinthenon-criticalregions.

10

ConsistencymaintenanceinVCrequiresupdatingdataobjectsoftheviewbeforeaprocessorentersaregion.Moreprecisely,thefollowingconsistencyconditionsaregivenfortheView-basedConsistency(VC)model.AnyimplementationofVCshouldsatisfytheseconditions.Definition4ConditionsforView-basedConsistency

Beforeaprocessor

isallowedtoenteracriticalornon-criticalregion,allprevious

writeaccessestotheordinarydataobjectsoftheCRVorNRVmustbeperformedwithrespectto

accordingtotheircausalorder.

Beforeaprocessorisallowedtopassabarrierprimitive,allpreviouswriteaccesses

mustbeperformedwithrespectto

accordingtotheircausalorder.

Thesequentialconsistencyofsynchronizationdataobjectsmustbeguaranteedbytheimplementationofthesystemprimitivessuchasacquire,release,andbarrier.

Awriteaccesstoamemorylocationissaidtobeperformedwithrespecttoprocessoratimepointwhenasubsequentreadaccesstothatlocationbywriteaccess.

at

returnsthevaluesetbythe

Insideabarriersession,basedonthefirstcondition,aprocessorisguaranteedtoaccesstheup-to-dateversionofthedataobjectsinitsview.

Thesecondconditionstatesthatwhenaprocessorreachesabarrier,itshouldbeabletoseeallthepreviousmodificationsmadebyotherprocessors.ThisconditionisalsorequiredinEC,ScCandLRC.

SCcorrectnessofVCProcessorsaresynchronisedtomodifythesameviewoneafteran-other,butmaymodifydifferentviewsconcurrentlyinanydata-race-freeprogram.Basedonthisobservation,foranyparallelexecutionofaDSMprogramunderVC,wecanproduceaglobalsequentialorderofthemodificationsonviews,inwhichthemodificationsonthesameviewareorderedinthesamewayasthesynchronisedorderoftheparallelexecution,andthemodificationsondifferentviewsareputinprogramorderiftheyareexecutedsequentiallyintheprogram,otherwisetheyareparallelandputinanyorder.Parallelmodificationsondiffer-entviewscanbeexecutedinanyorder,whichwillnotaffecttheexecutionresult.Obviously,accordingtotheconsistencyconditionsforVC,theparallelexecutionresultoftheprogramun-derVCisthesameastheabovesequentialexecutionofthemodifications.Therefore,aglobal

11

sequentialorderhasbeenfoundtomatchtheparallelexecutionresultunderVC.AccordingtothedefinitionoftheSCmodel,VCcanguaranteeSequentialConsistencyfordata-race-freeprograms.

ThefirstconsistencyconditionofVCalsoindicatesthat,whenaprocessorisenteringanewregion,itisonlyrequiredtoupdatethosedataobjectsinthecorrespondingnewview.Inthisway,VCachievestimeselection(attheentryofaregion),processorselection(thenextprocessorwhichhastherighttoaccesstheview),anddataselection(onlythedataobjectsoftheview).

AnyimplementationoftheVCmodelshouldconformwiththeaboveconsistencycondi-tions.Therearetwoimportanttechnicalissuesintheimplementation:viewdetectionandviewtransition.Viewdetectionmeansidentifyingallthedataobjectsinthenewviewofapro-cessorbeforeitentersanewregion.Indata-race-freeprograms,theviewconsistsofthedataobjectstobeaccessedbytheprocessorinthenewregion.Thusviewdetectionisapredictionofthedataobjectsthatwillbeaccessedbytheprocessorduringexecutionofthenewregion.Viewtransitionmeansupdatingallthedataobjectsinthenewviewofaprocessorbeforeitentersanewregion.Inshort,beforeaprocessorentersanewregion,itsviewshouldhavebeendetected;whenaprocessorentersanewregion,theviewtransitionshouldhavebeenachieved.AnyimplementationoftheVCmodelshouldguaranteethatviewdetectionandtransitionareimplementedcorrectly.

Correctnessandaccuracyaretwoimportantissuesinviewdetection.Acorrectviewshouldincludealldataobjectsthataprocessorhastherighttoaccess.Anaccurateviewshouldincludeandonlyincludethosedataobjects.ThecorrectnessofviewdetectionmustbesatisfiedinVCimplementation,whileinaccuracyofviewdetectionmayonlyaffectitsperformance(e.g.,propagationofirrelevantmodificationsofdataobjects).Ofcourse,theperformancealsodependsontheeffectivenessofviewdetectionandviewtransition.WewilldiscusstechniquesforviewdetectionandtransitioninSection4.

Inthefollowingdiscussion,weassumetheviewofeveryprocessoriscorrectlyandaccu-ratelydetectedunderanidealsituation,inordertoexplainhowthegenericVCmodelworks.Fig.4showshowtheVCmodelworksusingthesamescenarioshowninFig.1.InFig.4,weassumetheaccessestodataobject

aresynchronisedbythestatusof,asinthethird

wayforacquiringexclusiveaccessdescribedearlyinthissection.Underthisassumptionitis

12

P1A(l1)w(z)w(y)R(l1)P2P3P4

xA(l2)w(x)R(l2)w(x1)w(x2)x A(l2) r(x)w(x)R(l2)xx1A(l2) r(x)R(l2) r(x1)yA(l1) r(y)R(l1)w: write r: read A: acquire R: release

: request update on x and execute it at the processorFigure4:View-basedConsistencyinaction

andtoracefor.

program order

notpossiblefor

Weassumetherearetwocriticalregionswhichareprotectedbylocksand

and

,respectively,

when

and

accessdataobjectsinnon-criticalregions.Wealsoassumethattheviewofevery

processorhasbeendetectedcorrectlyateachstep.Thedetectedviewofitentersthecriticalregion,thedetectedviewofcriticalregions,andthedetectedviewof

includes

and

includes

whentheyentertheir

inthenon-criticalregionincludes

.Before

executesthecriticalregion,onlythemodificationofitsview.For

ispropagatedtotheprocessortoupdate

and

,onlythemodificationon

ispropagatedtothemtoupdatetheirviews

inthesamecriticalregion.When

entersthenon-criticalregion,onlythemodificationon

ispropagated.Thelockacquisitionandmodificationpropagationareseparateinthefigure,

buttheycanbecombinedinanimplementationinordertoimproveDSMperformance.ComparisonwithrelatedmodelsAmongtheRSCmodels,onlyScC[17]andEC[5]canachievedataselection.However,theycannotguaranteeSCcorrectnessfordata-race-freeprograms(seeSection5fordetails).InordertoguaranteeSCcorrectness,extraprogrammerannotationsarerequiredinprograms.Toselectivelyupdatedataobjects,ECusesguardedshareddata

andScCusesscope.Guardedshareddata

isasetofdataobjectsassociated

withasynchronisationdataobject.Thisassociationisspecifiedbytheprogrammer.AscopeinScCconsistsofsomeprogramsectionsprotectedbyasynchronizationdataobjectsuchasalock.Both

andscopearestaticandfixedforaparticularsynchronizationdataobjector

acriticalregion.Evenifsomedataobjectsarenotgoingtobeaccessedbyaprocessorinacriticalregion,theyareupdatedsimplybecausetheyareassociatedwiththelockorthecriticalregion.Forexample,inFig.4,suppose

and

areassociatedwithlock

inEC.Thusafter

acquires

,modificationson

andarepropagatedtoit,eventhoughitisnotgoingto13

access.Thiscircumstanceisnotuncommoninparallelprograms.Incontrast,aviewinVC

isdynamicandmaybedifferenteverytimethesamecriticalregionisentered.Forexample,inFig.4,althoughincludes

updatesboth

and

inthe

-relatedcriticalregion,

’sviewonly

when

enterstheunderVC.

-relatedcriticalregion.Thereforeonlythemodificationon

ispropagatedto

ComparedtoLRC,VCcanreducemoreofthefalse-sharing2effectinpage-basedDSMsystems(seeSection5fordetails).Thefalse-sharingeffectisthepropagationofuselessmod-ifications,whichiscausedbyfalsesharing.

ProgramminginterfaceTheVCmodelprovidesthesameprogramminginterfaceasLRC,ERCandWC.ItcanguaranteeSequentialConsistencyfordata-race-freeprogramsthatareproperlylabelled.Incontrast,ECrequirestheprogrammertoprovidecorrectlock-dataasso-ciation.Ifthelock-dataassociationisnotcorrect,ECdoesnotguaranteethecorrectexecutionoftheprogram.Similarly,ScCdoesnotguaranteethesameexecutionresultasSequentialConsistencyforsomedata-race-freeprogramsifexplicitscopeannotationsarenotcorrectlyprovidedbytheprogrammer(seeSection5fordetails).

Fromtheabovediscussion,weknowthat,inthegenericVCmodel(anidealimplementationofVC)whereviewscanbeaccuratelydetected,nouselessmodificationswillbepropagatedamongprocessors.Therefore,thegenericVCmodelcanachievethemaximumdataselection.Inotherwords,thegenericmodelcanachievethemaximumrelaxationofconstraintsonmod-ificationpropagationandexecutionindata-race-freeprograms.Asaconsequence,webelievethegenericVCmodelcanbeusedasaframeworkforfutureresearchonDSMimplementa-tions.

Tosummarise,wemakethefollowingstatementswhicharetrueinthegenericVCmodel.

Theviewofaprocessorisconstantwithinaregion.

Whenaprocessorentersanewregion,onlythedataobjectsofitsnewviewareupdated.

SequentialConsistencyfordata-race-freeprogramsisguaranteed.

Time,processor,anddataselectionareachieved.Dataselectionisachievedwithoutprogrammerannotations.

Maximumrelaxationofconstraintsonmodificationpropagationandexecutionfordata-race-freeprogramsisachieved.

Inthefollowingsection,wewilldiscussourimplementationoftheVCmodel.

4Implementation

WehaveimplementedtheVCmodelbasedonTreadMarks[3],whichisapage-basedDSMsystem.InTreadMarks,adiffisusedtorepresentmodificationsonapage.Initiallyapageiswrite-protected.Whenawrite-protectedpageisfirstmodifiedbyaprocessor,atwinofthepageiscreatedandstoredinthesystemspace.Whenthemodificationsonthepageareneededbyanotherprocessor,acomparisonofthetwinandthecurrentversionofthepageisdonetocreateadiff,whichcanthenbeusedtoupdatecopiesofthepageinotherprocessors.Inourimplementation,apageistreatedasthebasicunitofdataobjects.Thusaviewintheimplementationconsistsofpages.Thediffscanberegardedasmodificationsonpages.

4.1Viewdetection

Inconsistencymaintenanceweonlyneedtoknowwhicharethemodifieddataobjectsandthenupdatethem.Likewise,tomaintaintheconsistencyofaview,weonlyneedtoupdatethemodifieddataobjectsintheview.Therefore,wearenotinterestedintheunchangedpagesofaviewandthusonlythemodifiedpagesarerecordedinourimplementationofviewdetection.Inourimplementation,viewdetectionisachievedatruntime.Todetectmodifiedpagesinaview,ourimplementationtakesadvantageofthefollowingtwoexistingmechanismsinTreadMarks:

1.Whenawriteaccessisperformedonaninvalidatedpage,apagefaultwilloccur.Apagefaulthandlerwillrequestthediffsofthepagefromotherprocessors.Weextendedthepagefaulthandlertostorethepage’sidentifierintothecorrespondingview.

15

2.Whenawriteaccessisperformedonawrite-protectedpage,aprotectionviolationin-terruptwilloccur.Aninterrupthandlerwillmakeatwinofthepageunderthemultiple-writerscheme[8].Weextendedtheinterrupthandlertostorethepage’sidentifierintothecorrespondingview.

SincetheabovetwomechanismsalreadyexistedinTreadMarks,therewaslittleoverheadforstoringtheidentifiersofmodifiedpages.4.1.1Viewdetectionincriticalregions

Viewsincriticalregionsarerelativelyeasytodetect.AllthepagespreviouslymodifiedinthesamecriticalregionareincludedinthecorrespondingCRV,becausetheywillbeverylikelyaccessedlaterbyaprocessorexecutingthesamecriticalregion,althoughprogramlogiclikeifstatementsmayaffecttheaccesspatternofaprocessor.However,thepagesmerelymodifiedfromoutsideacriticalregionareexcludedfromtheCRV,sincetheirdiffsshouldn’tbeusedtoupdatetheCRV,asthiswouldindicatethereisadataraceintheprogram.

WeconstructaCRVbyrecordingpagesmodifiedinsideacriticalregion.However,ifapageisalreadywritablebeforeanewregionisentered,thatpagewillnotbedetectedandstoredintheCRVorNRVifitismodifiedintheregion.Todetectallmodifiedpagesinaregion,wemakeallwritablepageswrite-protected(read-only)whenenteringtheregion.Thisistheadditionaloverheadrequiredforviewdetection.Itisproportionaltothenumberoftimescriticalregionsareexecutedintheprogram.Fortunately,theoverheadwillbebalancedbytheperformancegainwhichisalsoproportionaltothenumberoftimescriticalregionsareexecuted.Ourexperimentalresultsdemonstratethisadditionaloverheadisrelativelysmall.Basedontheabovewrite-protectionmechanism,ourviewdetectionalgorithmworksasbelow.Apageset

isassociatedwitheachlockprotectingacriticalregion.Initially

is

empty.Duringanexecutionofthecriticalregion,ifapage

isdetectedasmodified,then

.Thepageset

includesallthepagesmodifiedinthecriticalregion.Since

aprocessorenteringthecriticalregionwillverylikelyaccessthepagesinthedetectedCRV,andisreasonablyaccurate.

,

isusedas

16

4.1.2Viewshrinkageinnon-criticalregions

Unfortunately,viewsinnon-criticalregionsarenotaseasytodetectasincriticalregions,becausetherearenoobviousconnectionsbetweennon-criticalregions.Themodifiedpagesinpreviousnon-criticalregionsmaynotbelongtotheviewofasuccessivenon-criticalregion.However,asforaCRV,itiscertainthatanNRVshouldonlyincludepagesmodifiedinpreviousnon-criticalregions,sincemodificationsmadeincriticalregionsshouldn’tbeusedtoupdateanNRV,otherwise,itindicatesthatadataraceoccursintheprogram.Ofcourse,amodifiedpagemaybeincludedinbothaCRVandanNRVorintwoCRVsbecauseofthefalse-sharingproblem.

Inourimplementation,anNRVinitiallyconsistsofallpagespreviouslymodifiedinnon-criticalregionsandthusadetectedNRVmaybelargerthantherealone.Thismeansthatwhenaprocessorexecutesanon-criticalregionitmaynotaccesssomepagesintheNRV.Thisinaccuracyonlyaffectstheperformance(lessdataselectionandnoreductionoffalse-sharingeffectinnon-criticalregions),notthecorrectnessoftheimplementation.

Toimprovetheperformanceofourimplementation,weuseRegionalLocality[15]tody-namicallyshrinktheNRVduringexecutionofanon-criticalregion.RegionalLocalityisbasedontheobservationthatasetofpagesthatareaccessedinanon-criticalregionwillbeverylikelyaccessedasawholeinothernon-criticalregions.Forexample,supposeprocessoranon-criticalregionandaccessespages

enters

,

,...,

duringexecutionofthenon-critical

region,andprocessor

entersanothernon-criticalregionafterwards.Sincedataobjectsac-

cessedinanon-criticalregionoftenmigratetogetherfromoneprocessortoanotherprocessor,asregulatedbytheprogrammertoshiftworkloadamongprocessors,whentwomembersofthepagesetset.

ToexploitRegionalLocalityinnon-criticalregions,wedetectthepagesmodifiedinnon-criticalregionsandaggregatethem.WeadoptaModifiedPagesSet(MPS)forgroupingpagesmodifiedinnon-criticalregions.AnMPSisformedasbelow.Whenaprocessorentersanon-criticalregion,auniqueemptyMPSiscreatedforthenon-criticalregion.Iftheprocessormodifiesapageduringexecutionofthenon-criticalregion,theidentifierofthepageisstoredintotheMPS.Whentheprocessorexitsfromanon-criticalregion,theMPSbecomescompleteandisstoredforlateruse.

17

accessesoneor

,

,...,

,itwillverylikelyaccesseverymemberofthat

Weusesomehintstodecidewhetherweshouldshrinktheviewofaprocessorinanon-criticalregion.ThefirsthintweuseistheaccesstoanypageinanMPS(calledthefirsthitoftheMPS).ThishintsuggeststhatthefullsetofthepagesintheMPSmightbetherealviewoftheprocessorexecutingthenon-criticalregion.IftheaccessedpagebelongstomultipleMPSs,weusetheaccesstoanotherpage(calledthesecondhitoftheMPS)toconfirmwhichMPSmightbetherealview.OnceanMPSisassumedtobetherealviewoftheprocessor,theprocessorreducesitsviewtotheMPS.Theshrunkenviewcanbeusedtoprefetchdiffsoftheviewinthenon-criticalregion.Wewilldiscusshowtotakeadvantageoftheshrunkenviewinviewtransition.

Thedetectedviewsarepropagatedtoaprocessorwhenitcallsacquire.Thustherearenoextramessagesforviewpropagation,exceptforsmalldataofviewspiggy-backedonthelockreleasemessage.

4.2Viewtransition

Viewtransitionistomakeaview(asetofpages)uptodate.Beforeanewregionisentered,viewtransitionmustbedone.Modificationpropagationandexecutionprotocols(alsocalledconsistencyprotocols)areusedtoachieveviewtransition,wherepagesthatarerequiredtobemadeconsistentbytheconsistencymodelaremadeuptodate.Thetwocommonmodifica-tionpropagationandexecutionprotocolsaretheinvalidationandtheupdateprotocol.Iftheinvalidationprotocolisadopted,whenapageisrequiredtobemadeconsistent(uptodate)bytheconsistencymodel,itisinvalidatedfirstandthediffsarepropagatedandappliedbyapagefaultlater.Iftheupdateprotocolisadopted,whenapageisrequiredtobemadeconsistent,thediffsarepropagatedandappliedimmediately.

Fromanimplementationpointofview,ifaviewisaccuratelydetectedanditspagesaretobeaccessedbytheprocessor,itwouldbemoreefficienttousetheupdateprotocoltopropagateandapplythediffs,ratherthanusingtheinvalidationprotocol.However,ifthedetectedviewislargerthantherealview,theinvalidationprotocolmaybemoreefficient.Sincethedetectedviewhaspagesthatwillnotbeaccessedbytheprocessor,theupdateprotocolwillthuspropa-gateuselessdiffs.Incontrast,theinvalidationprotocolpropagates(small)invalidationnoticesfirsttoinvalidatethepagesinthedetectedview,butthediffsofapagearepropagatedonlywhenthepageisintherealviewandthusaccessedbytheprocessor.Inthisway,theprop-18

agationsofuselessdiffsareavoidedintheinvalidationprotocol,withtheoverheadofmoremessages(andpagefaults).Iftheuselessdiffsarehuge,thereisachancefortheinvalidationprotocoltobemoreefficient.

TheupdateprotocolisgenerallysuitableforVC,justastheinvalidationprotocolisforLRC(seeSection5fordetails).However,sincethedetectedNRVsinourimplementationarenotaccurateandarenormallylargerthantherealonesinourimplementation,weadoptedthein-validationprotocolfortheviewtransitionintonon-criticalregions.TheinvalidationnoticesfortheNRVofaprocessorarepropagatedtotheprocessorwhenitcallsacquire,butaredis-abledincriticalregions.Theyareenabledandeffectiveinanynon-criticalregions.Whenaprocessorentersanon-criticalregion,thepageswiththedisabledinvalidationnoticesarein-validated.However,thesenoticeswillbedisabledagaininthefollowingcriticalregioniftheirpagesarenotaccessedinthenon-criticalregionnorbelongtothefollowingCRV.Obviously,comparedwithTreadMarks,enabling/disablinginvalidationnoticesisanotheroverheadinourimplementation.Againthisoverheadisproportionaltothenumberoftimescriticalregionsareexecuted.

Duringexecutionofthenon-criticalregion,theNRVmayshrinktooneoftheMPSs.OncetheNRVshrinks,weprefetchthediffsofthepagesintheshrunkenNRVandapplythemtotheirpages.Thisoptimisationdoesn’tchangetheeffectivenessoftheinvalidationnoticesofotherpages.

5Comparisonwithrelatedwork

Ourimplementationcanimproveperformanceattwolevels:theconsistencymodelandcon-sistencyprotocollevels.Attheconsistencymodellevel,itonlyupdatespagesintheviewofaprocessorenteringanewregion.ThedetectedCRVsaregenerallyaccurateandtheupdatepro-tocolisusedtoupdatethem.TheaccurateCRVsdetectedinourimplementationhelpreducethefalse-sharingeffect,aswillbediscussedshortly.InLRCallpreviouslymodifiedpagesmustbeupdatedandtheinvalidationprotocolisusedtoalleviatetheoverheadofpropagationofuselessdiffs.

Attheconsistencyprotocollevel,besidestheupdateprotocoladoptedforCRVs,ourimple-mentationusesshrunkenviewstoprefetchthediffsinnon-criticalregions.Thistechniqueis

19

similartosomeoptimalconsistencyprotocols[23,14,4],whichhavebeenproposedtoim-provetheperformanceoftheLRCmodel.Theseprotocolschooseoptimalwaystopropagatediffs,e.g.,prefetchingofdiffs.However,theyworkatthelevelofmodificationpropagationinLRC,insteadofatthelevelofaconsistencymodel.Theydon’tchangetheconsistencyconditionsofLRC.Therefore,theyinheritthefalse-sharingeffectfromLRC.Thoughtheycanreducethenumberofmessagesandtheoverheadofpagefaultsbyprefetching,theycan’tremovethefalse-sharingeffectinLRC,i.e.,thepropagationofuselessdiffs.ThesetechniquesarecomplementarytoourVCimplementationandmaybeusefultofurtheroptimizethediffpropagationinnon-criticalregionsinourimplementation.

P1

T1=sh_malloc(size_of_task);produce_task(T1);Acquire(L1);enqueue(Q, T1);Release(L1);

P2

Acquire(L1);dequeue(Q, T2);Release(L1);

consume_task(T2);

Figure5:Ataskqueueprogram

Toexplainthesubtledifferencesamongrelatedmodels,wewilldiscussataskqueuepro-gramshowninFig.5.Tomakethecomparisonfair,inthediscussionweassumetheinvali-dationprotocolisusedwiththemodelsbeingcompared.Thisassumptionisvalid,sinceRSCmodelsaddresswhentomakewhichpagesconsistentinpage-basedDSMsystemsandthusareindependentofmodificationpropagationandexecutionprotocols(alsocalledconsistencyprotocols)suchastheinvalidationorupdateprotocol,althoughthoseprotocolsmayaffecttheperformanceoftheirimplementations.IntheprograminFig.5,thevariable

isusedasataskqueue.Processor

producesa

task

andputsitinto

,andthenprocessor

getsatask

from

andconsumesthe

task,wheretasks.

and

arelocalvariablesthatstoretheaddressesoftheshareddataobjectsfor

Fig.6illustratestheread/writeaccessestotheshareddataobjectsinthetaskqueueprogram.

20

shared memorypage1page2T1QEC and ScC: invalidate page 1.LRC: invalidates page 1 and 2.VC: propagates invalidation noticesfor page 1 and 2, but the notice forpage 1 is only effective in the critical region and the one for page 2 is only effective in the non−critical region.P1write(*T1)Aread(Q), write(Q)R...shared memorypage1page2program orderT2QP2A: acquire R: release

Aread(Q), write(Q)Rread(*T2), write(*T2)program order

Figure6:DifferencesamongEC,ScC,LRC,andVC

ItisusedtoexplainhowdifferentRSCmodelsmaintainconsistencyinapage-basedDSM.Tomakethesubtledifferencesmoreclear,weassume

and

happentopointtothesametask

,whilein

datawhichislocatedinpage2,while

isinpage1.

InEC,theprogrammerisrequiredtoannotatethatisassociatedwithlock

ScCthisassociationcanbedetectedatrun-time.InbothECandScC,onlymadeuptodatebefore

isrequiredtobe

entersthecriticalregion.Therefore,theinvalidationnoticeforpage

1ispropagatedto

andpage1ismadeinvalidthere.Alaterpagefaultonpage1willbring

ituptodate.Sincepage2isnotinvalidated,toby

willnotreadtheup-to-datetaskdatapointed

.ThusSCcannotbeguaranteedforthisdata-race-freeprogramunderECandScC.

InLRC,allpreviouswriteaccesses,includingthewriteson

andthetaskdatapointedby

,mustbeperformedbeforetheacquireisperformedin

.Thustheinvalidationnotices

forpage1and2arepropagatedandpage1and2aremadeinvalid.Laterpagefaultswillbringthemup-to-date.

InVC,allpreviouswriteaccessestothedataobjectsinaviewmustbeperformedbeforeaprocessorentersthecorrespondingregion.Run-timeviewdetectioninVCcandeterminethatpage1()belongstotheCRVandthatpage2(thetaskdatapointedtoby

)belongsto

theNRV.Thustheinvalidationnoticesforpage1and2arepropagatedto

,however,the

noticeforpage1isonlyeffectiveinthecriticalregiontobring

uptodate,andtheonefor

21

page2isonlyeffectiveinthenon-criticalregiontobringthetaskdatauptodate.MindfulreadersmayrealisethatthemorecomplicatedprocessingoftheinvalidationnoticesinVCisnotsuperiortoLRCinthisexample,giventheextraoverheadofenablinganddisablinginvalidationnotices.However,theindividualtreatmentsofthenoticesfordifferentregionscanresultinlessfalse-sharingeffectinmanyothersituations,aswillbedescribedshortly.FromtheabovedescriptionsweknowtheRSCmodelsaredifferentintermsofwhentomakewhichpagesuptodate,i.e.,theirconsistencyconditionsaredifferent.However,thereisanotherimportantaspectofdifferenceamongthemregardingdataselection.ThroughuserannotationinEC,run-timedetectionofscope-dataassociationinScC,orrun-timeviewdetec-tioninVC,thesemodelshavefirmknowledgethatpage1()willbeverylikelyaccessedin

thecriticalregion,whichcanbereadilyusedtoimprovetheirimplementations.Forexample,insteadofadoptingtheinvalidationprotocolwhichcausesonepagefaultandonemodification(diff)requestintheaboveexample,theycanadopttheupdateprotocolwhichcanpiggy-backthediffsonthelockgrantingmessage,i.e.,lockrelease(thoughextramessagesmaybere-quiredifdiffsarehuge)andthussavetwomessages(diffrequestandreply)andonepagefault.

Ontheotherhand,LRChasnosuchknowledgeandthusdoesn’tknowwhichdiffswillbeuseful.Therefore,theinvalidationprotocolcanhelpavoidpropagationofuselessdiffsinsuchasituation.Ofcourse,someextraworkcanbedonetoimprovethediffpropagationinLRC.ExamplesaretheAffinityEntryConsistency(AEC)protocol[23]whichcanpre-sendthemodificationsbasedonLockAcquirerPrediction(LAP),andtheHeuristicDiffAcquiring(HDA)protocol[14]whichadoptstheupdateprotocolforpagesmodifiedincriticalregionsandpiggy-backsthediffsonthelockgrantingmessage.However,theseimprovementsdon’tchangetheconsistencyconditionsinLRCsuchaswhichpagesneedtobemadeuptodateatacquiretime.ThefollowingexamplewillillustratethedifferencebetweenLRCandVCintermsofreducingthefalse-sharingeffectandthedifferencebetweenVCandoptimaldiffpropagation(orconsistency)protocolssuchasAEC.

Fig.7presentsanexampleofthefalsesharingeffectinLRC.Intheexample,dataobjects

and

lieinthesamepage.Processor

modifies

inthefirstcriticalregion

andthenshould

modifies

inthesecondcriticalregion

.Afterexitsfrom

,

entersit.Although

onlyaccessesdataobjectsand,accordingtoLRC,bothpage1andpage2of

22

shared memory

page1

page2

YX

Z

read(Y), write(Z)P1Awrite(X)RARupdate X and Z by invalidatingpage 1 and 2 of P2CR1CR2page fault because of true sharingprogram orderpage fault becauseof false sharingP2Aread(Z), read(Y)Rprogram order

shared memorypage1page2A: acquire R: release: invalidated pageYXZbeinvalidated.Whensharing.

Figure7:FalsesharingeffectinLRC

reads,thereisapagefaultonpage1,whichiscausedbyfalse

shared memory

page1

page2

YXZ

read(Y), write(Z)P1Awrite(X)RARpropagate invalidation noticesfor page 1 and 2 to P2, but onlyupdate Z by invalidatingpage 2 of P2CR1CR2page fault because of true sharingprogram orderP2Aread(Z), read(Y)Rprogram order

shared memorypage1page2A: acquire R: release: invalidated pageYXZFigure8:ReductionofthefalsesharingeffectinVC

Forthesameexample,Fig.8showshowVCremovesthateffectoffalsesharing.VConlyrequiresupdatingthedataobjectsofthecurrentviewofaprocessor.Theviewofonlyincludesdataobjects

in

and

.Since

isnotmodifiedby

,VConlyupdatesby

invalidatingpage2,thoughtheinvalidationnoticeforpage1isalsopropagated(butdisabled)

23

andmaybeeffectivelatertobring

uptodatein

.Byrestrictingtheeffectivescopeofthe

invalidationnoticeforpage1(whichisofnouseforupdating

and

intheviewof

),VC

canavoidthepagefaultonpage1causedbyfalsesharing.Naturally,aswesaidbefore,theupdateprotocolshouldbeadoptedtomaketheimplementationofVCmoreefficientwiththeknowledgeofviews.

Aswementionedbefore,LRCcanbeimprovedbyadoptingmorecomplicateddiffpropaga-tionprotocols,suchasAEC[23]andHDA[14].AECpre-sendsdiffsfromcriticalregionstotheprocessorsthatwilllikelyenterthesamecriticalregionsbasedonLockAcquirerPrediction(LAP).FortheexampleinFig.7,AECmaybeabletopredictthat

willenter

andthus

thediffofpage2canbepiggy-backedonthelockreleasemessageandappliedtopage2before

isentering

.Inthisway,thenumberofpagefaultsanddiffrequest/replymessagescan

bereduced.Similarly,HDAuseshistoryrecordstopre-sendlikely-usefuldiffstoonlythenextprocessorenteringthesamecriticalregion.ThoughtheperformanceofLRCcanbeimprovedbytheaboveoptimaldiffpropagationprotocols,theydon’tchangetheconsistencyconditionsofLRC.Thisistheessentialdifferencebetweenconsistencymodelsanddiffpropagation(orconsistency)protocols.Therefore,inFig7,theinvalidationnoticeforpage1isstillrequired(byLRC)tobeeffectiveinAECandHDA.Consequently,thepagefaultonpage1willoccurduetofalsesharinginAECandHDA,whichreflectstheiressentialdifferencefromVC.

shared memory

page1

page2

YX

Z

read(Y), write(Z)P1Awrite(X)RARpropagate invalidation noticesfor page 1 and 2 to P2, but onlyupdate Z by invalidatingpage 2 of P2CR1CR2page fault because of true sharingprogram orderenable the invalidation notice for page 1 to cause page faultbecause of w/w false sharingP2Aread(Z), write(Y)Rprogram order

shared memorypage1page2A: acquire R: release: invalidated pageYXZFigure9:Thew/wfalsesharingeffectintheVCimplementation

24

Therearetwokindsoffalse-sharingeffect,write/read(w/r)andwrite/write(w/w).Aw/rfalse-sharingeffectoccurswhenoneprocessormodifiesashareddataobjectthatliesinthesamememoryconsistencyunit(e.g.,apage)asanothershareddataobject,whileanotherpro-cessorreadstheothershareddataobject.Forexample,thefalse-sharingeffectinFig.7isw/r.Aw/wfalse-sharingeffectoccurswhenoneprocessormodifiesashareddataobjectthatliesinthesamememoryconsistencyunit(e.g.,apage)asanothershareddataobject,whileanotherprocessorwritestotheothershareddataobject.WhileourcurrentimplementationofVCcanreducew/rfalse-sharingeffectasshowninFig.8,itcan’treducethew/wfalse-sharingeffectsinceacoarse-graineddiffmanagementschemeisadopted.Forexample,inFig.9(whichissimilartothescenarioinFigs.7and8,except

writesratherthanreads

),when

writes

onpage1,anyimplementationofVCwillfacetwochoices.Oneistoignorethedisabled

invalidationnoticeforpage1,makeatwinofpage1andmakeadifflater.Thisschemehastodistinguishthediffsfromdifferentviewsandacomplicated,fine-graineddiffmanagementwillbeinevitable.Thebenefitoftheschemeisthereductionofthew/wfalse-sharingeffect.Thealternativechoiceistoenabletheinvalidationnoticeforpage1,bringituptodatethroughthepagefaultbeforemakingitstwin,asshowninFig.9.Whiletheschemesimplifiesdiffmanagement,ittoleratesthew/wfalse-sharingeffect.Weadoptedthelatterschemeinourcurrentimplementation,butwillinvestigatefine-graineddiffmanagementinthefuture.SCcorrectnessofourVCimplementationOurimplementationcannormallyguaranteeSequentialConsistencyfordata-race-freeprograms.Itguaranteesthatpagesthatarepreviouslymodifiedinacriticalregionareupdatedwhenaprocessorisenteringthesamecriticalregion.Thatmeans,pagesthatarenotmodifiedinthecriticalregionarenotputintotheCRVintheviewdetection.Formostdata-race-freeprograms,itistruethatdataobjectsrequiredtobeupdatedandthenaccessedinacriticalregionareonlythosepreviouslymodifiedinthesamecriticalregion.OurVCimplementationcanguaranteeSequentialConsistencyforthosedata-race-freeprograms.However,theremaybecaseswherepagesmodifiedinnon-criticalregionsareaccessedinacriticalregioninadata-race-freeprogram,inwhichcasesourimplementationcan’tguaranteeSequentialConsistency.Forexample,ifwemodifytheprograminFig.5,sothatwhen

dequeuesataskitalsochecksthetaskdatainthecriticalregion.Inthatcase,the

processorwon’treadtheup-to-datetaskdatainthecriticalregion(butwillreadtheup-to-datetaskdatainthefollowingnon-criticalregions)inourimplementation,thoughthereisnodata

25

raceintheprogram.Butweclaimthiskindofprogramisverypeculiar,sincethecriticalregionisdesignedtoprotectonlythetaskqueue,notthetaskdata.Fortunately,wehaven’tfoundthiskindofpeculiarprogramsinourapplicationsyet.

Iftheabovepeculiartypeofprogramneedstobesupported,wecanenhanceourimplemen-tationwithcompile-timeanalysis.Oncetheaboveaccesspatternisfoundatcompiletime,thecompilercandirecttheDSMtoenabletheinvalidationnoticesofallmodifiedpages,whichwillguaranteeSequentialConsistencyforthoseprograms.

ItisworthnotingthattheaboveproblemwithSCcorrectnessofourimplementationdoesn’taffecttheSCcorrectnessofthegenericVCmodel.

6Experimentalevaluation

Inthissection,wepresentanexperimentalevaluationoftheLRCmodelandourimplementa-tionofVC.BothofthesehavebeenimplementedbasedonTreadMarks[3].Theexperimentalplatformconsistsof8PCsrunningRedHatLinux6.1,whichareconnectedbya10MbpsEthernet.EachofthePCshasa500MHzprocessorand128Mbytesofmemory.Thepagesizeinthevirtualmemoryis4KB.

Wechosefiveapplicationsintheexperiment:TSP,QS,BT,WaterandIS.TSP,QS,Water,andISareprovidedbytheTreadMarksresearchgroup.AlltheprogramsarewrittenintheClanguage.TSPistheTravellingSalespersonProblem,whichfindstheminimumcostpaththatstartsatadesignatedcity,passesthrougheveryothercityexactlyonce,andreturnstotheoriginalcity.QSisatask-queuestyleprogram.Itusesarecursivesortingalgorithmthatoperatesbyrepeatedlypartitioninganunsortedinputlistintoapairofunsortedsublists,suchthatalloftheelementsinoneofthesublistsarestrictlygreaterthantheelementsoftheother,andthenputtingthesublistsintoataskqueue.Itrecursivelytakesasublistfromthequeueandsortsthesublist,untilthetaskqueueisempty.BTisanalgorithmthatcreatesafixed-depthbinarytree.Inthealgorithm,multipleprocessesexploreabinarytreetosearchforunexpandednodes.Ifaprocessfindsanunexpandednode,itexpandsthenodeandcreatesnewunexpandednodes.Thealgorithmterminateswhenthefixed-depthbinarytreeisestablished.Waterisamoleculardynamicssimulation.Eachtimestep,theintra-andinter-molecularforcesincidentonamoleculearecomputed.IS(IntegerSort)ranksanunsortedsequenceof

keys.The

26

rankofakeyinasequenceistheindexvaluethatthekeywouldhaveifthesequenceofkeys

weresorted.Allthekeysareintegersintherange[0,

]andthemethodusedisbucket

sort.Theseapplicationsarerepresentativeofbothnumericalcomputing(Water,QSandIS),andsymboliccomputing(TSPandBT).TheperformanceresultsarelistedinTable1.

App

(Sec.)

LRC

1.89

i

1.65

LRC

0.29

i

4.59

LRC

0.16

i

25.73

LRC

10.25

i

19.09

1242812423

511

1143711347

3441

32673330

1044

Model

(Sec.)

962960

937

Diff

RPF

Mesgs

-0

911

58

-0

5301

2

-792

69342

4347

-3

95478

6

ISLRC

108.20

4444

1569

-7965

-

details).

WedesignedtheexperimentinordertodemonstratetheperformancegainofVCatbothconsistencymodelandconsistencyprotocollevels.Wealsocollectedevidenceoffalse-sharingeffectinourapplicationsinordertodemonstratethepotentialperformancegainofVC.Theextraoverheadofviewmaintenanceisalsoindicatedfromourresults.

Table1showstheperformanceresultsforLRCandourVCimplementation.VC

Reqisthenumber

ofmessagesfordiffrequests,RPFisthereductioninpagefaultsduetooverallimprovementinourVCimplementation,RFSisthereductioninpagefaultsduetothereductionofthefalse-sharingeffectintheVCmodel,andMesgsisthetotalnumberofmessages.ThesecondcolumnSeq.Timeistherunningtimeofthesequentialexecutionoftheapplications.ExceptforTSP,thesequentialexecutionoftheapplicationsisfasterthantheirparallelexecutionon8PCs,whichdemonstratestheneedforimprovingtheDSMperformanceonclustercomputers.

VC

i,becausetheyhavenow/rfalse-sharingeffectincriticalregions.Asdiscussed

inSection5,ourimplementationcanonlyreducew/rfalse-sharingeffect.

Inaddition,inthetask-queuestyleprogramQS,theinaccurateNRVscan’thelpourimple-mentationtoreducethefalse-sharingeffect,thoughQShasw/rfalse-sharingeffectinnon-criticalregions.

Forthesetwoapplications,theperformanceofVC

icanimproveits

performanceupto2.4%.Thoughthisperformancegainisnotverysignificant,itcanonlybeachievedattheleveloftheconsistencymodelandcan’tbeachievedbytheoptimaldiffpropagationprotocols[23,14,4]suchasAEC.Thisperformancegainwillbeincreasedifthe

28

w/wfalse-sharingeffectcanbereducedorthedetectedNRVscanbemoreaccurate.ForWater,VC

ihasreducedthefalse-

sharingeffect(reductionofthreepagefaults)inWater,theoverheadofviewmaintenanceovershadowedthebenefit.

VCvs.LRC

InTable1,VCisourimplementationwiththeoptimaldiffpropagationprotocolasdescribedinSection4.2,wheretheupdateprotocolisadoptedforupdatingCRVs.Thisoptimisationisverynaturalinourimplementation,sincetheCRVsarereadilyavailableandpredictthepagestobeverylikelyaccessed.

Table1showsVCoutperformsLRCforallfiveapplicationstested.VChasimprovedtheperformancesignificantlycomparedwithLRC(35%forTSP,35.3%forQS,9%forBT,3.9%forWater,and4.6%forIS).ThenumberofdiffrequestmessagesinVCissignificantlylessthanthatinLRC(97.4%lessinTSP,75.8%lessinQS,35%lessinBT,4.3%lessinWater,and37.6%lessinIS).ThedetectedCRVshavehelpedtheupdateprotocoltoreducethediffrequestmessages.ConsequentlythetotalnumberofmessagesinVChasbeengreatlyreducedcomparedwithLRC.

Themajorreasonfortheaboveimprovementisthatthediffsofthepagestobeaccessedincriticalregions(i.e.,CRVs)arepiggy-backedonthelockreleasemessagesandappliedtothepagesimmediately.Thusthenumberofdiffrequest/replymessagesandtheoverheadofpagefaultsarereducedinthoseapplications.

ItisworthnotingthattheaboveperformancegainduetotheuseoftheupdateprotocolforCRVsmaybeachievedbytheoptimaldiffpropagationprotocols[23,14,4].

False-sharingeffect

Therearetwoapplications,BTandWater,thatdemonstratethereductionofthefalse-sharingeffectachievedinourimplementation.BT,whichuseslockstoprotectnodesinthebinarytree,hasseriousfalse-sharingeffect,whileWaterhaslessfalse-sharingeffect.ForBT,theperformanceimprovementduetoreductionofthefalse-sharingeffectis18.4%ofthetotalimprovement.ThissignificantproportionhasdemonstratedtheadvantageoftheVCmodel

29

overtheconsistencyprotocols,e.g.AEC,intermsoftheprogramswithseriousfalse-sharingeffect.

Togetanimpressionofthefalse-sharingeffectinourapplications,wehaverecordedthetotalnumberofpagefaultsthatareduetothefalse-sharingeffectinsidecriticalregions.InTable1,RFSisalsothenumberofpagefaultsthatareduetothew/rfalse-sharingeffectinsidethecriticalregions;TFSisthenumberofpagefaultsthatareduetoallfalse-sharingeffects(includingw/randw/wfalsesharing)insidethecriticalregions.Theresultsshowthereducedw/rfalse-sharingeffectisonlyasmallportionofthetotalfalse-sharingeffectforhalfofourapplications(0%forTSP,0%forQS,18.2%forBT,and50%forWater)insidethecriticalregions.Fortheapplicationswithseriousfalse-sharingeffect,suchasBT,thereisstillagreatroomforperformanceimprovementwhichcanonlybeachievedattheleveloftheconsistencymodelinVC.

Wewerenotabletocollectevidenceforthefalse-sharingeffectinnon-criticalregions,butwebelieveapplicationssuchasQSandWaterhavesignificantfalse-sharingeffectinnon-criticalregions.

7Conclusions

Inthispaper,wehavepresentedanewRSCmodel,theVCmodel,forDSMsystems.RSCmodelshaverelaxedtheconstraintsonmodification(diff)propagationandexecutionfordata-race-freeprogramsthatareproperlylabelledwhilestillguaranteeingsequentialconsistencyforthoseprograms.Wehavebrieflydiscussedeachofthemintermsofthreeselectiontechniques.AmongpreviousRSCmodels,onlyECandScCcanperformdataselection,butatthepriceofacomplexprogrammerinterfacewhichimposesanextraburdenontheprogrammer.TheVCmodelhasbeenproposedtoremovesuchaburdenbymeansofautomaticviewdetection.ItcanachievedataselectionwithoutprogrammerannotationsandreducesmorefalsesharingeffectthanLRC.Itsonlyconsistencyrequirementisthatallthedataobjectsintheviewofaprocessormustbeupdatedduringviewtransition.Inthisway,itcanachievethemaximumrelaxationontheconsistencyrequirementsandproducesmoreroomforoptimisationinDSMimplementations.Therefore,thegenericVCmodelcanbeusedasaframeworkforfutureresearchonDSMimplementations.

30

WehaveimplementedtheVCmodelbasedonTreadMarks.Techniquesfortheimplementa-tionofVChavebeendiscussedinthispaper.ThedifferencesbetweenourVCimplementationandrelatedworkwereillustratedthroughexamples.PerformanceresultshaveshownthatourVCimplementationgenerallyoutperformstheLRCmodelandhasadvantagesovertheopti-maldiffpropagationprotocols[23,14,4].TheresultshavealsodemonstratedthattheextraoverheadofviewmaintenanceinourVCimplementationisrelativelytrivial.

FurtherresearchshouldbecarriedoutundertheframeworkoftheVCmodel.First,fine-graineddiffmanagementneedstobeinvestigated.Thecurrentimplementationtoleratesthew/wfalsesharingeffectduetotheadoptionofsimplediffmanagement.Moreelaboratediffmanagementmayhelptoreducemoreofthefalse-sharingeffect.Second,mechanismsfordetectionofaccurateNRVsneedtobeinvestigated.Finally,Run-timeandcompile-timetech-niques[11]needtobedevelopedforaccurateviewdetection.

References

[1]S.V.AdveandM.D.Hill:“Aunifiedformalizationoffourshared-memorymodels”,IEEE

TransactionsonParallelandDistributedSystems,4(6),pp.613-624,June1993.[2]S.V.AdveandK.Gharachorloo:“Sharedmemoryconsistencymodels:atutorial”,IEEE

Computer,29(12),pp.66-76,December1996.

[3]C.Amza,etal.:“TreadMarks:Sharedmemorycomputingonnetworksofworkstations”,

IEEEComputer,29(2),pp.18-28,February1996.

[4]C.Amza,etal.:“AdaptiveProtocolsforSoftwareDistributedSharedMemory”,Proc.of

IEEE,SpecialIssueonDistributedSharedMemory,87(3),pp.467-475,March1999.[5]B.N.Bershad,etal.:“SharedMemoryParallelProgrammingwithEntryConsistency

forDistributedMemoryMultiprocessors”,CMUTechnicalReportCMU-CS-91-170,September1991.

[6]B.N.Bershad,etal.:“TheMidwayDistributedSharedMemorySystem”,InProc.of

IEEECOMPCONConference,pp.528-537,1993.

31

[7]J.B.Carter,J.K.Bennett,andW.Zwaenepoel:“Implementationandperformanceof

Munin”,InProc.ofthe13thACMSymposiumonOperatingSystemsPrinciples,pp.152-164,Oct.1991.

[8]J.B.Carter,J.K.Bennett,andW.Zwaenepoel:“Techniquesforreducingconsistency-relatedinformationindistributedsharedmemorysystems,”ACMTransactionsonCom-puterSystems,13(3),pp.205-243,August1995.

[9]P.Dasgupta,etal.:“ThedesignandimplementationoftheCloudsdistributedoperating

system”,ComputingSystemsJournal,3,Winter1990.

[10]M.Dubois,C.Scheurich,andF.A.Briggs:“Memoryaccessbufferinginmultiproces-sors”,InProc.ofthe13thAnnualInternationalSymposiumonComputerArchitecture,pp.434-442,June1986.

[11]S.Dwarkadas,etal.:“AnIntegratedCompile-Time/Run-TimeSoftwareDistributed

SharedMemorySystem”,InProc.oftheSeventhSymposiumonArchitecturalSupportforProgrammingLanguagesandOperatingSystems,Oct.1996.

[12]B.FleischandR.H.Katz:“Mirage:Acoherentdistributedsharedmemorydesign”,In

Proc.ofthe12thACMSymposiumonOperatingSystemsPrinciples,pp.211-223,Dec.1989.

[13]K.Gharachorloo,D.Lenoski,andJ.Laudon:“Memoryconsistencyandeventordering

inscalablesharedmemorymultiprocessors”,InProc.ofthe17thAnnualInternationalSymposiumonComputerArchitecture,pp.15-26,May1990.

[14]Z.Huang,W.-J.Lei,C.Sun,andA.Sattar:“HeuristicDiffAcquiringinLazyRelease

ConsistencyModel”,InProc.of1997AsianComputingScienceConference,LectureNotesinComputerScience1345,pp.98-109,SpringerVerlag,1997.

[15]Z.Huang,C.Sun,andA.Sattar:“Exploringregionallocalityindistributedsharedmem-ory”,InProc.of1998AsianComputingScienceConference,LectureNotesinComputerScience1538,pp.142-156,SpringerVerlag,1998.

32

[16]Z.Huang,C.Sun,M.Purvis,andS.Cranefield:“View-basedConsistencyanditsImple-mentation”,InProc.oftheFirstIEEE/ACMInternationalSymposiumonClusterCom-putingandtheGRID,Brisbane,May2001.

[17]L.Iftode,J.P.SinghandK.Li:“ScopeConsistency:ABridgebetweenReleaseConsis-tencyandEntryConsistency”,InProc.ofthe8thAnnualACMSymposiumonParallelAlgorithmsandArchitectures,1996.

[18]P.Keleher:“LazyReleaseConsistencyforDistributedSharedMemory”,Ph.D.Thesis,

DeptofComputerScience,RiceUniv.,1995.

[19]L.Lamport:“Howtomakeamultiprocessorcomputerthatcorrectlyexecutesmultipro-cessprograms”,IEEETransactionsonComputers,28(9),pp.690-691,September1979.[20]D.Lenoski,etal.:“TheStanfordDASHmultiprocessor”,IEEEComputer,25(3),pp.63-79,March1992.

[21]K.Li,andP.Hudak:“MemoryCoherenceinSharedVirtualMemorySystems”,ACM

Trans.onComputerSystems,Vol.7,pp.321-359,November1989.

[22]D.Mosberger:“Memoryconsistencymodels”,OperatingSystemsReview,17(1),pp.18-26,Jan.1993.

[23]C.B.Seidel,R.Bianchini,andC.L.Amorim:“TheAffinityEntryConsistencyProtocol”,

InProc.ofthe1997InternationalConferenceonParallelProcessing,August1997.[24]C.Sun,Z.Huang,W.-J.Lei,andA.Sattar:“TowardsTransparentSelectiveSequential

ConsistencyinDistributedSharedMemorySystems”,InProc.ofthe18thIEEEInter-nationalConferenceonDistributedComputingSystems,pp.572-581,Amsterdam,May1998.

[25]A.S.Tanenbaum:DistributedOperatingSystems,PrenticeHall,1995.

33

因篇幅问题不能全部显示,请点此查看更多更全内容

Top