Scheduler
AchimStreit
PC2-PaderbornCenterforParallelComputing
PaderbornUniversity33102Paderborn,GermanyEmail:streit@upb.dehttp://www.upb.de/pc2Abstract
Theself-tuningdynPschedulerformodernclusterresourcemanagementsystemsswitchesbetweendiffer-entbasicschedulingpoliciesdynamicallyduringruntime.Thisallowstoreactonchangingcharacteris-ticsofthewaitingjobs.Inthispaperwepresenten-hancementstothedecisionprocessoftheself-tuningdynPschedulerandevaluatetheirimpactontheper-formance:(i)Whiledoingaself-tuningstepaperfor-mancemetricisneededforrankingtheschedulesgen-eratedbythedifferentbasicschedulingpolicies.Thisallowsdifferentobjectivesfortheself-tuningprocess,e.g.moreusercentricbyimprovingtheresponsetime,ormoreownercentricbyimprovingthemakespan.(ii)Furthermore,aself-tuningprocesscanbecalledatdif-ferenttimesoftheschedulingprocess:onlyattimeswhenthecharacteristicsofwaitingjobschange(halfself-tuning),i.e.newjobsaresubmitted;oralwayswhentheschedulechanges(fullself-tuning),i.e.whenjobsaresubmittedorrunningjobsterminate.Weusediscreteeventsimulationstoevaluatetheachievedperformance.Asjobinputfordrivingthesim-ulationsweuseoriginaltracesfromrealsupercomputerinstallations.Theevaluationofthetwoenhancementstothedecisionprocessoftheself-tuningdynPschedulershowsthatagoodperformanceisachieved,iftheself-tuningmetricisthesameasthemetricusedmeasuringtheoverallperformanceattheendofthesimulation.Additionally,callingtheself-tuningprocessonlywhennewjobsaresubmitted,issufficientinmostscenar-iosandtheperformancedifferencetofullself-tuningissmall.
1Introduction
Resourcemanagementsystems(RMS)formodernhighperformancecomputing(HPC)clustersconsistofmanycomponentswhichareallvitalinkeepingthemachinefullyoperational.Anefficientusageofthemachinesisimportantforusersandowners,assuchsystemsarerareandhighincost.WithregardstoperformanceaspectsallcomponentsofamodernRMSshouldperformtheirassignedtasksefficientlyandfast,sothatnoadditionaloverheadisinduced.However,ifresourceutilizationandjobresponsetimesaread-dressed,theschedulerplaysamajorrole.Acleverschedulingstrategyisessentialforahighutilizationofthemachineandshortresponsetimesforthejobs.However,thesetwoobjectivesarecontradicting.Jobstendtohavetowaitforexecutiononahighlyutilizedsystemwithspacesharing.Shortorevennowaitingtimesareonlyachievablewithlowutilizationsortime-sharing.Typicallyaschedulingpolicythatoptimizestheutilizationprefersjobsneedingmanyresourcesforalongtime.Jobsrequestingfewresourcesforashortamountoftimemayhavetowaitlongeruntiladequateresourcesareavailable.Ifsuchsmallandshortjobsarepreferredbythescheduler,theaveragewaitingtimewouldbereduced.Asjobstypicallyhavedifferentsizesandlengths,fragmentationofthescheduleoccursandtheutilizationdrops[1].Thetaskofthescheduleristofindagoodcompromisebetweenoptimizingthesetwocontrarymetrics.
Clustersystemsusuallyhavealargeusercommu-nitywithdifferentresourcerequirementsandgeneraljobcharacteristics[3].Forexample,someuserspri-marilysubmitparallelandlongrunningjobs,whilstotherssubmithundredsofshortandsequentialjobs.Furthermore,thearrivalpatternsvarybetweenspecific
usergroups.Hundredsofjobsforaparameterstudymightbesubmittedinonegoviaascript.Otherusersmightonlysubmittheirmassivelyparalleljobsoneaf-tertheother.Thisresultsinanon-uniformworkloadandjobcharacteristicsthatpermanentlychange.ThejobschedulingpolicyusedinaRMSischoseninordertoachieveagoodoverallperformancefortheexpectedworkload.Mostcommonlyusedisfirstcomefirstserve(FCFS)combinedwithbackfilling[7,14,9],asonav-erageagoodperformancefortheutilizationandre-sponsetimeisachieved.However,withcertainjobcharacteristicsotherschedulingpoliciesmightbesu-periortoFCFS.Forexample,formostlylongrunningjobs,longestjobfirst(LJF)isbeneficial,whilstshort-estjobfirst(SJF)isusedwithmostlyshortjobs[1].Hence,asinglepolicyisnotenoughforanefficientre-sourcemanagementofclusters.ManymodernRMSshaveseveralschedulingpoliciesimplemented,oritisevenpossibletoreplacetheschedulingcomponent.Theremainderofthispaperisstructuredasfollows.InSection2somerelatedworkonself-tuninganddy-namicpolicyswitchingisgiven.Section3startswithashorthistoryofdevelopment,containstheconceptoftheself-tuningdynPscheduler,andpresentsthedif-ferentdecidermechanismsandenhancementstothem.Theusedperformancemetricandtheappliedwork-loadfortheevaluationarepresentedinSection4.TheevaluationinSection5startswithalookontheperfor-manceofthethreebasicpolicies.Thentheevaluationresultsofthedifferentself-tuningmetricsandofthecomparisonofhalfvs.fullself-tuningarepresented.Thepaperendswithconclusionsanabriefoutlookonfuturework.
2RelatedWork
In[13]theproblemofschedulingamachineroomofMPP-systemsisdescribed.Userseithersubmitlongrunningbatchjobsortheyworkinteractively(typi-callyonlyforashorttime).ToaccomplishthisonasingleMPP-systemtheresourcemanagementsystemhastoswitchfrombatchmode(preferringbatchjobs)tointeractivemode(preferringinteractivejobs)andback.Usuallythisisdonemanuallybytheadminis-trativestaff,e.g.atfixedtimesoftheday:interactivemodeduringworkinghours,batchmodefortherestofthedayandoverweekends.Ingeneral,theover-alljobthroughputisthemainobjectiveofbatchpro-cessing.Asbatchjobstypicallyhavealongruntime,waitingisnotverycritical.Ontheotherhand,auserthatworksinteractivelycountsthefiveminutesuntilhe/shecanstartworkingwiththerequestedresources.Otherissuesliketheoveralljobthroughputortheuti-
lizationarelessimportantwhileoperatingininterac-tivemode.Whichincomparisontobatchmodejobsarerathershort.Theidea[4]istoallowtheuserstodecideinwhichmodethesystemshouldbeoperating.Hence,theImplicitVotingSystem(IVS)isintroduced,asusersshouldnotvoteexplicitly:
•Ifmostofthewaitingjobsaresubmittedforbatchprocessing,IVSswitchestoLJF(longestjobfirst).Asbatchjobsaretypicallylong,theyreceiveahigherpriorityintheschedulingprocess.Hence,resourcesarelongerboundtojobs,lessresourcefragmentationiscausedandtheutilizationandthroughputofthesystemisincreased.
•Ifmostofthewaitingjobsaresubmittedforin-teractiveaccess,IVSswitchestoSJF(shortestjobfirst).Asinteractivejobsareusuallyshortintheirruntimeandshortjobsarepreferred,theaveragewaitingtimeisreduced.
•Ifthesystemisnotsaturated,thedefaultschedul-ingstrategyFCFS(firstcomefirstserve)isused.Note,athresholdfordefiningwhenasystemissaturatedandwhennotisdefinedbytheadmin-istrativestaff.FortheauthorsaMPPsystembe-comessaturated,ifmorethanfivejobscannotbescheduledimmediately.
Unfortunately,theideaofIVSwasneverrealizednorimplementedandtestedinarealenvironment.
In[3]asimilarapproachfortheNASAAmesiPSC/860systemispresented.Intheprimetimedur-ingthedayonlyafractionoftheresourcesisallocatedtothebatchpartition,whilemostoftheresourcesareavailableforinteractiveaccess.Duringnonprimetimeallresourcesareassignedtothebatchpartition.There-partitioningisdonemanuallyandatfixedtimesoftheday.
Theproblemofgettingthebestperformancefromamodernresourcemanagementsystemisdescribedin[2].Commonlysuchsoftwaresystemsarehighlypa-rameterizedandtheadministrativestaffperformsalotoftrialanderrortestinginordertofindagoodpa-rametersettingforthecurrentworkload.Ifthework-loadchanges,newparametersettingshavetobefound.However,theyarenotoriouslyoverworkedandhavelit-tleornotimeforthisfinetuning,sotheideaistoauto-matethisprocess.Muchinformationaboutthecurrentandpastworkloadisavailable,whichisusedtorunsimulationsintheidleloopofthesystem(oronaded-icatedmachine).Variousparametersettingsaresimu-latedandthebestsettingischosen.Theauthorscallsuchasystemself-tuning,asthesystemitselfsearches
foroptimizedparametersettings.Tocreatenewpa-rametersettingsforthesimulations,geneticalgorithmsareused.Newparametersettingsaregeneratedbyran-domlycombiningseveralpotentialcombinationsfromthepreviousstep.Chromosomesarethebinaryrepre-sentationofaparameter.Aparametersettingiscalledindividualandtheaccordingparametervaluesarecon-catenatedintheirbinaryrepresentation.Inthisexam-plethefitnessfunctionistheaverageutilizationofthesystemachievedbytheaccordingparametersetting.Allsimulatedparametersettings(individuals)inonesteprepresentageneration.Thechromosomesofthefittestindividualsofagenerationareusedtoproducenewindividualsforthenextgeneration.Newgenera-tionsarecontinuouslycreatedwiththelatestsystemworkloadasinput.Theprocessisstartedwithdefaultvalues.InacasestudyforschedulingbatchjobsoftheNASAAmesiPSC/860systemtheauthorsobservedthatwiththeself-tuningsearchforparametersettingstheoverallsystemutilizationisimprovedfrom88%(withthedefaultparameters)to91%.
In[8]heuristicsforthedynamicmappingofaclassofindependenttasksontoheterogeneouscomputingsystemsareintroduced.Themappingproblemcon-sistoftwoparts:matchingandscheduling.Inthematchingphasetheassignmentoftaskstomachinesiscomputed,whilstduringschedulingtheexecutionorderofthetasksoneachmachineiscomputed.Intheirworkadynamicmappingisneeded,asthear-rivaltimesofthetasksmayberandomandthesetofavailablemachinesischanging.Machinesmaygooff-lineandnewmachinesmaycomeon-line.Map-pingheuristicscanbegroupedintwoclasses:on-linemodeandbatch-modeheuristics.Intheon-linemodetasksaremappedontoamachineimmediatelyaftertheirsubmission.Inthebatch-mode,tasksarenotim-mediatelymappedwhentheyarrive,insteadtheyarestoredandthemappingisinvokedatpre-scheduledmappingevents.TheheuristicMCT(minimumcom-pletiontime)assignseachtasktothatmachinewhichresultsinthetask’searliestcompletiontime.Tasksmaybeassignedtomachines,whichdonothavetheminimumexecutiontime.Incontrast,MET(mini-mumexecutiontime)assignseachtasktothatmachinethatperformsthetaskintheleastamountofexecutiontime.AsthemachinesreadytimesarenotconsideredbyMET,loadimbalanceacrossthemachinesmayoc-cur.Thenewbatch-modeSA(switchingalgorithm)heuristicusesthesetwoheuristics(MCTandMET)inacyclicfashiondependingontheloaddistributionacrossthemachines.ByswitchingbetweenMCTandMETanewheuristicwiththedesirablepropertiesofthetwosingleheuristicsisgenerated.
3Self-TuningdynP
Asingleschedulingpolicyisusuallyusedinare-sourcemanagementsystemandittypicallygeneratesgoodschedulesonlyforjobswithspecificcharacteris-tics(e.g.shortjobs).Ifthejobcharacteristicschange,otherschedulingpoliciesmightperformbetteranditmightbebeneficialthatthesystemadministratorchangestheschedulingpolicy.However,systemad-ministratorsarenotabletomonitorthesituationandcontinuouslyaltertheschedulingpolicyinresponsetoworkloadchanges.
WedevelopedthedynPscheduler,whichautomati-callyswitchestheactiveschedulingpolicyduringruntime.Ingeneral,thesetofschedulingpoliciestochoosefromcanconsistofmanypoliciesonecanthinkof.WestartedwithavariantofthedynPscheduler,whichusesboundsfortheaverageestimatedruntimeofwaitingjobstocheck,whichpolicyisbestsuitedforthecurrentjobcharacteristics.Amajordrawbackofthisversionisobvious,astheperformancedependsonaproperset-tingofthebounds.Andinordertoreflectdifferentjobcharacteristics,theseboundsneedtobechanged.Wedevelopedtheself-tuningdynPschedulerwhichauto-maticallysearchesforthebestsuitedpolicy.
3.1Concept
AtthePC2(PaderbornCenterforParallelCom-puting)theself-developedresourcemanagementsys-temCCS(ComputingCenterSoftware,[6])isusedformanagingthePSCPentium3cluster[12]andtheplingcluster[11].Threeschedulingpoliciesarecurrentlyimplemented:FCFS,SJF,andLJF.Accordingtotheclassificationin[5],CCSisaplanningbasedresourcemanagementsystem.Planningbasedresourceman-agementsystemsschedulethepresentandfuturere-sourceusage,sothatnewlysubmittedjobsareplacedintheactivescheduleassoonaspossibleandtheygetastarttimeassigned.Withthisapproachbackfillingisdoneimplicitly.Byplanningthefutureresourceusage,asophisticatedapproachispossibleforfindinganewpolicy.Forallwaitingjobstheschedulercomputesafullschedule,whichcontainsplannedstarttimesforeverywaitingjobinthesystem.Withthisinformationitispossibletomeasuretheschedulebymeansofaperformancemetric(e.g.responsetime,slowdown,ormakespan).Theconceptofself-tuningdynPis:
Theself-tuningdynPschedulercomputesfullschedulesforeachavailablepolicy(here:FCFS,SJF,andLJF).Theseschedulesareevaluatedbymeansofaperformancemetric.
Thereby,theperformanceofeachpolicyisex-pressedbyasinglevalue.Thesevaluesarecomparedandadecidermechanismchoosesthebestvalue,i.e.thesmallestvalue.
Inthefollowing,theperformancemetricusedintheself-tuningprocessiscalledself-tuningmetricforsim-plicity.
3.2DeciderMechanisms
Fortherequireddecision,severallevelsofsophis-ticationarethinkable.In[16]wepresentedthesim-pledeciderthatbasicallyconsistsofthreeif-then-elseconstructs.Itchoosesthatpolicywhichgeneratestheminimumvalueoftheappliedself-tuningmetric.Thesimpledecideralsohasdrawbacks,asitdoesnotcon-sidertheoldpolicy.Inparticulariftwopoliciesareequalandadecisionbetweenthemisneeded,informa-tionabouttheoldpolicyishelpful.Table5showsadetailedanalysisofthesimpledecider.FCFSisfavoredinthreeandSJFinonecase,althoughstayingwiththeoldpolicyisthecorrectdecisionwiththesecases.Thisbehaviorisimplementedintheadvanceddecider.Atafirstglanceitdoesnotmakeanydifferencewhichpolicyamongequalsischosen.Atthisstagethesched-uleronlyknowsestimatesofthejob’sruntimeandusuallythejob’sactualruntimeisshorterthanesti-mated.Whenajobendsearlierthanestimated,theschedulechangesandnewplanningisnecessary.De-pendingonthechosenpolicydifferentjobsmighthavebeenstartedinthemeantime.Therefore,evenadeci-sionbetweentwoequalpoliciesisrequired.Previously,thefairnessamongthepolicieswasofmajorinterest.However,itmightbeinterestingtoexplicitlypreferoneofthepoliciesandneglecttheothers.Forthatpurposewedevelopedthepreferreddecider[17].Thepreferredpolicyisnotswitchedunlessanyotherpolicyisbet-ter.Wheneveranyoftheotherpoliciesarecurrentlyused,thepreferredpolicyonlyhastoachieveanequalperformanceandthedeciderswitchesback.
Thedecidersoftheself-tuningdynPschedulerconsideronlythethreepoliciesFCFS,SJF,andLJFforthreereasons.Firstofall,weevaluatethegeneralbehaviorandperformanceofself-tuningdynPsched-ulersfortheresourcemanagementofHPCsystems.Wedonotevaluate,whichcombinationofpoliciesisbestsuitedforspecificjobcharacteristics.Presumably,combinationswithotherandmoreschedulingpoliciesexist,whichgenerateevenbetterresults.Secondly,FCFS,SJF,andLJFarethemostknownschedulingpoliciesandmanyresourcemanagementsystemshaveatleastthesethreeimplemented.Andthirdly,thesethreepoliciesareimplementedintheresource
managementsoftwareCCS,whichdepictsthebasisandstartingpositionforourwork.
Inpreviousworkwealreadypresentedthebasicsoftheself-tuningdynPscheduler.Itstartedwiththesim-pledeciderin[16].Next,wedevelopedtheadvanceddecider[15]andrecentlythepreferreddecider[17].Inthispaperwepresentfurtherenhancementstothede-cisionprocess,whichcanbeappliedtoallthreemen-tioneddecidermechanisms.
3.3EnhancementstotheDecisionProcess
Aspreviouslymentionedtheaimoftheself-tuningdynPscheduleristoeliminateinputparametersforthescheduler,especiallythosewhichdependonthecharacteristicsoftheprocessedjobsandneedtobere-adaptedcontinuously.Nevertheless,enhancementsthatinfluencetheschedulerinamoregeneralwayarethinkable.Ofcoursetheyshouldbeindependentofanyjobcharacteristicsandeasytohandle,sothatacontinuousmanualre-adaptationisnotneeded.DifferentSelf-TuningMetrics
Theconceptoftheself-tuningdynPschedulerisbasedontheplanning-basedschedulingapproach,whereallwaitingjobsareplacedinaschedule.Withassign-ingaproposedstarttimetoeachjob,itispossibletocomputethewaitingtimeofthejobs,sothatsched-ulescanbecompared.Forthis,differentself-tuningmetricscanbeapplied,e.g.usercentricmetricsliketheaverageresponsetimeortheslowdown(bothun-weightedorweighted),andownercentricmetricslikethemakespan.Bychoosingaspecificmetric,theself-tuningdynPscheduleroptimizesitsbehavioraccord-ingtothismetric.Weusethefollowingmetrics,whicharealldefinedinthenextsection:averageresponsetime(ART),averageresponsetimeweightedbyarea(ARTwA),averageresponsetimeweightedbywidth(ARTwW),averageslowdown(SLD),averageslow-downweightedbyarea(SLDwA),averageslowdownweightedbywidth(SLDwW)andmakespan.CallingofSelf-Tuning
Doingself-tuningandpotentiallyswitchingtheactiveschedulingpolicy,shouldbedonewheneverthesched-uleorthesetofwaitingjobschanges,i.e.wheneveranewjobissubmittedoranalreadyrunningjobter-minatesearlierthanestimated.Inthefollowing,wecallthisfullself-tuning.However,itmightalsobesuf-ficienttodotheself-tuningsteponlywhennewjobsaresubmitted,i.e.thecharacteristicsofwaitingjobs
inthesystemchange.Wecallthisoptionhalfself-tuning,asroughlyonlyhalfasmuchself-tuningcallsareperformed.Thisoptionisinterestingtoevaluateincombinationwiththecomputetimetodotheself-tuningprocess.Itmightbebeneficialtosaveupsomecomputetimeandmaketheschedulingbehaviormorecomprehensiblefortheusers.
4Evaluation
Itiscommonpracticetousediscreteeventsimu-lationsfortheevaluationofjob-schedulingstrategies.
ForthispurposewedevelopedMuPSiE(MultiPurposeSchedulingSimulationEnvironment).Severalpoliciesandtheplanning-basedschedulingapproachfrom[5]areimplemented.AllpresentedresultsaregeneratedwithMuPSiE.
4.1PerformanceMetrics
Weusetheusercentricslowdownmetricformea-suringthesimulatedscheduleswithallprocessedjobs.
Theslowdownofajobisalsooftencalledstretch[10]orrelativeresponsetime,asthejobsresponsetimeisdi-videdbythejobsruntime.Theslowdowncomeswith-outadimensionincontrasttoe.g.responsetime.Ad-ditionally,weweighteachjob’sslowdownwithitsarea.Thereby,itiscircumventedthatjobswiththesameruntime,butwithdifferentresourcerequirements,havethesameimpactontheoverallperformance.
Withtheparametersforafinishedjobiofatotalofmprocessedjobs:•tai
isthearrivalorsubmissiontime
•tsiisthestarttime•tei
istheendtime
•wiisthewidth(numberofrequested/usedre-sources)
•li=tei−ts
iisthelength(runtime,duration)
•twi=tsi−
tai
isthewaitingtime
•
tri
=
twi
+liistheresponsetime•si=trili
=1+
twili
istheslowdown
•ai=wi·liisthearea
Theaverageslowdownweightedbyjobarea(SLDwA)foralljobsisdefinedas:
mai·si
SLDwA=
i=1m(1)
ai
i=1
Iftheprocessedjobsarenotchangedintheirwidthorruntime,theaverageslowdownweightedbyjobareaisequaltotheaverageresponsetimeweightedbyjobwidthandtheequationholds:
mwi
SLDwA=ARTwW·
i=1m(2)
ai
i=1
Forcompleteness,theothermetricsusedduringthe
self-tuningprocessaredefinedasfollows:•theaverageresponsetime:
ART=1m
r
m·t
i
(3)
i=1•theaverageresponsetimeweightedbyjobarea:
mai·tri
ARTwA=
i=1m(4)
ai
i=1
•theaverageresponsetimeweightedbyjobwidth:
mwi·tri
ARTwW=
i=1m(5)
wi
i=1
•theaverageslowdown:
SLD=1m
m·
si
(6)
i=1
•theaverageslowdownweightedbyjobwidth:
mwi·si
SLDwW=
i=1
m(7)
wi
i=1
•themakespan:
i=1max
,...,m
tei
(8)
4.2Workload
Anevaluationofjobschedulingpoliciesrequirestohavejobinput.Inthisworkajobisdefinedbythesubmissiontime,thenumberofrequestedresources(=width),andtheestimatedruntime(=length).Aswemodelaplanningbasedresourcemanagementsys-tem[5],runtimeestimatesaremandatory.Addition-ally,forthesimulationtheactualruntimeisneeded.Inthispaper,weusefourtracesfromtheParal-lelWorkloadsArchive[18],asallothertracesdonotcomewithinformationaboutruntimeestimates.ThecharacteristicsofthefourtracesareshowninTable1(takenfrom[17]).
•CTC(CornellTheoryCenter),system:512-nodeIBMSP2(only430nodesareavailableforbatchprocessing),duration:July1996-May1997,jobs:79,302•KTH(SwedishRoyalInstituteofTechnology),system:100-nodeIBMSP2,duration:October1996-August1997,jobs:28,490•LANL(LosAlamosNationalLab),system:1024-nodeConnectionMachineCM-5fromThinkingMachines,duration:October1994-September1996,jobs:201,387•SDSC(SanDiegoSupercomputingCenter),sys-tem:128-nodeIBMSP2,duration:May1998-April2000,jobs:67,667
5Results
Westartwithpresentingtheresultsforthethreeba-sicpoliciesFCFS,SJF,andLJFinshort.Thisgivesagoodreferenceforthesubsequentevaluations.Atfirst,theresultsofthecomparisonofdifferentself-tuningmetricsarepresented.Acomparisonofhalfandfullself-tuningforthementioneddecidermechanismsfol-lows.Finally,ananalysisoftheswitchingbehaviorforthesimpleandadvanceddeciderisdone.
5.1BasicPolicies
Aspreviouslystated,theevaluationshowsthatnoneofthepoliciesisthebestforeveryjobsetcharacter-istic.InTable2thebestbasicpolicywithrespecttotheaverageslowdownweightedbyareaishighlightedwithboldfont.ParticularlyfortheSDSCtrace,thedifferencesinslowdownarelargeasSJFisworsethanFCFSbyafactorofalmosttwoandLJFisevenworse(twiceasbadasFCFS).
Backfillingisdoneimplicitlywiththeplanning-basedschedulingapproachforallthreebasicschedulingpolicies.
FCFS
SJF
LJF
CTC2.04551.92772.5212KTH3.10152.54885.8118LANL1.68011.70312.0507SDSC
6.826012.566226.8207
Table2.Overallaverageslowdownweightedbyarea(SLDwA)forthethreebasicpoliciesFCFS,SJF,andLJF.
5.2DifferentSelf-TuningMetrics
Inthefollowing,theresultswithdifferentself-tuningmetricsarepresented.Weusedthefollowingusercentricmetrics:averageresponsetime(ART),aver-ageresponsetimeweightedbyarea(ARTwA),aver-ageresponsetimeweightedbywidth(ARTwW),aver-ageslowdown(SLD),averageslowdownweightedbyarea(SLDwA),andaverageslowdownweightedbywidth(SLDwW).Additionally,theownercentricmet-ricmakespanisused.
Onecanassumethatthebestperformanceisachievedbyusingthesamemetricduringtheself-tuningprocessandafterthesimulationisfinished.Asstatedearlier,weusetheaverageslowdownweightedbyarea(SLDwA)metricformeasuringallsimulatedjobs.Hence,usingSLDwAduringself-tuningshouldleadtothebestperformance(i.e.thesmallestvalues).ThisisseenintheupperpartofTable3.FortheCTC,LANL,andSDSCtracethebestself-tuningmetricisSLDwAandthementionedexpectationisfulfilled.However,itisinterestingthatfortheKTHtraceus-ingthejob’swidthasaweightisslightly(0.8%)betterthanusingthejob’sareaasaweight.Apossiblereasonforthisistheoverestimationfactor,whichissmallerfortheKTHtrace(1.5)thanfortheotherthree(2.2,cf.Table1).
Usingallotherself-tuningmetricsresultsinasig-nificantlyworseperformance.InparticularthisisseenfortheARTwAmetricandtheSDSCtrace,astheachievedperformanceistwiceasbad.ObservingthenumbersfortheARTwWandtheSLDwAself-tuningmetricshowsthatthenumbersareequal.Thisisare-sultofEquation2andthefactthattheprocessedjobsarenotchangedintheirwidthorruntime.
ThebottompartofTable3showstheoveralluti-lizationwithallsimulatedjobs.Independentoftheappliedself-tuningmetrictheexactsameutilizations
traceCTCKTHLANLSDSC
min11321
requestedresourcesavg.10.727.66104.9510.54
max3361001,024128
estimatedruntime[sec.]minavg.max06012
24,32413,6783,68314,344
64,800216,00030,000172,800
actual
runtime[sec.]minavg.max0010
10,9588,8581,6596,077
64,800216,00025,200172,800
averageoverest.factor2.2201.5442.2202.360
min0000
interarrivaltime[sec.]avg.max3691,031509934
164,472327,952201,00679,503
Table1.Basicpropertiesoftheusedtraces(86,400seconds=1day).
aregeneratedforthetracesCTC,KTH,andLANL.Thisindicatesthatthosejobssubmittedtowardstheendoftheschedulearealwaysscheduledatthesamestarttimeandareresponsibleforthemakespanandthereforefortheutilization.OnlyfortheSDSCtracedifferentutilizationsareachieved,asthedifferencesofthebasicpoliciesarelarge(cf.Table2).Usingtheownercentricmetricmakespanleadstothebestuti-lization.ThismatchestotheresultswithSLDwA,asmakespanandutilizationareconnectedviathetotalsumofjobareasandthetotallyavailableresourcesonthesimulatedmachine.
Theseobservationsreflectthedifferentswitchingbe-havioroftheself-tuningdynPscheduler,ifdifferentper-formancemetricsareapplied.Itispossibletotunethesystemperformanceineitherway:userorownercen-tric.Usingeitherownerorusercentricmetricsintheself-tuningprocesstogenerategoodoverallresultsforopposingmetrics,i.e.userandowner,generallyleadstoapoorperformanceandshouldbeavoided.
Comparingthebestself-tuningmetricwiththebestbasicpolicyfromTable2showsthatonlyfortheCTCandLANLtracetheself-tuningdynPschedulerisbet-ter.Theperformancelossoftheself-tuningdynPschedulerismarginal(0.4%)fortheKTHtrace,butalmost200%fortheSDSCtrace.Thisisaresultoftheoverestimationofthejob’sruntimebytheusersandthelargedifferencesofthebasicpolicies.Thismisleadstheself-tuningdynPschedulerinthedecisionprocessandresultsinwrongdecisions.Althoughthisalsohappenswiththeothertraces,theimpactismostseenfortheSDSCtrace.
5.3Halfvs.FullSelf-Tuning
Forthecomparisonofhalfandfullself-tuningonecanassumethatapplyingfullself-tuningisthebestop-tion.Withthistheself-tuningdynPschedulerchoosesthebestschedulingpolicyeverytimetheschedulechanges,i.e.whenanewjobisplacedinthescheduleandarunningjobterminatesearlierthanestimatedandare-schedulingisrequired.Theself-tuningdynPschedulerplansschedulesforeachavailableschedulingpolicyandforallwaitingjobs.Thisresultsinanin-creasedcomputationaltimeofthescheduler.Asthis
canbeunappropriateincertainscenarios,halfself-tuningisanoption,asroughlyonlyhalfasmanyself-tuningcallsaredone.Someperformancelossmightoc-curwithhalfself-tuning,aswithnewjobsubmissionstheschedulingpolicymightbechanged.IntheMuP-SiEsimulationenvironmentasingleself-tuningcallforfindinganewpolicyiscompletedwithin6msforanav-erageof22.5waitingjobs(simulatedconfiguration:ad-vanceddecider,fullself-tuning,ARTwWasself-tuningmetric,CTCtrace)andapplyinghalfself-tuningmightnotbenecessary.
InTable4theslowdownresultsforthesimple,advanced,SJF-andFCFS-preferreddeciderarepre-sented.Onecanseethattheassumptionfromaboveisnotalwaystrue.InparticularfortheSDSCtrace,halfself-tuningisbetterthanfullself-tuning.Alsowiththesimpledeciderfullself-tuningisnotbeneficial.LookingattheperformanceoftheadvancedandSJF-preferreddecidershowsthatfortheCTCandLANLtracetheself-tuningdynPschedulerisalwaysbetterthanthebestbasicpolicy.Furthermore,itisinterestingtoob-servethathalfandfullself-tuninghavenomajorim-pactontheperformanceofthesetwodeciders.ThegeneratedSLDwAvaluesareclosertogether.Similartothecomparisonofthedifferentself-tuningmetrics,theSDSCtraceismorevulnerablefortheswitchingbe-havioroftheself-tuningdynPscheduler.InparticularthesimpleandFCFS-preferreddecidersgenerateverybadresultswithfullself-tuningapplied.Inthiscasedoingmoreself-tuningcallsincreasestheSLDwAbyafactoroftwo.Againthiscanbearesultofoverestimat-ingthejobruntimes.Bydoingself-tuningwhenjobsterminateandbypotentiallyswitchingtoadisadvanta-geousschedulingpolicy,differentjobsareimmediatelystarted.
Onecanalsoseethattheadvanceddeciderobvi-ouslyoutperformsthesimpledeciderduetoitsdesign.Thisisindependentofwhetherfullorhalfself-tuningisperformed.Theperformancebenefitoftheadvanceddeciderisdifferentforthefourtraces;quitelargefortheKTHandSDSCtraceandsmallerfortheLANLtrace.FortheSDSCtraceandfullself-tuningthedif-ferencebetweenthetwodecidersisalmost70%.AsfortheCTCandKTHtraceSJFisthebestba-
self-tuningmetrics
ARTARTwAARTwWSLDSLDwASLDwWMakespan
SLDwACTCKTHLANLSDSC
2.00733.14591.700814.6495
2.28665.65421.832820.5247
1.87812.57541.617710.1598
1.95851.87812.69392.57541.66261.617712.674210.1598
1.90212.55941.617911.8321
2.45825.38232.035724.8958
UtilizationCTCforallself-tuningmetrics:65.701%
forallself-tuningmetrics:68.716%KTH
LANLforallself-tuningmetrics:55.607%
SDSC81.787%81.812%81.633%81.762%81.633%81.309%
82.473%
Table3.OverallSLDwAandutilizationvaluesfordifferentself-tuningmetrics.Advanceddeciderandfullself-tuningapplied.ValuesforARTwWandSLDwAareequalbecauseofEquation2.
bestbasicpolicy
CTCKTHLANLSDSC
1.9277(SJF)2.5488(SJF)1.6801(FCFS)6.8260(FCFS)
decidermechanisms
advancedSJF-preferred1.90851.87812.58122.57541.60271.617710.095310.1598
1.85671.88732.57342.55781.63301.614310.289610.5198
self-tuning
halffullhalffullhalffullhalffull
simple2.30362.28124.72565.74331.75341.761013.335332.6934
FCFS-preferred
2.32702.28044.92815.74921.76051.768016.176632.6965
Table4.OverallSLDwAcomparisonoffullandhalfself-tuningwithdifferentdecidermechanisms.sicpolicyandFCFSfortheLANLandSDSCtrace,aSJF-andFCFS-preferredpolicymakessense.TheresultsindeedshowthattheSJF-preferreddecidercanimprovetheperformanceoftheadvanceddeciderfortheCTCandKTHtrace,butthesamedoesnotap-plyfortheFCFS-preferreddeciderandtheLANLandSDSCtraces.InfacttheFCFS-preferreddeciderisworse(almostthreetimesfortheSDSCtrace)thantheadvancedandSJF-preferreddecider.Thisissur-prising,asFCFSprovestobeagoodbasicpolicy.Thepoorperformancecanbebasedonthefactthatsomejobs,whicharenotstartedbyFCFS,alterthesched-uleinsuchawaythatmanysubsequentjobshavetowaitlongandthereforetheSLDwAdrops.Apossibleexampleforthisscenariocouldlooklikethefollowing:somejobswithalargearea(requestingmanyresourcesand/orwithalongestimatedruntime)mayinduceapolicychangeinordertofavorthesejobs.However,theestimateoftheruntimemayhavebeenwrong,sothattheendaftersometime.Inthiscasedelayingthesejobswouldbebeneficial,asduetotheirshortactualruntimetheirinfluenceontheoverallSLDwAperformancemayonlybesmall.
5.3.1
DetailedAnalysisoftheSwitchingBe-havior
Withfullself-tuningthedifferencebetweenthesimpleandadvanceddeciderisbestseenfortheSDSCtrace,henceadetailedcaseanalysisisdoneinthefollowing.Table5showstheamounteachcaseisreachedduringthedecisionprocess.Thenumbersshowasignificantdifferenceincase6b:theperformanceofFCFSisequaltoSJF,LJFisworsethanboth,andtheoldpolicyisSJF.In80,419(75.11%)of107,066totalself-tuningde-cisionsthissituationoccursandtheadvanceddeciderstayswithSJF.Incontrast,thesimpledeciderreachesthiscaseinonly42.56%ofallself-tuningdecisionsandswitchestoFCFSinthissituation.
Incase1allthreepolicieshavethesameperfor-mance.Thecorrectdecisionistostaywiththeoldpolicyliketheadvanceddeciderdoes,butthesimpledeciderarbitrarilyfavorsFCFS.Theothertwocases8cand10carenotreachedbythesimpleoradvanceddecider,hencetheycannotinducethedifferenceinperformance.Withthelargedifferencesincase6bthenumberofappearancesoftheothercasesisalsoinflu-enced.Thisisbestseenforcase4b.However,theothercaseshavenoinfluenceonthedifferentperformanceofthesimpleandadvanceddecider,asbothdeciders
case1234abc56abc78abc910abccombinations
FCFS=SJF=LJFSJF arestartedwithLJF. Thenumberofself-tuningcallswiththesimplede-cider(109,521)islargerthanwiththeadvancedde-cider(107,066).Thisresultsfromthefactthatmorethanonejobendsatthesametime.Why?Asfullself-tuningisappliedandthesamejobtraceisused,theamountofself-tuningcallsatjobsubmissiondoesnotchangeforoneofthedeciders.However,ifmorethanonejobendsatthesametime,arescheduletakesplaceonlyonceandthereforeself-tuningisalsocalledonlyonce.Hence,theadvanceddeciderperformsbet-terthanthesimpledeciderandatthesametimein-duceslessself-tuningcalls. Fromthisfactanotherquestionarises:Ifonlyhalfself-tuningisapplied,i.e.self-tuningisnotdonewhenjobsend,thenumberofself-tuningcallsshouldalmostbethesameforbothdeciders?Andyes,ifhalfself-tuningisappliedthesimpledecideriscalled56,738timeswhereastheadvanceddecideriscalled56,208times.Bothamountsareaslightlylessthanthenum-beroftotallyscheduledjobs(67,620).Thisisbasedonthefactthatifenoughresourcesarefreetostartalljobsimmediately,thesejobsdonothavetowait.Self-tuningandschedulingpoliciesingeneralmakenosenseinthiscase,asthestartingorderofjobsdoesnotmatter,aslongastheyarestartedimmediately. simpledecider switchestoeachpolicy FCFSSJFLJF nopolicyswitchFCFSSJFLJF 47,49947,28739514,34039,93626,737947 (43.37%)(43.17%)(0.36%)(13.09%)(59.06%)(39.54%)(1.40%) advanceddecider1,0869631,085103,9324,12053,6349,866 (1.01%)(0.90%)(1.01%)(97.07%)(6.09%)(79.32%)(14.59%) jobstartedwitheachpolicy Table6.ComparisonofthedecisionbehaviorandtheusageofpoliciesfortheSDSCtracewithfullself-tuningapplied. 6ConclusionsandFutureWork Inthispaperwepresentedtwooptionsforthedeci-sionprocessoftheself-tuningdynPscheduler.Theideaofdynamicallyswitchingtheschedulingpolicy(dynP)isbasedonthefactthatusuallynosinglepolicygener-atesgoodschedulesforeverypossiblejobcharacteris-tic.Inordertoachievethebestpossibleperformance,itbecomesnecessarytoswitchtheactiveschedulingpoliciesaccordingtothecurrentlywaitingjobs.Theschedulerswitchestheschedulingpolicieswithouttheneedofapermanentinterventionofthesystemad-ministrator.Withtheplanning-basedschedulingap-proach,theself-tuningdynPschedulergeneratesfullschedulesforeachavailablebasicpolicy,measuresthegeneratedscheduleswithaperformancemetric,andfi-nallyswitchestothebestpolicy.Adecidermechanismisinchargeofchoosingthebestpolicyaccordingtotheappliedperformancemetric.Inpreviouspaperswepresenteddifferentdecidermechanisms. Inthispaperweevaluatedtwogeneralenhance-ments,whichcanbeappliedtoalldecidermechanisms.Wecompareddifferentownerandusercentricperfor-mancemetricsfortheself-tuningprocessandstudiedtheirinfluence.Byusingdifferentself-tuningmetricstheobjectiveoftheself-tuningdynPschedulercanbealtered.Additionally,westudiedthebehaviorofcall-ingtheself-tuningprocessatdifferenttimes(halfandfullself-tuning)oftheschedulingprocess.Theevalua-tionisdonebyusingdiscreteeventsimulationwithjobtracesfromrealsupercomputerinstallationsasinput.Studyingthedifferentself-tuningmetricsoneas-sumesthatmostlikelythebestperformanceisachievedbyusingthesamemetricduringtheself-tuningpro-cessandafterthesimulationisfinishedtomeasurealljobs.Thisistrue,theusercentricaverageslowdownweightedbyarea(SLDwA)isthebestself-tuningmet-ricforthreetraces(CTC,SDSC,andLANL).FortheKTHtracetheaverageslowdownweightedbywidth(SLDwW)slightlyimprovestheperformanceslightlyby0.8%.Iftheobjectiveoftheself-tuningdynPsched- uleristooptimizetheownercentricoverallutilizationofthesystem,themakespangeneratesthebestresults,althoughonlyfortheSDSCtrace.Thecharacteristicsoftheremainingthreetracesgenerateequaloveralluti-lizationsforallappliedself-tuningmetrics. Wealsocomparedhalfandfullself-tuning,i.e.call-ingtheself-tuningprocessonlywhennewjobsaresub-mittedoradditionallywhenrunningjobsterminate.Althoughmuchlessself-tuningcallsaredonewithhalfself-tuning,theperformancedifferenttofullself-tuningissmallwiththeadvancedandSJF-preferreddecider.Similartothecomparisonofdifferentself-tuningmet-rics,theSDSCtraceismorevulnerablefortheswitch-ingbehavioroftheself-tuningdynPscheduler.Inpar-ticularthesimpleandFCFS-preferreddecidergenerateverybadresults,whicharealmostthreetimesasbadasfortheotherdeciders.Therefore,iflessself-tuningcallsareintendedbythesystemadministrators,e.g.toreducetheswitchingbehavioroftheself-tuningdynPscheduler,thegeneratedperformanceisonlysightlybehindandhalfself-tuningisagoodcompromise.Weshowedthatingeneralthepresentedself-tuningschedulerwithdynamicpolicyswitchingcanbebene-ficialtocommonlyusedstaticschedulingapproaches.Therefore,wethinkthatself-tuningschedulers,whichareabletoadapttheirschedulingbehavioraccord-ingtothejobcharacteristicsofthecurrentlywaitingjobs,shouldbeimplementedinmodernclusterresourcemanagementsystems.Fromapracticalperspectivetheself-tuningdynPschedulermightcauseproblemsfortheusers,astheschedulingbehaviorofthesystemmightbecomeunpredictable.Forpracticalmattersapolicyswitchingmightonlybedoneatwellestablishedtimes,e.g.onlyonceanhour. Inthefutureitwillbeinterestingtostudy,whetheracombinationofdifferentself-tuningperformancemet-ricsisbeneficial.Forexample,theownercentricmakespanmetricisconsideredwith20%,andtheusercentricaverageresponsetimeweightedbyjobwidthisconsideredwith80%. References [1]D.G.Feitelson.ASurveyofSchedulinginMulti-programmedParallelSystems.Researchreportrc19790(87657),IBMT.J.WatsonResearchCenter,YorktownHeights,NY,1995.[2]D.G.FeitelsonandM.Naaman.Self-Tuning Systems.InIEEESoftware16(2),pages52–60,April/Mai1999.[3]D.G.FeitelsonandB.Nitzberg.JobCharacter-isticsofaProductionParallelScientificWorkloadontheNASAAmesiPSC/860.InD.G.FeitelsonandL.Rudolph,editor,Proc.of1stWorkshoponJobSchedulingStrategiesforParallelProcessing,volume949ofLectureNotesinComputerScience,pages337–360.Springer,1995.[4]J.GehringandF.Ramme.Architecture-IndependentRequest-SchedulingwithTight Waiting-TimeEstimations.InD.G.FeitelsonandL.Rudolph,editor,Proc.of2ndWorkshoponJobSchedulingStrategiesforParallelProcess-ing,volume1162ofLectureNotesinComputerScience,pages65–80.Springer,1996.[5]M.Hovestadt,O.Kao,A.Keller,andA.Streit. SchedulinginHPCResourceManagementSys-tems:Queuingvs.Planning.InD.G.FeitelsonandL.Rudolph,editor,Proc.ofthe9thWorkshoponJobSchedulingStrategiesforParallelProcess-ing,volume2862ofLectureNotesinComputerScience,pages1–20.Springer,2003.[6]A.KellerandA.Reinefeld.AnatomyofaRe-sourceManagementSystemforHPCClusters.InAnnualReviewofScalableComputing,vol.3,Sin-gaporeUniversityPress,pages1–31,2001.[7]D.A.Lifka.TheANL/IBMSPSchedulingSys-tem.InD.G.FeitelsonandL.Rudolph,editor,Proc.of1stWorkshoponJobSchedulingStrate-giesforParallelProcessing,volume949ofLec-tureNotesinComputerScience,pages295–303.Springer,1995.[8]M.Maheswaran,S.Ali,H.J.Siegel,D.Hensgen, andR.F.Freund.DynamicMappingofaClassofIndependentTasksontoHeterogeneousComput-ingSystems.JournalofParallelandDistributedComputing,59(2):107–131,1999.[9]A.Mu’alemandD.G.Feitelson.Utilization,Pre-dictability,Workloads,andUserRuntimeEsti-matesinSchedulingtheIBMSP2withBackfill- ing.InIEEETrans.Parallel&DistributedSys-tems12(6),pages529–543.IEEEComputerSoci-etyPress,June2001. [10]S.Muthukrishnan,R.Rajaraman,A.Shaheen, andJ.E.Gehrke.OnlineSchedulingtoMinimizeAverageStretch.InProceedingsofthe40thAn-nualIEEESymposiumonFoundationsofCom-puterScience,pages433–442,1999.[11]TheplingItanium2ClusteratthePader-bornCenterforParallelComputing(PC2).http://www.upb.de/pc2/services/systems/pling/index.html,April2004.[12]ThePSCPentium3ClusteratthePader-bornCenterforParallelComputing(PC2).http://www.upb.de/pc2/services/systems/psc/index.html,April2004.[13]F.RammeandK.Kremer.SchedulingaMeta-computerbyanImplicitVotingSystem.In3rdInt.IEEESymposiumonHigh-PerformanceDistributedComputing,pages106–113,1994.[14]J.Skovira,W.Chan,H.Zhou,andD.Lifka.The EASY—LoadLevelerAPIProject.InD.G.FeitelsonandL.Rudolph,editor,Proc.of2ndWorkshoponJobSchedulingStrategiesforPar-allelProcessing,volume1162ofLectureNotesinComputerScience,pages41–47.Springer,1996.[15]A.Streit.ASelf-TuningJobSchedulerFamily withDynamicPolicySwitching.InD.G.FeitelsonandL.Rudolph,editor,Proc.ofthe8thWorkshoponJobSchedulingStrategiesforParallelProcess-ing,volume2537ofLectureNotesinComputerScience,pages1–23.Springer,2002.[16]A.Streit.TheSelf-TuningdynPJob-Scheduler. InProc.ofthe11thInternationalHeterogeneousComputingWorkshop(HCW)atIPDPS2002,pages87(bookofabstracts,paperonlyonCD).IEEEComputerSocietyPress,2002.[17]A.Streit.EvaluationofanUnfairDeciderMech-anismfortheSelf-TuningdynPJobScheduler.InProc.ofthe13thInternationalHeterogeneousComputingWorkshop(HCW)atIPDPS,pages108(bookofabstracts,paperonlyonCD).IEEEComputerSocietyPress,2004.[18]ParallelWorkloadsArchive.http://www.cs. huji.ac.il/labs/parallel/workload/,April2004. 因篇幅问题不能全部显示,请点此查看更多更全内容