微智科技网
您的当前位置:首页The Design Space for Data Replication Algorithms in Interactive Groupware

The Design Space for Data Replication Algorithms in Interactive Groupware

来源:微智科技网
TheDesignSpaceforDataReplicationAlgorithmsinInteractive

Groupware

SumeerBholaMustaqueAhamad

CollegeofComputingGeorgiaInstituteofTechnology

Atlanta,GA30332

sumeerb,mustaq

@cc.gatech.edu

TechnicalReportGIT-CC-98-15

Abstract

Interactive(Synchronous)Groupwareencompassesawiderangeofapplications,likecollaborativewhiteboards,texteditors,engineeringCAD(ComputerAidedDesign),DistributedVirtualEnvironments,andmulti-playergames.Averycriticalrequirementforalltheseapplicationsistheneedtosharedata,whichcanbereplicatedtoprovidebetterresponsiveness,fault-toleranceandscalability.Despitetheexistenceofmanysystemsanddistributedalgorithmsforreplicationininteractivegroupware,thereislittleagreementontheshareddataab-stractiontoprovide,oronthegeneralalgorithmicapproach.Similarly,thereisgenerallackofuniformityintheterminologyusedtodescribesuchalgorithms.Thispaperattemptstorectifythissituationby(1)Describingthefundamentaldatarequirementsofinteractivegroupware,(2)Developingadataabstractionwhichisappropriateformeetingtheserequirements,(3)Developingageneralalgorithmicstructureandterminologytoexpressthecharacteristicsofalgorithminstances,and(4)Classifyingthealgorithmsproposedintheresearchliterature.

1Introduction

SynchronousorInteractive1Groupwareisarapidlygrowingareaofcommercialandresearchinterest.Thisin-cludesawiderangeofapplications,fromtraditionaloneslikewhiteboardsandtexteditors,toemergingoneslikeengineeringCAD(ComputerAidedDesign),DIS(DistributedInteractiveSimulation),andmulti-playergames.Byourdefinition,thetermInteractiveGroupwarestandsforanycollectionofapplications,thatallowapotentiallydistributedgroupofuserstocollaborateorcompeteonataskatthesametime.Averycriticalrequirementforalltheseapplicationsistheneedtosharedata,andthatisthefocusofthispaper.

Theinteractivenatureofsuchapplicationsrequiresthattheeffectofauser'sactionisseenbyhimself(responsetimeor

RT)

aswellasotherusers(usertousertimeor

RT

UUT)inatimelymanner.However,theproblemof

providingappropriateand

UUT

isincreasinglydifficultasgroupwareisdeployedinawide-areadistributed

environmentliketheInternet,wherehighcommunicationlatenciesarecommon.Inparticular,end-usersareincreasinglyaccustomedtodirectmanipulationuserinterfaces,whichtypicallyrequireresponsetimesontheorderof50-100ms.However,duetothefundamentallimitationofthespeedoflight,theround-tripdelaytothefarsideoftheplanetisatleast200ms.Othercausesofhighermessagelatenciesaretheuseofmobilewirelesscomputers,andwhenconnectingfromhomethroughmodems.

Thoughtomake

RT

UUT

byitsverynaturehastodependonthemessagelatencybetweenapairofusers,itispossible

lessdependentonmessagelatencybyusingreplicationoftheshareddataatusersites.Thishasthe

potentialtoreduceresponsetimeforactionsthatreadormodifythisdata,sinceauser'sactioncanbeexecutedonherlocalreplica.Thereisgeneralagreementintheresearchcommunityontheneedforreplicationascanbeseenbythelargenumberofgroupwaresystems[9,12,17,16,18,20,5]thatsupportreplication.Inadditiontoreducing

RT,

replicationcanalsoprovideresiliencetofailures,whichcanbecommoninadistributedenvironment.Finally,

manyfactors,includingsmallersizeofmessagesandbatchingofmodifications,canallowreplicationtoconsumefewernetworkresourcesthanacentralizedapproach.

Datacanbereplicatedatallthesiteswhichhaveusersinterestedinthatdataoratsomestrategicallylocatedsites,dependingonthememoryresourcesavailable.Notethatreplicationatallinteresteduserssitesdoesnotconsumemorenetworkresourcesthanreplicationatonlysomesites,asnotificationsaboutthechangeinstateofthedatastillneedtogotoallthesesites,toupdatetheirviews.Therefore,inthispaperwewillonlybeconcernedwithreplicationatallinterestedsites,i.e.fullreplication.

Onecommonmisconceptionaboutreplicationisthatitmakeslifeharderfortheapplicationprogrammer.Weillustratewhythisisnotsobyconsideringthecommonwayinwhichtheseapplicationsarestructured.Theappli-cation,runningonacertainsite,receivesauseractionexpressedasawindowingsystemevent,andtranslatesthis

S

l

C1 C2 C3

Figure1:TotalOrderingthroughaServer

intoanoperationthatneedstobeexecutedontheshareddata.Itthenissuesthisoperationtotheunderlyingdatamanagementsystemwhichexecutesthisoperationonthelocalreplicaandtheotherreplicas.Additionalcomplex-ity,ifany,fortheapplicationprogrammer,comesfromthepotentialforconcurrency.Inparticular,betweenthetranslationstepandexecutionstepofthisoperation,anotheroperation(probablyonethatisconcurrentlyissuedbyanothersite)canbeexecutedontheshareddata.Andthiscanhappeneveninsinglecopysystems!Forexam-ple,considerthetexteditingexampleusedin[9].Thetextbufferisrepresentedasanarrayofcharacters,and2concurrentoperationsareissuedtoinsert

charactersatpositions5and10respectively.ifthefirstoperationgets

toexecuteearlier,thesecondinsertshouldhappenatposition

.Operationtransformationswereproposed

byEllisandGibbsin[9](andformalizedbyCormackin[7]),tosolvethisproblemofpreservingtheintentoftheuser,andalsotoguaranteereplicaconvergence.Wewilldiscussbothaspectslaterinthepaper(Section7.1),andassumefornowthatnooperationtransformationsareneeded.

1.1TheSimplestReplicationApproach

Withreplicationcomestheassociatedproblemofmaintainingthereplicasconsistentwithrespecttoeachother.Thesimplestapproachistoensurethatalloperationsthatmodifythedatagetexecutedinthesameorderatallreplicas,byorderingthemthroughaserver(asdepictedinfigure1).Read-onlyoperationscanstillbeexecutedinstantaneouslyatthelocalreplica.Despitethesimplicityofthisscheme,itsuffersfromsomemajordrawbacks.Firstly,assumingthattheone-waycommunicationlatencybetweeneverypairofsitesis,the

RT

and

UUT

is

.Thismaybetoohighinmanysituationswhere

islargeorwidelyvarying.Secondly,thesystemisnot

fault-toleranttoserverfailure,andtheservercanbeaperformancebottleneck.Finally,applicationscannotbatchoperationsbeforesendingtotheserver,asitwouldaffectRT.Batchingofoperations[5]canbeveryusefultoreducenetworktrafficwhenotherusersarenotreallyinterestedineachotherswork.

Thealgorithmsproposedforsynchronousgroupwareallreducebysomedegreethedependenceontheserver,i.e.thealgorithmsaremoredistributed.MostofthemhaveabestcaseRTandUUTthatislessthan

,butwhether

theyprovideanaveragecaselowerRT,UUTthantheserverbasedapproachdependsonthedistributionofthe

inter-sitenetworklatenciesandtheapplicationscenario.Thoughevaluationofsuchalgorithmsisnotcommon,andisnotthefocusofthispaper,someinitialworkisdescribedin[4].

2

Althoughmanysystemsanddistributedalgorithmshavebeenproposedforreplicationingroupware,thereseemstobelittleagreementontheshareddataabstractiontoprovide,oronthegeneralalgorithmicapproach.Similarly,thereisgenerallackofuniformityintheterminologyusedtodescribesuchalgorithms.Researchershaveapproachedtheproblemfromvariousangles,somecomingfromadatabasebackground,somefromageneraldistributedcomputingbackground,andsomefromhighperformancecomputing(forexample,distributedsharedmemory).Eachoftheseapproacheshasmissedsomecriticalneedofgroupware.Thefirstgoalofthispaperistoillustratethedatasharingrequirementsandcomeupwithadataabstractionwhichcapturesandextendsthepoweroftheexistingabstractions.Next,wedescribeageneralalgorithmicstructureforimplementingthisabstraction,anddiscusstheimportantdesignchoices.Thisisusedtoroughlyclassifythealgorithmsdescribedinthegroupwareresearchliterature,eventhoughsomeimplementadifferentabstractionthantheonewepropose.Thegoalofthispaperisnottoclaimthatalltheproblemshavebeensolved,ortostiflealgorithmresearchfortheseapplicationsbyrestrictingthemtoacertaingeneralstructure.Onthecontrary,wehopethatabetterunderstandingofthedesignspace,andacommonterminologyfordescribingtheworkingofthesealgorithms,willstimulateresearch.

Section2illustratesthedatacharacteristicsoftheseapplicationsusingtwoexamples.Thesecharacteristicsareusedtomotivatethesetofobjectsabstraction,whichisdescribedinSection3.Section4discusseswhyreplicateddatabasescannotbeeasilyadaptedtomeetourrequirements.Section5describesthegeneralstructureofthealgorithms,includingtheglobalorderingandtimestampingproperties,andtheissueofreplica-viewcoupling.Thisisusedinsection6toclassifymanyalgorithmsproposedinthegroupwareresearchliterature.Section7discussescertainsecondary,butimportantissues,includingoperationtransformation,andgranularityofaccess.Finally,weconcludeinsection8.

2SharedDataCharacteristics

Thissectiondescribesthecharacteristicsoftheshareddata.Wefirstillustratewithsomeexamplesofapplicationscenarios.

2.1ApplicationScenarios

Thefirstexampleisofthecollaborativedesignofakey-frameforacomputeranimation,whichcanbepartofthedevelopmentofananimatedmovie,orthedevelopmentofthevisualbehaviorofacomplexentity(e.g.avirtualhuman)inaninteractivesingle-user/multi-uservirtualworld.Thesecondexampleisofsuchadistributedvirtualworldinaction,specificallyofinteractionbetweendistributedentities,someofwhichcouldbeunderusercontrolandsomeautonomous.

Thesetwoexamplesillustratehowmulti-userinteractioncanbeimportantforallstagesofagroupwareappli-cation,fromdesignanddevelopmenttoactualuse.

3

Figure2:ArticulatedModels:StickModelofHuman,LeonardodaVinci'sWing

Key-frame/Snapshotdesign

Weassumethatthekey-framedesigninvolvesarrangementofmultiplerigiden-

titiesinacertain3-dimensionalspace.Inadditiontosimplerigidentitieslikeatableorabottle,thereareothermorecomplexentitieslikehumansorsyntheticcreatures.Suchcomplexentitiesarerepresentedbyanarticulatedmodel[11].Anarticulatedmodelisacollectionofobjectsconnectedtogetherbyjointsinahierarchical,tree-likestructure.Figure2showsasimplestickfiguremodelofahuman,whichcanbeinternallyrepresentedasanartic-ulatedmodel.Rotationabouttheelbowjointinthismodelwillaffectnotonlythepositionofthelowerarmbutalsothepositionoftheobjectsbelowitinthehierarchyi.e.thehandandfingers.Eachjointhascertaindegreesoffreedom,andforacertainaxisofrotationthereisaconstraintontherelativeanglebetweenthetwosticksconnectedbythatjoint.Forexample,theelbowjointhas2degreesoffreedom,witharangeofrotationfrom0to180degreesforbothaxes.

Akey-framedesigncaninvolvemanytaskssuchas,designinganarticulatedmodelofacomplexentitybyaddinganddeletingsticksandjoints,addinganddeletingentitiesfromthe3Dspace,translatingentitiestocertainpointsinspace,andorientingentitiesorpartsofentitiesbyrotatingaboutjoints.DistributedVirtualWorld

Adistributedvirtualworldhasmanyentities,someunderdirectusercontrol,some

autonomouswithprogrammedbehaviorpatterns,andsomeonlycontrolledbythelawsofphysics2(e.g.atennisball).Seechapter1of[19]foragoodintroduction.Mostentitieshaveaprimarysitewhichcontrolsthatentity,forexample,forausercontrolledentitytheprimarysiteisthesiteofthatuser.Primarysitesforautonomousentitiesmaybechosensoastobalancetheload.Asitecontrollinganentitycomposedofmultipleobjects(e.g.acomplexentity)hastoupdateasubsetoftheseobjectsinresponsetouserinput,orprogrammaticcontrol(e.g.tosimulateahumanwalking).Inaddition,non-primarysitescanissueupdatestoanentityinthecaseofentityinteraction,for

example,collisionbetweenentities.Properorderingbetweensuchconcurrentupdatesisimportanttodeterminethefuturebehavioroftheinteractingentities.Forexample,thecollisionofatennisballwitharacquetmustbeidenticallyobservedatallsites,includingthevelocityofeachentity,andtheexactpointofimpact.Entitiesmayalsobecreatedanddeleted,forexample,firingofamissilecausescreationofanewmissileentity,andafterhittingatargetthisentityisdeleted.

2.2Characteristics

Thefollowingarethedatacharacteristicsofgroupware.

1.Dynamicstateandsizeofshareddata:Theamountofdatabeingsharedischangingduringthecollab-oration,withnewdataitemsbeingaddedandexistingonesbeingdeleted.Also,thestateofthisdataiscontinuouslychanging.

2.Atomicityofaccess:Userscanatomicallyreadandwriteasubsetofthisdata.Forexample,rotatinganelbowjointchangesthecoordinatesofthelowerarm,handandfingers.Byatomicity,wedonotmeanfailureatomicityasintransactions,butthattheexecutionofanoperationisnotinterleavedwithotheroperationsatareplica.Failureatomicity,asimpliedbydatabases,isnotneededbecausereplicationgivesusfaulttolerance.Also,asthereplicasarenotpersistent,wedonotneedaheavyweightdistributedcommitprotocollike2-phasecommit.

3.Data-dependentaccesspatterns:Insomecases,thesubsetoftheshareddatathatisaccessedbyanatomicoperation3dependsonthestateofthedatawhentheoperationisexecuted.Forexample,assumethatwhilethearmofahumanmodelismoving,thehandgrabsadoor.Futureupdatesonthearmwillalsoneedtoupdatethepositionofthedoor,andthedegreesoffreedomofthedoorcanconstrainthedirectioninwhichthearmcanmove.

Despitesuchdata-dependentaccesspatterns,inmanycasesitispossibletoquiteaccuratelypredicttheaccessinformationwhenanoperationisissued.

4.Short,IncrementalOperations:Theexecutiontimeforeachoperationisshort,asuserswanttoseetheresultsoftheactionsofotherusers.

5.Commutativity:Manyoperationsmaybecommutative,andexploitingthecommutativityofoperationscanallowthedatamanagement/consistencyalgorithmtorelaxtheorderinwhichoperationsareexecutedatreplicas.Operationsarecommutativewhentheyaccessdifferentpartsofthedata,or,eveniftheyaccessthesamedata,theiraccessescommute.Forexample,inanarticulatedmodelofahuman,concurrentrotationsabouttheelbowandthewristarecommutative(iftherearenoexternalconstraintsonthemotion).

shared datasink=source=x=500handleUpdate(){ source.sub(x); sink.add(x);}updatename=‘Alice’checking=savings=amount=400amount=2000Figure3:Updatingano-set

3SetofObjectsAbstraction

Asetofobjects(referredtoasano-set)abstractionisusedtorepresentacollectionofrelatedshareddataobjects.Byobjects,wedonotnecessarilymeanobjectsintheOO(Object-Oriented)sense,wemeanthattheyarecompositei.e.composedofprimitiveobjectslikeinteger,realetc.Theseobjectscanhavepointerstootherobjectsinthesamesetandthesepointerscanformcycles.Pointersprovideaconvenientwayfortheapplicationprogrammertoexpressshareddata-structures.Sitesinacollaborationcandynamicallyconnectordisconnectfromano-setandobjectscanbeaddedtoanddeletedfromtheo-set.Thesitesthatarecurrentlyconnectedtotheo-setarereferredtoasparticipantsintheo-set.Ano-setisfullyreplicatedatalltheparticipantsites.Partialreplicationofano-setisoutsidethescopeofthispaper.Objectsaremodifiedusingupdatesthatcanreadandwriteasubsetofobjectsintheo-set.Themainoperationsonano-setare:

addObject(Objecpointerstoeachother.

t[]o)Addsoneormoreobjectstotheo-set.Theobjectsbeingaddedcanhave

deleteObject(Obbeingdeleted.

ject[]o)Deleteoneormoreobjectsfromtheo-set.Theapplicationisrespon-

sibleforensuringthattherearenodanglingpointersfromtheremainingobjectsintheo-settothesubset

updateSet(Updatee)Thisissuesanupdatetotheobjectsintheo-setwhichcanreadandwritea

subsetoftheseobjects.Theupdateobject,,encapsulatestheactualupdatei.e.ithasthestateandlogictocarryouttheupdate.Thisstateincludespointerstocertainobjectsintheo-set,andtheupdatecanonlyaccessobjectsreachablefromthesepointers.ThehandleUpdatemodifiestwoofthethreesharedobjectsintheo-set.

Thesethreeoperationsareasynchronous,i.e.theyarescheduledforexecutionlocallyandremotelyatsometimeinthefuturebytheunderlyingreplicamanagementalgorithm.Thedeleteoperationrequirescoordinationbetween

6

methodof,whenexecuted,performs

theactualupdate.Figure3showsanupdateobjectwithitsassociatedhandleUpdatemethodwhich

processesbeforethedeleteisdone,toensurethataprocessdoesnotreceiveanupdateonanobjectthathasalreadybeendeleted.Inadditiontotheabovethreeoperations,thereareread-onlyoperationswhichareonlyexecutedonthelocalsite.Theseareneeded,forexample,torefreshtheviewonthedata(theGUIshowingthedatatotheuser).Welimitourdiscussiontoonlytheaddandupdateoperations.Notethatanapplicationmaybeconnectedtomultipleo-sets,butthatdifferento-setsdonotshareobjects.

3.1AccessInformationforUpdates

Asupdatesmaybeissuedconcurrentlybymultiplesites,theunderlyingreplicamanagementalgorithmneedstoensuresomeorderingoftheseupdates,sothateachreplicastateisconsistent.Wewilldiscussconsistencyindetailinsection5,butatminimumwerequirethatthereplicasshouldconverge.

Toensureconvergence,theconsistencyalgorithmneedstoknowwhatobjectsareaccessedbyanupdate,andhowtheyareaccessed.Itisusefultoallowtwooptionsofspecifyingthisinformation.Predeclaredaccessinformationisprovidedbyanapplicationwhenitissuesanupdate,andgivesasupersetoftheobjectsthattheupdatemayaccess.However,fordata-dependentoperations,theaccessinformationcanbehardtoaccuratelypredict.Inthiscase,predeclaredinformationisnotprovidedandtheupdatehastobeexecutedtodetermineaccess.Astheupdateexecutes,wheneveritisgoingtoaccessanobjectinsomewayithasn'taccessedbefore,itnotifiestheconsistencyalgorithmbyinvokingamethod.Whenthenotificationmethodreturns,theupdatecancontinuewithitsexecution.Werefertothisasexecution-timeaccessinformation.

Wewillonlyconsiderread,writeaccessinformationfortheobjectsi.e.theapplicationorupdateinformstheconsistencyalgorithmaboutwhichobjectsareread,andwhicharewritten.Thismayseemveryrestricted,becauseitseemsnottocapturecommutativityandothersemanticswhichcanbeusedtorelaxtheoperationordering.Forexample,twoconcurrentoperationsthatincrementthesameintegerwillaccessthatintegerinwritemode,andthereforewillbeordered.However,wewillseelaterthatduetothenatureofthegeneralalgorithmicstructure(specificallythatoperationsarenotinterleavedatareplica,andonlyonenotificationissentperoperation),itispossibletospecifyaccessinformationthatisfactuallyincorrect,butenoughtoguaranteeconvergence.Forexample,intheabovecasewecanspecifythatthetwoincrementsarejustreadingtheinteger.Anotheroperationwhichjustoverwritestheintegervaluewithanewvaluecanbespecifiedtobewritingtheinteger,andthisisenoughtoguaranteeconvergence.

Generalizationbeyondread,writeaccessescanbedonebydefiningacompatibilitymatrixwhichexpresseswhichcombinationofaccessesconflict.Section2.5of[3]discussesthisissue,andisalsoproposedforcollabora-tiveapplicationsin[14].

7

3.2DecouplingAtomicityandLocking

Intheo-setabstraction,anoperationencapsulatestheunitofcomputationthatshouldbeperformedatomically.Accessinformation,suppliedintwopossibleways,isusedbythereplicamanagementalgorithmtoprovidecon-sistencybyimposingacertainorderingontheoperations.WewillrefertosuchalgorithmsasOrderingalgorithms.

Thisapproachisdifferentfromexplicitlockingbytheapplicationtoachievebothatomicityandconsistency.Theexplicitlockingapproachisundesirablebecauseofseveralreasons.

1.Thesemanticsofthelocks(forexample,optimisticorpessimisticlocks)usedhavetobefixedwhentheapplicationiswritten.Thisdrasticallyreducestheflexibilityinchoosingaconsistencyalgorithmatrun-time.Thefollowingexampleshowshowthereceiptofawindowingeventishandledwithexplicitlockingwhenusingpessimisticlocks.

while

(e=getEvent

(e.type)

{())

{

switch...case

MOUSE_DOWN:

lock(x);...lock(y);...unlock(y)unlock(x)...}}

;;

//modify

y

//modify

x

2.Changingthesemanticsofthelockscanrequireasignificantrestructuringoftheapplicationcode.Thefollowingexampleshowshowtheabovecasewouldbehandledwithoptimisticlocks.

case

MOUSE_DO

);

//modify);

//modify

yx

WN:

optLock(x...optLock(y...

if(!assump...}

if(!assump...}unlock(y)unlock(x)

;;

tionValid(x)){//recovery

code

tionValid(y)){//recovery

code

8

3.Theapplicationdeveloperhastobeawareofthepotentialfordeadlocksiflocksonmultipleobjectsareacquired.

Incontrast,ascomputationisencapsulatedasoperations,andoperationsareveryshort-duration,anorderingalgorithmcanundo(byjumpingbacktoanearliercheckpointedstateorusinganundooperation)andredoanoperationasmanytimesasneeded.Undoandredoisimportantforoptimisticalgorithms,andtobreakdeadlocksbyabortingexecution.Thisapproachisalsodifferentfromgeneraloptimisticschemesindistributedsystems,likeHOPE[8],whichcannotmakeassumptionsaboutshortencapsulatedoperations,andthereforeprovidelanguage-levelconstructstoexpressoptimismandrecovery.

4WhynotReplicatedDatabases?

Lookingatthedatacharacteristics,itmayseemthatupdateoperationsaresimilartotransactions,andthatatraditionalreplicateddatabaseshouldsufficefortheseapplications.Indeed,someideascanbeborrowedfromreplicateddatabases,buttherearemanykeydifferenceswhichleadtodifferentalgorithmicdesignchoices.

Shortandincrementaloperations:Thisresultsinthefollowing

1.Onenotificationperoperation:Asoperationsareofshortduration,thesourcesitecansendonlyonenotificationforthatoperation.Thisgivesusaninterestingandveryimportantchoice,whichisnotavailableinmostreplicateddatabases;thatofsendingtheoperationitselfinsteadofitseffect.Byeffectwemeanthatonlythewrites,i.e.thepartsofobjectswithnewvalues,arepropagated.Thisisextremelyimportantforoptimisticprotocolsbecauseitdecouplesundosatthesourceandothersites.2.Noexplicitaborts:Theapplicationsdonotabortanyoperation.Anyapplicationlevelundoofanoperationisdonebyissuinganotheroperation.

Non-persistentData:Theunderlyingalgorithmcanensurethatifonereplicasitereceivesanoperation,anddoesnotcrashforasufficientamountoftime,everyotherreplicasitethatisrunningwilleventuallyreceivethatoperation.Thisallowsustolocallycommitanoperation.Localcommitwouldbeproblematicifexplicitabortswerepermitted,orifthedatabeingmanagedwaspersistent,whichisnottrueinourcase.Persistenceisoutsidethescopeofthealgorithmsconsideredinthispaper.However,userscansavetheirreplicatoapersistentstore.Andthiscanbeusedlaterfornoncollaborativework,ortoinitializeothero-sets.Unlikeinconventionaldatabases,webelievethatleavingthistotheapplicationisacceptable.

Operationexecutionnotinterleavedatareplica:Thisisunliketraditional(non-replicated)databases,whichdealwithconcurrentoperationexecution,andallowssimplificationinalgorithmdesign.Moreimportantly,thisimpliesthattheoperationaccessinformationisonlyusedfordecidingonanoperationorderingthat

9

resultsinreplicaconvergence.Therefore,theaccessinformationprovidedtothealgorithmcanbefactuallyincorrect,butenoughtoguaranteeconvergence.Forexample,considerrotationabouttheelbowjointforanarticulatedhumanmodel.Assumethattheanglesofthejointsbelowtheelbow,inthetreerepresentationofthearticulatedmodel,arenotaffectedbytherotation.Eventhoughthisrotationischangingthepositionsofthesejoints,itiscommutativewithotherrotationsaboutjoints.Onlyamodificationtotheconstraintontheelbowjointisnotcommutativewiththisrotation.Therefore,thisupdateneedstoonlysaythatitaccessedtheelbowjoint.However,ifoperationswereallowedtointerleavethisaccessinformationwouldnotbeenough.

PredeclaredorExecution-timeaccessinformation:Unlikedatabaseswhichprimarilyuseexecution-timeaccessinformation,manygroupwareapplicationscanquiteaccuratelypredicttheaccessesoftheirupdates(especiallyinlightofthepreviousbullet).Somedatabasesdousepredeclaredinformationasasupplementtoexecution-timeinformationasitallowsthemtorelax2-phaselockingbyreleasinglocksearly.Predeclaredaccessinformationallowsustototallydecoupleorderingandexecution,whichcansignificantlyreduceabortsandspeedupcommitincertainalgorithms.

Databaseresearchershavelookedattherequirementsofasynchronousgroupware[2,1],andproposedsome

solutions,andsoitisinstructivetounderstandthedifferencesbetweentheirapproachandtheonewearguefor.Theirresearchhasprimarilylookedathowtoallowcooperationbetweenlong-livedtransactions.Thisrequiresrelaxingtheatomicityrequirementoftransactions.Synchronousgroupwareiscompletelyopposite,withveryshort-livedatomicoperations.Thiscanbepartlyexplainedbytheobservationthatsharingbetweenparticipantsinasynchronousgroupwareisrarerandatcertainwelldefinedpoints.Forexample,checkingintheupdatedsourcecodeofacodemoduleisnotfrequent.Thesewell-definedpointsusuallycorrespondtocommitofalong-livedtransaction.Theneedforcooperationbetweenthesetransactionsarisesbecausesometimesatransactionhasnotcommittedatasharingpoint,butanothertransactionneedstoreadsomethingwrittenbythisuncommittedtrans-action.Forexample,checkinginacertaincodemoduleneededbyanotheruser,whichcorrespondstoreleasingalockearlyinalong-livedtransaction.Also,cooperationbetweentransactionsisusuallytriggeredbytheusers.Anotherwayoflookingatthedifference,isbyconsideringthecooperationbetweenusersascontentionbetweentransactions/operations.Itseemsthatcontentioninasynchronousgroupwareisrarer,butcouldlastforalongerduration.Therefore,userinterventionisfeasible.Incontrast,contentioninsynchronous/interactivegroupwaremaybemorefrequentandtransient.Therefore,thealgorithmshavetoadapttothesesituationswithminimumuserintervention.

10

SITE 1User A

action

View

TranslateIssueOperation

SITE 2User B

action

View

Translate

IssueOperation

FlexibleCouplingReplicaExecuteOperation

ReplicaExecuteOperation

Consistency AlgorithmFigure4:SystemArchitecture

5GeneralAlgorithmicStructure

Figure4showsasimplifiedviewofthesystemarchitecture.Theuserperformsanactionwhichistranslatedintoanoperationbytheapplication4andthenissuedtotheconsistencyalgorithm,whichschedulesitforexecution.Read-onlyoperationsareonlyexecutedonthelocalreplica,whileothersareexecutedatallreplicas.Thenotificationfromthereplicatotheviewdependsontheircouplingandisdiscussedinsection5.4.

5.1GlobalOperationOrdering

Mostalgorithmsdefineaglobalorderingontheoperationsthatmodifytheshareddata,bytimestampingthemattheissuingsite.Aftertimestamping,anoperationisdistributedtoallsitesincludingtheissuer,whichthenexecuteit.Thisisasimplifiedstructurewhichisusefulfordescribingthehigh-levelcharacteristicsofalgorithms.However,wewillseemanyvariationslater,forexample,insomecasestimestampingandexecutionatthesourceoverlap,inothersituationsthetimestampingmayabortetc.Wealsoassumethattheoperationitself,andnotitseffect,isdistributedtoallsites.Wediscussthisassumptioninsomedetailinsection5.3.

Theglobalorderingdefinedbythetimestampscanbeatotalorder,butthisisusuallyunnecessary.Thereforeweassumethattheglobalorderingisapartialorderi.e.theorderingrelationisirreflexiveandtransitive(atotalorderisaspecialcaseofapartialorder).Ifasiteexecutesoperationsinatotalorderthatrespectsthisglobalorder,itiscorrect.Theglobalorderingshouldsatisfycertainproperties,forexample:

1.ConvergenceProperty:Executionoftheoperationsinanytotalorderthatrespectstheglobalorderwillresultinthesamestate.Thispropertyisnecessaryforthereplicastoconverge.

2.ProcessOrderingProperty:Operationsissuedbyasite/processaregloballyorderedintheordertheywereissued.

3.CausalOrderingProperty:Theglobalorderingsatisfiescausality.Allthealgorithmsdiscussedinsection6satisfythefirsttwoproperties.

Anoperationcommitsatasite,when

knowsthatithasexecutedalloperationsprecedingintheglobal

order.Ifasiteexecutesoperationsonlyaftertheycommit,itisguaranteedtobecorrect.Thisiswhereitisimportanttodistinguishbetweenoptimisticandpessimisticalgorithms.Apessimisticalgorithmonlyexecutesanoperationat

afteritcommitsat,thereforeitisguaranteedtobecorrect.Anoptimisticalgorithmdoesnotwait

forcommit,howeveritmighthavetoreorderoperationexecution,byundoingandredoingoperations,whenitnoticesthatithasmadeamistake.Ifthisreorderingcausessurprisetotheuser,itissaidtohavecausedjitterintheview.Ofcourse,insomecases,surprise,andhencejitter,issubjective.However,themainreasonwhyoptimisticalgorithmsarenotalwayssuitableisthepotentialforjitter,whichmaynotbeacceptabletotheuserincertainsituations.Asecondaryreasonisthatifalotofreorderingisnecessary,itmaybecomputationallytooexpensive.Notethatthisreorderingistotallylocaltoasite.

Anotherdesignchoiceforoptimisticalgorithmsistoonlydetectpotentialdivergenceofreplicas,andtoprovidemechanismstotheuserstomanuallyfixit.

5.2NatureofTimestamping

Orthogonaltotheoptimisticversuspessimisticchoiceisthenatureofthetimestampsassignedtotheoperations.1.AccessIndependent,Dependenttimestamps:Independenttimestampsdonotdependonthedataanoperationaccessesorthenatureoftheaccessi.e.readorwrite.Forexample,aglobaltotalorderingofalltheoperationscanbedefinedbyusingLamportclocks[13]andsiteidentifierstotimestampoperations.Independenttimestampsbydefaultdefineatotalorder.Dependenttimestampsusetheaccessinformationofanoperationfortimestampingandthereforecandefineapartialorderwhichisnottotal.Aweakerorderingcanleadtofastercommitiftheoverheadtocomeupwiththisorderingisnottoohigh.

2.TimestampPrecision:Thispropertyisusefultodeterminewhenanoperationhascommitted.Informally,atimestampisprecisewhenbyknowingthetimestampsofthealreadyreceivedoperations,andbylookingatthetimestampofanewlyreceivedoperation,say,asitecanaccuratelydeterminewhetherthereare

anyoperationspreceding

intheglobalorderthathavenotyetbeenreceived.Theadvantageofprecise

timestampsisthatasiteneedstoonlyreceivemessagesfromsiteswhichhaveissuedoperationspreceding

intheglobalorder.Therefore,atemporarilydisconnectedsitecannotslowdowncommitofothersites.

Usually,animprecisetimestampimpliesaneedtocommunicatewitheveryoneforcommit.TimestampsbasedonLamportclocksareimprecise,becauseclocksatdifferentsitesmaygeneratethesametimestampforconcurrentlyissuedoperations.However,ifacentralserverwithacounterisusedtoassigntimestampsto

12

alloperations,byincrementingthecounterbyoneaftereachtimestampassignment,theresultingtimestampsareindependentandprecise.Vectorclocksusedfortimestamping,whichdefineacausalorderontheoperations,areprecise(However,justcausalorderingdoesnotalwayssatisfytheconvergenceproperty).3.Instantaneoustimestamping:Analgorithmdoesinstantaneoustimestampingifitneverneedstocommu-nicatewithothersitestoassignatimestamptoanoperation.Precisionandinstantaneousnessareatcrosspurposes.Optimisticprotocolsbenefitfrominstantaneoustimestampingasoperationscanbequicklytimes-tampedanddistributed.

4.Aborts:Itispossibleforthetimestampassignmenttobeabortedincertainalgorithms(AnexampleisthealgorithmusedintheDECAFsystem,discussedinSection6).Timestampingofanoperation

can

beabortedwhenthetimestampingisonlypartiallycompleted,orafteritisfinished.Inthefirstcaseonlytheissuerisaffectedastheoperationhasnotyetbeendistributedtoothersites,andusuallyoccursfordependentandprecisetimestamps.Inthesecondcaseothersareaffectedtoo,astheoperationhasbeendistributedtoothersites.Timestampabortioncanincreasecommittime,andwhenusingexecution-timeaccessinformationcausesexecutionundoattheissuerwhichcanresultinjitter.

Ideally,wewantanalgorithmthatusesdependent,precisetimestampswithtimestampingthatisalwaysinstan-taneousandabort-free.Unfortunately,precisionandinstantaneousnessareconflictingrequirements.However,notalltimestampingpropertiesareequallyimportantinallinteractionscenarios,thereforeasuiteofalgorithmswhichsatisfydifferentscenarioshavebeendeveloped.

BeforediscussingtheactualalgorithmsinSection6,wediscusssomeotherissuesrelevanttoallorderingalgorithms.

5.3Distributingeffectorupdate

Intheabovediscussionweassumedthattheupdateoperationitselfwasdisseminated.Thismeansthatthelogicoftheupdate,i.e.thefunctionrelatingtheobjectvaluesreadtotheobjectvalueswrittenisdisseminated.Anotheroptionwouldbetoonlydisseminatetheeffect,i.e.theobjectvalueswrittenbyanupdate.Therearesomesituationsinwhichpropagationoftheeffectmaybenecessary.

1.PartialReplication:Assumethatano-setissomehowonlypartiallyreplicated,i.e.noteveryreplicasitehasalltheobjectsthataremembersoftheo-set.Althoughpartialreplicationisnotconsideredinthispaper,itcanbeimportantincertainapplications.Supposeasiteissuesanupdatewhichreadsobjects

and

and

thencomputes

and

.Ofcourse,thissitehasreplicasofboth

and.However,ifthis

updateisdirectlypropagatedtoanothersitewhichdoesn'thaveoneofthetwoobjects,itcannotbeexecutedthere,astheobjecttobereadisnotavailable.Insuchcases,disseminatingtheeffectbecomesessential.

13

2.Heterogeneity:Incertainheterogeneousscenariositmaynotbepossibletoexecuteanupdateatothersites,becauseofdifferentmachinearchitectures,operatingsystemsetc.However,thisisbecominglessofaproblemwiththewidecommercialavailabilityofcommonexecutionenvironmentsliketheJavaVirtualMachine.

However,disseminatingtheeffecthascertaindisadvantages.

1.CouplingRollback:Inanoptimisticprotocol,everysite,includingthesource,executestheupdateoptimisti-cally.Theeffectoftheoptimisticexecutionatthesourceiswhatisdisseminatedtoothersitesandappliedthere.Therefore,ifthesourcehastoundotheexecutionofthisoperation,allthesiteshavetoundotoo.Thiscoupledrollbackalsoincreasesnetworkcommunicationastheeffecthastobesentagain.

2.DelayingCommit:Inapessimisticprotocol,thesourcecannotsendouttheeffectuntilitexecutestheupdate.Anditcannotexecutetheupdateuntilitcommitsatthesource.Thiscyclicity,thoughnotadisaster,cancauseslowdownincommit.ConsiderthefollowingpathologicalcaseofaprotocolusingLamportclocks.Threeoperationsclockvalueis

areissuedconcurrentlybysites1,2and3,eachofwhosecurrentLamport

.Thereforeareassignedtimestamps

respectively,with

.Thefirstoperationtocommitwillbe

.Onlyafter

commitsanditseffectis

receivedbysite2,can

commitanditseffectbesentout.

willcommitonlyafterboththeeffectsof

and

havebeenreceived.Thesequentialcommitbehaviorinthisexamplegreatlyslowsdowncommit.

5.4Replica-ViewCoupling

Inreality,usersdonotseethestateoftheirreplicadirectly,butthroughviewsdisplayedonsomeoutputdevice(likeacomputerscreen),anditistheconsistencyandresponsivenessofthisview,andnotthedata,thatisrelevanttotheuser.Therefore,thenatureofthecouplingbetweenthereplicaandtheviewsisveryimportant.Instandardsingleuserapplications,wheneverthereisachangeinthestateofcertaindata,theview(s)attachedtothatdataarenotified,whichthendisplaythenewstate.Ifsuchanimmediatecouplingisusedinthepresenceofreplication,thecharacteristicsofthedataconsistencyalgorithmaredirectlyexposedtotheuser.Weakeningthecouplingcanbeusefulforhidingortransformingsomecharacteristicsofthesealgorithms.Forexample,anoptimisticconsistencyalgorithmmayneedtoreordersomeoperationsbyundoingandredoingtheirexecution.Itisusefultohidetheintermediatestagesinthisundoandredofromtheuser.Weakenedcouplingshouldcontinuetoguaranteethatonquiescence,theviewshowsthecurrentstateofthereplica.Therefore,withconsistencyalgorithmsinwhichthereplicasconvergeatquiescence,theviewswillalsoconverge.

TheDECAFsystem[20]pioneeredtheconceptofflexiblecouplingandisprobablythemostextremeexampleofhidingthecharacteristicsoftheunderlyingdataconsistencyalgorithm.DECAFusesanoptimisticconsistency

14

algorithmwithundoandredotoguaranteeconvergenceofdata.TheprogrammerspecifiestheshareddatainadeclarativemannerbycomposingitfromtheDECAFsuppliedprimitiveobjects,usingDECAF'scompositionconstructs.Relyingonthistotalknowledgeofthedatabeingshared,thealgorithmefficientlymaintainsmultipleversionsofthedata.Thisallowscompletecontrolofwhatdataversionstoshowintheview.DECAFsupportstwotypesofviews:optimisticandpessimistic.Pessimisticviewsonlyshowtheeffectofcommittedtransactions,whileoptimisticviewsshowthecurrentstateofthereplica.

5.5InitializinganewParticipant

Anewparticipant,i.e.onewhoisconnectingtotheset,needstogetaninitialstateofthesetandshouldreceivealloperationsthatareorderedafterthisinitialstate.Itiseasytogettheinitialstatefromanyoneofthecurrentparticipants,butthecriticalproblemisensuringthatthenewparticipantdoesnotmissanyoperationsi.e.operationsorderedaftertheinitialstate.Thismayrequiresomesynchronizationbetweensites.Webrieflyoutlinetwopossiblesolutions.

Inasystemwithacentralserverforroutingallmessagesbetweenparticipants,thesolutiontothisproblemisrelativelysimple.Thiscentralservercanmaintainthestateoftheset,eitherbyactuallyexecutingtheoperationsonitsownreplica,orbyperiodicallyrequestingthereplicastatefromaparticipant.Inthelattersituation,itcanlogalltheoperationsthathavebeensentbytheparticipantsbutnotexecutedonthereplicastateithas.Anewparticipantconnectstothesetbysendingamessagetothiscentralserver.Theserverreplieswiththereplicastateandthelogofoperations.Inaddition,alloperationsreceivedbyitaftersendingthisreplyarealsoforwardedtothisnewparticipant.

Inasystemwithdirectpeer-to-peercommunication,aflushprotocolcanbeusedtosynchronizethepartici-pants,beforelettingthenewparticipantjoin.Duringtheflush,nonewoperationsmaybeissued,sothatthenewparticipantdoesnotmissanyoperations.Aflushprotocol,liketheoneusedforvirtualsynchronyinISIS[6],canalsobeabuildingblockforprovidingfaulttolerance.Insituationswheresomeofthecurrentparticipantsareonlylooselyconnectedwiththerest,theflushmaytakeawhiletocomplete.Insuchcases,therequirementthatnooperationsbeissuedduringthetimetheflushisinprogressmaybetoosevereontheparticipants.Therefore,itmaybenecessarytoutilizeweakersynchronizationprotocols,likeweakvirtualsynchrony[10].

6Classificationofproposedorderingalgorithms

Thissectionclassifiesanddiscussesthereplicamanagementalgorithmsdescribedintheinteractivegroupwareresearchliterature.Figure5showstheclassificationaccordingtothenatureoftimestamping.

15

AccessIndependentAccessDependent

CentralServer

Hybrid

Imprecise,InstantaneousPrecise,Non-InstantaneousImprecise,InstantaneousPrecise,Non-Instantaneouse.g. 2PT, n2PT-ir

e.g. Villa, ...

(In)Dependent,(Im)Precisee.g. DECAF

e.g. dOpt,e.g. Stamp ServerORESTE, COAST

Figure5:AlgorithmClassification

6.1CentralServer

Thisisthesimplestapproach,andhasbeenusedininnumerablesystems.Itusesacentralserver,towhichalloperationsaresentbytheclients.Theserverdistributestheoperationsreceivedtoalltheclients(includingtheclientwhichissuedthatoperation).Thisimposesaverysimpletotalorderingontheoperations.Despiteitssimplicityofimplementationandsimpleboundsonapproach.

VillaTheVillasystem[5]providesanabstractionverysimilartotheo-setabstractiondescribedinthispaper.However,itonlyallowspredeclarationofaccessinformation.Ano-setcanbeeitheroptimisticorpessimistic.Forboth,theserverisusedtoconstructtheglobalorderingoftheoperations.Inpessimisticmode,anoperationisexecutedonlyafteritisreceivedfromtheserver.Inoptimisticmode,anoperationissuedatasite(say)is

RT

andUUT,itsuffersfromdrawbacksasdescribedin

section1.1.Weconsideroneexampleofsuchasystem,Villa,becauseofitsuniqueconsistency(orlackthereof)

immediatelyexecutedandalsosenttotheserver.Alogoftheaccessinformationofeachoperationexecutedby

iskeptat.Whenanoperationthatitissuedisreceivedfromtheserver,

examinesthislogtoseeifthere

havebeenanyconflictswhichcouldhavecausedreplicadivergence.Ifyes,itnotifiestheapplicationaboutwhichobjectsmaynolongerbeconsistent.TheVillasystemprovidesacloningandreinitializationmechanismfortheapplication/usertofixtheincorrectreplicas.

6.2Independent,Precise

Weconsideraclassicalgorithm,againonewhichhasbeenimplementedininnumerablesystems.

StampServerThebasicalgorithmisavariantofthecentralserveralgorithm.Aservermaintainsacounterusedtotimestampalloperations.Aclientsendsarequesttotheserverforatimestamp,whichreturnsthenextvalueofthecounter.Thisisusedbytheclienttotimestamptheoperation,andthensendittootherclients.Assumingthatone-waylatenciesbetweenclientsandbetweenclientsandtheserveris,theminimumpossibleRTandUUTis

and

respectively.Howeverifthenumberofactiveparticipantsisverylow,itcanbeimprovedbydynamically

16

movingthecountertoactivesites.

6.3

dOpt

Independent,Imprecise

EllisandGibbs[9]wereoneofthefirsttousereplicationtoimproveresponsetimeingroupware.They

consideraprogrammingmodelinwhichtheshareddata,forexampleadocument,isrepresentedbyasingleobjectreplicatedateachparticipant'ssite.Operationsissuedbyparticipantsareusedtomodifythisshareddata.Theychoosenottouseexplicitlockingformaintainingconsistencyforvariousreasons,oneofthembeingthatchoosingthegranularityofthelocksmaybeaproblem.Theyguaranteecausalorderingbetweenoperationsusingvectorclocks.However,thiscausalorderingisnotsufficienttoguaranteeconvergence,ascausallyconcurrentoperationswhichmodifythesamedatamaybeexecutedinadifferentorderatdifferentreplicas.Tosolvethisproblemtheyuseanoperationtransformationalgorithm(dOpt),inwhichanoperationmaybetransformedbeforeexecutionsuchthatexecutingtwooperationsindifferentordersatreplicasdoesnotcausedivergence.Theyprovidetransformationoperationsforasimpletextoutlineeditor,Grove.Later,researchersindependentlydiscoveredanerrorinthealgorithm.Cormack[7]hasformalizedthenotionofoperationtransformationsandprovidesacorrectalgorithm.Anadvantageofthisapproachisthatundoingoperationsisnevernecessary.However,operationtransformationforconvergencealsohascertaindisadvantages.Wediscussthistopicinmoredetailinsection7.1.ORESTE

KarsentyandBeaudouin-Lafon[12]describetheORESTEalgorithm,whichconsiderstheshareddata

tobeasetofobjectswhichisfullyreplicatedatalltheparticipantsites.Operationscanaddobjectstotheset,deleteobjectsfromtheset,orcanupdateasingleobjectintheset.TheydefineatotalorderingoftheoperationsusingtimestampsbasedonLamportclocks.However,eachsiteexecutesoperationsoptimistically,thereforeitmightreceiveoperationswithalowertimestampafterithasexecutedacertainoperation.Whenthishappens,undoandredoofoperationsmaybeneededtoguaranteereplicaconvergence.Theyuseinformationaboutcommutativityandmaskingofoperationstominimizeundos.Notethatthetimestampingisinstantaneousandabort-free.Theoptimisticapproachprovidesinstantaneousresponsewiththepotentialforjitter.Also,asorderingisbasedonLamportclocks,communicationwitheverysiteisrequiredforcommit.COAST

TheCOASTsystem[18]fullyreplicatesalltheobjectswhicharepartofashareddocument.Itremoves

therestrictionofORESTEbysupportingtransactionswithACIDproperties,whichcanaccessasubsetoftheobjects.TheorderingofthesetransactionsisdoneusinganalgorithmsimilartoORESTE.However,itisnotclearfromthepaperiforhowcommutativityandmaskinginformationforthesetransactionsisprovidedbytheapplication.

17

6.4Dependent,Precise

2PTAdaptedfromaclassicdatabaseapproach,theperformanceofthisalgorithmisanalyzedwithsyntheticworkloadsin[4].Itsupportstheo-setabstraction,andallowsbothpredeclaredandexecution-timeaccessinfor-mation.Bothreadandwriteaccessescanbespecified,butforsimplicityofdiscussionweconsideronlywrites.Thealgorithmusesatokenperobject.Atokenhasaversionnumber,andtheversionnumbersformacontinuoussequenceofintegersstartingwith0.Theseversionnumbersareusedtotimestampupdateswhichmodifythatobject.Forexample,supposethecurrentversionnumberoftokenAis3,andtokenBis1.ThenifuserAliceissuesanupdate(say)whichmodifiesbothobjectAandB,thenafteracquiringbothtokens,thisupdatewillbe

timestampedwith((A,4),(B,2)).Aftertimestamping,thetokensarereleased,andthisupdateismulticasttoothers.Update

willbeexecutedatasiteonlywhenupdate(s)withtimestamps(A,3)and(B,1)havebeenexecuted.

Note,thattokensareonlyusefulfortimestamping,andarenothelduntilanupdatecommits.Therefore,despitetheirsimilaritywithlocks,theyarecalledtokens.Toensurenocontradictoryorderinginformation,tokensthatareacquiredtotimestampanupdatearenotreleaseduntiltheupdateisfullytimestamped.Thisissimilartothe2-phaselockingschemeusedindatabases,andthereforethealgorithmissaidtodo2-phasetimestampingor2PT.n2PT-irThisalgorithmisavariationonthepreviousone,andrelaxesthe2PTrequirementforupdatesbyreleasingthetokensimmediately.However,itonlyworksforexecution-timeaccessinformation.Forexample,againconsiderupdate

andanotherconcurrentupdate

whichalsomodifiesAandB,andassumethattoken

AisacquiredbeforetokenB.Suppose

acquiresthetokenforAbefore

andgetstimestamp(A,4).Now,if

immediatelyreleasesthetokenA,beforeacquiringtokenB,

willgettimestamp(A,5).Asexecution-time

informationisbeingused,theexecutionof(A,4)i.e..Thisimpliesthat

willnowblock,untilitssitehasexecutedtheupdatewithtimestamp

doesnotgettothatstageofitsexecutionwhereitaccessesB,untilisexecuted

atitssite(Note,thatthiswouldnotbetrueiftheexecutionofeachupdatecausedmultiplenotifications,whereeachnotificationcorrespondstothenewvalueofanobject,tobesentout).Therefore,thereisnodangerthat

canacquiretokenBbefore,whichwouldcauseacycleintheorderingrelation.

6.5Hybrid

DECAFTheDECAFsystem[20]isthefirsttorecognizetheneedtoconsiderbothdataandviewconsistency,andprovidesalgorithmsthatmanagereplicatedobjectsandtheirviews.Itprovidesadeclarativemodelinwhichsharedobjectscanbecomposedfromtheprimitivesharedobjectslikeinteger,realandstring,providedbytheirframework.Sharedobjectsareupdatedusingtransactions(onlyexecution-timeinformationisused),withtherestrictionthattheissuerhavereplicasofalltheobjectsaccessedbythetransaction.However,unlikethepreviousalgorithms,othersitesmaynothavealltheobjectsaccessedbyatransaction.Therefore,thealgorithmdistributes

18

theeffectandnottheactualtransactiontotheothersites.

Theunderlyingdataconsistencyprotocolisoptimistic.Theissuingsite,say,executesthetransactionand

assignsitatimestampusingaLamportclock.Itthensendstheeffectandthistimestamptoallothersites.Thereisaprimarysiteforeachobjectwhichvalidatesthetimestampforatransactionwhichreadsorwritesthatobject.Theissuingsitewaitsforvalidationresponsesfromtheprimarysitesofalltheobjectsthetransactionreadorwrote.Thisvalidationgivessomedependentpropertiestothebasicallyindependenttimestamp.Whenalltheprimarysitessay`yes`,thetimestampisapproved,andthis`ok`issentbytoallsiteswhichhadbeensentthetransactions

effect.Ifevenoneprimarysitesays`no`,thetransactionisabortedatandatallothersites.Then,hastoretryby

executingandtimestampingagain.Inthecasethatatimestampisapproved,i.e.asitehasreceivedan`ok`messagefrom,thesitegoesandcheckswithalltheprimarysitestoensurethatithasnotmissedanoperationwhichhad

beenorderedearlier.ThisavoidstheneedtowaitforcommitusingtheusualLamportclockrules.However,itaddsaround-tripcommunicationwitheveryprimarycopybyeverysitethatreceivedan`ok`message.Evenif

istheprimarycopyforalltheobjectsaccessedbyitstransaction,thelowerboundonUUTforapessimisticviewis

.If

isnottheprimarycopyforevenoneoftheobjectsaccessedbyit,thelowerboundonRTforapessimistic

viewis

.Anotherdrawbackofthisapproachisthatthereisnoupperboundonthenumberoftimesatransaction

mayabort.Concurrenttimestampingoftwoconflictingtransactionscanalsocausebothtoabort.Eachtransactionrequiresaminimumoftwomulticasts,onetodistributethetimestampedoperation,andthesecondtoinformaboutabortorcommit.Overall,thebehaviorofthealgorithmisquitecomplex.

7

7.1

MiscellaneousIssues

OperationTransformation

OperationtransformationwaspioneeredbyEllisandGibbsinGrove[9],andisusedforoptimisticexecution.Cormackformalizeditin[7],andithasbeenusedinnumeroustexteditors[21]andaspreadsheet[15].

Thisapproachdealswithtwoproblems.Firstly,becauseofconcurrentissueofupdates,someoperationsmayneedtobetransformedtopreservetheintentoftheuser.Forexample,changingthepositionofinsertionordeletioninatextbufferwhenthereareconcurrentinsertsordeleteswhichareorderedearlier.ThiscorrespondstotheafterfunctioninCormack'scalculus,andisatechniquewhichisusefulforanysituationinwhichoperationsareissuedconcurrently(evensinglecopysystems).Secondly,itisusedtoprovideconvergenceofreplicasinoptimisticexecutionwithouttheneedforundoandredo.ThiscorrespondstothebeforefunctioninCormack'scalculus.

Theprimaryadvantageofprovidingreplicaconvergencethroughoperationtransformationisthepotentialforfasteroverallexecutionatasite,bysubstitutingtransformationforoperationundoandexecution.Supposeanoperation

isreceivedatasitewhere

operationsthatwereissuedconcurrentlywith

havealreadybeen

19

executed.Cormack'salgorithmrequiresatotalof

transformations,ofwhich

aredoneon,and1eachon

the

operationsthathavealreadyexecuted.Thenthetransformed

willbeexecuted.Nowconsideranundo/redo

approach,wherewassupposedtobeexecutedbeforethelast

oftheseoperations(avalidaveragecase

assumption).Firstly,thelastfunction.Then,thelastlast

areundone.Then,

istransformedwithrespecttothefirst

usingtheafter

operationswillbetransformedwrtto

usingtheafterfunction.Finally,

andthe

operationswillbeexecuted.Therefore,theundo/redoapproachcausestransformations,

undos,and

operationexecutions.Incontrast,theoperationstransformationapproachcausestransformationsand

operationexecution.

However,transformationsasareplicaconvergenceapproachhavesomedisadvantages.Firstly,beforefunctions

canbehardtocomeupwith,andhavetobeprovedcorrect.Secondly,beforefunctionscannotimplementmanysemanticswhichundo/redocanbecauseoflossofinformation.Forexample,abeforefunctioncannotcompensateforanoperationthatdeletessomedata(likedeletingsomecharactersinatexteditor),becausethedatadeletedhasbeenlost.Incontrast,withundo/redo,theoperationcanstorewhatitdeletedandcanrestorethatwhenundone.

7.2AccessGranularity

Multigranularitylockinghasbeenusedindatabasesystems[3],andalsoincentralizedgroupwaresystems[14].Itsusefulnessliesinpreventingamismatchinlockgranularityandaccessgranularity.Thiscanbegeneralizedtomultigranularaccessinformationinthecontextoftheo-setabstraction.However,thebenefitsforreplicateddataarenotimmediatelyapparent.Considera2PTalgorithmwhichusesthismultigranularaccessinformationtoacquiretokens.Havingmanyfine-grainedlockscanreducethelocalityofacquiringtokensbyasite.Forexample,inthefirstsequencecompatibilitytablegivenin[14],whichcanbeusedfortexteditors,insertionofeveryconsecutivecharactercausesanewtokentobeacquired.Thisisacceptableforacentralizedsystem,butmaynotbeappropriateifacquiringatokenrequirescommunicationwithothersites.Also,inscenarioswithdata-dependentaccesspatterns,thegranularityofwhatwillbeaccessedbyanoperationisnotknownbeforehand,andsotoomanytokensmaybeacquired.

Despitethis,multigranularaccessisapromisingapproach,andhowtoadaptittoorderingprotocolsanditsperformanceinadistributedenvironmentneedsmoreinvestigation.

8Conclusion

Thealgorithmrequirementsforinteractivegroupwarearevaried,withtheapplicationscenario,andnetworklatencybeingtheprimaryfactorsaffectingthechoiceofalgorithmtouse.Despitethemanytypesofreplicamanagementalgorithmsproposed,thereisanunstatedcommonthemeinthem.Thispaperillustratesthecommondatashar-ingrequirementsofinteractivegroupwareandcomesupwithadataabstractionwhichisappropriateforthese

20

requirements.Wediscusswhyconventionalreplicateddatabasesarenotappropriate,anddescribeageneralalgo-rithmicstructureforimplementingthisabstraction.Theimportantdesignchoicesandcharacteristicsofalgorithminstancesaredescribed.Thisisusedtoroughlyclassifythealgorithmsdescribedinthegroupwareresearchlitera-ture.Thegoalofthispaperisthatanunderstandingofthefundamentaldesignchoicesandtheirimplications,willstimulatemoreresearchinsuchalgorithms.Atthesametime,acommonterminologyfordescribingalgorithmcharacteristicswillhelpinalgorithmclassification.

References

[1]D.Agrawal,J.L.Bruno,A.E.Abbadi,andV.Krishnaswamy.Relativeserializability:Anapproachforrelaxingthe

atomicityoftransactions.InProceedingsofthe13thACMSymposiumonPrinciplesofDatabaseSystems,1994.[2]N.S.BarghoutiandG.E.Kaiser.Concurrencycontrolinadvanceddatabaseapplications.ACMComputingSurveys,

23(3):269–317,1991.[3]P.A.Bernstein,V.Hadzilacos,andN.Goodman.ConcurrencyControlandRecoveryinDatabaseSystems.Addison-Wesley,1987.[4]S.Bhola,G.Banavar,andM.Ahamad.Responsivenessandconsistencytradeoffsininteractivegroupware.InProceed-ingsof7thACMConferenceonComputerSupportedCooperativeWork,November1998.Toappear.[5]S.Bhola,B.Mukherjee,S.Doddapaneni,andM.Ahamad.Flexiblebatchingandconsistencymechanismsforbuilding

interactivegroupwareapplications.InProceedingsofthe18thICDCS,1998.[6]K.Birman,A.Schiper,andP.Stephenson.Lightweightcausalandatomicgroupmulticast.ACMTransactionson

ComputerSystems,9(3):272–314,August1991.[7]G.V.Cormack.Acalculusforconcurrentupdate.TechnicalReportCS-95-06,DepartmentofComputerScience,

UniversityofWaterloo,Ontario,1995.[8]C.CowanandH.L.Lutfiyya.Await-freealgorithmforoptimisticprogramming:Hoperealized.InProceedingsofthe

16thInternationalConferenceonDistributedComputingSystems(ICDCS),pages484–493,1996.[9]C.A.EllisandS.J.Gibbs.Concurrencycontrolingroupwaresystems.InProceedingsoftheACMSIGMOD',pages

399–407,19.[10]R.FriedmanandR.vanRenesse.Strongandweakvirtualsynchronyinhorus.TechnicalReportTR95-1537,Computer

ScienceDepartment,CornellUniversity,1995.[11]J.P.Granieri,J.Crabtree,andN.I.Badler.Productionandplaybackofhumanfiguremotionforvisualsimulation.ACM

TransactionsonModelingandComputerSimulation,5(3):222–241,July1995.[12]A.KarsentyandM.Beaudouin-Lafon.Analgorithmfordistributedgroupwareapplications.InProceedingsofthe13th

ICDCS,pages195–202,1993.[13]L.Lamport.Time,clocks,andtheorderingofeventsinadistributedsystem.CommunicationsoftheACM,21(7):558–

565,July1978.[14]J.MunsonandP.Dewan.Aconcurrencycontrolframeworkforcollaborativesystems.InProceedingsofthe6thACM

CSCW,1996.[15]C.R.PalmerandG.V.Cormack.Operationtransformsforadistributedsharedspreadsheet.InProceedingsof7thACM

ConferenceonComputerSupportedCooperativeWork,November1998.Toappear.[16]A.PrakashandH.S.Shim.Distview:Supportforbuildingefficientcollaborativeapplicationsusingreplicatedobjects.

InProceedingsofthe5thCSCW,1994.[17]M.RosemanandS.Greenberg.Buildingrealtimegroupwarewithgroupkit,agroupwaretoolkit.ACMTransactions

onComputerHumanInteraction,1996.[18]C.Schuckmann,L.Kirchner,J.Schummer,andJ.M.Haake.Designingobject-orientedsynchronousgroupwarewith

COAST.InACMCSCW'96,1996.

21

[19]S.Singhal.EffectiveRemoteModelinginLarge-ScaleDistributedSimulationandVisualizationEnvironments.PhD

thesis,Dept.ofComputerScience,StanfordUniversity,August1996.[20]R.Strom,G.Banavar,K.Miller,A.Prakash,andM.Ward.Concurrencycontrolandviewnotificationalgorithmsfor

collaborativereplicatedobjects.IEEETransactionsonComputers,47(4):458–471,April1998.[21]C.SunandC.Ellis.Operationaltransformationinreal-timegroupeditors:Issues,algorithmsandachievements.In

Proceedingsof7thACMConferenceonComputerSupportedCooperativeWork,November1998.Toappear.

22

因篇幅问题不能全部显示,请点此查看更多更全内容