Groupware
SumeerBholaMustaqueAhamad
CollegeofComputingGeorgiaInstituteofTechnology
Atlanta,GA30332
sumeerb,mustaq
@cc.gatech.edu
TechnicalReportGIT-CC-98-15
Abstract
Interactive(Synchronous)Groupwareencompassesawiderangeofapplications,likecollaborativewhiteboards,texteditors,engineeringCAD(ComputerAidedDesign),DistributedVirtualEnvironments,andmulti-playergames.Averycriticalrequirementforalltheseapplicationsistheneedtosharedata,whichcanbereplicatedtoprovidebetterresponsiveness,fault-toleranceandscalability.Despitetheexistenceofmanysystemsanddistributedalgorithmsforreplicationininteractivegroupware,thereislittleagreementontheshareddataab-stractiontoprovide,oronthegeneralalgorithmicapproach.Similarly,thereisgenerallackofuniformityintheterminologyusedtodescribesuchalgorithms.Thispaperattemptstorectifythissituationby(1)Describingthefundamentaldatarequirementsofinteractivegroupware,(2)Developingadataabstractionwhichisappropriateformeetingtheserequirements,(3)Developingageneralalgorithmicstructureandterminologytoexpressthecharacteristicsofalgorithminstances,and(4)Classifyingthealgorithmsproposedintheresearchliterature.
1Introduction
SynchronousorInteractive1Groupwareisarapidlygrowingareaofcommercialandresearchinterest.Thisin-cludesawiderangeofapplications,fromtraditionaloneslikewhiteboardsandtexteditors,toemergingoneslikeengineeringCAD(ComputerAidedDesign),DIS(DistributedInteractiveSimulation),andmulti-playergames.Byourdefinition,thetermInteractiveGroupwarestandsforanycollectionofapplications,thatallowapotentiallydistributedgroupofuserstocollaborateorcompeteonataskatthesametime.Averycriticalrequirementforalltheseapplicationsistheneedtosharedata,andthatisthefocusofthispaper.
Theinteractivenatureofsuchapplicationsrequiresthattheeffectofauser'sactionisseenbyhimself(responsetimeor
RT)
aswellasotherusers(usertousertimeor
RT
UUT)inatimelymanner.However,theproblemof
providingappropriateand
UUT
isincreasinglydifficultasgroupwareisdeployedinawide-areadistributed
environmentliketheInternet,wherehighcommunicationlatenciesarecommon.Inparticular,end-usersareincreasinglyaccustomedtodirectmanipulationuserinterfaces,whichtypicallyrequireresponsetimesontheorderof50-100ms.However,duetothefundamentallimitationofthespeedoflight,theround-tripdelaytothefarsideoftheplanetisatleast200ms.Othercausesofhighermessagelatenciesaretheuseofmobilewirelesscomputers,andwhenconnectingfromhomethroughmodems.
Thoughtomake
RT
UUT
byitsverynaturehastodependonthemessagelatencybetweenapairofusers,itispossible
lessdependentonmessagelatencybyusingreplicationoftheshareddataatusersites.Thishasthe
potentialtoreduceresponsetimeforactionsthatreadormodifythisdata,sinceauser'sactioncanbeexecutedonherlocalreplica.Thereisgeneralagreementintheresearchcommunityontheneedforreplicationascanbeseenbythelargenumberofgroupwaresystems[9,12,17,16,18,20,5]thatsupportreplication.Inadditiontoreducing
RT,
replicationcanalsoprovideresiliencetofailures,whichcanbecommoninadistributedenvironment.Finally,
manyfactors,includingsmallersizeofmessagesandbatchingofmodifications,canallowreplicationtoconsumefewernetworkresourcesthanacentralizedapproach.
Datacanbereplicatedatallthesiteswhichhaveusersinterestedinthatdataoratsomestrategicallylocatedsites,dependingonthememoryresourcesavailable.Notethatreplicationatallinteresteduserssitesdoesnotconsumemorenetworkresourcesthanreplicationatonlysomesites,asnotificationsaboutthechangeinstateofthedatastillneedtogotoallthesesites,toupdatetheirviews.Therefore,inthispaperwewillonlybeconcernedwithreplicationatallinterestedsites,i.e.fullreplication.
Onecommonmisconceptionaboutreplicationisthatitmakeslifeharderfortheapplicationprogrammer.Weillustratewhythisisnotsobyconsideringthecommonwayinwhichtheseapplicationsarestructured.Theappli-cation,runningonacertainsite,receivesauseractionexpressedasawindowingsystemevent,andtranslatesthis
S
l
C1 C2 C3
Figure1:TotalOrderingthroughaServer
intoanoperationthatneedstobeexecutedontheshareddata.Itthenissuesthisoperationtotheunderlyingdatamanagementsystemwhichexecutesthisoperationonthelocalreplicaandtheotherreplicas.Additionalcomplex-ity,ifany,fortheapplicationprogrammer,comesfromthepotentialforconcurrency.Inparticular,betweenthetranslationstepandexecutionstepofthisoperation,anotheroperation(probablyonethatisconcurrentlyissuedbyanothersite)canbeexecutedontheshareddata.Andthiscanhappeneveninsinglecopysystems!Forexam-ple,considerthetexteditingexampleusedin[9].Thetextbufferisrepresentedasanarrayofcharacters,and2concurrentoperationsareissuedtoinsert
charactersatpositions5and10respectively.ifthefirstoperationgets
toexecuteearlier,thesecondinsertshouldhappenatposition
.Operationtransformationswereproposed
byEllisandGibbsin[9](andformalizedbyCormackin[7]),tosolvethisproblemofpreservingtheintentoftheuser,andalsotoguaranteereplicaconvergence.Wewilldiscussbothaspectslaterinthepaper(Section7.1),andassumefornowthatnooperationtransformationsareneeded.
1.1TheSimplestReplicationApproach
Withreplicationcomestheassociatedproblemofmaintainingthereplicasconsistentwithrespecttoeachother.Thesimplestapproachistoensurethatalloperationsthatmodifythedatagetexecutedinthesameorderatallreplicas,byorderingthemthroughaserver(asdepictedinfigure1).Read-onlyoperationscanstillbeexecutedinstantaneouslyatthelocalreplica.Despitethesimplicityofthisscheme,itsuffersfromsomemajordrawbacks.Firstly,assumingthattheone-waycommunicationlatencybetweeneverypairofsitesis,the
RT
and
UUT
is
.Thismaybetoohighinmanysituationswhere
islargeorwidelyvarying.Secondly,thesystemisnot
fault-toleranttoserverfailure,andtheservercanbeaperformancebottleneck.Finally,applicationscannotbatchoperationsbeforesendingtotheserver,asitwouldaffectRT.Batchingofoperations[5]canbeveryusefultoreducenetworktrafficwhenotherusersarenotreallyinterestedineachotherswork.
Thealgorithmsproposedforsynchronousgroupwareallreducebysomedegreethedependenceontheserver,i.e.thealgorithmsaremoredistributed.MostofthemhaveabestcaseRTandUUTthatislessthan
,butwhether
theyprovideanaveragecaselowerRT,UUTthantheserverbasedapproachdependsonthedistributionofthe
inter-sitenetworklatenciesandtheapplicationscenario.Thoughevaluationofsuchalgorithmsisnotcommon,andisnotthefocusofthispaper,someinitialworkisdescribedin[4].
2
Althoughmanysystemsanddistributedalgorithmshavebeenproposedforreplicationingroupware,thereseemstobelittleagreementontheshareddataabstractiontoprovide,oronthegeneralalgorithmicapproach.Similarly,thereisgenerallackofuniformityintheterminologyusedtodescribesuchalgorithms.Researchershaveapproachedtheproblemfromvariousangles,somecomingfromadatabasebackground,somefromageneraldistributedcomputingbackground,andsomefromhighperformancecomputing(forexample,distributedsharedmemory).Eachoftheseapproacheshasmissedsomecriticalneedofgroupware.Thefirstgoalofthispaperistoillustratethedatasharingrequirementsandcomeupwithadataabstractionwhichcapturesandextendsthepoweroftheexistingabstractions.Next,wedescribeageneralalgorithmicstructureforimplementingthisabstraction,anddiscusstheimportantdesignchoices.Thisisusedtoroughlyclassifythealgorithmsdescribedinthegroupwareresearchliterature,eventhoughsomeimplementadifferentabstractionthantheonewepropose.Thegoalofthispaperisnottoclaimthatalltheproblemshavebeensolved,ortostiflealgorithmresearchfortheseapplicationsbyrestrictingthemtoacertaingeneralstructure.Onthecontrary,wehopethatabetterunderstandingofthedesignspace,andacommonterminologyfordescribingtheworkingofthesealgorithms,willstimulateresearch.
Section2illustratesthedatacharacteristicsoftheseapplicationsusingtwoexamples.Thesecharacteristicsareusedtomotivatethesetofobjectsabstraction,whichisdescribedinSection3.Section4discusseswhyreplicateddatabasescannotbeeasilyadaptedtomeetourrequirements.Section5describesthegeneralstructureofthealgorithms,includingtheglobalorderingandtimestampingproperties,andtheissueofreplica-viewcoupling.Thisisusedinsection6toclassifymanyalgorithmsproposedinthegroupwareresearchliterature.Section7discussescertainsecondary,butimportantissues,includingoperationtransformation,andgranularityofaccess.Finally,weconcludeinsection8.
2SharedDataCharacteristics
Thissectiondescribesthecharacteristicsoftheshareddata.Wefirstillustratewithsomeexamplesofapplicationscenarios.
2.1ApplicationScenarios
Thefirstexampleisofthecollaborativedesignofakey-frameforacomputeranimation,whichcanbepartofthedevelopmentofananimatedmovie,orthedevelopmentofthevisualbehaviorofacomplexentity(e.g.avirtualhuman)inaninteractivesingle-user/multi-uservirtualworld.Thesecondexampleisofsuchadistributedvirtualworldinaction,specificallyofinteractionbetweendistributedentities,someofwhichcouldbeunderusercontrolandsomeautonomous.
Thesetwoexamplesillustratehowmulti-userinteractioncanbeimportantforallstagesofagroupwareappli-cation,fromdesignanddevelopmenttoactualuse.
3
Figure2:ArticulatedModels:StickModelofHuman,LeonardodaVinci'sWing
Key-frame/Snapshotdesign
Weassumethatthekey-framedesigninvolvesarrangementofmultiplerigiden-
titiesinacertain3-dimensionalspace.Inadditiontosimplerigidentitieslikeatableorabottle,thereareothermorecomplexentitieslikehumansorsyntheticcreatures.Suchcomplexentitiesarerepresentedbyanarticulatedmodel[11].Anarticulatedmodelisacollectionofobjectsconnectedtogetherbyjointsinahierarchical,tree-likestructure.Figure2showsasimplestickfiguremodelofahuman,whichcanbeinternallyrepresentedasanartic-ulatedmodel.Rotationabouttheelbowjointinthismodelwillaffectnotonlythepositionofthelowerarmbutalsothepositionoftheobjectsbelowitinthehierarchyi.e.thehandandfingers.Eachjointhascertaindegreesoffreedom,andforacertainaxisofrotationthereisaconstraintontherelativeanglebetweenthetwosticksconnectedbythatjoint.Forexample,theelbowjointhas2degreesoffreedom,witharangeofrotationfrom0to180degreesforbothaxes.
Akey-framedesigncaninvolvemanytaskssuchas,designinganarticulatedmodelofacomplexentitybyaddinganddeletingsticksandjoints,addinganddeletingentitiesfromthe3Dspace,translatingentitiestocertainpointsinspace,andorientingentitiesorpartsofentitiesbyrotatingaboutjoints.DistributedVirtualWorld
Adistributedvirtualworldhasmanyentities,someunderdirectusercontrol,some
autonomouswithprogrammedbehaviorpatterns,andsomeonlycontrolledbythelawsofphysics2(e.g.atennisball).Seechapter1of[19]foragoodintroduction.Mostentitieshaveaprimarysitewhichcontrolsthatentity,forexample,forausercontrolledentitytheprimarysiteisthesiteofthatuser.Primarysitesforautonomousentitiesmaybechosensoastobalancetheload.Asitecontrollinganentitycomposedofmultipleobjects(e.g.acomplexentity)hastoupdateasubsetoftheseobjectsinresponsetouserinput,orprogrammaticcontrol(e.g.tosimulateahumanwalking).Inaddition,non-primarysitescanissueupdatestoanentityinthecaseofentityinteraction,for
example,collisionbetweenentities.Properorderingbetweensuchconcurrentupdatesisimportanttodeterminethefuturebehavioroftheinteractingentities.Forexample,thecollisionofatennisballwitharacquetmustbeidenticallyobservedatallsites,includingthevelocityofeachentity,andtheexactpointofimpact.Entitiesmayalsobecreatedanddeleted,forexample,firingofamissilecausescreationofanewmissileentity,andafterhittingatargetthisentityisdeleted.
2.2Characteristics
Thefollowingarethedatacharacteristicsofgroupware.
1.Dynamicstateandsizeofshareddata:Theamountofdatabeingsharedischangingduringthecollab-oration,withnewdataitemsbeingaddedandexistingonesbeingdeleted.Also,thestateofthisdataiscontinuouslychanging.
2.Atomicityofaccess:Userscanatomicallyreadandwriteasubsetofthisdata.Forexample,rotatinganelbowjointchangesthecoordinatesofthelowerarm,handandfingers.Byatomicity,wedonotmeanfailureatomicityasintransactions,butthattheexecutionofanoperationisnotinterleavedwithotheroperationsatareplica.Failureatomicity,asimpliedbydatabases,isnotneededbecausereplicationgivesusfaulttolerance.Also,asthereplicasarenotpersistent,wedonotneedaheavyweightdistributedcommitprotocollike2-phasecommit.
3.Data-dependentaccesspatterns:Insomecases,thesubsetoftheshareddatathatisaccessedbyanatomicoperation3dependsonthestateofthedatawhentheoperationisexecuted.Forexample,assumethatwhilethearmofahumanmodelismoving,thehandgrabsadoor.Futureupdatesonthearmwillalsoneedtoupdatethepositionofthedoor,andthedegreesoffreedomofthedoorcanconstrainthedirectioninwhichthearmcanmove.
Despitesuchdata-dependentaccesspatterns,inmanycasesitispossibletoquiteaccuratelypredicttheaccessinformationwhenanoperationisissued.
4.Short,IncrementalOperations:Theexecutiontimeforeachoperationisshort,asuserswanttoseetheresultsoftheactionsofotherusers.
5.Commutativity:Manyoperationsmaybecommutative,andexploitingthecommutativityofoperationscanallowthedatamanagement/consistencyalgorithmtorelaxtheorderinwhichoperationsareexecutedatreplicas.Operationsarecommutativewhentheyaccessdifferentpartsofthedata,or,eveniftheyaccessthesamedata,theiraccessescommute.Forexample,inanarticulatedmodelofahuman,concurrentrotationsabouttheelbowandthewristarecommutative(iftherearenoexternalconstraintsonthemotion).
shared datasink=source=x=500handleUpdate(){ source.sub(x); sink.add(x);}updatename=‘Alice’checking=savings=amount=400amount=2000Figure3:Updatingano-set
3SetofObjectsAbstraction
Asetofobjects(referredtoasano-set)abstractionisusedtorepresentacollectionofrelatedshareddataobjects.Byobjects,wedonotnecessarilymeanobjectsintheOO(Object-Oriented)sense,wemeanthattheyarecompositei.e.composedofprimitiveobjectslikeinteger,realetc.Theseobjectscanhavepointerstootherobjectsinthesamesetandthesepointerscanformcycles.Pointersprovideaconvenientwayfortheapplicationprogrammertoexpressshareddata-structures.Sitesinacollaborationcandynamicallyconnectordisconnectfromano-setandobjectscanbeaddedtoanddeletedfromtheo-set.Thesitesthatarecurrentlyconnectedtotheo-setarereferredtoasparticipantsintheo-set.Ano-setisfullyreplicatedatalltheparticipantsites.Partialreplicationofano-setisoutsidethescopeofthispaper.Objectsaremodifiedusingupdatesthatcanreadandwriteasubsetofobjectsintheo-set.Themainoperationsonano-setare:
addObject(Objecpointerstoeachother.
t[]o)Addsoneormoreobjectstotheo-set.Theobjectsbeingaddedcanhave
deleteObject(Obbeingdeleted.
ject[]o)Deleteoneormoreobjectsfromtheo-set.Theapplicationisrespon-
sibleforensuringthattherearenodanglingpointersfromtheremainingobjectsintheo-settothesubset
updateSet(Updatee)Thisissuesanupdatetotheobjectsintheo-setwhichcanreadandwritea
subsetoftheseobjects.Theupdateobject,,encapsulatestheactualupdatei.e.ithasthestateandlogictocarryouttheupdate.Thisstateincludespointerstocertainobjectsintheo-set,andtheupdatecanonlyaccessobjectsreachablefromthesepointers.ThehandleUpdatemodifiestwoofthethreesharedobjectsintheo-set.
Thesethreeoperationsareasynchronous,i.e.theyarescheduledforexecutionlocallyandremotelyatsometimeinthefuturebytheunderlyingreplicamanagementalgorithm.Thedeleteoperationrequirescoordinationbetween
6
methodof,whenexecuted,performs
theactualupdate.Figure3showsanupdateobjectwithitsassociatedhandleUpdatemethodwhich
processesbeforethedeleteisdone,toensurethataprocessdoesnotreceiveanupdateonanobjectthathasalreadybeendeleted.Inadditiontotheabovethreeoperations,thereareread-onlyoperationswhichareonlyexecutedonthelocalsite.Theseareneeded,forexample,torefreshtheviewonthedata(theGUIshowingthedatatotheuser).Welimitourdiscussiontoonlytheaddandupdateoperations.Notethatanapplicationmaybeconnectedtomultipleo-sets,butthatdifferento-setsdonotshareobjects.
3.1AccessInformationforUpdates
Asupdatesmaybeissuedconcurrentlybymultiplesites,theunderlyingreplicamanagementalgorithmneedstoensuresomeorderingoftheseupdates,sothateachreplicastateisconsistent.Wewilldiscussconsistencyindetailinsection5,butatminimumwerequirethatthereplicasshouldconverge.
Toensureconvergence,theconsistencyalgorithmneedstoknowwhatobjectsareaccessedbyanupdate,andhowtheyareaccessed.Itisusefultoallowtwooptionsofspecifyingthisinformation.Predeclaredaccessinformationisprovidedbyanapplicationwhenitissuesanupdate,andgivesasupersetoftheobjectsthattheupdatemayaccess.However,fordata-dependentoperations,theaccessinformationcanbehardtoaccuratelypredict.Inthiscase,predeclaredinformationisnotprovidedandtheupdatehastobeexecutedtodetermineaccess.Astheupdateexecutes,wheneveritisgoingtoaccessanobjectinsomewayithasn'taccessedbefore,itnotifiestheconsistencyalgorithmbyinvokingamethod.Whenthenotificationmethodreturns,theupdatecancontinuewithitsexecution.Werefertothisasexecution-timeaccessinformation.
Wewillonlyconsiderread,writeaccessinformationfortheobjectsi.e.theapplicationorupdateinformstheconsistencyalgorithmaboutwhichobjectsareread,andwhicharewritten.Thismayseemveryrestricted,becauseitseemsnottocapturecommutativityandothersemanticswhichcanbeusedtorelaxtheoperationordering.Forexample,twoconcurrentoperationsthatincrementthesameintegerwillaccessthatintegerinwritemode,andthereforewillbeordered.However,wewillseelaterthatduetothenatureofthegeneralalgorithmicstructure(specificallythatoperationsarenotinterleavedatareplica,andonlyonenotificationissentperoperation),itispossibletospecifyaccessinformationthatisfactuallyincorrect,butenoughtoguaranteeconvergence.Forexample,intheabovecasewecanspecifythatthetwoincrementsarejustreadingtheinteger.Anotheroperationwhichjustoverwritestheintegervaluewithanewvaluecanbespecifiedtobewritingtheinteger,andthisisenoughtoguaranteeconvergence.
Generalizationbeyondread,writeaccessescanbedonebydefiningacompatibilitymatrixwhichexpresseswhichcombinationofaccessesconflict.Section2.5of[3]discussesthisissue,andisalsoproposedforcollabora-tiveapplicationsin[14].
7
3.2DecouplingAtomicityandLocking
Intheo-setabstraction,anoperationencapsulatestheunitofcomputationthatshouldbeperformedatomically.Accessinformation,suppliedintwopossibleways,isusedbythereplicamanagementalgorithmtoprovidecon-sistencybyimposingacertainorderingontheoperations.WewillrefertosuchalgorithmsasOrderingalgorithms.
Thisapproachisdifferentfromexplicitlockingbytheapplicationtoachievebothatomicityandconsistency.Theexplicitlockingapproachisundesirablebecauseofseveralreasons.
1.Thesemanticsofthelocks(forexample,optimisticorpessimisticlocks)usedhavetobefixedwhentheapplicationiswritten.Thisdrasticallyreducestheflexibilityinchoosingaconsistencyalgorithmatrun-time.Thefollowingexampleshowshowthereceiptofawindowingeventishandledwithexplicitlockingwhenusingpessimisticlocks.
while
(e=getEvent
(e.type)
{())
{
switch...case
MOUSE_DOWN:
lock(x);...lock(y);...unlock(y)unlock(x)...}}
;;
//modify
y
//modify
x
2.Changingthesemanticsofthelockscanrequireasignificantrestructuringoftheapplicationcode.Thefollowingexampleshowshowtheabovecasewouldbehandledwithoptimisticlocks.
case
MOUSE_DO
);
//modify);
//modify
yx
WN:
optLock(x...optLock(y...
if(!assump...}
if(!assump...}unlock(y)unlock(x)
;;
tionValid(x)){//recovery
code
tionValid(y)){//recovery
code
8
3.Theapplicationdeveloperhastobeawareofthepotentialfordeadlocksiflocksonmultipleobjectsareacquired.
Incontrast,ascomputationisencapsulatedasoperations,andoperationsareveryshort-duration,anorderingalgorithmcanundo(byjumpingbacktoanearliercheckpointedstateorusinganundooperation)andredoanoperationasmanytimesasneeded.Undoandredoisimportantforoptimisticalgorithms,andtobreakdeadlocksbyabortingexecution.Thisapproachisalsodifferentfromgeneraloptimisticschemesindistributedsystems,likeHOPE[8],whichcannotmakeassumptionsaboutshortencapsulatedoperations,andthereforeprovidelanguage-levelconstructstoexpressoptimismandrecovery.
4WhynotReplicatedDatabases?
Lookingatthedatacharacteristics,itmayseemthatupdateoperationsaresimilartotransactions,andthatatraditionalreplicateddatabaseshouldsufficefortheseapplications.Indeed,someideascanbeborrowedfromreplicateddatabases,buttherearemanykeydifferenceswhichleadtodifferentalgorithmicdesignchoices.
Shortandincrementaloperations:Thisresultsinthefollowing
1.Onenotificationperoperation:Asoperationsareofshortduration,thesourcesitecansendonlyonenotificationforthatoperation.Thisgivesusaninterestingandveryimportantchoice,whichisnotavailableinmostreplicateddatabases;thatofsendingtheoperationitselfinsteadofitseffect.Byeffectwemeanthatonlythewrites,i.e.thepartsofobjectswithnewvalues,arepropagated.Thisisextremelyimportantforoptimisticprotocolsbecauseitdecouplesundosatthesourceandothersites.2.Noexplicitaborts:Theapplicationsdonotabortanyoperation.Anyapplicationlevelundoofanoperationisdonebyissuinganotheroperation.
Non-persistentData:Theunderlyingalgorithmcanensurethatifonereplicasitereceivesanoperation,anddoesnotcrashforasufficientamountoftime,everyotherreplicasitethatisrunningwilleventuallyreceivethatoperation.Thisallowsustolocallycommitanoperation.Localcommitwouldbeproblematicifexplicitabortswerepermitted,orifthedatabeingmanagedwaspersistent,whichisnottrueinourcase.Persistenceisoutsidethescopeofthealgorithmsconsideredinthispaper.However,userscansavetheirreplicatoapersistentstore.Andthiscanbeusedlaterfornoncollaborativework,ortoinitializeothero-sets.Unlikeinconventionaldatabases,webelievethatleavingthistotheapplicationisacceptable.
Operationexecutionnotinterleavedatareplica:Thisisunliketraditional(non-replicated)databases,whichdealwithconcurrentoperationexecution,andallowssimplificationinalgorithmdesign.Moreimportantly,thisimpliesthattheoperationaccessinformationisonlyusedfordecidingonanoperationorderingthat
9
resultsinreplicaconvergence.Therefore,theaccessinformationprovidedtothealgorithmcanbefactuallyincorrect,butenoughtoguaranteeconvergence.Forexample,considerrotationabouttheelbowjointforanarticulatedhumanmodel.Assumethattheanglesofthejointsbelowtheelbow,inthetreerepresentationofthearticulatedmodel,arenotaffectedbytherotation.Eventhoughthisrotationischangingthepositionsofthesejoints,itiscommutativewithotherrotationsaboutjoints.Onlyamodificationtotheconstraintontheelbowjointisnotcommutativewiththisrotation.Therefore,thisupdateneedstoonlysaythatitaccessedtheelbowjoint.However,ifoperationswereallowedtointerleavethisaccessinformationwouldnotbeenough.
PredeclaredorExecution-timeaccessinformation:Unlikedatabaseswhichprimarilyuseexecution-timeaccessinformation,manygroupwareapplicationscanquiteaccuratelypredicttheaccessesoftheirupdates(especiallyinlightofthepreviousbullet).Somedatabasesdousepredeclaredinformationasasupplementtoexecution-timeinformationasitallowsthemtorelax2-phaselockingbyreleasinglocksearly.Predeclaredaccessinformationallowsustototallydecoupleorderingandexecution,whichcansignificantlyreduceabortsandspeedupcommitincertainalgorithms.
Databaseresearchershavelookedattherequirementsofasynchronousgroupware[2,1],andproposedsome
solutions,andsoitisinstructivetounderstandthedifferencesbetweentheirapproachandtheonewearguefor.Theirresearchhasprimarilylookedathowtoallowcooperationbetweenlong-livedtransactions.Thisrequiresrelaxingtheatomicityrequirementoftransactions.Synchronousgroupwareiscompletelyopposite,withveryshort-livedatomicoperations.Thiscanbepartlyexplainedbytheobservationthatsharingbetweenparticipantsinasynchronousgroupwareisrarerandatcertainwelldefinedpoints.Forexample,checkingintheupdatedsourcecodeofacodemoduleisnotfrequent.Thesewell-definedpointsusuallycorrespondtocommitofalong-livedtransaction.Theneedforcooperationbetweenthesetransactionsarisesbecausesometimesatransactionhasnotcommittedatasharingpoint,butanothertransactionneedstoreadsomethingwrittenbythisuncommittedtrans-action.Forexample,checkinginacertaincodemoduleneededbyanotheruser,whichcorrespondstoreleasingalockearlyinalong-livedtransaction.Also,cooperationbetweentransactionsisusuallytriggeredbytheusers.Anotherwayoflookingatthedifference,isbyconsideringthecooperationbetweenusersascontentionbetweentransactions/operations.Itseemsthatcontentioninasynchronousgroupwareisrarer,butcouldlastforalongerduration.Therefore,userinterventionisfeasible.Incontrast,contentioninsynchronous/interactivegroupwaremaybemorefrequentandtransient.Therefore,thealgorithmshavetoadapttothesesituationswithminimumuserintervention.
10
SITE 1User A
action
View
TranslateIssueOperation
SITE 2User B
action
View
Translate
IssueOperation
FlexibleCouplingReplicaExecuteOperation
ReplicaExecuteOperation
Consistency AlgorithmFigure4:SystemArchitecture
5GeneralAlgorithmicStructure
Figure4showsasimplifiedviewofthesystemarchitecture.Theuserperformsanactionwhichistranslatedintoanoperationbytheapplication4andthenissuedtotheconsistencyalgorithm,whichschedulesitforexecution.Read-onlyoperationsareonlyexecutedonthelocalreplica,whileothersareexecutedatallreplicas.Thenotificationfromthereplicatotheviewdependsontheircouplingandisdiscussedinsection5.4.
5.1GlobalOperationOrdering
Mostalgorithmsdefineaglobalorderingontheoperationsthatmodifytheshareddata,bytimestampingthemattheissuingsite.Aftertimestamping,anoperationisdistributedtoallsitesincludingtheissuer,whichthenexecuteit.Thisisasimplifiedstructurewhichisusefulfordescribingthehigh-levelcharacteristicsofalgorithms.However,wewillseemanyvariationslater,forexample,insomecasestimestampingandexecutionatthesourceoverlap,inothersituationsthetimestampingmayabortetc.Wealsoassumethattheoperationitself,andnotitseffect,isdistributedtoallsites.Wediscussthisassumptioninsomedetailinsection5.3.
Theglobalorderingdefinedbythetimestampscanbeatotalorder,butthisisusuallyunnecessary.Thereforeweassumethattheglobalorderingisapartialorderi.e.theorderingrelationisirreflexiveandtransitive(atotalorderisaspecialcaseofapartialorder).Ifasiteexecutesoperationsinatotalorderthatrespectsthisglobalorder,itiscorrect.Theglobalorderingshouldsatisfycertainproperties,forexample:
1.ConvergenceProperty:Executionoftheoperationsinanytotalorderthatrespectstheglobalorderwillresultinthesamestate.Thispropertyisnecessaryforthereplicastoconverge.
2.ProcessOrderingProperty:Operationsissuedbyasite/processaregloballyorderedintheordertheywereissued.
3.CausalOrderingProperty:Theglobalorderingsatisfiescausality.Allthealgorithmsdiscussedinsection6satisfythefirsttwoproperties.
Anoperationcommitsatasite,when
knowsthatithasexecutedalloperationsprecedingintheglobal
order.Ifasiteexecutesoperationsonlyaftertheycommit,itisguaranteedtobecorrect.Thisiswhereitisimportanttodistinguishbetweenoptimisticandpessimisticalgorithms.Apessimisticalgorithmonlyexecutesanoperationat
afteritcommitsat,thereforeitisguaranteedtobecorrect.Anoptimisticalgorithmdoesnotwait
forcommit,howeveritmighthavetoreorderoperationexecution,byundoingandredoingoperations,whenitnoticesthatithasmadeamistake.Ifthisreorderingcausessurprisetotheuser,itissaidtohavecausedjitterintheview.Ofcourse,insomecases,surprise,andhencejitter,issubjective.However,themainreasonwhyoptimisticalgorithmsarenotalwayssuitableisthepotentialforjitter,whichmaynotbeacceptabletotheuserincertainsituations.Asecondaryreasonisthatifalotofreorderingisnecessary,itmaybecomputationallytooexpensive.Notethatthisreorderingistotallylocaltoasite.
Anotherdesignchoiceforoptimisticalgorithmsistoonlydetectpotentialdivergenceofreplicas,andtoprovidemechanismstotheuserstomanuallyfixit.
5.2NatureofTimestamping
Orthogonaltotheoptimisticversuspessimisticchoiceisthenatureofthetimestampsassignedtotheoperations.1.AccessIndependent,Dependenttimestamps:Independenttimestampsdonotdependonthedataanoperationaccessesorthenatureoftheaccessi.e.readorwrite.Forexample,aglobaltotalorderingofalltheoperationscanbedefinedbyusingLamportclocks[13]andsiteidentifierstotimestampoperations.Independenttimestampsbydefaultdefineatotalorder.Dependenttimestampsusetheaccessinformationofanoperationfortimestampingandthereforecandefineapartialorderwhichisnottotal.Aweakerorderingcanleadtofastercommitiftheoverheadtocomeupwiththisorderingisnottoohigh.
2.TimestampPrecision:Thispropertyisusefultodeterminewhenanoperationhascommitted.Informally,atimestampisprecisewhenbyknowingthetimestampsofthealreadyreceivedoperations,andbylookingatthetimestampofanewlyreceivedoperation,say,asitecanaccuratelydeterminewhetherthereare
anyoperationspreceding
intheglobalorderthathavenotyetbeenreceived.Theadvantageofprecise
timestampsisthatasiteneedstoonlyreceivemessagesfromsiteswhichhaveissuedoperationspreceding
intheglobalorder.Therefore,atemporarilydisconnectedsitecannotslowdowncommitofothersites.
Usually,animprecisetimestampimpliesaneedtocommunicatewitheveryoneforcommit.TimestampsbasedonLamportclocksareimprecise,becauseclocksatdifferentsitesmaygeneratethesametimestampforconcurrentlyissuedoperations.However,ifacentralserverwithacounterisusedtoassigntimestampsto
12
alloperations,byincrementingthecounterbyoneaftereachtimestampassignment,theresultingtimestampsareindependentandprecise.Vectorclocksusedfortimestamping,whichdefineacausalorderontheoperations,areprecise(However,justcausalorderingdoesnotalwayssatisfytheconvergenceproperty).3.Instantaneoustimestamping:Analgorithmdoesinstantaneoustimestampingifitneverneedstocommu-nicatewithothersitestoassignatimestamptoanoperation.Precisionandinstantaneousnessareatcrosspurposes.Optimisticprotocolsbenefitfrominstantaneoustimestampingasoperationscanbequicklytimes-tampedanddistributed.
4.Aborts:Itispossibleforthetimestampassignmenttobeabortedincertainalgorithms(AnexampleisthealgorithmusedintheDECAFsystem,discussedinSection6).Timestampingofanoperation
can
beabortedwhenthetimestampingisonlypartiallycompleted,orafteritisfinished.Inthefirstcaseonlytheissuerisaffectedastheoperationhasnotyetbeendistributedtoothersites,andusuallyoccursfordependentandprecisetimestamps.Inthesecondcaseothersareaffectedtoo,astheoperationhasbeendistributedtoothersites.Timestampabortioncanincreasecommittime,andwhenusingexecution-timeaccessinformationcausesexecutionundoattheissuerwhichcanresultinjitter.
Ideally,wewantanalgorithmthatusesdependent,precisetimestampswithtimestampingthatisalwaysinstan-taneousandabort-free.Unfortunately,precisionandinstantaneousnessareconflictingrequirements.However,notalltimestampingpropertiesareequallyimportantinallinteractionscenarios,thereforeasuiteofalgorithmswhichsatisfydifferentscenarioshavebeendeveloped.
BeforediscussingtheactualalgorithmsinSection6,wediscusssomeotherissuesrelevanttoallorderingalgorithms.
5.3Distributingeffectorupdate
Intheabovediscussionweassumedthattheupdateoperationitselfwasdisseminated.Thismeansthatthelogicoftheupdate,i.e.thefunctionrelatingtheobjectvaluesreadtotheobjectvalueswrittenisdisseminated.Anotheroptionwouldbetoonlydisseminatetheeffect,i.e.theobjectvalueswrittenbyanupdate.Therearesomesituationsinwhichpropagationoftheeffectmaybenecessary.
1.PartialReplication:Assumethatano-setissomehowonlypartiallyreplicated,i.e.noteveryreplicasitehasalltheobjectsthataremembersoftheo-set.Althoughpartialreplicationisnotconsideredinthispaper,itcanbeimportantincertainapplications.Supposeasiteissuesanupdatewhichreadsobjects
and
and
thencomputes
and
.Ofcourse,thissitehasreplicasofboth
and.However,ifthis
updateisdirectlypropagatedtoanothersitewhichdoesn'thaveoneofthetwoobjects,itcannotbeexecutedthere,astheobjecttobereadisnotavailable.Insuchcases,disseminatingtheeffectbecomesessential.
13
2.Heterogeneity:Incertainheterogeneousscenariositmaynotbepossibletoexecuteanupdateatothersites,becauseofdifferentmachinearchitectures,operatingsystemsetc.However,thisisbecominglessofaproblemwiththewidecommercialavailabilityofcommonexecutionenvironmentsliketheJavaVirtualMachine.
However,disseminatingtheeffecthascertaindisadvantages.
1.CouplingRollback:Inanoptimisticprotocol,everysite,includingthesource,executestheupdateoptimisti-cally.Theeffectoftheoptimisticexecutionatthesourceiswhatisdisseminatedtoothersitesandappliedthere.Therefore,ifthesourcehastoundotheexecutionofthisoperation,allthesiteshavetoundotoo.Thiscoupledrollbackalsoincreasesnetworkcommunicationastheeffecthastobesentagain.
2.DelayingCommit:Inapessimisticprotocol,thesourcecannotsendouttheeffectuntilitexecutestheupdate.Anditcannotexecutetheupdateuntilitcommitsatthesource.Thiscyclicity,thoughnotadisaster,cancauseslowdownincommit.ConsiderthefollowingpathologicalcaseofaprotocolusingLamportclocks.Threeoperationsclockvalueis
areissuedconcurrentlybysites1,2and3,eachofwhosecurrentLamport
.Thereforeareassignedtimestamps
respectively,with
.Thefirstoperationtocommitwillbe
.Onlyafter
commitsanditseffectis
receivedbysite2,can
commitanditseffectbesentout.
willcommitonlyafterboththeeffectsof
and
havebeenreceived.Thesequentialcommitbehaviorinthisexamplegreatlyslowsdowncommit.
5.4Replica-ViewCoupling
Inreality,usersdonotseethestateoftheirreplicadirectly,butthroughviewsdisplayedonsomeoutputdevice(likeacomputerscreen),anditistheconsistencyandresponsivenessofthisview,andnotthedata,thatisrelevanttotheuser.Therefore,thenatureofthecouplingbetweenthereplicaandtheviewsisveryimportant.Instandardsingleuserapplications,wheneverthereisachangeinthestateofcertaindata,theview(s)attachedtothatdataarenotified,whichthendisplaythenewstate.Ifsuchanimmediatecouplingisusedinthepresenceofreplication,thecharacteristicsofthedataconsistencyalgorithmaredirectlyexposedtotheuser.Weakeningthecouplingcanbeusefulforhidingortransformingsomecharacteristicsofthesealgorithms.Forexample,anoptimisticconsistencyalgorithmmayneedtoreordersomeoperationsbyundoingandredoingtheirexecution.Itisusefultohidetheintermediatestagesinthisundoandredofromtheuser.Weakenedcouplingshouldcontinuetoguaranteethatonquiescence,theviewshowsthecurrentstateofthereplica.Therefore,withconsistencyalgorithmsinwhichthereplicasconvergeatquiescence,theviewswillalsoconverge.
TheDECAFsystem[20]pioneeredtheconceptofflexiblecouplingandisprobablythemostextremeexampleofhidingthecharacteristicsoftheunderlyingdataconsistencyalgorithm.DECAFusesanoptimisticconsistency
14
algorithmwithundoandredotoguaranteeconvergenceofdata.TheprogrammerspecifiestheshareddatainadeclarativemannerbycomposingitfromtheDECAFsuppliedprimitiveobjects,usingDECAF'scompositionconstructs.Relyingonthistotalknowledgeofthedatabeingshared,thealgorithmefficientlymaintainsmultipleversionsofthedata.Thisallowscompletecontrolofwhatdataversionstoshowintheview.DECAFsupportstwotypesofviews:optimisticandpessimistic.Pessimisticviewsonlyshowtheeffectofcommittedtransactions,whileoptimisticviewsshowthecurrentstateofthereplica.
5.5InitializinganewParticipant
Anewparticipant,i.e.onewhoisconnectingtotheset,needstogetaninitialstateofthesetandshouldreceivealloperationsthatareorderedafterthisinitialstate.Itiseasytogettheinitialstatefromanyoneofthecurrentparticipants,butthecriticalproblemisensuringthatthenewparticipantdoesnotmissanyoperationsi.e.operationsorderedaftertheinitialstate.Thismayrequiresomesynchronizationbetweensites.Webrieflyoutlinetwopossiblesolutions.
Inasystemwithacentralserverforroutingallmessagesbetweenparticipants,thesolutiontothisproblemisrelativelysimple.Thiscentralservercanmaintainthestateoftheset,eitherbyactuallyexecutingtheoperationsonitsownreplica,orbyperiodicallyrequestingthereplicastatefromaparticipant.Inthelattersituation,itcanlogalltheoperationsthathavebeensentbytheparticipantsbutnotexecutedonthereplicastateithas.Anewparticipantconnectstothesetbysendingamessagetothiscentralserver.Theserverreplieswiththereplicastateandthelogofoperations.Inaddition,alloperationsreceivedbyitaftersendingthisreplyarealsoforwardedtothisnewparticipant.
Inasystemwithdirectpeer-to-peercommunication,aflushprotocolcanbeusedtosynchronizethepartici-pants,beforelettingthenewparticipantjoin.Duringtheflush,nonewoperationsmaybeissued,sothatthenewparticipantdoesnotmissanyoperations.Aflushprotocol,liketheoneusedforvirtualsynchronyinISIS[6],canalsobeabuildingblockforprovidingfaulttolerance.Insituationswheresomeofthecurrentparticipantsareonlylooselyconnectedwiththerest,theflushmaytakeawhiletocomplete.Insuchcases,therequirementthatnooperationsbeissuedduringthetimetheflushisinprogressmaybetoosevereontheparticipants.Therefore,itmaybenecessarytoutilizeweakersynchronizationprotocols,likeweakvirtualsynchrony[10].
6Classificationofproposedorderingalgorithms
Thissectionclassifiesanddiscussesthereplicamanagementalgorithmsdescribedintheinteractivegroupwareresearchliterature.Figure5showstheclassificationaccordingtothenatureoftimestamping.
15
AccessIndependentAccessDependent
CentralServer
Hybrid
Imprecise,InstantaneousPrecise,Non-InstantaneousImprecise,InstantaneousPrecise,Non-Instantaneouse.g. 2PT, n2PT-ir
e.g. Villa, ...
(In)Dependent,(Im)Precisee.g. DECAF
e.g. dOpt,e.g. Stamp ServerORESTE, COAST
Figure5:AlgorithmClassification
6.1CentralServer
Thisisthesimplestapproach,andhasbeenusedininnumerablesystems.Itusesacentralserver,towhichalloperationsaresentbytheclients.Theserverdistributestheoperationsreceivedtoalltheclients(includingtheclientwhichissuedthatoperation).Thisimposesaverysimpletotalorderingontheoperations.Despiteitssimplicityofimplementationandsimpleboundsonapproach.
VillaTheVillasystem[5]providesanabstractionverysimilartotheo-setabstractiondescribedinthispaper.However,itonlyallowspredeclarationofaccessinformation.Ano-setcanbeeitheroptimisticorpessimistic.Forboth,theserverisusedtoconstructtheglobalorderingoftheoperations.Inpessimisticmode,anoperationisexecutedonlyafteritisreceivedfromtheserver.Inoptimisticmode,anoperationissuedatasite(say)is
RT
andUUT,itsuffersfromdrawbacksasdescribedin
section1.1.Weconsideroneexampleofsuchasystem,Villa,becauseofitsuniqueconsistency(orlackthereof)
immediatelyexecutedandalsosenttotheserver.Alogoftheaccessinformationofeachoperationexecutedby
iskeptat.Whenanoperationthatitissuedisreceivedfromtheserver,
examinesthislogtoseeifthere
havebeenanyconflictswhichcouldhavecausedreplicadivergence.Ifyes,itnotifiestheapplicationaboutwhichobjectsmaynolongerbeconsistent.TheVillasystemprovidesacloningandreinitializationmechanismfortheapplication/usertofixtheincorrectreplicas.
6.2Independent,Precise
Weconsideraclassicalgorithm,againonewhichhasbeenimplementedininnumerablesystems.
StampServerThebasicalgorithmisavariantofthecentralserveralgorithm.Aservermaintainsacounterusedtotimestampalloperations.Aclientsendsarequesttotheserverforatimestamp,whichreturnsthenextvalueofthecounter.Thisisusedbytheclienttotimestamptheoperation,andthensendittootherclients.Assumingthatone-waylatenciesbetweenclientsandbetweenclientsandtheserveris,theminimumpossibleRTandUUTis
and
respectively.Howeverifthenumberofactiveparticipantsisverylow,itcanbeimprovedbydynamically
16
movingthecountertoactivesites.
6.3
dOpt
Independent,Imprecise
EllisandGibbs[9]wereoneofthefirsttousereplicationtoimproveresponsetimeingroupware.They
consideraprogrammingmodelinwhichtheshareddata,forexampleadocument,isrepresentedbyasingleobjectreplicatedateachparticipant'ssite.Operationsissuedbyparticipantsareusedtomodifythisshareddata.Theychoosenottouseexplicitlockingformaintainingconsistencyforvariousreasons,oneofthembeingthatchoosingthegranularityofthelocksmaybeaproblem.Theyguaranteecausalorderingbetweenoperationsusingvectorclocks.However,thiscausalorderingisnotsufficienttoguaranteeconvergence,ascausallyconcurrentoperationswhichmodifythesamedatamaybeexecutedinadifferentorderatdifferentreplicas.Tosolvethisproblemtheyuseanoperationtransformationalgorithm(dOpt),inwhichanoperationmaybetransformedbeforeexecutionsuchthatexecutingtwooperationsindifferentordersatreplicasdoesnotcausedivergence.Theyprovidetransformationoperationsforasimpletextoutlineeditor,Grove.Later,researchersindependentlydiscoveredanerrorinthealgorithm.Cormack[7]hasformalizedthenotionofoperationtransformationsandprovidesacorrectalgorithm.Anadvantageofthisapproachisthatundoingoperationsisnevernecessary.However,operationtransformationforconvergencealsohascertaindisadvantages.Wediscussthistopicinmoredetailinsection7.1.ORESTE
KarsentyandBeaudouin-Lafon[12]describetheORESTEalgorithm,whichconsiderstheshareddata
tobeasetofobjectswhichisfullyreplicatedatalltheparticipantsites.Operationscanaddobjectstotheset,deleteobjectsfromtheset,orcanupdateasingleobjectintheset.TheydefineatotalorderingoftheoperationsusingtimestampsbasedonLamportclocks.However,eachsiteexecutesoperationsoptimistically,thereforeitmightreceiveoperationswithalowertimestampafterithasexecutedacertainoperation.Whenthishappens,undoandredoofoperationsmaybeneededtoguaranteereplicaconvergence.Theyuseinformationaboutcommutativityandmaskingofoperationstominimizeundos.Notethatthetimestampingisinstantaneousandabort-free.Theoptimisticapproachprovidesinstantaneousresponsewiththepotentialforjitter.Also,asorderingisbasedonLamportclocks,communicationwitheverysiteisrequiredforcommit.COAST
TheCOASTsystem[18]fullyreplicatesalltheobjectswhicharepartofashareddocument.Itremoves
therestrictionofORESTEbysupportingtransactionswithACIDproperties,whichcanaccessasubsetoftheobjects.TheorderingofthesetransactionsisdoneusinganalgorithmsimilartoORESTE.However,itisnotclearfromthepaperiforhowcommutativityandmaskinginformationforthesetransactionsisprovidedbytheapplication.
17
6.4Dependent,Precise
2PTAdaptedfromaclassicdatabaseapproach,theperformanceofthisalgorithmisanalyzedwithsyntheticworkloadsin[4].Itsupportstheo-setabstraction,andallowsbothpredeclaredandexecution-timeaccessinfor-mation.Bothreadandwriteaccessescanbespecified,butforsimplicityofdiscussionweconsideronlywrites.Thealgorithmusesatokenperobject.Atokenhasaversionnumber,andtheversionnumbersformacontinuoussequenceofintegersstartingwith0.Theseversionnumbersareusedtotimestampupdateswhichmodifythatobject.Forexample,supposethecurrentversionnumberoftokenAis3,andtokenBis1.ThenifuserAliceissuesanupdate(say)whichmodifiesbothobjectAandB,thenafteracquiringbothtokens,thisupdatewillbe
timestampedwith((A,4),(B,2)).Aftertimestamping,thetokensarereleased,andthisupdateismulticasttoothers.Update
willbeexecutedatasiteonlywhenupdate(s)withtimestamps(A,3)and(B,1)havebeenexecuted.
Note,thattokensareonlyusefulfortimestamping,andarenothelduntilanupdatecommits.Therefore,despitetheirsimilaritywithlocks,theyarecalledtokens.Toensurenocontradictoryorderinginformation,tokensthatareacquiredtotimestampanupdatearenotreleaseduntiltheupdateisfullytimestamped.Thisissimilartothe2-phaselockingschemeusedindatabases,andthereforethealgorithmissaidtodo2-phasetimestampingor2PT.n2PT-irThisalgorithmisavariationonthepreviousone,andrelaxesthe2PTrequirementforupdatesbyreleasingthetokensimmediately.However,itonlyworksforexecution-timeaccessinformation.Forexample,againconsiderupdate
andanotherconcurrentupdate
whichalsomodifiesAandB,andassumethattoken
AisacquiredbeforetokenB.Suppose
acquiresthetokenforAbefore
andgetstimestamp(A,4).Now,if
immediatelyreleasesthetokenA,beforeacquiringtokenB,
willgettimestamp(A,5).Asexecution-time
informationisbeingused,theexecutionof(A,4)i.e..Thisimpliesthat
willnowblock,untilitssitehasexecutedtheupdatewithtimestamp
doesnotgettothatstageofitsexecutionwhereitaccessesB,untilisexecuted
atitssite(Note,thatthiswouldnotbetrueiftheexecutionofeachupdatecausedmultiplenotifications,whereeachnotificationcorrespondstothenewvalueofanobject,tobesentout).Therefore,thereisnodangerthat
canacquiretokenBbefore,whichwouldcauseacycleintheorderingrelation.
6.5Hybrid
DECAFTheDECAFsystem[20]isthefirsttorecognizetheneedtoconsiderbothdataandviewconsistency,andprovidesalgorithmsthatmanagereplicatedobjectsandtheirviews.Itprovidesadeclarativemodelinwhichsharedobjectscanbecomposedfromtheprimitivesharedobjectslikeinteger,realandstring,providedbytheirframework.Sharedobjectsareupdatedusingtransactions(onlyexecution-timeinformationisused),withtherestrictionthattheissuerhavereplicasofalltheobjectsaccessedbythetransaction.However,unlikethepreviousalgorithms,othersitesmaynothavealltheobjectsaccessedbyatransaction.Therefore,thealgorithmdistributes
18
theeffectandnottheactualtransactiontotheothersites.
Theunderlyingdataconsistencyprotocolisoptimistic.Theissuingsite,say,executesthetransactionand
assignsitatimestampusingaLamportclock.Itthensendstheeffectandthistimestamptoallothersites.Thereisaprimarysiteforeachobjectwhichvalidatesthetimestampforatransactionwhichreadsorwritesthatobject.Theissuingsitewaitsforvalidationresponsesfromtheprimarysitesofalltheobjectsthetransactionreadorwrote.Thisvalidationgivessomedependentpropertiestothebasicallyindependenttimestamp.Whenalltheprimarysitessay`yes`,thetimestampisapproved,andthis`ok`issentbytoallsiteswhichhadbeensentthetransactions
effect.Ifevenoneprimarysitesays`no`,thetransactionisabortedatandatallothersites.Then,hastoretryby
executingandtimestampingagain.Inthecasethatatimestampisapproved,i.e.asitehasreceivedan`ok`messagefrom,thesitegoesandcheckswithalltheprimarysitestoensurethatithasnotmissedanoperationwhichhad
beenorderedearlier.ThisavoidstheneedtowaitforcommitusingtheusualLamportclockrules.However,itaddsaround-tripcommunicationwitheveryprimarycopybyeverysitethatreceivedan`ok`message.Evenif
istheprimarycopyforalltheobjectsaccessedbyitstransaction,thelowerboundonUUTforapessimisticviewis
.If
isnottheprimarycopyforevenoneoftheobjectsaccessedbyit,thelowerboundonRTforapessimistic
viewis
.Anotherdrawbackofthisapproachisthatthereisnoupperboundonthenumberoftimesatransaction
mayabort.Concurrenttimestampingoftwoconflictingtransactionscanalsocausebothtoabort.Eachtransactionrequiresaminimumoftwomulticasts,onetodistributethetimestampedoperation,andthesecondtoinformaboutabortorcommit.Overall,thebehaviorofthealgorithmisquitecomplex.
7
7.1
MiscellaneousIssues
OperationTransformation
OperationtransformationwaspioneeredbyEllisandGibbsinGrove[9],andisusedforoptimisticexecution.Cormackformalizeditin[7],andithasbeenusedinnumeroustexteditors[21]andaspreadsheet[15].
Thisapproachdealswithtwoproblems.Firstly,becauseofconcurrentissueofupdates,someoperationsmayneedtobetransformedtopreservetheintentoftheuser.Forexample,changingthepositionofinsertionordeletioninatextbufferwhenthereareconcurrentinsertsordeleteswhichareorderedearlier.ThiscorrespondstotheafterfunctioninCormack'scalculus,andisatechniquewhichisusefulforanysituationinwhichoperationsareissuedconcurrently(evensinglecopysystems).Secondly,itisusedtoprovideconvergenceofreplicasinoptimisticexecutionwithouttheneedforundoandredo.ThiscorrespondstothebeforefunctioninCormack'scalculus.
Theprimaryadvantageofprovidingreplicaconvergencethroughoperationtransformationisthepotentialforfasteroverallexecutionatasite,bysubstitutingtransformationforoperationundoandexecution.Supposeanoperation
isreceivedatasitewhere
operationsthatwereissuedconcurrentlywith
havealreadybeen
19
executed.Cormack'salgorithmrequiresatotalof
transformations,ofwhich
aredoneon,and1eachon
the
operationsthathavealreadyexecuted.Thenthetransformed
willbeexecuted.Nowconsideranundo/redo
approach,wherewassupposedtobeexecutedbeforethelast
oftheseoperations(avalidaveragecase
assumption).Firstly,thelastfunction.Then,thelastlast
areundone.Then,
istransformedwithrespecttothefirst
usingtheafter
operationswillbetransformedwrtto
usingtheafterfunction.Finally,
andthe
operationswillbeexecuted.Therefore,theundo/redoapproachcausestransformations,
undos,and
operationexecutions.Incontrast,theoperationstransformationapproachcausestransformationsand
operationexecution.
However,transformationsasareplicaconvergenceapproachhavesomedisadvantages.Firstly,beforefunctions
canbehardtocomeupwith,andhavetobeprovedcorrect.Secondly,beforefunctionscannotimplementmanysemanticswhichundo/redocanbecauseoflossofinformation.Forexample,abeforefunctioncannotcompensateforanoperationthatdeletessomedata(likedeletingsomecharactersinatexteditor),becausethedatadeletedhasbeenlost.Incontrast,withundo/redo,theoperationcanstorewhatitdeletedandcanrestorethatwhenundone.
7.2AccessGranularity
Multigranularitylockinghasbeenusedindatabasesystems[3],andalsoincentralizedgroupwaresystems[14].Itsusefulnessliesinpreventingamismatchinlockgranularityandaccessgranularity.Thiscanbegeneralizedtomultigranularaccessinformationinthecontextoftheo-setabstraction.However,thebenefitsforreplicateddataarenotimmediatelyapparent.Considera2PTalgorithmwhichusesthismultigranularaccessinformationtoacquiretokens.Havingmanyfine-grainedlockscanreducethelocalityofacquiringtokensbyasite.Forexample,inthefirstsequencecompatibilitytablegivenin[14],whichcanbeusedfortexteditors,insertionofeveryconsecutivecharactercausesanewtokentobeacquired.Thisisacceptableforacentralizedsystem,butmaynotbeappropriateifacquiringatokenrequirescommunicationwithothersites.Also,inscenarioswithdata-dependentaccesspatterns,thegranularityofwhatwillbeaccessedbyanoperationisnotknownbeforehand,andsotoomanytokensmaybeacquired.
Despitethis,multigranularaccessisapromisingapproach,andhowtoadaptittoorderingprotocolsanditsperformanceinadistributedenvironmentneedsmoreinvestigation.
8Conclusion
Thealgorithmrequirementsforinteractivegroupwarearevaried,withtheapplicationscenario,andnetworklatencybeingtheprimaryfactorsaffectingthechoiceofalgorithmtouse.Despitethemanytypesofreplicamanagementalgorithmsproposed,thereisanunstatedcommonthemeinthem.Thispaperillustratesthecommondatashar-ingrequirementsofinteractivegroupwareandcomesupwithadataabstractionwhichisappropriateforthese
20
requirements.Wediscusswhyconventionalreplicateddatabasesarenotappropriate,anddescribeageneralalgo-rithmicstructureforimplementingthisabstraction.Theimportantdesignchoicesandcharacteristicsofalgorithminstancesaredescribed.Thisisusedtoroughlyclassifythealgorithmsdescribedinthegroupwareresearchlitera-ture.Thegoalofthispaperisthatanunderstandingofthefundamentaldesignchoicesandtheirimplications,willstimulatemoreresearchinsuchalgorithms.Atthesametime,acommonterminologyfordescribingalgorithmcharacteristicswillhelpinalgorithmclassification.
References
[1]D.Agrawal,J.L.Bruno,A.E.Abbadi,andV.Krishnaswamy.Relativeserializability:Anapproachforrelaxingthe
atomicityoftransactions.InProceedingsofthe13thACMSymposiumonPrinciplesofDatabaseSystems,1994.[2]N.S.BarghoutiandG.E.Kaiser.Concurrencycontrolinadvanceddatabaseapplications.ACMComputingSurveys,
23(3):269–317,1991.[3]P.A.Bernstein,V.Hadzilacos,andN.Goodman.ConcurrencyControlandRecoveryinDatabaseSystems.Addison-Wesley,1987.[4]S.Bhola,G.Banavar,andM.Ahamad.Responsivenessandconsistencytradeoffsininteractivegroupware.InProceed-ingsof7thACMConferenceonComputerSupportedCooperativeWork,November1998.Toappear.[5]S.Bhola,B.Mukherjee,S.Doddapaneni,andM.Ahamad.Flexiblebatchingandconsistencymechanismsforbuilding
interactivegroupwareapplications.InProceedingsofthe18thICDCS,1998.[6]K.Birman,A.Schiper,andP.Stephenson.Lightweightcausalandatomicgroupmulticast.ACMTransactionson
ComputerSystems,9(3):272–314,August1991.[7]G.V.Cormack.Acalculusforconcurrentupdate.TechnicalReportCS-95-06,DepartmentofComputerScience,
UniversityofWaterloo,Ontario,1995.[8]C.CowanandH.L.Lutfiyya.Await-freealgorithmforoptimisticprogramming:Hoperealized.InProceedingsofthe
16thInternationalConferenceonDistributedComputingSystems(ICDCS),pages484–493,1996.[9]C.A.EllisandS.J.Gibbs.Concurrencycontrolingroupwaresystems.InProceedingsoftheACMSIGMOD',pages
399–407,19.[10]R.FriedmanandR.vanRenesse.Strongandweakvirtualsynchronyinhorus.TechnicalReportTR95-1537,Computer
ScienceDepartment,CornellUniversity,1995.[11]J.P.Granieri,J.Crabtree,andN.I.Badler.Productionandplaybackofhumanfiguremotionforvisualsimulation.ACM
TransactionsonModelingandComputerSimulation,5(3):222–241,July1995.[12]A.KarsentyandM.Beaudouin-Lafon.Analgorithmfordistributedgroupwareapplications.InProceedingsofthe13th
ICDCS,pages195–202,1993.[13]L.Lamport.Time,clocks,andtheorderingofeventsinadistributedsystem.CommunicationsoftheACM,21(7):558–
565,July1978.[14]J.MunsonandP.Dewan.Aconcurrencycontrolframeworkforcollaborativesystems.InProceedingsofthe6thACM
CSCW,1996.[15]C.R.PalmerandG.V.Cormack.Operationtransformsforadistributedsharedspreadsheet.InProceedingsof7thACM
ConferenceonComputerSupportedCooperativeWork,November1998.Toappear.[16]A.PrakashandH.S.Shim.Distview:Supportforbuildingefficientcollaborativeapplicationsusingreplicatedobjects.
InProceedingsofthe5thCSCW,1994.[17]M.RosemanandS.Greenberg.Buildingrealtimegroupwarewithgroupkit,agroupwaretoolkit.ACMTransactions
onComputerHumanInteraction,1996.[18]C.Schuckmann,L.Kirchner,J.Schummer,andJ.M.Haake.Designingobject-orientedsynchronousgroupwarewith
COAST.InACMCSCW'96,1996.
21
[19]S.Singhal.EffectiveRemoteModelinginLarge-ScaleDistributedSimulationandVisualizationEnvironments.PhD
thesis,Dept.ofComputerScience,StanfordUniversity,August1996.[20]R.Strom,G.Banavar,K.Miller,A.Prakash,andM.Ward.Concurrencycontrolandviewnotificationalgorithmsfor
collaborativereplicatedobjects.IEEETransactionsonComputers,47(4):458–471,April1998.[21]C.SunandC.Ellis.Operationaltransformationinreal-timegroupeditors:Issues,algorithmsandachievements.In
Proceedingsof7thACMConferenceonComputerSupportedCooperativeWork,November1998.Toappear.
22
因篇幅问题不能全部显示,请点此查看更多更全内容