A COMPARISON OF THE EFFICIENCY OF AN ATOMIC COMPONENT
OPERATION VERSUS PRIMITIVE OPERATIONS FOR BUILDING A
REAL-TIME COLLABORATIVE EDITING API
_______________
A Thesis
Presented to the
Faculty of
San Diego State University
_______________
In Partial Fulfillment
of the Requirements for the Degree
Master of Science
in
Computer Science
_______________
by
Leslie A. Viviani
Summer 2013
Copyright © 2013
by
Leslie A. Viviani
All Rights Reserved
DEDICATION
This thesis is dedicated to my family, who have stood by and supported me in every
way possible, from loving encouragement to those not-so-gentle nudges to “just get it done.”
Before software can be reusable it first has to be usable. ---Ralph Johnson
ABSTRACT OF THE THESIS
A Comparison of the Efficiency of an Atomic Component Operation versus Primitive Operations for Building a Real-Time Collaborative Editing API
by
Leslie A. Viviani
Master of Science in Computer Science
San Diego State University, 2013
Real-time collaborative editing is a productive way to work in groups and drive innovation. A software application is more likely to be adopted by its users if it is familiar to them and something they already know how to use. Thus, an API that allows a development team to turn a single-user application into a collaborative application is needed. Such an API must balance its complexity, from the perspective of the developers building the API, against its ease of use for the developers who use it to build a real-time collaborative editor.
The API should be flexible and include enough operations to be useful, but not so many that the operation transformations become overly complex. This paper presents a comparison of the efficiency of primitive algorithms versus atomic component algorithms in the context of building a real-time collaborative editing API. The atomic component operations perform better, both in terms of CPU clock cycles and in terms of ease of use for a developer building an application.
TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
CHAPTER
1 INTRODUCTION
2 BACKGROUND AND RELATED WORK
    2.1 Real Time Collaborative Editing
    2.2 Concurrency Control
    2.3 Operational Transform
    2.4 Algorithms to Support OT for RTCE
3 METHODS
    3.1 Efficiency
    3.2 Functionality
    3.3 Design
    3.4 Implementation
        3.4.1 The Test Code
        3.4.2 Test Code Example
4 RESULTS
5 DISCUSSION
    5.1 Average Run Time Comparison
    5.2 Developer Time
6 CONCLUSION
REFERENCES
LIST OF TABLES
Table 3.1. Test Suites Description
Table 3.2. Core Classes of the Model RTCE System
Table 3.3. Test Descriptions
Table 4.1. Time Results Primitive Algorithms Run First
Table 4.2. Time Results Atomic Component Algorithm Run First
Table 4.3. Total Time Results All Primitive Operation Tests
Table 4.4. Total Time Results All Atomic Component Operation Tests
LIST OF FIGURES
Figure 2.1. Graphical representation of operation transformation.
Figure 3.1. Code listing 1 primitive operation test pseudo code.
Figure 3.2. Code listing 2 atomic component operation test pseudo code.
Figure 4.1. Time by test - primitives run first.
Figure 4.2. Time by test - atomic component run first.
Figure 4.3. Total time of all tests.
Figure 4.4. Time by run number primitive operation tests run first.
Figure 4.5. Time by run number atomic component operation tests run first.
Figure 4.6. Combined time by run number.
Figure 4.7. Total combined test execution time.
ACKNOWLEDGEMENTS
I would like to thank my advisor, Dr. Joseph Lewis, for his support through this
process and his enthusiasm for teaching. His work has opened my eyes and mind to a world
that is beyond my normal black and white thinking.
Additionally, I would like to thank the members of my committee for their work and
advice through this process.
Finally, I wish to thank my family whose enthusiasm and support have helped make
this achievement possible.
CHAPTER 1
INTRODUCTION
Many companies have workers who are not co-located but must still collaborate.
It is often the case that many hours are wasted sending
documents back and forth by email or through shared servers. A typical scenario is that one user updates
the document, sends the document with their changes to another user, and then must wait for
that user to send their changes back. In today’s fast-paced world of short deadlines and
an around-the-globe workforce, a real-time collaborative editing (RTCE) system would increase
productivity and lessen frustration among those users.
A real-time collaborative editor could be implemented as a stand-alone application
that users could log into and use or it could be implemented as an interface to an existing
single user application, which would allow it to become a real-time group editor. Building an
interface to an existing single user application is beneficial for the end user in that they don’t
have a new tool to learn. However, this approach raises an important issue – the developer of
the API must find a good balance between the complexity of the RTCE from the point of
view of API development and the ease of use of that API from the perspective of the
developer using that API to build such an interface.
Real-time collaborative editors must handle several consistency issues and
prevent problems such as divergence, causality violations, and intention violations. There are
many algorithms to handle these issues, but one of the most prevalent is Operational
Transform (OT). Operational transform resolves conflicts that can occur when two or more
users are attempting to update the same portion of a document model. OT allows those users
to do this without locking or manual intervention.
An OT system for document editing can be built using the primitive operations of
Insert and Delete. Any type of behavior needed in a real-time collaborative text editor could
be modeled using these two operations. However, it is more efficient, both in terms of CPU
time and in terms of developer time, to provide higher-level atomic component operations,
such as a Move operation.
CHAPTER 2
BACKGROUND AND RELATED WORK
2.1 REAL TIME COLLABORATIVE EDITING
Real-time collaborative editing (RTCE) allows two or more users in possibly
different locations working on different computers to simultaneously work on the same
electronic file (for example, a text document) and see each other’s changes in real time.
Most modern RTCEs based on Operational Transformation (OT) typically use a replicated
architecture in which the shared document is replicated at each site involved in the
collaboration [1]. A shared copy of the document model at each collaboration site helps
ensure a good user experience. A user makes a change to the document and they see that
change performed in the document immediately. The operation is then propagated to the
remote sites and transformed against local operations at each collaboration site. Some
systems include a server that holds the master document; each remote site involved in
the collaboration then performs its transformations only against the master document instead
of against every other remote site.
There are many different implementations of RTCE systems developed over the last
several years. These systems all belong in one of two broad categories – the system is either a
standalone, fully self-contained separate application or is an add-on to an existing single user
application to make it collaborative [2]. Some examples of these applications are CoWord,
CoMaya, Ace Editor, ShareJs, and Google Wave.
There are advantages to the approach of modifying a single user application to make
it collaborative. For example, taking a tool that is familiar to many users, such as Microsoft
Word, and providing a way to make it collaborative would get more user buy-in rather than
asking a user to become familiar with a different word processing tool. There is much less of
a learning curve and it may be easier to convince users to try it rather than try a new software
package [3-5].
Regardless of the type of collaborative editing application (standalone or a modification
of an existing application), any RTCE application must handle concurrency control and
ensure consistency.
2.2 CONCURRENCY CONTROL
Concurrency control is concerned with the coordination of concurrent access to a
shared resource and resolving any conflicts that may arise when two or more users attempt to
modify the same portion of a document model [6, 7]. One of the primary functions of
concurrency control is to ensure the consistent state of the model. In other words, it must
ensure that the correct results are generated in all instances of the document model. There are
several inconsistency problems that can occur in RTCE. The primary errors that an RTCE
needs its concurrency control algorithm to prevent are divergence, causality violations, and
intention violations [3, 7, 8].
Divergence occurs when the final results in all instances of the document model
are not identical. This can happen when operations arrive at different sites in different orders.
Because there may be dependencies among the operations originating from a site, executing
those operations in different orders at other sites can cause the final document state to
diverge [8-10].
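To make divergence concrete, the following minimal sketch applies the same two operations in opposite orders at two sites, with no transformation. It uses plain Java strings and is not part of the implementation described later; the class and method names are hypothetical and chosen only for illustration.

```java
// Illustrative sketch only: class and method names are hypothetical.
final class DivergenceDemo {

    // Insert text so that it starts at the given index.
    static String insert(String doc, int index, String text) {
        return doc.substring(0, index) + text + doc.substring(index);
    }

    // Delete the single character at the given index.
    static String deleteAt(String doc, int index) {
        return doc.substring(0, index) + doc.substring(index + 1);
    }

    public static void main(String[] args) {
        String start = "abc";

        // Site A applies its local insert first, then the remote delete as received.
        String siteA = deleteAt(insert(start, 0, "x"), 1);

        // Site B applies its local delete first, then the remote insert as received.
        String siteB = insert(deleteAt(start, 1), 0, "x");

        System.out.println("Site A: " + siteA);                  // Site A: xbc
        System.out.println("Site B: " + siteB);                  // Site B: xac
        System.out.println("Converged: " + siteA.equals(siteB)); // Converged: false
    }
}
```

Both sites executed the same two operations, yet site A deleted "a" while site B deleted "b", because the delete index was not adjusted for the concurrent insert.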
A causality violation occurs when the execution order of the operations differs from
their cause-and-effect order. Because operations may arrive in an order different from that in which they were
generated, they may be executed out of their original order, which may confuse the
user [11-13].
An intention violation can occur when the actual effect of an operation execution is
different from the intended effect. This can happen if operations cause a different operation
to commit an unintended effect. The intention of an operation must be preserved across all
client sites, regardless of any concurrent operations. This means that the observed effect of
the operation at all sites is the same as the operation at the site it originated from [14, 15].
2.3 OPERATIONAL TRANSFORM
One such method of concurrency control is Operational Transform (OT), which is an
optimistic concurrency algorithm. The premise of optimistic concurrency is that the
probability of two transactions accessing and modifying the same object or set of data is low.
Operations are allowed to execute as if there were no possibility of conflicts (as
opposed to locking concurrency control, which allows only one user at a time to modify an
object or set of data) [4, 16].
An OT system is built from a collection of algorithms that provide a way to resolve
conflicts without user intervention and without locking the data model so that more than one
user may work on the same data model at the same time. Without locking, OT is able to
operate in high latency environments, such as web applications, without lag time and delay in
the user experience [17].
OT handles changes at the level of individual operations instead of at the level of
the whole document model. It is much easier to transform a single operation
against another operation and send the result to the remote document sites to bring the
document model into convergence than it is to do so for an entire document
model [4, 7].
Inevitably, conflicts do occur and the algorithm must be accompanied by the
transformation of the operations so that operations invoked by different users can be applied
to the documents whose states have diverged and bring those documents back to the same
state. An example scenario of how a conflict can occur and is resolved with operational
transform follows.
A user named Ben and his colleague named Charlie are both working to complete the
end of the month sales report for their manager. Ben and Charlie are not co-located and are
using a RTCE to complete their work. They both see an issue at the end of the document and
both attempt to insert text in the same spot in the document model. Ben inserts a “b” and
Charlie simultaneously inserts an “s”. Both operations are sent to the server. The
transformation that happens with the operation sent by Ben will retain two items and insert
bs, and the operation sent by Charlie will retain the two items and insert sb. The server can
only apply one operation at a time and chooses one set of transformed operations to apply
first. However, as soon as one of the transformed operations is applied, the retain portion of
the second transformed operation becomes invalid. Depending on which transformed
operation was applied first, the ending of the document will either contain sb or bs.
Each client, as well as the server, needs to be made aware of every other client’s
operations. However, just sending Ben’s operations to Charlie’s version of the document
model and vice versa will not work, and the document states will not converge. This is where
Operational Transform can help.
The problem can be visualized as in the diamond problem shown in Figure 2.1. This
diagram shows the application of two separate operations on a document model at the same
time, operation a and operation b. In a diagram such as this, the client operations move the
document model to the left and server operations move the document model to the right.
However, both types of operations move the document model downward. This view is a
representation of the operations applied in what is called a state space. When both the client
and server lines pass through the same point, it means that the document model, at that point
in time, has converged.
Figure 2.1. Graphical representation of operation transformation.
Going back to the above example, imagine that Ben’s operation is the a operation in
the diagram and Charlie’s operation is the b operation. In this case, the b operation is applied
first by the server, followed by the a operation. In order to ensure the document model states
converge, the a operation needs to be transformed with respect to the b operation and the b
operation needs to transform with respect to the a operation.
The transform function is based on the mathematical identity:
xform(a, b) = (a’, b’)
where a and b are the original operations, one from the server and one from the client. The transform
function takes these two operations and produces a pair of operations such that, when applied,
both documents wind up converging. In other words, if the client applies a followed by b’
and the server applies b followed by a’, then both documents will end up in the same final
state.
Using the operational transform identity, when operation b is received on the client
side from the server, it is paired with operation a to produce (a’, b’), and operation b’ is then
composed with a to produce the final document model state. The comparable procedure is
followed on the server side and ensures that the document models on the client side and the
server side converge to the same document model state.
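The identity can be illustrated with a minimal sketch for the simplest case, two concurrent inserts, using the Ben and Charlie example above. The names InsertOp and xform, the starting text "Hi", and the tie-breaking rule (operation a's text is placed first when both inserts target the same index) are assumptions made for this illustration, not the implementation evaluated in this thesis.

```java
// Illustrative sketch only: InsertOp and xform are hypothetical names, and the
// tie-breaking rule is an assumption made for this example.
final class InsertOp {
    final int index;
    final String text;

    InsertOp(int index, String text) {
        this.index = index;
        this.text = text;
    }

    // Insert this operation's text so that it starts at index.
    String apply(String doc) {
        return doc.substring(0, index) + text + doc.substring(index);
    }
}

final class TransformDemo {

    // xform(a, b) = (a', b'): a' is a adjusted to run after b,
    // and b' is b adjusted to run after a.
    // On equal indices, a's text is placed before b's text (assumed tie-break).
    static InsertOp[] xform(InsertOp a, InsertOp b) {
        boolean bShiftsA = b.index < a.index;
        boolean aShiftsB = a.index <= b.index;
        InsertOp aPrime = new InsertOp(a.index + (bShiftsA ? b.text.length() : 0), a.text);
        InsertOp bPrime = new InsertOp(b.index + (aShiftsB ? a.text.length() : 0), b.text);
        return new InsertOp[] { aPrime, bPrime };
    }

    public static void main(String[] args) {
        String doc = "Hi";
        InsertOp a = new InsertOp(2, "b"); // Ben's operation
        InsertOp b = new InsertOp(2, "s"); // Charlie's operation

        InsertOp[] pair = xform(a, b);
        String client = pair[1].apply(a.apply(doc)); // a followed by b'
        String server = pair[0].apply(b.apply(doc)); // b followed by a'

        System.out.println(client + " / " + server); // Hibs / Hibs
    }
}
```

Without the tie-break, both transformed inserts would keep index 2 and the two sites would end with "Hisb" and "Hibs", which is exactly the conflict described in the example.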
2.4 ALGORITHMS TO SUPPORT OT FOR RTCE
One of the great pioneers of RTCE using OT is Dr. Sun Chengzheng, who explains in
several papers that the operational model of a basic OT system needs only the primitive
operations of Insert and Delete. This is correct for string-based document transformations;
these two primitives can model virtually any complex operation needed.
However, when building an RTCE API that a development team will use to program an
interface that turns a single-user application into a multi-user collaborative application, it would
be not only more efficient but also more usable for a developer to have access to higher-level
atomic component operations.
An application programming interface (API) is the collection of methods intended to
be called to build a program. A good API should be readable and easy to use even without
formal documentation. It should also provide enough building blocks to make it worthwhile
for a development team to invest its time in learning and using it.
There are tradeoffs that must be considered in designing an API between the usability
of that API for the developer using it and the complexity of the underlying library of code.
An API gains usability by providing higher-level operations, which
make it easier for a developer to implement a system using the API [18]. Ensuring that
the API is easy to use and well written will increase its chance of being used and adopted by
a developer community [19, 20].
When designing an API to be used to build a RTCE based on Operational Transform,
great care must be taken to balance the usability of the API with the number of operations
provided. For each operation that is provided in the API, you must be able to transform that
operation against every other operation in the API.
The more operations you provide, the greater the complexity of performing the OT
operations that you need to handle [21]. However, you really want someone to use your API
and the more readable and usable it is, the greater the likelihood is that they will use it. If you
push off the cost of effort to the developers using the API, it has a lower chance of being
adopted.
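A rough way to see this growth is to count the ordered pairs of operation types that a transform function has to handle. The sketch below is only an illustration: it counts every ordered pair, including an operation against its own type, and a real implementation might share code between some of these cases.

```java
// Illustrative sketch: counts the ordered (incoming, existing) operation pairs
// a transform function must handle. The operation names come from the text;
// the class and method names are hypothetical.
final class TransformPairs {

    static int pairCount(String[] operationTypes) {
        return operationTypes.length * operationTypes.length;
    }

    public static void main(String[] args) {
        String[] primitives = { "Insert", "Delete" };
        String[] withMove = { "Insert", "Delete", "Move" };

        System.out.println(pairCount(primitives)); // 4
        System.out.println(pairCount(withMove));   // 9
    }
}
```

Adding a single Move operation to Insert and Delete therefore raises the number of transformation cases from 4 to 9, which is the cost the API designer accepts in exchange for usability.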
CHAPTER 3
METHODS
In order to answer the question of whether primitive operations are more or less
efficient than an atomic component operation in the context of an RTCE, two issues
needed to be addressed. The first was to define what is meant by efficiency for
this problem. The second was to define a functionality that could be completed both by an
atomic component operation and by using primitive operations.
3.1 EFFICIENCY
Efficiency is defined as a comparison of production
with cost [22]. Cost can be measured in terms of energy, time, and/or money. In this case, I
consider the efficiency of the algorithms by measuring the CPU clock cycles that it
takes to perform the same amount of work (moving text from one area of a document to
another), as well as the amount of work
it takes from a developer standpoint to use the algorithm in question.
3.2 FUNCTIONALITY
The behavior I decided to test is a move behavior; for example, moving a paragraph
of text from one part of the document model to another part. This behavior can be achieved
both by a combination of the primitive operations Insert and Delete and by a fully atomic
component operation.
3.3 DESIGN
In order to test the efficiency of the different operations I wrote a suite of tests that
perform various move behaviors and measured the time it took to run those tests. I wrote a
test suite for the move functionality modeled by the Insert and Delete operations and an
analogous test suite for the move functionality performed by the Move operation.
In order to minimize effects on the test results from outside influences such as the
size of text moved or the position of text moved (e.g., from beginning to end versus middle to
middle), I used the same set of starting text for each suite of tests and performed the same
operations. I compared a Move Text From Beginning To End for the primitives with a Move
Text From Beginning To End with the atomic component move algorithm.
I ran the suite of tests in two test batches: test batch 1 ran the primitive operations
first followed by the atomic component operations, and test batch 2 ran the atomic
component operations followed by the primitive operations. Each test batch consisted of 5
runs of primitive operation tests and 5 runs of atomic component operation tests, with each
run consisting of 1000 executions of each of the 7 tests. A description of each test batch,
including the number of runs and the number of executions per run, is given in Table 3.1. I
ran the test suites on a laptop computer with the following specifications:
Intel Core i7-3632QM CPU @ 2.20GHz
8.0 GB RAM
64-bit Operating System, x64-based processor
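The run structure described above (5 runs, each executing a test 1000 times) can be sketched as a small timing harness. How the times were actually captured is not shown in this document, so the use of System.nanoTime, the class and method names, and the stand-in workload below are all assumptions made for illustration.

```java
// Illustrative sketch only: the harness below (names, use of System.nanoTime,
// stand-in workload) is an assumption, not the timing code used for the results.
final class TimingSketch {

    static final int RUNS = 5;
    static final int EXECUTIONS_PER_RUN = 1000;

    // Times one run: the test body executed EXECUTIONS_PER_RUN times, in milliseconds.
    static long timeRunMillis(Runnable testBody) {
        long start = System.nanoTime();
        for (int i = 0; i < EXECUTIONS_PER_RUN; i++) {
            testBody.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Returns the elapsed time of each of the RUNS runs.
    static long[] timeAllRuns(Runnable testBody) {
        long[] times = new long[RUNS];
        for (int run = 0; run < RUNS; run++) {
            times[run] = timeRunMillis(testBody);
        }
        return times;
    }

    // Arithmetic mean of the run times.
    static double average(long[] times) {
        long total = 0;
        for (long t : times) {
            total += t;
        }
        return (double) total / times.length;
    }

    public static void main(String[] args) {
        // Stand-in workload: move the first token of a short string to the end.
        Runnable moveBeginningToEnd = () -> {
            String doc = "P3 P1 P2 ";
            String moved = doc.substring(3) + doc.substring(0, 3);
            if (!moved.equals("P1 P2 P3 ")) {
                throw new AssertionError(moved);
            }
        };

        long[] times = timeAllRuns(moveBeginningToEnd);
        System.out.println("Average ms per run: " + average(times));
    }
}
```

Measurements taken this way are sensitive to JVM warm-up, so the first run of a batch can be noticeably slower than later runs; running the two suites in both orders is one way to expose that effect.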
3.4 IMPLEMENTATION
The algorithms described in this paper were written in Java using built-in libraries
and the test classes were written in Java using JUnit. The data model for this RTCE was
modeled using strings. The core classes that make up the model RTCE are briefly described
in Table 3.2.
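The three operation classes named in Table 3.2 can be sketched over plain strings as follows. This is a minimal sketch rather than the thesis source: the index conventions (insert before the given index, end-exclusive ranges, and a move target expressed in original-document coordinates) are assumptions, and the short "P3 P1 P2 " string is a stand-in for the real paragraphs.

```java
// Illustrative sketch only: these classes mirror the descriptions in Table 3.2,
// but the index conventions are assumptions made for this example.
final class Insert {
    private final int index;
    private final String text;

    Insert(int index, String text) {
        this.index = index;
        this.text = text;
    }

    String apply(String doc) {
        return doc.substring(0, index) + text + doc.substring(index);
    }
}

final class Delete {
    private final int start;
    private final int end; // exclusive

    Delete(int start, int end) {
        this.start = start;
        this.end = end;
    }

    String apply(String doc) {
        return doc.substring(0, start) + doc.substring(end);
    }
}

final class Move {
    private final int start;
    private final int end;    // exclusive
    private final int target; // index in the original document; must not lie strictly inside the moved range

    Move(int start, int end, int target) {
        this.start = start;
        this.end = end;
        this.target = target;
    }

    String apply(String doc) {
        String moved = doc.substring(start, end);
        String rest = doc.substring(0, start) + doc.substring(end);
        // Removing the range shifts every later index left by the range length.
        int adjusted = target >= end ? target - (end - start) : target;
        return rest.substring(0, adjusted) + moved + rest.substring(adjusted);
    }
}

final class OperationsDemo {
    public static void main(String[] args) {
        // The sky is red and my dog is blue.
        System.out.println(new Insert(14, " and my dog is blue").apply("The sky is red."));
        // My dog is named Jasper
        System.out.println(new Delete(10, 14).apply("My dog is was named Jasper"));
        // P1 P2 P3
        System.out.println(new Move(0, 3, 9).apply("P3 P1 P2 "));
    }
}
```

With the primitives, a caller has to capture the text, delete it, and recompute the insert index after the deletion; with Move, that index adjustment happens once inside the operation.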
3.4.1 The Test Code
The test suites consist of the same set of tests for the primitive operations and for
the atomic component operation. Each test for the primitive operations had a corresponding
test for the move operation that performed the same functionality, so each test in the Atomic
Component Operations suite performed the same amount of work as its counterpart in
the Primitive Operations suite.
Table 3.1. Test Suites Description
Test Batch                  Run #s   Test Name                  # of Executions
Batch 1 Primitives          1-5      Move Beginning to End      1000
                                     Move Beginning to Middle
                                     Move End to Beginning
                                     Move End to Middle
                                     Move Middle to Beginning
                                     Move Middle to End
                                     Move Middle to Middle
Batch 1 Atomic Component    6-10     Move Beginning to End      1000
                                     Move Beginning to Middle
                                     Move End to Beginning
                                     Move End to Middle
                                     Move Middle to Beginning
                                     Move Middle to End
                                     Move Middle to Middle
Batch 2 Atomic Component    11-15    Move Beginning to End      1000
                                     Move Beginning to Middle
                                     Move End to Beginning
                                     Move End to Middle
                                     Move Middle to Beginning
                                     Move Middle to End
                                     Move Middle to Middle
Batch 2 Primitives          16-20    Move Beginning to End      1000
                                     Move Beginning to Middle
                                     Move End to Beginning
                                     Move End to Middle
                                     Move Middle to Beginning
                                     Move Middle to End
                                     Move Middle to Middle
Table 3.2. Core Classes of the Model RTCE System
Class                Description
Insert               Set up a new Insert operation by providing the index to insert at
                     and the characters to insert.
                     This operation is then applied to a string.
                     Insert insertOp = new Insert(14, " and my dog is blue");
                     insertOp.apply("The sky is red.");
                     Resulting String: The sky is red and my dog is blue.
Delete               Set up a new Delete operation by providing the index range of text
                     to be removed.
                     This operation is then applied to a string.
                     Delete deleteOp = new Delete(10, 14);
                     deleteOp.apply("My dog is was named Jasper");
                     Resulting String: My dog is named Jasper
Move                 Set up a Move operation by providing the index range of the text to
                     move and the index of where to move it to.
                     This operation is then applied to a string.
OTModel              Simple POJO that uses a String to model the data and an ID to
                     keep track of the client.
DeriveOperations     Derives the steps necessary to transform from one operation to
                     another to lead to convergence.
ConcurrencyControl   The controller for handling the concurrency issues.
The text chosen as the document model is as follows, shown in the order in which it should
appear at the end of each test. I numbered the paragraphs for ease of discussion.
1. When in the course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.
2. We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable rights, that among these are life, liberty and the pursuit of happiness
3. That to secure these rights, governments are instituted among men, deriving their just powers from the consent of the governed. That whenever any form of government becomes destructive of these ends, it is the right of the people to alter or abolish it, and to institute new government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their safety and happiness.
For each test, I rearranged the order of the starting text to work with, ran the test,
validated that the test passed, and then recorded the time results.
All of the tests work basically the same way; they differ only in their starting points,
ending points, and insert points. The pseudo code given below therefore serves as a model for all of
the tests in their respective test suite. Note that the work of finding the indices is outside the
scope of the operations themselves.
Code Listing 1 (Figure 3.1) shows the pseudo code for modeling the move behavior
using only the primitive operations of Insert and Delete.
//create the model with the starting string (model the current document state)
OTModel documentModel = new OTModel(getStartString());
//first, find the string to move so that it can be re-inserted after it is deleted
String textToMove = findString(startIndex, endIndex);
//create the delete operation
DeleteOperation deleteOp = new DeleteOperation(startIndex, endIndex);
//apply operation to main string & store the result (main string minus removed text)
String tempString = deleteOp.apply(documentModel.getValue());
//create the insert operation with the correct index and the text we want to insert
InsertOperation insertOp = new InsertOperation(insertIndex, textToMove);
//now apply the insert operation to the temp string to combine them back together
String endString = insertOp.apply(tempString);
//verify that the endString matches what we expect (JUnit takes the expected value first)
assertEquals(getModelEndString(), endString);
//end the test and measure the time
Figure 3.1. Code listing 1 primitive operation test pseudo code.
Code Listing 2 (Figure 3.2) shows the pseudo code for modeling the move behavior
using the atomic component Move operation.
//create the model with the starting string (model the current document state)
OTModel documentModel = new OTModel(this.getStartString());
//create the move operation with the indices
MoveOperation moveOp = new MoveOperation(startIndex, endIndex, moveToIndex);
//apply the move operation to the string held by the original model
String endString = moveOp.apply(documentModel.getValue());
//verify that the endString matches what we expect (JUnit takes the expected value first)
assertEquals(getModelEndString(), endString);
//end the test and measure the time
Figure 3.2. Code listing 2 atomic component operation test pseudo code.
I will describe one test in detail, and provide a high level overview of the remaining
tests. Each of the tests had a different setup so that the test result would be easily verifiable
against the same ending string. The purpose of each test was moving a substring of data from
one position of the document to another.
3.4.2 Test Code Example
The specific test, Move Beginning to End, provides a good template for all of the
tests that I wrote. Each test in the test suite can follow the same pattern in the test setup, test
body, and test result.
Test setup: I arranged the setup document such that it looked like paragraph 3,
paragraph 1, then paragraph 2. This is abbreviated as P3->P1->P2.
Test body: The code moved paragraph 3, which started in position one, to
the end of the document, which is where it belongs.
Test result: I verified that the final document was in the order Paragraph 1,
Paragraph 2, Paragraph 3. This is abbreviated as P1->P2->P3.
The list of tests performed in each test suite, along with a brief explanation of each, is
given in Table 3.3.
Table 3.3. Test Descriptions
Test Name                  Test Description
Move Beginning to End      Move text from the beginning of the document to the end.
Move Beginning to Middle   Set up the test in order of P2->P1->P3. Move the text such that
                           it ends up as P1->P2->P3.
Move End to Beginning      Set up the test in order of P2->P3->P1. Move the text such that
                           it ends up as P1->P2->P3.
Move End to Middle         Set up the test in order of P1->P3->P2. Move the text such that
                           it ends up as P1->P2->P3.
Move Middle to Beginning   Set up the test in order of P2->P1->P3. Move the text such that
                           it ends up as P1->P2->P3.
Move Middle to End         Set up the test in order of P1->P3->P2. Move the text such that
                           it ends up as P1->P2->P3.
Move Middle to Middle      This test was set up such that it was in order P1->P2->P3, but
                           P2 was rearranged into P2a and P2b so that a middle
                           section could be moved and wind up in the correct
                           P1->P2->P3 order.
CHAPTER 4
RESULTS
The results of the test runs are given in the next several tables and graphs. The tests
were run in two batches; the first batch ran the primitive operations test first and the atomic
component operations test second and the second batch did the reverse order. Each batch
consisted of 5 runs and each run consisted of 1000 executions of each test, for a total of 5000
executions of each test per batch.
Table 4.1 shows the results of the 5 runs of tests where the primitive operation tests
were run first. The average execution time of each test in the atomic component operation test set
was consistently shorter than the average time of each test in the primitive operation tests.
Notice, though, that certain runs of certain atomic component operation
tests were slower than the corresponding run of the primitive operation test. Most notably,
Run 1 of the Atomic Component Operation test “Move Beginning to End” took 110 ms while
its Primitive Test counterpart took 109 ms, and Run 4 of the Atomic Component Operation
test “Move Middle to End” took 63 ms while its Primitive Test counterpart took only 47 ms.
Table 4.2 shows the results of the 5 runs of tests where the atomic component
operation tests were run first. In this batch the average execution time of the atomic
component operation tests was slower than that of the primitive operation tests in five of
the seven tests. The exceptions were Move Middle to End (46.6 ms versus 47 ms) and Move
Middle to Middle (43.8 ms versus 46.6 ms), where the atomic component operation was
marginally faster. Certain runs of certain primitive operation tests were also slower than
the corresponding run of the atomic component operation test. Most notably, Run 4 of the
Primitive Operation test “Move Beginning to End” took 47 ms while its Atomic Component test
counterpart took 31 ms, and Run 1 of the Primitive Operation test “Move Middle to Middle”
took 62 ms while its Atomic Component test counterpart took only 47 ms.
Table 4.3 shows a summary of the primitive operations test results. This table shows
the total combined time of all the primitive operations when this set was run first and
when it was run second, as well as the total combined time of both runs and the average time
of the tests.
Table 4.1. Time Results Primitive Algorithms Run First (times in ms)
Primitive Operations Tests          Run 1  Run 2  Run 3  Run 4  Run 5  Average
Move Beginning to End                 109     47     63     31     47     59.4
Move Beginning to Middle               93     63     78     32     47     62.6
Move End to Beginning                  94     78     47     32     32     56.6
Move End to Middle                     94    203     47     47     47     87.6
Move Middle to Begin                   78     62     47     63     31     56.2
Move Middle to End                     63     47     47     47     31     47.0
Move Middle to Middle                  63     47     31     47     63     50.2
Total time per run                    594    547    360    299    298    419.6

Atomic Component Operation Tests    Run 1  Run 2  Run 3  Run 4  Run 5  Average
Move Beginning to End                 110     47     31     31     47     53.2
Move Beginning to Middle               62     63     47     31     47     50.0
Move End to Beginning                  47     47     47     32     47     44.0
Move End to Middle                     63     47     31     47     47     47.0
Move Middle to Begin                   47     47     31     46     47     43.6
Move Middle to End                     32     46     31     63     47     43.8
Move Middle to Middle                  63     63     31     31     47     47.0
Total time per run                    424    360    249    281    329    328.6
Table 4.2. Time Results Atomic Component Algorithm Run First (times in ms)
Primitive Operations Tests          Run 1  Run 2  Run 3  Run 4  Run 5  Average
Move Beginning to End                  94     47     47     47     31     53.2
Move Beginning to Middle               94     62     47     47     31     56.2
Move End to Beginning                  62     47     47     47     31     46.8
Move End to Middle                     47     47     78     47     31     50.0
Move Middle to Begin                   47     31     47     47     47     43.8
Move Middle to End                     47     63     47     47     31     47.0
Move Middle to Middle                  62     47     46     47     31     46.6
Total time per run                    453    344    359    329    233    343.6

Atomic Component Operation Tests    Run 1  Run 2  Run 3  Run 4  Run 5  Average
Move Beginning to End                 110     47     78     31     46     62.4
Move Beginning to Middle              109     63     47     31     47     59.4
Move End to Beginning                  78     62     62     47     32     56.2
Move End to Middle                     93     63     31     47     31     53.0
Move Middle to Begin                   94    204     31     47     31     81.4
Move Middle to End                     62     31     47     62     31     46.6
Move Middle to Middle                  47     47     47     47     31     43.8
Total time per run                    593    517    343    312    249    402.8
Table 4.3. Total Time Results All Primitive Operation Tests (times in ms)
Summary                                                    Time
Primitive Operations Tests run first                      419.6
Primitive Operations Tests run second                     343.6
Total run time all primitive operations tests             763.2
Average run time all primitive operations tests           381.6
Table 4.4 shows a summary of the atomic component operations test results. This
table shows the total combined time of all the atomic component operations when this set
was run first, when it was run second as well as the total combined time of both runs and the
average time of the tests.
Table 4.4. Total Time Results All Atomic Component Operation Tests (times in ms)
Summary                                                    Time
Atomic Component Operations Tests run first               402.8
Atomic Component Operations Tests run second              328.6
Total run time all atomic component operations tests      731.4
Average run time all atomic component operations tests    365.7
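The arithmetic behind the two summary tables can be checked with a short Java sketch. The helper names are hypothetical; the inputs are the average "Total time per run" values from Tables 4.1 and 4.2. It also prints the gap between the two suites, which works out to roughly 15.9 ms, or about 4% of the primitive average.

```java
// Hypothetical helper for re-deriving the summary rows; not part of the thesis's code.
final class SummarySketch {

    // Sum of a suite's run-first and run-second batch averages (ms).
    static double total(double runFirst, double runSecond) {
        return runFirst + runSecond;
    }

    // Mean of the two batch averages (ms).
    static double average(double runFirst, double runSecond) {
        return total(runFirst, runSecond) / 2.0;
    }

    public static void main(String[] args) {
        // Primitive suite: 419.6 ms when run first (Table 4.1), 343.6 ms when run second (Table 4.2).
        double primitive = average(419.6, 343.6);
        // Atomic component suite: 402.8 ms when run first (Table 4.2), 328.6 ms when run second (Table 4.1).
        double atomic = average(402.8, 328.6);

        System.out.printf("primitive average: %.1f ms%n", primitive);
        System.out.printf("atomic component average: %.1f ms%n", atomic);
        System.out.printf("difference: %.1f ms (%.1f%% of the primitive average)%n",
                primitive - atomic, 100.0 * (primitive - atomic) / primitive);
    }
}
```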
Figure 4.1 shows a comparison of running time for each test in the first test batch,
in which the primitive operation tests were run first. The atomic component operations
outperformed the primitive operations in every test, slightly in most cases and noticeably in
the Move End to Middle test (47 ms versus 87.6 ms on average).
Figure 4.1. Time by test - primitives run first.
Figure 4.2 shows a comparison of running time for each test in the second test
batch, in which the atomic component operation tests were run first. The primitive
operations slightly outperformed the atomic component operations in most tests and
noticeably outperformed them in the Move Middle to Begin test. The exceptions were Move
Middle to End and Move Middle to Middle, where the atomic component operation was
marginally faster on average (Table 4.2).
Figure 4.2. Time by test - atomic component run first.
Figure 4.3 shows a comparison of the total running time for each test in both test
batches broken out by test. The Atomic Component operations ran faster than the Primitive
operations for Move Beginning to Middle, Move End to Beginning, Move End to Middle,
Move Middle to End, and Move Middle to Middle.
For some tests, such as the Move End to Beginning test, the difference in running times
was small, but there was a much greater difference for the Move End to Middle test.
The Primitive Operations outperformed the Atomic Component Operations in the Move
Beginning to End and the Move Middle to Begin tests. The difference in performance for the
Move Beginning to End test was slight, while the difference for the Move Middle to Begin
test was more noticeable.
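The per-test comparison can be reproduced from the Average columns of Tables 4.1 and 4.2. The sketch below assumes that the total plotted for each test is the sum of its two batch averages; the class and method names are hypothetical.

```java
// Hypothetical re-derivation of the per-test comparison; the arrays hold the
// Average columns of Tables 4.1 and 4.2 in milliseconds.
final class PerTestComparisonSketch {
    static final String[] TESTS = {
        "Move Beginning to End", "Move Beginning to Middle", "Move End to Beginning",
        "Move End to Middle", "Move Middle to Begin", "Move Middle to End",
        "Move Middle to Middle"
    };
    static final double[] PRIMITIVE_BATCH_1 = {59.4, 62.6, 56.6, 87.6, 56.2, 47.0, 50.2};
    static final double[] PRIMITIVE_BATCH_2 = {53.2, 56.2, 46.8, 50.0, 43.8, 47.0, 46.6};
    static final double[] ATOMIC_BATCH_1 = {53.2, 50.0, 44.0, 47.0, 43.6, 43.8, 47.0};
    static final double[] ATOMIC_BATCH_2 = {62.4, 59.4, 56.2, 53.0, 81.4, 46.6, 43.8};

    // Positive result: the atomic component operation was faster by that many
    // milliseconds. Negative result: the primitive operations were faster.
    static double atomicAdvantage(int test) {
        double primitive = PRIMITIVE_BATCH_1[test] + PRIMITIVE_BATCH_2[test];
        double atomic = ATOMIC_BATCH_1[test] + ATOMIC_BATCH_2[test];
        return primitive - atomic;
    }

    public static void main(String[] args) {
        for (int i = 0; i < TESTS.length; i++) {
            System.out.printf("%-26s %+.1f ms%n", TESTS[i], atomicAdvantage(i));
        }
    }
}
```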
Figure 4.4 shows a comparison of the total running time of each run in the first test
batch, in which the primitive operation tests were run first. The combined tests for the
atomic component operations outperformed the primitive operations in every run except
Run 5.
Figure 4.5 shows a comparison of the total running time of each run in the second test
batch, in which the atomic component operation tests were run first. The combined tests for
the primitive operations outperformed the atomic component operations in Run 1, Run 2, and
Run 5, but not in Run 3 and Run 4.
Figure 4.3. Total time of all tests.
Figure 4.4. Time by run number - primitive operation tests run first.
Figure 4.5. Time by run number - atomic component operation tests run first.
Figure 4.6 shows a comparison of the combined running time for all tests in both
test batches, broken out by test run. The combined tests for the atomic component
operations outperformed the primitive operations in every run except Run 5.
Figure 4.7 shows a comparison of the combined running time for all tests across all
runs in both test batches. The combined tests for the atomic component operations
outperformed the primitive operations.
Figure 4.6. Combined time by run number. (Bar chart: Run 1 through Run 5; series Component and Primitive.)
Figure 4.7. Total combined test execution time. (Bar chart: total time; series Primitive and Component.)
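The combined per-run and overall totals can be re-derived from the "Total time per run" rows of Tables 4.1 and 4.2, as in the hedged sketch below. The class and method names are hypothetical, and the sketch assumes that the combined figures plot simple sums of the two batches.

```java
import java.util.Arrays;

// Hypothetical re-derivation of the combined totals; not part of the thesis's code.
final class PerRunComparisonSketch {

    // Element-wise sum of the "Total time per run" rows of the two batches (ms).
    static int[] combine(int[] batch1, int[] batch2) {
        int[] combined = new int[batch1.length];
        for (int i = 0; i < combined.length; i++) {
            combined[i] = batch1[i] + batch2[i];
        }
        return combined;
    }

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Rows copied from Table 4.1 (batch 1) and Table 4.2 (batch 2).
        int[] primitive = combine(new int[] {594, 547, 360, 299, 298},
                                  new int[] {453, 344, 359, 329, 233});
        int[] atomic = combine(new int[] {424, 360, 249, 281, 329},
                               new int[] {593, 517, 343, 312, 249});

        System.out.println("primitive per run: " + Arrays.toString(primitive));
        System.out.println("atomic per run:    " + Arrays.toString(atomic));
        System.out.println("primitive total:   " + sum(primitive) + " ms");
        System.out.println("atomic total:      " + sum(atomic) + " ms");
    }
}
```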
CHAPTER 5
DISCUSSION
5.1 AVERAGE RUN TIME COMPARISON
Table 4.1 shows the average time per run per test for batch 1 of the test, in which the
Primitive Operations tests ran first and the atomic component operations tests ran second.
Table 4.2 shows the average time per run per test for batch 2 of the test, in which the Atomic
Component Operations tests ran first and the Primitive operations tests ran second.
From these two tables you can see that the order in which the tests were run had an
impact on the time it took for the tests to complete. The Primitive Operations suite of tests
took an average of 419.6 ms to run when run first versus 343.6 ms when run second. The
Atomic Component Operations test took an average of 402.8 ms when run first and an
average of 328.6 ms when run second.
The individual per-test averages were generally also affected by the order in which the
tests were run. Every primitive test ran faster when the primitive suite ran second (batch 2)
than when it ran first (batch 1), except for Move Middle to End, which averaged 47 ms in both
batches. Likewise, every atomic component test ran faster when the atomic component suite
ran second (batch 1) than when it ran first (batch 2), except for Move Middle to Middle,
which averaged 47 ms when run second and 43.8 ms when run first.
The time difference based on the order in which the tests were run was likely caused
by issues outside the control of these tests. Some are specific to the JVM and outside the
control of this code, such as garbage collection and object finalization. Others, such as
other processes running on the computer at the same time, could cause an individual run of a
test to show a greater slowdown.
I attempted to minimize the influence of these outside variables as much as possible
by running several iterations of the tests prior to capturing results, minimizing any other
automatic processes running on the machine, and calculating the total average time
over both test batches.
Table 4.3 and Table 4.4 show the total average time across all runs of both batches of
tests. You can see from this data that the Atomic Component Operations tests ran faster than
the primitive operations tests: an average of 365.7 ms versus 381.6 ms. Keeping in mind that
both sets of tests performed the same amount of work, this shows that the atomic component
operations perform slightly faster than the primitive operations.
5.2 DEVELOPER TIME
It is harder to quantify the amount of work required to use the operations themselves.
This is a measure of the work done by the development team using the algorithms.
When setting up the tests for this thesis, the amount of code needed to model a move
operation using only Insert/Delete was much greater than the amount needed to set up the
tests that ran the atomic component move operation.
Code listing 1 (Figure 3.1) shows the pseudo code for a sample move test using only
primitive operations and Code listing 2 (Figure 3.2) shows the pseudo code for that same
move test using the atomic component move operation. You can see from Code listing 1 that
the number of lines of code is almost double what is shown in Code listing 2, primarily
because I was forced to handle the data structures to store the text and its parts that needed to
be moved. In contrast, all of the complexity of the move is handled behind the scenes for the
developer when using the atomic component move operation.
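To make the contrast concrete, here is a minimal Java sketch of the two approaches. It is not a reproduction of Code listing 1 or Code listing 2: the `MoveOperationSketch` class, its method names, and the string-based document model are all assumptions made for illustration.

```java
// Hypothetical document model; class and method names are assumptions for
// illustration and are not the API or the code listings from this thesis.
final class MoveOperationSketch {
    private final StringBuilder text;

    MoveOperationSketch(String initial) {
        this.text = new StringBuilder(initial);
    }

    // Primitive operation: insert s so that it starts at index.
    void insert(int index, String s) {
        text.insert(index, s);
    }

    // Primitive operation: delete length characters from start and return them.
    String delete(int start, int length) {
        String removed = text.substring(start, start + length);
        text.delete(start, start + length);
        return removed;
    }

    // Atomic component operation: move length characters starting at start so
    // that they land at position insertBefore of the ORIGINAL text (use the
    // text length for "end of document"). insertBefore must lie outside the
    // moved range.
    void move(int start, int length, int insertBefore) {
        String moved = delete(start, length);
        // Deleting text in front of the destination shifts the destination left.
        int target = insertBefore > start ? insertBefore - length : insertBefore;
        insert(target, moved);
    }

    String contents() {
        return text.toString();
    }

    public static void main(String[] args) {
        // Using only the primitives: the caller owns the buffering and the index math.
        MoveOperationSketch viaPrimitives = new MoveOperationSketch("P3P1P2");
        String buffered = viaPrimitives.delete(0, 2);   // caller stores the removed text
        int target = 6 - buffered.length();             // caller re-bases the destination index
        viaPrimitives.insert(target, buffered);         // caller re-inserts the text

        // Using the atomic component operation: one call, no caller-side bookkeeping.
        MoveOperationSketch viaMove = new MoveOperationSketch("P3P1P2");
        viaMove.move(0, 2, 6);

        System.out.println(viaPrimitives.contents());
        System.out.println(viaMove.contents());
    }
}
```

In this sketch the primitive path leaves three responsibilities with the caller (buffering the deleted text, re-basing the destination index, and re-inserting), which is the kind of extra setup code the comparison of the two listings describes.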
Setting up the tests that modeled the different moves using just Insert and Delete took
at least double the amount of time it took to set up the tests using the Move operation.
This was a small set of tests that simply moved text around. It would be impractical to
develop an application that could turn a single-user application into an RTCE
application using only Insert and Delete.
CHAPTER 6
CONCLUSION
I conducted an efficiency comparison of primitive algorithms versus atomic
component algorithms in the context of an RTCE using Operational Transform. Although
virtually any operation that would need to be performed in an RTCE could be modeled
using only the primitive Insert and Delete operations, it is more efficient to use the atomic
component operations, both in terms of the running time of the algorithm and from the
perspective of the developer building an application with such an API.
The tests I ran ensured that each atomic component operation test did the same
amount of work as the corresponding primitive operations test. In other words, I made sure
that the amount of text to move, the distance the text needed to move, and the work of
finding the indices were the same. I ran the tests in two batches: the first batch ran the
primitive operations tests first, and the second batch ran the atomic component operations
tests first. Each batch consisted of 5 runs of 1000 executions of each test.
The results show that the atomic component operations are more efficient in run time
than the primitive operations once the total time is averaged across all tests and all runs
(365.7 ms versus 381.6 ms). Additionally, it took more code and more time to set up and use
the primitive operations than the atomic component operations to perform the same function
(i.e., move a paragraph from one part of a document to another), so the atomic component
operations outperformed the primitives in this area as well.
There are several areas for future work in this domain. Another atomic component
operation could be developed and then compared for efficiency against something modeled
with Insert and Delete primitive operations. Further work could be explored on the difference
in effort from the API side of development in how much more time it takes to handle the
transformations when dealing with atomic component operations versus the primitive
operations. Also, the same algorithms used in this paper could be compared for efficiency
with complex custom objects instead of strings.
Any operation needed by an RTCE could be modeled using the primitive operations
of Insert and Delete. However, the atomic component operation performed better in terms
of running time, as well as in terms of the amount of code required to use the operations
and the developer time that code costs. When building an API for an RTCE, it would be
better to implement some higher-level API calls rather than require consumers of your API
to rely solely on the primitive operations.
27
REFERENCES
[1] D. Wang, A. Mah, and S. Lassen. Google wave operational transformation, 2010. http://wave-protocol.googlecode.com/hg/whitepapers/operational-transform/operational-transform.html, accessed Feb. 2013.
[2] C. Sun, S. Xia, D. Sun, D. Chen. H. F. Shen and W. Cai. Transparent adaptation of single-user applications for multi-user real-time collaboration. ACM Transactions on Computer-Human Interaction, 13(4): 531–582, 2006.
[3] C. A. Ellis and C. Sun. Operational transformation in real-time group editors: Issues, algorithms, and achievements. Proceedings of the 1998 ACM Conference on Computer Supported Cooperative Work, Seattle, 1998. ACM.
[4] S. Xia, D. Sun, C. Sun, D. Chen and H. Shen. Leveraging single-user applications for multi-user collaboration: The coword approach. Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work, Chicago, 2004. ACM.
[5] L. Kaewkitipong. Diffusion of an Online Collaboration Tool: The case of google wave adoption failure. Proceedings of the System Science (HICSS), 2012 45th Hawaii International Conference on System Sciences, Maui, 2012. IEEE.
[6] D. A. Nichols, P. Curtis, M. Dixon, and J. Lamping. High-latency, low-bandwidth windowing in the Jupiter collaboration system. Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology, Pittsburgh, 1995. ACM.
[7] C. A. Ellis, and S. J. Gibbs. Concurrency control in groupware systems. Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data (SIGMOD '89), New York, 1989. ACM.
[8] C. Ignat and M. C. Norrie. Tree-based model algorithm for maintaining consistency in real-time collaborative editing systems. Proceeding of the 4th International Workshop on Collaborative Editing, New Orleans, 2002. CSCW.
[9] Q. Wu, C. Pu, and J. E. Ferreira. A partial persistent data structure to support consistency in real-time collaborative editing. Proceeding of Data Engineering (ICDE), 2010 IEEE 26th International Conference, Long Beach, 2010. IEEE.
[10] G. Oster, P. Molli, P. Urso, and A. Imine. Tombstone transformation functions for ensuring consistency in collaborative editing systems. Proceedings of the 2006 International Conference on Collaborative Computing: Networking, Applications and Worksharing, Atlanta, 2006. IEEE.
[11] C. Sun, & D. Chen. Consistency maintenance in real-time collaborative graphics editing systems. ACM Transactions on Computer-Human Interaction (TOCHI), 9: 1-41, 2002.
28
[12] M. Ressel, D. Nitsche-Ruhland, and R. Gunzenhäuser. An integrating, transformation-oriented approach to concurrency control and undo in group editors. Proceedings of the 1996 ACM conference on Computer Supported Cooperative Work, Boston, 1996. ACM.
[13] Y. Cheng, F. He, S. Jing, and Z. Huang. An multiuser undo/redo method for replicated collaborative modeling systems. Proceedings of the 13th International Conference on Computer Supported Cooperative Work in Design, Santiago, 2009. IEEE.
[14] L. Xue, M. Orgun, and K. Zhang. A multi-versioning algorithm for intention preservation in distributed real-time group editors. Proceedings of the 26th Australasian Computer Science Conference, Adelaide, 2003. Australian Computer Society, Inc.
[15] C. Sun, and D. Chen. A multi-version approach to conflict resolution in distributed groupware systems. Proceedings of the 20th International Conference of Distributed Computing Systems, Taipei, 2000. IEEE.
[16] Wikipedia. Operational transformation, 2013. http://en.wikipedia.org/wiki/Operational_transformation, accessed Mar. 20, 2013
[17] D. Li, L. Zhou, R. Muntz, and C. Sun. Operation propagation in real-time group editors. Multimedia, IEEE, 7(4): 55-61, 2000.
[18] Robert W. Sebesta. Language evaluation criteria concepts of programming languages, 9th ed, pages 7-17. Addison-Wesley Publishing Co., Reading, Mass., 2009.
[19] S. G. McLellan, A. W. Roesler, J. T. Tempest, and C. I. Spinuzzi. Building more usable APIs. Software, IEEE, 15: 78-86, 1998.
[20] B. E. Teasley. The effects of naming style and expertise on program comprehension. International Journal of Human-Computer Studies, 40: 757-770, 1994.
[21] D. Li, and R. Li, An admissibility-based operational transformation framework for collaborative editing systems. Computer Supported Cooperative Work (CSCW), 19: 1-43, 2010.
[22] Merriam-Webster Online. Efficiency [Def. 2], 2013. http://www.merriam-webster.com/dictionary/efficiency, accessed Mar. 20, 2013.