Borland® JDataStore™ high availability
Fail-safe with increased scalability
A Borland White Paper
May 2005
Contents

Introduction
Architectural overview
  Mirror types
  Leveraged solution
  Synchronization
  Failover
The webbench benchmark
  Platform configuration
  Webbench configuration
    Installing JDataStore 7.03
    Configuring the benchmark
    Creating and loading the database
    Running the benchmark
    Creating the mirrors
    Configuring the benchmark with mirroring
  Benchmark results
Scaling higher
Mirror status monitoring
Mirror performance monitoring
Summary
Introduction
Borland® JDataStore™ 7 introduced a new high availability feature (JDataStore HA) that
provides support for incremental backup, manual/automatic failover, and increased
scalability.
This document describes the architecture of JDataStore HA and shows how to
configure and monitor a JDataStore HA system using ordinary, low-cost components.
The webbench sample database is used to show how to configure a JDataStore high-
availability system. In order to show the fault tolerance and scalability benefits of the
JDataStore HA solution, the webbench sample is executed with and without JDataStore HA
enabled.
Many performance and reliability improvements were made to JDataStore HA in the
JDataStore 7.03 maintenance release. The discussion and performance results of this white
paper are based on the usage of JDataStore version 7.03.
The key benefits of JDataStore HA covered in this document include:
• Ease of configuration. A complete HA configuration can be created by executing four SQL
statements.
• Ease of monitoring. The system can be monitored and tuned with relatively simple tools
provided with the JDataStore product and the operating systems used.
• Fault tolerance. Either manual or automatic failover can be employed.
• Scalability. The simple example in this paper shows that the transaction throughput can
be increased manifold by moving from a one-server system to a three-server system.
• Low-cost solution. Ordinary, low-cost hardware and software components are used to
build a system that is extremely fault tolerant while also providing a manifold increase in
transaction throughput.
Architectural overview
One of the most important areas of concern for any database application is eliminating single
points of failure. The High Availability Server uses database mirroring to ensure database
access in the face of either software or hardware failure. A secondary benefit of mirroring is
increased scalability for read-only transactions.
Mirror types
The three mirror types that can be used by an application are primary, read-only, and
directory.
• The primary mirror is the only mirror type that can accept both read and write
transactions. Only one primary mirror at a time is allowed.
• There can be any number of read-only mirrors. Connections to these databases can
perform only read transactions. Read-only mirrors always provide a transactionally
consistent view of the primary mirror database. However, a read-only mirror database
might not reflect the most recent write transactions made against the primary mirror
database. Read-only mirrors can be synchronized with changes to the primary mirror
instantly, on a scheduled basis, or manually. Instant synchronization is required for
automatic failover. Scheduled and manual synchronization can be used for incremental
synchronization or backup.
• A directory mirror mirrors only the mirror configuration table and the other system tables
needed for security definition. It does not mirror the actual application tables in the
primary mirror. There can be any number of directory mirrors. The storage requirements
for a directory mirror database are very small, because it contains only the mirror table
and system security tables. Directory mirrors redirect read-only connection requests to
read-only mirrors and writable connection requests to the primary mirror.
Another important benefit of directory mirrors is that they provide load balancing for
read-only connection requests across all of the available read-only mirrors.
Leveraged solution
JDataStore HA technology heavily leverages existing subsystems. This contributes
significantly to the simplicity, reliability, and performance of the JDataStore HA solution.
(Competitive solutions often have completely separate database storage engines and
connectivity solutions for their HA product offerings.)
• The JDataStore database kernel uses the same log files used for transaction rollback and
crash recovery to incrementally update read-only mirror images of a database.
• The existing support in JDataStore for read-only transactions provides a transactionally
consistent view of the mirrored data while the mirror is being synchronized with contents
of more recent write transactions from the primary mirror.
• JDataStore also uses the same TCP/IP database connections used for general database
access to synchronize mirrored databases.
Synchronization
There are two steps to synchronizing a read-only mirror:
• Synchronizing the log files. This is the most important step of synchronization. The log
files contain every change made to a database. Once all of a transaction’s log records
have been transmitted to a read-only mirror, that transaction has been made durable for
both the primary and read-only mirror.
• Replaying the log files against the read-only mirror database. The primary benefit of this
action is to bring the transactional state of the read-only mirror to a more recent state
relative to the primary mirror. A secondary benefit is to allow log files to be dropped.
Once a log file has been replayed against all read-only mirrors, it can be dropped to free
disk space.
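The log-retention rule above can be sketched as simple bookkeeping: a log file becomes droppable only once every read-only mirror has replayed past it. The class below is an illustrative model, not the JDataStore implementation, which performs this tracking internally.

```java
import java.util.Collections;
import java.util.List;

// Sketch of the log-retention rule: a log file can be dropped once all
// read-only mirrors have replayed past its last record. Illustrative only.
public class LogRetention {

    // The lowest replay position across all read-only mirrors bounds
    // which log files are still needed.
    static long minReplayedPosition(List<Long> replayedPositions) {
        return Collections.min(replayedPositions);
    }

    // A log file ending at logEnd is droppable only if every mirror has
    // replayed at least that far.
    static boolean droppable(long logEnd, List<Long> replayedPositions) {
        return minReplayedPosition(replayedPositions) >= logEnd;
    }
}
```

A mirror that lags behind (for example, one synchronized only on a schedule) therefore forces the primary to retain older log files until the next replay.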
There are configuration settings that can be used to automatically synchronize log files with
read-only mirrors and replay the synchronized log records against the read-only mirror’s
database.
• INSTANT_SYNCH mirror property. The primary mirror will not return from a commit
operation until a majority of read-only mirrors with INSTANT_SYNCH set to true
confirm that they have made all log records for the committed transaction durable.
• AUTO_FAIL_OVER mirror property. This is a superset of INSTANT_SYNCH. If true,
automatic failover can be initiated when the primary mirror fails.
• Synch mirror operation. This can be performed on request, or it can be scheduled as a
periodic operation. If the mirror has INSTANT_SYNCH or AUTO_FAIL_OVER set,
then this operation incrementally replays log records received since the last replay
operation. If INSTANT_SYNCH or AUTO_FAIL_OVER are not set, then the synch
operation will first need to incrementally copy over any of the more recent log records it
does not already have before replaying log records against the database.
Synchronizing directory mirrors is simpler. A directory mirror copies only a small set of
system tables, not the application tables. The INSTANT_SYNCH and AUTO_FAIL_OVER
settings are not allowed for directory mirrors.
Failover
Both manual and automatic failover are supported.
Two actions can trigger an automatic failover after the primary mirror fails. If a connection
was made to the primary before it failed, that connection can trigger an automatic
failover by calling the rollback method on the connection object. Using rollback to trigger the
failover operation is identical to how online transaction processing (OLTP) applications deal
with other failures, such as lock manager deadlocks and timeouts.
The second action that can trigger a failover is connecting to a directory mirror. If the
connection request is not for a read-only connection, and the current primary is not accessible,
the directory mirror will automatically trigger the failover operation to satisfy the request for a
writable connection.
Note that for any automatic failover request to succeed, a majority of AUTO_FAIL_OVER
mirrors must be accessible by the new primary candidate, and they must all agree that the old
primary mirror is no longer accessible. This prevents write transactions from being performed
against two primary mirrors, because a majority of AUTO_FAIL_OVER mirrors is required
for failover and transaction commit operations.
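The majority rule can be sketched as a simple quorum check. The class and method names below are illustrative; the real check happens inside the mirror servers.

```java
import java.util.List;

// Sketch of the failover quorum rule: a candidate may become primary only
// if a majority of AUTO_FAIL_OVER mirrors are reachable and all reachable
// mirrors agree the old primary is down. Illustrative only.
public class FailoverQuorum {

    // Each list entry is one reachable AUTO_FAIL_OVER mirror's vote on
    // whether the old primary is inaccessible.
    static boolean canFailOver(int totalAutoFailOverMirrors,
                               List<Boolean> reachableMirrorVotes) {
        boolean majorityReachable =
            reachableMirrorVotes.size() > totalAutoFailOverMirrors / 2;
        boolean allAgreePrimaryDown = !reachableMirrorVotes.contains(false);
        return majorityReachable && allAgreePrimaryDown;
    }
}
```

Because commit also requires the same majority, a partitioned old primary cannot keep committing while a new primary is elected: the two majorities would have to overlap.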
Unlike automatic failover, manual failover is performed only on request. Any read-only
mirror can become the primary mirror. This is useful when the computer the primary server is
executing on needs to be taken offline for system maintenance.
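The rollback-triggered failover described above fits the standard OLTP retry loop. The sketch below models it with a small functional abstraction so it can run standalone; in a real client, the body would execute SQL and commit on a java.sql.Connection, and the rollback call on that same connection is what gives the driver the chance to fail over.

```java
import java.sql.SQLException;

// Sketch of the OLTP retry pattern used with JDataStore HA: on a commit
// failure, rollback() may trigger automatic failover to a new primary,
// after which the transaction is retried. TxBody/TxContext are
// illustrative stand-ins for work done on a java.sql.Connection.
public class FailoverRetry {
    interface TxBody { void run() throws SQLException; }
    interface TxContext { void rollback() throws SQLException; }

    // Returns the attempt number that succeeded, or rethrows the last error.
    static int runWithRetry(TxBody body, TxContext ctx, int maxAttempts)
            throws SQLException {
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                body.run();        // execute statements and commit
                return attempt;    // success
            } catch (SQLException e) {
                last = e;
                ctx.rollback();    // may trigger automatic failover
            }
        }
        throw last;
    }
}
```

The same loop handles deadlocks and lock timeouts, which is why the paper notes that failover requires no special-case client code.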
The webbench benchmark
The webbench benchmark is delivered as a JDataStore sample in the <JdataStore Install
directory>/samples/JdataStore/WebBench/src/com/borland/webbench subdirectory. There is a
JBuilder project in this directory that can be used to build and launch the sample.
The webbench benchmark is ideal for testing the capabilities of JDataStore HA. It uses a
database that has a prototypical order entry schema. The database schema and generated
contents are very similar to the industry-standard TPC-C database. The size of the database
created during the load operation is configurable in the Options dialog by specifying the Load
multiplier value. The default is 10, which generates a database with a size of about 92 MB. A
multiplier of 45 should generate a 1 GB database. Two transaction types can be run against
the database: ReadWrite and ReadOnly. The ReadWrite transaction is very similar to the
TPC-C new order transaction. Each transaction comprises about 50 select, insert, and
update SQL statements. The ReadOnly transaction is just a modified version of the ReadWrite
transaction that performs only select operations. There are two options for running these
transactions:
• Number of threads. Each thread uses a separate connection.
• Duration in minutes. Each thread repetitively executes the specified transaction for this
duration.
The multiplier option allows for very large database testing. The ReadWrite transactions
allow for testing the performance impact on write transactions when mirroring is enabled. The
ReadOnly transactions can be used to show the scalability benefits of read-only mirrors as
they offload the primary mirror for read-only query requests.
Platform configuration
For all tests run, one computer (the bench4 host) will execute the benchmark against a
JDataStore Server running on another computer (bench1 host). When the benchmark is run
with mirroring enabled, the JDataStore Server on bench1 will be mirrored by two read-only
mirrors running on bench2 and bench3. To optimize network throughput, all four computers
are connected to the same 100 megabit network switch. This minimizes the impact of network
traffic external to the switch.
Ideally, the primary mirror and the two read-only mirrors would have identical performance
characteristics. If the primary mirror goes down, you want the same performance throughput
when one of the read-only mirrors becomes the primary after a failover. The two read-only
mirrors are identical. However, the primary mirror has a faster clock rate (1200 MHz vs. 930
MHz for the read-only mirrors).
All four computers have JDataStore version 7.03 installed.
Following is a summary of the configuration:

Primary mirror, hostname=bench1: 1200 MHz Intel® Pentium® III, 500 MB RAM,
RedHat® Linux® release 2.4.21-27.0.2.EL #1, 3Com® 3c905-tx 100 megabit NIC

Read-only mirror, hostname=bench2: 930 MHz Pentium III, 500 MB RAM,
RedHat Linux release 2.4.21-20.0.1.EL #1, 3Com 3c905-tx 100 megabit NIC

Read-only mirror, hostname=bench3: 930 MHz Pentium III, 500 MB RAM,
RedHat Linux release 2.4.21-27.0.2.EL #1, 3Com 3c905-tx 100 megabit NIC

Execute benchmark, hostname=bench4: 1700 MHz Pentium M, 2 GB RAM,
Microsoft® Windows® XP Pro version 2002, Broadcom® 570x gigabit NIC
Webbench configuration
Installing JDataStore 7.03

JDataStore 7.03 is installed on all servers used for the benchmark.
Because JBuilder is being used to run the webbench benchmark, it also must be updated to
use JDataStore 7.03. One of the following approaches can be used to make sure JDataStore
7.03 is used for the benchmark:
1) Copy beandt.jar, dbtools.jar, dx.jar, jds.jar, jdsremote.jar, and jdsserver.jar to the
JBuilder lib directory.
2) Create a JBuilder library that references the JDataStore 7.03 jars and include it in the
webbench project.
Before any tests can be run, all four servers must be launched on bench1, bench2, bench3, and
bench4 computers. If the JDataStore server is installed to run as a service on all the different
computers, it is already running. Otherwise, JdsServer must be launched on each machine
from the JdataStore bin directory (execute \JdataStore7\bin\JdsServer).
The timings provided for benchmark runs that use JDataStore HA use a separate computer for
each mirror. A JDataStore server is executing on each computer using the default port setting
of 2508. One mirror per computer maximizes fault tolerance and transaction throughput.
However, this same test can be performed with a single computer by launching multiple
JdsServers with different port assignments. JdsServer in the bin directory can be instructed to
use a different port by specifying the port command line option. For example, four servers
could be launched on the same computer with the following commands:
\JdataStore7\bin\JdsServer -port=2511
\JdataStore7\bin\JdsServer -port=2512
\JdataStore7\bin\JdsServer -port=2513
\JdataStore7\bin\JdsServer -port=2514
Configuring the benchmark

After all four servers have been launched, the webbench benchmark needs to be configured.
Select the Bench|Options… menu option. You should see a dialog box like this:
Figure 1: Configuring the benchmark: Example
This benchmark has already been configured to use the bench1 server. The initial database
is created as “webbenchtest” in the “/tmp/webbench” directory. It is a good practice to locate
your database in a separate directory. To improve performance, log files can be located on a
different disk drive. However, for this test, the database and its log files are in the same
directory.
The minimum cache size has been set to a value larger than the default. This is not a required
change but does increase the performance of the benchmark. All three mirrors have 512 MB
of memory and are dedicated database servers. It makes sense to take advantage of the
available memory for the database cache. Note that if you reserve too much memory for the
cache, performance can degrade. The Linux operating system manages memory allocations
between resident applications and its own disk cache. If the database server process becomes
too large, the Linux operating system might start paging out process memory, causing
performance to degrade. The Linux vmstat command displays blocks that are swapped in and
out. This is a simple way to check if the database server process size is too large. The default
minimum cache size is 512 blocks, which results in a 2 MB cache (512 × 4 KB block size =
2 MB). The minimum cache size has been set to 32768 for this benchmark, which results in a
128 MB cache. To have such a large cache, the maximum heap size for the JVM also must be
increased. This can be accomplished by changing the following two lines in <JDataStore
Install directory>/bin/JdsServer.config from:
vmparam -Xms128m
vmparam -Xmx128m
To:
vmparam -Xms256m
vmparam -Xmx256m
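The cache-size arithmetic above can be checked directly: minCacheSize is a count of 4 KB database blocks, so the cache size in bytes is minCacheSize × 4096. The helper class is illustrative only.

```java
// Sketch of JDataStore's cache sizing arithmetic: minCacheSize counts
// 4 KB database blocks, so cache bytes = blocks * 4096. Illustrative helper.
public class CacheSizing {
    static final int BLOCK_SIZE = 4 * 1024; // 4 KB database blocks

    static long cacheBytes(int minCacheSizeBlocks) {
        return (long) minCacheSizeBlocks * BLOCK_SIZE;
    }
}
```

This confirms the figures in the text: 512 blocks gives a 2 MB cache, and 32768 blocks gives a 128 MB cache, which is why the JVM heap must also be raised above 128 MB.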
The load multiplier has been increased from a default of 10 to 45. As mentioned earlier, a load
factor of 45 produces a database of about 1 GB. Larger databases take longer to load, so if
you want to try things out, use the default load size of 10.
Use a remote connection

Because multiple computers are being used for this test, a remote connection must be used.
Local connections perform faster because they allow the application and database server to
execute in the same process. Use remote connections to connect to mirrored databases.
Remote database connections typically should be used for systems configured for failover. You can
still use local connections (in process), but there are significant benefits to using a remote
connection instead:
1) If the primary mirror fails, applications executing on separate computers can fail over to
another mirror.
2) Directory mirrors can load balance read-only connection requests across multiple read-only
mirrors.
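A remote connection uses JDataStore's dsremote JDBC URL scheme. The sketch below only builds such a URL string; the scheme pattern (jdbc:borland:dsremote://host:port/file) is an assumption based on JDataStore's documented URL format of this era, so verify it against your driver version before relying on it.

```java
// Builds a JDataStore remote JDBC URL of the assumed form
//   jdbc:borland:dsremote://<host>:<port>/<database file>
// The scheme string is an assumption from JDataStore's documented URL
// pattern; check it against the driver documentation for your release.
public class JdsUrl {
    static String remoteUrl(String host, int port, String dbFile) {
        return "jdbc:borland:dsremote://" + host + ":" + port + "/" + dbFile;
    }
}
```

Such a URL would be passed to DriverManager.getConnection along with the JDataStore driver class; pointing it at the directory mirror's host is what enables the redirection and failover behavior described above.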
Creating and loading the database

After configuring the initial database connection, the next step is to create the database. Do
this by selecting the Bench | Create Database menu. Once the database is created, it needs to
be populated with some data. Do this by selecting the Bench | Load Data menu.
Running the benchmark

Two benchmark runs are made against both mirrored and unmirrored configurations. The first
run executes only read-write transactions. The second run executes both read-write and read-
only transactions.
Both benchmark configuration runs start in a similar state. The 1 GB database is saved off
after the initial load. Before the benchmark runs for each configuration, a fresh copy of this
saved-off database is restored. This way, both benchmark configurations start off with the
same database size and state. The servers and benchmark test also are restarted for both
configurations.
To run the read-write transactions, select the Bench | Run menu. This launches the following
dialog box:
Figure 2: Running the benchmark: Example
This run puts 16 threads in a loop for five minutes, each executing ReadWrite transactions.
Each thread uses a separate database connection.
The second benchmark run adds four more threads running ReadOnly transactions:
Figure 3: Running the benchmark: Example
Creating the mirrors

After the nonmirror benchmarks have been run, a fresh copy of the preloaded database is
restored, and then the database is configured for mirroring so that the same benchmark runs
can be made against a mirrored configuration. Mirrors can be created using the ServerConsole
GUI, isql SQL scripts, or programmatically using JDBC™. The steps to configure the mirrors
with the ServerConsole are presented first, followed by a four-line SQL script that can be
used to complete the same mirror configuration task.
To start, a data source must be defined for the server running on bench1. In the screen shot
below, this data source is called webbench1.
Figure 4: Webbench data source view: Example
After the data source is created, it can be connected to by right-clicking on the webbench1
data source and selecting the connect menu option.
After connecting, click on the mirror node in the structure pane and select the Add Mirror menu.
Figure 5: Webbench mirror configuration: Example
You can see that the first mirror created is the primary. The Host, Database File name, and
Auto failover properties are set. All other properties are left with their default values. Note
that after these changes are saved, the Instant Synchronization property is always forced to
true if Auto failover is set to true. Instant Synchronization causes changes to the primary to be
sent to any read-only mirror that also has Instant Synchronization set before the transaction
commits. This is important behavior for Auto failover, because you do not want to lose any
committed transactions if the primary encounters a permanent failure.
Some applications prefer to control failover manually but still need transactions to be
persistent across all read-only mirrors. These applications can set Instant Synchronization
without setting Auto failover.
Although the Host property defaults to localhost, make sure to set it to the host name of the
computer the server is running on, because the mirrors need to be able to attach to each
other. The only time localhost should be used is when all mirrors are on the same computer.
This can make sense for test scenarios but usually does not make sense for deployment.
Editing properties in the ServerConsole

1) Grid views in ServerConsole are typically not editable. To edit an item in the ServerConsole,
you must select it in the structure pane. This brings up the properties for that item in the
property inspector.
2) Connection properties cannot be edited while connected. You must disconnect to edit these
properties.
3) Edit operations are “cached”: they are not applied to the underlying database(s) until the
Save Changes button is pressed.
Two more read-only mirrors with Auto failover set to true and one directory mirror are then
created. When this mirror configuration was saved for a database loaded with a load multiplier
of 45, it took 200 seconds to complete the creation of the mirrors. The database and log files
are well over 1 GB and have to be copied to the two read-only mirrors. The operation was
bound by network bandwidth: roughly 11 MB per second were transmitted over a 100
megabit line.
After these four mirrors are created in the ServerConsole, the ServerConsole isql content pane
or the isql command-line tool can be used to generate the SQL statements needed to create
these mirrors. This is achieved by executing the “show ddl” command. At the end of the
output from the show ddl command, the following statements are generated for the mirrors I created:
CALL DB_ADMIN.CREATE_MIRROR('NAME=Mirror1,TYPE=PRIMARY,HOST=bench1,PORT=2508,FILE_NAME=/tmp/webbench/webbenchtest.jds,AUTO_FAIL_OVER=true,FAIL_OVER_PRIORITY=1,INSTANT_SYNCH=true,LAST_KNOWN_REPLAY=8589934601');

CALL DB_ADMIN.CREATE_MIRROR('NAME=Mirror2,TYPE=READONLY,HOST=bench2,PORT=2508,FILE_NAME=/tmp/webbench/webbenchtest_Mirror2,AUTO_FAIL_OVER=true,FAIL_OVER_PRIORITY=1,INSTANT_SYNCH=true,LAST_KNOWN_REPLAY=8589934601');

CALL DB_ADMIN.CREATE_MIRROR('NAME=Mirror3,TYPE=READONLY,HOST=bench3,PORT=2508,FILE_NAME=/tmp/webbench/webbenchtest_Mirror3,AUTO_FAIL_OVER=true,FAIL_OVER_PRIORITY=1,INSTANT_SYNCH=true,LAST_KNOWN_REPLAY=8589934601');

CALL DB_ADMIN.CREATE_MIRROR('NAME=Mirror4,TYPE=DIRECTORY,HOST=bench4,PORT=2508,FILE_NAME=/test/webbench/webbenchtest_Mirror4,FAIL_OVER_PRIORITY=1,LAST_KNOWN_REPLAY=9');

CALL DB_ADMIN.CREATE_MIRROR_SCHEDULE('Mirror1','ID=0,REF=Mirror1,PERIOD=MILLIS,DAY=1,TIME=21:00:00,MILLIS=10000');

CALL DB_ADMIN.CREATE_MIRROR_SCHEDULE('Mirror2','ID=1,REF=Mirror2,PERIOD=MILLIS,DAY=1,TIME=21:00:00,MILLIS=10000');

CALL DB_ADMIN.CREATE_MIRROR_SCHEDULE('Mirror3','ID=2,REF=Mirror3,PERIOD=MILLIS,DAY=1,TIME=21:00:00,MILLIS=10000');
The last three statements are interesting in that I did not create any mirror schedules for any of
the mirrors in ServerConsole. These were created automatically for all of the mirrors that have
the AUTO_FAIL_OVER or INSTANT_SYNCH property set to true. The mirror schedules
can be dropped or altered if necessary. When INSTANT_SYNCH is set, the log records for
all committed transactions are guaranteed to be durable across a majority of read-only
mirrors. However, these log records must be replayed against the read-only mirror before you
will be able to see the changes when querying the read-only mirror’s database. You can manually
cause log records to be replayed by right-clicking on the mirror in ServerConsole and
selecting the Synch Mirror menu item. Note that for a mirror without INSTANT_SYNCH set
to true, this action would first bring the read-only mirror’s log files up to date with the
primary mirror and then play the log records against the read-only mirror. For
INSTANT_SYNCH mirrors, the log records are already there. They just need to be replayed
against the read-only mirror.
Mirror schedules will automatically synchronize the read-only mirrors using the time interval
you specify. For INSTANT_SYNCH mirrors, the default replay operation is scheduled for
every 10 seconds. Replaying log records against a read-only mirror is quite fast, even when
the primary database is heavily updated. Running a performance monitor on one of the read-
only mirrors used in this white paper shows that each replay takes only about one second. So
about 90% of the time, the read-only mirror is using little or none of the CPU bandwidth.
Most of the time it is just receiving sequential I/O from the primary for log records of
committed transactions.
Configuring the benchmark with mirroring

After configuring all four mirrors, the webbench options must be modified slightly. The
options dialog box below shows that the server has been changed to bench4, which is where
both the webbench benchmark process and the directory mirror are located. The database to
connect to is now the directory mirror. This causes read/write connections to be
redirected to the primary mirror on bench1. Read-only connections are redirected to either
bench2 or bench3, which is where the read-only mirrors are located. As mentioned, the
directory mirror uses a round-robin approach to distribute read-only connection requests
across the two read-only mirrors. This provides a simple load-balancing mechanism.
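The directory mirror's redirection and round-robin behavior can be sketched as follows. This is an illustrative model, not the JDataStore implementation; the real redirection happens inside the directory mirror's server.

```java
import java.util.List;

// Sketch of a directory mirror's routing: writable requests always go to
// the primary, while read-only requests rotate round-robin across the
// read-only mirrors. Illustrative only, not the JDataStore implementation.
public class DirectoryMirror {
    private final String primaryHost;
    private final List<String> readOnlyHosts;
    private int next = 0;

    DirectoryMirror(String primaryHost, List<String> readOnlyHosts) {
        this.primaryHost = primaryHost;
        this.readOnlyHosts = readOnlyHosts;
    }

    // Returns the host a connection request is redirected to.
    String redirect(boolean readOnly) {
        if (!readOnly) return primaryHost;
        String host = readOnlyHosts.get(next);
        next = (next + 1) % readOnlyHosts.size();
        return host;
    }
}
```

With bench2 and bench3 as read-only mirrors, alternating read-only requests land on alternating hosts, which is exactly the simple load balancing the benchmark relies on.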
Figure 6: Configuring the benchmark: Example
Benchmark results
Below is the output from running the benchmark with two read-only mirrors and one
directory:
Mirrored   ReadWrite threads   ReadOnly threads   ReadWrite tx/sec   ReadOnly tx/sec
No         16                  0                  69.66              -
No         16                  4                  47.26              21.73
Yes        16                  0                  48.96              -
Yes        16                  4                  47.12              140.42
Write transaction throughput is about 30% slower when mirrors are used. However, this is
still a very good transaction throughput rate for a system that keeps three database images in
synch. The mirrored configuration’s 48.96 transactions per second sustained for a single day
results in more than 4.2 million orders per day. This is a very healthy business to operate
using such a modest hardware investment.
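The orders-per-day figure follows directly from the sustained rate: 48.96 transactions per second times the 86,400 seconds in a day. A one-line check:

```java
// Checks the orders-per-day arithmetic: sustained tx/sec * 86,400 sec/day.
public class Throughput {
    static double ordersPerDay(double txPerSecond) {
        return txPerSecond * 86_400;
    }
}
```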
As read-only transactions are added to the mix, the transaction throughput of the mirrored
configuration far exceeds the nonmirrored configuration. This is because the read-only
connections made to the directory mirror are automatically redirected to the read-only mirrors.
The primary mirror receives only ReadWrite transactions, and the read-only mirrors receive
all of the read-only transaction requests. The result is that the write transaction throughput is
about the same for both the mirrored and nonmirrored configuration. However, the mirrored
configuration completes almost seven times as many read-only transactions as the
nonmirrored configuration. This is the increased scalability that JDataStore HA provides. Most
applications generate far more read transactions than read/write transactions, so this is a
significant win.
It is also possible to narrow the 30% gap between mirrored and unmirrored executions of the
ReadWrite transaction. The primary reason for this gap is that when log records are replayed
on the read-only mirrors, many write operations are applied to the database file itself. With
such a heavy transaction load, this makes for many disk writes to the database file. The
problem is that to commit transactions from the primary mirror, the read-only mirrors also
must constantly commit log records to the database log files. So there is much file write
competition between the database file and the database log files. A very simple solution is to
have a disk drive for the database file and a separate disk drive for the log files. A RAID
striping configuration is a more expensive alternative.
Another approach to narrowing this gap is to configure the database to use “soft commit”.
When the benchmark was run with soft commit, the gap narrowed from 30% to 10%. Soft
commit still guarantees that no database blocks will be written to disk before the log records
for the changes to these blocks have been durably written to disk. However, soft commit does
not make this guarantee for transaction commit operations. For transaction commit operations,
the soft commit guarantees only that the committed transactions have been written into the
operating system cache. However, these writes are not guaranteed to be durable. The net
effect of soft commit is that:
1) The database will not be corrupt, because log changes are made durable before database
blocks are written.
2) All transactions will be durable as long as the operating system does not fail.
3) The most recent committed transactions might not be durable if the operating system
fails.
Soft commit might be an acceptable solution for mirrored configurations that have
AUTO_FAIL_OVER or INSTANT_SYNCH set to true, because committed log records are
written to all mirrors before a commit operation completes.
Scaling higher
As mentioned, when read-only and ReadWrite transactions are executed together, the
mirrored configuration can service almost seven times as many read-only transactions and
about the same number of ReadWrite transactions. It is not uncommon for 70% to 90% of all
transactions to be read-only, so it makes sense that scalability could be improved by using
more than just two read-only mirrors.
Improving the hardware also could help. Some suggestions:
• If the primary mirror is CPU bound, faster CPUs, increased RAM, and SMP
configurations will help. If you add RAM to improve performance, you can increase the
JDataStore minCacheSize property. If you raise the minCacheSize property, you might need
to increase the JVM heap settings, as discussed earlier. Keep in mind that if failover is
important, you will want all mirrors configured for auto failover to have similar
configurations. Otherwise, your transaction throughput will change after a failover.
• In this configuration, the maximum bytes per second from the benchmark application
computer was between 1 MB and 2 MB per second. During bulk data transfers using
FTP, or when mirrors for large databases are created, the maximum throughput of
this 100 megabit network was between 10 MB and 11 MB per second. The bulk
throughput rate is always going to be better than the throughput of the smaller-sized
packets generated from executing SQL statements of the benchmark transactions.
However, in this test run, the CPU bandwidth of the primary mirror was maximized, so
the network throughput did not appear to be an issue. However, with faster computers,
the network could be an issue. The network throughput could be improved by going to a
gigabit network or by installing an extra network interface card (NIC) in each mirror.
With two NICs, one network can be used for mirror synching and the other for
transaction requests.
• Improving the I/O system definitely will help. A separate disk drive for log files might be
an inexpensive solution. Note that the log file disk drive does not need to be large,
because unused log files are constantly dropped. A 10 GB drive is probably more than
enough. The important point is to have a separate drive that is very fast with sequential
I/O. A RAID I/O system configured for striping would also be a significant benefit.
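The RAM advice in the first bullet can be turned into a rough sizing check. The 4 KB block size and the one-to-one mapping from cache blocks to heap bytes are assumptions for illustration, not JDataStore specifics; verify the block size for your database.

```java
// Rough sizing aid: estimate how much extra JVM heap a larger
// minCacheSize needs. Assumes (hypothetically) a 4 KB cache block
// and that cache pages live entirely on the JVM heap.
public class CacheSizing {
    public static void main(String[] args) {
        int minCacheSize = 50_000;      // cache blocks (example value)
        int blockSizeBytes = 4 * 1024;  // assumed block size
        long cacheBytes = (long) minCacheSize * blockSizeBytes;
        System.out.printf("cache needs ~%d MB; grow -Xmx by at least this%n",
                cacheBytes / (1024 * 1024));
    }
}
```

As the bullet notes, every mirror configured for auto failover should get the same cache and heap settings, so throughput stays stable after a failover.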
Mirror status monitoring
The status of the mirrors can be monitored using the server console or the DB_ADMIN built-
in stored procedures. If you right-click on “Mirror Status”, you will see a grid with a row for
each mirror. If a mirror cannot be accessed, you will see an exception in the “Connection
Exception” column. The “Validated Primary” column is also important. A validated primary
is a primary that has been able to successfully attach to a majority of read-only mirrors. A
validated primary will accept write transactions. A primary mirror that is not validated will
accept only read-only transactions. You can recheck the status of these mirrors by pressing the
refresh button above the grid view. You can retrieve this status information programmatically
by calling the DB_ADMIN.GET_MIRRORS() stored procedure. There is Javadoc for all
DB_ADMIN stored procedures. The "Database log" node displays the contents of the most
recent textual status log file. These status log files are located in the same directory as the
binary transaction log files.
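The programmatic check described above can be sketched over JDBC as below. The driver class name, URL scheme, call syntax, and credentials are all assumptions to verify against your installation's Javadoc; when the driver jar is absent, the sketch simply reports that instead of failing.

```java
// Hedged sketch: query mirror status via DB_ADMIN.GET_MIRRORS() over JDBC.
// The driver class, URL form, and call syntax are assumptions -- check the
// DB_ADMIN Javadoc shipped with your JDataStore version.
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;

public class MirrorStatusCheck {
    public static void main(String[] args) {
        try {
            Class.forName("com.borland.datastore.jdbc.DataStoreDriver");
            try (Connection con = DriverManager.getConnection(
                     "jdbc:borland:dsremote://localhost/webbench.jds",
                     "user", "password");   // placeholder credentials
                 CallableStatement call =
                     con.prepareCall("CALL DB_ADMIN.GET_MIRRORS()");
                 ResultSet rs = call.executeQuery()) {
                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {          // one row per mirror
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= md.getColumnCount(); i++) {
                        row.append(md.getColumnName(i)).append('=')
                           .append(rs.getString(i)).append(' ');
                    }
                    System.out.println(row);
                }
            }
        } catch (ClassNotFoundException e) {
            System.out.println("JDataStore driver not on classpath");
        } catch (Exception e) {
            System.out.println("Could not query mirror status: "
                    + e.getMessage());
        }
    }
}
```

Printing column names from the result-set metadata avoids hard-coding the grid's column layout, which is version-dependent.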
Mirror performance monitoring
As previously discussed, the likely bottlenecks are CPU, network, and disk I/O bandwidth. On
Windows, perfmon can provide a nice high-level view of CPU, network, and physical disk
I/O activity. On Linux, the vmstat command is often a quick way to get a feel for CPU,
application I/O, and swap I/O activity.
Summary
The JDataStore HA solution provides one of the simplest possible solutions to the complex
problems of fault tolerance and scalability for database applications. The system is kept
simple in part by leveraging existing subsystems for transaction management, transaction
versioning, and network I/O. The webbench sample benchmark provides a simulation of real-
world read-write and read-only transactions. The webbench results show that mirrored
configurations using modest hardware can support a throughput rate for the complex new
order transaction of more than 4.2 million transactions a day. At the same time, the three-
mirror configuration can provide more than seven times the throughput of a nonmirrored
configuration for read-only transactions.
Made in Borland® Copyright © 2005 Borland Software Corporation. All rights reserved. All Borland brand and product names are trademarks or registered trademarks of Borland Software Corporation in the United States and other countries. All other marks are the property of their respective owners. Corporate Headquarters: 100 Enterprise Way, Scotts Valley, CA 95066-3249 • 831-431-1000 • www.borland.com • Offices in: Australia, Brazil, Canada, China, Czech Republic, Finland, France, Germany, Hong Kong, Hungary, India, Ireland, Italy, Japan, Korea, Mexico, the Netherlands, New Zealand, Russia, Singapore, Spain, Sweden, Taiwan, the United Kingdom, and the United States. • 23423