Sabre presentation for MySQL user conference 2004
Confidential
MySQL at Sabre
Alan Walker Sabre Labs
February 2004
Agenda
• Sabre Holdings Overview
• Business drivers for MySQL & Open Source
• Shopping for fares
• Air Travel Shopping Engine (ATSE)
• Data replication strategy
• ESQL precompiler for MySQL
• Other MySQL users at Sabre
Who is Sabre Holdings?
A world leader in travel commerce, retailing travel products, and
providing distribution and technology solutions for the travel industry
Sabre Holdings Businesses
Sabre Holdings Fast Facts
• Industry leader in multiple travel channels
• Revenues of $2.06 billion in 2002
• S&P 500 company
• NYSE:TSG
• Headquarters in Dallas/Fort Worth, Texas
• 6,500 employees in 45 countries
Business drivers
• Over 3 billion fare combinations for a single customer request
• Multiple airlines, flights, fare types, dates
• Prices, taxes, surcharges
Business drivers
• No direct revenue for shopping queries
• Revenue for booking, but not looking (searching)
• Look-to-book ratio increasing
• Competition requires staying on the “leading edge”
• Highly reliable and scalable database
• Fast processors
• Large real memory
• Smart algorithms
• Shopping is a good fit for horizontal scale
• Pricing requires higher precision
Business drivers
[Diagram: computing stack of Application, DB / Middleware, Operating System, and Hardware, with a marked “commodity point”]
Hardware, operating system, database and middleware are
becoming commodities. This drives the cost down rapidly.
Open source software is a major driver of this effect.
Business Solution
• Linux servers alongside HP NonStop servers to create
“hybrid” Air Travel Shopping Engine (ATSE) platform
• HP NonStop delivers high availability and reliability
– Better than or equal to legacy, but at significantly lower cost
– Best fit for critical workloads and master database
management
• Linux / MySQL delivers 64-bit memory and faster CPUs
– Lower availability and reliability than HP NonStop but at
significantly lower cost
– Best fit for CPU-intensive shopping workloads
Most cost-effective platform for the shopping workload
Business drivers
• Sabre’s legacy
• World’s first commercial OLTP system in 1960
• Mainframe clusters running TPF
• Operating system customized to our needs
• True 7x24 application, with zero scheduled downtime
• Most application code in assembler
• Sabre’s future
• Higher-level languages
• Relational databases
• Internet
• Open systems
• Reduce specialized training
• Use off-the-shelf software
• HP NonStop with OSS is a key component (Linux?)
Shopping
• Finding cheap air fares is hard!
• With 50+ connect points to consider, and >100 fares per
leg, we need to evaluate >3 billion combinations
• Up to a million fares can change every day
• Availability changes continuously
• Solve it >100 times per second
• Other functions
• Price 250 tickets per second
• Process 1000 flight routing requests per second
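To see how a single request can reach billions of fare combinations, here is a rough back-of-the-envelope sketch. The model (a round trip with two legs per direction through one connect point, and independent fare choices per leg) and the function name are illustrative assumptions, not the way ATSE counts combinations.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only: a round trip where each direction flies two legs
// through one of `connect_points` hubs, and every leg has `fares_per_leg`
// candidate fares chosen independently.
std::uint64_t round_trip_combinations(std::uint64_t connect_points,
                                      std::uint64_t fares_per_leg) {
    // One direction: pick a hub, then a fare for each of the two legs.
    const std::uint64_t one_way = connect_points * fares_per_leg * fares_per_leg;
    // Outbound and return choices multiply.
    return one_way * one_way;
}

void example_combinations() {
    // 50 connect points and 100 fares per leg (the lower bounds on the slide).
    const std::uint64_t n = round_trip_combinations(50, 100);
    assert(n == 250000000000ULL);   // 500,000 one-way choices, squared
    assert(n > 3000000000ULL);      // well past the 3 billion mark
}
```

Under these assumptions the count is far above 3 billion, which is consistent with the slide quoting a lower bound.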
Pricing
• Shopping vs. Pricing
• Shopping is the problem of finding low fares
• Pricing is used to print the ticket
• Pricing has to be accurate, or we pay the difference to the
airline
• Many internet search engines still rely on mainframes to
actually print the ticket
• Pricing also requires additional functions, such as refunds,
exchanges and auditing
Algorithms
• Fare-led search
• Graph-based algorithm that searches all fare
combinations across 50+ connect points
• Can generate up to a 4-segment connection
• Search space of >3 billion fare combinations
• Match or exceed any competitor in finding lowest fare
• Only loses to competitors that have access to exclusive
private fares and/or other discounts
• Search actually checks Direct Connect Availability, so that
low fare options are actually bookable
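The slides do not describe the fare-led search itself. As a stand-in, the sketch below uses a textbook hop-limited shortest-path relaxation (Bellman-Ford restricted to four rounds) to find the cheapest fare total over at most four segments. The airport codes, fares, and all names are made up for illustration, and availability checks are left out.

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <string>
#include <vector>

struct Segment {            // one priced leg between two airports (hypothetical data)
    std::string from, to;
    double fare;
};

// Cheapest total fare from `origin` to `dest` using at most `max_segments` legs.
// Returns infinity when no such connection exists.
double cheapest_fare(const std::vector<Segment>& segments,
                     const std::string& origin, const std::string& dest,
                     int max_segments = 4) {
    const double inf = std::numeric_limits<double>::infinity();
    std::map<std::string, double> best;     // best[a] = cheapest known fare to reach a
    best[origin] = 0.0;
    for (int round = 0; round < max_segments; ++round) {
        std::map<std::string, double> next = best;   // relax from the previous round only
        for (const Segment& s : segments) {
            auto it = best.find(s.from);
            if (it == best.end()) continue;          // departure airport not reachable yet
            const double cand = it->second + s.fare;
            auto cur = next.find(s.to);
            if (cur == next.end() || cand < cur->second) next[s.to] = cand;
        }
        best = next;
    }
    auto hit = best.find(dest);
    return hit == best.end() ? inf : hit->second;
}

void example_cheapest_fare() {
    const std::vector<Segment> segs = {
        {"DFW", "LHR", 900.0},                        // nonstop
        {"DFW", "ORD", 120.0}, {"ORD", "LHR", 500.0}  // one connection
    };
    assert(cheapest_fare(segs, "DFW", "LHR") == 620.0);     // connection wins
    assert(cheapest_fare(segs, "DFW", "LHR", 1) == 900.0);  // nonstop only
}
```

Relaxing from the previous round's table is what enforces the segment limit: after round k, every stored fare uses at most k legs.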
Algorithms
• Dynamic schedules
• Connections are not generated overnight and stored
• Not limited to routes explicitly set up by airlines or other
marketing staff
• Availability Manager
• Flexible rules to access airline availability
• Current methods
– Direct Connect
– Host Availability
– Teletype (AVS)
• Can also use
– Cached DCA
– Inventory proxy
ATSE Hybrid
• Air shopping for desirable itineraries
• Must search through multiple airlines, flights, fare types,
dates, adjacent airports, etc.
• Must calculate prices, taxes, surcharges
• Complexity
• Single round-trip request can have over 3 billion fare
combinations
• Search is CPU and memory intensive
• Business driver
• No direct revenue for shopping transactions
• Increasing look to book ratio
ATSE Hybrid
• Combine Linux servers and HP NonStop servers
• HP NonStop delivers high availability and reliability
• Better than or equal to TPF at significantly lower cost
• Master database management
• Data replicated in real-time to Linux servers
• PNR pricing, schedules and availability
• Linux delivers 64-bit memory model and faster CPUs
• Lower availability and reliability than HP NonStop but at
significantly lower cost
• Horizontally scaled server farm with spare capacity
• Best fit for CPU-intensive shopping workloads
ATSE Hybrid
[Architecture diagram: IBM PSS, IBM MVS, HP NonStop, and a Linux server farm (24 Linux servers shown). Flow labels: Naming Service and Load Balancing; Load Information; Schedule and Availability Updates; Fare and Rule Updates; DB Image Load and Updates; E/R Logging and Billing; Availability Requests; Shopping Transactions; Air Shopping Transactions]
ATSE Linux servers
• In production since July 2003
• Started with HP rp5405 servers (Unix PA-RISC)
– Migrated to Itanium in December 2003
• Using 45 HP rx5670 servers
– 4-way, 1.5 GHz, 6MB L2 cache, 32GB RAM, 4x72GB SCSI
• Software
• MySQL 4.0.15
• GNU compilers – g++ 3.2.3 and glibc 2.3.2
• TAO object request broker
• Redhat RHAS 2.1
• GoldenGate Extractor/Replicator
• Monitoring – Prognosis, CA Unicenter, scripts
ATSE Software
• Extensive use of open source software
• MySQL 4.0.15
• GNU compilers – g++ 3.2.3 and glibc 2.3.2
• TAO object request broker
• Redhat Linux AS 3.0
• Third party software
• GoldenGate Extractor/Replicator
• Monitoring – Prognosis, CA Unicenter, scripts
• Internally developed applications and scripts
Data replication
• HP NonStop (Tandem) is master database
• GoldenGate software is used to replicate to MySQL
– Extracts data from undo/redo logs on the NonStop server
– Performs INSERT / UPDATE / DELETE on MySQL
– Software performs catch-up / resync in case of crashes or
other failures
• Each Linux server has an identical copy of the database
– 50GB database on each server, all InnoDB
• Replication volume
• 150 tables replicated (over 300 on NonStop server)
• Can replicate 1M fare changes / hour
• Data updates on 7x24 basis
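The extract / apply / catch-up cycle described above can be illustrated with a small sketch. This is not GoldenGate's implementation or API; the `Change` record, the sequence-number checkpoint, and the in-memory `Replica` are assumptions chosen to show why replaying a change log from a saved position makes resync after a crash safe.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum class Op { Insert, Update, Delete };

struct Change {                 // one row change captured from the master's log (hypothetical layout)
    std::uint64_t seq;          // position in the log, strictly increasing
    Op op;
    int key;
    std::string value;          // ignored for Delete
};

struct Replica {
    std::map<int, std::string> table;   // stands in for a MySQL table
    std::uint64_t applied_seq = 0;      // checkpoint: last change applied

    // Apply every change newer than the checkpoint; older ones are skipped,
    // so replaying an overlapping batch after a restart is harmless.
    void apply(const std::vector<Change>& log) {
        for (const Change& c : log) {
            if (c.seq <= applied_seq) continue;
            switch (c.op) {
                case Op::Insert:
                case Op::Update: table[c.key] = c.value; break;
                case Op::Delete: table.erase(c.key);     break;
            }
            applied_seq = c.seq;
        }
    }
};

void example_replication() {
    const std::vector<Change> log = {
        {1, Op::Insert, 10, "fare A"},
        {2, Op::Update, 10, "fare A2"},
        {3, Op::Insert, 11, "fare B"},
        {4, Op::Delete, 11, ""},
    };
    const std::vector<Change> first_two(log.begin(), log.begin() + 2);
    Replica r;
    r.apply(first_two);   // replica stops after change 2
    r.apply(log);         // catch-up replays the whole log; 1 and 2 are skipped
    assert(r.applied_seq == 4);
    assert(r.table.size() == 1 && r.table.at(10) == "fare A2");
}
```

Because each Linux server keeps its own checkpoint in this model, servers can fall behind and catch up independently of one another.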
Data replication
[Diagram: on HP NonStop, the SQL/MP DB and TMF log feed Extract -> Queue -> Data Pump; on Linux IA-64, Receive -> Queue -> Updater feeds the MySQL DB. A legend marks which components are GoldenGate software]
Data Replication
[Diagram: replication fan-out. Shown are six Extract/Queue pairs, twelve DataPumps, ServerNet, and twelve Linux servers, each with Collector, Queue, Extract, Replicator, and MySQL]
Results
• Reduced runtime costs (over 80% compared to legacy)
• Reduced development costs
• Increased functionality
• Decreased fare loading cycle times
• Competitive advantage
Hybrid
• Horizontal scalability
• Ability to throw inexpensive CPUs at the problem
• Tolerate failure of a single server
• How do we get there from here?
• Database and network functions remain on Himalaya
• C++ code readily ports to Linux
• Publish/subscribe metaphor for data in memory
• 64-bit addressing to avoid memory constraints
Connectivity
• CORBA
• Major functions use CORBA internally
• CORBA requests to TPF for availability
• CORBA to CTS for DCA this Summer (bypass TPF)
• Asynchronous messaging via MQ Series
• XML
• Currently uses XML requests from TPF (over RPPC) for
pricing functions
• Working on direct access from Travelocity to ATSE
– Will be used for BIP
– Already working over HTTP (development systems)
– Working on security & billing for production
Timeline
• 2000
• Proof Of Concept, April – August
• 5 core developers, partnership with Compaq
• 2001
• Development & training began in February
• Initial hardware delivered
• 2002
• Phase 1 in production since July
• Zero downtime since implementation
• Rapidly developing additional functionality
• Wow – this is from an ancient slide, huh?
Precompiler
• Challenge
• 500K lines of C/C++, 150+ files with embedded SQL
• We did not want to rewrite ESQL / C code by hand
• Solution
• Wrote a precompiler that converts ESQL to inline MySQL calls
• About 1000 lines of awk
• We are willing to share this code with others
EXEC SQL BEGIN DECLARE SECTION;
int host_a;
double host_b;
char host_c[32];   /* character buffer so %s is valid; the size is illustrative */
EXEC SQL END DECLARE SECTION;
EXEC SQL DECLARE csr1 CURSOR FOR
    SELECT a, b, c
    FROM table1
    WHERE x = :hostvar1;
EXEC SQL OPEN csr1;
while (rc >= 0 && rc != 100) {
    EXEC SQL FETCH csr1 INTO
        :host_a, :host_b, :host_c;
    printf("Fetch %d, %lf, %s\n",
           host_a, host_b, host_c);
}
EXEC SQL CLOSE csr1;
Precompiler
• How it works
• Convert C / ESQL to C++ code
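• Polymorphism (C++ function overloading) matches data types in the declare section
• The precompiler can therefore ignore the declare section and only comments out its markers
ESQL input:
EXEC SQL BEGIN DECLARE SECTION;
int host_a;
double host_b;
char host_c[32];   /* character buffer; the size is illustrative */
EXEC SQL END DECLARE SECTION;
Generated C++:
// EXEC SQL BEGIN DECLARE SECTION;
int host_a;
double host_b;
char host_c[32];   /* character buffer; the size is illustrative */
// EXEC SQL END DECLARE SECTION;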
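The declare-section handling shown above is simple enough to sketch. Sabre's precompiler is written in awk and its source is not part of these slides, so the C++ function below is only an assumed equivalent of that one rewriting step. Its name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Rewrite one source line: comment out the ESQL declare-section markers and
// pass everything else (the plain C declarations) through unchanged.
std::string rewrite_declare_marker(const std::string& line) {
    const std::string begin_marker = "EXEC SQL BEGIN DECLARE SECTION;";
    const std::string end_marker   = "EXEC SQL END DECLARE SECTION;";
    const std::string::size_type first = line.find_first_not_of(" \t");
    if (first == std::string::npos) return line;             // blank line
    const std::string body = line.substr(first);
    if (body == begin_marker || body == end_marker)
        return line.substr(0, first) + "// " + body;          // keep indentation
    return line;
}

void example_rewrite() {
    assert(rewrite_declare_marker("EXEC SQL BEGIN DECLARE SECTION;") ==
           "// EXEC SQL BEGIN DECLARE SECTION;");
    assert(rewrite_declare_marker("int host_a;") == "int host_a;");
    assert(rewrite_declare_marker("  EXEC SQL END DECLARE SECTION;") ==
           "  // EXEC SQL END DECLARE SECTION;");
}
```

The host variable declarations survive as ordinary C++ variables, which is what lets overload resolution pick the right conversion later.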
Precompiler
EXEC SQL DECLARE csr1 CURSOR FOR
SELECT a, b, c
FROM table1
WHERE x = :hostvar1;
// EXEC SQL DECLARE csr1
static e2mysql csr1 = {
    " SELECT a,b,c FROM table1 WHERE x = :hostvar1",
    NULL, 0
};
Cursor declarations (SELECT statements) are converted to a static
struct. The struct has the text of the SQL, as well as statement
handles for doing prepare / execute (where applicable)
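The static struct described above might look roughly like the following. The field names `rslt` and `row` appear in the generated fetch code, but the `sql` field name, the `ResultSet` stand-in type, and the `bind_host_variable` helper are assumptions made for this sketch. The slides do not show how host variables are actually substituted.

```cpp
#include <cassert>
#include <string>

struct ResultSet;               // stand-in for the MySQL result handle (MYSQL_RES in the C API)

// Assumed layout of the generated cursor struct, inferred from the initializer
// { sql text, NULL, 0 } and from the fields used by the generated fetch code.
struct e2mysql {
    const char* sql;            // statement text, host variables still in :name form
    ResultSet*  rslt;           // result set, NULL until the cursor is opened
    unsigned long row;          // index of the next row to fetch
};

// Hypothetical helper: replace the first :name placeholder with a literal value
// before the statement is sent to the server. No quoting or escaping is done,
// and a name that is a prefix of another name is not handled.
std::string bind_host_variable(std::string sql, const std::string& name,
                               const std::string& literal) {
    const std::string placeholder = ":" + name;
    const std::string::size_type pos = sql.find(placeholder);
    if (pos != std::string::npos) sql.replace(pos, placeholder.size(), literal);
    return sql;
}

void example_cursor() {
    static e2mysql csr1 = {"SELECT a,b,c FROM table1 WHERE x = :hostvar1", nullptr, 0};
    assert(csr1.rslt == nullptr && csr1.row == 0);
    assert(bind_host_variable(csr1.sql, "hostvar1", "42") ==
           "SELECT a,b,c FROM table1 WHERE x = 42");
}
```

Real code would need proper escaping of string values before substituting them into the statement text.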
Precompiler
// EXEC SQL FETCH csr1
static int16 fetch_csr1()
{
    if ( ! csr1.rslt )
        return SQL_ERROR;
    if ( csr1.row >= mysql_num_rows(csr1.rslt) )
        return SQL_NO_DATA;
    MYSQL_ROW row = mysql_fetch_row(csr1.rslt);
    SQLBindColPoly(row[0], host_a, sizeof(host_a));
    SQLBindColPoly(row[1], host_b, sizeof(host_b));
    SQLBindColPoly(row[2], host_c, sizeof(host_c));
    ++csr1.row;
    return SQL_SUCCESS;
}
EXEC SQL FETCH csr1 INTO :host_a, :host_b, :host_c;
The OPEN, FETCH and CLOSE statements are converted into
function calls. The precompiler generates the code for these calls
and puts it at the end of the source module.
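The OPEN / FETCH / CLOSE call pattern can be tried without a database. The sketch below replaces the MySQL result set with an in-memory `Cursor` (an assumption for illustration, as are the function names) and keeps the return-code convention of the slide's loop, using the same numeric values as ODBC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return codes with the same values ODBC uses; the loop on the slide tests
// for rc >= 0 and rc != 100.
const int SQL_SUCCESS = 0;
const int SQL_NO_DATA = 100;
const int SQL_ERROR   = -1;

struct Cursor {                                   // in-memory stand-in for a MySQL result set
    std::vector<std::vector<std::string>> rows;
    std::size_t next = 0;
    bool is_open = false;
};

int open_cursor(Cursor& c)  { c.next = 0; c.is_open = true; return SQL_SUCCESS; }
int close_cursor(Cursor& c) { c.is_open = false; return SQL_SUCCESS; }

// Mirrors the generated fetch function: error if not open, SQL_NO_DATA past
// the last row, otherwise copy the row out and advance.
int fetch_cursor(Cursor& c, std::vector<std::string>& out) {
    if (!c.is_open) return SQL_ERROR;
    if (c.next >= c.rows.size()) return SQL_NO_DATA;
    out = c.rows[c.next++];
    return SQL_SUCCESS;
}

int count_rows(Cursor& c) {
    int fetched = 0;
    std::vector<std::string> row;
    int rc = open_cursor(c);
    while (rc >= 0 && rc != SQL_NO_DATA) {
        rc = fetch_cursor(c, row);
        if (rc == SQL_SUCCESS) ++fetched;         // use the row only after a successful fetch
    }
    close_cursor(c);
    return fetched;
}

void example_fetch_loop() {
    Cursor c;
    c.rows = {{"1", "2.5", "x"}, {"2", "3.5", "y"}};
    assert(count_rows(c) == 2);
    std::vector<std::string> row;
    assert(fetch_cursor(c, row) == SQL_ERROR);    // cursor is closed again
}
```

Checking `rc` before touching the row avoids processing stale data on the final iteration, when the fetch reports no more rows.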
Precompiler
inline int32
SQLBindColPoly(const char* value, int32& parm, uint16 size)
{
    parm = atoi(value);
    return SQL_SUCCESS;
}
A lightweight wrapper around the database API lets us
use polymorphism to convert to the types specified in the
declare section. There is a wrapper function for each
simple C++ type that we handle.
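A set of such wrappers could look like the sketch below. Only the integer version appears on the slide; the `double` and character-buffer overloads, and the use of `int` and `std::size_t` in place of the `int32` / `uint16` typedefs, are assumptions. NULL column values are not handled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

const int SQL_SUCCESS = 0;

// One overload per host-variable type; the compiler picks the right one from
// the type of `parm`, so the generated code can call the same name everywhere.
inline int SQLBindColPoly(const char* value, int& parm, std::size_t) {
    parm = std::atoi(value);
    return SQL_SUCCESS;
}

inline int SQLBindColPoly(const char* value, double& parm, std::size_t) {
    parm = std::atof(value);
    return SQL_SUCCESS;
}

// Character buffers: copy at most size-1 bytes and always NUL-terminate.
inline int SQLBindColPoly(const char* value, char* parm, std::size_t size) {
    if (size == 0) return SQL_SUCCESS;
    std::strncpy(parm, value, size - 1);
    parm[size - 1] = '\0';
    return SQL_SUCCESS;
}

void example_bind() {
    int a = 0;
    double b = 0.0;
    char c[8];
    SQLBindColPoly("42", a, sizeof(a));
    SQLBindColPoly("2.5", b, sizeof(b));
    SQLBindColPoly("hello, world", c, sizeof(c));   // truncated to fit the buffer
    assert(a == 42);
    assert(b == 2.5);
    assert(std::strcmp(c, "hello, ") == 0);
}
```

Because the call sites are identical for every type, the precompiler never needs to parse the declare section to decide which conversion to emit.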
Precompiler
• Notes
• Lightweight C++ wrapper to MySQL API
• The precompiler understands some SQL syntax and does
some modifications of NonStop SQL/MP statements
• We have also used our precompiler to target other DBMS
– ODBC API
– Oracle
– PostgreSQL
• Since we convert C to C++, this may be problematic for
ESQL programs that use deprecated K&R syntax
– C++ compilers are stricter than C compilers
– However, we did not have this problem with our application
Other MySQL applications at Sabre
• ATSE is our largest and most mission-critical MySQL application
• We have other production systems that rely on MySQL
• Site59.com is the most visible
• MySQL also used for some internal databases
• More under development
• MySQL / Linux / SATA drives make cheap data marts
• Sometimes cheaper to replicate to a data mart than to
upgrade a central data warehouse
• Currently testing with a 1.5B row database
Site59
• Last minute travel packages
• Acquired by Travelocity in
March 2002
• Sales volume?
• Transaction rates?
• All dynamic content generated
using PHP & MySQL
Site59
[Diagram: Internet (HTTP); Presentation (Apache/PHP); Frontend DB (MySQL, Linux); Replication; Backend DB (Oracle, Sun); Application Server; Reservations System Gateway (XML/HTTP)]
Site59 implements a fairly “classic” dynamic website using MySQL.
Dynamic content is generated at about 30 Mbit/s. Single- and dual-processor
Linux machines (IA-32) are used extensively.
Travel Commerce Processing Chain
[Diagram stages: Session, Shop, Price, Sell, Fulfill]