Sabre presentation for MySQL user conference 2004

Confidential MySQL at Sabre Alan Walker Sabre Labs February 2004

description

Low fare search ran on a cluster of 8 mainframes, using a heuristic that didn't always find a good solution. We built new algorithms and moved it all to a Linux cluster. This presentation describes the parts we put on MySQL, back when the mainstream mission-critical world hadn't even heard of MySQL. The open source precompiler let us take HP NonStop code and compile it, unchanged, to run against MySQL.

Transcript of Sabre presentation for MySQL user conference 2004

Page 1: Sabre presentation for MySQL user conference 2004

Confidential

MySQL at Sabre

Alan Walker Sabre Labs

February 2004

Page 2: Sabre presentation for MySQL user conference 2004


Agenda

• Sabre Holdings Overview

• Business drivers for MySQL & Open Source

• Shopping for fares

• Air Travel Shopping Engine (ATSE)

• Data replication strategy

• ESQL precompiler for MySQL

• Other MySQL users at Sabre

Page 3: Sabre presentation for MySQL user conference 2004


Who is Sabre Holdings?

A world leader in travel commerce, retailing travel products, and providing distribution and technology solutions for the travel industry

Page 4: Sabre presentation for MySQL user conference 2004


Sabre Holdings Businesses

Page 5: Sabre presentation for MySQL user conference 2004


Sabre Holdings Fast Facts

• Industry leader in multiple travel channels

• Revenues of $2.06 billion in 2002

• S&P 500 company

• NYSE:TSG

• Headquarters in Dallas/Fort Worth, Texas

• 6,500 employees in 45 countries

Page 6: Sabre presentation for MySQL user conference 2004


Business drivers

Over 3 billion fare combinations for a single customer request

Multiple airlines, flights, fare types, dates, prices, taxes, surcharges

Page 7: Sabre presentation for MySQL user conference 2004


Business drivers

• No direct revenue for shopping queries

• Revenue for booking, but not looking (searching)

• Look-to-book ratio increasing

• Competition requires staying on the “leading edge”

• Highly reliable and scalable database

• Fast processors

• Large real memory

• Smart algorithms

• Shopping is a good fit for horizontal scale

• Pricing requires higher precision

Page 8: Sabre presentation for MySQL user conference 2004


Business drivers

[Figure: Computing Stack. Layers: Application, DB / Middleware, Operating System, Hardware. Marker: Commodity Point]

Hardware, operating system, database and middleware are becoming commodities. This drives the cost down rapidly. Open source software is a major driver of this effect.

Page 9: Sabre presentation for MySQL user conference 2004


Business Solution

• Linux servers alongside HP NonStop servers to create “hybrid” Air Travel Shopping Engine (ATSE) platform

• HP NonStop delivers high availability and reliability

– Better than or equal to legacy, but at significantly lower cost

– Best fit for critical workloads and master database management

• Linux / MySQL delivers 64-bit memory and faster CPUs

– Lower availability and reliability than HP NonStop but at significantly lower cost

– Best fit for CPU-intensive shopping workloads

Most cost-effective platform for the shopping workload

Page 10: Sabre presentation for MySQL user conference 2004


Business drivers

• Sabre’s legacy

• World’s first commercial OLTP system in 1960

• Mainframe clusters running TPF

• Operating system customized to our needs

• True 7*24 application, with zero scheduled downtime

• Most application code in assembler

• Sabre’s future

• Higher-level languages

• Relational databases

• Internet

• Open systems

• Reduce specialized training

• Use off the shelf software

• HP NonStop with OSS is a key component (LINUX?)

Page 11: Sabre presentation for MySQL user conference 2004


Shopping

• Finding cheap air fares is hard!

• With 50+ connect points to consider, and >100 fares per leg, we need to evaluate >3 billion combinations

• Up to a million fares can change every day

• Availability changes continuously

• Solve it >100 times per second

• Other functions

• Price 250 tickets per second

• Process 1000 flight routing requests per second

Page 12: Sabre presentation for MySQL user conference 2004


Pricing

• Shopping vs. Pricing

• Shopping is the problem of finding low fares

• Pricing is used to print the ticket

• Pricing has to be accurate, or we pay the difference to the airline

• Many internet search engines still rely on mainframes to actually print the ticket

• Pricing also requires additional functions, such as refunds, exchanges and auditing

Page 13: Sabre presentation for MySQL user conference 2004


Algorithms

• Fare-led search

• Graph-based algorithm that searches all fare combinations across 50+ connect points

• Can generate up to a 4-segment connection

• Search space of >3 billion fare combinations

• Match or exceed any competitor in finding lowest fare

• Only loses to competitors that have access to exclusive private fares and/or other discounts

• Search actually checks Direct Connect Availability, so that low fare options are actually bookable

Page 14: Sabre presentation for MySQL user conference 2004


Algorithms

• Dynamic schedules

• Connections are not generated overnight and stored

• Not limited to routes explicitly set up by airlines or other marketing staff

• Availability Manager

• Flexible rules to access airline availability

• Current methods

– Direct Connect

– Host Availability

– Teletype (AVS)

• Can also use

– Cached DCA

– Inventory proxy

Page 15: Sabre presentation for MySQL user conference 2004


ATSE Hybrid

• Air shopping for desirable itineraries

• Must search through multiple airlines, flights, fare types, dates, adjacent airports, etc.

• Must calculate prices, taxes, surcharges

• Complexity

• Single round-trip request can have over 3 billion fare combinations

• Search is CPU and memory intensive

• Business driver

• No direct revenue for shopping transactions

• Increasing look to book ratio

Page 16: Sabre presentation for MySQL user conference 2004


ATSE Hybrid

• Combine Linux servers and HP NonStop servers

• HP NonStop delivers high availability and reliability

• Better than or equal to TPF at significantly lower cost

• Master database management

• Data replicated in real-time to Linux servers

• PNR pricing, schedules and availability

• Linux delivers 64-bit memory model and faster CPUs

• Lower availability and reliability than HP NonStop but at significantly lower cost

• Horizontally scaled server farm with spare capacity

• Best fit for CPU-intensive shopping workloads

Page 17: Sabre presentation for MySQL user conference 2004


ATSE Hybrid

[Figure: ATSE hybrid architecture diagram. Labels: IBM PSS; Naming Service and Load Balancing; Load Information; Schedule and Availability Updates; IBM MVS; Fare and Rule Updates; HP Non-Stop; Linux Server Farm (24 Linux boxes shown); DB Image Load and Updates; E/R; Logging and Billing; Availability Requests; Shopping Transactions; Air Shopping Transactions]

Page 18: Sabre presentation for MySQL user conference 2004


ATSE Linux servers

• In production since July 2003

• Started with HP rp5405 servers (Unix PA-RISC)

– Migrated to Itanium in December 2003

• Using 45 HP rx5670 servers

– 4-way, 1.5 GHz, 6MB L2 cache, 32GB RAM, 4x72GB SCSI

• Software

• MySQL 4.0.15

• GNU compilers – g++ 3.2.3 and glibc 2.3.2

• TAO object request broker

• Redhat RHAS 2.1

• GoldenGate Extractor/Replicator

• Monitoring – Prognosis, CA Unicenter, scripts

Page 19: Sabre presentation for MySQL user conference 2004


ATSE Software

• Extensive use of open source software

• MySQL 4.0.15

• GNU compilers – g++ 3.2.3 and glibc 2.3.2

• TAO object request broker

• Redhat Linux AS 3.0

• Third party software

• GoldenGate Extractor/Replicator

• Monitoring – Prognosis, CA Unicenter, scripts

• Internally developed applications and scripts

Page 20: Sabre presentation for MySQL user conference 2004


Data replication

• HP NonStop (Tandem) is master database

• Golden Gate Software used to replicate to MySQL

– Extracts data from undo/redo logs on the NonStop server

– Performs INSERT / UPDATE / DELETE on MySQL

– Software performs catch-up / resync in case of crashes or other failures

• Each Linux server has an identical copy of the database

– 50GB database on each server, all InnoDB

• Replication volume

• 150 tables replicated (over 300 on NonStop server)

• Can replicate 1M fare changes / hour

• Data updates on 7x24 basis

Page 21: Sabre presentation for MySQL user conference 2004


Data replication

[Figure: replication pipeline. HP NonStop side: SQL/MP DB, TMF Log, Extract, Queue, Data Pump. Linux IA-64 side: Receive, Queue, Updater, MySQL DB. A legend identifies the Golden Gate Software components.]

Page 22: Sabre presentation for MySQL user conference 2004


Data Replication

[Figure: replication fan-out over Server-Net. Labels: Extract and Queue (shown 6 times); DataPump (shown 12 times); Linux (shown 12 times), each with MySQL, Queue, Extract, Collector and Replicator]

Page 23: Sabre presentation for MySQL user conference 2004


Results

• Reduced runtime costs (over 80% compared to legacy)

• Reduced development costs

• Increased functionality

• Decreased fare loading cycle times

• Competitive Advantage

Page 24: Sabre presentation for MySQL user conference 2004


Hybrid

• Horizontal scalability

• Ability to throw inexpensive CPUs at the problem

• Tolerate failure of a single server

• How do we get there from here?

• Database and network functions remain on Himalaya

• C++ code readily ports to Linux

• Publish/subscribe metaphor for data in memory

• 64-bit addressing to avoid memory constraints

Page 25: Sabre presentation for MySQL user conference 2004


Connectivity

• CORBA

• Major functions use CORBA internally

• CORBA requests to TPF for availability

• CORBA to CTS for DCA this Summer (bypass TPF)

• Asynchronous messaging via MQ Series

• XML

• Currently uses XML requests from TPF (over RPPC) for pricing functions

• Working on direct access from Travelocity to ATSE

– Will be used for BIP

– Already working over HTTP (development systems)

– Working on security & billing for production

Page 26: Sabre presentation for MySQL user conference 2004


Timeline

• 2000

• Proof Of Concept, April – August

• 5 core developers, partnership with Compaq

• 2001

• Development & training began in February

• Initial hardware delivered

• 2002

• Phase 1 in production since July

• Zero downtime since implementation

• Rapidly developing additional functionality

• Wow – this is from an ancient slide, huh?

Page 27: Sabre presentation for MySQL user conference 2004


Precompiler

• Challenge

• 500K lines of C/C++, 150+ files with embedded SQL

• We did not want to rewrite ESQL / C code by hand

• Solution

• Wrote a precompiler that converts ESQL to inline MySQL calls

• About 1000 lines of awk

• We are willing to share this code with others

EXEC SQL BEGIN DECLARE SECTION;
int host_a;
double host_b;
char host_c;
EXEC SQL END DECLARE SECTION;

EXEC SQL DECLARE csr1 CURSOR FOR
    SELECT a, b, c
    FROM table1
    WHERE x = :hostvar1;

EXEC SQL OPEN csr1;

while (rc >= 0 && rc != 100){
    EXEC SQL FETCH csr1 INTO
        :host_a, :host_b, :host_c;
    printf("Fetch %d, %lf, %s\n",
        host_a, host_b, host_c);
}

EXEC SQL CLOSE csr1;

Page 28: Sabre presentation for MySQL user conference 2004


Precompiler

• How it works

• Convert C / ESQL to C++ code

• Polymorphism matches data types in the declare section

• Can ignore the declare section

EXEC SQL BEGIN DECLARE SECTION;
int host_a;
double host_b;
char host_c;
EXEC SQL END DECLARE SECTION;

// EXEC SQL BEGIN DECLARE SECTION;
int host_a;
double host_b;
char host_c;
// EXEC SQL END DECLARE SECTION;
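
The declare-section step is simple enough to sketch in a few lines. The version below is written in C++ for illustration (the precompiler described in this deck was written in awk), and `commentOutDeclareMarkers` is a hypothetical name. It assumes each marker sits on its own line, as in the example on this slide.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: comment out the DECLARE SECTION markers and pass
// every other line (the ordinary C declarations) through unchanged.
inline std::string commentOutDeclareMarkers(const std::string& source) {
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        const bool isMarker =
            line.find("EXEC SQL BEGIN DECLARE SECTION") != std::string::npos ||
            line.find("EXEC SQL END DECLARE SECTION") != std::string::npos;
        out << (isMarker ? "// " : "") << line << '\n';
    }
    return out.str();
}
```

Because the host variables are plain C declarations, leaving them untouched is enough for the C++ compiler to see their types.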

Page 29: Sabre presentation for MySQL user conference 2004


Precompiler

EXEC SQL DECLARE csr1 CURSOR FOR
    SELECT a, b, c
    FROM table1
    WHERE x = :hostvar1;

// EXEC SQL DECLARE csr1
static e2mysql csr1 = {
    " SELECT a,b,c FROM table1 WHERE x = :hostvar1"
    , NULL , 0};

Cursor declarations (SELECT statements) are converted to a static struct. The struct has the text of the SQL, as well as statement handles for doing prepare / execute (where applicable)
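
The slides do not show the layout of `e2mysql` or how `:hostvar1` is filled in when the cursor is opened. As a rough sketch under those caveats, the struct below guesses at three fields matching the three initializers, and `expandHostVars` shows one possible approach: plain text substitution of `:name` placeholders. Both `CursorSketch` and `expandHostVars` are hypothetical names, not the precompiler's actual code.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the generated cursor struct; the real e2mysql
// layout is not shown on the slides beyond its three initializers.
struct CursorSketch {
    const char* sql;    // SQL text with :name placeholders
    void* rslt;         // result handle (a MYSQL_RES* in real code)
    unsigned long row;  // index of the next row to fetch
};

// Replace each :name placeholder with the matching value from `vars`.
// Values are inserted verbatim; real code must quote and escape strings,
// and must not touch colons that appear inside SQL string literals.
inline std::string expandHostVars(const std::string& sql,
                                  const std::map<std::string, std::string>& vars) {
    std::string out;
    for (std::size_t i = 0; i < sql.size();) {
        if (sql[i] == ':') {
            std::size_t j = i + 1;
            while (j < sql.size() &&
                   (std::isalnum(static_cast<unsigned char>(sql[j])) || sql[j] == '_'))
                ++j;
            const auto it = vars.find(sql.substr(i + 1, j - i - 1));
            if (it != vars.end()) {
                out += it->second;  // known host variable: substitute its value
                i = j;
                continue;
            }
        }
        out += sql[i++];            // ordinary character or unknown name: copy as-is
    }
    return out;
}
```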

Page 30: Sabre presentation for MySQL user conference 2004


Precompiler

// EXEC SQL FETCH csr1
static int16 fetch_csr1()
{
    if ( ! csr1.rslt )
        return SQL_ERROR;
    if ( csr1.row >= mysql_num_rows(csr1.rslt) )
        return SQL_NO_DATA;
    MYSQL_ROW row = mysql_fetch_row(csr1.rslt);
    SQLBindColPoly(row[0], host_a, sizeof(host_a));
    SQLBindColPoly(row[1], host_b, sizeof(host_b));
    SQLBindColPoly(row[2], host_c, sizeof(host_c));
    ++csr1.row;
    return SQL_SUCCESS;
}

EXEC SQL FETCH csr1 INTO :host_a, :host_b, :host_c;

The OPEN, FETCH and CLOSE statements are converted into function calls. The precompiler generates the code for these calls and puts it at the end of the source module.
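
To see how a generated fetch function and its return codes drive the caller's loop, here is a minimal sketch that runs without a database. `MemCursor` and `fetchRow` are hypothetical names, and the result set is an in-memory vector standing in for a MySQL result. The constants use the conventional ODBC-style values (0, 100, -1), which is an assumption about what `SQL_SUCCESS`, `SQL_NO_DATA` and `SQL_ERROR` hold in the real code.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Return codes mirroring the ODBC-style values the caller's loop tests for.
const int kSuccess = 0;
const int kNoData = 100;
const int kError = -1;

// Hypothetical in-memory stand-in for a MySQL result set and its cursor.
struct MemCursor {
    const std::vector<std::vector<std::string>>* rslt;  // null until "opened"
    std::size_t row;                                     // next row to fetch
};

// Same shape as the generated fetch_csr1(): check state, check bounds,
// copy one row into the host variables, advance.
inline int fetchRow(MemCursor& csr, int& host_a, double& host_b) {
    if (!csr.rslt) return kError;
    if (csr.row >= csr.rslt->size()) return kNoData;
    const std::vector<std::string>& row = (*csr.rslt)[csr.row];
    host_a = std::atoi(row[0].c_str());
    host_b = std::atof(row[1].c_str());
    ++csr.row;
    return kSuccess;
}
```

A caller would loop while the code is non-negative and not 100, and should check the code before using the host variables, since the final call returns no data and leaves them unchanged.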

Page 31: Sabre presentation for MySQL user conference 2004


Precompiler

inline int32
SQLBindColPoly(const char* value, int32& parm, uint16 size)
{
    parm = atoi(value);
    return SQL_SUCCESS;
}

A lightweight wrapper around the database API lets us use polymorphism to convert to the types specified in the declare section. There is a wrapper function for each simple C++ type that we handle.
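
The "one wrapper per type" idea relies on ordinary C++ overload resolution: the compiler picks the conversion from the declared type of the host variable. Only the integer wrapper appears on the slide, so the sketch below fills in plausible `double` and character-buffer versions as an illustration. The name `bindCol` and the constant `kBindOk` are hypothetical, chosen to avoid implying these are the real `SQLBindColPoly` overloads.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Stand-in for the success return code used on the slides.
const std::int32_t kBindOk = 0;

// Integer host variable: parse the column text as a decimal integer.
inline std::int32_t bindCol(const char* value, std::int32_t& parm, std::uint16_t /*size*/) {
    parm = static_cast<std::int32_t>(std::strtol(value, nullptr, 10));
    return kBindOk;
}

// Floating-point host variable.
inline std::int32_t bindCol(const char* value, double& parm, std::uint16_t /*size*/) {
    parm = std::strtod(value, nullptr);
    return kBindOk;
}

// Character-buffer host variable: copy at most size-1 bytes and terminate.
inline std::int32_t bindCol(const char* value, char* parm, std::uint16_t size) {
    if (size == 0) return kBindOk;
    std::strncpy(parm, value, size - 1);
    parm[size - 1] = '\0';
    return kBindOk;
}
```

A call such as `bindCol(row[0], host_a, sizeof(host_a))` compiles to the right conversion with no type information in the generated code. This sketch does not handle SQL NULL: `mysql_fetch_row` returns a null pointer for a NULL column, so real code would need a check before parsing.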

Page 32: Sabre presentation for MySQL user conference 2004


Precompiler

• Notes

• Light-weight C++ wrapper to MySQL API

• The precompiler understands some SQL syntax and does some modifications of NonStop SQL/MP statements

• We have also used our precompiler to target other DBMS

– ODBC API

– Oracle

– PostgreSQL

• Since we convert C to C++, this may be problematic for ESQL programs that used deprecated K&R syntax

– C++ compilers are stricter than C compilers

– However, we did not have this problem with our application

Page 33: Sabre presentation for MySQL user conference 2004


Other MySQL applications at Sabre

• ATSE is our largest and most mission critical

• We have other production systems that rely on MySQL

• Site59.com is the most visible

• MySQL also used for some internal databases

• More under development

• MySQL / Linux / SATA drives make cheap data marts

• Sometimes cheaper to replicate to a data mart than to upgrade a central data warehouse

• Currently testing with a 1.5B row database

Page 34: Sabre presentation for MySQL user conference 2004


Site59

• Last minute travel packages

• Acquired by Travelocity in March 2002

• Sales volume?

• Transaction rates?

• All dynamic content generated using PHP & MySQL

Page 35: Sabre presentation for MySQL user conference 2004


Site59

[Figure: Site59 architecture diagram. Labels: Internet; HTTP; Presentation (Apache/PHP); Application Server; Frontend DB (MySQL, Linux); Replication; Backend DB (Oracle, Sun); Reservations System Gateway; XML/HTTP]

Site59 implements a fairly “classic” dynamic website using MySQL. Dynamic content is generated at about 30Mbits / second. Extensive use is made of single and dual processor Linux machines (IA-32).

Page 36: Sabre presentation for MySQL user conference 2004


Travel Commerce Processing Chain

[Figure: processing chain with stages labeled Fulfill, Session, Shop, Sell, Price]