
A Comparison of Compression Techniques for XML-based Security Policies in Mobile Computing Environments

Xuebing Qing

Carlisle Adams

Agenda

Why Compress?

Criteria for Compression Algorithms

Gzip and Bzip

wbXML with/without Transcode

ASN.1

Combinations

– wbXML + Zip

– ASN.1 + Zip

Recent XML Compression Proposals

Conclusions and Future Directions

Why Compress?

[Figure: network diagram. Labels: Foreign Domain, Internet, Home Domain, Firewall (two), PDA, Policy Deployment/Update, Mutual Authorization Request/Response]

For high interoperability between domains, XML (XACML) is a good choice for policy representation.

On-device authorization decision rendering and simple policy deployment/updating are also required.

XML is too verbose and heavy for many mobile devices:

– Limited bandwidth

– Limited CPU power, RAM

– Limited battery, flash memory, etc.

Evaluating Compression Algorithms

Criterion 1: High Compression Ratio

Criterion 2: Low Processing Overhead

Criterion 3: No Semantic Ambiguity

“Nice to have”: 3rd Party API Support

We consider the most popular compression algorithms, as well as their combinations:

– Gzip and Bzip

– wbXML

– ASN.1

– wbXML with transcode + Gzip or Bzip

– ASN.1 + Gzip or Bzip

None of them introduces semantic ambiguity, and all have good 3rd party API support.

The ideal algorithm should achieve the highest compression rate while keeping decompression overhead at a minimum.

Experimental Setting

Written in Java, tested under JDK 1.4.2 / Windows 2000 / 866 MHz CPU and 512 MB RAM

Runtime Memory Profiling: Eclipse Hyades Plug-in

Java APIs Used

– wbXML: kXML 1.1 (Open Source)

– ASN.1: Pure Java API by OSS Nokalva (Alpha Version – Trial version)

– Gzip: The gzip implementation in JDK 1.4.2

– Bzip: Apache BZip2 Implementation

Test Cases: 9 XACML files (2 KB to 1 MB) created from the XACML (version 1.1) Conformance Test Suite

Gzip and Bzip2: Compression Rate

[Chart: GZip/BZip2 Compression Rate. Y-axis: Compression Rate (%), 0 to 40. Series: Gzip, BZip2]

Very good compression rate (especially when size > 70K).

Gzip compresses better than Bzip2 when size <= 70K, while Bzip2 compresses better than Gzip when size > 70K.

Bzip2 performs extremely well when size >= 250K.

Zip algorithms work better with large files, yet they still compress small files (2K) to 1/3 of the original size.
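A minimal sketch of how such a compression rate could be measured. It uses Python's standard `gzip` and `bz2` modules rather than the Java APIs from the experiments, and it assumes that "compression rate" means compressed size as a percentage of the original size (which is consistent with a 2K file shrinking to about 1/3). The helper name `compression_rate` and the synthetic XACML-like sample are illustrative only.

```python
import bz2
import gzip


def compression_rate(data: bytes, compress) -> float:
    """Compressed size as a percentage of the original size (lower is better)."""
    return 100.0 * len(compress(data)) / len(data)


# Hypothetical stand-in for an XACML policy: long, repetitive tag names.
rule = '<Rule RuleId="r{0}" Effect="Permit"><Target><Subjects><AnySubject/></Subjects></Target></Rule>\n'
sample = ("<Policy>\n" + "".join(rule.format(i) for i in range(200)) + "</Policy>\n").encode("utf-8")

gzip_rate = compression_rate(sample, gzip.compress)
bzip2_rate = compression_rate(sample, bz2.compress)
print(f"original: {len(sample)} bytes")
print(f"gzip:  {gzip_rate:.1f}%")
print(f"bzip2: {bzip2_rate:.1f}%")
```

The synthetic input is far more repetitive than a real policy, so only the method carries over, not the numbers it prints.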

Gzip and Bzip2: Processing Overhead - Time

[Chart: GZip/BZip2 Decompression Overhead - Time. X-axis: File 1 to File 9. Y-axis: Time (millisecond), 0 to 700. Series: wbXML, Gzip, BZip2]

Only decompression time is considered, because the compression of XACML only happens on the server side when deploying policies.

Absolute decompression time alone is not enough for an evaluation. The wbXML-to-XML conversion mainly involves XML tag replacement and is not CPU intensive, so it can be performed on a device; the time of the conversion can therefore be used as a reference to make a fairly realistic evaluation.

Gzip performs the best; BZip2 is similar to wbXML conversion.

Considering that the kXML 1.1 API has significant room for optimization, it appears that wbXML conversion may ultimately have a time overhead similar to that of Gzip and hence may be acceptable on a mobile device.
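A rough sketch of timing decompression only, as the slide describes. It is written in Python with the standard `gzip` and `bz2` modules, not the Java implementations used in the experiments; the function name `decompression_ms`, the repeat count, and the sample data are assumptions made for illustration.

```python
import bz2
import gzip
import time


def decompression_ms(blob: bytes, decompress, repeats: int = 20) -> float:
    """Average wall-clock time of one decompression, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        decompress(blob)
    return 1000.0 * (time.perf_counter() - start) / repeats


# Hypothetical stand-in for a policy file of roughly 300 KB.
sample = b'<Rule Effect="Permit"><Target><AnySubject/></Target></Rule>\n' * 5000

# Compression happens once, on the server side, so it is not timed.
gz_blob = gzip.compress(sample)
bz_blob = bz2.compress(sample)

timings = {
    "gzip": decompression_ms(gz_blob, gzip.decompress),
    "bzip2": decompression_ms(bz_blob, bz2.decompress),
}
for name, ms in timings.items():
    print(f"{name}: {ms:.3f} ms per decompression")
```

Timings taken on a desktop say little about a mobile device in absolute terms, which is why the slides compare against a reference operation (the wbXML-to-XML conversion) instead.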

Gzip and Bzip2: Memory Overhead – Raw Data

Numbers in brackets are memory increments; numbers in red indicate that memory in use decreased as file size increased, which is caused by garbage collection.

Memory overhead of the wbXML-to-XML conversion is used as a reference for the estimate.

Size_memory = Size_memory_in_use + Size_memory_gced. So the memory used by File 8 is not 1,857,623 bytes (memory in use) but 3,087,933 bytes, which includes the memory garbage-collected in the process.

To analyze, we split memory into two parts: base runtime memory for the decompression API and the program itself, and decompression memory for the representation and computation of data at runtime.

Base memory is estimated by comparing the absolute memory size with that of the wbXML-to-XML conversion.

The memory size increment factor is used to estimate decompression memory.

File    File Size (bytes)  GZIP Memory [increment] (bytes)  Bzip2 Memory [increment] (bytes)  wbXML with Transcode [increment] (bytes)
File 1  2,167              913,770                          18,699,694                        1,221,647
File 2  4,798              922,000 [8,230]                  18,707,972 [8,278]                1,272,590 [50,943]
File 3  9,479              938,566 [16,566]                 18,851,890 [143,918]              1,372,148 [99,448]
File 4  23,976             974,080 [35,514]                 18,759,269 [-92,621]²             1,803,052 [430,904]
File 5  70,186             1,175,590 [201,510]              18,957,045 [197,776]              1,241,162 [-561,890]⁴
File 6  140,071            1,450,050 [274,460]              19,374,474 [417,429]              1,106,431 [-134,730]⁴
File 7  278,623            1,996,007 [545,957]              19,752,229 [377,755]              1,131,434 [25,003]⁴
File 8  556,395            1,857,623 [-138,384]¹            20,802,929 [1,050,700]            1,385,234 [253,800]⁴
File 9  1,111,939          3,482,445 [1,624,822]            8,916,388 [-11,886,541]³          742,690 [-642,544]⁴

Gzip and Bzip2: Memory Overhead – Result

[Chart: Mem Size Increment Factor. Categories: Gzip, Bzip, wbXML. Axis: 0 to 40. Value labels, in the order extracted: 2.5, 4.5, 3.15, 30.7, 19.4, 29.7]

Memory size increment factor measures the memory increment caused by the data size increment, or memory increment / file size increment.

The bigger a memory size increment factor is, the more memory is used for data compression and the more frequent the garbage collection will be.

It is a range of possible values rather than one fixed value.

Result: Gzip has a very small footprint when decompressing XACML data; its processing memory overhead is reasonable and acceptable. However, a zipped XACML policy has to be unpacked into XML and then processed. The processing overhead of Gzip is OH_gzip = OH_Gzip-decompression + OH_xml-processing

[Chart: Base Mem. Y-axis: Percentage, 0 to 2500. Gzip: 100, BZip2: 2046, wbXML: 134]
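The memory size increment factor (memory increment / file size increment) can be recomputed from the raw data. The sketch below does this in Python for the Gzip column only, using the file sizes and memory-in-use figures from the raw-data table; the function name `increment_factors` and the choice to drop negative factors (attributed to garbage collection in the slides) are assumptions made for illustration.

```python
# (file size, Gzip memory in use) in bytes for Files 1 to 9, copied from the raw-data table.
GZIP_ROWS = [
    (2_167, 913_770),
    (4_798, 922_000),
    (9_479, 938_566),
    (23_976, 974_080),
    (70_186, 1_175_590),
    (140_071, 1_450_050),
    (278_623, 1_996_007),
    (556_395, 1_857_623),
    (1_111_939, 3_482_445),
]


def increment_factors(rows):
    """Memory increment / file size increment for each consecutive pair of files."""
    factors = []
    for (size_a, mem_a), (size_b, mem_b) in zip(rows, rows[1:]):
        factors.append((mem_b - mem_a) / (size_b - size_a))
    return factors


factors = increment_factors(GZIP_ROWS)
# A negative factor means memory in use dropped between files; skip those pairs.
positive = [f for f in factors if f > 0]
print(f"Gzip increment factor range: {min(positive):.2f} to {max(positive):.2f}")
# prints "Gzip increment factor range: 2.45 to 4.36"
```

The result is a range, not a single value, which is in line with the slide's remark that the factor is a range of possible values.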

wbXML: Overview

Part of the presentation logic in WAP

Uses a token dictionary, where each token (transcode) maps to a predefined string (mainly element tags and attribute tags).

wbXML without transcode: no explicit token dictionary specified (otherwise, wbXML with transcode).

Code segments used to generate transcode in kXML 1.1
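The token-dictionary idea can be illustrated with a toy encoder. This is a Python sketch of name-to-token substitution only: it is not the WBXML wire format and not the kXML API, and the token table, the escape byte, and the function names `encode` and `decode` are all hypothetical.

```python
import re

# Hypothetical token table: one small integer per frequently used name.
TOKENS = {
    "PolicySet": 1,
    "Policy": 2,
    "Target": 3,
    "Subjects": 4,
    "Rule": 5,
    "Effect": 6,
}
ESCAPE = b"\x00"  # NUL cannot occur in XML text, so it can safely flag a token.

_NAMES = sorted(TOKENS, key=len, reverse=True)  # longest first: "PolicySet" before "Policy"
_ENCODE_RE = re.compile("|".join(map(re.escape, _NAMES)).encode("ascii"))
_DECODE_RE = re.compile(rb"\x00(.)", re.DOTALL)
_BY_ID = {token: name.encode("ascii") for name, token in TOKENS.items()}


def encode(xml: bytes) -> bytes:
    """Replace every known name with ESCAPE + a one-byte token."""
    return _ENCODE_RE.sub(lambda m: ESCAPE + bytes([TOKENS[m.group().decode("ascii")]]), xml)


def decode(blob: bytes) -> bytes:
    """Expand each ESCAPE + token pair back into the original name."""
    return _DECODE_RE.sub(lambda m: _BY_ID[m.group(1)[0]], blob)


policy = b'<Policy><Target><Subjects/></Target><Rule Effect="Permit"/></Policy>'
packed = encode(policy)
print(len(policy), "->", len(packed), "bytes")
```

Names missing from the table (here `Permit`) pass through unchanged, which mirrors why adding more entries to the table would be expected to improve the compression rate.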

wbXML: Compression Rate

[Chart: GZip/BZip2 Compression Rate. Y-axis: Compression Rate (%), 0 to 100. Series: without Transcode, with Transcode, Gzip]

wbXML with transcode reduces size to under 50% of the original, which is much better than wbXML without transcode.

Its compression rate is not comparable with that of Gzip, particularly when the file size is over 5 KB.

However, an XACML policy in wbXML can be processed directly by a wbXML parser without any decompression overhead.

We only discuss the processing overhead for wbXML with transcode.

wbXML: Analysis of Processing Overhead

There is no time or memory overhead for decompression.

However, it is impractical to measure and compare the CPU time and memory used by evaluating an XACML policy in wbXML form and in XML form.

We do the following analysis rather than experiments:

– Footprint_wbxml_obj < Footprint_xml_obj: since a wbXML file is 50% of its original XACML size, it is reasonable to assume that a wbXML object is approximately half the size of its XML counterpart.

– A smaller runtime representation certainly enables faster processing, but the overhead of transcode-table lookup at runtime needs to be considered.

– We can assume Processing_Time_wbxml <= Processing_Time_xml

– Evaluating an XACML policy in wbXML is less battery intensive because its in-memory representation is much smaller than its XML counterpart.

– Result: OH_wbxml = α × OH_xml-processing, where α < 1; it is smaller than OH_gzip = OH_Gzip-decompression + OH_xml-processing

ASN.1: Schema Based XML Encoding

A schema-based binary encoding spec, X.694 “Mapping W3C XML Schema Definitions into ASN.1”, is under development.

The spec introduces ASN.1, a binary, schema-based language, into the XML world, which is XML-schema based.

With the specification, an XML document can be converted into ASN.1, which is then encoded with one of ASN.1’s binary encoding rules, such as PER, DER, CER, or BER.

Theoretically, ASN.1 with PER, the most compact encoding rule, can achieve the same level of compression as Gzip [4].

However, the Pure Java API by OSS Nokalva only offers a compression rate slightly better than that of wbXML, partially because the API is still in its Alpha stage; several hot fixes were sent during the experiments in this research.

ASN.1 Encoding: Compression Rate

[Chart: GZip/BZip2 Compression Rate. Y-axis: Compression Rate (%), 0 to 60. Series: ASN.1, with Transcode, Gzip]

Slightly better than wbXML with transcode, but not comparable to Gzip.

The result differs from the one reported for Fast Web Services (FWS) [7]; this might be caused by the difference in the APIs used and/or by the differing characteristics of XACML files and the Web services XML files used in FWS.

ASN.1 Encoding: Analysis of Processing Overhead

There is no need to convert an ASN.1-encoded policy to XACML when processing, because ASN.1 is a schema language and supports operations similar to those of XML.

As with wbXML, we do analysis rather than experiments. The analysis is similar to the one for wbXML.

Result: OH_ASN.1 = α × OH_xml-processing, where α < 1; it is smaller than OH_gzip = OH_Gzip-decompression + OH_xml-processing

According to Sun’s experimental results on FWS, α could be as small as 0.1 in a Web services environment (although no such result has been achieved in our experiments).

Combine wbXML or ASN.1 with Gzip

Gzip, wbXML and ASN.1 do not perform well enough to satisfy the criteria on their own.

Pure Gzip has more processing overhead than wbXML and ASN.1, while wbXML and ASN.1 do not compress as well as Gzip.

It makes sense to combine them:

– wbXML with transcode + Gzip

– ASN.1 + Gzip

– Other combinations are not as good as the above (wbXML with transcode is better than wbXML without transcode, and Bzip2 consumes much more memory and CPU time than Gzip for decompression).

The Combinations: Compression Rate

[Chart: GZip/BZip2 Compression Rate. Y-axis: Compression Rate (%), 0 to 35. Series: ASN.1 + Gzip, wbXML + Gzip, Gzip]

Much better than pure ASN.1 and wbXML.

Even better than pure Gzip.

It is interesting that the overall compression rate of wbXML + Gzip for XACML over 100KB is better than that of ASN.1 + Gzip.

The Combinations: Analysis of Processing Overhead

For wbXML with transcode + Gzip: OH_wbxml_Gzip = OH_Gzip-decompression + α × OH_xml-processing

For ASN.1 + Gzip: OH_ASN.1_Gzip = OH_Gzip-decompression + α × OH_xml-processing

Just for reference (α < 1 in each case):

– Gzip: OH_gzip = OH_Gzip-decompression + OH_xml-processing

– wbXML: OH_wbxml = α × OH_xml-processing

OH_wbxml_Gzip is definitely better than OH_gzip because an XACML file is only decompressed once but processed many times.

Although OH_wbxml_Gzip is greater than OH_wbxml, the difference can be ignored, because OH_Gzip-decompression is small and the decompression only happens the first time the policy is downloaded and when the policy is updated.

Conclusion: wbXML + Gzip is better than ASN.1 + Gzip:

– Tag names in XACML are long; simple replacement (wbXML) achieves a good compression rate.

– Replacement (wbXML) creates less overhead than complex encoding (ASN.1).

– ASN.1 does not achieve the excellent compression rate expected (when publicly available APIs are used).

– Good open source wbXML APIs are available.
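The "decompressed once, processed many times" argument can be made concrete with a small model. The Python sketch below extends the slide's per-use formulas to n policy evaluations, which is an assumption of this example; the unit costs, the value chosen for the scaling factor, and the function name `total_overhead` are invented purely to show the shape of the comparison.

```python
def total_overhead(evaluations: int, decompress_cost: float, xml_cost: float, alpha: float = 1.0) -> float:
    """One-time decompression cost plus per-evaluation processing cost scaled by alpha."""
    return decompress_cost + evaluations * alpha * xml_cost


# Made-up unit costs, not measurements.
D = 5.0      # Gzip decompression overhead, paid once per download or update
P = 10.0     # XML processing overhead, paid on every policy evaluation
ALPHA = 0.5  # assumed ratio of wbXML processing cost to XML processing cost (less than 1)

for n in (1, 10, 100):
    gzip_only = total_overhead(n, D, P)            # decompress, then process plain XML
    wbxml_only = total_overhead(n, 0.0, P, ALPHA)  # no decompression step at all
    wbxml_gzip = total_overhead(n, D, P, ALPHA)    # decompress once, process wbXML
    print(f"n={n:3d}  gzip={gzip_only:7.1f}  wbXML={wbxml_only:7.1f}  wbXML+gzip={wbxml_gzip:7.1f}")
```

As n grows, the fixed decompression term becomes a shrinking share of the total, so wbXML + Gzip approaches wbXML alone while staying well below Gzip alone.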

Recent XML Compression Proposals (1): XOP/MTOM

XOP: XML-binary Optimized Packaging

– an XML serialization protocol, which converts certain XML data content (usually base-64 encoded) into binary streams and puts them into a structure that looks like MIME multipart, with an XML document as the root part.

MTOM: Message Transmission Optimization Mechanism

– a description of how XOP is layered into SOAP HTTP transport (SOAP 1.2) for Web services

More HTTP friendly (it uses MIME multipart); not originally conceived for the wireless world.

More like a communication protocol than a compression algorithm.

There appears to be no public implementation available; therefore, it is not known how well it performs with respect to our criteria (compression rate, processing overhead, semantic ambiguity).

Recent XML Compression Proposals (2): XMill

A compression algorithm from AT&T, particularly designed for XML

Step 1 (regrouping): separate structure, layout, and data, then distribute data elements into data streams (int, char, string, base64, etc.)

Step 2: use gzip, bzip2, etc., to compress these streams

XMill typically achieves a much better compression rate than conventional compressors such as gzip and bzip2 on XML data.

More processing overhead than gzip and bzip2 because of the extra “step 1”.

Unlike wbXML + Gzip, XMill needs to convert the compressed XACML back to XML for processing.
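The two steps can be sketched in a few lines of Python. This is only an illustration of the regrouping idea, not XMill's actual container format or algorithm: it groups text values by tag name, ignores attributes and mixed content, uses `zlib` as the back-end compressor, and has no decoder. The function names `regroup` and `compress_streams` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict


def regroup(xml_text: str):
    """Step 1: split a document into a structure stream and one data container per tag."""
    structure = []                  # tag open/close events, in document order
    containers = defaultdict(list)  # tag name -> text values found directly under that tag

    def walk(elem):
        structure.append("<" + elem.tag)
        if elem.text and elem.text.strip():
            containers[elem.tag].append(elem.text.strip())
        for child in elem:
            walk(child)
        structure.append(">")

    walk(ET.fromstring(xml_text))
    return structure, dict(containers)


def compress_streams(structure, containers):
    """Step 2: compress the structure stream and each container separately."""
    streams = {"#structure": "\n".join(structure)}
    streams.update({tag: "\n".join(values) for tag, values in containers.items()})
    return {name: zlib.compress(text.encode("utf-8")) for name, text in streams.items()}


doc = """<Policy>
  <Rule><Effect>Permit</Effect><Priority>1</Priority></Rule>
  <Rule><Effect>Deny</Effect><Priority>2</Priority></Rule>
</Policy>"""

structure, containers = regroup(doc)
compressed = compress_streams(structure, containers)
print(containers)  # values of the same kind now sit next to each other
```

Placing similar values side by side is what gives the back-end compressor more redundancy to exploit, at the price of the extra regrouping pass and of having to rebuild the XML before it can be processed.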

Conclusions and Future Directions

Suggested criteria for the use of XML-based policies in mobile devices

Reviewed and compared a variety of compression algorithms for XML

Concluded that {wbXML + transcode + Gzip} offers the best combination of compression rate and processing overhead of all algorithms tested

– This combination is recommended for use with XML-based security policies in mobile computing environments

Directions for further work

– Keep an eye on ASN.1 (will public implementations match theoretical results?)

– The compression rate of wbXML with transcode can be improved by adding more transcodes into the table (e.g., built-in function names, data type names, etc.). How much improvement can be gained?

– Experiments on XMill (perform more detailed comparison with wbXML to determine the best algorithm for this environment)

References

[1] Uche Ogbuji. “Tip: Compress XML files for efficient transmission”, IBM DeveloperWorks, 9 April, 2004

[2] M. Cokus, D. Winkowski. “XML Sizing and Compression Study For Military Wireless Data”, XML 2002 Proceedings by deepX

[3] “WAP Binary XML Content Format Specifications – Version 1.2”, http://www.wapforum.org/what/technical/PROP-WBXML-19990815.pdf

[4] ASN.1 Site - XML. “What ASN.1 Can Offer for XML?”, http://asn1.elibel.tm.fr/xml/ June, 2004

[5] ITU-T X.694. “Information Technology – ASN.1 encoding rules – Mapping W3C XML Schema Definitions Into ASN.1”, Jan, 2004

[6] Nokia. “Nokia Position Paper: W3C Workshop on Binary Interchange of XML Information Item Sets”, Aug, 2003, http://www.w3.org/2003/08/binary-interchange-workshop/02-Nokia-Position-Paper_02.htm

[7] P. Sandoz, et al. Sun Microsystems. “Fast Web Services”, July, 2003, W3C Workshop on Binary Interchange of XML Information Item Sets

[8] “Compressing XML”, http://www.devx.com/xml/article/16754/0/page/1

[9] M. Girardot, N. Sundaresan. “Millau, an encoding format for efficient representation and exchange of XML over the Web”, http://www9.org/w9cdrom/154/154.html

[10] “gzip - GNU Project - Free Software Foundation(FSF)”, http://www.gnu.org/software/gzip/gzip.html

[11] “Bzip2 for Windows”, http://gnuwin32.sourceforge.net/packages/bzip2.htm

[12] “kXML with wbXML support”, http://www.kxml.org

[13] “OSS Nokalva ASN.1/Pure Java Tools - Beta”, http://www.oss.com

[14] “Hyades – Automated Software Quality Evaluation Framework”, http://www.eclipse.org/hyades/

[15] “XMill - A User Configurable XML Processor”, http://sourceforge.net/projects/xmill

Questions