Java I18n and Unicode JaxJug lightning talk 4/15/09.

Java I18n and Unicode

JaxJug lightning talk 4/15/09

Java I18n   

• Native support for Unicode
• Localization
  o Create properties files
  o Locale object
  o ResourceBundles

• Other locale-dependent data, such as number and date formats
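A minimal sketch of the localization steps above, assuming a hypothetical bundle named `Messages` with a single `greeting` key. To stay self-contained, it writes the two properties files to a temporary directory and loads them through a `URLClassLoader`; a real application would normally ship them on the classpath. `ResourceBundle.Control.getNoFallbackControl` keeps the JVM's default locale out of the lookup, so the result does not depend on the machine it runs on. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.NumberFormat;
import java.util.Locale;
import java.util.ResourceBundle;

class LocalizedGreeting {
    // Writes hypothetical Messages.properties (base) and Messages_fr.properties files
    static Path writeBundles() {
        try {
            Path dir = Files.createTempDirectory("bundles");
            Files.write(dir.resolve("Messages.properties"),
                    "greeting=Hello\n".getBytes(StandardCharsets.ISO_8859_1));
            Files.write(dir.resolve("Messages_fr.properties"),
                    "greeting=Bonjour\n".getBytes(StandardCharsets.ISO_8859_1));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up "greeting" for the locale; a missing locale falls back to the base file only
    static String greeting(Path dir, Locale locale) {
        try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            ResourceBundle bundle = ResourceBundle.getBundle("Messages", locale, loader,
                    ResourceBundle.Control.getNoFallbackControl(
                            ResourceBundle.Control.FORMAT_PROPERTIES));
            return bundle.getString("greeting");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Other locale-dependent data: the same number formatted for a given locale
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        Path dir = writeBundles();
        System.out.println(greeting(dir, Locale.FRENCH));  // Bonjour
        System.out.println(greeting(dir, Locale.ENGLISH)); // Hello (there is no Messages_en file)
        System.out.println(formatNumber(1234567.891, Locale.US));      // 1,234,567.891
        System.out.println(formatNumber(1234567.891, Locale.GERMANY)); // 1.234.567,891
    }
}
```

The temporary directory is left in place for simplicity; the point is only that the lookup picks `Messages_fr` for French and the base file otherwise, while `NumberFormat` swaps grouping and decimal separators per locale.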

Issues

• But what about when you are exchanging data with another system?
  o Different encodings
  o Byte order: big-endian (BE) vs. little-endian (LE)

• String s = new String(buffer, "UTF8");   // decode bytes into a String
• s.getBytes("UTF8")                       // encode a String back into bytes
• import java.io.*;

  // Decode a message received as UTF-16 little-endian bytes
  public String parseMessage(byte[] buffer) {
      StringBuilder sb = new StringBuilder();
      try (BufferedReader br = new BufferedReader(
              new InputStreamReader(new ByteArrayInputStream(buffer), "UTF-16LE"))) {
          int ch;
          // Parenthesize the assignment before comparing with -1
          while ((ch = br.read()) != -1) {
              sb.append((char) ch);
          }
          return sb.toString();
      } catch (IOException e) {
          // Includes UnsupportedEncodingException for an unknown charset name
          e.printStackTrace();
          return null;
      }
  }
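To make the encoding and byte-order issues concrete, here is a small sketch (class and method names are illustrative) that prints the UTF-16BE and UTF-16LE bytes for the same text, decodes little-endian bytes with the wrong byte order, shows how a byte-order mark (BOM) lets the generic `UTF-16` decoder pick the right order, and reads UTF-8 bytes with the wrong encoding. It uses `java.nio.charset.StandardCharsets` (Java 7 and later) instead of charset-name strings, so no `UnsupportedEncodingException` handling is needed.

```java
import java.nio.charset.StandardCharsets;

class ByteOrderDemo {
    // Formats bytes as space-separated lowercase hex pairs
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Hi";
        byte[] be = text.getBytes(StandardCharsets.UTF_16BE);
        byte[] le = text.getBytes(StandardCharsets.UTF_16LE);
        System.out.println(hex(be)); // 00 48 00 69
        System.out.println(hex(le)); // 48 00 69 00

        // Reading LE bytes as BE swaps each byte pair: "Hi" becomes U+4800 U+6900
        String wrongOrder = new String(le, StandardCharsets.UTF_16BE);
        System.out.println(wrongOrder.equals("\u4800\u6900")); // true

        // With a BOM (ff fe means little-endian), the UTF-16 decoder detects the order
        byte[] withBom = { (byte) 0xff, (byte) 0xfe, 0x48, 0x00, 0x69, 0x00 };
        System.out.println(new String(withBom, StandardCharsets.UTF_16)); // Hi

        // A different encoding: the UTF-8 bytes for U+00F1 read as ISO-8859-1
        byte[] utf8 = "\u00F1".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8)); // c3 b1
        String mojibake = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(mojibake.equals("\u00C3\u00B1")); // true
    }
}
```

Neither mistake throws an exception: the decoder happily produces wrong characters, which is why both sides of a connection have to agree on the encoding and, for UTF-16, on the byte order.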

Converting Unicode

import java.io.*;

public class UnicodeFormatter {

    static public String byteToHex(byte b) {
        // Returns hex String representation of byte b
        char hexDigit[] = {
            '0', '1', '2', '3', '4', '5', '6', '7',
            '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
        };
        char[] array = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
        return new String(array);
    }

    static public String charToHex(char c) {
        // Returns hex String representation of char c
        byte hi = (byte) (c >>> 8);
        byte lo = (byte) (c & 0xff);
        return byteToHex(hi) + byteToHex(lo);
    }
}

(from http://java.sun.com/docs/books/tutorial/i18n/)
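`charToHex` works on a single `char`, which is one UTF-16 code unit. Characters outside the Basic Multilingual Plane, such as U+1F600, occupy two `char`s (a surrogate pair), so per-`char` output differs from per-code-point output. A rough sketch comparing the two, using `String.format` and `String.codePoints()` (Java 8 and later) instead of the tutorial's lookup table; the class and method names are illustrative.

```java
class CodeUnitsVsCodePoints {
    // Hex of each UTF-16 code unit (char), like calling charToHex on every char
    static String charsToHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Hex of each Unicode code point; a surrogate pair yields a single value
    static String codePointsToHex(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00F1\uD83D\uDE00"; // n with tilde, then U+1F600 as a surrogate pair
        System.out.println(charsToHex(s));      // 00f1 d83d de00
        System.out.println(codePointsToHex(s)); // U+00F1 U+1F600
    }
}
```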

Regular Expressions and Unicode

• What is a character?
  o ñ == \u00F1  OR  \u006E \u0303 (n followed by a combining tilde)

• Matching graphemes: . matches one code point, so \u006E \u0303 takes two
• Canonical equivalence
  o Pattern.compile(pattern, Pattern.CANON_EQ);
• http://www.regular-expressions.info/unicode.html
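A sketch of the regex points above, assuming Java 8 or later; the class and method names are illustrative. `\P{M}\p{M}*` (a non-mark code point followed by any combining marks) is a common approximation of a grapheme, and `java.text.Normalizer` is shown as an alternative to `CANON_EQ` that normalizes the text itself. The `CANON_EQ` check follows the direction of the example in the `Pattern` Javadoc: a decomposed pattern matching precomposed input.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class UnicodeRegex {
    static final String PRECOMPOSED = "\u00F1";      // n with tilde as one code point
    static final String DECOMPOSED = "\u006E\u0303"; // n followed by COMBINING TILDE

    // A base (non-mark) code point followed by any combining marks
    static final Pattern GRAPHEME = Pattern.compile("\\P{M}\\p{M}*");

    // "." matches exactly one code point (other than a line terminator)
    static boolean dotMatches(String s) {
        return Pattern.matches(".", s);
    }

    static boolean isOneGrapheme(String s) {
        return GRAPHEME.matcher(s).matches();
    }

    // With CANON_EQ, canonically equivalent sequences match each other
    static boolean canonMatches(String regex, String input) {
        return Pattern.compile(regex, Pattern.CANON_EQ).matcher(input).matches();
    }

    // Alternative: normalize the text to NFC (composed form) before comparing
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(PRECOMPOSED.equals(DECOMPOSED));        // false
        System.out.println(dotMatches(PRECOMPOSED));               // true
        System.out.println(dotMatches(DECOMPOSED));                // false
        System.out.println(isOneGrapheme(DECOMPOSED));             // true
        System.out.println(canonMatches(DECOMPOSED, PRECOMPOSED)); // true
        System.out.println(toNfc(DECOMPOSED).equals(PRECOMPOSED)); // true
    }
}
```

Java 9 and later also accept `\X` for an extended grapheme cluster, which covers more cases than the `\P{M}\p{M}*` approximation.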