Kudu and Rust
Transcript of Kudu and Rust
© Cloudera, Inc. All rights reserved. 1
Rust and Kudu
Dan Burkert
github.com/danburkert
dcb on Mozilla IRC channels
getkudu.io
Kudu Design Goals
• High throughput for big scans. Goal: within 2x of HDFS + Parquet
• Low latency for short accesses. Goal: 1 ms read/write on SSD
• Database-like semantics (initially single-row ACID)
• Relational data model
• SQL queries are easy
• “NoSQL” style scan/insert/update
• (Java, C++ and now Rust clients)
Using Kudu
• Table has a SQL-like schema
• Finite number of columns (unlike HBase/Cassandra)
• Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
• Some subset of columns makes up a possibly-composite primary key
• Flexible data distribution policies
• Fast ALTER TABLE
• “NoSQL” style API
• Insert(), Update(), Delete(), Scan()
• Integrations with higher level compute frameworks (Spark, Map/Reduce)
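As a rough illustration of "a finite set of typed columns, some of which form the primary key", the following Rust sketch models a schema and checks that every key column exists. All names here (`DataType`, `Column`, `Schema`, `tweets_schema`) are hypothetical and are not taken from the Kudu client libraries. The column types chosen for the tweets table, and the choice of `tweet_id` as its key, are assumptions made for the example.

```rust
/// The column types listed on the slide.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Bool, Int8, Int16, Int32, Int64, Float, Double, String, Binary, Timestamp,
}

/// One named, typed column (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub name: String,
    pub data_type: DataType,
}

/// A fixed set of columns plus the subset that forms the primary key.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    columns: Vec<Column>,
    primary_key: Vec<String>,
}

impl Schema {
    /// Build a schema, rejecting an empty key or a key column that is not declared.
    pub fn new(columns: Vec<Column>, primary_key: &[&str]) -> Result<Schema, String> {
        if primary_key.is_empty() {
            return Err("primary key must name at least one column".to_string());
        }
        for key in primary_key {
            if !columns.iter().any(|c| c.name == *key) {
                return Err(format!("primary key column not in schema: {}", key));
            }
        }
        let primary_key = primary_key.iter().map(|k| k.to_string()).collect();
        Ok(Schema { columns, primary_key })
    }

    /// The (possibly composite) primary key, in declaration order.
    pub fn primary_key(&self) -> &[String] {
        &self.primary_key
    }

    /// Look up the type of a column by name.
    pub fn column_type(&self, name: &str) -> Option<DataType> {
        self.columns.iter().find(|c| c.name == name).map(|c| c.data_type)
    }
}

/// Assumed schema for the tweets table used in the later slides.
pub fn tweets_schema() -> Result<Schema, String> {
    let column = |name: &str, data_type: DataType| Column { name: name.to_string(), data_type };
    Schema::new(
        vec![
            column("tweet_id", DataType::Int64),
            column("user_name", DataType::String),
            column("created_at", DataType::Timestamp),
            column("text", DataType::String),
        ],
        &["tweet_id"],
    )
}
```

Passing more than one name in `primary_key`, for example `&["user_name", "tweet_id"]`, would model a composite key in this sketch.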
Columnar Storage
Tweet_id: {25059873, 22309487, 23059861, 23010982}
User_name: {newsycbot, RideImpala, fastly, llvmorg}
Created_at: {1442865158, 1442828307, 1442865156, 1442865155}
text: {Visual exp…, Introducing .., Missing July…, LLVM 3.7….}
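To make the layout concrete, here is a minimal Rust sketch contrasting a row-oriented layout with the column-oriented one pictured on this slide. The `Tweet` and `TweetColumns` types and their methods are hypothetical, written only for illustration. They say nothing about Kudu's actual on-disk format. The sample values are the ones shown on the slide, including the truncated text.

```rust
/// Row-oriented layout: all fields of one tweet are stored together.
#[derive(Debug, Clone, PartialEq)]
pub struct Tweet {
    pub tweet_id: i64,
    pub user_name: String,
    pub created_at: i64,
    pub text: String,
}

/// Column-oriented layout: the values of each column are stored contiguously.
#[derive(Debug, Default)]
pub struct TweetColumns {
    pub tweet_id: Vec<i64>,
    pub user_name: Vec<String>,
    pub created_at: Vec<i64>,
    pub text: Vec<String>,
}

impl TweetColumns {
    /// Transpose a slice of rows into per-column vectors.
    pub fn from_rows(rows: &[Tweet]) -> TweetColumns {
        let mut cols = TweetColumns::default();
        for row in rows {
            cols.tweet_id.push(row.tweet_id);
            cols.user_name.push(row.user_name.clone());
            cols.created_at.push(row.created_at);
            cols.text.push(row.text.clone());
        }
        cols
    }

    /// A scan over a single column reads only that column's vector.
    pub fn newest_created_at(&self) -> Option<i64> {
        self.created_at.iter().copied().max()
    }
}

/// The four rows shown on the slide (text values are truncated there too).
pub fn sample_rows() -> Vec<Tweet> {
    let row = |tweet_id: i64, user_name: &str, created_at: i64, text: &str| Tweet {
        tweet_id,
        user_name: user_name.to_string(),
        created_at,
        text: text.to_string(),
    };
    vec![
        row(25059873, "newsycbot", 1442865158, "Visual exp…"),
        row(22309487, "RideImpala", 1442828307, "Introducing .."),
        row(23059861, "fastly", 1442865156, "Missing July…"),
        row(23010982, "llvmorg", 1442865155, "LLVM 3.7…."),
    ]
}
```

In this sketch, `newest_created_at` never touches `user_name` or `text`, which is the property that lets a columnar store answer scans over a few columns without reading the rest.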
Using Kudu from Rust
• Experimental client library: github.com/danburkert/kudu-rs
• Depends on a new C client library: github.com/danburkert/kudu/tree/c-api
• Goal is to merge both the Rust and C clients into the Kudu project
Sample API
struct PartialRow<'a> { .. }

impl<'a> PartialRow<'a> {
    pub fn set<T>(&mut self, column_name: &str, value: T) -> Result<()>
        where T: ColumnType<'a> { .. }

    pub fn set_copy<'b, T>(&mut self, column_name: &str, value: T) -> Result<()>
        where T: VarLengthColumnType<'b> { .. }

    pub fn get<T>(&'a self, column_name: &str) -> Result<T>
        where T: ColumnType<'a> { .. }
}

// with impls for bool, i{8, 16, 32, 64}, f{32, 64}, SystemTime, &str, &[u8]
trait ColumnType<'a> { .. }

// with impls for &str, &[u8]
trait VarLengthColumnType<'a> { .. }
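The pattern on this slide, a generic `set`/`get` pair bounded by a trait implemented for each supported Rust type, can be tried out with a small self-contained toy. The sketch below is not the kudu-rs implementation. `Value`, `ToyRow` and this simplified `ColumnType` are hypothetical, the toy copies values instead of borrowing them (so it has no lifetime parameters), and it reports errors as plain strings.

```rust
use std::collections::HashMap;

/// A dynamically typed cell; only three of the slide's types are modeled.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Int64(i64),
    Text(String),
}

/// Simplified stand-in for the slide's `ColumnType`: converts a Rust type
/// to and from a stored cell.
pub trait ColumnType: Sized {
    fn into_value(self) -> Value;
    fn from_value(value: &Value) -> Option<Self>;
}

impl ColumnType for bool {
    fn into_value(self) -> Value { Value::Bool(self) }
    fn from_value(value: &Value) -> Option<Self> {
        match value { Value::Bool(b) => Some(*b), _ => None }
    }
}

impl ColumnType for i64 {
    fn into_value(self) -> Value { Value::Int64(self) }
    fn from_value(value: &Value) -> Option<Self> {
        match value { Value::Int64(i) => Some(*i), _ => None }
    }
}

impl ColumnType for String {
    fn into_value(self) -> Value { Value::Text(self) }
    fn from_value(value: &Value) -> Option<Self> {
        match value { Value::Text(s) => Some(s.clone()), _ => None }
    }
}

/// Toy row keyed by column name (hypothetical, for illustration only).
#[derive(Debug, Default)]
pub struct ToyRow {
    cells: HashMap<String, Value>,
}

impl ToyRow {
    pub fn new() -> ToyRow {
        ToyRow::default()
    }

    /// Store a value; the trait bound restricts `T` to supported types at compile time.
    pub fn set<T: ColumnType>(&mut self, column_name: &str, value: T) {
        self.cells.insert(column_name.to_string(), value.into_value());
    }

    /// Read a value back; fails if the column is unset or holds another type.
    pub fn get<T: ColumnType>(&self, column_name: &str) -> Result<T, String> {
        let value = self
            .cells
            .get(column_name)
            .ok_or_else(|| format!("column not set: {}", column_name))?;
        T::from_value(value).ok_or_else(|| format!("type mismatch for column: {}", column_name))
    }
}
```

With this toy, `row.set("tweet_id", 2344242i64)` compiles, while passing a type with no `ColumnType` impl is rejected by the compiler. Asking for the wrong type with `row.get::<bool>("tweet_id")` is caught at run time and returns an error.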
Sample Application: KuduSQL
• A SQL-like shell for Kudu
• Supports limited CRUD and DDL functionality on Kudu tables
• Designed for interactive usage with tab completion, error reporting
• Depends on many community libraries (chrono, term, docopt, others)
• github.com/danburkert/kudusql
Read User Input → Parse Command → Execute Command
INSERT INTO tweets (tweet_id, user_name, created_at, text)
VALUES (2344242, "rustlang", 2016-03-21T17:29:42Z, "Please welcome erickt to the core team!");
Read User Input → Parse Command → Execute Command
Command::Insert {
    table: "tweets",
    columns: vec!["tweet_id", "user_name", "created_at", "text"],
    row: vec![
        Literal::Int(2344242),
        Literal::String(Cow::Borrowed("rustlang")),
        Literal::Timestamp(2016-03-21T17:29:42Z),
        Literal::String(Cow::Borrowed("Please…")),
    ],
}
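The parse step turns each token of the statement into a `Literal`. The sketch below shows one way that could look for a single token. It is not the KuduSQL parser: `parse_literal` is a hypothetical helper, this `Literal` models only booleans, integers and double-quoted strings, and timestamps are left out because parsing them would need a date library such as chrono.

```rust
use std::borrow::Cow;

/// Cut-down version of the slide's `Literal` (no timestamp variant).
#[derive(Debug, Clone, PartialEq)]
pub enum Literal<'a> {
    Bool(bool),
    Int(i64),
    String(Cow<'a, str>),
}

/// Parse one token into a literal, borrowing string contents from the input.
/// Returns `None` for anything this sketch does not understand.
pub fn parse_literal(token: &str) -> Option<Literal<'_>> {
    let token = token.trim();

    // Double-quoted string: borrow the text between the quotes without copying.
    if token.len() >= 2 && token.starts_with('"') && token.ends_with('"') {
        return Some(Literal::String(Cow::Borrowed(&token[1..token.len() - 1])));
    }

    match token {
        "true" => return Some(Literal::Bool(true)),
        "false" => return Some(Literal::Bool(false)),
        _ => {}
    }

    // Fall back to a 64-bit integer; anything else is rejected.
    token.parse::<i64>().ok().map(Literal::Int)
}
```

Using `Cow::Borrowed` here mirrors the slide: the string literal points into the user's input line and is not copied. Escape sequences inside strings are not handled in this sketch.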
Read User Input → Parse Command → Execute Command
fn insert(client: &mut kudu::Client,
          table_name: &str,
          columns: Vec<&str>,
          row: Vec<Literal>) -> kudu::Result<()> {
    let table = try!(client.open_table(table_name));
    let mut session = client.new_session();
    let mut insert = table.new_insert();

    for (column, literal) in columns.iter().zip(row) {
        // Propagate any error from setting the cell.
        try!(match literal {
            Literal::Bool(b) => insert.row().set(column, b),
            ..
            Literal::String(cow) => insert.row().set(column, &cow[..]),
        });
    }

    try!(session.insert(insert));
    session.flush()
}
When to Use Kudu
• Big, constantly growing and updating datasets (1TiB+)
• Sequential access to many rows per query
• Fast hardware
• Takes full advantage of SSDs, NVRAM