Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control...

15
Intro to Lab 1 and SimpleDB Overview

Transcript of Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control...

Page 1: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

Intro to Lab 1 andSimpleDB Overview

Page 2: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

Labs

Use:

- Java for code

- github for version control

Advantages:- Java lets us focus on db internals- Github lets you submit as often as you like

Page 3: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

Java Overview

- All runs in Java Virtual Machine

- Everything an object or scalar

- JUnit tests

- Use Javadoc to navigate codebase

Page 4: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

Github

- Distributed version control

- Commit sets of changes - local

- Push changes to the server (github.com) – important!

Page 5: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

What is SimpleDB?• A basic database system• SQL Front-end (Provided for later labs)

– Heap files (Lab 1)– Buffer Pool (Labs 1-4)– Basic Operators (Labs 1 & 2)

– Scan, Filter, JOIN, Aggregate

– B-Tree Indexes (Lab 3)– Transactions (Lab 4)

• Javadoc is your friend!

Page 6: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

Module Diagram

Page 7: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

Database

Catalog=> List of DB tables

Singleton Database:

Database.getCatalog()

BufferPool=> Caches DB pages in memory

Database.getBufferPool()

LogFile(Ignore for Lab 1)

Database.getLogFile()

Page 8: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

Catalog

Table:

DbFile fileString nameString primary_key

DbFile ID Table

001 Table1

002 Table2

003 Table3

Catalog:

=> Stores a list of all tables in the database

Page 9: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

BufferPool

Buffer Pool:

Page:

PageId idTuple tuples[]Byte header[]

Page ID Page

001 Page1

003 Page3

007 Page7

=> Caches recently accessed database pages in memory

Page 10: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

HeapFile (Implements DbFile)

Heap File:

File fileTupleDesc tdDbFileIterator it

_______________

Field1 TypeField1 Name

Field2 TypeField2 Name

Field3 TypeField3 Name

File (on disk):

Tuple Descriptor:

Page1 Page2 Page3 …

Iterate through Tuples in Heap Pages:

Page 11: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

HeapPage (Implements Page)

Heap Page:

HeapPageId pidTupleDesc tdByte header[]Tuple tuples[]

Field1 TypeField1 Name

Field2 TypeField2 Name

Field3 TypeField3 Name

Tuple Descriptor:

01100110 11111111 11101101 …

Empty Tuple1 Tuple2 …

Field1 Field2 Field3 …

Header:

Tuples:

Tuple:

Fields and Tuples are Fixed Width!

Page 12: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

SeqScan(Implements DbIterator)

• DbIterator class implemented by all operators– open()– close()– getTupleDesc()– hasNext()– next()– rewind()

• Iterator model: chain iterators together– Use DbFileIterator from HeapFile

Page 13: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

// construct a 3-column table schema

Type types[] = new Type[]{ Type.INT_TYPE, Type.INT_TYPE, Type.INT_TYPE };

String names[] = new String[]{ "field0", "field1", "field2" };

TupleDesc descriptor = new TupleDesc(types, names);

// create the table, associate it with some_data_file.dat

// and tell the catalog about the schema of this table.

HeapFile table1 = new HeapFile(new File("some_data_file.dat"), descriptor);

Database.getCatalog().addTable(table1);

// construct the query: we use a simple SeqScan, which spoonfeeds

// tuples via its iterator.

TransactionId tid = new TransactionId();

SeqScan f = new SeqScan(tid, table1.id());

// and run it

f.open();

while (f.hasNext()) {

Tuple tup = f.next();

System.out.println(tup);

}

f.close();

Database.getBufferPool().transactionComplete();

Page 14: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

HeapFileEncoder.java

• Because you haven’t implemented insertTuple, you have no way to create data files

• HeapFileEncoder converts CSV files to HeapFiles

• Usage:– java -jar dist/simpledb.jar convert csv-file.txt numFields

• Produces a file csv-file.dat, that can be passed to HeapFile constructor.

Page 15: Intro to Lab 1 and SimpleDB Overview. Labs Use: - Java for code - github for version control Advantages: -Java lets us focus on db internals -Github lets.

Compiling, Testing, and Running

• Demo on running tests and debugging with Eclipse