CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0....

44
CSE 331 Software Design & Implementation Winter 2021 Section 6 – HW6, BFS Path-Finding, and Parsing UW CSE 331 Winter 2021 1

Transcript of CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0....

Page 1: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

CSE 331Software Design & Implementation

Winter 2021Section 6 – HW6, BFS Path-Finding, and Parsing

UW CSE 331 Winter 2021 1

Page 2: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Administrivia

• Done with HW5-1!

• HW5-2 (ADT implementation) due today!– Reminder (1): Use a DEBUG flag to dial down an expensive checkRep– Reminder (2): Address feedback on your ADT design from HW5-1– Reminder (3): It’s ok to make some changes in the original design if

good reasons – and explain those

• HW6 due next Thursday

• Any questions?

UW CSE 331 Winter 2021 2

Page 3: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Agenda

• Overview of HW6

• Breadth-first search (BFS)

• Parsing a file in comma-separated-values (CSV) format– Very similar to tab-separated-values (TSV) format in HW6

• Test scripts and the new test driver

UW CSE 331 Winter 2021 3

Page 4: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

HW6: The MarvelPaths program

• You were the implementor but now are the client of your graph ADT!

• MarvelPaths is a command-line program you write to find how two Marvel characters are connected through comic book co-appearances

• Using a large dataset in tab-separated-values (TSV) format– Each entry is a particular appearance of a character in a comic book

• Dataset processed to initialize the social-network graph

• Main functionality is finding shortest path in this social network using BFS

UW CSE 331 Winter 2021 4

Page 5: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Outline of the assignment

0. Understand the dataset (marvel.tsv) and TSV format

1. Complete MarvelParser class to read TSV-formatted files

2. Implement graph initialization in MarvelPaths class

3. Implement path-finding via BFS in MarvelPaths class

4. Write suites of specification tests and of implementation tests• Implement MarvelTestDriver for new test-script commands

5. Write main method in MarvelPaths for command-line usage

UW CSE 331 Winter 2021 5

Page 6: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Outline of the assignment

0. Understand the dataset (marvel.tsv) and TSV format

1. Complete MarvelParser class to read TSV-formatted files

2. Implement graph initialization in MarvelPaths class

3. Implement path-finding via BFS in MarvelPaths class

4. Write suites of specification tests and of implementation tests• Implement MarvelTestDriver for new test-script commands

5. Write main method in MarvelPaths for command-line usage

UW CSE 331 Winter 2021 6

Page 7: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Breadth-first search

• Breadth-first search (BFS) is an algorithm for path-finding– Works just as well on directed and undirected graphs– Often used to discover connectivity in a graph

• Finds a path with the least number of edges– Recall that a path is a chain of edges, like ⟨a, b⟩, ⟨b, c⟩, ⟨c, d⟩– Ignores edge labels, so not used for weighted graphs

• Often mentioned alongside depth-first search (DFS)– BFS looks “wide” whereas DFS looks “deep”– DFS doesn’t promise to find the shortest path as measured by

number of edges

UW CSE 331 Winter 2021 7

Page 8: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

The BFS algorithm – first take

push start node onto a queue

while queue is not empty:pop node N off queueif N is goal node:

return trueelse:

for each node O in children of N:push O onto queue

return false

UW CSE 331 Winter 2021 8

Page 9: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a simple graph

UW CSE 331 Winter 2021 9

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C]pop C Q = [D]push D Q = [D, D]pop D Q = [D]pop D Q = [ ]return false

A

C D

B

start = Agoal = B

Page 10: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 10

push start Q = [A]pop A Q = [ ]push C Q = [C]pop C Q = [ ]push D Q = [D]pop D Q = [ ]push A Q = [A]INFINITE LOOP!

A

C D

B

start = Agoal = B

Page 11: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

The BFS algorithm

push start node onto a queuemark start node as visited

while queue is not empty:pop node N off queueif N is goal:

return trueelse:

for each node O that is child of N:if O is not marked visited:

mark node O as visitedpush O onto queue

return false

UW CSE 331 Winter 2021 11

Page 12: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 12

AB

C D

E

start = Agoal = B

Page 13: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 13

start = Agoal = B

push start Q = [A]

AB

C D

E

Page 14: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 14

start = Agoal = B

push start Q = [A]pop A Q = [ ]

AB

C D

E

Page 15: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 15

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]

AB

C D

E

Page 16: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 16

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C] A

B

C D

E

Page 17: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 17

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C]pop C Q = [D]

AB

C D

E

Page 18: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 18

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C]pop C Q = [D]

AB

C D

E

Page 19: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 19

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C]pop C Q = [D]

AB

C D

E

Page 20: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 20

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C]pop C Q = [D]push E Q = [E, D]

AB

C D

E

Page 21: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 21

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C]pop C Q = [D]push E Q = [E, D]pop D Q = [E]

AB

C D

E

Page 22: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 22

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C]pop C Q = [D]push E Q = [E, D]pop D Q = [E]

AB

C D

E

Page 23: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 23

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C]pop C Q = [D]push E Q = [E, D]pop D Q = [E]

AB

C D

E

Page 24: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 24

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C]pop C Q = [D]push E Q = [E, D]pop D Q = [E]pop E Q = [ ]

AB

C D

E

Page 25: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 25

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C]pop C Q = [D]push E Q = [E, D]pop D Q = [E]pop E Q = [ ]

AB

C D

E

Page 26: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS: example on a cyclic graph

UW CSE 331 Winter 2021 26

start = Agoal = B

push start Q = [A]pop A Q = [ ]push C Q = [C]push D Q = [D, C]pop C Q = [D]push E Q = [E, D]pop D Q = [E]pop E Q = [ ]return false

AB

C D

E

Page 27: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Your turn!

UW CSE 331 Winter 2021 27

Try running through the BFS algorithm on the worksheet.

Page 28: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

BFS Reminders

• BFS is done on a graph, not by (inside) the graph– This is why we have you create a MarvelPaths class!

• We will eventually want to allow other kinds of searches to be done on the graph, so BFS should not be hard-wired into the core Graph ADT

• Extensive/expensive checkRep can be very helpful during debugging– Use the debug flag to turn off expensive parts of checkRep

for testing/grading

UW CSE 331 Winter 2021 28

Page 29: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Outline of the assignment

0. Understand the dataset (marvel.tsv) and TSV format

1. Complete MarvelParser class to read TSV-formatted files

2. Implement graph initialization in MarvelPaths class

3. Implement path-finding via BFS in MarvelPaths class

4. Write suites of specification tests and of implementation tests• Implement MarvelTestDriver for new test-script commands

5. Write main method in MarvelPaths for command-line usage

UW CSE 331 Winter 2021 29

Page 30: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Reading in data

• Datasets are easily organized like a table or spreadsheet.– Each line is a row (i.e., entry) in the dataset– Special characters usually separate the columns (i.e., fields) of an entry – Note: fields can contain spaces

• One common data format: CSV (Comma-Separated Values)– Columns are separated by commas (‘,’)

• For HW6, we will be using data formatted as TSV (Tab-Separated Values)– Columns are separated by tabs (‘\t’)

UW CSE 331 Winter 2021 30

Page 31: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Structure of a CSV dataset

• First line of the CSV just names the fields of dataset entries.

• An example dataset in CSV format:

UW CSE 331 Winter 2021 31

name,emailKevin Zatloukal,[email protected]

Hal Perkins,[email protected]

Mike Ernst,[email protected]

Zachary Tatlock,[email protected] Grossman,[email protected]

Page 32: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Parsing datasets

• Since datasets are structured, we can interpret and parse the dataset programmatically.

• Existing Java libraries already do this! No need to reinvent the wheel.

• For this class, we will be using the library OpenCSV as a parser.

UW CSE 331 Winter 2021 32

Page 33: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Dataset Parsers

• OpenCSV needs to understand how your columns are structured to translate to Java code.

• Because rows have fixed columns, Java classes can be used to represent each row.– Each column is a field in the Java class

• This class is known as a JavaBean!

UW CSE 331 Winter 2021 33

Page 34: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

What is a JavaBean?

• A JavaBean is any class that…– has a public, zero-argument constructor– has several properties, i.e., private fields each with getter and setter

UW CSE 331 Winter 2021 34

Page 35: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Example bean

UW CSE 331 Winter 2021 35

public class UserModel {

private String name;

private String email;

public String getName() { return this.name; }

public void setName(String v) { this.name = v; }

public String getEmail() { return this.email; }

public void setEmail(String v) { this.email = v; }

}

name,emailKevin Zatloukal,[email protected] Perkins,[email protected] Ernst,[email protected] Tatlock,[email protected]

Page 36: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

public class UserModel {

@CsvBindByNameprivate String name;

@CsvBindByNameprivate String email;

public String getName() { return this.name; }

public void setName(String v) { this.name = v; }

public String getEmail() { return this.email; }

public void setEmail(String v) { this.email = v; }

}

Example bean (OpenCSV)

UW CSE 331 Winter 2021 36

name,emailKevin Zatloukal,[email protected] Perkins,[email protected] Ernst,[email protected] Tatlock,[email protected]

Page 37: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

public class UserModel {

@CsvBindByNameprivate String name;

@CsvBindByNameprivate String email;

public String getName() { return this.name; }

public void setName(String v) { this.name = v; }

public String getEmail() { return this.email; }

public void setEmail(String v) { this.email = v; }

}

Example bean (OpenCSV)

UW CSE 331 Winter 2021 37

name,emailKevin Zatloukal,[email protected] Perkins,[email protected] Ernst,[email protected] Tatlock,[email protected]

Helps OpenCSV identify field names that match

data column names

Page 38: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

From dataset to beans via OpenCSV

• OpenCSV converts each entry into an object of a chosen JavaBean class

• Returns an iterator to loop through each row of CSV!

UW CSE 331 Winter 2021 38

// see hw spec for details on getting the BufferedReaderReader reader = new BufferedReader(...);

Iterator<UserModel> csvUserIterator = new CsvToBeanBuilder<UserModel>(reader) // set input

.withType(UserModel.class) // set entry type

.withSeparator(',’) // , for CSV

.withIgnoreLeadingWhiteSpace(true)

.build() // returns a CsvToBean<UserModel>

.iterator();

Page 39: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Demo

UW CSE 331 Winter 2021 39

A quick walkthrough of the parser code for HW6.

Page 40: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Outline of the assignment

0. Understand the dataset (marvel.tsv) and TSV format

1. Complete MarvelParser class to read TSV-formatted files

2. Implement graph initialization in MarvelPaths class

3. Implement path-finding via BFS in MarvelPaths class

4. Write suites of specification tests and of implementation tests• Implement MarvelTestDriver for new test-script commands

5. Write main method in MarvelPaths for command-line usage

UW CSE 331 Winter 2021 40

Page 41: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Script testing in HW6

• Same test-script mechanism from HW5, but 2 new commands!– New command LoadGraph to read and initialize graph from TSV– New command FindPath to find shortest path in graph using BFS

• Must write the test driver (MarvelTestDriver) yourself– But you can copy most of it from GraphTestDriver in HW5

UW CSE 331 Winter 2021 41

Command (in foo.test) Output (in foo.expected)LoadGraph name file.tsv loaded graph name

FindPath graph node1 noden

path from node1 to noden:node1 to node2 via edge1,2node2 to node3 via edge2,3...noden-1 to noden via edgen-1,n

... ...

Page 42: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

LoadGraph and FindPath

• LoadGraph creates a new graph variable, much like CreateGraph– LoadGraph populates a graph with nodes and edges from dataset– Note: Other script commands (e.g., AddNode, AddEdge) can still

mutate the graph once it has been loaded!

• FindPath breaks ties by lexicographic (alphabetic) order– Necessary when there are multiple shortest paths so the test output will

be deterministic– Sorting should not be implemented in your Graph ADT.

Lexicographic order should be done in BFS algorithm.

• All this specified in detail on the homework’s webpage– You will need to read it to get things right :-)

UW CSE 331 Winter 2021 42

Page 43: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

Demo

UW CSE 331 Winter 2021 43

A quick walkthrough of the TestDriver code for HW6.

Page 44: CSE 331 Software Design & Implementation · UW CSE 331 Winter 2021 4 Outline of the assignment 0. Understand the dataset (marvel.tsv) and TSV format 1.Complete MarvelParserclass to

HW6 notes

• Read the assignment spec carefully!– Ensure that you are using the right file path in the right place to

read the data file• Most common reason for failures during grading is incorrect

file paths

• Helpful to test and debug using smaller datasets– Faster and easier to understand what’s going on

• To run MarvelPaths or any program that does console I/O, use gradlew to run the desired gradle target using the IntelliJ terminal window (console I/O doesn’t work right otherwise 🐞)

• When you are done, you will be able to find the shortest path from your command line!

UW CSE 331 Winter 2021 44