Teaching with a Visual Tree of Life

Denise Green and Rebecca Shapley
School of Information Management and Systems, U.C. Berkeley

May 13, 2005
Final Report


Denise Green and Rebecca Shapley 2


Table of Contents

Executive Summary
Introduction
    What is the Tree of Life?
    What is Phylogenetic Systematics?
    The Nature of Tree of Life Information
        An Index: Bringing it Together in One Place
        A Taxonomy: Describing What is Out There
        A Map in Tree-space: Tree-dimensional Relationships
        Characteristics of the Tree of Life Data Set
Goals of our Study
Methods
    User Interviews
        Informational Interviews
        Teaching Observation and Document Analysis
        Exploratory-Comparative Usability Evaluations
    Teacher Survey
        Participants
        Methods
Results
    User Interview Findings
        The Educational Context
        Valuable Educational Messages
        What Should a Tree of Life Visualization Provide?
        Typical Tasks
        Responses to Existing Visualizations
    Survey Findings
Discussion
    Serving the Educational Context
        Is There Room to Teach the Tree of Life before College?
        Barriers to Teaching the New Biology
    Teaching Tree-Thinking
        Tackling the Misconception of Progress
        Using the Tree Metaphor Carefully
        Nature of Science: Comparing Alternative Hypotheses
        Nature of Science: A Tree with its own History
    Making the Application Usable in the Classroom
        An Index to an Information Space
        Bridging: Start Where People ARE, Mentally and in Tree-space
        User-Centered Design
Conclusion
Acknowledgements
References
Appendix
    Interview Consent Form
    Introduction to the Survey
    Survey Results


Executive Summary

The extensive data set of evolutionary relationships between organisms is often referred to as the Tree of Life. Providing an effective visual interface for teachers and students to interact with this data set requires bringing together work from many disciplines: computer science, information visualization, human-computer interaction, information classification and retrieval, systematic biology, and evolution education. Through interviews and a survey with teachers and professionals in the field of biology we have developed key recommendations for designing and evaluating the efficacy of existing and future tree-structured data browsers.

The Tree of Life can be seen in many different contexts: as an index to a biological information space, as a taxonomy, and as a map showing relationships between organisms. Our project falls within the realm of larger efforts to uncover and visualize the branching relationships between organisms. The goals of our project include:

• Understanding how teachers currently teach about the evolutionary relationships between organisms
• Identifying common teaching tasks involving the Tree of Life
• Determining how the future software might facilitate those tasks

Our work was conducted as part of the outreach efforts of the NSF CIPRES project, and culminated in a set of recommendations for a future Tree of Life web application.

We used several assessment techniques to determine teachers’ needs for an interactive Tree of Life application. First, we conducted interviews with educators, graduate students, and professionals in the field of biology. Then we observed teachers at two public workshops about the Tree of Life. Finally, we used what we learned in our interviews and teaching observations to conduct exploratory-comparative usability evaluations and a user survey.

Our findings about a Tree of Life application can be categorized into several broad themes:

• The educational context in which the application exists
• Valuable educational messages that the application must help teachers convey
• Desired features of a Tree of Life application

A future Tree of Life application must support the educational context in which it is used, and must allow teachers to convey the educational messages that they are required to teach. Respondents to our survey rated the following features of a future Tree of Life application as most important:

• Zooming in to any part of the tree
• Seeing areas of controversy
• Viewing the relationship of divergence events to geological time
• Seeing the distribution of important character states
• Bookmarking particular branches on the tree
• Accessing geographic distributions of groups of organisms
• Viewing the distribution of biological patterns across the tree

We created a series of recommendations to address the themes our interviews and survey brought to light. The following lists group our recommendations under these themes.

Serving the Educational Context

• Develop supporting curriculum: Select and highlight examples that teach concepts

Teaching Tree-Thinking

• Support tree-logical manipulations with a branch highlighting feature
• Highlight the clade descended from an internal node
• Support tree-logical manipulations with a node-flipping feature
• Use interactivity to demonstrate the variety of alternative interpretations for branch lengths
• Reinforce biodiversity with a Wow! Button
• Provide interactive tree comparison tools
• Show evidence for the tree: descriptions, character states, and synapomorphies
• Provide a date slider
• Highlight changes when they happen
• Provide version history tools
• Connect particular topologies to source literature
• Provide canned examples describing major shifts
• Decide whether the visualization should indicate the support available for a particular branching

Making the Application Usable in the Classroom

• Provide a radial, rank-free map of biological information space
• Create good “map” software to allow simple views at any scale
• Support focus-plus-context, using index nodes selected by systematists
• Provide for simplification of views and details-on-demand through excellent support of branch manipulations
• Include pictures of organisms
• Include pictures of characters
• Describe groups in multiple ways on the diagram
• Develop search for the common user
• Develop, bookmark, and distribute useful views
• Provide interpretation guidance in context
• Display labels clearly, without overlapping other elements
• Develop and consistently use presentation-quality labeling conventions
• Integrate User-Centered design into the development process
• Support “undo,” “back,” or a history list for views and manipulations
• Minimize the number of conventions users must learn
• Animate changes between states
• Use pre-attentive encoding
• Support exploration by achieving benchmark system response times
• Make display resizable
• Ensure color contrast
• Provide for magnification of text sizes
• Provide manipulation alternatives
• Support screen-readers

Over the course of our project we learned a great deal about the challenges of developing a scientific database for use by teachers and students in the middle school through graduate school biology classroom. Important future work remains in defining the details of implementing our recommendations within an actual application and testing the student learning outcomes that result from using it in the classroom.


Introduction

The extensive data set of evolutionary relationships between organisms is often referred to as the Tree of Life. Providing an effective visual interface for teachers and students to interact with this data set requires bringing together work from many disciplines: computer science, information visualization, human-computer interaction, information classification and retrieval, systematic biology, and evolution education. Through interviews and a survey with teachers and professionals in the field of biology we have developed key recommendations for designing and evaluating the efficacy of existing and future tree-structured data browsers.

We conducted a needs assessment of teachers as part of the outreach efforts of the NSF-funded CIPRES project [1]. The CIPRES project is a collaboration between computer scientists and biologists to construct a dataset that represents the evolutionary relationships between all organisms on earth. An eventual goal of the CIPRES project is to create a web-based application that provides access to this dataset. Like many NSF-funded projects, CIPRES has a goal of making sure that this application is useful and usable, not just to scientists, but also to teachers and the public.

Our goal was to apply needs assessment and usability techniques to determine the needs of teachers for this future web application, and to make recommendations to the CIPRES project and other related projects. We chose to focus on teachers who were teaching about the evolutionary relationships between organisms, because they already had an understanding of the subject area, and we hoped they could provide useful suggestions and ideas for our investigation.

What is the Tree of Life?

For many students of the natural world, excitement comes in seeing the underlying patterns. Yet even today, centuries after natural historians began classifying organisms into species, genera, classes and kingdoms, biology still feels like an impossibly big bag of facts to memorize. Darwin’s description of the diverging tree of life, spreading across Earth over time through variation and natural selection, laid the groundwork for organizing biological information according to the evolutionary relationships between organisms. This organization brings biological patterns to light.

Classifying the estimated 17 million existing types of organisms today is challenging, but reconstructing the inherent index to biological information from 3.5 billion years of sometimes inaccessible history is considerably more challenging. However, decades of developing phylogenetic theory, laboratory techniques, and computational power are now paying off. Biological systematists across the country are bringing their pieces of the puzzle together to assemble an information infrastructure equivalent in scope to the Human Genome Project, but for organisms: the Tree of Life.

[1] www.phylo.org

The National Science Foundation’s Assembling the Tree of Life (AToL) initiative [2] will continue for several years, but already the renovations to our understanding of the evolutionary relationships between living things will surprise you. Last century the old Plants and Animals biology was replaced with the Five Kingdoms, but biologists today favor the Three Domains of Life: Archaea, Bacteria, and Eukaryotes. Many things that photosynthesize are no longer considered plants, and the type of event that gave us mitochondria is now thought to have happened many times in the history of life. Birds are now included among the reptiles, and flowering plants are complex green algae. Our own little spot among the vast number of tree tips representing living taxa [3] shows that, like all land animals, we are essentially very strange fish. Tree-thinking, or applying the evolutionary relationships between organisms to solve biological problems, is increasingly important in medicine and public health, agriculture, conservation, and many other professions. Once patterns are perceived, predictions and good science can follow rapidly.

What does assembling this huge dataset of the evolutionary relationships between organisms achieve for us? Where does it exist, and how can you check what it says? We benefit indirectly from the insights the AToL initiative generates. This information enables biologists to do their work more effectively. However, natural history information has a wider audience among the public and students, and we may benefit directly from having natural history information on the World Wide Web organized more usefully. If the AToL initiative results in a tool that can be used to coherently present one of biology’s most significant patterns, then to the extent that it permeates the teaching of biology, the AToL initiative can improve biological education in this country, which is no small feat, and sorely needed.

As a part of the AToL initiative, the CIPRES project brings together computer scientists and biological systematists to improve the tools available for the work by developing better algorithms and computational platforms capable of handling relationships between millions of organisms. These extremely large datasets must also be visualized and shared. Our project seeks to integrate information from teachers, biologists, and computer scientists to make feature recommendations for an educationally-valuable online resource where scientists share this tree of life information.

[2] http://www.nsf.gov/od/lpa/news/02/pr0294.htm
[3] taxon, n. (pl. taxa): Any named group of organisms, not necessarily a clade; a taxon may be designated by a Latin name or by a letter, number, or any other symbol. For this and more definitions of terms used in phylogenetics, see the UCMP Glossary at http://www.ucmp.berkeley.edu/glossary/gloss1phylo.html

What is Phylogenetic Systematics?

Since before Linnaeus, biological systematists have been interested in classifying the diversity of biological forms. In the last few decades, phylogenetics has brought an explicitly evolutionary perspective to biological classification. Phylogenetic classification is defined in the following way by the University of California Museum of Paleontology (UCMP):

A system of classification that names groups of organisms according to their evolutionary history. Like Linnaean classification, phylogenetic classification produces a nested hierarchy where an organism is assigned a series of names that more and more specifically locate it within the hierarchy. However, unlike Linnaean classification, phylogenetic classification only names clades [4] and does not assign ranks to hierarchical levels.

Phylogenetic classification involves the careful use of molecular, morphological and other types of characters present in today’s organisms to infer the evolutionary relationships between them. Two organisms that share a character may share a more recent common ancestor with each other than either does with organisms that lack it. Biological systematists have recently made considerable progress in developing the theories for making these inferences, known as phylogenetic trees, and in applying these inferences to important problems such as fighting disease, protecting crops and optimizing conservation decisions. UCMP’s Understanding Evolution website provides an excellent introduction to phylogenetics and to how scientists develop phylogenetic trees. [5]

[4] clade, n.: A monophyletic taxon; a group of organisms which includes the most recent common ancestor of all of its members and all of the descendants of that most recent common ancestor. From the Greek word "klados", meaning branch or twig. Definition from the UCMP Glossary at http://www.ucmp.berkeley.edu/glossary/gloss1phylo.html
[5] http://evolution.berkeley.edu/evosite/evohome.html

Figure 1. Character states, monophyly and paraphyly. A diagram used to teach a brief overview of phylogenetics. The colored dot represents a character which can be found in species A through E in one of three states: white, grey, or black. All five species are thought to have shared a common ancestor with the white character state, which is still present in species A. An ancestral form developed the grey version of the character, and species B and C still have it. A more recent ancestral form developed the black character state, which species D and E still have. A name that includes D and E is monophyletic, as is a name that includes B, C, D, and E, or all five of the species. Any of these names would describe all and only the groups on an entire branch that is broken cleanly off of this tree. A name for just species B and C, however, would not be monophyletic; it would be paraphyletic. Breaking B and C off the tree would either give two branches, or include other species in the group.
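The "broken cleanly off the tree" test in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-tuple encoding and the function names are ours, and the ladder-shaped topology (A,(B,(C,(D,E)))) is one tree consistent with the caption, not necessarily the exact diagram.

```python
# Toy tree for the five species in Figure 1, written as nested tuples.
# ASSUMPTION: the topology (A,(B,(C,(D,E)))) is one tree consistent with the
# caption; the original diagram may place B and C differently.
TREE = ("A", ("B", ("C", ("D", "E"))))


def leaves(node):
    """Return the set of species names at the tips below `node`."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Yield the tip set of every branch in the tree, single tips included."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """A group is monophyletic if it is exactly the tip set of one branch."""
    return frozenset(group) in set(clades(tree))


for group in (["D", "E"], ["B", "C", "D", "E"], ["A", "B", "C", "D", "E"], ["B", "C"]):
    print(sorted(group), is_monophyletic(TREE, group))
```

Under this assumed topology, the groups the caption calls monophyletic each match a single branch, while the group of just B and C matches none.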

The Nature of Tree of Life Information

How do you index the information space that is the entire field of biology? Conceptually, the tree of life information is an index to biological space, a taxonomy of terms for describing organisms, and a special type of information space that describes the relationships between members.

An Index: Bringing it Together in One Place

With each paper published, biological systematists contribute a little piece of the tree puzzle. Traditionally papers are organized by journal and publisher within libraries, not by their biological meaning. As people read these papers, and as they interact with field guides and other sources of natural history information, they become familiar with various classification terms and other associated biological information. A distributed, shared mental indexing system has emerged over time, with perhaps the most centralized and biologically oriented embodiment of it occurring in museum collection drawers at institutions around the world. No one knows all of it, no one person or committee is in charge of all of it. The index changes as our knowledge accumulates, as biological systematists continue to contribute new study results to it and as life continues to evolve.

An online resource is needed to pull all of the work together in one place so the biologically meaningful relationships can be seen. The assembled Tree of Life consists of biological systematists' compiled best phylogenetic hypotheses. It offers a biologically meaningful index to organize, access, and shape the interpretation of all biological information. To lay out the entire combined evolutionary relationships between organisms is no trivial task. The Tree of Life web project [6] attempts to do just that, providing tools and coordination within which biological systematists take responsibility for editing and updating a section of the tree. Each AToL grant focuses on a particular question, such as birds [7], fungi [8], flies [9] or green plants [10], and is also charged with sharing the combined results of the work of participating scientists. The Green Tree of Life [11] does a nice job of this for green plants. The next version of the Jepson Manual from the University and Jepson Herbarium in California, whose taxa descriptions are the mainstay of botany in California, will include an index explicitly based on phylogenetic trees.

A Taxonomy: Describing What is Out There

Unlike most efforts to create a taxonomy for classifying information, there is reason to believe that there is a “true” tree describing the mostly diverging, but sometimes reconnecting, relationships of organisms as they have evolved through the entire history of life on earth. However, it can’t be definitively known how close biological systematists are to describing it.

The effort biologists put into agreeing upon the terms and relationships of the Tree of Life provides a controlled vocabulary and classification system. Because the enterprise is continually attempting to improve the Tree of Life's representation of the divergence patterns that actually occurred during evolution, the controlled vocabulary cannot be immutable. But it does serve important purposes for many fields. From the perspective of information science, information retrieval is supported by navigating the taxonomy. A search query for biological information can be created and refined by moving around in the taxonomy tree, selecting more explicit or more inclusive categories.

[6] www.tolweb.org
[7] Early Bird, http://www.fieldmuseum.org/research_collections/zoology/zoo_sites/early_bird/intro.html
[8] Assembling the Fungal Tree of Life, http://ocid.nacse.org/research/aftol/
[9] FLYTREE, http://www.inhs.uiuc.edu/cee/FLYTREE/
[10] The Green Tree of Life, http://ucjeps.berkeley.edu/TreeofLife/
[11] http://ucjeps.berkeley.edu/TreeofLife/

When developing a classification system to organize books in a library, or links on a website, the librarians and web developers can test the category names with the people who will use the information. Users can expect to understand what is included within a well-selected category name. However, biological taxonomy is trying to organize information created by evolution, not by humans for humans to understand. Bio-latin names may not communicate category contents readily to many people navigating the taxonomy. This challenge is one of many that make information retrieval with the Tree of Life intriguing.

Another challenge is the unique approach to query refinement suggested by crafting queries based on tree-space, that is, simply going up or down the branching tree to make a search more inclusive or more specific. But unlike the Art and Architecture taxonomy [12], the Tree of Life taxonomy is not human-friendly: its size, scale, distribution, and dimensions are not optimized for human consumption. Like WordNet [13] or Cyc [14], this is a major information infrastructure project. Unlike these two projects, the structure used to organize the information is not created after the fact, and does not reflect the many ways in which human culture creates cross-influences and meanings. Rather, this information space is grounded in the major theory that organizes the entire field of biology: evolution. This structure and the information it organizes were not originally made by humans for human consumption, like books in a library, but come from "out there" in the world.
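Query refinement by moving up or down the tree can be illustrated with a small Python sketch. The taxonomy fragment, the child-to-parent encoding, and the function names `broaden`, `narrow`, and `scope` are all hypothetical and chosen only for illustration; they do not describe any existing Tree of Life software.

```python
# Toy fragment of a taxonomy stored as a child -> parent map.
# ASSUMPTION: the names and links are illustrative only, loosely following
# groups mentioned in this report (for example, birds placed among reptiles).
PARENT = {
    "Eukaryotes": "Life",
    "Bacteria": "Life",
    "Archaea": "Life",
    "Animals": "Eukaryotes",
    "Green plants": "Eukaryotes",
    "Reptiles": "Animals",
    "Birds": "Reptiles",
}


def broaden(taxon):
    """Move one step toward the root: a more inclusive query (None at the root)."""
    return PARENT.get(taxon)


def narrow(taxon):
    """List the next more specific categories directly below `taxon`."""
    return sorted(child for child, parent in PARENT.items() if parent == taxon)


def scope(taxon):
    """Return every category a query on `taxon` would include."""
    found = [taxon]
    for child in narrow(taxon):
        found.extend(scope(child))
    return found


print(broaden("Birds"))   # one step more inclusive
print(narrow("Life"))     # the next more specific choices
print(scope("Animals"))   # everything covered by a query on Animals
```

The point of the sketch is that the user never types a new query string; each refinement is a single move along a branch.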

A Map in Tree-space: Tree-dimensional Relationships

Like a map, locations in this "space" have specific relationships to each other. These relationships can be deduced or described from the existing information, without requiring new information.

The space itself provides a geometry for relating biological information. Instead of the Euclidean geometry of planes and coordinates, tree-space has a geometry of branches and nodes. Some parts may even connect back to each other to make a directed acyclic graph (DAG). Like a map, the space can be viewed at different scales for different purposes (focus), these views must be related to each other in the viewers' heads (context), and each cartographer chooses to emphasize different features (locations have variable significance). Because these are two different spatial geometries, displaying tree-geometry information in 2-D Euclidean space can create misconceptions. Branching structures must be displayed in some order upon the plane, an order that is not inherent to the tree geometry. Nesting displays also create proximity in 2-D Euclidean space that isn't present in the tree-geometry. If museums, curricula, TV shows, and Google could map their content to parts of the visualized tree-space, people would have a sense of which parts of the tree they "know" and, just as one might want to visit a continent one hasn't been to before, might become curious about other parts of the tree-space.

[12] http://www.getty.edu/vow/AATHierarchy?find=&logic=AND&note=&english=N&subjectid=300000000
[13] http://wordnet.princeton.edu/
[14] http://www.opencyc.org/
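The claim that display order is not inherent to the tree geometry can be checked with a short Python sketch. The five-tip tree, its nested-tuple encoding, and the function names are assumptions made for illustration: flipping the children at every node changes the left-to-right order of the tips on the page, but leaves the set of branches unchanged.

```python
# ASSUMPTION: a small example tree encoded as nested tuples; the species
# labels are placeholders, not data from the Tree of Life.
TREE = ("A", ("B", ("C", ("D", "E"))))


def tip_order(node):
    """Left-to-right order in which the tips would be drawn on the plane."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node for tip in tip_order(child)]


def clade_sets(node):
    """The set of tip sets, one per branch; this is the tree's actual content."""
    if isinstance(node, str):
        return {frozenset([node])}
    result = {frozenset(tip_order(node))}
    for child in node:
        result |= clade_sets(child)
    return result


def flip_all(node):
    """Reverse the drawing order of the children at every internal node."""
    if isinstance(node, str):
        return node
    return tuple(flip_all(child) for child in reversed(node))


flipped = flip_all(TREE)
print(tip_order(TREE), tip_order(flipped))
print(clade_sets(TREE) == clade_sets(flipped))
```

Two drawings with different tip orders can therefore encode identical relationships, which is why reading meaning into which tips sit next to each other on the page is a misconception.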

Characteristics of the Tree of Life Data Set

The evolutionary relationships between organisms can be represented as a diverging tree structure, which is a special case of a directed acyclic graph (DAG). The representation may need to move from a strict tree toward a more general graph as future work advances the understanding, modeling, discovery and representation of hybridization and symbiosis events in evolution.

The tree of life data set is a very large tree graph with many, many nodes, and a very deep structure. In combination, these factors make the tree of life data set distinct from many tree-structured data sets studied in the literature (Kobsa 2004, Parr et al. 2003, Plaisant et al. 2002). Currently, a data set downloaded from the Tree of Life web project [15] contains around 20,000 graph nodes. Downloading the data structure used at NCBI/GenBank, which is not moderated in any fashion, gives a tree of approximately 180,000 nodes. The press release announcing the Assembling the Tree of Life initiative [16] indicates, however, that there are 1.7 million species known to biologists, and that these represent only about 10% of the species on earth. Not only does this suggest that the ultimate number of leaf nodes [17] on the ideal tree of life data structure might be on the order of 17 million, but also that the total number of nodes in the data structure will be much higher.

The branching factor of the Tree of Life data is generally much lower than other trees studied. In fact, the ideally resolved phylogenetic tree would always have a branching factor of two, as more than two children of any given node indicates that the evolutionary relationship is not yet determined. This low branching factor influences the shape of the tree; compared to many classification and file structure trees, the Tree of Life data structure is very deep.
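The arithmetic behind these size and depth estimates can be sketched as follows. The sketch assumes a rooted, fully resolved tree (branching factor of two at every internal node), for which n leaves imply n - 1 internal nodes; the branching factor of 10 used for comparison is a hypothetical stand-in for a broad classification or file tree, not a figure from the literature cited above.

```python
# Figures quoted in the text: 1.7 million known species, about 10% of the total.
KNOWN_SPECIES = 1_700_000
PERCENT_KNOWN = 10

estimated_leaves = KNOWN_SPECIES * 100 // PERCENT_KNOWN


def total_nodes_binary(n_leaves):
    """Nodes in a rooted, fully resolved binary tree: n leaves + (n - 1) internal."""
    return 2 * n_leaves - 1


def min_depth(n_leaves, branching):
    """Fewest levels needed to hold n_leaves if every node has `branching` children."""
    depth, capacity = 0, 1
    while capacity < n_leaves:
        capacity *= branching
        depth += 1
    return depth


print(estimated_leaves)
print(total_nodes_binary(estimated_leaves))
print(min_depth(estimated_leaves, 2))    # best case for a binary tree
print(min_depth(estimated_leaves, 10))   # hypothetical broad tree, for comparison
```

The depth computed here is a lower bound for a perfectly balanced tree; real phylogenies are unbalanced, so their deepest paths would be longer.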

[15] http://www.tolweb.org/tree/
[16] http://www.nsf.gov/od/lpa/news/02/pr0294.htm
[17] Leaf nodes are the terminal tips of a branching tree structure.

Goals of our Study

The goal of the CIPRES project’s outreach efforts is to create a web application that will provide access to the compiled Tree of Life data. This future web application should visualize the evolutionary relationships, allow users to interact with and explore the information, and be useful not just to biologists, but also to people teaching about the Tree of Life, their students, and the general public.

The goal for our project was to provide recommendations about how this future application can be useful to teachers and students. We set out to understand how biology teachers teach the Tree of Life and how tree visualization software can be designed to aid them. We chose to focus on biology teachers because the challenge of bringing real science research into the classroom attracted us. By drawing on their existing knowledge of teaching about evolution and biodiversity and using several tree visualizations to simulate the ways in which the Tree of Life data might be displayed in the web application, we were able to conduct an effective needs analysis. This strategy helped us overcome the difficulty of trying to determine the usefulness of a web application that doesn’t yet exist.

Specific goals for our project included:

• Understanding how teachers currently teach about the evolutionary relationships between organisms.
• Identifying common teaching tasks involving the Tree of Life.
• Determining how the future software might facilitate those tasks.

Because our project involved developing recommendations for an entirely novel type of application, to be used for teaching a topic that many biology teachers aren’t sure belongs in their curriculum yet, we needed to identify participants with an interest in technology and an active eye on the future of biology teaching. We assumed that instructors of graduate and undergraduate biology courses who are themselves biological systematists are likely to be early adopters of new ideas in and resources for teaching biology. They are more likely to use direct biological information to supplement or problematize summaries or examples presented in textbooks and curricula. We asked these early-adopting biology teachers about their practices and priorities for using the tree of life in their classrooms.

To ensure that our scope was not too narrow, we also talked with middle school and high school teachers, and professionals in the field of biology. The key criterion in identifying subjects for our study was that they work with the evolutionary relationships between organisms, either by directly developing this information, teaching it with a critical approach to existing materials, or developing curriculum around it.

Although we looked at specific instances of tree visualization applications, it was not our goal to analyze them in great detail. We expect these applications will change and be enhanced in the coming years. Rather than tying our findings to existing applications, we sought to identify themes and trends that will prove useful to future development of a Tree of Life web application.

The result of our work is a set of recommendations to the CIPRES project for the eventual web application. We hope these recommendations will influence the future directions of the CIPRES outreach tools, the Tree of Life web project,18 and efforts in human-computer interaction and biodiversity informatics at the University of Maryland.19

18 www.tolweb.org
19 http://www.cs.umd.edu/hcil/biodiversity/


Methods

We used several assessment techniques to determine teachers’ needs for an interactive Tree of Life application. First, we conducted interviews with educators, graduate students, and professionals in the field of biology. Then we observed teachers at two public workshops about the Tree of Life. Finally, we used what we learned in our interviews and teaching observations to conduct exploratory-comparative usability evaluations and a user survey.

The following sections describe the methods we used in our informational interviews, teaching observations, and exploratory-comparative usability evaluations.

User Interviews

Informational Interviews

At the start of our project, we interviewed members of our target audience to determine their current practices and possible future uses of interactive visualizations. We conducted interviews in a wide variety of settings, from more informal interviews of biology and computer science professors and graduate students at the CIPRES meeting in San Diego, to more formal interviews of people in their offices, classrooms, and the SIMS lab.

Initially, the scope of our target audience for this needs assessment was quite large, including systematic biologists, professional biologists, teachers, students, and the general public. As we started to talk with people, we realized the target audience was too broad. For our study we could learn the most from people who already interact with information about the evolutionary relationships between organisms.

The National Science Foundation wants the basic science research to have an impact on science education. The challenge is bridging the gap between developing the science information and using it effectively in the classroom. This challenge attracted us, and we decided to focus our efforts on developing recommendations for features of an interactive Tree of Life application that would increase its chances of being used in the science classroom.

Goals

In the informational interviews, we set out to characterize the current practices of our target audiences for the Tree of Life application. This understanding of our audiences and their typical and possible future tasks with tree of life information would inform the structure of the later stages of our work.


Participants

The following list shows the number of people we talked to from each of our target audience groups:

• 6 middle school and high school biology teachers
• 7 professors and graduate students in biology, phylogenetics, and computer science
• 2 experts in evolution education
• 1 instructor and about 30 natural history enthusiasts (in an informal focus group as part of a Tree of Life workshop at the Jepson Herbarium)
• 4 people involved with educating the public about natural history and evolution

We found participants through three approaches. First, we identified members of our target audiences from among our personal and professional connections. Second, we attended three events attended by our target audiences, where we interviewed opportunistically and personally recruited participants for follow-up scheduled interviews. Third, we asked interview participants to recommend other appropriate interviewees, an approach known as snowball sampling.

Participants were screened to make sure they were members of our target audience. We also strove to understand the extent of their working relationship with biological classification, information about the evolutionary relationships between organisms, and explicit Tree of Life websites or projects. Participants were developing this information, using it in their own work, developing curriculum and exhibits for teaching it to others, and teaching it to others. As we narrowed our project to focus on teachers, we emphasized talking with participants developing curriculum and teaching about the evolutionary relationships between organisms.

Methods

Most interviews were conducted in person and two were conducted by phone. We usually interviewed one person at a time, but occasionally talked with two or more people at once. For opportunistic interviews, we wrote down notes afterwards. In scheduled interviews, one of us typically asked the questions, and one of us took notes. We also audio-taped the interviews when possible.

At the start of our project we conducted informal interviews of biology and computer science professors and graduate students at a meeting for the CIPRES project. We asked the following types of questions:

• What are the key messages you are trying to teach about the Tree of Life?
• What audiences do you teach for?
• What are students’ misconceptions of the Tree of Life?
• How might a visualization of the tree help address those misconceptions?
• What features of such a visualization would you use in your work?
• What parts of the tree are people most interested in?


After these discussions, we conducted interviews with teachers and curriculum developers who are experts in the teaching of evolution.

We also attended two public workshops about current thinking in the structure of the tree of life. These workshops were attended by teachers, former teachers, docents at natural history organizations, professional biologists, and natural history enthusiasts. At the workshops, we recruited participants, conducted opportunistic interviews, and at one, led a short focus group. After the morning’s lecture on plants and the tree of life, participants saw the slides in Figure 2, and responded to the following questions:

• What was your most recent question about how organisms are related?
• What did you do to find an answer?
• How did you use the answer?
• Do you share information with other people about how organisms are related (teaching, mentoring, publishing)?
• Describe the last time you shared.
• Did you use any books, websites, diagrams, curriculum, or other artifacts to help you share this information?


Figure 2. Introduction slides used for the focus group


Teaching Observation and Document Analysis

To help us understand current teaching practices, existing visual representations, and the context within which the tree of life application might be used, we observed teaching and reviewed the curriculum, slides, handouts and other materials used.

Methods

We observed the teaching at two public workshops on Tree of Life information. We also reviewed an online curriculum activity developed by the UC Museum of Paleontology called “What did T. Rex taste like?”; handouts and materials developed by our exploratory-comparative usability evaluation interviewees (see below); and two textbooks, the BSCS Biology, an Ecological Approach, 7th edition, and Campbell and Reece (2005), 8th edition.

Exploratory-Comparative Usability Evaluations

After completing our initial interviews, we conducted exploratory-comparative usability evaluations, in which we showed teachers several visualizations of Tree of Life data, and we solicited their feedback on the usefulness of these applications. Our usability evaluations combined elements from both exploratory and comparison usability tests as described by Jeffrey Rubin.20

Rubin describes an exploratory test as being “conducted quite early in the development cycle, when a product is still in the preliminary stages of being defined and designed.”21 He suggests some typical questions that the exploratory test can answer:

• What do users conceive and think about using the product?
• Does the product’s basic functionality have value to the user?
• Are the operations and navigation of the user interface intuitive?

A key outcome of exploratory tests is a deeper understanding of the users of the product.

Rubin states that a comparative test “can be used to compare several radically different interface styles…to see which has the greatest potential with the proposed target population…The comparison test is typically used to establish which design is easier to use or learn, or to better understand the advantages and disadvantages of different designs.”22 Several competing designs are usually evaluated by users in these tests.

Our usability evaluations blended both approaches. We had participants compare several existing visualizations in an informal manner, but our goals were more exploratory in nature, hoping to uncover users’ needs and make recommendations for an entirely novel type of web application. We did not intend to conduct detailed usability evaluations of the particular interfaces; however, using existing visualizations allowed us to make our discussions of software features more concrete. For several informative detailed usability evaluations of tree visualization applications, see studies by Alfred Kobsa (2004) at the University of California, Irvine, and by Cynthia Parr et al. (2003) and Catherine Plaisant et al. (2002) at the University of Maryland.

20 Rubin, Handbook of Usability Testing
21 Rubin, p. 31
22 Rubin, p. 40-41

Participants

We conducted our usability evaluations with six individuals. Two participants were high school teachers, two were middle school teachers who develop curriculum about the relationships between organisms, and two worked in a variety of roles including substitute teacher, natural history docent, and director of education for a Bay Area zoo.

To conduct these sessions in person, we limited our usability evaluations to teachers who were close geographically.

Methods

We conducted our exploratory-comparative usability evaluations in three phases: an initial interview, an exploration of several tree visualization applications, and a questionnaire.

Phase 1: Initial Interview

We started each session with an interview to determine how teachers currently teach about the evolutionary relationships between organisms. We tried to find out what concepts the teachers are trying to convey, and how they use books, diagrams, and any other artifacts in teaching these concepts. From these discussions we worked together to identify a task from their experience that would provide a structured way to explore the software. Finally, before looking at any software, we asked them to imagine how interactive visualization software might be helpful in teaching the Tree of Life.

Phase 2: Exploring Visualizations

In the second stage of the evaluation, we showed teachers three different tree visualization applications containing the Tree of Life data. We initially operated the keyboard and mouse and “drove” the applications, so that the teachers did not have to overcome the initial learning curve of using them. If the teachers became comfortable with the way the applications worked, we let them take over and explore the applications themselves. We encouraged them to talk aloud as they explored the applications. We asked them to reflect on using the visualizations to teach the concepts we had identified in the interview phase, and to describe any other ways they might use the applications. We evaluated how well each application allowed us to accomplish the task that we had identified during the interview phase. We also encouraged them to suggest any ideas for how the visualizations could be made more useful to them and to their students. As we explored the software, we noted any usability issues we encountered, as well as the participant’s reactions and comments.


Phase 3: Questionnaire

After we explored the software, we finished each session by having the teachers complete our survey questionnaire that asked about their current teaching practices and what features might be important to them in an interactive visualization of the Tree of Life.

Interactive Tree-Visualization Applications

We evaluated four interactive tree-visualization applications:

• Hyperbolic Tree
• SpaceTree
• TaxonTree
• Treemap

Hyperbolic Tree

For our tests we used the Hyperbolic Tree and green plants data displayed on the Jepson Herbarium Green Tree of Life website.23

Figure 3. Hyperbolic Tree from the Green Tree of Life website

Hyperbolic trees provide a dynamic representation of a hierarchical tree structure, allowing users to quickly traverse through large sections of the tree.

23 http://ucjeps.berkeley.edu/TreeofLife/hyperbolic.php


SpaceTree

SpaceTree is a tree browsing application created by the Human Computer Interaction Laboratory (HCIL) at the University of Maryland.24 It uses a conventional node-link diagram format, but it adds dynamic rescaling of branches to best fit the available screen space. Icons of triangles and linked squares provide a preview for the size of unexpanded branches.

Figure 4. SpaceTree from HCIL at the University of Maryland

24 http://www.cs.umd.edu/hcil/spacetree/


TaxonTree

TaxonTree is another tree browsing application created by HCIL at the University of Maryland.25 It extends SpaceTree, adding a number of features to support biodiversity data.26 For example, it provides links to external web pages that describe the organisms in detail. Users can search for both common and scientific names, and common names are displayed prominently in the interface. The interaction model has been changed, and a simpler way of navigating the tree has been introduced, in which a user can open and close nodes manually, instead of the automatic opening and closing of nodes in SpaceTree.

Figure 5. TaxonTree from HCIL at the University of Maryland

25 http://www.cs.umd.edu/hcil/biodiversity/
26 TaxonTree: Visualizing Biodiversity Information


Treemap

Treemap is another visualization of hierarchical tree structures from HCIL at the University of Maryland.27 Rather than a branching node-link display, it shows a set-based, nesting approach to hierarchical data. We thought this perspective might be helpful for showing attributes of leaf nodes, and for showing patterns across the tree.

Figure 6. Treemap from HCIL at the University of Maryland

27 http://www.cs.umd.edu/hcil/treemap/


Figure 7. Treemap’s origins. This figure from the book that founded today's theories of phylogenetics (Hennig 1966) shows the equivalency between a branching representation of the evolutionary relationships between groups and a nesting, set-based representation, such as that used by Treemap (Figure 6).
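The equivalence between a branching tree and a nested, set-based view can be illustrated with a small sketch. The toy tree and the function names below are hypothetical and are not taken from Hennig's figure or from Treemap; the sketch only shows that each branch point corresponds to the set of tips below it, and that any two such sets are either nested or disjoint.

```python
# Hypothetical toy tree: a clade is (name, [children]); a tip has no children.
toy_tree = ("root", [
    ("A", []),
    ("BC", [("B", []), ("C", [])]),
])

def clade_sets(clade):
    """Return (tips of this clade, list of tip sets for each branch point)."""
    name, children = clade
    if not children:
        return frozenset([name]), []
    members = frozenset()
    sets = []
    for child in children:
        child_members, child_sets = clade_sets(child)
        members |= child_members
        sets.extend(child_sets)
    sets.append(members)  # the set that corresponds to this branch point
    return members, sets

def is_nested(sets):
    """True if every pair of sets is either nested or disjoint."""
    return all(a <= b or b <= a or not (a & b) for a in sets for b in sets)

all_tips, branch_sets = clade_sets(toy_tree)
print(sorted(sorted(s) for s in branch_sets))
print(is_nested(branch_sets))
```

Drawing each of these sets as a box inside its enclosing set gives the nesting view; drawing each as a fork gives the branching view.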

Dataset

For our dataset we used XML data from the Tree of Life web project,28 which contains extensive data about all three domains of life. We created XSLT transformations to convert the XML data into the different formats required by SpaceTree and Treemap. For the Hyperbolic Tree, we used the data provided with the application on the Green Tree of Life website. For TaxonTree, we used the data on animals that comes with the application, because we were not able to convert our XML data into the required Microsoft Access database format in our short timeframe.
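The conversion described above was done with XSLT. The sketch below is not the authors' stylesheet: it uses Python's standard library instead, and the element names (TREE, NODE, NAME, NODES) and the sample data are assumptions made for illustration, not a confirmed description of the Tree of Life XML format. It shows the general idea of walking a nested XML tree and emitting it in another layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the element names are assumed, not the real schema.
SAMPLE_XML = """
<TREE>
  <NODE>
    <NAME>Life on Earth</NAME>
    <NODES>
      <NODE><NAME>Eubacteria</NAME></NODE>
      <NODE>
        <NAME>Eukaryotes</NAME>
        <NODES>
          <NODE><NAME>Animals</NAME></NODE>
          <NODE><NAME>Fungi</NAME></NODE>
        </NODES>
      </NODE>
    </NODES>
  </NODE>
</TREE>
"""

def node_to_dict(node):
    """Convert one NODE element and its descendants into a nested dict."""
    name = (node.findtext("NAME") or "").strip()
    children = [node_to_dict(child) for child in node.findall("NODES/NODE")]
    return {"name": name, "children": children}

def to_indented(tree, depth=0):
    """Flatten the nested dict into indented lines, one per node."""
    lines = ["  " * depth + tree["name"]]
    for child in tree["children"]:
        lines.extend(to_indented(child, depth + 1))
    return lines

root_node = ET.fromstring(SAMPLE_XML.strip()).find("NODE")
life_tree = node_to_dict(root_node)
print("\n".join(to_indented(life_tree)))
```

A different target format would need only a different output function in place of `to_indented`; the intermediate nested dict stays the same.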

Teacher Survey

We identified many representative tasks and features through our interviews and usability evaluations with biologists and teachers. We then conducted an online survey. The goals of the survey were to

• Characterize early-adopting biology teachers’ current use of extracurricular resources, including web resources, as an indicator of likely levels of adoption among teachers of a new online teaching resource
• Gather early-adopting biology teachers’ priorities for potential content and behaviors of an online resource.

28 http://www.tolweb.org/tree/home.pages/downloadtree.html

Well aware of the siren-call of a long list of features, we hoped to distinguish a short list of core features for the online web application that are particularly suited to use in teaching.

Participants

Our target group for the survey was early-adopting teachers of audiences approximately 12 years of age and up who were currently using or were interested in using the evolutionary relationships between organisms to organize the biological information they teach. The teaching may occur in formal or informal educational settings.

We identified the Bay Area BioSystematists (BABS) group as being a very appropriate audience for our survey. Although not everyone in the group was a teacher, the BABS group was very rich with professors and graduate students highly interested in this subject and likely to be early adopters. This group represented college instructors well, and to reach middle school and high school teachers we had our exploratory-comparative usability evaluation interview participants answer the questionnaire.

Methods

Before launching the survey, we pre-tested it on three people. We asked each person to “think aloud” as they took the survey and to comment on any questions or terminology that they didn’t understand. This gave us thorough feedback on the questions, but did not give us a sense of how long it would take people to complete the survey in a normal setting.

The majority of comments about the survey fell into two broad categories: sentence structure and terminology. Our sentence structure and wording were overly complex on many questions, causing confusion in some cases. Additionally, some of the terminology we used was too specialized. We reworded the survey extensively to reflect these comments. We also tested our questions for their potential to provide meaningful results, and revised the survey accordingly.

We distributed an invitation to the survey website to members of the BABS e-mail list. Additionally, we used a paper version of the survey as a post-interview questionnaire for the final phase of our exploratory-comparative usability evaluations, and entered the results online. For our complete survey questions, see the Appendix.


Results

User Interview Findings

The following sections discuss the categories of comments and observations made by participants in informational interviews and in the interview phase of the exploratory-comparative usability evaluations.

The Educational Context

Many findings centered on the educational context in which teachers teach about the evolutionary relationships between organisms.

A rich teaching context

Speaking with teachers and professors added complexity to our notion of what it means to “teach about the tree of life.” We were struck by the very rich context of concepts in biology and evolution within which the structure of the tree of life might become a relevant topic.

In a narrow sense, teaching about the evolutionary relationships between particular groups of organisms appears to occur in Biology departments at the university level, where upper-level undergraduate Biology majors and new Biology graduate students take “diversity” courses, such as Plant Diversity or Vertebrate Zoology. When an entire college semester can be spent learning about the characteristics of different groups, the topology of evolutionary relationships can be taught and even serve to organize the progress of the course.

At the middle school, high school, AP biology, and undergraduate introductory biology levels, however, the number of different concepts that need to be covered essentially relegates “diversity” to a class or two, or a week or two. As one AP biology teacher put it, “that leaves about 10 minutes per Phylum.” In this case, teachers reported striving to leave their students with an appreciation of diversity and classification. They want students to understand the general concept that groups are characterized by various types of morphological or molecular characters. They don’t have time to hold their students accountable to learn these characters in a meaningful way, to help students appreciate the nested nature of these groupings, or to cover topics like how biological systematists actually develop phylogenetic hypotheses. In the time available to teach about evolution, they focus on students’ understanding of the mechanisms of evolution, not the topology of the actual tree of life. We developed a sense that in general, teachers teach a lot of biological facts and concepts; what’s easily missing is thinking about the relationships between the organisms and the concepts.

Integrating changes in biology into grades 6-12 teaching practices

A public school substitute biology teacher in our interviews described her limited success at introducing tree-thinking into her own teaching. As a substitute, she teaches what the regular teacher has set out for that day. She likes to introduce a little something about relationships when she can. Although she’s brought events like the Tree of Life workshops to the attention of her fellow teachers, time pressures and other factors contribute to a lack of response. As she described how little time teachers have, and their lack of availability to respond except to the things they find most compelling, we came to appreciate even more the participants who had found the time for our interviews.

To prepare lessons and keep up with the changes in biology, teachers in our study mentioned using textbooks for the grades above theirs, websites, and discussions with colleagues. A few took advantage of workshops offered at natural history institutions, similar to the ones we observed.

Examples for Teachers

To teach the subjects established for their courses by state standards and university curricula, teachers need example-based curriculum materials that help them teach specific concepts. To support the improvement of evolution education, the University of California Museum of Paleontology (UCMP)29 has developed a conceptual framework for teaching evolution and understanding what concepts students need to know at each grade level.30 Good examples of curricular materials correlated with this conceptual framework are UCMP’s “What did T. Rex taste like?” and “The Arthropod Story.”31

Teachers don’t teach data, and students don’t learn data. Students learn concepts from interacting in structured ways with selected facts. To be useful in the classroom, we gathered from our early-adopting teachers that a tree of life interactive visualization application would need to offer features they can use to teach concepts to their students. They also need materials that provide students with guidance for exploring the tree, such as a series of questions that they could answer using the Tree of Life. Even better, teachers at the middle and high school level would like someone to select from among all of the information offered by the scientists’ database specific examples that are great for demonstrating the concepts the standards require. Otherwise, they may demonstrate the Tree of Life interactive visualization application to their students as an example of a tool scientists use, but only if there is time.

• “This is scientific information by scientists for scientists.” (A general comment about the tree visualizations using the Tree of Life data)

29 http://www.ucmp.berkeley.edu/education/explorations/tours/Trex/index.html
30 http://evolution.berkeley.edu/evosite/Lessons/IIConcepts.php
31 http://evolution.berkeley.edu/evolibrary/article/0_0_0/arthropods_toc_01


Valuable Educational Messages

Key Teaching Messages

The key messages that teachers, professors and natural history educators said they would like to communicate with a Tree of Life resource include both topics in biology and how scientists do science. Here are some specific examples of the messages teachers mentioned they are trying to convey.

Messages about the nature of science, including understanding

• What scientific theories are
• How biologists do phylogenetics, and why computers can help
• That phylogenies are hypotheses
• How to classify things
• What paleontologists do; what’s the nature of their work

Messages about evolution and diversity, including understanding

• The scientific basis for evolution
• Evolution as the unifying theme of biology
• Why the tree isn’t showing “progress”
• The humbling message that we are just one in so many different little tips of the tree; we are just one among 1.7 million known species, and we are not “advanced” or “special;” help people understand the diversity of life
• Meet the controversy between evolutionists and creationists head-on

Messages about the structure of the tree, including understanding

• The distinction between a classification tree and a phylogenetic tree
• That different tree structures give different results when they are used to answer other biological questions, and published trees should be taken with a grain of salt
• The name for key nodes, and what characters support that grouping, what the characters look like and what ecology or other natural history is also associated with that grouping

Alternatives, Evidence, and the Nature of Science

Almost everyone we talked to wanted to be able to see alternative views of the tree’s structure when they exist, and to understand what lines of evidence – morphological, biochemical, molecular, biogeographical – support which alternatives. Our initial assumption was that although biological systematists and their graduate students need to know where disagreements are, most audiences would want systematists to agree on something for everyone else to use. However, teachers and natural history enthusiasts alike emphasized that they want to see the different alternatives that systematists take seriously. One curriculum development expert expressed that this is good pedagogy: by examining the two trees to see how they differ, students are required to engage with the structure of a tree and to test their understandings of the tree diagrams in ways not required by looking at a single tree.

Figure 8. Two alternative tree structures for the invertebrate groups, from Campbell’s 8th edition Biology textbook used by an interviewee in an AP Biology course. The brown and light green colors indicate groups that are monophyletic in the tree structure shown on the left, but are paraphyletic under the tree structure on the right. The names of the groups across the tops of the diagrams indicate the clusters of characters being used as evidence for the structures.
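How a group can be monophyletic under one tree but not under another can be shown with a minimal sketch. The two trees and the taxa A to D below are hypothetical and do not reproduce the textbook figure. A group is treated as monophyletic when it equals the complete set of tips below some node; the sketch does not distinguish paraphyletic from polyphyletic groups.

```python
# Trees are nested tuples; a tip is a string. Both trees are hypothetical.
tree_left = (("A", "B"), ("C", "D"))
tree_right = ("A", ("B", ("C", "D")))

def leaf_set(tree):
    """Return the set of tips below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def all_clades(tree):
    """Yield the tip set of every node in the tree, tips included."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from all_clades(child)

def is_monophyletic(tree, group):
    """True if the group is exactly the set of tips below some node."""
    return frozenset(group) in set(all_clades(tree))

print(is_monophyletic(tree_left, {"A", "B"}))
print(is_monophyletic(tree_right, {"A", "B"}))
```

In the second tree the most recent common ancestor of A and B is the root, whose descendants also include C and D, so the group {A, B} leaves out some descendants of its own common ancestor.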

What do these teachers mean by “evidence”? An AP biology teacher cited a description he liked from a Tree of Life web project page32 about cnidarians and ctenophores. Previous hypotheses held that these two groups were closely related because they both have stinging cells. The new hypothesis of two less closely related groups is based on discovering that one group doesn’t make stinging cells itself, but actually eats members of the other group and re-uses the stinging cells. An expert in curriculum for teaching would like to see an interactive application facilitate students’ doing historical inference of the character states a common ancestor may have had, based on the character states of the known taxa and the tree topology. One biology professor organizes his graduate-level diversity course on insects using two alternative trees. He shows his students how some characters that define a group on one tree (a synapomorphy) become paraphyletic33 on the other structure, which is supported by different lines of evidence.
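One standard way to infer the character states a common ancestor may have had from tip states and tree topology is parsimony. The sketch below implements only the bottom-up pass of Fitch's parsimony algorithm on a strictly bifurcating tree. The interviewees did not name a method, so this choice is an assumption, and the tree, taxa, and states are hypothetical.

```python
# A tip is a string; an internal node is a pair (left, right). Hypothetical data.
example_tree = (("A", "B"), ("C", "D"))
tip_states = {"A": "present", "B": "present", "C": "present", "D": "absent"}

def fitch_sets(tree, states):
    """Bottom-up Fitch pass: return the candidate state set at this node."""
    if isinstance(tree, str):
        return {states[tree]}
    left, right = (fitch_sets(child, states) for child in tree)
    shared = left & right
    # Keep the shared states if any exist; otherwise keep all candidates.
    return shared if shared else left | right

print(fitch_sets(example_tree, tip_states))
```

Here the ancestor of C and D is ambiguous between the two states, but combining it with the ancestor of A and B narrows the root to a single candidate state.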

32 http://tolweb.org/tree?group=Cnidaria&contgroup=Animals
33 paraphyletic -- Term applied to a group of organisms which includes the most recent common ancestor of all of its members, but not all of the descendants of that most recent common ancestor. Definition from the UCMP Glossary at http://www.ucmp.berkeley.edu/glossary/gloss1phylo.html


Many people mentioned that it is important to recognize trees as hypotheses about evolutionary relationships, and that presenting the evidence for these hypotheses would help students understand the nature of science, and how scientists do science. The nature of science is an important theme in the California state science education standards. People we asked said they would like to be able to compare trees side by side, rather than as overlapping trees or two different structures to the left and right side of the same list of taxa.

• “The real point isn’t how these things are related, but why do we think they are related in this way.”
• “That’s the most important thing…I want my students to ask, why? How do we know that?”

Connecting with the History of the Tree

Interviewees at the workshops mentioned a desire to update their understanding of the relationships between major groups of organisms, having encountered indications that their current understanding was out of date. They learned about the higher level classification anywhere from five years ago to decades ago, when, for example, anything that photosynthesized was in “plants”, or perhaps they learned the five kingdoms: Bacteria, Protists, Fungi, Plants, and Animals. Interviewees had often tried to use various resources to understand the new biology, but found themselves still confused. They would like to see a resource that shows how biologists’ former understandings translate to the current understanding, much as the workshop itself did.

Absent Rank Names are confusing

Similarly, people needed more information about why they don’t see ranks indicated on the new tree of life. Without a clear understanding of the relationship between the new phylogenetic trees and the existing classification systems, or why some biological systematists would like to get rid of ranks altogether, some teachers perceive the absence of ranks as unscientific. Anecdotal evidence from the Tree of Life web project indicates that teachers are looking for the Kingdom-Phylum-Class-Order-Family-Genus-Species ranks among the node names on the tree. Two teachers in our interviews were happy poking around in the classification tree names in TaxonTree, but expressed relief when the ranks appeared on rolling over node labels.

Misconceptions abound

Teachers, students and the public bring lay concepts or outdated biological concepts to the study of evolution and tree diagrams. These concepts often cause misinterpretation of tree diagrams, which are developed using tree-thinking or tree logic.

Some misconceptions identified by our interviewees:

• Every organism is striving to become something else
• Humans are the apex of evolution’s progress
• Advanced species evolved from primitive species living today
• Vertebrates are diverse
• Breathing oxygen is typical

Humans Within Biodiversity

One ubiquitous misconception is the notion of evolution as progress towards humans. Our humanness provides the perspective from which we examine the tree of life. Middle school teachers we interviewed spoke of the pedagogical appropriateness of starting with what middle school students are familiar with, and therefore starting with the location of humans, primates, and mammals on the tree of life. A high school teacher teaches the concepts of evolution using human evolution as a primary example, in order to tackle the controversy about human evolution directly and encourage his students to come to their own positions. Other interviewees emphasized the diversity of life. A biological systematist and natural history museum director felt visualizations of the tree of life should seek to leave people with the humbling impression that we are only one out of the millions of species. We sensed both the importance of learners experiencing a connection with our own location on the tree and experiencing all the other types of organisms located on there with us.

• “I’d like them to have an underlying feeling of the unity of life with this marvelous diversity…to internalize that the carrots and we are cousins!”

Phylogenetic Diagrams May Perpetuate Notions of Advanced and Primitive Taxa

Phylogenetics uses diagrams called cladograms that simply illustrate the hypothesized connections between groups, without regard to geologic time. Systematists also distinguish between character states, which can be primitive or advanced depending on how much they are thought to have changed, and extant taxa, which cannot be primitive or advanced, because they are all currently modern and have been evolving for the same period of time.

However, the fineness of these distinctions is easily lost on the public, who start from the misconception that “advanced” living taxa evolved from “primitive” living taxa. During our teaching observation, the professor shared an anecdote about discovering the sister group to the flowering plants. A journalist popularizing the discovery wanted to call the plant “the first flowering plant” or “a primitive flowering plant.” Actually, the new plant is a descendant of the first flowering plant, sharing that first flowering plant as a common ancestor with the rest of the flowering plants. Extant species can share a common ancestor, as in the case of this new plant species and all flowering plants, but they can’t have evolved from each other. An evolution education expert shared a more common example of this misconception: that humans evolved from chimps. Evolutionary biologists are always careful to state that humans and chimps share a common ancestor, a statement with subtleties that get lost without a tree-thinking perspective.

All extant species are equally modern, so none can be advanced or primitive. Some organisms may have changed less over time, providing us with better information about what a common ancestor may have been like, but these species are not primitive, and are not themselves that ancestral type. These are examples of tree thinking or tree logic that are difficult to convey to the public, and that the public and students don’t have when they interpret tree diagrams.

Conventions used in cladograms cause confusion between extant and fossil taxa. The tendency to read the y-axis in a bottom-to-top tree diagram as a time scale suggests that all taxa across the top are modern and extant. In cladograms such as Figure 13, however, biological systematists tell us that “you only put the taxon at the node if you actually, really know that a particular fossil taxa IS the common ancestor.” Extant taxa will never be internal nodes on a cladogram. Fossil taxa can be internal nodes, but they are commonly placed at the top or side with the extant taxa, because knowing for sure that a fossil taxon represents a common ancestor is rare.

Misinterpreting Trees: The Need for Tree-thinking

The public and students, even college students, stumble into similar issues when interpreting tree diagrams, such as those listed in Figure 9. For example, biological distance between taxa on a tree is determined by the relative positions of common ancestors in the branching path, yet many interpret distance by position on the page. A list of taxa across one side of a diagram is often interpreted as an intentionally ordered series, much like our member of the public who wanted to see the leaf nodes listed down the middle of Figure 10 as an ordered series from simple to most complex.

A middle school teacher asks her students to measure various attributes of monkey and ape skulls and line them up in some order, after which they draw a branching structure to connect them and build this into a mobile. The logistics of this educational activity, which asks the students to make sense of data by lining objects up in some series, are very reasonable, yet possibly perpetuate a notion of progress from monkeys to humans. The mobile, however, is intended to emphasize the branching structure and de-emphasize any perceived series. This activity highlights the fine line teachers find themselves walking when teaching these complex concepts to students. Just as they read diagrams from left to right because text is read left to right, students bring into the science classroom skills that result in misinterpretations, which feed into seeing progress in tree diagrams where biological systematists don’t intend it to appear.


Common Problems Students Have with Cladograms

1. They don't realize that the x-axis has no meaning. They tend to want to interpret the horizontal distance between taxa as a measure of how different the species are even though the x-axis is just a way of spreading the taxa out spatially so they are all not on top of one another.

2. They don't understand that two species appearing next to each other at the tips of the tree are not necessarily more closely related to one another than species that are physically more distant. It takes them a while to understand that each internal node on the tree can be independently rotated 180 degrees and that this does not affect the relationships of the taxa.

3. On phylograms, it takes them a while to grasp that the "distance" between two taxa is correctly represented by the shortest path on the tree rather than their distance at the tips. This relates to point #2 above.

Figure 9. Common errors undergraduate evolution students make when they first encounter phylogenetic trees, particularly cladograms. Contributed by Randy Linder, School of Biological Sciences, Section of Integrative Biology, University of Texas at Austin.
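The second and third points in Figure 9 can be checked mechanically. The following is a minimal sketch, not taken from any of the browsers we evaluated: the toy tree, its node names (A, B, C, D, n1, n2), and its branch lengths are all hypothetical. It shows that rotating an internal node changes the left-to-right order of the tips but not the set of clades, and that the distance between two taxa is the path length through the tree rather than their spacing at the tips.

```python
# Hypothetical toy tree; names and branch lengths are illustrative only.
# A node is (name, branch_length, children); tips have an empty child list.
TREE = ("root", 0.0, [
    ("A", 3.0, []),
    ("n1", 1.0, [
        ("B", 2.0, []),
        ("n2", 1.0, [
            ("C", 1.0, []),
            ("D", 1.0, []),
        ]),
    ]),
])

def tips(node):
    """Return tip names in drawing order (left to right)."""
    name, _, children = node
    if not children:
        return [name]
    return [t for child in children for t in tips(child)]

def clades(node):
    """Return every clade in the tree as a frozenset of tip names."""
    _, _, children = node
    found = {frozenset(tips(node))}
    for child in children:
        found |= clades(child)
    return found

def rotate(node, target):
    """Return a copy of the tree with the children of `target` reversed."""
    name, length, children = node
    new_children = [rotate(child, target) for child in children]
    if name == target:
        new_children.reverse()
    return (name, length, new_children)

def path_to(node, tip, depth=0.0):
    """Return [(node_name, distance_from_root), ...] from the root to `tip`."""
    name, length, children = node
    here = depth + length
    if name == tip:
        return [(name, here)]
    for child in children:
        sub = path_to(child, tip, here)
        if sub:
            return [(name, here)] + sub
    return []

def patristic_distance(tree, a, b):
    """Sum of branch lengths along the tree path between tips a and b."""
    path_a, path_b = path_to(tree, a), path_to(tree, b)
    mrca_depth = 0.0
    for (name_a, depth_a), (name_b, _) in zip(path_a, path_b):
        if name_a != name_b:
            break
        mrca_depth = depth_a  # deepest node shared by both paths so far
    return (path_a[-1][1] - mrca_depth) + (path_b[-1][1] - mrca_depth)

rotated = rotate(TREE, "n1")
print(tips(TREE))     # ['A', 'B', 'C', 'D']
print(tips(rotated))  # ['A', 'C', 'D', 'B']
print(clades(TREE) == clades(rotated))  # True: same relationships
```

In the original layout A is drawn next to B, yet `patristic_distance(TREE, "A", "B")` is larger than `patristic_distance(TREE, "B", "D")`, and neither value changes after the rotation.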

To combat this tendency to read progression from trees, a biological systematist who teaches undergraduates about evolution felt it would be helpful if students learned more about the mechanisms of evolution before coming to college. An expert in evolution education concurred, suggesting that teaching tree-thinking, or tree-logic, is the best way to address this misconception. When a student is looking at a diagram using tree-logic, the diagram will make more sense than it would if interpreted as progression. In an educational setting, it’s okay to expect that teachers will have to train their students to use certain assumptions when working with tree diagrams. Resources designed for public use will have to find other effective ways to develop tree-logic within their audience.

Tree-Thinking

Biological systematists we spoke to see the AToL project as an investment in providing the information required to support an increasingly important approach to answering biological questions by applying the evolutionary relationships between organisms. From this perspective, teaching tree-thinking is not only important for dismantling misconceptions about evolutionary progress, but it is also an essential part of good preparation for biology students. Representations of the tree should support tree-thinking (applying the evolutionary relationships between organisms to solve biological problems).


Figure 10 (previous page). A compiled tree of life. Figure 34.1 from Assembling the Tree of Life (Cracraft and Donoghue 2004).

What Should a Tree of Life Visualization Provide?

Many themes arose during our interviews about what types of features a Tree of Life visualization should provide.

Index to Information Space

There is a great deal of interest in using a visualization of the Tree of Life as an index to other biological information about organisms. People mentioned a wide variety of types of information they’d like to see associated with the tree, including:

• Descriptions for various groups; some of the groups specifically mentioned include the Crown Eukaryotes, phyla, plant families, and genera of lizards
• Trends such as photosynthesis, types of metabolism, biomass, biogeographical patterns
• Information about native, threatened, and endangered species
• The characters associated with branching points
• Extinct groups
• Links to case studies, websites, and peer-review pages
• Pictures and images of organisms
• Associated fossils

Several participants suggested that the tree could become a portal to all kinds of information about evolution. One person mentioned, for example, a Florida natural history museum that did a study on how people understand certain concepts about evolution, and thought it would be useful to have this linked to the tree.

Different Views of the Tree

Our interviewees repeatedly asked to have different views of the tree for different users. They often don’t need to see the whole tree and would rather focus on particular parts of it. One professor said he was interested in using figures from the Tree of Life to teach ideas rather than facts; these figures should be stripped down and simple, and focus on conveying the idea. Some examples of different views of the tree that people mentioned include:

• The three kingdoms
• The relationships between interesting groups, such as dinosaurs, birds, and mammals, or angiosperms and gymnosperms
• Small trees that illustrate interesting branching points

One participant mentioned a USGS website with a two-layered structure, where more visually interesting and simple front pages provide introductions for the public, while deeper levels present more technical information with links to explain concepts and vocabulary that supports the interested learner. She suggested this as a model for a Tree of Life website.


• “Maybe there could be two versions. A simpler version for students and a detailed one for scholarly work.”

Produce useful diagrams as output

Visual tree applications should allow users to print useful diagrams, images, and descriptive text, and to create slideshows.

Most of the visualization applications couldn’t recreate textbook-like diagrams, usually because of an inability to simplify the view with enough precision to hide specific branches.

Labels and Pictures

Many people wanted to see nodes labeled with common names, and they wanted to be able to search on common names as well. Teachers requested this for themselves and for their students. One person requested that the tree should allow annotation and re-labeling of nodes.

A number of people felt that pictures or images would be very helpful to provide interest and to help with navigation.

Navigation and Context

People had a number of suggestions for how to navigate the tree, including the following:

• It would be nice to be able to bookmark your place in the tree, so that you could return to it, refer to it, or pass it along to someone else
• Navigation of the tree should be more powerful than just node-to-node navigation and should provide shortcuts for navigating through many nodes at once
• The tree should allow you to collapse clades and return to saved states
• It would be nice to be able to zoom in and out of the tree
• The visualizations need undo capability

It is important to have a sense of the larger context in which a node of the tree occurs—people frequently felt like they were lost once they started to navigate into the tree.

• “You need a string behind you, to follow back home. Like leaving behind a trail of crumbs.”

Graphical Conventions

Interviewees offered suggestions for using graphical conventions to communicate meaning. The current visualizations could be enhanced in many ways. For example, color coding could show the level of scientific support for a particular branch of the tree. Conventions can also mislead: in one of the workshops we observed, an unintentional difference in branch thickness in a diagram suggested unintended meaning to students.


Several people mentioned that a circular format, like that used by the UCMP’s Family Tree page34, reduces the sense of hierarchy or order in the layout. However, a cautionary tale came from an interview about interpreting tree diagrams conducted with one member of the public. He interpreted the perfectly circular layout of the UCMP diagram as a pie chart. From the perspective of pie charts, the stuff in the middle doesn’t have meaning, and so he found the branches meaningless and the odd-shaped white spaces between the domains confusing.

Typical Tasks

Our results from teaching observations and the interview phase of the exploratory-comparative usability evaluations suggest four task types that should be well-supported by a Tree of Life data browser: finding a target taxon, seeing how selected taxa are related and how biologists know that, creating a tree browser view to match a target diagram, and converting between one target tree view with more detail and another target tree view with less detail. Our usability interviewees used similar tasks when interacting with the three existing browsers with real Tree of Life data, and reflected on the effectiveness of the result.

Browsers for tree-structured data are used for a variety of purposes, and will support different types of tasks well. We suggest these tasks could be useful for heuristic evaluations and usability studies of existing and future tree browsers or for guiding the development of new browsers.

Task 1: Finding a Target Taxon

Starting with a target taxon such as “ferns” or “Homo sapiens,” what is the experience of finding that target within the tree? This task highlights the effectiveness of the software’s support for searching and browsing, as well as the domain-specific challenge of how well the target taxon name matches with the names available in the data set. Additionally, users need to be able to identify which branch to follow when they are labeled with potentially unfamiliar terms. How much domain knowledge is required to complete this task successfully?

This task also points to some interesting longer-term questions about how the visualization supports repeat users. If the target taxon’s node is found on the tree, either purposefully or in passing for some other task, how easy is it to find again, both immediately during the current session, and at some future visit? How much does the seeker learn about the tree data set, which might support future finding efforts? For example, the ability to establish a mental map of the tree may help. Finally, since the tree of life has changed since many people were last in a biology classroom, and it will continue to change over time as biological systematists contribute better understandings to it, how does the visualization help users convert their previous experiences finding taxa to success at finding them in a changed tree? We develop these questions more in the Discussion section.

34 http://evolution.berkeley.edu/evosite/evo101/IIAFamilytree.shtml


Task 2: Seeing How Select Taxa Are Related, and How Biologists Know That

Having selected a few taxa of interest, what is the experience of finding out how they are related? Is information available about the evidence that biological systematists believe supports that relationship? Teachers might do this to prepare for teaching, to begin preparing a diagram as described in Task 3, or in response to a student question; students might do this to answer their own questions for a report.

In one of our exploratory-comparative usability evaluations, the teachers developed the example of finding out how horsetails and geraniums were related. Because of the number of nodes involved when trying to see these two groups at once on the screen, we soon took the whole group of Angiosperms as a substitute for geraniums. This task challenges the tree browser to support showing nodes from various levels of the tree simultaneously, handling the view of the tree at whatever scale or specificity the user expects. It also asks that the interactive tree visualization application provide evidence for the various groupings being examined, and provide the same visual representation for any alternative tree structures biological systematists take seriously. None of our existing visualizations were able to support these aspects of the task.
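The core of this task, finding the part of the tree that connects two selected taxa, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `PARENT` table is a made-up, much-simplified land plant topology written for this example, not the Tree of Life data set, and the function names are hypothetical. It locates the most recent common ancestor of two taxa and the path of nodes joining them, which is the minimum a browser would need to keep visible.

```python
# Hypothetical, simplified topology: each taxon maps to its parent node.
# The root maps to None. Names are illustrative, not the project's data.
PARENT = {
    "Land plants": None,
    "Mosses": "Land plants",
    "Tracheophytes": "Land plants",
    "Lycophytes": "Tracheophytes",
    "Euphyllophytes": "Tracheophytes",
    "Monilophytes": "Euphyllophytes",
    "Horsetails": "Monilophytes",
    "Leptosporangiate ferns": "Monilophytes",
    "Seed plants": "Euphyllophytes",
    "Gymnosperms": "Seed plants",
    "Angiosperms": "Seed plants",
}

def lineage(taxon):
    """Return the list of nodes from `taxon` up to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def mrca(a, b):
    """Return the most recent common ancestor of taxa a and b."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("taxa are not in the same tree")

def connecting_path(a, b):
    """Return the nodes on the path from a, up to the MRCA, down to b."""
    ancestor = mrca(a, b)
    up = lineage(a)
    up = up[:up.index(ancestor) + 1]        # a ... MRCA
    down = lineage(b)
    down = down[:down.index(ancestor)]      # b ... child of MRCA
    return up + down[::-1]

print(mrca("Horsetails", "Angiosperms"))
print(connecting_path("Horsetails", "Angiosperms"))
```

Substituting a whole group such as Angiosperms for a single genus, as our participants did, amounts to choosing a node higher in the table; the path computation is unchanged. Showing the evidence for each grouping on the path would need additional data that this sketch does not model.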


Figure 11. Diagram used in the Jepson Herbarium workshop “Building the Tree of Life: What ever happened to Plants?” to show the relationships between six major groups of Eukaryotes. Underlined groups indicate where biologists believe organisms formerly included under "Plants" should be located. The diagram includes a branching structure, pictures of representative organisms, and a combination of common and scientific names designed to identify the groups and what is contained within them to the audience of natural history enthusiasts.

Task 3: Creating a Tree View to Match a Target Diagram

Static diagrams developed for educational materials are designed to convey a particular message effectively. Examples of diagrams from educational materials that have been developed to present the relationships between select taxa include Figure 8, Figure 11, Figure 13, Figure 33 and Figure 34. Ideally, an online visualization of the tree of life data should enable the creation of up-to-date, educationally effective visual images of the relationships between groups from any part of the tree. Once a given phylogenetic diagram has been vetted against Donovan and Wilcox’s (2004) guidelines and passes with flying colors, it can be used as a target diagram for this task.

We found that attempting to reproduce, within dynamic visualization software, the relationships between groups shown in a static diagram turns up some remarkably subtle limitations to the manipulations currently possible within the tree browsers. This task challenges the interactive tree visualization application to support effective labeling and simplification of views, as developed more in the Discussion section.

An additional important dimension of this task is repeatability. If an effective visual image is made with the visualization software, can it be repeated? Being able to return to or repeat a previously created image (or the appropriately updated image, if the tree topology has changed) is an important part of creating memorable images that people can use repeatedly and share with others.

Task 4: Converting Between One Target Tree View with More Detail and Another Target Tree View with Less Detail

Researchers may be interested in more detail about the relationships between nodes, while students need a simpler picture. This task involves being able to reversibly emphasize aspects of a tree topology and hide or dramatically de-emphasize others. It is related to Tasks 2 and 3, because the educational message has often been simplified to focus on a subset of all the detail available about organisms’ relationships. Deep Green35 has two example trees of the relationships between plant taxa, one with a level of detail appropriate for researchers and one simplified for teaching.

The challenge is to make the resulting tree view sufficiently visually simple, while still providing access to more information through interactivity. After using the visualization software to create the “research tree” (Figure 27), task four involves converting this to the level of detail of the “teaching tree” (Figure 28). Manipulations required include changing node names, collapsing clades, hiding branches, and eliding uninteresting node-branch structures into a single branch. To be reversible, some evidence of the existence of more information should be apparent on the less-detailed view, which the user can interact with in order to restore the detail. Practical considerations such as avoiding any need to maintain different versions of the compiled tree of life data set, and allowing audiences to move between these different levels of detail if desired, suggest that any teaching trees should be derivable from the whole data set by a series of manipulations to the view. These manipulations could be stored in a script file, accessible by a URL, and shared to accompany curriculum materials or for other purposes. We make recommendations about simplification of tree views in the Discussion section below.
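The idea of deriving a teaching tree from the full data set by a stored series of view manipulations can be sketched as follows. Everything here is an assumption made for illustration: the toy `RESEARCH_TREE`, the JSON script format, and the operation names (`rename`, `collapse`, `hide`, `elide`) are hypothetical and do not describe any existing browser. The script is applied to a copy, so the underlying data are never altered, and each simplified node is flagged so that a viewer could offer to restore the detail.

```python
import copy
import json

def n(name, *children):
    """Build a tree node; `name` is the stable identifier used by scripts."""
    return {"name": name, "children": list(children)}

# Hypothetical "research" tree, much simplified and for illustration only.
RESEARCH_TREE = n("Land plants",
    n("Liverworts"),
    n("Mosses"),
    n("Tracheophyta",
        n("Lycophytes"),
        n("Euphyllophytes",
            n("Moniliformopses", n("Equisetum"), n("Polypodiales")),
            n("Spermatophyta", n("Pinaceae"), n("Angiosperms")))))

# A shareable script: the kind of text that could sit in a file or behind a URL.
SCRIPT_JSON = """[
  {"op": "rename",   "node": "Moniliformopses", "label": "Ferns and horsetails"},
  {"op": "collapse", "node": "Moniliformopses"},
  {"op": "rename",   "node": "Spermatophyta",   "label": "Seed plants"},
  {"op": "elide",    "node": "Euphyllophytes"},
  {"op": "hide",     "node": "Liverworts"}
]"""

def find(node, name, parent=None):
    """Return (node, parent) for the node called `name`, or (None, None)."""
    if node["name"] == name:
        return node, parent
    for child in node["children"]:
        hit, hit_parent = find(child, name, node)
        if hit is not None:
            return hit, hit_parent
    return None, None

def apply_script(tree, script):
    """Return a simplified view; the input tree is left untouched."""
    view = copy.deepcopy(tree)
    for step in script:
        node, parent = find(view, step["node"])
        if node is None:
            raise KeyError(step["node"])
        op = step["op"]
        if op == "rename":                      # change the display label only
            node["label"] = step["label"]
        elif op == "collapse":                  # keep the node, drop its subtree
            node["children"] = []
            node["more"] = True
        elif op in ("hide", "elide"):
            if parent is None:
                raise ValueError("cannot hide or elide the root")
            i = parent["children"].index(node)
            # hide removes the branch; elide splices the children into the parent
            parent["children"][i:i + 1] = node["children"] if op == "elide" else []
            parent["more"] = True
        else:
            raise ValueError(op)
    return view

def outline(node, depth=0):
    """Render a view as indented lines; [+] marks nodes with hidden detail."""
    label = node.get("label", node["name"])
    mark = " [+]" if node.get("more") else ""
    lines = ["  " * depth + label + mark]
    for child in node["children"]:
        lines += outline(child, depth + 1)
    return lines

teaching_view = apply_script(RESEARCH_TREE, json.loads(SCRIPT_JSON))
print("\n".join(outline(teaching_view)))
```

Because the teaching view is recomputed from the full tree each time, only one version of the data set has to be maintained, and a change in topology only requires checking that the node names in the script still exist.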

Responses to Existing Visualizations

The following sections provide brief descriptions of participants’ impressions of the user interfaces of the visualizations we evaluated.

Overwhelmingness

For many people, their first response to seeing the visualization applications with the Tree of Life data was to feel completely overwhelmed by the number of scientific names. Most people tried to orient themselves by finding names they knew, which they often had a hard time doing, despite their background in biology. We think this is because people’s experiences tend to focus on small sections of the tree or larger groupings. Most sources of biological information, such as field guides and curricula, will use a subset of names that an editorial biologist has decided are important for a particular purpose. Few comprehensive sources for biological names exist, and so the experience of looking at a large part of the tree at once was often bewildering.

35 http://ucjeps.berkeley.edu/htree_intro.html

• “The sheer comprehensiveness of it and level of detail could be something that turns people off. Something that left out a lot of the intermediate stuff would probably often be more useful…”
• “There is so much information that you can drown in it.”

Recognizing Names and Groups

As they interacted with the scientific terms, users were asking themselves: is the term I see the same as the term I’m looking for, or one that I’m familiar with? This was a significant proportion of the think-aloud activity from our users. Because scientific and common names often vary a little at the end of the word, such as the group primata which is commonly called primates, people were aware that they might not see the exact same term they were familiar with in a new resource, although they may not actually have the expertise to know how similar two terms need to be to be the same. Our member of the public offered a nice example with a static diagram: when he saw the word Amoebozoa, he asked himself if it was the same as amoeba, which he remembered from biology, and decided that it must be.

For those with more biology experience, the names represent groups. A high school teacher and natural history docent looking for ferns on the visualizations with us was trying to locate group names she had from other resources, such as Cladoxylopsida and Moniliformopses. Automatically generated space-saving abbreviations such as “Monilifor” weren’t recognizable when scanning for the presence of these names among the visible labels. Also, it was difficult to interpret the absence of Cladoxylopsida, a known scientific name from another source, in the visualization.

As they watched or interacted with the visualizations, participants would often speak aloud the terms that they recognized. When interacting with TaxonTree, where some nodes are named with both common and scientific terms, the common name was usually the term spoken aloud.

SpaceTree

The first impression of SpaceTree for several participants was of being overwhelmed by the scientific names. Everyone, however, was able to use the interface easily to navigate through the tree. Several people commented that they didn’t necessarily want to start at the root of the tree every time. For example, if they were teaching kids about mammals, they would like an easy way to start from a mammals node.

One middle school teacher noted that the text size used in the SpaceTree interface is too small to be viewed with a projector. On several occasions, participants accidentally closed a branch of the tree that they had spent a long time expanding, and at least one person commented that an Undo feature is needed.

• “I want pictures!”
• “This is much more confusing than TaxonTree.”
• “Oh, am I feeling ignorant.”

TaxonTree

As with SpaceTree, several people’s first impression of TaxonTree was of being overwhelmed by scientific names. This provoked one subject to say, “I hate it already.” Once people started using TaxonTree, they found the interface easy to use and appreciated some of the improvements over SpaceTree, such as the ability to have more control over expanding and closing branches.

One user repeatedly double-clicked on nodes to open them; this opened and then closed the nodes, which was not what he expected. The same user expanded many nodes and noticed that nodes overlapping the lines between nodes were confusing.

• “It’s fun for zoologists, but kids would be lost.”
• “I have no idea what we are looking at right now.”
• “What’s great about this is that it appears that there’s a lot on here.”
• “It’s interesting to visualize where things are in relation to each other.”

Green Tree of Life Hyperbolic Tree

Most of our participants liked the visual appeal of the animation used by the hyperbolic tree. As with the other visualizations, one teacher commented that she was overwhelmed by the number of group names that she couldn’t recognize. She said that she would need to use other resources in conjunction with the hyperbolic tree to identify group names.

The automatic abbreviations of names were cryptic and were not useful. Everyone we talked with wanted to see the full names. One teacher commented that the display of nodes branches so quickly that it obscures the more general relationships you want kids to see. Several users noted the need for an Undo feature and found that they couldn’t get the tree back to a previous state they had seen.

• “Without knowing the taxonomy, I’m getting pretty lost here.”
• “With Hyperbolic Tree I lose sight of where I’ve been. I can’t see where I’ve been; I have to remember where I’ve been.”

Treemap

Treemap was good at showing high level groupings. One teacher speculated that it could be good for showing comparisons between groups, such as biomass, diversity, or ecological differences. She said, for example, that she’d like to be able to compare the biomass of termites and humans.


Because our data set was very large, sometimes names would be abbreviated to a length of only one or two characters. These abbreviations didn’t have any meaning to our participants.

• “This looks somewhat forbidding.”
• “Pictures would certainly be better than names that nobody recognizes.”

Survey Findings

The first questions in our survey confirmed that our respondents were indeed members of our target audience. 91.3% of the respondents said that they teach biology in the context of a quarter, semester, or year-long course. 100% said that they teach the evolutionary relationships between organisms, and 71.4% said that they use the evolutionary relationships between organisms as a key organizing theme in their teaching.

The results of our survey pointed out some interesting differences between groups of teachers. Middle and high school teachers reported greater usage of websites in conjunction with their teaching than did college teachers. 100% of the middle school and high school teachers said that they use websites to find information before teaching, and 100% use curricular activities that require students to visit websites. When asked, “How do you use websites now?,” middle school and high school teachers responded in higher percentages than college teachers to almost every type of website use, suggesting that they have integrated websites into their teaching more than the college teachers.

Middle school and high school teachers reported that using common names in their teaching is much more important to them than it is for college teachers. 100% of college teachers rated common names to be not important or mildly important, while 60% of middle and high school teachers rated them to be important or very important. In a similar vein, college teachers found scientific names to be more important in their teaching than did middle and high school teachers. However, in our observation of college teachers, we noticed that they did use common names frequently in speaking and on diagrams, in addition to scientific names.

When asked, “When you teach about the evolutionary relationships between organisms, how important do you feel the following areas are?” respondents rated the following areas as most important (80% or more rated them important or very important):

• Evidence that supports the Tree of Life
• Structure of the Tree of Life
• Descriptions for groups of organisms
• Homology and Analogy
• Monophyly
• Pictures of representative organisms
• Adaptations, adaptive radiations
• Competing ideas about a part of the tree
• Specific examples other than those provided by the curriculum or textbook
• Specific examples provided by the curriculum or textbook

Evidence that supports the Tree of Life was rated important or very important by 100% of our respondents.

When asked, “How important would the following features of this website be for your teaching?” respondents rated the following features as most important (80% or more rated them important or very important):

• Zooming in to any part of the tree
• Seeing areas of controversy
• Viewing the relationship of divergence events to geological time
• Seeing the distribution of important character states
• Bookmarking particular branches on the tree
• Accessing geographic distributions of groups of organisms
• Viewing the distribution of biological patterns across the tree

Finally, teachers said that if the entire Tree of Life was available at one website, they would use it for many purposes. 81% said they would use it as a reference link offered to students, and 76.2% said they would use it to check the tree of life structure in preparation for teaching. One respondent said:

A fully integrated system (that your questions imply) would be fantastic. As it is, I gather a lot of this information from disparate and independent sources. Having a website that seamlessly relates topology, evolutionary patterns, graphics, and geographic and ecological data would save an unimaginable amount of teaching preparation time.

For the complete survey questions and results, see the Appendix.


Discussion

Our interviews and survey brought several themes to light. In this section, we discuss the themes and provide a survey of best practices on the issues, our recommendations, and connections to useful resources for addressing the issues. Table 1 gives a summary of our recommendations and lists the page number on which more detail about each recommendation can be found.

Recommendation Page

Serving the Educational Context

Develop supporting curriculum: Select and highlight examples that teach concepts 51

Teaching Tree Thinking

Support tree-logical manipulations with a branch highlighting feature 56

Highlight the clade descended from an internal node 56

Support tree-logical manipulations with a node-flipping feature 57

Use interactivity to demonstrate the variety of alternative interpretations for branch lengths 61

Reinforce biodiversity with a Wow! Button 67

Provide interactive tree comparison tools 69

Show evidence for the tree—descriptions, character states, and synapomorphies 69

Provide a date slider 71

Highlight changes when they happen 71

Provide version history tools 72

Connect particular topologies to source literature 72

Provide canned examples describing major shifts 72

Decide whether the visualization should indicate the support available for a particular branching 72

Making the Application Usable in the Classroom

Provide a radial, rank-free map of biological information space 75

Create good “map” software to allow simple views at any scale 76

Support focus-plus-context, using index nodes selected by systematists 79

Provide for simplification of views and details-on-demand through excellent support of branch manipulations 80

Include pictures of organisms 84

Include pictures of characters 87


Describe groups in multiple ways on the diagram 87

Develop search for the common user 87

Develop, bookmark, and distribute useful views 88

Provide interpretation guidance in context 88

Display labels clearly, without overlapping other elements 88

Develop and consistently use presentation-quality labeling conventions 89

Integrate User-Centered design into the development process 94

Support “undo,” “back,” or a history list for views and manipulations 94

Minimize the number of conventions users must learn 94

Animate changes between states 95

Use pre-attentive encoding 95

Support exploration by achieving benchmark system response times 95

Make display resizable 95

Ensure color contrast 95

Provide for magnification of text sizes 95

Provide manipulation alternatives 96

Support screen-readers 96

Table 1. Our recommendations

Serving the Educational Context

Is There Room to Teach the Tree of Life before College?

Unlike many topics that compete with variable success for the brief time available in the grades 6-13 classroom, the tree of life has a chance of being covered because it can serve as an interwoven, organizing, unifying principle for biology and biological information. Indeed, if the evolutionary relationships between organisms are biology’s unifying theory, increasingly central to how practicing biologists answer all sorts of questions, then the tree of life must play this role in our curricula!

What implications does this have for a new interactive tree of life visualization tool? The structure of the tree itself, as seen in the visualization software examples that we used in our exploratory-comparative studies, is inadequate for filling this broader role. This structure must be used to organize and provide access to pictures, information about groups, and great examples that teachers can use to teach key concepts about evolution, ecology, physiology, cells, and genetics: all of the other areas that must be covered conceptually (rather than with a battery of facts) in grades 6-13 classrooms. In addition to the already tricky task of getting biological systematists to agree and contribute their tree structures to a central location, this implies collecting, standardizing, and making easily accessible a staggering amount of ancillary information.

Bridging from scientific results to something usable in the classroom is no trivial task – biologists and curriculum developers will need to expend significant effort identifying the important groups students should see in tree views, identifying the interesting scientific data that can be used to teach a concept well, and presenting this information at a level of simplicity appropriate for the presentation of these concepts to students.

Barriers to Teaching the New Biology

There are many barriers preventing teachers from adopting and teaching new ideas in biology, no matter how much biologists accept and agree on them. Teachers are tied to teaching the standards on biology and taxonomy, and are often required to “teach to the test.” For the latest ideas in biology to appear in the classroom, it’s important to think about how the tree can be used to support the required curriculum and to support teaching the standards. Sometimes this will necessarily involve changing the standards.

Integrating the tree of life into the biology curriculum requires educational materials. Generally, changes in scientific ideas become part of graduate education, then are written into undergraduate textbooks, and finally become part of curricula and texts used at the middle and high school levels. A promising example is the latest (8th) edition of the Campbell and Reece Biology textbook for college introductory biology and high school AP biology, which uses the tree to organize basic biological concepts. Because educational materials approved for use within the public K-12 schools in California must demonstrate that they meet the state educational standards, and because California and Texas are huge players in the educational materials publishing market, these standards may need to be changed before we will see the tree of life used in biology curricula for grades 6-12.

Another barrier is the minimal time teachers have available to learn more about the subject. The primary reference is probably the curriculum or text book that teachers use, and the changes that come with new editions. For any deeper understanding or for keeping up with the types of rapid changes currently occurring in biology, other resources are required, and are much less frequently present. School administrators typically must offer support for teacher professional development through funding for course fees and substitute teachers. Teachers in our study mentioned using textbooks taught at the next higher level, websites, and discussions with colleagues. A few took advantage of workshops offered at natural history institutions, like the ones we observed. In our study, we sought out the early-adopting teachers who create their own curricula, and struggle against these barriers to improve the education they bring to their students. They are the exception, and their dedication isn’t a realistic model on which to base efforts to impact biology education nationwide.


Essentially, as biologists learn more, system-wide changes to teaching practices in biology must be implemented through new curricula, adjustments to state standards and tests, and support for teacher training. The existence of an online tool showing biologists’ data isn’t going to make that change except for a very few lucky students in private schools.

Recommendation

For the new interactive tree of life application to play any role in 6-13 biology classrooms, we suggest that it must position itself to play a key role in a new biology curriculum organized according to the unifying concept of the evolutionary relationships between organisms. Beyond organizing biological systematists’ phylogenetic hypotheses, the application must provide teachers with good examples and evidence they can share with students.

Develop supporting curriculum: Select and highlight examples that teach concepts

Good educational materials provide examples to support teaching. The new interactive tree of life visualization application can help teachers locate examples, based on actual scientific data and studies about the relationships between specific taxa, that support students’ learning about general evolutionary process and pattern concepts such as homology, reproductive isolation, and adaptation.

However, to do this effectively the application must be more than a repository for organizing, retrieving, and analyzing real science data. While many teachers might be interested in having their students work with real science data, they are neither interested in nor qualified to sort through a collection of science data to identify good examples to use with their students in the classroom. Teachers need to know about appropriate selections from the scientists’ information repository, and how to use them. Curriculum developers often work with scientists to identify good examples and develop them into good educational materials. However, whether or not curriculum developers are involved, the step of digesting the information for presentation and use by teachers is indispensable. We recommend developing the materials, and then indicating directly on the tree where the organisms involved in the materials are located. For example, the Tree of Life web project is beginning to collect educational materials in their “treehouses”36; we encourage them to provide connections between the materials and the locations of taxa in the tree.

Future work developing the features of this interactive tree visualization application should involve teachers. In addition, involving evolution education experts who can test the impact of our recommended features on students’ learning outcomes will maximize the ability of the application development team to positively impact biology education in this country.

36 http://tolweb.org/tree/home.pages/treehouses.html


Teaching Tree-Thinking

If evolution is biology’s unifying theory, then biology teachers need to teach it. Yet this is difficult (Alters and Nelson 2002). The concept of evolution has accrued a slew of misconceptions through the history of a century or more, making it particularly challenging to teach well. Diagrams carry layers of intentional and unintentional meanings for scientific and public audiences. Common social understandings of the images and metaphors of evolution fight with the sophisticated intentions of biologists making these diagrams. Misconceptions flourish unintentionally but vigorously on the placement of taxa and the polysemy of graphical elements. Branching diagrams show systematists’ theories about the evolutionary history of life, while at the same time these theories and diagrams have their own history. In fact, descriptions of the evolutionary relationships between organisms are changing rapidly, leaving the grade school teachers’ own biology training quickly in the dust.

Striding into this challenging fray, an interactive application for visualizing the tree of life for teachers must straddle both the biologists’ latest ideas and the naïve notions students bring to understanding diagrams. We recommend that because of the depth of the crisis in evolution education (Alters and Nelson 2002) and the pervasiveness of common misconceptions, the new interactive tree visualization should not just aspire to avoiding perpetuating these misconceptions, but must actively strive to dismantle them, by providing teachers with the right tools to teach their students tree-thinking.

Educational theory holds that learners must recognize their existing perception within a lesson in order to “unlearn” it and replace it with a different one. Without this step, learners will hold onto old concepts and learn the new concepts for the test, but may not take ownership of the new perspective, and so revert to reasoning with the old concept when solving future problems. Even students graduating from Harvard held strange notions about how the solar system works because they had never effectively replaced naïve childhood assumptions37. We expect the same holds true for evolution, and we strongly recommend that the tree visualization acknowledge inappropriate concepts, and then provide the visual tools for learners to experience the correct concepts and interpretations, and compare the two (Alters and Nelson 2002).

37 This experience is described in a video entitled “A Private Universe,” (1988) distributed by Pyramid Media. The video looks at issues of learning and demonstrates how alternative conceptions in science occur in the classroom and how hard they are to erase later on. It is used in professional development workshops to help instructors understand how students’ naïve ideas and their classroom knowledge can co-exist in their own minds. A study guide providing more description of the video can be found at: http://www.pyramidmedia.com/item.php3?title_id=1225. A nice description of the application of these ideas to public science education is at http://www.astc.org/resource/visitors/universe.htm.


Tackling the Misconception of Progress

The most serious and pervasive of all misconceptions about evolution equates the concept with some notion of progress, usually inherent and predictable, and leading to a human pinnacle. Yet neither evolutionary theory nor life’s actual fossil record supports such an idea.

- Stephen J. Gould (1995, p. 42)

In our interviews, biologists teaching about the evolutionary relationships between organisms shared with us some of the misconceptions that undergraduate students, journalists, and the public bring to the topic. These include the notions that evolution represents improvement or progress, that evolution has a purpose, that organisms get increasingly complex, that human beings are the most complex and therefore the best organism, or that the process of evolution has been leading towards our existence. We tend to see the world around us from our perspective, and science education likes to start with the familiar, so we learn a lot more about our own evolutionary history – the path through time from first life to us – than about the diversity of other organisms that have spent the same 3.5 billion years evolving as we have. And because we are here, we look at our own favorite features as indicators of what is successful, loath to believe we might be shaped as much by chance as by our own efforts.

What the theory of evolution actually holds is that natural selection and chance work together to screen for the organisms whose adaptations work in the present conditions, with no plan for the future. Sometimes getting simpler is a successful adaptation, and many simple organisms such as bacteria and insects are extremely numerous. All the millions of other types of organisms on this planet have been evolving for the same period of time, are just as modern, and each has its own evolutionary history that, from its perspective, looks just as purposeful as our own.

These are difficult notions to contradict in the popular mind. Visual representations perpetuating this misconception about progress came into existence shortly after Darwin published the Origin of Species (Figure 12). They express iconic notions of evolution as progress and are pervasive and persuasive players in popular culture, the Scopes monkey trial and other battles over evolution in schools, and even in science museums (Gould 1995, 1997; Clark 2001).

Consider the first historically important tree of life (bark and all) ever published – Ernst Haeckel’s version of 1866. Haeckel conflates time with progress on the vertical axis, and his tree founders on the logical and pictorial impossibility of adequate representation…

- Stephen J. Gould (1995, p. 65)

[Even in iconography for an educated audience, t]he bias of progress has led all these artists to paint the history of life as a progressive sequence leading from marine invertebrate to Homo sapiens. Diversification and stability, the two principal themes of natural history, are entirely suppressed, and the tiny, parochial pathway leading to humans stands as a surrogate for the history of life.

- Stephen J. Gould (1995, p. 52)

Even if a visualization isn’t intended to convey linear progression, it should be checked for the potential to be misinterpreted that way. In fact, our cultural tendency to read linear progress into diagrams (Gould 1995) suggests that evolutionary education material must actively disrupt implications of progress. To do this, biologists developing visualizations cannot ignore the fact that their diagrams are easily misinterpreted, nor rely on training audiences through text captions or context. Relying on captions is easier, and usually means reusing a diagram style the biologists are already familiar with. But it doesn’t work: the visual itself must evoke and confront the observer’s misconceptions in order to change them.


Figure 12. Ernst Haeckel's Pedigree of Man. From Ernst Haeckel, The Evolution of Man: A Popular Exposition of the Principal Points of Human Ontogeny and Phylogeny (1866; New York, 1896), p. 189.

Using the Tree Metaphor Carefully

Realistic images in scientific diagrams can aid interpretation of scientific and technical images, or bring in confusing metaphors (Pinto and Ametller 2002).

The metaphor of a tree is overloaded. While biologists focus on the relative position of abstracted diverging branches, the rest of us bring everyday understandings of tree-growth through time. Many religious and spiritual traditions have a concept for the Tree of Life, too, often related to biology but loaded with non-scientific meanings. Gould (1997) discusses how many representations of evolution founder on the hopeless attempt to represent three dimensions of diversity, progress, and time within a two-dimensional diagram. Ernst Haeckel’s tree-like diagram of evolutionary relationships introduced a blending of time and progress among extant taxa (Gould 1995) that we still struggle against in the popular consciousness. The conical shape of the classic tree with more branches in the areas we know best also conveys the misconception of continuously increasing diversity, rather than irregular radiations trimmed by natural selection (Gould 1995).

By focusing on connection and divergence, cladograms seek to leave out any attempt to represent progress, and thus avoid conflating time with progress. But less-trained readers will still see time or progress in the second dimension of information. A Treemap layout manages to show only the nesting or branching structure, without implying time or progress. However, the amount of space taken up on the view by the categories does suggest some meaning. The most basic is relative numbers. When using the Tree of Life web project data in the Treemap software, the bias that biological systematists show for working on Eukaryotic organisms that are visible to the human eye is readily apparent in the size the domain takes up on the screen. To use this remarkably time- and progress-free visualization approach to show other attributes, such as relative number of types, diversity, or biomass, we must have the data.

Graphical elements with different meanings for trained and untrained audiences contribute to misinterpretation. Pinto and Ametller (2002) found that the ubiquitous polysemy of arrows in scientific diagrams about the physical sciences made effective interpretation of the diagrams by students much more difficult. For systematists’ tree diagrams, for example, polysemy is a risk for the branching lines, the placement of nodes in space, and the meaning of labels.

Recommendations

To avoid misrepresentation and misinterpretation, and to help dismantle misconceptions by developing students’ tree-thinking abilities, the new interactive visualization should support tree-logic and consider the following resources.

Support tree-logical manipulations with a branch highlighting feature

Provide features that visually reinforce tree-logic for interpreting the branching structures. For example, a relationship indicator tool could draw out a path between two organisms in a highly visible color, highlighting the branching structure between them (Figure 14).
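The tree-logic behind such a relationship indicator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy tree, its internal node names, and the function names `path_to_root` and `connecting_path` are ours and are not taken from any of the applications evaluated in this report. The path runs from one taxon up to the most recent common ancestor and back down to the other, which is the route a highlighting tool would color.

```python
# Toy tree stored as a child -> parent map. The taxa and internal node
# names are illustrative assumptions, not the tree in the report's figures.
PARENT = {
    "frog": "tetrapods",
    "amniotes": "tetrapods",
    "human": "amniotes",
    "archosaurs": "amniotes",
    "caiman": "archosaurs",
    "dinosaurs": "archosaurs",
    "T. rex": "dinosaurs",
    "bird": "dinosaurs",
}


def path_to_root(node):
    """List the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def connecting_path(a, b):
    """Nodes along the branches joining taxa a and b, via their
    most recent common ancestor."""
    up_a = path_to_root(a)
    up_b = path_to_root(b)
    ancestors_of_b = set(up_b)
    # The first node on a's upward path that b also descends from.
    mrca = next(n for n in up_a if n in ancestors_of_b)
    climb = up_a[: up_a.index(mrca) + 1]   # a ... mrca
    descend = up_b[: up_b.index(mrca)]     # b ... just below mrca
    return climb + descend[::-1]


print(connecting_path("human", "bird"))
# ['human', 'amniotes', 'archosaurs', 'dinosaurs', 'bird']
```

The length of this path, not the on-screen distance between the two icons, is what tree-thinking asks students to read.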

Highlight the clade descended from an internal node

When an internal node is selected or clicked on a branching diagram, use a cloud of color to highlight the area of the tree for which that node is the shared common ancestor (Figure x). The Treemap software also does this by highlighting the node, which surrounds the child nodes because of the nesting layout. The view can also be re-focused to include only that node and its children. Interactive branching diagrams should support the equivalent of this, designating an internal node and its descendants to be the focus of the view. The user should be able to set an internal node as the new “root” to which context viewing features orient them. See also “Starting where people ARE” and “Support Simplification.”
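As a rough sketch of what "the clade descended from an internal node" means computationally, the Python below collects every leaf taxon beneath a selected node. The toy tree and the function name `clade` are assumptions made for illustration; a real application would then draw its color cloud around exactly this set.

```python
# Toy tree stored as a parent -> children map. Names are illustrative
# assumptions, not data from the Tree of Life web project.
CHILDREN = {
    "tetrapods": ["frog", "amniotes"],
    "amniotes": ["human", "archosaurs"],
    "archosaurs": ["caiman", "dinosaurs"],
    "dinosaurs": ["T. rex", "bird"],
}


def clade(node):
    """Return the set of leaf taxa descended from `node`.
    A leaf is its own clade."""
    kids = CHILDREN.get(node)
    if not kids:
        return {node}
    members = set()
    for kid in kids:
        members |= clade(kid)
    return members


print(sorted(clade("archosaurs")))
# ['T. rex', 'bird', 'caiman']
```

Re-rooting the view on a selected node amounts to displaying only `clade(node)` and the branches inside it.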

An intriguing model for many potentially useful tree connectivity visualization features is the set of social network visualization features developed by Jeff Heer and danah boyd in Vizster38. Figure x shows a cloud of color around nodes that share membership in a social community.

Support tree-logical manipulations with a node-flipping feature

The interactive visualization should support the ability to rotate branches around nodes, showing students that the order of taxa changes, but the relationships – highlighted or unhighlighted – remain unchanged. A “Shake it up!” button might randomly rotate nodes on the screen, shaking up any perceived series or order students might be reading into aligned taxa. A handle on every node would allow teachers to demonstrate the concept by flipping the node around, or allow students to play with the idea themselves. Curriculum activities similar to UCMP’s “What did T. rex taste like?” could take advantage of these features, asking students to identify taxa that are more closely and more distantly related.
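The claim that rotating nodes changes the order of taxa but not their relationships can be checked directly. The Python sketch below is a minimal illustration, assuming a toy tree written as nested tuples; the function names (`leaves`, `clades`, `flip_all`, `shake`) are hypothetical and do not describe any existing tool. It flips nodes and confirms that the set of clades is identical before and after.

```python
import random

# Toy tree as nested tuples; strings are leaf taxa. Illustrative only.
TREE = ("frog", ("human", ("caiman", ("T. rex", "bird"))))


def leaves(tree):
    """Leaf taxa in left-to-right display order."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def clades(tree):
    """The set of leaf sets, one per internal node. Two drawings show
    the same relationships exactly when these sets are equal."""
    if isinstance(tree, str):
        return set()
    result = {frozenset(leaves(tree))}
    for child in tree:
        result |= clades(child)
    return result


def flip_all(tree):
    """Rotate every internal node by reversing its children."""
    if isinstance(tree, str):
        return tree
    return tuple(flip_all(child) for child in reversed(tree))


def shake(tree, rng):
    """A 'Shake it up!' sketch: flip each node with probability 1/2."""
    if isinstance(tree, str):
        return tree
    kids = [shake(child, rng) for child in tree]
    if rng.random() < 0.5:
        kids.reverse()
    return tuple(kids)


flipped = flip_all(TREE)
print(leaves(flipped))                   # display order is reversed
print(clades(flipped) == clades(TREE))   # True: relationships unchanged
```

Calling `shake(TREE, random.Random())` yields a different display order on most runs, yet the clade comparison always holds.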

Figure 13. A tree from UCMP’s “What did T. rex taste like?” curriculum, showing the relationships between major vertebrate groups.

38 http://www.cs.berkeley.edu/~jheer/vizster/early_design/ and http://jheer.org/vizster/


Figure 14. The relationship between humans and birds is highlighted in green, using tree-thinking. Tree thinking traces distance along the connecting branches, not through space.


Figure 15. Color clouds identify the clades associated with two internal nodes.


Figure 16. Vizster's social network visualization identifies members of the same social community with clouds of color.


Figure 17. A second version of the vertebrate tree. The two nodes highlighted in green have each been rotated 180 degrees, flipping the order in which the taxa descended from the common ancestor represented by the node are shown across the top of the diagram. The two trees show equivalent relationships. Although the human is now much closer to T. rex in the parade of icons across the top, their phylogenetic relationship has not changed.

Use interactivity to demonstrate the variety of alternative interpretations for branch lengths

We recommend using interactivity to demonstrate the variety of meanings branch lengths can have. The length of the lines connecting the nodes in branching tree diagrams can have at least three different meanings: connection, time, and change. The ideal interactive visualization would support all three approaches to branch lengths, providing students with direct experience with how the same relationships appear under different approaches, and thus how to interpret branch lengths.

Often the branch lines simply indicate two nodes are connected and in which order the branching occurred, with no intent to convey any other information. Cladograms, a particular type of tree diagram like that shown in Figure 13 and Figure 17, may line up all the taxa in a study, including fossil taxa, across the side or top. Computer-generated branching diagrams or interactive tree browsers may use standard-length lines between nodes as they automatically lay out the branching structure, resulting in something similar to Figure 18.


Whether it is intended or not, a century of experience in evolution education suggests that people tend to read the y-axis of a bottom-to-top tree diagram as representing time (or progress, complexity, or advancement), in at least some relative sense. Often the branch lengths do indicate relative time. Lining up extant taxa at one side of a diagram or view may convey that this side of the diagram is the present, placing the branching events in the past. For example, the highlighted nodes in Figure 17 indicate that T. rex is more closely related to other reptiles like birds and caiman than to amphibians like the frog, because they share a more recent common ancestor. Less frequently, branch lengths are explicitly used to indicate the passage of time. In this case, extant taxa will line up across one side, but extinct taxa will end before reaching that line, like T. rex in Figure 20. The infrequency of this type of display contrasts with the desire that at least three of our interviewees expressed (and two demonstrated through their own curriculum development work) to correlate the existence of common ancestors with particular ranges of geological time. These diagrams are probably infrequent because this type of information is difficult for biologists to determine.

In many cases, diagrams for scientific phylogenetic publications use branch length to indicate the amount of change that has occurred between two nodes, either in the DNA sequence, or changes in the states of morphological characters, or other changes the biologist is tracking. One group may have changed only little since it shared a common ancestor with another group, which may have changed quite a lot, resulting in a tree of extant, equally modern taxa being shown with ragged, unaligned tips, as suggested by Figure 21. Correlating a divergence event with a moment in geological time can give an indication of the average rate at which change is occurring. When this is done, a diagram that places all extant taxa in a line causes branch lengths to represent both time and change and indicates the assumption of a molecular clock39.

Ideally it is clear what branch lengths mean on every diagram, but this is seldom the case. Biologists familiar with the subtleties of generating these diagrams often understand the implications of different layouts for the meaning of the branch lengths. For example, when all the taxa are lined up, biologists can distinguish which meaning the branch lengths have based on experience and the methodology used to create the tree. This cryptic information is unavailable for most teachers and students.

39 The simplest assumption of a molecular clock holds that genetic change has occurred at a steady pace since life began, so that the amount of change is essentially a stand in for time that has passed. Biologists agree this is clearly too simplistic. Sometimes, however, in the absence of better information, biological systematists will assume that the rate of molecular change among a group of organisms has been roughly the same for a period of time, and use this to hypothesize about other time-related questions.


Figure 18. Connection only. This diagram shows the same relationships as in Figure 17. In both cases, the branch length represents only what it takes to connect the nodes. In Figure 17, the leaf nodes are lined up at the top of the picture, and the branches must come to meet them. Here, the branches are always the same length, causing the leaf nodes to be shown at a distance from the root proportional to the number of divergence events shown in the tree. Note that the path to whichever taxon is the focus of a given example often has more branching, and is therefore shown higher up. In this case it is T. rex, but in many cases, our focus on human relationships gives the visual emphasis to humans on similar tree diagrams.


Figure 19. Connection only. This diagram shows the same relationships among vertebrates as Figure 17, through a nesting, set-based display like Treemap. The issue of interpreting – or misinterpreting – branch length is removed entirely.


Figure 20. Time. The same relationships are shown where the length of the branch on the y-axis represents time. Although the data implied here is fictional, the diagram demonstrates internal nodes on the tree roughly correlated with the geologic time when the common ancestor of the extant group is thought to have existed. Note that T. rex, as an extinct taxon, is not lined up at the top with the extant taxa.


Figure 21. Change. The same relationships are shown, but here, the length of the branches on the y-axis counts the number of changes biologists believe occurred before the next branching point. Shorter branches mean fewer changes. Measuring distance between organisms has the most meaning on a diagram where branch lengths represent the number of changes.

A wonderful benefit of interactivity is the ability to display the same data in different ways, instantly. We recommend using interactivity to dynamically display any part of the Tree of Life structure such that branch lengths have any one of these three meanings: connection only, time, or change. Transitions between the different branch length displays should be animated, so that the location of taxa the viewer is focused on can be followed through the change. Seeing the same relationship data take on different layouts under these different meanings for the branch lengths is likely to directly educate students that branch lengths can mean different things and helps them to question their reflexive interpretation of diagrams as conveying information about time or progress.
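To make the three meanings concrete, the Python sketch below computes where each node would sit along the tree's main axis under each interpretation of branch length. Everything in it is an assumption for illustration: the tiny tree, the elapsed-time and change values (which are invented, not measured), and the names `branch_length`, `height`, and `layout`. An interactive viewer could animate between the three resulting layouts.

```python
# child -> (parent, elapsed time on the branch, number of changes on it).
# All numbers are invented purely to illustrate the three layouts.
BRANCHES = {
    "frog": ("root", 350.0, 12.0),
    "amniotes": ("root", 40.0, 5.0),
    "human": ("amniotes", 310.0, 20.0),
    "bird": ("amniotes", 310.0, 8.0),
}


def branch_length(child, mode):
    """Length of the branch leading to `child` under one interpretation."""
    _parent, elapsed, changes = BRANCHES[child]
    if mode == "connection":
        return 1.0       # every branch drawn the same length
    if mode == "time":
        return elapsed
    if mode == "change":
        return changes
    raise ValueError(f"unknown mode: {mode}")


def height(node, mode):
    """Distance from the root to `node`, summing branch lengths."""
    total = 0.0
    while node in BRANCHES:
        total += branch_length(node, mode)
        node = BRANCHES[node][0]
    return total


def layout(mode):
    """Axis position of every non-root node under the given mode."""
    return {node: height(node, mode) for node in BRANCHES}


for mode in ("connection", "time", "change"):
    print(mode, layout(mode))
```

With these invented values, the extant tips line up only in the time layout (all at 350.0), sit at different depths in the connection layout, and end raggedly in the change layout, mirroring the contrast among Figures 18, 20, and 21.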


Reinforce biodiversity with a Wow! Button

To counteract our tendency to focus on the human story to the exclusion of other stories, the new interactive visualization should emphasize the diversity of organisms out there. It’s an amazing – and humbling – message, that humans are just one among 1.7 million known species. We’re not advanced or special, we’re just one species. Two of our interviewees specifically requested the ability to convey this educational message to their students. A visualization should avoid perpetuating the advanced versus primitive dynamic that arises from neglecting other groups while we tell our own story (Gould 1995), by presenting focal groups (such as humans) as equivalently modern with non-focal groups (such as invertebrates, fish, plants, and members of the Archaea and Bacteria domains).

Somehow a balance needs to be struck between this experience and the negative experience of being overwhelmed by quantities of unfamiliar scientific names [see Interview Findings: Overwhelmingness]. The desired visual impact might be achieved from an overview of the tree where all branches are expanded. If creating a mental space for the tree [see Pursuing a Public Mental Map of “Tree space”?], consider using this overview as the “global” or full-extent view. The Tree of Life application should provide a button labeled “Wow!” that is always visible and returns the user to this view.

Resources

Examine the Placement and Representation of Nodes

Sam Donovan and Laura Wilcox (2004) have developed a series of heuristics for evaluating evolution diagrams in educational materials. To avoid implying progress, their heuristics emphasize the importance of the placement of nodes, including representing existing and extinct taxa. Figure 22 presents an excerpt from the heuristics.

Evaluation Category Educational Significance

Placement of Extant Taxa Are extant taxa represented as internal nodes on the tree?

The inclusion of extant taxa internally on the tree could lead to a progressive notion of evolutionary change (ladder of life) and may cause confusion about the differences between shared common ancestry and ancestor-descendent relationships.

Common Ancestor Do the trees or figure legends indicate the presence of any common ancestors?

The abstractness of tree representations can make it difficult for students to interpret internal nodes as hypothetical common ancestors. Simply labeling the root or some other internal node as a common ancestor can help overcome this issue.

Extinction Do the trees or figure legends address extinct taxa?

Extinction plays an essential role in producing the patterns we see in biological taxa. The exclusion of extinction could lead students to beliefs about the persistence of species and progressiveness of change.

Figure 22. Excerpt from Donovan and Wilcox (2004)


Figure 23. An image map index to the Tree of Life web project.

The challenge of getting this right is illustrated by Figure 23. The figure shows an image map developed for the front page of the Tree of Life web project. Its intention is to provide easy navigation to the root of major groups of interest for their web pages, admirably consistent with our recommendation to start where people’s interests are (see Bridging). However, the project’s authors find themselves describing, on a separate page,40 what conclusions shouldn’t be drawn from this diagram. For example, they suggest that users should not interpret the placement of the annelid (the orange and blue creature in the middle) at an internal position on the tree to indicate that it is a common ancestor to butterflies and frogs. We suggest that these visual impressions are hard to contradict effectively with text, and recommend avoiding ambiguous visuals.

40 http://tolweb.org/tree/home.pages/aboutoverview.html

Nature of Science: Comparing Alternative Hypotheses

Where biological systematists have agreed that they can’t yet decide between alternative possible tree topologies, teachers want to share this with their students. AP biology teachers are interested in having their students see how different lines of evidence – morphological, molecular, biochemical – support similar or different tree hypotheses for a given group of taxa. This is a great way for students to see science in action.

Recommendations

Provide interactive tree comparison tools

The new interactive tree visualization application should include a way to look at two alternative tree topologies side by side. By interacting with the two trees in order to understand their similarities and differences, students engage more with the tree’s structure than they would need to when looking at a single tree, and have the opportunity to develop and apply tree-logical thinking. Tamara Munzner’s TreeJuxtaposer41 provides this type of tool for biologists to analyze comparisons between large trees, and HCIL’s DoubleTree42 is like a TaxonTree application run in two windows. The coordination between the two windows is the key to making this work. Another good model is the information visualization features developed for the XML schema mapping tool described in Robertson et al. (2005). Note that these systems are primarily designed for analysis, and so an interactive comparison tool that is appropriate for the classroom will need to also focus on simplicity and clear labeling, as discussed in other recommendation sections.

What do these interactive tree comparison tools offer? When a user clicks on or otherwise interacts with a part of one tree, their actions also affect the analogous part of the comparison tree. Connections are highlighted to help the user pick out the relevant details from among the structures of the two trees. The displays will adjust automatically to ensure that the relevant parts of the two trees are easily seen, while accomplishing the transition in such a way as to not disorient the user. The visualization tool infers which parts of the tree structures are interesting, and automatically collapses or closes the less interesting parts in order to optimize the user’s view of the trees’ structure in the display space. And finally, search features are advanced, including being tuned to help the user interact to inform the system about which search results she or he is interested in so the display can be optimized.
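The linking behavior described above can be sketched in a few lines. This is only an illustration, not how TreeJuxtaposer or DoubleTree are implemented: it assumes trees are stored as nested tuples of taxon names, and the function names (leaf_set, clades, linked_highlight) and the toy trees are hypothetical.

```python
# Trees are nested tuples; a plain string is a leaf (a taxon name).
# The two toy trees below are alternative hypotheses made up for illustration.

def leaf_set(node):
    """Return the frozenset of taxon names found beneath a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(tree):
    """Collect the leaf set of every internal node in the tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        found.add(leaf_set(node))
        for child in node:
            visit(child)

    visit(tree)
    return found


def linked_highlight(selected, other_tree):
    """Describe how a clade selected in one tree maps onto the comparison tree."""
    target = leaf_set(selected)
    if target in clades(other_tree):
        return ("same clade", target)
    # No matching clade exists, so the taxa would be highlighted individually.
    return ("taxa redistributed", target)


tree_a = ((("frog", "lizard"), "bird"), "fish")
tree_b = (("frog", ("lizard", "bird")), "fish")

print(linked_highlight(tree_a[0], tree_b))     # frog + lizard + bird: a clade in both trees
print(linked_highlight(tree_a[0][0], tree_b))  # frog + lizard: not a clade in tree_b
```

Matching on leaf sets rather than on screen position means the link survives any rotation of branches in either window.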

The example from the Campbell and Reece textbook, shown in Figure 8, is another good model. It demonstrates the use of color to help the observer keep track of how taxa that form a monophyletic clade in one scenario are redistributed under the alternative scenario. However, see the labeling section for more cautions, as we suggest that labeling of monophyletic groups should be handled consistently across the application.

Show evidence for the tree – descriptions, character states, and synapomorphies

The Tree of Life web project pages often include text descriptions of the evidence that supports existing groupings, or alternative groupings. This is a wonderful start.

41 http://olduvai.sourceforge.net/tj/

42 http://www.cs.umd.edu/hcil/biodiversity/#DoubleTree


Wherever possible, tree views should be able to show character states of particularly relevant characters for defining groups on the tree topology. Biological systematists can apply editorial discretion to select the characters that are most interesting for supporting particular groupings. TaxonTree has an example of this feature. In the context of comparing trees, show how a character state that is a synapomorphy – a good supporting character – on one tree topology may become paraphyletic – not supporting any particular grouping, but distributed among taxa in various groups – in an alternative topology.

In addition, the application could provide tools to facilitate historical inference of the character state of a common ancestor. MacClade43 has pioneered this for biologists; appropriate, simpler tools can be developed for biology students.
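One simple way a student-level tool might infer an ancestral character state is Fitch's parsimony counting, a standard textbook method used here purely for illustration; it is not a description of MacClade's implementation. The sketch assumes strictly binary trees stored as nested tuples, and the character, its states, and the two toy topologies are hypothetical.

```python
def fitch(node, states):
    """Return (candidate states at this node, minimum number of changes below it)."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node  # assumes every internal node has exactly two children
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no change is needed here.
        return shared, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# A made-up character scored as present or absent in four taxa.
states = {"fish": "absent", "frog": "absent", "lizard": "present", "bird": "present"}
tree_a = ((("frog", "lizard"), "bird"), "fish")
tree_b = (("frog", ("lizard", "bird")), "fish")

print(fitch(tree_b, states))  # one change: the state marks a single group
print(fitch(tree_a, states))  # two changes: the same state is scattered
```

Run on two alternative topologies, the change count also shows how a character that cleanly supports a group on one tree stops doing so on the other.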

Finally, the Tree of Life application should include or connect to descriptions about the characters, character states, and more general types of evidence that support particular structures, such as molecular evidence or morphological evidence.

Resources

Multi-panel Displays

Displaying multiple coordinated information sources may involve multi-panel displays. For example, PEGasIS44 shows phylogenetic relationships, character states, geographical locations, and even literature relevant to specimens held at the Jepson and University Herbarium through multiple coordinated windows. Two alternative trees might be displayed in separate panels, or a small inset panel might be used to accomplish focus-plus-context when navigating the tree. The field of information visualization has developed techniques such as brushing-and-linking to help ensure that users can interpret and interact effectively with information displayed in this manner. Baldonado et al. (2000) have developed heuristic guidelines for evaluating when to use multiple coordinated viewing panels on a rich information space, and how to do it well.

43 http://macclade.org/macclade.html

44 Phylogeny, Ecology, Geography Information System http://ucjeps.berkeley.edu/PEGASIS/pegasis.php

Nature of Science: A Tree with its own History

Not only does the compiled tree of life attempt to give us some perspective on the history of life on earth, it also has its own history. Darwin and Haeckel started things off in the 19th century, and our ideas about the Tree of Life have been developing and changing ever since. Static diagrams usually present only a snapshot of the Tree’s history, either a given hypothesis for a given moment, or perhaps a comparison between a previous version and the new version. A database-driven interactive tree visualization application has the power, however, to show the history of biologists’ understandings about the Tree of Life. It can be evident from the very tools provided on the application that what a visitor to this website sees is the sum of current thinking, but it will change in the future as biological systematists find out more, just as it has changed in the past.

In contrast to the enthusiastic call for viewing alternative trees that came through in both our survey and our interviews, our survey respondents were less enthusiastic about seeing the history of the tree. However, workshop participants’ calls for help understanding how their previous experience connected with the current understanding of the Tree of Life were very compelling.

There are numerous good reasons to show the history of the tree. One is extremely pragmatic: the more tightly integrated this tree structure resource is into scientific practice, the more incentive there will be for scientists to contribute to keeping it up-to-date. The others are good pedagogy: in support of understanding the nature of science, students can have direct experience that trees are the product of scientific reasoning and that they change as evidence and reasoning change. Also, people who learned earlier versions of biological classification – often teachers, docents, and members of the public who took biology when the major groups were “Plants and Animals” or the five Kingdoms, but also any enthusiast or professional who is familiar with an older taxonomy for any given focal group – will appreciate being able to tell how the structure that is familiar to them has morphed into the one they see now. Helping people relate new information to their current concepts, especially when asking them to change those concepts, is an important learning strategy (Alters and Nelson 2002).

Although this recommendation is somewhat in contradiction with our survey findings, we believe it is sound. We attribute the contradiction to writing the question poorly, before we had developed enough of a sense of what these tree history features would provide. Our recommendation is based on the ideas we heard from our interviews with enthusiasts and systematists. Since the community in which we invited responses to our survey includes many practicing biological systematists, we’d be surprised if they wouldn’t find the versioning features suggested below very useful. However, future work could certainly benefit from checking again with the user community about their feature priorities.

Recommendations

Provide a date slider

What did biologists think about these relationships last year? Has it changed? Provide the navigation tools such that for any particular view, the user can step back through previous topologies. Once this tree information is maintained in a database, showing earlier versions should be trivial.
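A minimal sketch of the lookup behind such a date slider follows, assuming each view keeps its topologies as dated Newick strings. The class name TopologyHistory, its methods, and the dates and trees are all hypothetical; a real database would supply its own schema.

```python
from bisect import bisect_right
from datetime import date


class TopologyHistory:
    """Keeps every recorded topology for one view, ordered by date."""

    def __init__(self):
        self._dates = []
        self._trees = []

    def record(self, effective, newick):
        """Store a topology (a Newick string) and the date it became current."""
        index = bisect_right(self._dates, effective)
        self._dates.insert(index, effective)
        self._trees.insert(index, newick)

    def as_of(self, day):
        """Return the topology that was current on the given day, or None."""
        index = bisect_right(self._dates, day)
        return self._trees[index - 1] if index else None


history = TopologyHistory()
history.record(date(2003, 1, 1), "((frog,lizard),bird);")
history.record(date(2005, 1, 1), "(frog,(lizard,bird));")

print(history.as_of(date(2004, 6, 1)))   # the older hypothesis
print(history.as_of(date(2005, 5, 13)))  # the newer hypothesis
```

Because old versions are never overwritten, the same lookup could also serve citation of a previously published topology.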

Highlight changes when they happen

It’s easy to be blind to changes in information resources, so help frequent users by highlighting what has changed since their last visit. Store a cookie in the browser for visitors that don’t log in. All users can appreciate a “What’s New” feature on each page or view, and it’s exciting to know that things have changed.


Provide version history tools

Scientists will be accessing this centralized information source, and need to be able to cite the topology they find there. Additionally, they need to be able to look up previously cited structures, no matter what the current topology is thought to be. Display on each view how it should be cited, and provide navigation and search to view earlier versions.

Connect particular topologies to source literature

Provide scientists, professionals, and dedicated educators or naturalists with the ability to look up the original paper. Papers are organized by their biological meaning, rather than their journal or other things that make more sense to a library or librarian. Provide a visual, easily used way to distinguish between literature available for free and that requiring a subscription for access. Students and others will see that the evidence is there to be looked at, not just developed out of thin air.

Provide canned examples describing major shifts

Ideally, earlier versions of the classification system and phylogenetic relationships can also be input into the database, so they can be visualized. Until this happens, provide some visual examples showing how the older models and the new models compare. For example, Figure 11 shows a diagram from a herbarium workshop that attempts to show how members of the old category of Plants are now distributed on the current Three Domains tree. Backyard Nature has an example of a webpage trying to help nature enthusiasts transition to the Three Domains model.45 This type of translation information is important for supporting a broader audience’s interactions with the updated tree of life. This broader audience often includes the people doing the teaching!

Decide whether the visualization should indicate the support available for a particular branching

Biologists developing phylogenetic trees have developed methods to evaluate and describe the likelihood that any given branching in a topology represents the real, true branching evolutionary history. Adjusted to an easier-to-interpret scale,46 the results of these methods could be shown visually as the thickness of particular branches in the tree. The advantage of this is that it directly shows the biological systematists’ own measure of the certainty behind the topology to viewers, underscoring that the tree is based on evidence and isn’t complete.
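The adjustment to an easier-to-interpret scale could be a piecewise-linear mapping like the sketch below. The breakpoints follow the reading of bootstrap values given in our footnote (60 or less is poor, 70 and above is good); the pixel widths and the name branch_width are assumptions chosen only for illustration.

```python
# (bootstrap value, line width in pixels); the widths are hypothetical.
BREAKPOINTS = [(0, 1.0), (60, 1.0), (70, 3.0), (100, 4.0)]


def branch_width(bootstrap):
    """Map a bootstrap value in [0, 100] to a line width, non-linearly."""
    if not 0 <= bootstrap <= 100:
        raise ValueError("bootstrap value must be between 0 and 100")
    for (x0, y0), (x1, y1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if x0 <= bootstrap <= x1:
            # Linear interpolation inside the segment that contains the value.
            return y0 + (y1 - y0) * (bootstrap - x0) / (x1 - x0)


for value in (40, 60, 65, 70, 85, 100):
    print(value, branch_width(value))
```

All poorly supported branches share one thin width, most of the visual change happens between 60 and 70, and well-supported branches differ only slightly from one another.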

However, polysemy is a serious risk here, as less-highly-trained users are likely to interpret branch thickness as encoding another attribute, such as the number of species or organisms found along that branch, the amount of biomass along that branch, or some other quantitative measure. Theories of pre-attentive visual coding (Stolte et al. 2002) suggest that people first interpret size and thickness as quantity. Interpreting branch thickness as a support value is the result of training, and likely not intuitive. Similar to the tool for showing different meanings that can be encoded by branch length, a tool could allow for showing different attributes by branch thickness. This should be tested with users.

45 http://www.backyardnature.net/lifetree.htm

46 The “Bootstrap” method provides values from 1 to 100, where anything from 70 to 100 is good support, but 60 or less indicates a poorly supported structure. A thickness display would need to correct for the non-linear nature of the interpretation of these numbers.

Making the Application Usable in the Classroom

An Index to an Information Space

As the people we spoke with took the time to understand the potential of a tree of life visualization to pull together currently disparate references on biologists’ thinking about the relationships between organisms, they seldom stopped there. Their imaginations ran on to other things this system might do. One wanted to navigate around the tree following parasite – host interactions. Another wanted to access ecological studies that have been done on organisms or their sister groups. Teachers wanted to be able to move around in the tree and find examples of interesting stories. We gathered the distinct impression that there is quite an appetite for organizing biological information by biological classification logic, as opposed to the library classification logic currently in use.

Pursuing a Public Mental Map of “Tree space”?

Should the new interactive tree visualization explicitly support the user’s development of a mental map of tree space? The size of the tree of life information set and results from our heuristic and user studies clearly indicate the challenge of staying oriented and the need for context when navigating within the tree. Features that can help the user stay oriented within the information are important.

Information visualization wisdom holds that given a consistent visual layout of graph data in space, users with good spatial memory should be able to return quickly to favorite parts of the graph, reducing search time for repeat, increasingly experienced users. Visualization software should support users learning this mental space by offering spatial consistency and stability (Misue et al. 1995).

This is certainly the theory behind 2-D and 3-D information visualization spaces developed in projects such as ConeTree and the hyperbolic tree. However, it’s not clear that visualizing the tree structure into a physical space has actually been successful at providing extra navigation benefits for users. Although the developer of the hyperbolic tree has had some stunning successes in “browse-offs” against text hierarchy browsers, results of usability studies have been much more ambiguous (Parr et al. 2003, Plaisant et al. 2002, Xiang et al. 2004), and even our own usability evaluations highlighted some important shortcomings specific to the hyperbolic tree, such as losing branch length information, difficulty reading labels and difficulty reproducing the same view.

Additionally, tree-thinking doesn’t necessarily benefit from additional dimensions. The branching divergence relationships that biologists believe exist between taxa do not lend themselves easily to being represented in two dimensions, nor in three. Translating the graph data into 2-D and 3-D47 spaces forces nodes to have spatial relationships on the page or in space that don’t have biological meaning, such as nodes appearing to be in a list and being close or far apart in space. These relationships can foster misconceptions discussed above. Tree-thinking brings with it its own geometry, where the distance between nodes is measured by traversing the branches between them, and ordering is absent. Visualization is great when the order of the child nodes matters – which is not the case for Tree of Life data. Indeed, the tree-logic feature of rotating the nodes around in the tree visualization (see Teaching Tree-Thinking) seems directly contrary to providing a consistent spatial location for nodes of the tree.
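The tree geometry described here can be made concrete with a small sketch. It assumes the tree is stored as a child-to-parent map, which carries no sibling order at all; the taxa form a deliberately simplified toy tree, and the names PARENT, path_to_root and tree_distance are hypothetical.

```python
# Child -> parent links for a toy tree. No ordering of siblings is stored.
PARENT = {
    "animals": "eukaryotes",
    "plants": "eukaryotes",
    "vertebrates": "animals",
    "insects": "animals",
    "mammals": "vertebrates",
    "birds": "vertebrates",
}


def path_to_root(node):
    """List a node and all of its ancestors, ending at the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def tree_distance(a, b):
    """Count the branches traversed between a and b via their common ancestor."""
    steps_from_a = {name: i for i, name in enumerate(path_to_root(a))}
    for steps_from_b, name in enumerate(path_to_root(b)):
        if name in steps_from_a:
            return steps_from_a[name] + steps_from_b
    raise ValueError("nodes are not in the same tree")


print(tree_distance("mammals", "birds"))    # sister groups
print(tree_distance("mammals", "insects"))  # related through "animals"
print(tree_distance("mammals", "plants"))   # related only through the root
```

However the branches are rotated on screen, these distances do not change, whereas distances measured across the page do.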

Interactive tree of life visualizations should strive to introduce users to tree-geometry, and embody tree-thinking wherever possible. Features might include the approach used for navigation (although interviewees and experience suggest that navigating only node-by-node is too tedious) and easily rotating branches around nodes to show alternative layouts of the same relationships. Navigating by browsing across a plane of nodes doesn’t support a visceral sense of tree-branching, and so doesn’t support tree-thinking.

However, usability tests during this project and the MaNIS User Interface project48 show that a hierarchical text browser doesn’t work well with classification data, and all of the reasons are exacerbated by the structure of phylogenetic data. The tree of life data is a very deep tree, with a low branching factor. This means that the path to a desired location can be full of lots of new names people don’t know how to choose between. SpaceTree and TaxonTree are pioneering views that helpfully open not just the next node’s children but child nodes a few layers down, to address this issue.

Personal information spaces such as the 3-D web bookmark organizing environments tested in Data Mountain (Robertson et al. 1998) don’t seem to show much benefit from leveraging the user’s ability to create a visual mental map of the space. This may be because both ways of organizing the information are designed by the same person, and perhaps also because the classification of the objects may be more arbitrary, with fewer inherent relationships. The information space can be reinvented at any time, by any user, and is not intended to be shared. There is no challenge of trying to interpret someone else’s classification system. In considering the creation of a mental visual space for the Tree of Life, however, there is a potential for creating a shared information space that is analogous to how people share a sense of the geographic map of the globe. This would be consistent with the tree of life serving as an information space within which people can relate disparate references and develop a sense of “where” they and other organisms are.

47 See Paloverde as an example of a 3-D visualization of phylogenetic data. http://ginger.ucdavis.edu/paloverde/paloverde.html

48 http://www.sims.berkeley.edu/academics/courses/is213/s04/projects/Manis/

Replacing the Role of Ranks

Traditionally, the Linnean ranks, commonly listed as Kingdom, Phylum, Class, Order, Family, Genus and Species, have been important tools to help people organize the many Latin terms used by taxonomists to classify organisms. Often the ending on a term indicates the rank to which it belongs, for example, plant names ending in –aceae indicate a Family name. When encountering a new term, knowing what rank it belongs to provides some clues about where it fits in the distributed, shared indexing system. However, the amount of novelty an organism must have in order to represent a new category has always been left to the discretion of the individual biological systematist. Over time, and at different rates in the different natural history fields of mammalogy, ornithology, botany, paleontology, and so on, the pendulum swings between those who want to lump a great deal of variety together under one name, and those who want to recognize all distinguishable differences with a new name. While many who use the biological classification information want to interpret ranks as indicating named categories encompassing roughly equivalent diversity, in fact this isn’t a good assumption. For example, the diversity encompassed within a family of plants is significantly more than within a family of mammals. But ranks operating across all fields make it easy to assume that rank names designate roughly equally diverse groups of organisms.

For this and various other reasons, as information from various natural history fields is examined side by side more often, many biologists are de-emphasizing ranks or even calling for dropping them altogether. A phylogenetic tree-based index system doesn’t use ranks. By centralizing access to the vast amounts of information so that an unknown name can readily be looked up to understand its context, a new interactive tool for accessing this tree information can nicely support the transition away from ranks, towards a rank-free biology. However, even as ranks imply unhelpful equivalency they have also helped people understand roughly where things are on the mental tree they have built through their experiences with biological information, and the new tool should seek to support this same important function in some other, rank-free and biologically appropriate manner. Creating a public mental map space on which new information can be related to previous information might provide an answer.

Recommendations

Provide a radial, rank-free map of biological information space

What should this public mental map of biological information space look like? We favor a radial layout, such as UCMP’s family tree49 or that seen in Figure 24, feeling that it would address some of the issues that otherwise arise with the composition of most two dimensional representations of evolutionary relationships.

Users bring conventions about composition to their efforts at interpreting visuals, such as reading a page (completing a task) from left-to-right and top-to-bottom, or seeing the bottom of a page as a horizon or ground and the top as sky.

Pinto and Ametller (2002) found that these conventions affected students’ ability to interpret scientific diagrams. Visualizations should trigger these conventions intentionally and in ways consistent with the meaning they seek to convey, and take care not to invoke them arbitrarily. Visualizations that require different conventions to be interpreted effectively must find a way to bridge from users’ experiences to the new convention.

49 http://evolution.berkeley.edu/evosite/evo101/IIAFamilytree.shtml

Reading a page this way supports the conflation of concepts that should remain separate such as height, position, time, and importance. For example, this might be a risk with the hyperbolic tree.

Create good “map” software to allow simple views at any scale

Cartographers draw maps to show the relationships between points of interest, no matter how far apart. To do this effectively, focal features may not be drawn to scale. Teachers want to emphasize different stories within the Tree of Life, perhaps emphasizing the relationship between a group such as “primates” or “green plants,” and then the specific location of an organism being used in the lab. A system designed to present this information will need to have more highly developed visuals than what might be required for scientists to analyze it.

Semantic zooming involves showing a level of detail appropriate to the scale being viewed that is meaningful for the human, not a level of detail selected by a mathematical calculation (e.g. “5x Zoom” is mathematically calculated; “street detail” or “city detail” is a human-meaningful scale). The interfaces for Google maps50 and Yahoo! maps51 both attempt to provide semantic zooming. An interesting related concept is semantic depth of field, where the focal objects are kept in focus, and other objects are blurred52.

Although it is beyond our artistic abilities to mock up, we envision a radial tree of life map to the biological information space that zooms from the overview where the three domains of life are evident, down to the finest detail of the phylogenetic trees that biological systematists have prepared about the relationships between multiple individuals in one population of one species. Depending on your current level of zoom, the easy-to-read labels would represent the major groups within view on the screen. Smaller labels represent the next level of structure down, and a larger label represents a header, defining the group you are in. As you zoom in and out, these labels cycle through these different purposes, according to the scale you are viewing the information at.
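The way labels cycle through roles as the viewer zooms can be sketched as follows. The sketch assumes the zoom level is expressed as a focus node; the CHILDREN table is a small toy excerpt, and the role names ("header", "major", "minor") and the function labels_for_view are hypothetical.

```python
# Parent -> children for a toy excerpt of the tree.
CHILDREN = {
    "Life": ["Bacteria", "Archaea", "Eukaryotes"],
    "Eukaryotes": ["Animals", "Plants", "Fungi"],
    "Animals": ["Vertebrates", "Arthropods"],
}


def labels_for_view(focus):
    """Assign a label role to each group visible when zoomed to the focus node."""
    labels = {focus: "header"}          # defines the group you are in
    for child in CHILDREN.get(focus, []):
        labels[child] = "major"         # easy-to-read labels for groups in view
        for grandchild in CHILDREN.get(child, []):
            labels[grandchild] = "minor"  # smaller labels, next level down
    return labels


print(labels_for_view("Life"))
print(labels_for_view("Eukaryotes"))
```

In this toy data "Animals" is a minor label in the overview, a major label once the viewer zooms to Eukaryotes, and the header once the viewer zooms to Animals.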

50 http://maps.google.com

51 http://maps.yahoo.com

52 http://www.kosara.net/research/#SDOF


Figure 24. A diagram showing “our current best guess on the major groups of life and their relationships to each other” (Baldauf et al. 2004). This is a nice model of a radial layout, and the use of label design to convey tree structure information at the level that the viewer chooses to focus on it.


Figure 25. Screenshot of TaxonTree. In search of the Wow! button, we opened up a number of branches on the animal diversity tree. When it all fits in one window, the node labels get hard to see. Labels do appear on rolling over any node, which helps tremendously. We suggest that labels indicating larger patterns could also be visible at this scale to help orient the viewer.


Figure 26. Demonstration of a potential approach to identifying areas of a visual layout. Keywords are shown superimposed over a display of photos, indicating their clustered structure. Part of Figure 3 in Rodden et al. 2001. Imagine keywords such as “Birds” and “Sharks” superimposed over appropriate parts of the TaxonTree image shown in Figure 25.

Support focus-plus-context, using index nodes selected by systematists

Parts one and three of Ben Shneiderman’s (1996) famous information visualization mantra, “overview first, zoom and filter, then details-on-demand” are addressed by focus-plus-context approaches. Hauser and Kosara (2004) suggest using “focus-plus-context visualization (F+C visualization) as a means to jointly support zooming into the visual depiction of the data while at the same time maintaining the visual orientation of the visualization user to support navigation in the visualization.” An important component of handling focus-plus-context effectively is to provide the user continuity across the different levels of detail.

Our usability evaluation participants often mentioned experiencing a lack of context within the visualization software we tested. We suggest that systematists should identify the importance of the various nodes within the compiled phylogenetic trees. When the user is deep within the tree, a path back to the root showing only these most important branching points and eliding the numerous other nodes in between will help provide context. For example, when most of the screen is filled with the evolutionary relationships between mammals, a small part that connects back to the root’s location in tree map space might show three nodes: vertebrates, animals, and eukaryotes. These “important nodes” play a role similar to ranks, yet unlike the rank structure there is no limit on the number of important nodes, the nodes shown will vary based on where you are in the tree, and there is no unintended implication of equivalency with other parts of the tree.
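A minimal sketch of such a context path follows, assuming systematists have flagged a set of index nodes in advance. The lineage is a simplified toy chain, and the names PARENT, INDEX_NODES and context_path are hypothetical.

```python
# Child -> parent links along one simplified lineage.
PARENT = {
    "Mammals": "Amniotes",
    "Amniotes": "Tetrapods",
    "Tetrapods": "Vertebrates",
    "Vertebrates": "Chordates",
    "Chordates": "Animals",
    "Animals": "Opisthokonts",
    "Opisthokonts": "Eukaryotes",
    "Eukaryotes": "Life",
}

# Nodes that systematists have marked as important landmarks.
INDEX_NODES = {"Vertebrates", "Animals", "Eukaryotes"}


def context_path(focus):
    """Return the flagged ancestors of the focus node, ordered from the root down."""
    shown = []
    node = PARENT.get(focus)
    while node is not None:
        if node in INDEX_NODES:
            shown.append(node)
        node = PARENT.get(node)  # unflagged ancestors are simply skipped
    return list(reversed(shown))


print(context_path("Mammals"))
```

Because the flagged set is just data, systematists could add or remove landmarks for any part of the tree without implying that landmarks elsewhere are equivalent.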

Provide for simplification of views and details-on-demand through excellent support of branch manipulations

Part two of Ben Shneiderman’s information visualization mantra, “overview first, zoom and filter, then details-on-demand” suggests the need for search, filtering, hiding panels, branches, uninteresting ranges of the data, and otherwise reducing the complexity of what is shown on the screen.

In order to create the types of tree views that teachers and students need, where all the extraneous detail has been cleared away, the tree of life web application will need to provide excellent support for simplifying tree views. These are the features that will support the easy completion of all but the first Typical Task. The interactions should allow the user to focus on certain details of the tree view, while minimizing the screen space taken up by remaining parts of the structure. However, the other parts remain available to provide additional detail on demand. And critically, none of the manipulations to simplify or add more detail to the view will affect the underlying tree structure of the data.

To illustrate the manipulations required, examine the “Research Tree” (Figure 27) and “Teaching Tree” (Figure 28) prepared from the green plant branch of the tree of life53. We recommend going online to see this, but the figures should provide some sense of the task. Notice that near the bottom of the screenshots, small branching twig lines provide a preview of the tree structure that would be visible in other parts of the hyperbolic tree. The two trees show very similar overall structure to the preview twigs, while the Teaching Tree’s twigs clearly have a simpler structure.

53 http://ucjeps.berkeley.edu/htree_intro.html


These types of operations would be required to transform the given Research Tree into the given Teaching Tree. The examples are drawn from comparing the two trees either from near the root or from the Angiosperms node:

• Remove leaf node entirely – no other effect on structure (e.g.: Micromonadophic as a child of the root node; Choleochaet as a child of another node; drop Piperales off of Magnolids)

• Remove leaf node entirely and collapse now-orphaned branching point – i.e. a node with only one child – into parent node (e.g.: Pleura, sharing node with Chlorop.)

• Add a label to a previously unlabeled node (e.g.: add "Ulvobionta")

• Collapse an unnamed node into a polytomy with a parent node (e.g.: Nymphaeacea into Angiosperms)

• Remove a clade or branch entirely, and remove the orphan branching node (e.g.: Astrobailyaceae/schisandraceae branch)

• Collapse a clade to show no more structure, and if needed, name the parent node (e.g.: Eudicotyledone)

• Elide multiple orphan nodes in a row and represent with a number.
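The first two operations in the list above can be sketched as a function that builds a simplified copy and leaves the underlying data untouched. The sketch assumes each node is a (name, children) pair with None as the name of an unlabeled branching point; the taxa are placeholders ("A" to "F"), and the function name simplify is hypothetical.

```python
def simplify(node, drop):
    """Return a simplified copy of the tree without the taxa named in drop."""
    name, children = node
    if name in drop:
        return None  # this leaf or clade is removed from the view
    if not children:
        return (name, [])
    kept = [s for s in (simplify(child, drop) for child in children) if s is not None]
    if not kept:
        return None  # every descendant was removed, so drop this node too
    if len(kept) == 1 and name is None:
        return kept[0]  # collapse an unlabeled, now-orphaned branching point
    return (name, kept)


research_tree = ("Green plants", [
    ("A", []),
    (None, [("B", []), ("C", [])]),
    (None, [("D", []), (None, [("E", []), ("F", [])])]),
])

teaching_tree = simplify(research_tree, {"A", "B"})
print(teaching_tree)
print(research_tree[1][0])  # the original data still contains taxon A
```

Labeled nodes with a single remaining child are kept in this sketch, on the assumption that a name chosen by a systematist is worth showing; a real tool might make that a user option.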

Although this example uses a branching tree diagram, our usability evaluations with the Treemap software suggested Treemap-style views would benefit from the same types of manipulations.


Figure 27. "Research tree" shown in a hyperbolic browser.


Figure 28. "Teaching tree", showing approximately the equivalent tree view as Figure 27. The data set behind this hyperbolic tree display has been simplified by hand from the "Research tree" data set, to move closer to a level of detail appropriate for educational use.

Bridging: Start Where People ARE, Mentally and in Tree-space

Providing biologically-organized access to such substantial quantities of information is exciting, but promises to be overwhelming unless handled deftly.

Denise Green and Rebecca Shapley 84

It’s very important to rely as little as possible on training the audience through teaching or text. Not everyone interacting with this new application will have an appropriate teaching environment interpreting it for them, and the current state of evolution education in the country points to the failures inherent in letting people interpret diagrams from a non-biologically informed common understanding. Instead of starting at the root of all life 3.5 billion years ago, views of the tree should start where people’s questions are. Instead of starting where the theory starts, the user experience should start where the user’s question, context, and task starts. The interactive visualization application’s features must do whatever is within its power to bridge between the perspective and experience its users have and the information biologists collect and generate from the field’s perspective.

Numerous moments in interviews have made abundantly clear that even people with a high degree of familiarity with taxonomy, classification, and even phylogeny are not successful at starting from the root of the tree and navigating through the branching structure to the taxa of interest. It only takes one unfamiliar branching to make them feel dumb, and feeling dumb encourages people to leave, not to learn.

To start where the users are, we recommend the rigorous inclusion of pictures of organisms, robust and friendly search features, overviews with visible labels, previews of the nodes located beyond this particular branching, and the use of common language descriptions. To be useful in a K-13 educational context, the new interactive tree visualization application should seek to support effective navigation by non-experts, that is, by people innocent of or just recently introduced to the domain of biology.

Recommendations

Include pictures of organisms

Start with pictures. Middle school students and naturalists alike come to an information resource with the organism in mind, and there’s no better way to start where they are than to start with pictures. Pictures of representative taxa are more common in tree diagrams prepared for publication. Tools created just for scientific analysis may not require pictures, and many research and computer-generated trees don’t have them. However, if the information is intended for presentation to the educational community and the public, pictures are essential. Everyday experience of naturalists and educators indicates that pictures are critical to connect the presented information with students’ experiences with the real world.

Middle school teachers in our interviews suggested that the pictures should introduce the evolutionary relationships, rather than requiring users to follow the relationships to reach the pictures at the tree tips. PhotoMesa54, another product from HCIL’s information visualization efforts, shows a landscape of the photos in your collection, grouped by the folders they are in. We envision an application that is a hybrid between PhotoMesa and Treemap, where each taxon has its representative photo or icon and the clustering is nested according to evolutionary relationships. The first impression is a vast variety of organism images, as in the example mocked up in Figure 30. Layout and thumbnail-generation algorithms, and color contrast and perception theory beyond these authors’ expertise, could both improve the visual effectiveness of this approach.

54 http://www.cs.umd.edu/hcil/photomesa/

Figure 29. Screenshot from a trial with PhotoMesa and a folder hierarchy of photos from the Tree of Life web project's green plants branch.


Figure 30. By “hybridizing” PhotoMesa’s photo management and zoom features (Figure 29) with Treemap’s nesting display (Figure 6), this mock up suggests that users could start their exploration of the tree of life from the organisms themselves, rather than the more abstract concepts of the tree’s root and branches.

In our usability explorations with Treemap, a lot of screen space was spent showing the nesting relationships of the deep tree. Treemap would be better for viewing a data set with a higher branching factor, such as a classification tree, or a much less well-resolved phylogenetic tree. It might be worth losing some resolution in the phylogenetic tree structure to use a Treemap view to show pictures or the distribution of biological attributes across the tree.
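The screen-space cost of nesting can be illustrated with a minimal slice-and-dice treemap layout in Python. This sketch is not the layout algorithm used by the Treemap software; the `Taxon` class, the `layout` function, and the `pad` parameter (the border drawn around each nested group) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


@dataclass
class Taxon:
    name: str
    children: List["Taxon"] = field(default_factory=list)


def leaf_count(t: Taxon) -> int:
    """Number of leaves below (or at) this node."""
    return 1 if not t.children else sum(leaf_count(c) for c in t.children)


def layout(t: Taxon, rect: Rect, pad: float = 2.0, horizontal: bool = True,
           out: Optional[Dict[str, Rect]] = None) -> Dict[str, Rect]:
    """Assign each leaf a rectangle, alternating split direction per level."""
    if out is None:
        out = {}
    if not t.children:
        out[t.name] = rect
        return out
    x, y, w, h = rect
    # Every level of nesting gives up a border of `pad` on all four sides.
    x, y = x + pad, y + pad
    w, h = max(w - 2 * pad, 0.0), max(h - 2 * pad, 0.0)
    total = leaf_count(t)
    offset = 0.0
    for child in t.children:
        share = leaf_count(child) / total
        if horizontal:
            child_rect = (x + offset * w, y, share * w, h)
        else:
            child_rect = (x, y + offset * h, w, share * h)
        offset += share
        layout(child, child_rect, pad, not horizontal, out)
    return out
```

In this sketch a leaf nested three levels deep in a 100 by 100 display with `pad=5` ends up with a 70 by 70 rectangle, roughly half the area, even when it has no siblings. A flatter tree with a higher branching factor loses far less space to borders.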

Include pictures of characters

Characters and character states should be described and illustrated or photo-illustrated, so that users don’t have to guess. Close-ups of small characters, such as many morphological characters on insects, should be located on a large view of the entire organism.

Describe groups in multiple ways on the diagram

Node labels are essentially descriptions of the group of taxa included in the child nodes. These labels can include bio-latin, but should also include common language descriptions (e.g. “bony-tongued fish,” “ferns,” or “flowering plants”). Ideally, internal nodes can be illustrated with an artist’s recreation of the common ancestor, complete with the visible morphological character states that biological systematists use to infer the common ancestor of the group. Until the artist can be hired, groups can also be described by the two pictures of the representative organisms that define that clade in the same way that the Phylocode55 does, and the common names for those organisms. However, caution should be taken here not to imply by the placement of the pictures on branching diagrams that these pictures of representative organisms are pictures of the common ancestor. See “Examine the placement and representation of nodes” for more resources on this issue.

Develop search for the common user

Develop powerful search features that support common names, suggest matches for attempted spellings of scientific names, and use preattentive visual color coding (Stolte et al. 2002) to indicate the distribution of search matches on the tree letter by letter (see the search features of SpaceTree). Display the search terms and provide easy ways to modify them to narrow the results, and use other good practices for easy-to-use search. Note that good search experiences are not easy to find! Given the size of the information space and the experience of the users, this is a very important aspect on which to exert programming and usability resources. Evaluate the search features using heuristics and usability tests. See if it works for real users!

Heuristics for evaluating search features have been developed by Louis Rosenfeld,56 Studying Google,57 and Flamenco.58 They provide additional, up-to-date suggestions such as suggested completions, preview numbers for size of search results, feedback in bold showing where text search matches were found, one click refinement of search terms, and tight integration between searching and browsing.
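As a rough illustration of searching by common name with suggested completions and spelling suggestions, the following Python sketch uses the standard library's `difflib.get_close_matches`. The index contents, the function names, and the 0.7 similarity cutoff are our own assumptions, not features of any existing Tree of Life tool.

```python
import difflib
from typing import Dict, List


def build_index(taxa: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every lower-cased scientific or common name to its node label."""
    index: Dict[str, str] = {}
    for scientific, common_names in taxa.items():
        index[scientific.lower()] = scientific
        for name in common_names:
            index[name.lower()] = scientific
    return index


def search(query: str, index: Dict[str, str], limit: int = 5) -> List[str]:
    """Return node labels for exact, prefix, and near-miss spelling matches."""
    q = query.strip().lower()
    if q in index:
        return [index[q]]
    # Suggested completions: names that start with what was typed so far.
    hits = [name for name in sorted(index) if name.startswith(q)]
    # Spelling suggestions: names similar to the query (cutoff is a guess).
    hits += difflib.get_close_matches(q, list(index), n=limit, cutoff=0.7)
    results: List[str] = []
    for name in hits:
        if index[name] not in results:
            results.append(index[name])
    return results[:limit]


# Illustrative data only.
EXAMPLE_INDEX = build_index({
    "Angiosperms": ["flowering plants"],
    "Osteoglossomorpha": ["bony-tongued fish"],
})
```

With this example index, both the common name "flowering plants" and the misspelling "angiosprems" lead to the Angiosperms node. A real application would also need to rank results and show where on the tree they fall.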

55 http://www.ohiou.edu/phylocode/
56 http://louisrosenfeld.com/home/bloug_archive/000290.html
57 http://www.google.com/
58 http://flamenco.berkeley.edu See especially the talks and publications, which describe the approach to search developed in Flamenco.

Develop, bookmark, and distribute useful views

Survey educational materials to see what types of tree diagrams are frequently used. With a tree visualization application designed to allow simplification and bookmarking, have a project member actually develop the same tree views and save them (very similar to Task two). Distribute these bookmark URLs widely. Try to get them onto websites that offer web resources for teachers using the curriculum. Make them readily available on the application’s front page. Highlight their availability for people browsing the tree nearby. Set up an index by curriculum with links to the pre-defined views.

For example, the UCMP curriculum “What did T. rex taste like?” features a simple vertebrate tree (Figure 13) showing the relationships between a shark, tuna, frog, human, hare, parrot, and T. rex. The web resources page of the curriculum guide could link to the same view of the tree at the new interactive tree visualization website, where students can play with the branch lengths, rotate nodes, and even un-hide other branches to see where they fit. Examples based on the Campbell and Reece textbook 8th edition, which uses phylogeny to organize parts of the content, might include the two alternative invertebrate trees. The Green Plant Tree of Life website might link to a simple overview tree of the green plants branch, allowing researchers and amateur botanists to drill down into the parts they want to know more about.

To help future curriculum development efforts and to serve other audiences, the biological systematists responsible for various parts of the tree may want to develop views of their parts of the tree that are great for sharing the important divergence events and major groupings at different levels of detail, appropriate for different audiences. These may be appropriate entry points for browsing the website.
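One way to make a saved view distributable is to encode its state in the URL itself. The Python sketch that follows assumes a hypothetical base address and three hypothetical view parameters (`root`, `hide`, `depth`); a real application would define its own.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://example.org/tree"  # placeholder address, not a real service


def view_to_url(root: str, hidden: List[str], depth: int) -> str:
    """Encode a view (root node, hidden branches, visible depth) as a URL."""
    query = urlencode({
        "root": root,
        "hide": ",".join(sorted(hidden)),  # assumes names contain no commas
        "depth": depth,
    })
    return f"{BASE_URL}?{query}"


def url_to_view(url: str) -> Dict[str, object]:
    """Recover the view state from a bookmark URL."""
    params = parse_qs(urlparse(url).query, keep_blank_values=True)
    hide = params["hide"][0]
    return {
        "root": params["root"][0],
        "hidden": hide.split(",") if hide else [],
        "depth": int(params["depth"][0]),
    }
```

Because the whole view is carried in the link, a curriculum web page could point students at exactly the simplified tree the teacher prepared, and students could still un-hide branches from there.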

Provide interpretation guidance in context

As Donovan and Wilcox (2004) suggest, the view or text accompanying the view on the same page should provide any guidance needed to interpret the tree diagram. Mechanisms for doing this include captions and rollover text. Don’t expect people to go to a separate page to read up on what they see. Since accumulated experience in the usability community indicates that Help features are infrequently accessed and many users don’t read on-screen text, Help and documentation should redundantly describe features that are as much as humanly possible already intuitive and discoverable by users. Find out if features are intuitive by testing them with real users.

Display labels clearly, without overlapping other elements

Good labeling is important to help users locate familiar terms on the tree visualization. Labels should not abbreviate names. They should include both scientific names and common names or common language descriptions (e.g. “bony-tongued fish”) wherever possible.


Figure 31. Screenshot from SpaceTree with the Tree of Life web project data. The visual overlap of the leaf node Medullosaceae creates the impression that it is a parent node. Caytoniaceae’s location is also problematic. Future tree visualization applications should check for and avoid these situations.

Users of the hyperbolic tree have reported being frustrated with being unable to adjust the view to see all of the labels clearly, as can be seen somewhat in Figure 3 and Figure 27 (Xiang et al. 2004). Figure 31 shows a screenshot from SpaceTree, where the overlap of the node label and the lines connecting nodes causes confusion for interpreting the branching structure. Robertson et al. (2005) describe implementing algorithms to check for overlap and adjust labels or bend the connecting lines accordingly.
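A simple overlap check can be sketched in Python as follows. This is a generic greedy approach that we offer for illustration, not the algorithm described by Robertson et al. (2005); the `Box` class and the `gap` parameter are assumptions, and handling of connecting lines is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned bounding box of a label, in screen coordinates."""
    text: str
    x: float
    y: float
    w: float
    h: float


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any interior area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def separate_labels(boxes: List[Box], gap: float = 1.0) -> List[Box]:
    """Greedily push each label downward until it clears those already placed."""
    placed: List[Box] = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        moved = Box(box.text, box.x, box.y, box.w, box.h)
        changed = True
        while changed:
            changed = False
            for other in placed:
                if overlaps(moved, other):
                    moved.y = other.y + other.h + gap
                    changed = True
        placed.append(moved)
    return placed
```

Pushing labels only downward keeps the sketch short. A production layout would also need to keep each label visually attached to its node.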

Develop and consistently use presentation-quality labeling conventions

The diagrams from the Campbell and Reece (2005) textbook shown in Figure 32, Figure 33, and Figure 34 are quite impressive, doing a nice job of using color, pictures, and careful layout to communicate their core idea. The labeling scheme is optimized for the concept being conveyed by the diagram, and the space it has to convey it in. The labels are very readable. The conventions follow and improve upon those used in phylogenetic scientific publications. While less variation in the labeling would be better, these diagrams set the standard for the clarity, simplicity, and readability of tree views produced by a new interactive visualization tool intended for use by teachers and students.


Labels on internal nodes are at high risk for polysemy. The node simultaneously indicates the inferred existence of a common ancestor and defines a grouping of extant taxa. The label and pictures on the internal nodes could be about this common ancestor. Pictures and descriptions at the node could select representative taxa within the modern-day group defined as the descendants of this node. Placing these items at the internal node does help to define it for modern day observers, and at the same time risks the misinterpretation that these representative taxa existed at that prior period of time. While the specifics of how an application handles this difficulty will be decided in the future, we suggest that this point should be considered carefully in the development of labeling conventions and other node representations.

Consistent conventions should be developed for labeling monophyletic groups, labeling paraphyletic groups, and labeling monophyletic and paraphyletic character states. Note that our member of the public looking at Figure 10 was confused by the variety of different ways in which monophyletic groups were labeled. We suggest that placing monophyletic group labels within the structure of the tree encourages students to engage with the structure, and is better pedagogically. It may be necessary, however, to label paraphyletic groups with lines across from the tree’s tips, as has been done in Figure 8. The use of color to define groups should also be explored.


Figure 32. The inconsistency of labels. This part of a figure from Campbell and Reece (2005) uses both color and bracket labels above the branching diagram to identify clades. Leaf node groups are labeled with bio-latin and don’t have pictures.


Figure 33. The inconsistency of labels. This figure from Campbell and Reece (2005) uses nested color clouds to define clades, selected to coincide with classification ranks. Leaf nodes are labeled with common names and illustrated with realistic icons. The tree is labeled with characters that define the group, such as “Carnivorous (meat-eating) teeth”, which defines the Order Carnivora.


Figure 34. The inconsistency of labels. This figure of a classification tree from Campbell and Reece (2005) uses illustrations and the same color coding to indicate classification ranks as Figure 33, which is a wonderful display of consistency. However, everything else about the labeling is different. Here, labels directly on tree branches name monophyletic clades while on the previous tree they list one key character that defines the clade. Leaf node labels include both common and bio-latin names. Labels spanning the tips of multiple branches can name taxonomic groups, monophyletic clades, and/or paraphyletic groups (not shown). Labels can be in bio-latin, common language, or both.

User-Centered Design

The broader the audience a tool is intended for, and the less training users can be expected to receive in using it, the more important good usability becomes when designing it. Our exploratory-comparative usability evaluations yielded various insights into usability issues with the existing visualization software we tested. Heuristic evaluation can also uncover important issues before placing software in front of users. These are both important tools in the process of user-centered design.

Because this application is envisioned as a major public information infrastructure for use by teachers and students, accessible design practices, also known as universal usability, should be followed. Indeed, all websites created with United States Government federal funding are required to be Section 508 compliant, and California has a similar law. Section 508 establishes accessibility requirements, although the community is still developing best practices for implementing these requirements. Before, during and after developing an application, developers should seek up-to-date resources on accessibility practices in general, within information visualization, and as implemented within the software toolkit used.

Universal usability is not just for a minor, fringe segment of the target audience. Implementing these features should help biologists be able to use the application even as their eyesight gets worse with age, and assist the application with making the transition to mobile and hand-held devices teachers may use in the science lab.

Software is always changing, and we expect that new software may be developed for sharing the Tree of Life data. Here are some recommendations for developing or evaluating the usability of future resources.

Recommendations

Integrate User-Centered design into the development process

Developing an application that the target audience doesn’t use is a waste of money. It’s very easy to do and happens all the time. To avoid this waste, have a usability-trained person involved all the way along. Test, test, and test some more – test ideas, test prototypes, and test assumptions. Listen to what these efforts discover, and make a development schedule that allows indicated changes to be integrated into the application.

Support “undo,” “back,” or a history list for views and manipulations

Through the course of interacting with the application, the user may create a variety of views. Going back to a view seen earlier allows the user both to undo mistakes, and to follow up on other interesting details that might have been encountered along the way, similar to the Back and History features of a web browser. The various states that the application can produce should be discrete, allowing the exact same state to be recreated again during the same interaction or some future interaction. This recommendation developed readily out of our own experiences with the software during the exploratory-comparative usability evaluations and one scientist’s comment that when working with the hyperbolic tree, screenshots taken before are essentially impossible to replicate again. In an information tool used by scientists, others need to be able to see the same view again!
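A browser-style history of discrete view states can be sketched in Python as follows. The `ViewHistory` class is hypothetical; the state objects could be anything that fully describes a view, such as a root node, the set of hidden branches, and a zoom level.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ViewHistory(Generic[T]):
    """Back/forward list of view states, like a web browser's history."""

    def __init__(self, initial: T) -> None:
        self._states: List[T] = [initial]
        self._pos = 0

    @property
    def current(self) -> T:
        return self._states[self._pos]

    def visit(self, state: T) -> None:
        """Record a new view; any states ahead of the current one are dropped."""
        del self._states[self._pos + 1:]
        self._states.append(state)
        self._pos += 1

    def back(self) -> T:
        """Step back one view (acts as undo); stays put at the first view."""
        if self._pos > 0:
            self._pos -= 1
        return self.current

    def forward(self) -> T:
        """Step forward again after going back; stays put at the newest view."""
        if self._pos < len(self._states) - 1:
            self._pos += 1
        return self.current
```

Because each entry is a complete, discrete state, the same view can be restored exactly, which is what makes a screenshot reproducible later.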

Minimize the number of conventions users must learn

It’s important to minimize the number of conventions users must learn in order to interpret the visual language used by the visualization. Pinto and Ametller (2002) found that each field of science develops its own visual language, consisting of a set of conventions used in diagrams. Interpreting visuals effectively requires learning this language and discerning the idiosyncrasies of the given diagram. For an interactive visualization, this suggests care with introducing icons, colors, and other visual cues about interactive affordances. Users will have to distinguish between visual features that have biological meaning, and visual features that have interactive meaning.

Examples of these visual features include the triangles and tree-previews in SpaceTree (see Figure 4) and the color-coded dots in TaxonTree (see Figure 5). We have experimented with our own suggestions, and realize that the number of features for manipulating branches suggested within this paper would be unworkable if they resided directly on the visualization’s structure.

Animate changes between states

To preserve users’ orientation with the material, animate changes between states. Yee et al. (2001) suggest animation as an effective way of viewing dynamic graphs. There is a good literature on animation to give computer objects perceptual validity to humans, including tricks such as starting slow, moving quickly in the middle of a movement trajectory, and slowing as approaching the end, even bouncing slightly as if “settling” into the end state. SpaceTree and TaxonTree do this well, and it helps the user maintain their sense of context.
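The slow-fast-slow motion described above can be approximated with a standard easing curve. The Python sketch that follows uses the smoothstep polynomial, which is our own choice of curve and not necessarily what SpaceTree or TaxonTree use; the "settling" bounce is omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def ease_in_out(t: float) -> float:
    """Smoothstep easing: starts slow, is fastest mid-way, and slows at the end."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def tween(start: Point, end: Point, frames: int) -> List[Point]:
    """Positions of a node for each animation frame between two layouts."""
    if frames < 2:
        raise ValueError("need at least a first and a last frame")
    points: List[Point] = []
    for i in range(frames):
        s = ease_in_out(i / (frames - 1))
        points.append((start[0] + (end[0] - start[0]) * s,
                       start[1] + (end[1] - start[1]) * s))
    return points
```

Applying the same `tween` to every node that moves between two states keeps the whole tree in step, so users can follow where each branch goes.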

Use pre-attentive encoding

Pre-attentive color encoding can effectively indicate paths to search results or other nodes of interest. Preattentive encoding can be done with color, size, and shape. However, used inappropriately, it can result in users misinterpreting structural features as representing quantities, or lead to 3-D perception artifacts in a 2-D environment (Sebrechts 1997).

Support exploration by achieving benchmark system response times

Humans are slower than computers, but will also notice a delay. Slow system response will increase the cost of exploration and limit users to known tasks. The HCIL at the University of Maryland does a nice job of integrating real computer science optimization with usable visualization applications – try to leverage their experience if possible.

Make display resizable

The display size should be adjustable and the visualization’s information view should be effective at any size without requiring excessive scrolling.

Ensure color contrast

Colors should contrast. Check that the contrast can be seen under most visual impairments, and when printing in black and white. This is especially important if data is being conveyed with color. Consider providing redundant alternative ways to display color-encoded data. The wellstyled.com Colorscheme Generator59 can help.
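One way to check contrast numerically is the luminance contrast ratio defined in the W3C's Web Content Accessibility Guidelines (WCAG) 2.x, which we use here as an assumed yardstick; it is not a measure named in our interviews or in Section 508 itself. The Python sketch below computes it for two sRGB colors.

```python
from typing import Tuple

RGB = Tuple[int, int, int]  # 0-255 per channel, sRGB


def _linear(channel: int) -> float:
    """Convert one sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Perceived brightness of a color, from 0 (black) to 1 (white)."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: RGB, b: RGB) -> float:
    """Contrast ratio between two colors, from 1 (identical) to 21."""
    la, lb = relative_luminance(a), relative_luminance(b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)
```

Because the ratio depends only on luminance, two colors that differ mainly in hue (such as a saturated red and green of similar brightness) score poorly, which also hints at how they will look when printed in black and white.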

Provide for magnification of text sizes

Provide for magnification of text sizes either by the browser or within the application. The hyperbolic tree browser currently implements this.

59 http://wellstyled.com/tools/colorscheme2/index-en.html


Provide manipulation alternatives

Especially if mouse-based interaction with the visualization requires pointing to very precise parts of the screen or holding down the mouse button (click and drag), provide menu-based and key-stroke based alternatives to accomplish the same interaction. The SpaceTree application has pioneered this interaction.

Support screen-readers

These types of technology translate the application for users with a variety of challenging conditions. Support them and make your job of meeting your equality obligation easier. Use alt tags in HTML. Provide any appropriate equivalent to skip navigation links. Identify the accessibility features supported by the interactive plug-in and use them. Identify the core key relationship information being conveyed by the visualization and develop ways to communicate it with text and/or sound. Think about the order in which text elements are encountered on the screen, and how that best supports the order in which users need the information.

Resources

Heuristics for Evaluating Application Designs and Prototypes

Heuristic guidelines are intended to be used by several usability-trained reviewers, with the idea that compiling all of their comments together will generate substantial insight into the usability improvements that can be made in a piece of software. While heuristic evaluations can reduce the number of actual users that are needed in order to find all of the usability issues with software, there is of course no actual substitute for real users, so usability testing should be included in the work plan, too.

For usability of the application in general, we recommend using the heuristic guidelines developed by design consulting company Adaptive Path,60 based on Jakob Nielsen’s pioneering work.61 Heuristics for evaluating an application design involving multiple panels have been developed by Baldonado et al. (2000). Search features also deserve special attention, as described in the section above, “Develop search for the common user.”

Further resources on designing effectively for mobile devices can be found at Little Springs Design.62

60 http://www.adaptivepath.com/events/workshops/tour2002/files/14_heuristics.doc
61 http://www.useit.com/papers/heuristic/heuristic_list.html
62 http://www.littlespringsdesign.com/design/

Conclusion

Through interviews with teachers, education experts, and biological systematists, we learned a great deal about the challenges of developing a scientific database for use by teachers and students in the middle school through graduate school biology classroom.

These interviews and our survey results helped us develop several recommendations for the features of a classroom-ready interactive Tree of Life visualization web application. Our recommendations focus on features that support teaching, although some features may support biologists, too. Primarily, the features of this application must serve to help students learn the concepts that teachers are obligated to cover. The application must avoid pitfalls like representing a new topic for teachers to cover, conflicting with state educational standards, and being difficult to learn and use. Avoiding these pitfalls starts with leveraging the potential within the phylogenetic perspective to organize and provide additional coherence to the newest generations of biology curriculum materials.

Keeping teachers in the development loop for the new Tree of Life visualization application is a critical contribution to developing a win-win situation for biologists and educators. Good curriculum development practices and user-centered design principles provide models for teacher involvement. Casting the efforts of the CIPRES project’s outreach work within this larger context creates the potential for the new application to be more than a novelty used by a few private-school teachers, but instead to make a lasting impact on the quality of biology education nationwide.

Biological systematists and classroom biology teachers emphasize different requirements from a centralized visual access to all Tree of Life information. A very real danger is that the collaboration will founder upon the difficulty of serving both audiences effectively. Where biologists need access to all published phylogenetic trees, teachers need select examples developed into educational materials. Where biologists need the information but don’t need it to be pretty, teachers need spare, simple, presentation-quality visuals that won’t confuse their students with extra details. Other digital library projects have stumbled over similar issues when collaborating groups had different requirements for the level of editorial quality needed in the data (Nancy Van House, Pers. Comm).

From our perspective at the end of this project, we feel that our work has only begun. Our participants have helped us generate exciting ideas about application features that leverage interactivity to help students develop tree-thinking and understand the nature of science, and a visualization approach that helps teachers, students and the public develop a mental map of the tree of life while being able to work with simple views of the relationships at any relevant location. Important future work remains in defining the details of implementing our recommendations within an actual application, and testing the student learning outcomes that result from using it in the classroom.


Acknowledgements

This research was supported by the National Science Foundation via the CIPRES project (http://www.phylo.org/), a multi-site collaborative grant funded by the NSF Information Technology Research program, entitled "BUILDING THE TREE OF LIFE: A National Resource for Phyloinformatics and Computational Phylogenetics."

The work received clearance from the University of California at Berkeley Committee for the Protection of Human Subjects under application 2005-2-34.

Our academic advisors Dr. Nancy Van House (School of Information Management and Systems, U. C. Berkeley) and Dr. Brent Mishler (Department of Integrative Biology and the Jepson and University Herbarium, U. C. Berkeley) provided invaluable feedback and guidance to the shape and quality of this project. Dr. Sam Donovan of the Department of Education at the University of Pittsburgh, Judy Scotchmoor of the University of California Museum of Paleontology, and Sue Boudreau from Orinda Intermediate School offered critical perspectives on the future of evolution education.

We are very appreciative of the time our interview participants spent with us. Members of the CIPRES project, the Bay Area BioSystematists group and the Jepson and University Herbarium contributed immeasurably to the quality of our results, and we thank those groups for finding our inquiry worth their time. We would also like to thank Cynthia Parr from the Human Computer Interaction Laboratory at the University of Maryland for being amazingly responsive to our questions about the HCIL software packages. The U. C. Museum of Paleontology kindly gave permission for the use of their vertebrate tree image in our visual communications.

Thanks to Kim Rathbun and Laurie Shapley for their amazing support. Finally, we would be remiss if we didn’t thank Zachary’s Pizza on Solano Avenue.


References

Alters, Brian J. and Craig E. Nelson. (2002). Perspective: Teaching Evolution in Higher Education. Evolution, 56:10 pp. 1891-1901.

Baldauf, S. L., D. Bhattacharya, J. Cockrill, P. Hugenholtz, J. Pawlowski, A. G. B. Simpson. (2004). The Tree of Life: an overview. Chapter 4 in Cracraft and Donoghue, eds. Assembling the Tree of Life. Oxford University Press.

Baldonado, Michelle Q. Wang; Allison Woodruff and Allan Kuchinsky. (2000) Guidelines for Using Multiple Views in Information Visualization. Advanced Visual Interfaces 2000 http://citeseer.ist.psu.edu/wangbaldonado00guidelines.html

BSCS. (1998) Biology: An Ecological Approach. 8th Edition, BSCS Green Version. Kendall/Hunt Publishing Company, Dubuque, Iowa.

Campbell, Neil A. and Jane B. Reece. (2005) Biology. 8th Edition. Pearson Education, Inc., publishing as Benjamin Cummings.

Clark, Constance Areson. (2001). Evolution for John Doe: Pictures, the Public, and the Scopes Trial Debate. Journal of American History, March 2001 87:4. http://www.indiana.edu/~jah/teaching/2001_03/article.shtml

Cracraft, Joel and Michael J. Donoghue. (2004) Assembling the Tree of Life: Where we stand at the beginning of the 21st century. Chapter 34 in Cracraft and Donoghue, eds. Assembling the Tree of Life. Oxford University Press.

Donovan, Sam and Laura Wilcox. (2004) Tree figures in texts: A framework for unpacking their potential. Poster presented at the Society for the Study of Evolution meeting, Ft. Collins, CO. June 26-30, 2004. http://www.lrdc.pitt.edu/donovan/products.html

Gould, Stephen J. (1995) Ladders and Cones: Constraining Evolution by Canonical Icons. In Silvers, Robert B., ed. Hidden Histories of Science. New York Review of Books: New York.

Gould, Stephen J. (1997) Redrafting the Tree of Life. Proceedings of the American Philosophical Society, 141:1 pp. 30-54.

Hauser, Helwig and Robert Kosara. (2004) Interactive Analysis of High-Dimensional Data using Visualization. VRVis Research Center in Vienna, Austria. http://www.vrvis.at/TR/2004/TR_VRVis_2004_004_Full.pdf

Hennig, Willi. (1966). Phylogenetic Systematics. University of Illinois Press.


Kobsa, Alfred. (2004) User Experiments with Tree Visualization Systems. Proc. InfoVis 2004, pp. 9-16.

Misue, Kazuo, Peter Eades, Wei Lai, and Kozo Sugiyama. (1995) Layout adjustment and the mental map. Journal of Visual Languages and Computing, 1995:6 pp.183-210.

Parr, Cynthia Sims, Bongshin Lee, Dana Campbell, and Benjamin B. Bederson. (2003) TaxonTree: Visualizing Biodiversity Information. Proceedings of AVI, (2004), pp. 320-327.

Pinto, Roser and Jaume Ametller. (2002) Students’ difficulties in reading images. Comparing results from four national research groups. International Journal of Science Education 24:3 pp. 333-341.

Plaisant, C., J. Grosjean, and B. B. Bederson. (2002) SpaceTree: Supporting Exploration in Large Node Link Tree, Design Evolution and Empirical Evaluation. Proc. InfoVis 2002, New York: IEEE, pp. 57-64.

Robertson, George G., Mary Czerwinski, Kevin Larson, Daniel C. Robbins, David Thiel, and Maarten van Dantzich. (1998) Data mountain: using spatial memory for document management.

Robertson, George G., Mary P. Czerwinski, and John E. Churchill. (2005) Visualization of mappings between schemas. CHI 2005 Papers: Interactive Information Visualization pp. 431-439.

Rodden, Kerry, Wojciech Basalaj, David Sinclair and Kenneth Wood. (2001) Does organisation by similarity assist image browsing? Proceedings of Human Factors in Computing Systems (CHI 2001) ACM Press, pp. 190-197. http://citeseer.ist.psu.edu/rodden01does.html

Rubin, Jeffrey. (1994) Handbook of Usability Testing: How To Plan, Design, And Conduct Effective Tests. John Wiley & Sons, Inc., New York, pp. 31-46.

Shneiderman, B. (1996) The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. HCI Lab, Inst. Systems Research, Inst. Advanced Computer Studies, Dept. of Computer Science Tech. Report CS-TR-3665, Univ. of Maryland.

Stolte, Chris, Diane Tang, and Pat Hanrahan. (2002) Polaris: a system for query, analysis and visualization of multidimensional relational databases. IEEE Transactions on visualization and computer graphics, 8:1.

Sebrechts, Mark M. (1997) Using theories of perceptual dimensional interaction to improve information visualization design. CODATA Euro-American workshop, June 1997. http://www.codata.org/archives/1997Vis/pp7.htm

Xiang, Yang, Michael Chau, Homa Atabakhsh, and Hsinchun Chen. (2004) Visualizing criminal relationships: comparison of a hyperbolic tree and a hierarchical list. Decision Support Systems, 15 pages.

Yee, Ka-Ping, Danyel Fisher, Rachna Dhamija, and Marti Hearst. (2001) Animated Exploration of Graphs with Radial Layout. IEEE InfoVis Symposium, San Diego, CA. http://bailando.sims.berkeley.edu/papers/infovis01.htm

Appendix

Interview Consent Form

We are Denise Green and Rebecca Shapley, graduate students in the School of Information Management and Systems at the University of California at Berkeley. We would like to invite you to take part in our research study, which looks at how people use Tree of Life information in their teaching and learning, and how that information would best be represented by software tools.

If you agree to take part in our research, we would like to conduct an interview of about 1 hour at a time and place of your choosing. We will ask you about how you use Tree of Life information in teaching and learning activities. With your permission, the interview will be audio taped. We may ask to contact you by telephone or email if there are any follow-up questions we have after our interviews.

There are no known risks to you from taking part in this research, and no foreseeable direct benefit to you either. However, it is hoped that the research will benefit others (or science) by contributing towards the development of future software tools useful in teaching this topic.

All of the information that we obtain from you during the research will be kept confidential. We will store the tape recording and notes about it in a locked file. Each person we interview will have their own code number so that no one other than us will know who you are in our notes. The key to the code of names will be kept in a separate locked file. Your name and other identifying information about you will not be used in any reports of the research. The audio recording is for private use by the student researchers to recheck conclusions made about the interview, and will not be used for any public or publishing purpose.

Your participation in this research is voluntary. You are free to refuse to take part. You may refuse to answer any questions and may stop taking part in the study at any time.

If you have any questions about the research, you may contact Denise Green, at (510) 524-2447 or [email protected] or Rebecca Shapley at (925) 280-7865 or [email protected]. If you agree to take part in the research, please sign the form below. Please keep the other copy of this agreement for your future reference.

If you have any questions regarding your treatment or rights as a participant in this research project, please contact the University of California at Berkeley’s Committee for Protection of Human Subjects at 510/642-7461, [email protected].

I have read this consent form and I agree to take part in this research.

_________________________________________ ____________

Signature Date

________________________________________

Print Name

Check one:

____ Yes, I give permission to be audio-taped.

____ No, I do not give permission to be audio-taped.

Introduction to the Survey

Survey Results
