The impact of organization of personal content in computing environments

A study of personal information management in computer systems and users’ awareness thereof

BSc thesis – Bachelor of Science thesis

Author: Johan Lund

Institution: SSKKII – Göteborg University

Supervisor: Pierre Gander

Year: 2003


Abstract

Humans tend to conserve energy when it comes to cognitive work. This, together with the constraints of a hierarchical file system and a special kind of data found in all Microsoft Windows operating systems, called hidden content, which cannot be moved or easily found, complicates content migration. Buying or otherwise acquiring a new computer or a different operating system implies migrating your personal content to that system. This empirical study shows that experienced users believe this migration procedure to be easier, while novice users expect to get help from others. Whether technical and cognitive difficulties, such as poor organization of user data, lead to resistance among users to upgrading existing software and hardware remains under debate.

Keywords: HCI, human-computer interaction, interaction design, cognitive science, content, information infrastructure


Index

Abstract
1. Introduction
1.1 Questions and hypotheses
1.1.1 Main question
1.1.2 Main hypothesis
1.1.3 Sub questions
1.1.4 Sub hypotheses
1.1.5 Theoretical hypotheses
1.2 Purpose
1.3 Delimitation
2. Background
2.1 Fields of research
2.2 Terminology
2.2.1 Content
2.2.2 Content object
2.2.3 Placing content (direct)
2.2.4 Hidden content
2.2.5 Content management
2.2.6 Content infrastructure
2.2.7 Retrieving content
2.2.8 Lost content
2.2.9 Content spreading
2.2.10 Content migration
2.3 The nature of file systems
2.4 System data vs. user data
2.5 Theories
2.5.1 Memory
2.5.2 Working memory
2.5.3 Chunking
2.5.4 Rehearsal loop
2.5.5 Long-Term Memory and organization
2.5.6 Clustering in free recall
2.6 Basic-level categories
2.7 External cognition
2.7.1 Externalizing to reduce memory load
2.7.2 Computational offloading
2.7.3 Annotating and cognitive tracing
2.8 Attention
2.8.1 Inattentional blindness
3. Method
3.1 Participants
3.2 Material
3.3 Procedure
4. Results and discussion
4.1 General results
4.2 Hypothesis-related results
4.2.1 Hypothesis 1
4.2.2 Hypothesis 2
4.2.3 Hypothesis 3
4.2.4 Hypothesis 4
4.2.5 Hypothesis 5
4.3 Relating hypotheses to theory
4.3.1 Overview
4.3.2 Hypothesis 1
4.3.3 Hypothesis 2 and 5
4.3.4 Hypothesis 3
4.3.5 Hypothesis 4
4.3.6 Hypothesis 6 and 7
4.3.7 Hypothesis 8
4.4 Main hypothesis
5. Conclusion
5.1 Future research
6. References
7. Appendix
7.1 Data
7.1.1 Likert scales questions
7.1.2 Yes or no questions
7.1.3 Descriptive questions
7.1.4 Choice questions


1. Introduction

It is my belief that people in general don’t want a new personal computer every day, even if the new computer is slightly better than the current one. Even if the new one is free! Slightly better just isn’t enough; it is not worth the hassle. It is rather like moving to a new house or flat. You wouldn’t do it every day even if it was free and slightly better. There is a certain amount of work involved in moving to a new home, the physical work being to move all the furniture. But there is also a large amount of cognitive work that has to be done for the new home, and old thoughts will be lost with the old home. Where and how to place all the furniture is one example of this; perhaps you will need to acquire new furniture, and old furniture won’t fit in anymore. People cherish their work and their thoughts and don’t willingly leave without them. Of course there are people who would go through the painful process just described every day, just to get a better home for free. I don’t think it’s the process of moving that they like, just having the best thing possible.

The situation with homes described above is in many ways similar to the problems you face when getting yourself a new computer system. Few people stick with the same system forever, and the ones that do have reasons for doing so. At some point in your life you will want to upgrade, or your system will cease to function. It is my personal experience that people avoid coming to this point of change for as long as they can. If I am correct, it is a huge problem for software and hardware vendors. For instance, when Microsoft releases a new version of its operating system “Windows”, it wants people to buy it. If people don’t buy it, Microsoft will lose money. Hardware manufacturers release new and improved computers several times per year. If people don’t buy those, they too will see reduced profits. For average users this could mean that new and improved software is released less frequently. An excerpt from Microsoft’s fiscal report 2003, presented in Table 1, indicates that client sales are flat.


(In millions)                                  Revenue Operating Income/(Loss)
Three Months Ended Sept. 30           2002        2003        2002        2003
Client                              $2,807      $2,809      $2,270      $2,264
Server and Tools                     1,625       1,866         297         370
Information Worker                   2,268       2,287       1,664       1,591
Microsoft Business Solutions           106         128        (94)        (79)
MSN                                    427         491       (147)          58
Mobile and Embedded Devices             28          53        (65)        (32)
Home and Entertainment                 485         581       (245)       (273)
Corporate and Other                      —           —       (652)       (751)
Consolidated                        $7,746      $8,215      $3,028      $3,148

Table 1

“Client revenue was $2.81 billion in the first quarter of fiscal 2003 and 2004. Client includes revenue from Windows XP Professional and Home, Windows 2000 Professional, and other standard Windows operating systems. Client revenue in the first quarter was flat compared to the first quarter of fiscal year 2003 at $2.81 billion driven by a flat reported license growth and no year over year change in product mix.” (MICROSOFT CORPORATION, FORM 10Q, For the Quarter Ended September 30, 2003:13)

You have your home set up the way you want it. It’s cosy; you know where everything is and how everything works. All the little peculiarities of your home are familiar to you. The same goes for the computers of today, but computers have one big advantage. On a computer you can transfer things electronically and perfectly, effortlessly and quickly. Or can you?

I believe that the solution to these problems can be found in organization. People refrain from organizing if they can get away with it. Unfortunately, the same problem seems to exist with the makers of the software we use; they are humans too, after all. Perhaps the software should force users into organizing, or perhaps the software should organize for us, or maybe we should have a standard way of organizing our data. There are problems with all of the above solutions because people are different, have different needs and have different levels of knowledge in different areas.

I think that the way folders are organized on users’ hard drives, provided they organize them at all, is a “loose” version of how they themselves organize and think of things, or as Norman (1997) puts it, an externalization of cognition. Eleanor Rosch introduced the concept of basic-level categories. She has argued that there is a “natural” level of categorization, neither too specific nor too general (Eleanor Rosch, referenced in Reisberg, 2001:279). Perhaps the names that people assign to the folders on their computers are also neither too specific nor too general. Perhaps there is a connection between the two: folder names and basic-level categories. That would be a good sign that cognitive material is externalized in the organization of files and folders. I suspect that it is when not enough effort is put into maintaining this organization, and as the amount of data increases over time, that problems of finding and accessing data start to appear.

“Interpretation is discovery; we find coherence – order and sense. /---/ Discerning the sense is simply a way of discerning order; that’s why order and sense are, at root, two sides of one principle; coherence”.

(Haugeland, 2000:96)

As the organization of content on your hard drive becomes less and less coherent, it gets harder and harder to interpret, make sense of, and remember. So when you try to put new content into an already incoherent structure you make more mistakes, effectively making it even less coherent. It’s a vicious circle, for which there is no cure except a lot of hard work.

I speculate that if organization is bad, you might not know where all the important data resides. You may also be aware that there is important data buried somewhere in the structure, waiting to be needed, but you have forgotten what data it is. If you were to transfer all important data to a new computer, you would be leaving some of it behind that you might need months later. So you are stuck.

1.1 Questions and hypotheses

My own experience is that I often find myself puzzled in front of the computer. I repeatedly ask myself “Now where did I put that file?”, “What was the name of that file again?” and “Where should I put this kind of file?”. I find it hard to remain consistent. I know that migrating my messy system to a different computer would certainly be hard and prone to errors. I therefore avoid it. I would like to know whether others have problems like these as frequently as I do. Furthermore, I would like to learn how this troublesome situation was allowed to occur.

1.1.1 Main question

“Does the nature of users’ content, such as poor organization, lead to unwillingness among users to migrate from one system to another?”

The word “nature” means considering all factors. Poor organization is an example of such an influencing factor. Other factors may be various technical and cognitive limitations.

1.1.2 Main hypothesis


The main question easily translates into a hypothesis.

“Content organization influences users’ willingness to migrate from one system to another”.

1.1.3 Sub questions

The main hypothesis gives rise to an array of questions that have to be answered, which in turn raise even more questions. I will try to drill down through the unfolding tree of questions in order to support the main hypothesis.

Question A

“Do users think that personal data is hard to find and move?”

Question B

“How much effort do users put into organizing their personal data?”

Question C

“What factors contribute to making a transfer of content to another system problematic?”

Question D

“What strategies do people employ to survive in the information jungle, that is, to find their files, documents etc.?”

Question E

“How aware are users of the fact that some data cannot be moved or is hard to find?”

Question F

“What constitutes a good or poor organization of data and how does it affect users?”

1.1.4 Sub hypotheses

Many of the questions asked in the last section will have to be translated into hypotheses, to be either proven true or falsified.

Hypothesis 1

“Moving all content from one system to another is difficult; users therefore avoid doing so.”

If hypothesis 1 is correct, perhaps the reason is that their personal data is badly organized. We need to find out how well they organize it.

Hypothesis 2

“Moving all content from one system to another is difficult because personal content is badly organized.”

We also need to find out more about cognitive limitations as well as technical problems that may increase the complexity of content migration. It is a fact that technical difficulties are involved to some extent, but how much these influence users needs to be investigated.

Hypothesis 3

“Moving all content from one system to another is difficult because some content cannot be moved or is hidden by the system.”

We need to know more about how people cope with a steady stream of incoming information and content.

Hypothesis 4

“Less important content receives less content management.”

Hypothesis 5

“The content infrastructure on most users’ computers is an externalization of cognition.”

Hidden content can be helpful in that it hides information and data from the eyes of the user, but users may not remember it when they are about to migrate to a different system.

1.1.5 Theoretical hypotheses

These hypotheses will not be investigated or supported by means of data collection. An attempt to address them by theoretical analysis appears in the chapter on results and discussion.

Hypothesis 6

“Users often estimate badly in regard to the importance of content.”

Hypothesis 7

“Users avoid content management in order to minimize workload.”

Hypothesis 8

“Poor and/or insufficient content managing leads to poor content infrastructure.”

Hypothesis 9

“Poor content infrastructure leads to less efficient retrieval of content.”


1.2 Purpose

This paper is really about whether people’s inability to manage personal data leads to “migration unwillingness”. By migration I mean moving all your data to a different computer and throwing out the old one.

Do people really have an inability to manage personal data? If so, where does this inability come from? This is also a big topic that I have to deal with before I can prove that bad data management leads to unwillingness to migrate. I will also have to find out what constitutes good or bad management. Human cognition may play a vital role in this inability. I will point to theories of cognition that support my hypotheses.

I came up with the idea for this subject by noticing my own inability to find and properly store files. The more I use my computer, the less I feel like moving to a new one. I thought that this could be dangerous to the computer industry and decided to pursue the subject further.

1.3 Delimitation

I will only investigate the impact of file management systems that come pre-equipped with the operating system. Furthermore, I will concentrate on the Microsoft Windows™ family of operating systems.

I will not cover any third party software, such as dedicated database software.

I will also concentrate on personal systems and home use, not office or professional use. My findings may, however, still relate to these domains.

I do not expect to find answers to all my hypotheses. That is truly beyond the scope of this paper. It is an ambitious task I have taken upon myself and I can only hope to scratch its surface. Hopefully I will shed some light on the situation.


2. Background

2.1 Fields of research

In this chapter I will explain the various fields of research that are relevant to this paper, namely cognitive science, human-computer interaction (HCI), and cognitive psychology.

Cognitive science is the science of the mind: how we think and what constitutes our consciousness and thoughts. The complexity of these topics requires cognitive science to be multidisciplinary. It involves various disciplines such as psychology, philosophy, artificial intelligence, computer science and linguistics, to mention a few.

Recently, the various models and theories developed in the name of cognition have been applied to a wide range of practical applications. An example of this is the field of HCI. There is currently no widely accepted definition of HCI, but Gary Perlman, director of the HCI Bibliography (www.hcibib.org), puts it like this:

“Human-computer interaction is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them.” (http://sigchi.org/cdg/cdg2.html#2_1)

Cognitive psychology, a field within cognitive science and psychology, originally dealt with human knowledge. The question of how we acquire knowledge led to the study of human perception, and of how perceived objects are recognized and categorized. How this information is stored and retrieved follows from this, which leads to research on memory. Other topics that cognitive psychologists deal with are decision making, inference etc.

Cognitive psychologists’ contribution to HCI includes research on learning of systems, passing on that knowledge to others, the mental representation of systems, and human performance on such systems.

2.2 Terminology

I would like to introduce and explain some useful terminology. This will make it easier to refer to complex topics and will further the discussion.

2.2.1 Content

Content is defined as data that has personal value to a user. Content can be manipulated in a variety of ways such as placing, retrieving, and managing.


2.2.2 Content object

A piece of content, usually a file, but it could also be, for instance, an electronic bookmark or any other type of content unit.

2.2.3 Placing content (direct)

Content can be acquired in two different ways: either the user creates it or receives it from someone or somewhere else. Either way, the user has to decide what to do with it: throw it away or save it somewhere. If the user chooses to save it, he or she now has to decide where to put it, which includes categorizing the content semantically and finding or making the proper virtual place for it. The user will also have to find a descriptive name for this particular new piece of content. This process I call “placing content”.

2.2.4 Hidden content

You can also place content indirectly, as opposed to direct placing of content. This happens when you are creating content without being aware of it. For example, when setting options in your OS and changing preferences in the programs that you use, you are placing content indirectly. You have no direct control of where these changes are stored or under what name. I call this special kind of indirect content “hidden content”. The way hidden content is placed differs from program to program. There are no standards. Hidden content can therefore not easily be backed up, and you may find that it is easily lost. Sometimes it is possible for so-called “power users”, i.e. experts, to find the location and name of hidden content objects and make backups.

At first glance the importance of hidden content might be underestimated, but it is more important than expected. Hidden content contributes to personalization and personal taste. It is what makes users feel “at home”. Having and using your own computer can be a highly personal experience, and some users even assign names to their computers.

Some personal settings and customizations are placed directly once; after they have been set they are not supposed to be changed. This kind of content, although it has been placed directly by the user, also qualifies as hidden content.

2.2.5 Content management

When placing content (directly) you don’t always get it perfect the first time. You might have to refine the structure and organization of your content. To rename, delete, move, organize and categorize files and folders is what I call “content management”. This is in itself a huge topic.

2.2.6 Content infrastructure

The way your files or content are organized I call “content infrastructure”. The effectiveness of the content infrastructure plays a vital role in retrieving stored data. Naming conventions and semantic categories are also part of the content infrastructure. Figure 1 shows a sample of such a structure.

Figure 1. This picture shows a folder hierarchy taken from “Windows Explorer”, the “My documents” folder. This is a proposed content infrastructure from Microsoft. The top folder is not a Microsoft generated folder but a folder from the computer game “Max Payne 2”. The creators of this game have adopted the infrastructure suggested by Microsoft.

A good content infrastructure is, as I define it, characterized by two things: its coherence and its level of truth. Problems with keeping the structure coherent arise when new and different data arrive. Let us say that you just bought an audio book from a website on the internet. Now, which folder in Figure 1 should you put it in? “Received files”? You just received the file, didn’t you? Perhaps the “My Music” folder; it is not music but it is audio, so it might be valid too. The file is a book and it is electronic, so the folder “My eBooks” is probably the best choice, but it is far from clear. If you finally decide to keep all your audio books in the “My Games” folder and abide by that, you still have coherence but you do not have truth.

“Meanings /…/ relate symbols to objects, which they are “of” or “about.” Truth /…/ discloses – and thereby validates – these supposed connections.”

(Haugeland, 2000:98)

When certain types of files are put in folders where they don’t really belong, you lose level of truth. So putting audio books in the games folder, which doesn’t make sense since an audio book is not a game, compromises truth.


The point of having a good content infrastructure is that it enables you to spend less time trying to find files and to make fewer mistakes when placing content.

2.2.7 Retrieving content

This is the process of bringing stored information into view. It involves remembering the name or location of the content object.

2.2.8 Lost content

Users typically develop their own content managing strategies. Similarities exist between how users organize and categorize in the real world and in the virtual world. Objects can be misplaced or forgotten just as easily in the virtual world as in the real world. Content that has been forgotten, or whose location has been forgotten, I call “lost content”. The word lost does not indicate that it has been erased or is otherwise irretrievable. It means that you don’t know how or where to look for it. It is especially frustrating when you do know that a lost content object exists.

2.2.9 Content spreading

When you have many objects in many different locations, they become hard to survey and keep track of. Having content spread out across multiple locations I call “content spreading”.
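As an illustration only, content spreading could be quantified by counting how many distinct folders hold files of the same kind. The Python sketch below is not part of the study; the function names `spreading_by_type` and `spread_count`, the use of the file extension as a stand-in for "kind of content", and the made-up folder tree are all assumptions chosen for the example.

```python
import os
import tempfile
from collections import defaultdict
from pathlib import Path


def spreading_by_type(root):
    """Map each file extension to the set of folders that contain such files."""
    locations = defaultdict(set)
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            extension = Path(name).suffix.lower()
            locations[extension].add(folder)
    return locations


def spread_count(root):
    """Number of distinct folders per extension; a higher number means more spreading."""
    return {ext: len(folders) for ext, folders in spreading_by_type(root).items()}


if __name__ == "__main__":
    # Build a small, made-up folder tree to try the measure on.
    with tempfile.TemporaryDirectory() as root:
        for relative in ["My Music/a.mp3", "Received files/b.mp3",
                         "My eBooks/c.mp3", "My eBooks/d.pdf"]:
            path = Path(root, relative)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
        print(spread_count(root))
```

A count of 1 for an extension would mean that all such files live in one folder; the file extension is of course only a crude proxy for what a user would consider related content.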

2.2.10 Content migration

This is when you need to move all content to a different system. It involves finding out which of all the data is content, and the technical procedure of transferring that content.

2.3 The nature of file systems

The desktop computer systems of today are a world of documents. Documents are such things as image files, textual material, email attachments, spreadsheets etc. Most users organize their documents by location in hierarchies. This is because file systems use a hierarchical structure of folders, onto which users impose their own semantic structures. Email systems also group mail messages hierarchically, and web browsers use hierarchies to store bookmarks.

Unfortunately, strict hierarchical structures can map poorly to users’ needs. The use of document locations means that documents can only appear in one location at a time, and users are thereby forced into a strict categorization of documents. This may interfere with users’ goals. Categorization schemes tend to be far less stable or absolute than they might seem (Trigg et al., 1999, referenced in Dourish et al., 1999).

Barreau and Nardi (1995, referenced in Dourish et al., 1999) observe little use of deeply nested structure or cross-linking (via shortcuts or links), and instead note a preference for visual grouping and location-based search.
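The single-location constraint can be made concrete with a small sketch. The Python class below is a toy model, not a description of any real file system: the class name `StrictHierarchy`, its methods and the example file name are hypothetical, and shortcuts or links are deliberately left out.

```python
class StrictHierarchy:
    """Toy model of a folder hierarchy in which a document has exactly one location."""

    def __init__(self):
        self._location = {}  # document name -> the single folder that holds it

    def place(self, document, folder):
        # Placing a document again moves it; it never appears in two folders.
        self._location[document] = folder

    def find(self, folder):
        """Return the documents stored in the given folder, sorted by name."""
        return sorted(doc for doc, loc in self._location.items() if loc == folder)


if __name__ == "__main__":
    files = StrictHierarchy()
    # The audio book is a book, audio, and a received file all at once,
    # but only one of these categories can be used when placing it.
    files.place("audiobook.mp3", "My eBooks")
    print(files.find("My eBooks"))
    print(files.find("My Music"))
```

In this model a later search in “My Music” comes up empty even though the user may well think of the file as audio, which is the kind of mismatch between hierarchy and user needs described above.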

2.4 System data vs. user data

Locations, or folders, in a hierarchy have the role of grouping related files, but can also play a special role to the system itself. A certain folder could be a unit of administrative control for purposes of backups and remote access. These administrative functions impose constraints on the organization of documents. These demands make it harder to organize documents according to user needs (Dourish et al., 1999).

Furthermore, when user content and system content are not kept apart, the two may become intertwined. This forces users to repeatedly distinguish between the two types of content, placing an extra burden on the user.

2.5 Theories

In this section I will cover theories, or relevant parts of theories, from well-known research that have influenced me throughout this study. I am not only interested in the shortcomings of the mind, but also in certain cognitive phenomena that may support my hypotheses. At the end of each theory’s explanation, I will offer my comments on its relevance to this paper.

2.5.1 Memory

I here present an overview of memory and some of its features in order to emphasize that organization and structuring support not only memory performance and capacity but also understanding.

2.5.2 Working memory

Most mental tasks rely on working memory (WM). WM actually consists of several different parts, so we can talk of a WM system. I will not go into great depth here, but some features of WM are worth mentioning.

WM has a limited capacity, originally measured with a so-called “digit span task”. Arbitrary numbers are read to a person, who immediately has to repeat them back. It was found that participants could hold 7 plus or minus 2 “items”. (Reisberg, 2001:140)

If we can only hold this little information in memory, we cannot remember all our content. We have to be reminded of it somehow.

2.5.3 Chunking

George Miller (Miller, 1956, referenced in Reisberg 2001:141) proposed that one item should be referred to as a chunk, because the size of that one item is variable. Performance depends enormously on how the person thinks about and organizes the items. A six-digit phone number, 168549, could instead be thought of as, or chunked into, 168 549. Attention is required, however, to perform the “repackaging”. The number of chunks that can be held in working memory also varies: the more chunking that has to be done, the less attention is left to keep the chunks in memory. In most cases chunking depends on understanding, and understanding rests on things you already know. The number series “1 4 9 1 6” can only be chunked as the squares of the digits if you already know what the squares of the digits are. (Reisberg, 2001:141,161)

This shows that effective memory relies on understanding. It is therefore important that we understand the structure of our own content.
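
As a minimal sketch of the “repackaging” described above, assume that chunking can be approximated by splitting a digit string into fixed-size groups. The function names `chunk_digits` and `squares_prefix` are hypothetical and chosen only for this illustration; they do not come from Reisberg (2001).

```python
def chunk_digits(digits: str, size: int = 3) -> list[str]:
    """Split a digit string into consecutive groups of `size` characters."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def squares_prefix(n: int) -> str:
    """Concatenate the squares of 1..n into one digit string."""
    return "".join(str(k * k) for k in range(1, n + 1))


phone = "168549"
chunks = chunk_digits(phone)
# Six separate digits are repackaged into two chunks.
print(len(phone), "items become", len(chunks), "chunks:", chunks)

# Knowledge-based chunking: the series "1 4 9 1 6" collapses into one rule
# ("squares of 1 to 4") only for someone who already knows the squares.
print(squares_prefix(4) == "14916")
```

The fixed group size is a simplification: in the account above, how items are grouped depends on what the person already understands, not on a mechanical rule.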

2.5.4 Rehearsal loop

When you read something quietly, the areas of the brain normally associated with speech are activated, as if you were speaking. The rehearsal loop is a subsystem of working memory that is specialized in dealing with verbal material and is sometimes called “the inner voice”. The more an item is rehearsed, the more likely it is to be transferred into and stored in long-term memory.

There are, however, two types of rehearsal. The first kind is called “maintenance rehearsal”. It is simply a way of silently repeating something over and over again in order to remember it over a very short period of time. It requires little effort but is fairly useless if you later want to recall the material.

The second kind of rehearsal is called “elaborative rehearsal”. It involves thinking about the meaning of the material and how it is related to other information you already possess. Elaborative rehearsal takes both more time and effort than maintenance rehearsal, but it is much more efficient if you later want to recall the material.

Since maintenance rehearsal is easier, participants will try to use it whenever they can get away with it. (Reisberg 2001:144,145,148-149)

This further emphasizes the importance of meaning and understanding. It also shows that people try to do tasks with less cognitive workload if they can. This, in turn, can influence how well they organize their content.

2.5.5 Long-Term Memory and organization

When searching through your memory, you rely on memory connections. Connections allow one memory to trigger another, and then another, until one is led to the sought-for information. Memories thus seem to be connected together in a network. (Reisberg 2001:153)

In a study by Craik and Tulving (1975, referenced in Reisberg 2001:154), data showed that words were more likely to be remembered if they appeared within rich, elaborate sentences. The richness offers the potential for many connections as it calls other thoughts to mind, each of which provides a potential retrieval path. (Reisberg 2001:154)

How can we discover and create these connections? More than 60 years ago, a German psychologist, George Katona, argued that the key lies in organization. (Katona, 1940, referenced in Reisberg 2001:154) We memorize well when we discover or impose an organization on the material.

So we need to discover or impose an organization on something to understand it. If an organization is hard to discover, we will instead try to impose one on it in order to make sense of the material. This means that we could reinterpret our own organization of content the wrong way.

2.5.6 Clustering in free recall

People spontaneously use organizing schemes to help them remember. This has been shown in a number of experiments. For example, participants are given a number of words, about 25 to 35, to listen to and repeat back immediately. Recall performance is considerably better if the words are not chosen at random but fall into categories. Moreover, if the sequence is scrambled so that the categories are mixed, the words will be unscrambled, grouped by category, when the participants report them back. This is referred to as “clustering in free recall”. It is considered free because participants may report back the words in any sequence they want. (Reisberg 2001:157)

This shows that we automatically categorize things all the time. Perhaps we can find some similarities in how people organize their content this way.
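
The unscrambling effect can be illustrated with a small sketch. It assumes a fixed lookup table from word to category; the word list, the category labels and the function name `cluster_recall` are hypothetical and are not taken from the experiments Reisberg describes.

```python
from collections import defaultdict

# Hypothetical word list with category labels, for illustration only.
CATEGORY = {
    "apple": "fruit", "pear": "fruit", "plum": "fruit",
    "chair": "furniture", "table": "furniture", "sofa": "furniture",
}


def cluster_recall(presented: list[str]) -> list[str]:
    """Regroup presented words by category, in first-seen category order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for word in presented:
        groups[CATEGORY[word]].append(word)
    # Flatten the groups back into one recall sequence.
    return [word for words in groups.values() for word in words]


scrambled = ["apple", "chair", "pear", "sofa", "plum", "table"]
print(cluster_recall(scrambled))
```

Real recall is of course not a deterministic lookup; the sketch only shows what “reported back grouped by category” means for a scrambled input sequence.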

2.6 Basic-level categories

Reisberg (2001:279) explains that Eleanor Rosch has argued that there is a “natural” level of categorization, neither too specific nor too general. Basic-level categories are usually represented in our language via a single word, while more specific categories are identified via a phrase. Thus “chair” is a basic-level category and so is “apple”. The subcategories “armchair” or “wooden chair” are not basic-level. Children who are learning how to talk acquire basic-level categories earlier than subcategories.

Critics of the theory of basic-level categories argue that an expert, such as a gardener, might regard “annual” or “perennial” as basic-level. Another question is how stable basic-level categories are across different kinds of contexts and situations.

This is yet another clue to how people may categorize and manage their content.

2.7 External cognition

In everyday situations, behaviour is determined by the combination of internal knowledge, external information, and constraints. People routinely capitalize on this fact. They can minimize the amount of material they must learn or the completeness, precision, or depth of the learning. People can deliberately organize their environment to support their behaviour. (Norman, 1993, referenced in Preece et al, 2002:98)

The environment inside the computer must qualify for this as well. How people organize themselves to support their behaviour is an important topic.

2.7.1 Externalizing to reduce memory load

There are all kinds of tools around to help aid cognition. If we wish to remember something, like an upcoming event, we can write it down, making the representation of whatever it is we want to remember external. Now we can simply forget about it and look at the note to get reminded instead. This is called “externalizing to reduce memory load”. All we have to remember now is to look at the note. This is far easier than recalling the event at the appropriate time. If we succeed we have effectively reduced the load on our memory. (Norman, 1993, referenced in Preece et al, 2002:98)

If people use this strategy on their computers and later cannot find the content, the strategy no longer supports them.

2.7.2 Computational offloading

When we use, for example, a computer, a calculator, or just plain pen and paper to carry out a computation, we are offloading our own memory. Using a device together with an external representation to offload memory in this way is called computational offloading. (Norman, 1993, referenced in Preece et al, 2002:99)

This is probably why the computer was invented in the first place. We use this strategy all the time.

2.7.3 Annotating and cognitive tracing

By modifying external representations we can indicate that something has changed, for example by crossing off an item on a to-do list or underlining something. This is called annotating.

We can also change the order or the structure of items. This is called cognitive tracing. For example, “in a card game, the continued rearrangement of a hand of cards into suits, ascending order, or same numbers to help what cards to keep and which to play, as the game progresses and tactics change”. (Norman, 1993, referenced in Preece et al, 2002:99)

This is an example of a useful strategy people use to better understand and be reminded of things.

2.8 Attention

2.8.1 Inattentional blindness

Experiments by Mack & Rock (1998, referenced in Reisberg, 2001:96) show that people see only what they expect to see. In one experiment, participants were told that they were about to be shown some object X on a screen, and were then shown X. Later, both X and some other object Y were shown, and when participants were asked whether anything had changed, they had not noticed the Y. The conclusion is that participants expected to see X only and therefore became blind to the extra Y. This phenomenon has been dubbed “inattentional blindness”. Note that this is a simplified version of the experiment; for a more detailed description see Reisberg (2001:96).

Yet another conspiracy against memory: what we can’t see, we can’t understand or remember.

3. Method

An empirical study was carried out using a questionnaire.

3.1 Participants

Participants consist of ten respondents chosen to include a variety of life roles. The participants have not been chosen specifically, but through so-called “convenience sampling”, which means that only those who were available are represented. Only Swedish-speaking people are represented. Users who indicated that they used the Macintosh operating system are not represented in the data; three questionnaires were discarded for this reason. Only Microsoft Windows operating system users are represented. Unfortunately, not all age groups are represented: there is a heavy overrepresentation of respondents in the age span 21-30. Common to all respondents is that they use the Microsoft Windows operating system and have their own personal computer system. Demographic data gathered from the questionnaire are presented in this section for convenience.

Demographic data for the whole test group

Data         Number of respondents
Male         8
Female       2
Age 21-30    8
Age 60+      2

It is due to the complexity of the questionnaire that I have limited the number of participants to ten. The rather low number of test subjects could, however, make it more difficult to generalize results to a larger population. The questionnaire is included in the appendix, and I will present my findings further on in this paper.

3.2 Material

I will be using a questionnaire for the data gathering. The questionnaire will have both qualitative and quantitative sections, in addition to questions where the respondent may write freely and give suggestions etc. The quantitative part consists of “Likert scales”, “yes or no” questions and “two-choice” questions.

Advantages of questionnaires are that they are cheap and easy to quantify and that they offer anonymity. A disadvantage is that the results depend very much on the questions posed: people tend to answer in the way they think is expected of them, and there is no way for participants to answer questions that do not appear in the questionnaire. (Preece et al., 2002) The main problem in using a questionnaire for this study is that the questions asked concern topics that users do not typically ponder. This could mean that answers will be fragmented and insufficient. I will try to get around this by putting some more general and some similar questions into the questionnaire, to get respondents into the right “frame of mind”.

The questionnaire will use so-called Likert scales. An example of a 5-point Likert scale:

(1) How proficient do you consider yourself to be at using your computer? (Where 1 represents “beginner level” and 5 represents “expert level user”.)

1 2 3 4 5

Likert scales are often used when designing questionnaires, to investigate people’s attitudes and opinions towards various things, but also to investigate what they believe. This type of scale is said to be “closed”, which means that respondents cannot select any alternatives other than the ones presented, and it generates quantitative data.

The questionnaire holds a number of “trick” questions. These are meant to investigate people’s truthfulness and the correctness of their view of themselves. As an example, one question early on in the questionnaire asks “How proficient do you consider yourself to be, at using a computer?” using a 5-point Likert scale where 5 represents expert user. Later on in the questionnaire I ask a question that only a level 4 or 5 user would be able to answer. If the answers correspond, it means that the person has been truthful or “level-headed” about that particular topic. Not all “real” questions have a corresponding trick question, due to the size of the questionnaire. A questionnaire that is too big might prove boring and may not be completed truthfully or properly.
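
The pairing of a self-rating with a trick question can be expressed as a simple consistency check. This is only a sketch under stated assumptions: the function name `is_consistent`, the threshold of 4, and the rule that answers “correspond” when a high self-rating goes together with answering the expert-level question are my illustration of the idea, not a scoring procedure taken from the questionnaire itself. The example respondents are hypothetical.

```python
def is_consistent(self_rating: int, answered_expert_question: bool,
                  threshold: int = 4) -> bool:
    """Check whether a 1-5 self-rating matches the trick-question outcome.

    Assumption: a respondent rating themselves at `threshold` or above is
    expected to answer the expert-level question; a lower rating is not.
    """
    if not 1 <= self_rating <= 5:
        raise ValueError("self_rating must be on the 1-5 Likert scale")
    claims_expert = self_rating >= threshold
    return claims_expert == answered_expert_question


# Hypothetical respondents, not data from the study.
examples = [(5, True), (2, False), (5, False)]
for rating, answered in examples:
    print(rating, answered, is_consistent(rating, answered))
```

Under this rule a respondent who rates themselves low but still answers the expert question is also flagged, which may be stricter than intended; a one-sided check would only flag overestimation.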

The order of the questions was also important, to avoid bias from earlier questions and to avoid having respondents make answers correlate.

3.3 Procedure

This empirical study has been performed using a 41-question questionnaire. Originally a pilot questionnaire was created and tested on two respondents. Some questions were revised or left out after this. The purpose of the questions was to find out about users’ general behaviour and thoughts about computers and information handling, but also about their awareness of certain problems such as moving hidden data. I sometimes pose questions like this: “How personalized, for your specific needs, do you think that your current computer system is?”. The respondent has nothing “real” to compare with when answering this question; it is simply his or her feeling on the subject I am looking for. It is therefore important to note that some of this questionnaire and the data gathered from it only represent what the respondents “think” about something. It is simply their opinion and does not necessarily represent the reality of the situation. The respondents may be wrong or may not be able to remember something exactly etc. This, of course, applies only to a certain kind of question. Demographic data is unaffected by this.

In order to get a more accurate view of what respondents do and do not do, and how, one would have to perform some kind of observation of users. This kind of data gathering technique takes considerably more time to perform and is therefore beyond the scope of this paper.

Respondents were given the questionnaire and were told to complete it by themselves. I remained in the same room during the completion of the form to make sure they abided by the rules stated on the first page of the questionnaire. Respondents were allowed to ask me questions if they were unsure of what a question meant. Some of the trick questions, for example, contained computer jargon or terms requiring specific knowledge. Novice users who asked what such a term meant were told by me that if they did not know what the term meant, they could simply skip ahead to the next question.

Data from all forms were then entered into a spreadsheet document for analysis.

4. Results and discussion

In this chapter I will compare and analyze the raw data. The individual questions do not say much by themselves; it is when they are cross-checked that interesting data starts to appear. Later on in this chapter I will also point out vital connections between the hypotheses and theory.

4.1 General results

Our test group seems to consider itself to be of average skill level, compared to other users, when it comes to using computers. The whole spectrum of users was also represented, which is what we want. Nine out of ten respondents know how to move, copy, and erase files. Most likely they use this knowledge to organize their files and folders, and we are interested in this.

I would like to stress the point that all answers from users are as they perceive things; they might not necessarily be the truth. Users might, for example, believe that they have better knowledge of where they keep important data than they actually have. However, all respondents passed my trick questions, so I will not dwell on this.

Music, pictures and textual documents are the most common types of content. Practically all users have some or all of these types.

4.2 Hypothesis-related results

4.2.1 Hypothesis 1

Hypothesis 1 states that “moving all content from one system to another is difficult, users therefore avoid doing so”.

Most users did not feel that content migration would be that hard when asked directly. An average of 2.9 is fairly normal, but some hassle is still expected. This is in line with how much they would like to run two physical systems in parallel at first (average 2.3). If the move to a new system is easy, then you do not have to keep the old system running as well, just in case you missed bringing some important data to the new system. It could of course be that having two systems running at the same time is in itself more hassle than just moving, which could explain the lower number, 2.3.

Low numbers on how hard a move would be could also be explained by the fact that some users have never moved to a new system before and are not familiar with the problems that could arise.

If we speculate that novice users are less likely to be familiar with the procedure of moving to a new system, the data supports this, as shown in Figure 2.

[Figure 2: chart titled “It’s easier to move to a new computer system for advanced users”. Horizontal axis: the difficulty perceived by respondent (0-5). Vertical axis: respondent’s level of expertise (0-5). Data points: respondents 1-10.]

Figure 2 – This cluster analysis shows that skilled users find content migration to be easier than novice users do.

It is interesting that only four out of ten respondents reported that they would be able to move all important data to a new system by themselves. The respondents evidently expect to get help from someone else to make the transfer. They could unconsciously have factored this into their answer to question 3. The data on question 6, how often they get help, somewhat supports this: there are a lot of high and low numbers averaging each other out. The reason why questions 24 and 5 do not correlate exactly could be that transferring all your content to another machine is a special case, where help is certainly needed more often.

The final conclusion of this must be that most users underestimate the hardship of transferring data from one system to another, which partially supports hypothesis 1. On the other hand, if users underestimate the complexity of this task they might actually go ahead and try, and this undermines the second part of hypothesis 1. Whether they will refrain from attempting the move, the data cannot tell.

4.2.2 Hypothesis 2

Hypothesis 2 states that “moving all content from one system to another is difficult because personal content is badly organized”.

There is not much support to be found for hypothesis 2 in this data. Users actually feel that they know quite well where content is stored. Questions 6 through 9 all score above 3. Questions 10 and 11 show that how orderly respondents are in the real world correlates quite well with how orderly they are in the virtual world, and both score above medium. How orderly they are as persons in general also corresponds with how good they are at knowing where their files are. This leads us to conclude that, in general, the more organized you are as a person, the better the content infrastructure on your computer.

[Figure 3: chart titled “There is a relation between how orderly respondents are in life and how well organized their content is”. Horizontal axis: quality of content infrastructure, as perceived by respondent (0-5). Vertical axis: how orderly respondents are in real life, as perceived (0-5). Data points: respondents 1-10.]

Figure 3 – How organized you are translates into the digital world.

One could suspect that users are overconfident in this matter. There are data that point to matters that could complicate the user’s content management. Seven out of ten respondents are using a computer that is also used by other people, a multi-user system. Six of those respondents report that they are aware that the other users have content on their system too. Four of those six report that they know what that content is and where it is located. This means that at least three of the seven multi-user system respondents could be unaware of that content. Furthermore, six out of ten respondents report that they are using multiple systems, such as an office computer or a laptop. Two of those six respondents synchronize content between the different systems. This means that the same content objects could be available on several systems at the same time, possibly in different versions. These two points could add further layers of complexity. This, however, proves nothing; it just points out matters that could complicate the user’s situation. Looking at it the other way around, this could instead mean that our participants organize their content so well that not even these complicating matters, such as multiple systems and duplicated data, can make them lose their footing.
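
The counts in the paragraph above can be checked with simple arithmetic. The sketch below uses only the figures reported in the text; the variable names are my own, and the two derived values are plain subtractions rather than results reported by the questionnaire.

```python
# Counts reported in the text above (out of ten respondents).
multi_user = 7        # share their computer with other people
knows_location = 4    # know what the other users' content is and where it is
multiple_systems = 6  # use more than one system, e.g. office computer or laptop
synchronizes = 2      # synchronize content between their systems

# Derived figures (simple subtraction, not reported directly in the study).
may_be_unaware = multi_user - knows_location
do_not_synchronize = multiple_systems - synchronizes

print(f"{may_be_unaware} of {multi_user} multi-user respondents "
      "may be unaware of the other users' content")
print(f"{do_not_synchronize} of {multiple_systems} multi-system respondents "
      "do not report synchronizing")
```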

One question that I now realize I should have asked is how much data users keep. Big systems are likely to be harder to organize. If all the respondents in the inquiry have relatively small systems, they will be easier to manage. Question 44 asks if users tend to store everything or nothing. Eight responded that they tend to store everything. Our group of participants thus seems to be “collectors” of content. This points to content being likely to pile up over time. This, however, proves nothing either. Question 43 provides us with some hints of the various respondents’ system sizes. We do, however, have to make an assumption here about ways of locating files. In the best-case scenario the user navigates to the place where he knows that the file is and opens it. The second-best case would be that the user browses around places where the file is likely to reside until he finds it. The second-worst case is when you have to make a file search. To make a file search you must possess some information on the file, such as part of its name or its creation date. If you do not, you have to browse again, this time without knowing anything about where the file could be: “a blind browse”. From question 43, we can infer that at least three respondents seem to have small systems, namely respondents 4, 7 and 10. It cannot, however, be concluded for any of the other respondents whether their systems are large or small. On top of this, removing respondents 4, 7 and 10 from the statistics only increases the assertiveness of the remaining respondents.
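
The ordering of ways to locate a file assumed above can be written down as a small decision rule. This is a sketch of that assumption only: the names `Strategy` and `locate_strategy` and the three yes/no inputs are hypothetical, and treating the blind browse as the worst case is implied by the ordering rather than stated outright.

```python
from enum import IntEnum


class Strategy(IntEnum):
    """Ways of locating a file, ordered from best (1) to worst (4)."""
    DIRECT = 1         # best case: go straight to the known location
    GUIDED_BROWSE = 2  # second best: browse places where the file likely is
    FILE_SEARCH = 3    # second worst: search on name fragment, date, etc.
    BLIND_BROWSE = 4   # worst: browse with no idea where the file could be


def locate_strategy(knows_location: bool,
                    knows_likely_places: bool,
                    knows_file_attribute: bool) -> Strategy:
    """Pick the best strategy available given what the user remembers."""
    if knows_location:
        return Strategy.DIRECT
    if knows_likely_places:
        return Strategy.GUIDED_BROWSE
    if knows_file_attribute:
        return Strategy.FILE_SEARCH
    return Strategy.BLIND_BROWSE


# A user who remembers only part of the file name must fall back on search.
print(locate_strategy(False, False, True).name)
```

The less a user remembers about a file, the further down this list they are pushed, which is why the size and orderliness of the content infrastructure matter.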

It is tempting to view the scores of questions 6, 7, 8 and 9 as low instead of high. If you have thousands of files, an organization with a score above 3 but below 4 might not be enough. Again, we have no data that indicate the size of the respondents’ data collections.

Our respondents save everything, back up little, almost never write down passwords or settings, and most of them cannot move their important data without consulting others. Yet they remain confident that they are in control. To sum up, no substantial support for hypothesis 2 can be produced from the available data.

4.2.3 Hypothesis 3

Hypothesis 3 states that “moving all content from one system to another is difficult because some content cannot be moved or is hidden”.

On a direct question (question 43), only one respondent stated that “personal settings” could be difficult to move to a different system. As a matter of fact, he is correct. There are, to my knowledge, currently no failsafe technical solutions for moving personal settings to a different system. The only way to move passwords, color settings (for most software) etc. is to write them down on a device or paper outside the system you are migrating data from. Let us take a look at how good our test group is at doing this. Question 17 (writing down passwords) scored an average of 1.7 out of 5. Question 18 (writing down settings) has an average of 1.8. Perhaps the numbers are so low because this type of content is hidden. It is quite hard to write something down if you cannot find it, which could explain why users do not.

It seems that our users do a fair amount of customizing. The question of how customized and well suited to their needs their system is scored an average of 3.5 for the whole group; even novice users customize their virtual environment. Seven out of ten reported that they make personal settings. This indicates that customization is very important to most users. Since most customization settings cannot be moved at all, at least not digitally, and are difficult to find, these users rely on a lot of hidden content. If they use a lot of hidden content, it is going to complicate content migration procedures, and this supports hypothesis 3.

The conclusion must be that hypothesis 3 does gain support from the data.

Very few respondents reinstall their system now and then (question 38). I speculate that they do not because they cannot save their settings. The three who do reinstall were among the most competent users.

4.2.4 Hypothesis 4

Hypothesis 4 states that “Less important content receives less content management”.

The test group considers itself to be downloading about average amounts from the Internet (question 12, average 2.7). Downloaded files are, however, considered to be of below-average importance (question 14, average 2.5). This is supported by question 15, about how much respondents tend to back up downloaded content (average 1.5); i.e. they hardly back up any of it at all.

Questions 6, 7, 8 and 9 are all about how carefully users manage files in general. They all score higher than downloaded files.

Question 11, how orderly the respondents are in general in the world of computers, also scores higher than question 13.

The conclusion is that downloaded content is regarded as less important and therefore receives less content management than other content. This supports hypothesis 4.

4.2.5 Hypothesis 5

Hypothesis 5 states that “The content infrastructure on most users computers, are an externalization of cognition”.

Although I have not investigated each respondent’s content infrastructure specifically, there is some evidence in the data to support this, in question 33. Six respondents report that they keep music on their computer. Five of those six respondents use the actual word “music”. One respondent used the word “musicfiles”. Eight respondents also reported that they have pictures on their computers. Seven of those eight used the actual word “pictures” and only one used the word “graphics”. These look like basic-level categories and could therefore be an externalization of cognition.

The conclusion is that there is an indication of support for hypothesis 5 in the data, although it is not proven beyond doubt.

4.3 Relating hypotheses to theory

4.3.1 Overview

The background theory is important because it points out some vital facts about human cognition. People might reduce their efforts to keep order because they are lazy or, more accurately, because they conserve their limited mental resources. There are also motivational and contextual factors involved, and together with some technical constraints we have a potential problem of accessing information. I first present an overview of the way I believe theory relates to our present topic.

We cannot remember the full layout of our content infrastructure due to the limitations of WM. We can take a look at the file structure and get some clues that will bring memory alive; which clues are brought to mind is determined by what we are currently looking for and want, or just happen to lay our eyes on. Unfortunately, we are almost completely blind to everything else. Things have to be brought to our attention. Every time we need to access some content object and do not remember exactly where it is, we have to, more or less, reinterpret our own organization.

Theory states that in order to make sense of something we need to discover its organization. The more complex that organization is, the more cognitive work has to be done to make sense of it. Anything that increases the complexity of tasks related to organization makes us less prone to spend resources maintaining it. Hidden content is especially dangerous because it does not give users any clues or reminders that it exists or what it is used for.

4.3.2 Hypothesis 1

Hypothesis 1 states that “moving all content from one system to another is difficult, users therefore avoid doing so”.

Users do not consciously think it is too hard, but they expect to get help with it. This suggests that they do think it is hard. The question whether they avoid it or not is harder to answer.

It would be reasonable to conclude that anyone would like to avoid this if it is not absolutely necessary. After all, it means work, and if you cannot do it yourself you need someone else who has the skills and is willing to put time and effort into helping you out. Unless it is someone in the family etc., this could at least delay the whole operation.

4.3.3 Hypothesis 2 and 5

Hypothesis 2 says that “moving all content from one system to another is difficult because personal content is badly organized”.

Hypothesis 5 says that “The content infrastructure on most users computers, are an externalization of cognition”.

It is possible that I could be entirely wrong in my original assumption of hypothesis 2. Maybe “ordinary” users do not have that much personal data. Maybe most people just have a couple of text documents and a few pictures. That would not be much to keep track of. However, users reported that they did not organize downloaded content as well, and they hardly back up any of it. It is therefore vital that they keep downloaded and personal content separate. If they start to mix badly organized content with well organized content, they will effectively reduce the quality of both.

Organization is definitely important; the various theories of memory presented earlier state that it is the key to discovering long-term memory connections. We learn and remember better when we discover the order of things. To make sense of something we have to discover its order. Consequently, if we are to remember “where we put that file”, we must make sense of the file system. The best way of doing that is to organize it to your convenience, which means it is suited specifically to you. The data presented in the previous sections support the claim that users do this; they have a high degree of customization. The more sense an organization is to make, the closer to our “inner” organization it must be. That is why the theory of basic-level categories supports hypothesis 5.

The critics of this theory say that basic-level categories need not be stable. I agree with this: different people have different needs and knowledge, and the categories may therefore vary. Can we apply the theory of basic-level categories to people’s file systems somehow? Further research on what folder names people use in the root and sublevels of their file systems would be the only way of finding out about this for certain. But the names of the different types of content (question 33), and the fact that some people partition their hard drives for structural reasons (question 42), indicate that some kind of basic categories are at work here.

One thing that is certain is that a computer, together with its file system, is a powerful tool for externalizing cognition. Norman (2001) says that people routinely capitalize on this fact in order to reduce workload. It is quite safe to assume that people do externalize cognition by means of organizing their content. To what degree is harder to answer. If you enter a situation where you are trying to find a specific file, and the model of the file system (content infrastructure) does not match your mental model, the file will be harder to find. This again supports hypothesis 2.

4.3.4 Hypothesis 3

Hypothesis 3 states that “moving all content from one system to another is difficult because some content cannot be moved or is hidden”.

This is more a technical question than a psychological or cognitive one. It is nevertheless frustrating to try to move something that cannot be moved or is not meant to be found. Unfortunately, there is a lot of this kind of data on most computers.

The main cognitive problem is that content hidden from view is harder to remember. Sometimes we only see the things that we expect to see and are completely blind to other things. This is the theory of “inattentional blindness”.

“There is no conscious perception without attention”. (Mack & Rock, 1998:14 referenced in Reisberg 2001:96)

We need cues from our surroundings to trigger the right path in long-term memory. When you transfer your data to a different system, you have to remember all the hidden things that you need to bring with you. This may be why some people underestimate the difficulty of such a move.

Also, the more often something is repeated, the greater the chance of learning it, as stated in the theory of the rehearsal loop. Hidden content is often hidden for a reason: it is important, is placed (directly) only once, and is then not supposed to be changed. This means that there will be no rehearsal. It could also be that you placed it indirectly (without knowing), and the manufacturers of the software were kind enough to hide it from view so that you would not become overloaded with information. Now you do not even know it exists. According to our data, respondents seem to be unaware of the problem with hidden content. Only one respondent said that “personal settings” could be hard to move. If the conclusion that content migration is easy was drawn without this knowledge, respondents may be overconfident. Unfortunately, the questionnaire did not ask whether they had actually experienced such a move.

4.3.5 Hypothesis 4

Hypothesis 4 states that “less important content receives less content management”.

The data support this hypothesis, and they also support its converse: “more important content receives more content management”. Much of Norman’s theory supports this as well. Humans have limited cognitive resources and limited time. We are forced to prioritize; for instance, we cannot remember everything. It is only natural to conclude that some things receive more attention than others. I therefore consider hypothesis 4 to be correct.

4.3.6 Hypothesis 6 and 7

Hypothesis 6 states that “Users often estimate badly in regard to the importance of content.”

Hypothesis 7 states that “users avoid content management in order to minimize workload”.

What happens if we prioritize the wrong content? We spend our resources organizing content that is less important to us than we believe, and possibly neglect content that is truly important to us. This could lead to a less effective content infrastructure.

There is a tradeoff between efficiency and accuracy, and people also make errors routinely.

“Hardly a minute of a normal conversation can go by without a stumble, a repetition, a phrase stopped midway through and discarded or redone”.

(Norman, 2002:105)

But what kind of errors do we make when we manage and place content?

Please categorize this:

Figure 4 - The duck/rabbit

The figure can be perceived either as a duck or as a rabbit. People sometimes experience great difficulty finding a different interpretation. That is, people imagining the “duck” have great difficulty in discovering the “rabbit” and vice versa. (Reisberg, 2001:354)

Recognizing and categorizing can be ambiguous, and we are prone to making mistakes. This example is a visual one, but the idea applies across all content.

I have shown that categorizing is not always trivial. Combined with our limited mental resources, this means that we make mistakes, which supports hypotheses 6 and 7.

4.3.7 Hypothesis 8 and 9

Hypothesis 8 states that “Poor and/or insufficient content managing leads to poor content infrastructure”.

Hypothesis 9 states that “Poor content infrastructure leads to less efficient retrieval of content”.

A good content infrastructure is, as stated earlier, one that makes sense. If it makes sense, we can understand it, and if we can understand it, we can use it. If the user starts putting files in illogical places and giving folders names that are hard to understand, the structure is going to make less sense. As the volume of content increases over time, complexity will rise. As stated in the theories of memory and understanding, organization is the key, which supports hypothesis 8.

A structure that is hard to understand makes the computer a less efficient instrument for “externalizing to reduce memory load”. If the content infrastructure is good, however, we should be able to use it better. We will find content objects faster, forget about content less often and therefore lose less content, which supports hypothesis 9.

4.4 Main hypothesis

Figure 5 is based on the hypotheses presented earlier. It visualizes how human behaviour can degrade content infrastructure, and how this degradation feeds back so that the infrastructure deteriorates over time.

Figure 5 – This figure describes the way that content infrastructure quality is reduced. The boxes are numbered for convenience.

Figure 5 outlines several different scenarios that all lead to degradation of content infrastructure. I here offer an explanation of those possible scenarios.

Imagine a new content object that has to be placed (directly) and that really is unimportant to the user (7). The user, however, thinks that the object is important and incorporates it into the infrastructure (8). This leads to a less efficient content infrastructure, because important and unimportant content is now mixed, increasing the overall complexity of the structure. A more complex structure takes more cognitive resources to understand; hence the efficiency of finding and retrieving content has now been reduced as well (3-4).

In the next scenario we have an incoming object that is truly important to the user (1). The user, on the other hand, does not realize this and considers it unimportant (2). There seems to be no need to spend cognitive resources managing unimportant objects. Now an important object has been badly managed, which effectively reduces the efficiency of the content infrastructure (3).

Boxes in Figure 5:
1. Incoming important content
2. Content is considered unimportant by user
3. Less efficient content infrastructure
4. Less efficient retrieval of content
5. Content is hard to categorize
6. User is lazy or time pressed
7. Incoming unimportant content
8. Content is considered important by user

As mentioned before, a complex and/or inefficient content infrastructure is harder to understand and use. This has the side effect of making new content harder to incorporate properly into the infrastructure (3-5).

There is also the scenario where the important incoming content object (1) is complex and, possibly in conjunction with an insufficient infrastructure (5), makes the procedure of incorporating the object so taxing that the user considers it unimportant just to escape the work (6). It could also be that the user does not have enough time to allocate to properly managing and incorporating the object into the infrastructure and therefore refrains from doing so (6).

The retrieval stage (4) is needed for daily use, but also for content migration. Since so much can decrease the efficiency of content retrieval, it is highly likely that content migration procedures will suffer.

There are of course other factors that could be incorporated into figure 5, such as problems arising from the use of hierarchical filing structures and hidden content. With figure 5 I have simply tried to visualize human behaviour that leads to a degradation of content infrastructure, and the fact that the infrastructure deteriorates over time.
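The scenarios in this section can also be read as paths through a directed graph of the numbered boxes. The following Python sketch is one possible reading and not a transcription of figure 5: the edge list is an assumption inferred from the box numbers cited in each scenario, and the names `EDGES` and `reachable` are purely illustrative.

```python
# Assumed edges between the numbered boxes, inferred from the scenario text:
# 7 -> 8 -> 3 -> 4, 1 -> 2 -> 3, 3 -> 5, 1 -> 5 -> 6 -> 2.
EDGES = {
    1: [2, 5],  # important content: misjudged, or hard to categorize
    2: [3],     # considered unimportant -> less efficient infrastructure
    3: [4, 5],  # worse infrastructure -> worse retrieval, harder categorizing
    5: [6],     # hard to categorize -> user avoids the work
    6: [2],     # lazy or time pressed -> content treated as unimportant
    7: [8],     # unimportant content considered important
    8: [3],     # ...and incorporated, adding complexity
}

def reachable(start):
    """Return every box that can be reached from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Both kinds of incoming content can end in less efficient retrieval (4).
print(4 in reachable(1), 4 in reachable(7))
# Box 3 can reach itself again: the feedback loop that causes deterioration.
print(3 in reachable(3))
```

Under these assumed edges, every scenario ends in box 4, and box 3 lies on a cycle, which is the feedback described in the text.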


5. Conclusion

Not all hypotheses have been proven correct beyond doubt. I have, however, found some answers to my original questions and managed to raise some new ones. The questions that I posed at the beginning of this paper were:

Question A: “Do users think that personal data is hard to find and move?”
Yes and no. Users think that they have fairly good control of their own personal content, and they do not estimate content migration to be that hard. They do, however, expect to get help from someone more skilled than themselves, which leads us to conclude that they subconsciously think that it is hard. The data also show that more experienced users think this is easier than novice users do, so there is definitely difficulty involved in this process.

Question B: “How much effort do users put into organizing their personal data?”
They put in a fair amount. They personalize a lot, and they prioritize certain types of content over other types. Whether their content infrastructure is efficient or not remains unanswered.

Question C: “What factors contribute to making a transfer of content problematic?”
If you cannot perform the migration to a new system yourself, having to find another person who can help complicates matters. Remembering all the content objects that you need to bring with you is another big problem. If the infrastructure is poorly maintained, you may have to differentiate between system data and personal data of varying degrees of importance. Hidden content complicates the situation even more.

Question D: “What strategies do people employ to survive in the information jungle?”
They prioritize certain content and forget about most unimportant content to protect themselves from overload. Because they know that they make errors of judgement, they tend to store everything and bin nothing. They memorize the most important data and rely on external cognition for the rest.

Question E: “How aware are users of the fact that some data cannot be moved or is hard to find?”
They are almost totally unaware of this. Because users do not come into contact with this kind of data very often, it is easy to forget about. The importance of this kind of data varies, and the impact of losing it in the process of moving to a new computer system is hard to estimate. The data do show, however, that users rarely write down information associated with hidden content, such as passwords, email accounts and personal settings. This further complicates the procedure of transferring data to another system.

Question F: “What constitutes a good or poor organization of data and how does it affect users?”
Since users map their own semantic structures onto the file system’s folders, it is important to maintain the integrity and coherence of the file system. A good content infrastructure makes stored material easy to retrieve and new material easy to incorporate into the structure. It also prevents files from being forgotten or becoming impossible to find.

Main question: “Does the nature of users’ content, such as poor organization, lead to unwillingness from users to migrate from one system to another?”

The nature of users’ content is certainly a complex matter. It involves cognition, knowledge, technical matters such as the user interface and good or bad software design, users’ interests and priorities, personal restrictions etc. The evidence presented throughout this paper indicates that the nature of personal content, and all the problems that come with managing it, does not favour the procedure of migration. This could of course differ from person to person, but the general picture seems quite clear, although not proven beyond doubt.

One of the most important aspects of this research is that if users think that content migration is a cumbersome procedure, they might avoid it, or at least wait until it is absolutely necessary. This means that hardware and software vendors might be losing money. I cannot prove that this is in fact a problem. The only thing the data support is that users might think it is difficult to perform the procedure by themselves. All the good things that come from a good content infrastructure are useful during content migration. To what extent they influence users’ willingness to go through with the procedure is harder to foresee. Whether this is enough to hinder them from transferring to a new system remains unanswered.

5.1 Future research

Gathering more information on users’ content infrastructure would be beneficial. Their behavior and the individual strategies they use for content management should also be investigated further. I doubt that this can be done through questionnaires. Some sort of field analysis or naturalistic observation might be necessary. I would like to do a thorough investigation of how users’ folders are named and organized. Are the root folders named in a very similar fashion amongst users? Are naming conventions of files similar from user to user? One study could involve looking at a couple of respondents’ content infrastructures in great detail, before and after content migration.

I am also interested in new, innovative techniques for managing personal information. There must be better ways of managing information than a fixed hierarchical model. There are a few attempts out there (such as Dourish 2000, Gifford 1991), but to my knowledge none of them has proved successful in the sense of becoming widespread. A synthesis of these attempts, together with some new and innovative ideas, might be the foundation of a future prototype.

All in all, computers are here to support us, and if they do not do a good job of it, they should be improved.


6. References

Dourish, P., et al. (2000) ‘Extending Document Management Systems with User-Specific Active Properties’ ACM Transactions on Information Systems, No. 2, April 2000, 140–170.

Gifford, D.K., et al. (1991) Semantic File Systems. USA, ACM No. 0-89791 March 3

Haugeland, J. (2000) Artificial intelligence; the very idea. USA, MIT Press.

Norman, D.A. (2002) The design of everyday things. USA, Basic Books.

Preece, J., et al. (2002) Interaction design: beyond human-computer interaction. USA, John Wiley & Sons, Inc.

Reisberg, D. (2002) Cognition; exploring the science of the mind. USA, W.W. Norton & Company, Inc.

7. Appendix

7.1 Data

The results are presented as raw data: the question posed comes first, followed by the answer given by each respondent and the average for the whole group. All questions have been translated from Swedish into English. The questions have been numbered.

7.1.1 Likert scales questions

The tables in this section are organized so that all ten respondents’ individual answers are presented; R. 1 through R. 10 means Respondent 1 through Respondent 10. The numbers range from 1 to 5, with 5 being the highest/best. Above each table is the question that was asked in the questionnaire. The question has been translated from Swedish.

1. How good do you consider yourself to be with computers?

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10  Average
5     4     2     3     2     3     1     5     3     2      3

2. How personalized, for your specific needs, do you think your current computer system is?

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10  Average
5     4     3     3     4     2     5     3     3     3      3.5
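The Average column can be reproduced from the individual answers. The following is a minimal Python sketch, assuming that the average is the arithmetic mean of the answered items and that a missing answer is simply left out; the function name `likert_mean` and the use of `None` for a missing answer are illustrative choices, not part of the study.

```python
def likert_mean(answers):
    """Arithmetic mean of the answered items; None marks a missing answer."""
    answered = [a for a in answers if a is not None]
    return sum(answered) / len(answered)

# Answers of R. 1 through R. 10 to questions 1 and 2, copied from the tables.
q1 = [5, 4, 2, 3, 2, 3, 1, 5, 3, 2]
q2 = [5, 4, 3, 3, 4, 2, 5, 3, 3, 3]

print(likert_mean(q1))  # 3.0
print(likert_mean(q2))  # 3.5
```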

3. If you were to change to a new computer system today, how hard/easy would you consider the change to be? (Taking into account personal files and personal settings.)

4. How much do you estimate the need to run the new system and the old system in parallel at the start?

5. How often do you have to get help in using computers from other people?

6. Do you know where all your important files and data are located on your computer? How good are you at keeping track of them all?

7. How much effort do you put into naming files you want to save?

8. How consistent are you when choosing names for your files?

9. When naming files, how consistent are you at following your own naming conventions?

10. How orderly do you consider yourself to be in real life?

11. How orderly do you consider yourself to be in the world of computers?

12. How many files do you consider yourself to download from the internet?


13. How well do you organize files that you have downloaded?

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10  Average
5     5     2     2     1     4     1     3     4     3      3

14. How much do the files that you have downloaded mean to you? How important are they?

15. How many of the files that you have downloaded do you make a backup of? (A backup means that you have the same data in two different places.)

R. 1  R. 2  R. 3    R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10  Average
1     2     No ans  2     1     4     1     1     1     1      1.5

16. How consistent are you at writing down, outside the computer, passwords?


17. How consistent are you at writing down, outside the computer, settings?


18. If your personal computer crashed right now, how serious would that be to you?

7.1.2 Yes or no questions

I did not provide checkboxes for yes or no on the question form. As a result, respondents sometimes gave answers other than those I had asked for, such as “Partially” instead of yes or no. With this approach you sometimes find interesting notes in the margins of the questionnaire, but these are not included in the data below.

19. Do you have your own, or access to your own, personal computer?

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10
Y     Y     Y     Y     Y     Y     Y     Y     Y     Y

Yes  10
No   0

20. Do you keep personal data in this computer? (I.e. data that you have created yourself.)

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10
Y     Y     Y     Y     N     Y     Y     Y     Y     Y

Yes  9
No   1

21. Do you make personal settings in the operating system and/or in programs?

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10
Y     Y     Y     N     N     Y     N     Y     Y     Y

Yes  7
No   3

22. Do you use keyboard shortcuts in programs and in the operating system?

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10
Y     Y     N     N     Y     Y     N     Y     Y     Y

Yes  7
No   3

23. Do you know how to move, copy and erase files?

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10
Y     Y     Y     Y     Y     Y     N     Y     Y     Y

Yes  9
No   1

24. Would you be able to move all your important data to a new system by yourself?

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10
Y     Y     N     N     N     Y     N     Y     N     N

Yes  4
No   6

25. Do you make backups of your personal data?

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10
Y     Y     Y     Y     N     N     N     Part  N     Y

Yes        5
No         4
Partially  1

26. Are you a “single” user, i.e. the only user, on your particular system?

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10
N     Y     Y     N     N     N     N     N     N     Y

Yes  3
No   7 (these respondents are on multi-user systems)

27. If you are on a multi-user system, do you know if the other users have personal data on this system as well?

R. 1  R. 2            R. 3            R. 4  R. 5        R. 6  R. 7  R. 8  R. 9  R. 10
Y     Does not apply  Does not apply  Y     Don’t know  Y     Y     Y     Y     Does not apply

Yes             6
Does not apply  3 (on a single-user system)
Don’t know      1 (does not have a personal system and therefore does not know)

28. If you are on a multi-user system and the other users have personal data on it as well, do you know what data and where it is?

R. 1  R. 2            R. 3            R. 4  R. 5            R. 6  R. 7  R. 8  R. 9  R. 10
Y     Does not apply  Does not apply  Y     Does not apply  Y     N     Y     N     Does not apply

Yes             4
No              2
Does not apply  4

29. Do you utilize several different computer systems?

R. 1  R. 2  R. 3  R. 4  R. 5  R. 6  R. 7  R. 8  R. 9  R. 10
Y     Y     N     Y     N     N     Y     Y     Y     N

Yes  6
No   4

30. Do you synchronize content between several different computer systems?

R. 1  R. 2  R. 3            R. 4  R. 5            R. 6            R. 7  R. 8  R. 9  R. 10
Y     N     Does not apply  N     Does not apply  Does not apply  N     N     Y     Does not apply

Yes             2
No              4
Does not apply  4

31. Do you store any kind of data online?

R. 1  R. 2  R. 3  R. 4  R. 5       R. 6  R. 7  R. 8  R. 9  R. 10
Y     Y     N     Y     No answer  Y     Y     Y     Y     N

Yes        7
No         2
No answer  1

7.1.3 Descriptive questions

Our group of respondents reported that they use their computers for the following. (No frequency data is available because answers typically vary too much between respondents.)

• Work related tasks
• Playing games
• Communicating, e.g. emailing
• Internet surfing and finding information
• Banking
• Producing text documents, graphics and presentations (did not specify what presentation)
• Listening to music
• Bookkeeping
• Cataloging (did not specify what)
• Photo album

All numbers in the following tables represent the number of respondents, i.e. how many reported something.

32. What kind of personal data do you keep on your computer?

Music           6
Pictures        8
Text documents  8
Films           1
Invoices        1
Bookkeeping     1

33. Is there any kind of data you could think of that would be difficult to transfer to a new system? What kind of data?

Personal settings  1

34. If the phone rings while you are sitting by the computer and you have to write down a phone number quickly (no other means but the computer are available to write something down on), how would you proceed? (Note: you are also late for the bus.)

Notepad (text editor)  3
Word                   3
Don’t know             1
Organizer              1
Email                  1
No answer              1

35. Do you like computers, or do you think they are just a necessary evil, or what do you think?

Nine respondents like computers and one thinks they are a necessary evil. Several of them report that they like them when they are working properly and not when they are faulty.

36. Do you reinstall your system every now and then? If you do, explain why.

Number of respondents that reinstall  3
Reason 1: System gets unstable after a while.
Reason 2: Want to install the new OS.

37. If you reported that you utilized several different computer systems before (6), what systems?

Home, stationary
Laptop
Work

38. If you reported that you utilized several different computer systems before (6), how does data differ between them?

Home, stationary  Fun and personal content
Laptop            No answer
Work              Work related content

39. What kind of data do you store online, if any?

Various (not specified)
Email
Information
Text files

40. Do you partition your hard drive? If so, explain why.

Information structural reasons  3
Keep OS and data separate       1
No                              3
Don’t know what it means        3

41. How do you proceed when you want to find a document you created some time ago? Explain.

Respondent 1   File search.
Respondent 2   Browse to folder, relevant to topic, for example School.
Respondent 3   First browse then search.
Respondent 4   Look in the My documents folder where I usually save everything.
Respondent 5   My desktop
Respondent 6   Look where I think I’ve put it, then file search
Respondent 7   I save all documents in the same folder. I start Word and open file.
Respondent 8   Think about where I’ve put it, otherwise search.
Respondent 9   I know where they are but if I don’t, I search for it.
Respondent 10  I use the “My recent documents” or make a search

Browse usually means that you look for the file by stepping through folders using the “Explorer” application that comes with Windows. A search usually means that you use the “File find” application of Windows, where you can have the computer search for files using specific criteria, such as part of the document’s name.
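As an illustration of what such a name-based search does, here is a minimal Python sketch. It is not how the Windows tool is implemented; the function name `find_by_name` and the example search term are assumptions made only for this example.

```python
from pathlib import Path

def find_by_name(root, fragment):
    """Return files under `root` whose name contains `fragment` (case-insensitive)."""
    fragment = fragment.lower()
    return sorted(
        p for p in Path(root).rglob("*")          # walk all subfolders
        if p.is_file() and fragment in p.name.lower()
    )

# Hypothetical usage: look for files with "thesis" in the name,
# starting from the current folder.
for path in find_by_name(".", "thesis"):
    print(path)
```

Browsing corresponds to the user walking the same folder tree by hand, which only works well when the folder names match the user’s own mental model.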

7.1.4 Choice questions

42. Do you tend to store everything or nothing?

Everything  8
Nothing     1
No answer   1

43. Do you memorize passwords or do you write them down, outside the computer?

Memorize    8
Write down  2
