COMS E6125 Web-enHanced Information Management (WHIM)

Prof. Gail Kaiser

Spring 2012

28 February 2012

Today’s Topics:

• What is Web 2.0?

• Information Sharing and Privacy

• Applications Beyond the Web

Netscape vs. Google: The Web As Platform

• Netscape: free web browser as flagship to establish market for high-priced server products that push content to the “webtop” – but servers also turned out to be commodities

• Google: Native web application, never sold or packaged or ported, delivered as a service with no scheduled software releases, massively scalable - core competency is data management

Akamai vs. BitTorrent: Internet Decentralization

• Akamai: Treats network as platform at deeper level of stack, transparent caching and content delivery that eases bandwidth congestion – also limited by business model catering to large providers

• BitTorrent: P2P file fragment downloads, every client is also a server, the service automatically gets better the more people use it - architecture of participation

Harness Collective Intelligence

• Google PageRank using link structure

• eBay enabler of user activity requiring critical mass

• Amazon uses community activity to produce better search results (e.g., real-time “most popular” computation)

• Wikipedia – radical experiment in trust, profound change in content creation
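To make the "PageRank using link structure" bullet concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is a simplified illustration, not Google's production algorithm; the toy link graph, the function name `pagerank`, and the damping value of 0.85 (a commonly cited default) are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank, with ranks summing to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", nothing links to "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph the page with the most incoming links ("c") ends up with the highest rank and the page nobody links to ("d") with the lowest, which is the sense in which the ranking harnesses the collective linking behavior of page authors.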

Harness Collective Intelligence

• Web of connections grows organically

• Viral marketing – if a site or product relies on advertising to get the word out, it isn’t Web 2.0

• Peer-production open source development of much web infrastructure – Linux, Apache, MySQL, Perl, PHP, Python

• Network effects from user contributions are the key to market dominance

Blogosphere

• Blogging vs. personal home pages: replaced the personal diary, daily opinion column, and Usenet News; now being supplanted by Facebook and Twitter

• RSS (Really Simple Syndication) allows subscribing to a page – the incremental (or live) web

• Permalink builds bridges between weblogs, affects PageRank search results
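As a rough illustration of what "subscribing to a page" means in practice, the sketch below parses an RSS 2.0 document with Python's standard library and reports only the items a subscriber has not seen before. The feed content, the example.com URLs, and the helper name `new_items` are made up for this example; a real feed reader would fetch the XML over HTTP at intervals.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; each item's guid doubles as its permalink.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.com/2012/02/first-post</link>
      <guid isPermaLink="true">http://example.com/2012/02/first-post</guid>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2012/02/second-post</link>
      <guid isPermaLink="true">http://example.com/2012/02/second-post</guid>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (title, guid) pairs for items not yet in seen_guids.

    seen_guids is updated in place, so polling the same feed again
    yields only items published since the last poll.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid") or item.findtext("link")
        if guid not in seen_guids:
            fresh.append((item.findtext("title"), guid))
            seen_guids.add(guid)
    return fresh
```

Polling the same feed twice with the same `seen_guids` set returns both items the first time and nothing the second time, which is the "incremental web" behavior the slide describes.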

Perpetual Beta

• Software delivered as a service, not a product

• Upgrades every day vs. every 2-3 years

• Operations and monitoring must become core competencies

• Scripting languages as duct tape

• Innovation in assembly

AJAX: Rich User Experiences

• Standards-based presentation using XHTML and CSS

• Dynamic display and interaction using the Document Object Model

• Data Interchange and manipulation using XML and XSLT

• Asynchronous data retrieval using XMLHttpRequest

• JavaScript binding everything together

• Without plugins!

Infoware

• Data management as core competency

• Web crawlers vs. specialized databases (“invisible web”)

• Map databases: starting with Mapquest, many services now license the same data from NavTeq (digital street maps) and Digital Globe (satellite images)

• Amazon licensed ISBN registry from Bowker but added publisher-supplied data and user annotations

• Mashups based on lightweight programming model create value-added data

Key issue: Who owns the data?

Information Sharing: Web 1.0

• The original purpose of the Web!

• Generally viewed as an information resource, download without upload

• Websites owned by “someone else” may store your information in a database – usually limited to basic identification (name, address, phone number, credit card) and “preferences”

• Personal websites (e.g., hosted by geocities) might be universally browse-able but visited by few

• Key issue: Who owns the data?

Information Sharing: Web 2.0

• Message boards with user-supplied content

• Portals with user-selected content “portlets”

• Blogs, wikis, news feeds, texting

• Social networking, collaborative filtering

• The Web as Platform, user-supplied applications

Key issue: Who owns the data?

The Right To Privacy

• Secrecy (confidentiality): The extent to which we are known to others

• Anonymity: The extent to which we are the subject of others’ attention

• Solitude: The extent to which others have access to us

Rights to Sue (wrt Privacy)

• Intrusion upon seclusion or solitude, or into private affairs

• Public disclosure of embarrassing private facts

• Inaccurate reporting: Publicity that places a person in a false light in the public eye

• Appropriation of identity: “identity theft”

A New Yorker cartoon from 1993

But in 2012, your browser (and its add-ons, plugins, etc.) knows:

• You’ve searched for local veterinarians and groomers

• You’ve read reviews comparing flea powders

• You’ve ordered “chew sticks” and “squeaky toys”

• You’ve printed coupons for Alpo

• You’ve downloaded 101 Dalmatians and Lassie “on demand” movies

• Your email contains sales notices from petco.com

• Your “My Pictures” folder contains 100s of images of fire hydrants and frisbees

Web Tracking

• Bits: How Do They Track You?

• Data collection events:
  – Pages displayed
  – Search queries entered
  – Videos played
  – Advertising displayed (both same party and third party)

• In December 2007 alone, Yahoo collected 400 billion events, AOL 100 billion, Google 91 billion, Microsoft 51 billion

From study by comScore published in NY Times online 3/9/08

Caveats

• Not all of this data is useful

• Not all of it is retained by the companies with access to it

• Much of it cannot be traced back to individuals

• Several data collection events may be triggered by a single Web page

• Augmented by user-volunteered data (website registration, public profiles, “like” buttons)

Fighting Back?

• Targeted advertising supports “free” services and content (ad serving was the first widely deployed mashup)

• Partially combated by blocking (e.g., TACO) and transparency (e.g., Open Data Partnership)

• But collected information can be used for other purposes…

• Need a general-purpose “No track” button

Privacy Before and After

• Before the Web, you participated in a variety of activities

• These might have involved groups of people, in public or private, possibly even “the press”

• Photos or recordings might have been taken, with or without your knowledge

• You might have borrowed or purchased books or magazines related to your activities

• You might have sent/received letters by snailmail

• What is different now? Does it matter?

Privacy Before and After

• Before the Web, you might have typed your name, address, phone number, birth date, social security number, bank account numbers, credit card numbers, etc. into your PC for personal storage

• It was unlikely anyone outside your household could access your PC

• Now you type at least part of that information into your PC all the time (if you make online purchases and/or sign up for online services)

• And you have no idea who might be reading it, either from your PC (if connected to the Internet) or from the Websites you sent it to

Privacy Before and After

• Your name, phone number, address were always easily available (phone book, reverse listings)

• So was your birth date, although harder to obtain (birth records, driver’s license)

• And your SSN - lots of forms ask for it

• Your checking account and/or credit card numbers were available through the issuing banks and the merchants where you made purchases

• So what is different now? Does it matter?

Web 2.0 Applications for Scientific Communities

• Scientists collaborating together in the same lab on the same project share:
  – Data: specimens, samples, materials, observations, etc.
  – Tools: instruments, software, hardware
  – Knowledge: open discussion, whiteboard

Real-world social networking

• However, there are time and space constraints

• More significantly, this model does not scale well to communities of scientists working on different projects but who could possibly learn from each other’s expertise, experience, etc.

CSCW Approaches

• CSCW (Computer-Supported Cooperative Work) aims to augment same-time/same-place collaboration but more significantly different-time/different-place collaborations and communities

• Current generation CSCW systems support data sharing (e.g., PNNL Collaboratories) and/or tool sharing (e.g., UIUC BioCoRE)

• However, these systems do not address knowledge sharing: how/when/where/why to use tools and data

Knowledge Sharing

• Knowledge sharing is partially enabled through labor-intensive static approaches: publications, email lists, wikis, chat, shared display, etc.

• We seek to enable automatic knowledge sharing - without requiring “extra work” on the part of scientists

Social Networking Metaphor

• Some online social networking is a form of CSCW that is potentially enjoyable and profitable but still requires “extra work”, with dynamism limited by explicit user participation
  – Facebook, LinkedIn, Twitter, etc.

• Other social networking automatically records what people do online to aggregate, data mine and disseminate in an enjoyable and profitable fashion, with no “extra work” required - but can be enhanced by very simple user actions (e.g., ratings)
  – Collaborative filtering – “people like you …”
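The "people like you" idea can be sketched as user-based collaborative filtering: find users whose ratings resemble yours, then suggest items they rated that you have not. The version below uses cosine similarity over raw ratings, one of several common choices; the user names, item names, and function names are hypothetical and chosen only for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Suggest items the user has not rated, weighted by similar users' ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"book_a": 5, "book_b": 4},
    "bob": {"book_a": 5, "book_b": 5, "book_c": 4},
    "cat": {"book_d": 5},
}
```

Here `recommend(ratings, "ann")` suggests `book_c`, because "bob" rated the same books similarly and also rated `book_c`. A user with no overlap with anyone, such as "cat", gets no suggestions, which is the cold-start limitation of this approach.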

genSpace Overview

• We combine implicit and explicit social networking concepts in our approach to knowledge sharing

• Prototype implemented as a set of plugins for geWorkbench, a platform for analysis and visualization tools for integrated genomics

• Records, aggregates, data mines and disseminates geWorkbench users’ activities with tools and tool sequences (workflows)

Questions genSpace Can Answer

• What do I do next?

• Which tools work well together?

• Where does this tool fit in a typical workflow?

• Who do I know who also uses this tool?

• How can I get help (from an expert who is online right now)?

genSpace Features

• Collaborative Workflow Composition: past history of analysis tool usage is used to identify commonly-occurring sequences/workflows

• Tool Suggestions: suggests analysis tools that may be useful, based on what tools were previously used

• Social Networking: allows users to associate with each other and share knowledge within groups

• Data Suggestions: suggest data sets based upon previous analyses and CF
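One simple way to picture the workflow and tool-suggestion features is to count how often one tool follows another in logged sessions and suggest the most frequent successors. The sketch below does exactly that; it is an illustrative assumption, not a description of how genSpace is actually implemented, and the tool names and function names are invented rather than real geWorkbench components.

```python
from collections import Counter, defaultdict


def build_transitions(workflows):
    """Count how often each tool is immediately followed by each other tool."""
    transitions = defaultdict(Counter)
    for workflow in workflows:
        for current_tool, next_tool in zip(workflow, workflow[1:]):
            transitions[current_tool][next_tool] += 1
    return transitions


def suggest_next(transitions, tool, top_n=2):
    """Return the tools most often used right after the given tool."""
    counts = transitions.get(tool, Counter())
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical activity logs: each list is one user's sequence of tools.
logged_workflows = [
    ["normalize", "cluster", "visualize"],
    ["normalize", "cluster", "annotate"],
    ["normalize", "filter", "cluster", "visualize"],
]
transitions = build_transitions(logged_workflows)
```

With these logs, `suggest_next(transitions, "normalize")` puts `cluster` ahead of `filter`, answering the "What do I do next?" question from aggregate behavior without any extra work by the users who generated the logs.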

genSpace Architecture

Privacy/Confidentiality Concerns

• Users can choose anonymous logging or disable it entirely

• Security/privacy of the activity logs is being investigated (data sets are NOT recorded*)

• Issues when users change their collaborative networks and/or opt out preferences

• Must we provide privacy by default?

Research in the Cloud

• geWorkbench and most other analysis tools are “fat” desktop applications

• Why not create a browser-based client?

More open questions for genSpace

• What other Web 2.0 concepts and techniques can help support scientific researchers?

• How can we efficiently address privacy concerns while providing helpful recommendations?

genSpace Summary

• genSpace embodies an approach to knowledge sharing that is based on social networking metaphors

• genSpace is built on the geWorkbench platform for integrated genomics

• Potentially applicable to other kinds of scientists and engineers, including software engineers

Web 2.0 Summary

• It’s here and everywhere, privacy/anonymity are losing ground

• Web-Oriented Architecture (Web Services, RSS, Mashups)

• Rich Internet Applications (AJAX, HTML5, Flash)

• Social Web (Facebook, Google+, LinkedIn, user participation in shopping/renting as well as review sites)

Web 3.0

Next Assignment #1: Presentation Proposal

• Due Tuesday March 6th, 10am

• Title and a brief 1-2 paragraph description of the planned content

• Presentation slots on the course schedule will be assigned asap after proposals are received (specify any scheduling constraints)

• Each presentation should be about 10 minutes and should consist of approximately 10 slides

• The target audience is the students in this class: do not assume any specialized knowledge beyond the scope of the initial course lectures but also do not duplicate any material covered in lectures (except a one-slide “review” is ok)

Next Assignment #2: Project Proposal

• Due Tuesday March 6th, 10am

• Three pages, not including figures and references (if any)

• Identify your full team (if any), with “management structure”

• Sketch the project you have in mind, including both the functionality or evaluation you aim to achieve and the technology you plan to use

• You should plan to do some programming and to produce some demoable software
