28 February 2012 Kaiser: COMS E6125 1
COMS E6125 Web-enHanced Information Management (WHIM)
Prof. Gail Kaiser
Spring 2012
Today’s Topics:
• What is Web 2.0?
• Information Sharing and Privacy
• Applications Beyond the Web
Tim O’Reilly, September 2005
Netscape vs. Google: The Web As Platform
• Netscape: free web browser as flagship to establish market for high-priced server products that push content to the “webtop” – but servers also turned out to be commodities
• Google: Native web application, never sold or packaged or ported, delivered as a service with no scheduled software releases, massively scalable - core competency is data management
Akamai vs. BitTorrent: Internet Decentralization
• Akamai: Treats network as platform at deeper level of stack, transparent caching and content delivery that eases bandwidth congestion – also limited by business model catering to large providers
• BitTorrent: P2P file fragment downloads, every client is also a server, the service automatically gets better the more people use it - architecture of participation
Harness Collective Intelligence
• Google PageRank using link structure
• eBay enabler of user activity requiring critical mass
• Amazon uses community activity to produce better search results (e.g., real-time “most popular” computation)
• Wikipedia – radical experiment in trust, profound change in content creation
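Ranking by link structure can be illustrated with a simplified PageRank power iteration. This is a minimal sketch, not Google's production algorithm: the four-page link graph, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web in which most pages link to "home".
example_links = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home", "about"],
    "shop": ["home"],
}
ranks = pagerank(example_links)
best_page = max(ranks, key=ranks.get)
```

The page that collects the most incoming links ends up with the highest rank, even though no page declares its own importance.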
Harness Collective Intelligence
• Web of connections grows organically
• Viral marketing – if a site or product relies on advertising to get the word out, it isn’t Web 2.0
• Peer-production open source development of much web infrastructure – Linux, Apache, MySQL, Perl, PHP, Python
• Network effects from user contributions are the key to market dominance
Blogosphere
• Blogging vs. personal home pages: replaced the personal diary, daily opinion column, and Usenet News; now being supplanted by Facebook and Twitter
• RSS (Really Simple Syndication) allows subscribing to a page – the incremental (or live) web
• Permalink builds bridges between weblogs, affects PageRank search results
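Subscribing to a page through RSS amounts to re-reading its feed and picking out the items not seen before. The sketch below shows that step using Python's standard XML parser. The feed content, the example.com URLs, and the `new_items` helper are hypothetical, and a real reader would fetch the feed over HTTP rather than use an inline string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; each item carries a permalink as its guid.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.com/2012/02/first-post</link>
      <guid isPermaLink="true">http://example.com/2012/02/first-post</guid>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2012/02/second-post</link>
      <guid isPermaLink="true">http://example.com/2012/02/second-post</guid>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (title, link) pairs for feed items whose guid is not in seen_guids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid not in seen_guids:
            fresh.append((item.findtext("title"), item.findtext("link")))
    return fresh


already_read = {"http://example.com/2012/02/first-post"}
unread = new_items(SAMPLE_FEED, already_read)
```

Because each item is identified by its permalink, the subscriber only has to remember which permalinks it has already shown.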
Perpetual Beta
• Software delivered as a service, not a product
• Upgrades every day vs. every 2-3 years
• Operations and monitoring must become core competencies
• Scripting languages as duct tape
• Innovation in assembly
AJAX: Rich User Experiences
• Standards-based presentation using XHTML and CSS
• Dynamic display and interaction using the Document Object Model
• Data Interchange and manipulation using XML and XSLT
• Asynchronous data retrieval using XMLHttpRequest
• JavaScript binding everything together
• Without plugins!
Infoware
• Data management as core competency
• Web crawlers vs. specialized databases (“invisible web”)
• Map databases: starting with MapQuest, many services now license the same data from NAVTEQ (digital street maps) and DigitalGlobe (satellite images)
• Amazon licensed ISBN registry from Bowker but added publisher-supplied data and user annotations
• Mashups based on lightweight programming model create value-added data
Key issue: Who owns the data?
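A mashup in the sense above can be as small as a join between a licensed data set and user-contributed data. The sketch below is only an illustration under assumed data: the catalog records, the placeholder ISBN keys, the star ratings, and the `mashup` function are hypothetical and do not describe how Amazon combines its sources.

```python
# Hypothetical licensed catalog records, keyed by a placeholder ISBN.
catalog = {
    "ISBN-A": {"title": "Example Book A", "publisher": "Example Press"},
    "ISBN-B": {"title": "Example Book B", "publisher": "Sample House"},
}

# Hypothetical user-contributed star ratings as (isbn, stars) pairs.
reviews = [("ISBN-A", 5), ("ISBN-A", 3)]


def mashup(catalog, reviews):
    """Attach review counts and average ratings to each catalog record."""
    ratings = {}
    for isbn, stars in reviews:
        ratings.setdefault(isbn, []).append(stars)
    combined = {}
    for isbn, record in catalog.items():
        stars = ratings.get(isbn, [])
        combined[isbn] = dict(
            record,
            review_count=len(stars),
            average_rating=(sum(stars) / len(stars)) if stars else None,
        )
    return combined


combined = mashup(catalog, reviews)
```

The combined records contain values that exist in neither source alone, which is where the ownership question becomes hard to answer.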
Information Sharing: Web 1.0
• The original purpose of the Web!
• Generally viewed as an information resource, download without upload
• Websites owned by “someone else” may store your information in a database – usually limited to basic identification (name, address, phone number, credit card) and “preferences”
• Personal websites (e.g., hosted by GeoCities) might be universally browse-able but visited by few
• Key issue: Who owns the data?
Information Sharing: Web 2.0
• Message boards with user-supplied content
• Portals with user-selected content “portlets”
• Blogs, wikis, news feeds, texting
• Social networking, collaborative filtering
• The Web as Platform, user-supplied applications
Key issue: Who owns the data?
The Right To Privacy
• Secrecy (confidentiality): The extent to which we are known to others
• Anonymity: The extent to which we are the subject of others’ attention
• Solitude: The extent to which others have access to us
Rights to Sue (wrt Privacy)
• Intrusion upon seclusion or solitude, or into private affairs
• Public disclosure of embarrassing private facts
• Inaccurate reporting: Publicity that places a person in a false light in the public eye
• Appropriation of identity: “identity theft”
A New Yorker cartoon from 1993
But in 2012, your browser (and its add-ons, plugins, etc.) knows:
• You’ve searched for local veterinarians and groomers
• You’ve read reviews comparing flea powders
• You’ve ordered “chew sticks” and “squeaky toys”
• You’ve printed coupons for Alpo
• You’ve downloaded 101 Dalmatians and Lassie “on demand” movies
• Your email contains sales notices from petco.com
• Your “My Pictures” folder contains 100s of images of fire hydrants and frisbees
Web Tracking
• Bits: How Do They Track You?
• Data collection events:
– Pages displayed
– Search queries entered
– Videos played
– Advertising displayed (both same party and third party)
• In December 2007 alone, Yahoo collected 400 billion events, AOL 100 billion, Google 91 billion, Microsoft 51 billion
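To put the December 2007 figures in proportion, the short calculation below totals the four reported counts and derives each company's share and average daily volume. The event counts are the ones quoted above. The variable names and the choice to divide by the 31 days of December are additions for illustration.

```python
# Data collection events in December 2007, in billions (figures quoted above).
events_billions = {"Yahoo": 400, "AOL": 100, "Google": 91, "Microsoft": 51}

# Combined volume across the four companies.
total_billions = sum(events_billions.values())

# Each company's share of that combined volume, as a percentage.
share_percent = {
    company: round(100 * count / total_billions, 1)
    for company, count in events_billions.items()
}

# Average events per day, in billions (December has 31 days).
per_day_billions = {
    company: round(count / 31, 1) for company, count in events_billions.items()
}
```

Together the four companies recorded 642 billion events in the month, with Yahoo accounting for roughly 62% of that total.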
From study by comScore published in NY Times online 3/9/08
Caveats
• Not all of this data is useful
• Not all of it is retained by the companies with access to it
• Much of it cannot be traced back to individuals
• Several data collection events may be triggered by a single Web page
• Augmented by user-volunteered data (website registration, public profiles, “like” buttons)
Fighting Back?
• Targeted advertising supports “free” services and content (ad serving was the first widely deployed mashup)
• Partially combated by blocking (e.g., TACO) and transparency (e.g., Open Data Partnership)
• But collected information can be used for other purposes…
• Need a general-purpose “No track” button
Privacy Before and After
• Before the Web, you participated in a variety of activities
• These might have involved groups of people, in public or private, possibly even “the press”
• Photos or recordings might have been taken, with or without your knowledge
• You might have borrowed or purchased books or magazines related to your activities
• You might have sent/received letters by snail mail
• What is different now? Does it matter?
Privacy Before and After
• Before the Web, you might have typed your name, address, phone number, birth date, social security number, bank account numbers, credit card numbers, etc. into your PC for personal storage
• It was unlikely anyone outside your household could access your PC
• Now you type at least part of that information into your PC all the time (if you make online purchases and/or sign up for online services)
• And you have no idea who might be reading it, either from your PC (if connected to the Internet) or from the Websites you sent it to
Privacy Before and After
• Your name, phone number, address were always easily available (phone book, reverse listings)
• So was your birth date, although harder to obtain (birth records, driver’s license)
• And your SSN - lots of forms ask for it
• Your checking account and/or credit card numbers were available through the issuing banks and the merchants where you made purchases
• So what is different now? Does it matter?
Web 2.0 Applications for Scientific Communities
• Scientists collaborating together in the same lab on the same project share:
– Data: specimens, samples, materials, observations, etc.
– Tools: instruments, software, hardware
– Knowledge: open discussion, whiteboard
Real-world social networking
• However, there are time and space constraints
• More significantly, this model does not scale well to communities of scientists working on different projects but who could possibly learn from each other’s expertise, experience, etc.
CSCW Approaches
• CSCW (Computer-Supported Cooperative Work) aims to augment same-time/same-place collaboration and, more significantly, different-time/different-place collaborations and communities
• Current generation CSCW systems support data sharing (e.g., PNNL Collaboratories) and/or tool sharing (e.g., UIUC BioCoRE)
• However, these systems do not address knowledge sharing: how/when/where/why to use tools and data
Knowledge Sharing
• Knowledge sharing is partially enabled through labor-intensive static approaches: publications, email lists, wikis, chat, shared display, etc.
• We seek to enable automatic knowledge sharing - without requiring “extra work” on the part of scientists
Social Networking Metaphor
• Some online social networking is a form of CSCW that is potentially enjoyable and profitable but still requires “extra work”, with dynamism limited by explicit user participation
– Facebook, LinkedIn, Twitter, etc.
• Other social networking automatically records what people do online to aggregate, data mine and disseminate in an enjoyable and profitable fashion, with no “extra work” required - but can be enhanced by very simple user actions (e.g., ratings)
– Collaborative filtering – “people like you …”
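The “people like you” idea can be sketched as user-based collaborative filtering: find users whose recorded activity overlaps with yours, then suggest what they used that you have not. This is a minimal sketch under assumptions. Jaccard overlap as the similarity measure, the user names, and the item histories are all hypothetical choices for illustration, not a description of any particular site's recommender.

```python
def jaccard(a, b):
    """Overlap between two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories, top_n=3):
    """Suggest items used by similar users that the target has not used yet."""
    target_items = histories[target]
    scores = {}
    for user, items in histories.items():
        if user == target:
            continue
        similarity = jaccard(target_items, items)
        # Each unseen item is credited with the similarity of the user who has it.
        for item in items - target_items:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Hypothetical activity recorded automatically, with no "extra work" by users.
histories = {
    "ann": {"flea powder", "chew sticks"},
    "bob": {"flea powder", "chew sticks", "squeaky toys"},
    "cara": {"cat litter", "scratching post"},
}
suggestions = recommend("ann", histories)
```

Here "ann" overlaps with "bob" but not with "cara", so only bob's extra item is suggested. Explicit ratings, where users provide them, could be used to weight the scores further.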
genSpace Overview
• We combine implicit and explicit social networking concepts in our approach to knowledge sharing
• Prototype implemented as a set of plugins for geWorkbench, a platform for analysis and visualization tools for integrated genomics
• Records, aggregates, data mines and disseminates geWorkbench users’ activities with tools and tool sequences (workflows)
Questions genSpace Can Answer
• What do I do next?
• Which tools work well together?
• Where does this tool fit in a typical workflow?
• Who do I know who also uses this tool?
• How can I get help (from an expert who is online right now)?
genSpace Features
• Collaborative Workflow Composition: past history of analysis tool usage is used to identify commonly-occurring sequences/workflows
• Tool Suggestions: suggests analysis tools that may be useful, based on what tools were previously used
• Social Networking: allows users to associate with each other and share knowledge within groups
• Data Suggestions: suggests data sets based upon previous analyses and collaborative filtering (CF)
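One simple way to turn logged tool usage into tool suggestions is to count which tool most often follows which in past workflows. The sketch below assumes that approach. It is not genSpace's actual implementation, and the tool names, the logged workflows, and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(workflows):
    """Count how often each tool is immediately followed by each other tool."""
    transitions = defaultdict(Counter)
    for workflow in workflows:
        for current_tool, next_tool in zip(workflow, workflow[1:]):
            transitions[current_tool][next_tool] += 1
    return transitions


def suggest_next(transitions, tool, top_n=2):
    """Return the tools that most often followed the given tool."""
    followers = transitions.get(tool, Counter())
    return [name for name, _count in followers.most_common(top_n)]


# Hypothetical logged workflows: each list is one user's sequence of tools.
logged_workflows = [
    ["normalize", "filter", "cluster"],
    ["normalize", "filter", "t_test"],
    ["normalize", "filter", "cluster"],
    ["normalize", "cluster"],
]
transitions = build_transitions(logged_workflows)
after_filter = suggest_next(transitions, "filter")
```

The same counts also answer "where does this tool fit in a typical workflow?", since frequent transitions chained together approximate the commonly occurring sequences.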
genSpace Architecture
Privacy/Confidentiality Concerns
• Users can choose anonymous logging or disable it entirely
• Security/privacy of the activity logs is being investigated (data sets are NOT recorded*)
• Issues when users change their collaborative networks and/or opt-out preferences
• Must we provide privacy by default?
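The logging choices above (identified, anonymous, or disabled) can be pictured as a per-user preference that is checked before any activity record is written. The sketch below is a hypothetical illustration only: the mode names, the record fields, and the `log_event` function are assumptions, not genSpace's actual design.

```python
def log_event(user, tool, mode):
    """Build an activity record according to the user's logging preference.

    mode is one of:
      "identified" - record the tool together with the user name
      "anonymous"  - record the tool but drop the user name
      "off"        - record nothing at all
    """
    if mode == "off":
        return None
    if mode == "anonymous":
        return {"user": None, "tool": tool}
    if mode == "identified":
        return {"user": user, "tool": tool}
    raise ValueError("unknown logging mode: " + repr(mode))


# Hypothetical user and tool names. Note that no data set is part of the record.
identified = log_event("ann", "cluster", "identified")
anonymous = log_event("ann", "cluster", "anonymous")
disabled = log_event("ann", "cluster", "off")
```

The mode is a required argument here, so the sketch does not decide what the default should be.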
Research in the Cloud
• geWorkbench and most other analysis tools are “fat” desktop applications
• Why not create a browser-based client?
More open questions for genSpace
• What other Web 2.0 concepts and techniques can help support scientific researchers?
• How can we efficiently address privacy concerns while providing helpful recommendations?
genSpace Summary
• genSpace embodies an approach to knowledge sharing that is based on social networking metaphors
• genSpace is built on the geWorkbench platform for integrated genomics
• Potentially applicable to other kinds of scientists and engineers, including software engineers
Web 2.0 Summary
• It’s here and everywhere, privacy/anonymity are losing ground
• Web-Oriented Architecture (Web Services, RSS, Mashups)
• Rich Internet Applications (AJAX, HTML5, Flash)
• Social Web (Facebook, Google+, LinkedIn, user participation in shopping/renting as well as review sites)
Web 3.0
Next Assignment #1: Presentation Proposal
• Due Tuesday March 6th, 10am
• Title and a brief 1-2 paragraph description of the planned content
• Presentation slots on the course schedule will be assigned asap after proposals are received (specify any scheduling constraints)
• Each presentation should be about 10 minutes and should consist of approximately 10 slides
• The target audience is the students in this class: do not assume any specialized knowledge beyond the scope of the initial course lectures but also do not duplicate any material covered in lectures (except a one-slide “review” is ok)
Next Assignment #2: Project Proposal
• Due Tuesday March 6th, 10am
• Three pages, not including figures and references (if any)
• Identify your full team (if any), with “management structure”
• Sketch the project you have in mind, including both the functionality or evaluation you aim to achieve and the technology you plan to use
• You should plan to do some programming and to produce some demoable software