Peter Fox
Xinformatics
ITEC 4962/6961, ERTH 4963/6963, CSCI 4960/6960
Week 1, January 25, 2011
Introduction to Xinformatics: Course Scope, Assessments
Contents
• Introductions
• Course Outline
• Application areas
• Logistics and resources
• Assessment and assignments
• Learning objectives, outcomes
• Introduction to Xinformatics
• Next class(es)
Introductions
• Name, major, year
• Interests, goals, outcomes
• Have you completed any *suggested* prerequisites?
  – Knowledge such as that gained in a Database class (e.g., CSCI-4380)
  – Knowledge such as that gained in a Data Structures class (e.g., CSCI-1200)
  – Knowledge such as that gained in a Data Science class (e.g., ITEC/CSCI/ERTH 6961-01)
• Questions
Course Outline (tentative)
• Introduction to Informatics
• Capturing the problem: use case development and requirement analysis
• State of the art: informatics applications
• Information theory, models, tools
• Foundations: semiotics, library, cognitive and social science
• Information life cycle
• Information architectures (Internet, Web, Grid, Cloud)
• Information visualization, information and workflow management
• Information discovery, information integration
• Class exercises, presentations** along the way
Application Areas
• Geoinformatics
• Astroinformatics
• Cheminformatics
• Bioinformatics
• Helioinformatics
• Healthinformatics
• Ecoinformatics
• Nursing informatics
• and the list goes on, and on
Logistics
• Class: ITEC 4962/6961, ERTH 4963/6963, CSCI 4960/6960
• Hours: 9am-11:50am Tuesdays
• Location: SAGE 2707
• Instructor: Peter Fox - [email protected] or [email protected], x4862
• Contact hours: Mondays 3pm-4pm (or by appt)
• Contact location: Winslow 2120 or JRSC 2C04
• TA: Stephanie Ardizzone ([email protected])
• Web: http://tw.rpi.edu/web/Courses/Xinformatics/2011 (schedule, syllabus, reading, assignments, etc.)
Assessment and Assignments
• Via written assignments, with the specific percentage of the grade allocation provided with each assignment
• Via individual oral presentations, with the specific percentage of the grade allocation provided
• Via group presentations, depending on class size
• Via participation in class (not to exceed 10% of the total): this works by 'losing' points by not participating
• Late submission policy: the first time, with a valid reason, there is no penalty; otherwise 20% of the score is deducted for each late day
Assessment and Assignments
• Reading assignments
  – Are given almost every week
  – Most are background and informational
  – Some are key to completing assignments
  – Some are relevant to the current week's class (i.e., follow-up reading)
  – Others are relevant to the following week's class (i.e., pre-reading)
  – Undergraduates will not be tested on them, but we will often discuss them in class, and participation in these discussions is taken into account
  – Graduates are likely to be tested on them as part of assignments, i.e., an extra question
• You will progress from individual work to group work
Objectives
• To instruct future information architects how to sustainably generate information models, designs and architectures
• To instruct future technologists how to understand and support the essential data and information needs of a wide variety of producers and consumers
• For both to know the tools and requirements needed to properly handle data and information
• Students will learn, and be evaluated on, the underpinnings of informatics, including theoretical methods, technologies and best practices
Learning Objectives
• Through class lectures, practical sessions, written and oral presentation assignments and projects, students should:
  – Understand and develop skill in the Development and Management of multi-skilled teams in the application of Informatics
  – Understand and know how to develop Conceptual and Information Models and explain them to non-experts
  – Gain knowledge of, and experience applying, Informatics Standards
  – Develop skill in Informatics Tool Use and Evaluation
Academic Integrity
• Student-teacher relationships are built on trust. For example, students must trust that teachers have made appropriate decisions about the structure and content of the courses they teach, and teachers must trust that the assignments that students turn in are their own. Acts that violate this trust undermine the educational process. The Rensselaer Handbook of Student Rights and Responsibilities defines various forms of Academic Dishonesty, and you should make yourself familiar with these. In this class, all assignments that are turned in for a grade must represent the student's own work. In cases where help was received, or teamwork was allowed, a notation on the assignment should indicate your collaboration. Submission of any assignment that is in violation of this policy will result in a penalty. If found in violation of the academic dishonesty policy, students may be subject to two types of penalties. The instructor administers an academic (grade) penalty, and the student may also enter the Institute judicial process and be subject to such additional sanctions as warning, probation, suspension, expulsion, and alternative actions as defined in the current Handbook of Student Rights and Responsibilities. If you have any question concerning this policy before submitting an assignment, please ask for clarification.
Questions so far?
Introduction to Informatics
• E.g., Bioinformatics
– Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, led to an absolute requirement for computerized databases to store, organize, and index the data and for specialized tools to view and analyze the data.
– http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
Tell us more…
• Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.
• The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.
• At the beginning of the "genomic revolution", a bioinformatics concern was the creation and maintenance of a database to store biological information, such as nucleotide and amino acid sequences.
• Development of this type of database involved not only design issues but the development of complex interfaces whereby researchers could both access existing data as well as submit new or revised data.
And…
• Ultimately, however, all of this information must be combined to form a comprehensive picture of normal cellular activities so that researchers may study how these activities are altered in different disease states.
• Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures.
And…
• The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:
  – the development and implementation of tools that enable efficient access to, and use and management of, various types of information
– the development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences
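In practice, gene finding relies on statistical models that go far beyond string matching. As a much-simplified, hypothetical illustration of the "locate something within a sequence" task mentioned above, the Python sketch below reports every position where a short motif occurs exactly in a nucleotide sequence. The function name `find_motif` and the example sequences are made up for illustration.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start of every exact (possibly overlapping) match of motif."""
    if not motif:
        raise ValueError("motif must be non-empty")
    # Compare case-insensitively, since sequences are often stored in either case.
    sequence, motif = sequence.upper(), motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping matches are also reported.
        start = sequence.find(motif, start + 1)
    return positions


print(find_motif("ATGCGATGATG", "ATG"))  # → [0, 5, 8]
```

Exact matching is the simplest possible case. Real tools also have to handle mismatches, gaps and both DNA strands, which is where the "new algorithms and statistics" above come in.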
One result – myexperiment.org
Definitions
• Data - are pieces of information that represent the qualitative or quantitative attributes of a variable or set of variables.
• Data (plural of "datum", which is seldom used) - are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables.
• Data - are often viewed as the lowest level of abstraction from which information and knowledge are derived
Definitions ctd.
• Information
  – Representations (of facts? data?) in a form that lends itself to human use
  – The word information derives from the Latin informare (in + formare), meaning to give form, shape, or character to. To inform is therefore to be the formative principle of, or to imbue with some specific character or quality.
• Knowledge
  – Check out Wikipedia for its meaning…
Definitions ctd.
• Metadata – data about data
• Metainformation – information about information
• Documentation – integrated collection of information and metadata intended to support all aspects of data (find, access, use…)
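Here is a minimal Python sketch that makes the data/metadata distinction concrete. It assumes a hypothetical in-memory catalog: the bare values are the data, and a small dictionary of descriptive fields is the metadata that lets someone find and correctly use them. The class `DatasetRecord`, the function `find`, and all field names and numbers are illustrative assumptions, not any standard metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    # Data: bare values; on their own they say nothing about what was measured.
    values: list[float]
    # Metadata: data about the data. The keys used below are illustrative only.
    metadata: dict[str, str] = field(default_factory=dict)


def find(catalog: list[DatasetRecord], **criteria: str) -> list[DatasetRecord]:
    """Return the records whose metadata match every key=value criterion."""
    return [
        record
        for record in catalog
        if all(record.metadata.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical catalog: the numbers and descriptions are made up for illustration.
catalog = [
    DatasetRecord([21.4, 22.0, 20.9],
                  {"variable": "air_temperature", "units": "degC", "station": "A"}),
    DatasetRecord([1012.5, 1011.8],
                  {"variable": "air_pressure", "units": "hPa", "station": "A"}),
]

matches = find(catalog, variable="air_temperature")
print(matches[0].metadata["units"], matches[0].values)  # prints: degC [21.4, 22.0, 20.9]
```

Without the `units` entry, the values 21.4 or 1012.5 could not be interpreted. This is the sense in which metadata and documentation support finding, accessing and using data.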
Full life cycle of data
[Figure: full life cycle of data]
The Information Era: Interoperability
Modern information and communications technologies are creating an "interoperable" information era in which ready access to data and information can be truly universal.
Open access to data and services enables us to meet the new challenges of understanding complex systems:
• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries
Fox CI and X-informatics - CSIG 2008, Aug 11
Shifting the Burden from the User to the Provider
20 April 2023 © GEO Secretariat
Earth is a complex system of systems
Data is required from multiple observation networks . . . and systems . . .
Local in-situ networks and systems: air pollution measurement station, Emden, Germany; local and national air pollution networks, Venice, Italy, and Indonesia
Other forms of information
Information explosion
• Devices are everywhere, but … by 2020
And, gulp, unstructured
The key is:
• As volume, complexity and heterogeneity increase…
  – Suddenly information may look more like a continuum
  – Known methods and algorithms will not scale (except for very simple operations)
  – And because it is information, humans are part of the loop
• Thus, we need to understand and apply the theoretical foundations
• Problem: all of these to date were developed in an analog world, not a digital one!!
Mind the gap
• As capabilities and needs grow on both sides (science/medicine/engineering, and technology), there is/was still a gap between science and the underlying infrastructure and technology that is available
• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
Informatics (information science) includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Wikipedia)
But really it’s not just one field
[Figure: layered diagram with the labels IT Cyberinfrastructure (CI), Cyberinformatics, Core informatics, Science informatics, Science (benefit to others), and an overall label, Informatics]
• CI = discipline neutral, e.g., web server, database, wiki
• Cyberinformatics = mapping to discipline-neutral aspects
• Core informatics = reasoning engine, semantics, computer science
• Science (X) informatics = use cases, science domain terms, concepts in an ontology or controlled vocabulary
A moment of history
• In the late 1950s (actually around 1957-1958, or 1962, depending on what you read), the modern term informatics was coined
• The term existed for a while, but the field then split into library science and computer science, which developed as their own fields and became disconnected
• Now it is coming back to be relevant to science
• Informatics IS NOT just having a scientist work with an "IT/ICT" person (NOT, NOT, NOT)
Cyberinformatics
• The first match between the domain and the underlying domain-neutral e-infrastructure/cyberinfrastructure
• When the underlying infrastructure changes (once it becomes real infrastructure and not just software), this is one part that needs to change
• Less brittle, since the upper layers remain intact
Core informatics
• The realm of computer science (for the most part, also librarians)
• Strongly influenced by science (and engineering and medical applications) above and below this layer
• If we can leverage this, we do not need to do the specialist work, however …
• We must work with these scientists, sustainably
Science Informatics
• Where science meets the underlying technical capabilities and methods
• Must be expressible in science terms; increasingly use cases
• The people in this area are multi-lingual, and both interdisciplinary and multi-disciplinary; few are trained or literate here
• Team, or really a community of practice (CoP)
BORROMEAN RINGS
Three interlinked circles that represent inseparable parts of the whole. Remove any one ring and the other two fall apart. Because of this property, Borromean Rings have been used as a symbol of unity in many fields.
THE PHYSICS OF INFORMATION
• Information has three indivisible ingredients: content, context and structure.
• The ability to automatically utilize the inherent structure of information is the threshold in information management from hardcopy to digital media.
© 2005 EvREsearch LTD
Not a perfect story
• Many authors criticize the use of the term entropy, and of "physics of information"
• Information conservation, diffusion, viscosity, advection, dissipation… all sort of make some sense
• Units are a big part of it (question: what are the possible units?), and what are the non-dimensional numbers?
• However, the idea is very relevant to modeling, design and architecture
• We'll revisit the components of the physics of information
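One setting where "entropy" has a precise, widely accepted meaning is Shannon's information theory. For symbol probabilities p_i, H = -sum_i p_i log_b(p_i), and the unit depends on the logarithm base b: bits for b = 2, nats for b = e. The Python sketch below estimates H from observed symbol frequencies. It is offered only as a point of comparison for the debate above, not as the "physics of information" framework these slides refer to, and the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(symbols, base: float = 2.0) -> float:
    """Empirical Shannon entropy H = sum(p * log_base(1/p)) over observed symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Writing the term as p * log(1/p) keeps every contribution non-negative.
    return sum((n / total) * math.log(total / n, base) for n in counts.values())


print(shannon_entropy("ABAB"))                # 1.0 (bits)
print(shannon_entropy("AAAA"))                # 0.0: a constant message carries no surprise
print(shannon_entropy("ABAB", base=math.e))   # about 0.693 (nats)
```

The same sequence has an entropy of 1 bit or about 0.693 nats. Only the unit changes, which illustrates why units deserve careful attention in any "physics" of information.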
Information theory
• Semiotics, also called semiotic studies or semiology, is the study of sign processes (semiosis), or signification and communication, signs and symbols. It is divided into three branches:
  – Syntactics: relation of signs to each other in formal structures
  – Semantics: relation between signs and the things to which they refer; their denotata
  – Pragmatics: relation of signs to their impacts on those who use them
Library science
• Curates the artifacts of knowledge but, increasingly, (yes) information
• Organizes and manages them for consumers
  – Cataloging and classification
• Preservation
  – 'maintaining or restoring access to artifacts, documents and records through the study, diagnosis, treatment and prevention of decay and damage' (Wikipedia)
• Digital age
  – Curation and preservation
[Figure: History of Information Thresholds / Information Eras. Information volume against time (years before present, 0 to 6000, plus the future), spanning the stone, clay, papyrus, paper and digital eras, with information transport and information integration labelled. © 2005 EvREsearch LTD]
Social Science
• Branch of humanities
• Especially as it relates to networks of scientists
• Exploits the sociology of groups and teams
• Cultural norms as well as discipline norms
  – Modes of what rewards are given, and how
  – Between those who produce and those who consume data and information
  – How you collect, understand, model and design models and architectures is as much a social skill as a technical one
Cognitive Science
• Cognitive science is an interdisciplinary study of the mind and intelligence
• It operates at the intersection of psychology, philosophy, computer science, linguistics, anthropology, and neuroscience
• Of relevance for data and information science are three significant theoretical underpinnings:
  – mental representation
  – the nature of expertise
  – intuition
• Very relevant to models, modeling, and metamodel choice
Use Case
• … is a collection of possible sequences of interactions between the system under discussion and its actors, relating to a particular goal.
• The collection of use cases should define all system behavior relevant to the actors, to assure them that their goals will be carried out properly.
• Any system behavior that is irrelevant to the actors should not be included in the use cases.
• A use case:
  – is a prose description of a system's behavior when interacting with the outside world
  – is a technique for capturing functional requirements of business systems and, potentially, of an IT system to support the business system
  – can also capture non-functional requirements
Use Case
• Must be documented (or it is useless)
• Should be implemented (or it is not well scoped)
• Is used to identify: objects ~ resources, processes, roles (aka actors), requirements, etc.
• Scopes and guides what is implemented
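A use case is normally written as prose, but recording it in a structured form makes some of the points above checkable. The sketch below is a hypothetical Python template. The `UseCase` class, its fields and the "Actor: action" convention for steps are assumptions for illustration, not a standard template. It flags step subjects that were never declared as actors, which is one small way a use case helps identify roles.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    """A minimal, hypothetical use-case template (real templates carry many more fields)."""
    name: str
    goal: str
    actors: list[str]
    # Main sequence of interactions, written as "Subject: action" (an assumed convention).
    steps: list[str]
    # Requirements surfaced by the use case, functional or non-functional.
    requirements: list[str] = field(default_factory=list)

    def is_documented(self) -> bool:
        # Treat a use case as documented only if it states a goal, actors and steps.
        return bool(self.goal and self.actors and self.steps)

    def undeclared_actors(self) -> set[str]:
        """Step subjects (text before ':') that are neither declared actors nor 'System'."""
        subjects = {step.split(":", 1)[0].strip() for step in self.steps if ":" in step}
        return subjects - set(self.actors) - {"System"}


uc = UseCase(
    name="Find a dataset",
    goal="A scientist locates and downloads a dataset for analysis",
    actors=["Scientist"],
    steps=[
        "Scientist: enters search terms",
        "System: returns matching datasets with their metadata",
        "Scientist: selects a dataset and requests download",
        "Data provider: authorizes access",
    ],
    requirements=["Search results are returned within a few seconds"],
)
print(uc.is_documented(), uc.undeclared_actors())  # True {'Data provider'}
```

In this example, the check surfaces "Data provider" as a role the use case relies on but has not yet declared.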
Preview of Information Models
• Conceptual models, sometimes called domain models, are typically used to explore domain concepts
• High-level conceptual models are often created as part of initial requirements envisioning efforts, as they are used to explore the high-level static business, science or medicine structures and concepts
• Conceptual models are often created as the precursor to logical models or as alternatives to them
• Followed by logical and physical models
Object models
• A data model is a logical organization of the real-world objects (entities), the constraints on them, and the relationships among objects.
  – A database (DB) language is a concrete syntax for an object (data) model.
  – A DB system implements that model.
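As a hedged illustration of the distinction between data model, DB language and DB system, the sketch below uses Python's built-in `sqlite3` module. The SQL DDL plays the part of the concrete syntax, and SQLite is the system that implements and enforces the model. The two-table station/observation schema is a hypothetical example chosen for brevity.

```python
import sqlite3

# Hypothetical two-entity model: stations and the observations made at them.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

# The DDL is the concrete syntax for the data model: entities (tables),
# constraints (NOT NULL, UNIQUE, CHECK) and a relationship (REFERENCES).
conn.executescript("""
CREATE TABLE station (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE observation (
    id         INTEGER PRIMARY KEY,
    station_id INTEGER NOT NULL REFERENCES station(id),
    variable   TEXT NOT NULL,
    value      REAL NOT NULL,
    units      TEXT NOT NULL CHECK (units <> '')
);
""")

conn.execute("INSERT INTO station (id, name) VALUES (1, 'Station A')")
conn.execute(
    "INSERT INTO observation (station_id, variable, value, units) VALUES (?, ?, ?, ?)",
    (1, "air_temperature", 21.4, "degC"),
)

# The DB system enforces the model: an observation of a non-existent station is rejected.
rejected = False
try:
    conn.execute(
        "INSERT INTO observation (station_id, variable, value, units) VALUES (?, ?, ?, ?)",
        (99, "air_temperature", 20.0, "degC"),
    )
except sqlite3.IntegrityError:
    rejected = True

# Following the relationship: join each observation to the station it belongs to.
rows = conn.execute(
    """
    SELECT s.name, o.variable, o.value, o.units
    FROM observation AS o JOIN station AS s ON s.id = o.station_id
    """
).fetchall()
print(rejected, rows)  # True [('Station A', 'air_temperature', 21.4, 'degC')]
```

The same conceptual model could be expressed in a different DB language and implemented by a different system. The model (entities, constraints, relationships) is what stays constant.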
Architectures
• Building on content, context, and users, some illustrate information architecture as an iceberg.
• Just like an iceberg, the majority of information architecture work is out of sight, "below the water."
• The work includes the creation of plans, controlled vocabularies, and blueprints, all before any user interfaces are created.
Above the water and below
• Design, design, design
• Of the interfaces and the architecture, and of the social, cognitive, etc. elements of information 'systems'
• Almost all are designed to support two basic modes of investigation: induction and deduction… but enough of that for now
Information life-cycle
Visualization
Workflow Management
Discovery, Integration
• Discovery (mostly about libraries!)
  – Digital Fluencies
  – Federated Search
  – Folksonomies
  – Information Literacy
  – Intelligent Agents
  – Search Engines
  – Taxonomies
• Integration (mostly about application tools)
Discussion
• About informatics?
• Definitions?
• Applications?
• Components?
• Theory (we'll start on this soon)
Skills needed
• Modeling, theory, architecture experience?
  – Nah, we'll cover that
• Literacy with computers and applications that can handle information?
  – Yep
• Ability to access the internet and retrieve/acquire data?
  – Oh yeah
• Presentation of assignments?
  – Ditto
What is expected
• Attend class, complete assignments (esp. reading)
• Participate
• Ask questions
• Work both individually and in a group
• Work constructively in group and class sessions
• Next classes: Feb 1 and 8 …
Also on the web
• Reading assignments are intended to prepare you for the following lectures and may be considered material for the written assignments or the project
• Assignments will be posted there
  – Individual
  – Group
• Stephanie is your first contact for assignment questions
What is next
• Next week: use cases, etc.
• Week after: some guest presentations
  – Bioinformatics
  – Astroinformatics
  – Geoinformatics (tbc)
• Reading