Transcript of Mining and Supporting Community Structures in Sensor Network Research
- Mining and supporting community structures in sensor network
research Alberto Pepe (University of California at Los Angeles)
Marko A. Rodriguez (Los Alamos National Laboratory) CENS Friday
Seminar | May 2, 2008
- Outline.
- Studying Collaboration at CENS (Alberto)
  - Introduction to Data Practices
  - Detection of Structural Communities
- Supporting Collaboration at CENS (Marko)
  - Introduction to the Semantic Web
  - Semantic Networks and Graph Databases
  - Analyzing Semantic Networks
- Data practices group.
- Background research questions:
  - What context data is necessary to support interpretation during
    re-use?
  - How can we automate the capture of context data?
  - How can we link scholarly and scientific data into meaningful
    aggregations/chains?
  - What are the social and academic settings that yield the
    production of scientific and engineering data/knowledge?
- Current study.
- Question: how do collaboration communities differ from
socioacademic communities?
- Method: comparative analysis of the coauthorship network's community
structure and selected socioacademic community structures (e.g.
academic department, affiliation, country of origin, academic
position).
Rodriguez, M.A., Pepe, A., On the relationship between the
structural and socioacademic communities of a coauthorship network,
Journal of Informetrics, in press, 2008.
- Steps of the study.
- Gather bibliographic and socioacademic data.
- Generate coauthorship network.
- Determine structural communities in the coauthorship
network.
- Test for statistical independence between the structural and
socioacademic communities.
- Gather data.
- Collected from eScholarship repository
- 291 CENS and non-CENS authors
- Multi-institutional and interdisciplinary
- 560 manuscripts (379 conference papers, 163 journal articles)
- Published over a ten-year period (1998-2007)
- Gathered academic department, academic affiliation, country of
origin, and academic position
- Generate coauthorship network.
- Example bibliographic record:
  author={Marko A. Rodriguez and Alberto Pepe},
  title={On the relationship},
  journal={Journal of Informetrics},
- Resulting edge: Alberto coauthor Marko
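The record-to-edge step above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `records` list, the placeholder "Author C", and the choice to weight each edge by the number of shared papers are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def coauthorship_network(records):
    """Map each unordered author pair to the number of papers they share."""
    edges = Counter()
    for authors in records:
        # Deduplicate and sort so every pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical author lists, one per manuscript.
records = [
    ["Marko A. Rodriguez", "Alberto Pepe"],
    ["Alberto Pepe", "Author C", "Marko A. Rodriguez"],
    ["Author C"],  # single-author papers add no edges
]
network = coauthorship_network(records)
```

Each key of `network` is an undirected coauthor edge; an unweighted network is obtained by keeping only the keys.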
- CENS population statistics. Socioacademic communities
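- Study model. Each author vertex carries socioacademic attributes;
coauthor edges link authors.
  - Alberto: Affiliation: UCLA, Department: IS, Origin: Italy,
    Position: PhD Student
  - Marko: Affiliation: LANL, Department: CS, Origin: USA,
    Position: PostDoc
  - Alberto coauthor Marko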
- Structural communities.
- Structural communities are cliquish subgraphs: groups of vertices
that are densely connected to one another but poorly connected to
other vertices.
Girvan, M., & Newman, M. E. J., Community structure in social
and biological networks. Proceedings of the National Academy of
Sciences, 99, 7821, 2002.
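To make the definition concrete, the sketch below counts, for a given assignment of vertices to communities, how many edges stay inside each community and how many leave it. The toy edge list and the `membership` mapping are hypothetical, and this is a diagnostic for an existing partition, not one of the detection algorithms used in the study.

```python
def community_edge_counts(edges, membership):
    """Count internal and boundary-crossing edges for every community."""
    counts = {c: {"internal": 0, "external": 0} for c in set(membership.values())}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            counts[cu]["internal"] += 1
        else:
            # A crossing edge is external to both communities it touches.
            counts[cu]["external"] += 1
            counts[cv]["external"] += 1
    return counts


# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
membership = {"a": "A", "b": "A", "c": "A", "d": "B", "e": "B", "f": "B"}
counts = community_edge_counts(edges, membership)
```

A partition matches the definition well when the internal counts are high relative to the external ones.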
- Community detection methods.
- edge betweenness [1]
- walktrap (random walks) [2]
- spinglass [3]
- leading eigenvector [4]
[1] Girvan, M., & Newman, M. E. J., Community structure in social
and biological networks, Proceedings of the National Academy of
Sciences, 99:7821, 2002.
[2] Pons, P., & Latapy, M., Computing communities in large networks
using random walks, Journal of Graph Algorithms and Applications,
10:2, 2006.
[3] Reichardt, J., & Bornholdt, S., Statistical mechanics of
community detection, Physical Review E, 74 (016110), 2006.
[4] Newman, M. E. J., Finding community structure in networks using
the eigenvectors of matrices, Physical Review E, 74, 2006.
- Coauthorship network map. 27 structural communities detected in the
CENS coauthorship network (LEV).
- Coauthorship network statistics.
- Typical clustering coefficients:
- less-cliquish, sparse collaboration patterns
- CENS community fragmented in research agenda
- Newman, M. E. J., The structure and function of complex
networks, SIAM Review, 45, 167, 2003.
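For readers unfamiliar with the clustering coefficient, here is a minimal sketch of one common definition, the mean of the local clustering coefficients. The slides do not say which definition was used, so that choice, the convention of scoring low-degree vertices as 0, and the toy `adjacency` map are all assumptions.

```python
from itertools import combinations


def average_clustering(adjacency):
    """Mean local clustering coefficient; vertices of degree < 2 score 0."""
    total = 0.0
    for vertex, neighbours in adjacency.items():
        k = len(neighbours)
        if k < 2:
            continue
        # Count the pairs of neighbours that are themselves connected.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adjacency)


# A triangle (a, b, c) with one pendant vertex d attached to c.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cc = average_clustering(adjacency)
```

Here a and b score 1, c scores 1/3, and d scores 0, so the mean is 7/12.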
- Chi square test.
- Chi square test determines whether two nominal/categorical
properties are statistically independent.
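- Example: each author vertex is labeled with its structural
community and its socioacademic attributes.
  - Alberto: Community: A, Affiliation: UCLA, Department: IS,
    Origin: Italy, Position: PhD Student
  - Marko: Community: B, Affiliation: LANL, Department: CS,
    Origin: USA, Position: PostDoc
  - Alberto coauthor Marko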
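The test statistic itself is simple to compute from a contingency table that cross-tabulates structural community against one socioacademic property. The sketch below uses invented counts, not the CENS data, and returns only the statistic and the degrees of freedom; it assumes every row and column total is nonzero.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two properties were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are communities A and B, columns are UCLA and LANL.
table = [[20, 5],
         [4, 21]]
stat, dof = chi_square_independence(table)
```

With one degree of freedom the 0.05 critical value is about 3.841, so a statistic this far above it would reject independence for these invented counts.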
- Chi square analysis. N.B. a p-value greater than 0.05 is taken to
mean that the two properties are statistically independent.
Algorithms: leading eigenvector (LEV), walktrap (WT), edge
betweenness (EB), spinglass (SG).
- Anecdotal example.
- Remarks.
- Community structure is representative of department and
affiliation.
- Academic position and country of origin are independent of the
structural community of the scholar.
- Policy recommendations to increase interdisciplinarity
- Extension to other coauthorship networks and other socioacademic
(demographic) variables
- Useful to predict or infer topological/socioacademic
configuration when data is scarce
- Metadata reuse.
- Metadata can be used to support scholarly collaboration.
- Everything is metadata. Figure: a semantic network with vertices
Borgman, Pepe, Article1, Article2, JCDL, Italy, UCLA, CENS, Sensor
Networks and edges writtenBy, member, country, attended, hasLab,
cites, topic, researches, contains.
- Introduction to the Semantic Web.
- The World Wide Web is used to link documents, where documents
are given universal identifiers/locators called URIs (e.g.
URL).
  - The structure is machine processable, but the
    documents/elements are primarily human processable.
- The Semantic Web is used to link data, where data is given
universal identifiers/locators called URIs (e.g. URL).
  - The structure and the data are both human and machine
    processable.
T. Berners-Lee, J. Hendler. Publishing on the Semantic Web. Nature,
410(6832):1023-1024, April 2001.
- The Uniform Resource Identifier.
- Anything that can be identified.
- The Uniform Resource Identifier (URI):
  - urn:uuid:550e8400-e29b-41d4-a716-446655440000
  - http://www.lanl.gov#MarkoRodriguez
    - prefix it to make it easier on the eyes: lanl:MarkoRodriguez
- first identify it, then relate it!
W3C/IETF. URIs, URLs, and URNs: Clarifications and recommendations
1.0, September 2001.
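Prefixing is plain string substitution between a short name and a namespace URI. A minimal sketch, assuming the `lanl` prefix stands for `http://www.lanl.gov#` as the slide suggests; the `PREFIXES` table and the `expand` and `compact` helpers are hypothetical names for illustration.

```python
PREFIXES = {"lanl": "http://www.lanl.gov#"}


def expand(qname, prefixes=PREFIXES):
    """Turn a prefixed name such as lanl:MarkoRodriguez into a full URI."""
    prefix, local = qname.split(":", 1)
    return prefixes[prefix] + local


def compact(uri, prefixes=PREFIXES):
    """Shorten a URI with a known prefix; return it unchanged otherwise."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return prefix + ":" + uri[len(namespace):]
    return uri


full = expand("lanl:MarkoRodriguez")
short = compact("http://www.lanl.gov#MarkoRodriguez")
untouched = compact("urn:uuid:550e8400-e29b-41d4-a716-446655440000")
```

The prefix is only a reading aid: both forms identify the same resource.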
- The undirected network.
- There is the undirected network of common knowledge.
  - Sometimes called an undirected single-relational network.
  - e.g. vertex i and vertex j are related.
- The semantic of the edge denotes the network type.
  - e.g. friendship network, collaboration network, etc.
- Figure: vertices i and j joined by an undirected edge.
- Example undirected network. Figure: vertices Herbert, Marko, Aric,
Ed, Zhiwu, Alberto, Jen, Johan, Luda, Stephan, Whenzong.
- The directed network.
- Then there is the directed network of common knowledge.
  - Sometimes called a directed single-relational network.
  - For example, vertex i is related to vertex j, but j is not
    related to i.
- Figure: a directed edge from vertex i to vertex j.
- Example directed network. Figure: vertices Muskrat, Bear, Fish,
Fox, Meerkat, Lion, Human, Wolf, Deer, Beetle, Hyena.
- The semantic network.
- Finally, there is the semantic network.
  - Sometimes called a directed multi-relational network.
  - For example, vertex i is related to vertex j by the semantic s,
    but j is not related to i by the semantic s.
- Figure: a directed edge labeled s from vertex i to vertex j.
- Example semantic network. Figure: vertices SantaFe, Marko,
NewMexico, Ryan, California, UnitedStates, LANL, Cells, Atoms,
Oregon, Arnold and edges livesIn, worksWith, cityOf, originallyFrom,
stateOf, locatedIn, hasLab, madeOf, researches, southOf, northOf,
hasResident, governerOf.
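A directed multi-relational network can be held as a set of (i, s, j) triples. In the sketch below the vertex and edge labels come from the example slide, but which vertices each edge actually connects is an assumption, since the figure's layout is not recoverable from the transcript.

```python
# Assumed pairings of the slide's labels, for illustration only.
triples = {
    ("Marko", "livesIn", "SantaFe"),
    ("SantaFe", "cityOf", "NewMexico"),
    ("NewMexico", "stateOf", "UnitedStates"),
    ("Marko", "worksWith", "Ryan"),
}


def related(i, s, j, graph=triples):
    """True only if i is related to j by the semantic s, in that direction."""
    return (i, s, j) in graph


def out_edges(i, graph=triples):
    """All (semantic, target) pairs leaving vertex i, sorted for stable output."""
    return sorted((s, j) for (subject, s, j) in graph if subject == i)


marko_edges = out_edges("Marko")
```

Both the direction and the label matter: `related("SantaFe", "livesIn", "Marko")` and `related("Marko", "worksWith", "SantaFe")` are both false.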
- The technologies of the Semantic Web.
- Resource Description Framework (RDF): The foundation technology
of the Semantic Web. RDF is a distributed, semantic network data
model. In RDF, URIs and literals (e.g. ints, doubles, strings) are
related to one another in triples.
- RDF Schema (RDFS) and the Web Ontology Language (OWL): The
ontology is to the Semantic Web as the schema is to the relational
database.
  - Anything of rdf:type lanl:Human can lanl:drive anything of
    rdf:type lanl:Car.
- Triple-store: The triple-store is to semantic networks what
the relational database is to the data table.
  - a.k.a. semantic repository, graph database, RDF database.
- RDF and RDFS. Figure: an ontology layer and an instance layer, with
vertices lanl:marko, lanl:cookie, lanl:Human, lanl:Food and edges
lanl:isEating, rdf:type, rdfs:domain, rdfs:range.
- RDF is not a syntax. It's a data model. Various syntaxes exist to
encode RDF, including RDF/XML, N-Triples, TriX, N3, etc.
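The ontology/instance split can be illustrated with the two RDFS entailment rules for rdfs:domain and rdfs:range. The triples below are an assumed reading of the figure (lanl:isEating has domain lanl:Human and range lanl:Food, and lanl:marko is eating lanl:cookie), and plain Python tuples stand in for a real RDF library.

```python
RDF_TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

# Assumed reading of the slide's figure.
ontology = {
    ("lanl:isEating", DOMAIN, "lanl:Human"),
    ("lanl:isEating", RANGE, "lanl:Food"),
}
instance = {("lanl:marko", "lanl:isEating", "lanl:cookie")}


def infer_types(instance, ontology):
    """Apply the rdfs:domain and rdfs:range rules once to the instance triples."""
    inferred = set()
    for s, p, o in instance:
        for prop, rel, cls in ontology:
            if prop != p:
                continue
            if rel == DOMAIN:
                inferred.add((s, RDF_TYPE, cls))  # subject belongs to the domain class
            elif rel == RANGE:
                inferred.add((o, RDF_TYPE, cls))  # object belongs to the range class
    return inferred


types = infer_types(instance, ontology)
```

Under these assumptions the rdf:type edges in the figure follow from the single lanl:isEating statement plus the ontology.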
- RDF, RDFS, and OWL. Figure: an ontology layer and an instance
layer, with vertices lanl:fluffy, lanl:marko, lanl:bob, lanl:Pet,
lanl:Human, _:0123, owl:Restriction and edges lanl:hasOwner,
rdf:type, rdfs:domain, rdfs:range, rdfs:subClassOf, owl:onProperty,
owl:maxCardinality 1.
- General-purpose modeling. Figure: List, Map, and Set data
structures modeled as semantic networks (edge labels: next, item,
key, value, entry, el).
- General-purpose computing. Figure: a Program and a Virtual Machine
modeled as a semantic network (labels include PC, heap, stack, next,
value, test, item, el, true, false).
Rodriguez, M.A., General-Purpose Computing on a Semantic
Network Substrate, in review, Journal of Web Semantics,
LA-UR-07-2885, April 2007.
- A web of data and process. Figure: hosts 127.0.0.0, 127.0.0.1,
127.0.0.2, 127.0.0.3.
- The triple-store.
SELECT ?a ?c
WHERE {
  ?a type human
  ?a wrote ?b
  ?b type article
  ?c wrote ?b
  ?c type human
  ?a != ?c }
- There are two primary ways to distribute information on the
Semantic Web.
  - 1.) publish a serialized RDF document on a web server.
  - 2.) expose a public interface to an RDF triple-store.
- The triple store is to semantic networks what the relational
database is to data tables.
  - Storing and querying triples in a triple store.
  - SPARQLUpdate query language.
    - like SQL, but for triple-stores.
INSERT ?a coauthor ?c
WHERE {
  ?a type human
  ?a wrote ?b
  ?b type article
  ?c wrote ?b
  ?c type human
  ?a != ?c }
DELETE ?s ?p ?o
WHERE { ?s ?p ?o }
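What the INSERT query above does can be mimicked over an in-memory set of triples: match the graph pattern, then add one coauthor triple per match. This is a sketch of the semantics only, not of a triple-store API; the `store` contents and function names are hypothetical.

```python
def coauthor_pairs(triples):
    """Match the WHERE pattern: two distinct humans who wrote the same article."""
    humans = {s for (s, p, o) in triples if p == "type" and o == "human"}
    articles = {s for (s, p, o) in triples if p == "type" and o == "article"}
    wrote = [(s, o) for (s, p, o) in triples if p == "wrote"]
    return {
        (a, c)
        for (a, b) in wrote
        for (c, b2) in wrote
        if b == b2 and a != c and a in humans and c in humans and b in articles
    }


def insert_coauthor_edges(triples):
    """Return a new set with one (a, coauthor, c) triple per matched pair."""
    return triples | {(a, "coauthor", c) for (a, c) in coauthor_pairs(triples)}


store = {
    ("marko", "type", "human"),
    ("alberto", "type", "human"),
    ("article1", "type", "article"),
    ("marko", "wrote", "article1"),
    ("alberto", "wrote", "article1"),
}
updated = insert_coauthor_edges(store)
```

Because the pattern is symmetric in ?a and ?c, the coauthor edge is added in both directions.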
- Triple-store vs. relational database.
- Give me all pairs of people who are friends, but who work for
collaborating companies. Now!
- Triple-store (SPARQL interface):
SELECT ?x1 ?x2
WHERE {
  ?x1 lanl:hasFriend ?x2 .
  ?x2 lanl:worksFor ?x3 .
  ?x3 lanl:collaboratesWith ?x4 .
  ?x4 lanl:hasEmployee ?x1 . }
- Relational database (SQL interface):
SELECT friendTable.personId1, friendTable.personId2
FROM personTable, authorTable, articleTable, friendTable,
  hasEmployeeTable, organizationTable, worksForTable,
  collaboratesWithTable
WHERE personTable.id = authorTable.personId
  AND personTable.id = friendTable.personId1
  AND friendTable.personId2 = worksForTable.personId
  AND worksForTable.orgId = collaboratesWithTable.orgId2
  AND collaboratesWithTable.orgId2 = personTable.id
- Triple-store and graph-analysis.
- Nearly all network analysis algorithms can be decomposed into a
graph traversal problem.
  - Spreading activation and the energy diffusion.
  - PageRank and the random walker.
  - Geodesics and the breadth-depth search.
- Relational database is not optimized for graph traversal.
  - Indexes are not appropriate for graph traversal.
  - Every traversal is a table join.
- The triple-store is better suited to graph analysis.
  - Although the triple-store is optimized for graph pattern
    matching, it is still better suited to graph traversal than the
    relational database.
  - Hybrid statement/linked-list databases are good at both pattern
    matching and traversal.
- Graph analysis can be used for ranking and recommendation.
Rodriguez, M.A., "A Multi-Relational Network to Support the
Scholarly Communication Process", International Journal of Public
Information Systems, volume 2007, issue 1, pages 13-29, ISSN:
1653-4360, LA-UR-06-2416, March 2007.
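As one instance of analysis-as-traversal, a geodesic (shortest path length) can be found by breadth-first search over the triples. The sketch below ignores edge labels and follows edge direction; the citation chain used as data is invented, and the in-memory adjacency index stands in for whatever a real triple-store would provide.

```python
from collections import defaultdict, deque


def geodesic_length(triples, source, target):
    """Length of a shortest directed path from source to target, or None."""
    out = defaultdict(set)
    for s, _p, o in triples:
        out[s].add(o)  # adjacency index: one lookup per traversal step
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        vertex, dist = queue.popleft()
        if vertex == target:
            return dist
        for nxt in out[vertex]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical citation edges.
triples = {
    ("a", "cites", "b"),
    ("b", "cites", "c"),
    ("c", "cites", "d"),
    ("a", "cites", "c"),
}
forward = geodesic_length(triples, "a", "d")
backward = geodesic_length(triples, "d", "a")
```

In a relational layout each step of this loop would be a join on the edge table, which is the cost the slide refers to.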
- Modeling the scholarly community.
- Agents: humans and groups.
- Artifacts: articles, books, journals, proceedings,
conferences, datasets, software, websites, [sensors,
deployments].
- Relationships: citations, authorship, publisher, contains,
attends, coauthor, members.
Rodriguez, M.A., Bollen, J., Van de Sompel, H., A Practical
Ontology for the Large-Scale Modeling of Scholarly Artifacts and
their Usage, 2007 ACM/IEEE Joint Conference on Digital Libraries,
pages 278-287, Vancouver, Canada, ACM/IEEE Computing,
doi:10.1145/1255175.1255229, LA-UR-07-0665, June 2007.
- Demonstration.
- Conclusion.
- Thank you for coming. Good life.