Semantic issues in Land Use and Land Cover Studies – Foundations, Application and
Future Directions
Ola Ahlqvist, The Ohio State University
After decades of accomplishments and faced with new technological and scientific insights, the
field of land use and land cover (LULC) is seemingly at a crossroads for effective and open uses
of data. The use of categorical LULC data in computer-based land analysis poses a significant
challenge because it usually leads to a binary treatment of the information in subsequent
analysis. Still, LULC data offers a rich and generic resource and it is often used for purposes
other than just finding out what the land cover is at a location; examples include climate
modeling, monitoring of biodiversity, and simulation of urban expansion. As a result, the
objectives for LULC semantics have moved from those of increasingly accurate technological
representations of spatially explicit change to the representation of integrated roles LULC play
within a broader environmental context. Many of these uses call for deeper understanding of the
categories in order for the data to be re-purposed. As more and more land cover data sets have
been developed, there is also increased recognition that variation in nomenclature and class
definitions poses significant hurdles to effective and synergistic use of LULC resources.
A book on this subject would provide a platform for scholars to reassess the field, affirm
successful approaches, and point to future possibilities in advancing LULC semantics. The
proposed book will consist of three parts. The first section will be a summary and analysis of
land-use/land-cover semantics and explanation of current practices. The objective of this section
is to review aspects of data modeling where designers and practitioners should be aware of
providing clear and consistent semantic details. The second section will consist of current
approaches to manage LULC semantics. This section will serve as a resource for LULC data
producers, managers and users to adopt current best practices in their own work with land cover
and land use information. The third and final section will consist of a forward-looking collection
of ongoing research across the entire spectrum of LULC semantics solicited from recent
conferences and workshops.
The topics to be covered in sections one and two will consist of both conceptual and
technological semantic practices, including but not limited to:
Categorization: the definition of criteria for sets and their members. Classes may be named in
advance, top-down, such as those based on scientific literature, or may be formed bottom-up by
aggregating observations on the ground.
Metadata: documentation for data reuse. Data-sharing services rely on metadata specifications
to make portals searchable for public use.
Ontology logic restrictions: logical axioms applied to data that govern how those data are
interpreted and processed.
Reasoning from text sources: content analysis of texts to elicit semantics and identify reasoning
principles.
Explicit semantic specifications: ontologies, vocabularies and design patterns.
Use cases: applying semantics in searches, LULC classification, spatial analysis and
visualization.
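The nomenclature problem noted above can be made concrete with a small sketch: two data sets label comparable land cover with different class names, and a crosswalk to shared concepts lets them be analyzed together. The mappings below are simplified illustrations, not an authoritative NLCD/CORINE crosswalk.

```python
# Illustrative crosswalk: map scheme-specific LULC class labels to shared
# concepts. The concept identifiers and the mappings are examples only.
NLCD_TO_CONCEPT = {
    "Deciduous Forest": "forest/broadleaf",
    "Evergreen Forest": "forest/needleleaf",
    "Developed, High Intensity": "artificial/urban",
}
CORINE_TO_CONCEPT = {
    "Broad-leaved forest": "forest/broadleaf",
    "Coniferous forest": "forest/needleleaf",
    "Continuous urban fabric": "artificial/urban",
}

def harmonize(scheme, label):
    """Map a scheme-specific class label to a shared concept, or None."""
    table = {"NLCD": NLCD_TO_CONCEPT, "CORINE": CORINE_TO_CONCEPT}[scheme]
    return table.get(label)

# Two differently named classes resolve to the same shared concept, so
# observations from both data sets can be combined in later analysis.
same = harmonize("NLCD", "Deciduous Forest") == harmonize("CORINE", "Broad-leaved forest")
```

A real crosswalk would also have to handle one-to-many and partial overlaps between classes, which is exactly where the deeper semantic detail discussed above becomes necessary.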
The content of the forward-looking last section is harder to predict as it will rely on contributions
of cutting-edge work drawn from upcoming research conferences. Nevertheless it is reasonable
to expect that there will be contributions that treat issues of Big Data, Open Science, knowledge
infrastructures and their organization, integration of bottom-up and top-down approaches to
LULC semantics, collaboration frameworks, and interdisciplinary challenges such as
EarthCube.
Enhanced Semantics for Gazetteers
Kate Beard
School of Computing and Information Science
University of Maine [email protected]
This proposed presentation would discuss the development and architecture for a geo-
semantically enhanced gazetteer. Place names provide easily expressible ways for people to
engage in geospatial searches but they have computational limitations. Digital gazetteers provide
a means to expand and make place name search more effective and robust. Current digital
gazetteers generally take the form of triples of place names (N), geographic footprints (F) and
feature types (T), and impose no constraints on the number of names, footprints, or feature types
that can be associated with a feature (Hill 2006). They support query expansion of official names
to name variants and the translation of place names to geographic coordinate footprints to enable
spatial searches (Goodchild and Hill 2008). A functionality not well addressed by current
gazetteers is a capability for expanding place name searches to geographically related place
names (e.g. connecting Atlanta, GA to its named suburbs or the Potomac River to its named
tributaries). Humans are able to make geo-semantic connections between place names, for
example that Queens, Central Park, and Rockefeller Center, are parts of New York City, but
computational systems, without such knowledge, are unable to make such geo-semantic
connections.
Suggestions have been made to add relationships to gazetteers (Hill 2006), but to date gazetteers
have remained largely flat structures with named features as isolated unconnected instances.
Some gazetteers have incorporated one or a few modelled relationships, such as the containment
relationship in the Getty Thesaurus, which links named features to administrative units, and the
parent, neighbour and nearby relationships between places in GeoNames. Partial
solutions can also be obtained by deriving topological relationships between feature footprints
but these relationships are not semantically based and are limited by the dimension and
configuration of gazetteer footprints which are predominantly points.
While topological relationships between feature footprints can be derived, these do not capture
semantically rich feature to feature or feature-part relationships such as the relationship of
tributary between streams. This presentation will describe a semantically enhanced gazetteer
model developed from two ontologies: a gazetteer ontology and a geographic feature domain
ontology. The two ontologies align with Couclelis’s (2010) distinction between ontologies
focused on information constructs and those focused on real world entities. The gazetteer
ontology models features as information constructs and formalizes relationships (in an
information space) between a feature, its location representations, and feature types. The
geographic feature domain ontology specifies a set of canonical geographic feature classes and
models feature to feature and feature part relationships between the canonical classes.
The gazetteer instantiates classes and relationships from both ontologies with the result being
instantiated relationships between named feature instances that can be queried. The approach is
demonstrated with named hydrologic features from the National Hydrographic Dataset (NHD).
The gazetteer and supporting ontologies were developed with semantic web technologies: RDF
(Resource Description Framework), RDFS (RDF Schema), OWL (Web Ontology Language) and
SPARQL. These technologies have a number of limitations for working with and representing
geospatial semantics which the presentation will outline.
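The kind of relationship query such a gazetteer could answer can be sketched with plain tuples standing in for RDF triples and SPARQL. The feature names are real, but the triples themselves are illustrative and not drawn from the NHD.

```python
# Toy triple store: (subject, predicate, object) tuples standing in for RDF.
TRIPLES = {
    ("Shenandoah River", "tributaryOf", "Potomac River"),
    ("Anacostia River",  "tributaryOf", "Potomac River"),
    ("Monocacy River",   "tributaryOf", "Potomac River"),
    ("Potomac River",    "type",        "River"),
}

def query(s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return {(ts, tp, to) for (ts, tp, to) in TRIPLES
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)}

# Expand a place-name search to geographically related names:
tributaries = {s for (s, _, _) in query(p="tributaryOf", o="Potomac River")}
```

A search for "Potomac River" can thus be expanded to its named tributaries, which is exactly the geo-semantic connection a flat name/footprint/type gazetteer cannot make.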
References
Couclelis, H. (2010). Ontologies of geographic information. International Journal of
Geographical Information Science 24(12): 1785-1809.
Goodchild, M.F. and L.L. Hill, 2008. Introduction to digital gazetteer research.
International Journal of Geographical Information Science. 22(10): 1039-1044.
Hill, L.L. 2006. Georeferencing: The Geographic Associations of Information (Digital Libraries
and Electronic Publishing). Cambridge MA: MIT Press.
Attendee: Dr. Kai Cao, World History Center, University of Pittsburgh,
Title: World Historical Ontology Research in CHIA Project
Philosopher George Santayana warned (in 1901) that, “Those who cannot remember the
past are condemned to repeat it.” Less known, but likely as important, is his challenge
that “a man's feet should be planted in his country, but his eyes should survey the world.”
The Collaborative for Historical Information and Analysis (CHIA) exists to accelerate
and empower research surveying the global human record. As a public system, it will be
used synergistically by policy-makers making decisions, scholars identifying global
processes, and educators developing student skills in global analysis. It will ingest
comprehensive, multidisciplinary data and provide tools to uncover the patterns of social
interaction and the processes driving these interactions. Its research and analysis will
integrate the approaches of social, health, and environmental sciences with those of
information sciences. The result, an improved understanding of past patterns in society at
all levels, is fundamental to assessing future challenges and predicting the success of
proposed solutions.
Over roughly the next five years, CHIA intends to develop a strong and expanding
research team which will unleash a rapid inflow of historical data to be documented and
archived. CHIA will develop an overall ontology for world-historical documentation and
analysis, including an expanding system of metadata to describe data and assist in their
integration and aggregation. CHIA will conduct interactive analysis at regional and
global levels of variables in social sciences, health, and climate; and develop systems of
visualization that will assist in analysis and provide feedback for collection and definition
of data. Here are the primary goals:
- Global Collaboration: collaborative relations to sustain and expand the creation of a world-historical data resource
- Crowd-sourcing applications: to facilitate data ingest and file merging
- CHIA Archive: a distributed archive with datasets held at five levels of integration into the overall CHIA system
- World-historical gazetteer: a comprehensive historical gazetteer, and a spatial search engine to accompany it
- Temporal search engine, with extended temporal metadata
- Ontology: a developing CHIA ontology, providing topical classification of data, as well as space, time, and the tasks and applications of CHIA
- Digital Stewardship: following best practices in housing and display of datasets
- Data: energetic collection of historical data worldwide
- Theory: engage debate on linkage of social-science theories to each other
Undoubtedly, the overall ontology—the overarching classification system—of the global
archive would be necessary and meaningful. Various aspects of the ontology would be
established at different stages of the project. Initially it includes what we here call
metadata—the description of values and variables in each data set and the recording of
the sources and compilers of data. The incorporation of such existing detailed
classifications means that data-ingest work can start before the high-level framework –
the overall project ontology – is finalized. Later stages of the ontology include more
comprehensive categorization of types of data, definitions and classification for the
linkage and aggregation of datasets, and definitions for the analysis and visualization of
data.
As one of the primary researchers in the NSF-funded CHIA project, I believe that through
this workshop the research on world-historical ontology, and even the study of world-historical
gazetteers, could benefit greatly from the other successful applications and
implementations of semantics in geospatial architectures.
Hopefully I could be one of the attendees of this workshop (with no presentation).
Many thanks.
Best regards,
Postdoctoral Research Associate, University of Pittsburgh
Visiting Research Fellow, Harvard University
Janet Fredericks
Woods Hole Oceanographic Institution
[email protected]

Enabling Semantic Mediation in OGC SWE

The OGC has developed core standards to provide a framework that enables machine-to-machine harvesting of observational geospatial data and metadata. What is under the hood doesn’t matter – data can be stored in native data systems. Upon an HTTP request, the services return the OGC-adopted encodings that encapsulate the information, enabling machine harvesting of data selected through geospatial and temporal queries, as well as other specifications, depending on the implementation. The use of the OGC standards supports brokering activities that can provide translations across standards. But use of the adopted service standards in a collaborative environment requires an implementation designed to enable semantic mediation.

The OGC Sensor Observation Service (SOS) offers a standards-based framework in which to describe observational provenance (SensorML) as well as observational data (O&M). OGC SOS has been adopted in real-time ocean observing systems, such as the NOAA IOOS and the associated regional associations (NFRA). Through participation in the EarthCube Brokering Team Hackathons, a demonstration SOS delivering oceanic wave data was tested on three brokering sites. Through the NOAA ERDDAP broker, the WHOI/Q2O (Quality to OGC) SOS implementation (q2o.whoi.edu/node/129) was translated into ISO metadata with NetCDF or TSV requested output. Brokering services let users choose to work in frameworks beyond the primary data offering without installing or developing translation tools. Through the ESRI Geoportal and the Data Access Broker, catalogue services were automatically populated with information about geospatial and temporal coverage along with basic metadata, enabling data discovery and access. The Q2O SOS demonstration was developed, with funding from NOAA, to enable dynamic quality assessment.
It is content-rich and delivers information about how the observations came to be, as well as information about quality tests and associated real-time results. The implementation also demonstrates the ability to enable the development of ontologies by integrating links within the SOS to URLs that resolve to SKOS-encoded terms (Figure 1). These ontologies can be utilized in collaborative environments where terms with the same or similar meanings may have different names but must be associated. For example (Figure 2), one provider’s QC test result is called pass and another’s is called _1, and a data aggregator can map each to the same meaning. The mapped terms can also carry encoded code values: one provider’s pass may have a value of one (1), while another provider may use a value of zero (0) to represent a passed QC test. Through the inclusion of links to encoded terms, these values can be mapped to have the same meaning when integrating and filtering data offerings. The use of standards in geospatial data access is important, but without the inclusion of references to registered terms in a semantics framework, ontologies cannot be developed, making automated data assessment and integration nearly impossible.
Figure 1
Figure 2
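The term-mapping idea in the example above (pass vs. _1, code 1 vs. code 0) can be sketched in a few lines. The shared concept URI and the provider vocabularies below are hypothetical, standing in for SKOS-encoded terms resolved from links in the SOS response.

```python
# Hypothetical shared concept for a passed QC test (stand-in for a SKOS term).
QC_PASS = "http://example.org/qc#pass"

# Each provider's local term is linked to a shared concept...
PROVIDER_TERMS = {
    ("providerA", "pass"): QC_PASS,
    ("providerA", "fail"): "http://example.org/qc#fail",
    ("providerB", "_1"):   QC_PASS,   # providerB calls a passed test "_1"
}
# ...and each provider's numeric encoding of that concept is recorded too.
PROVIDER_CODES = {
    ("providerA", QC_PASS): 1,  # providerA encodes pass as 1
    ("providerB", QC_PASS): 0,  # providerB encodes pass as 0
}

def passed(provider, term):
    """True when a provider-specific QC term maps to the shared pass concept."""
    return PROVIDER_TERMS.get((provider, term)) == QC_PASS
```

An aggregator can then filter observations by the shared concept regardless of which local name or code value each provider used.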
Damian Gessler
Semantic Web Architect, The iPlant Collaborative
University of Arizona, Tucson, AZ 87521
[email protected]

Indication: Can attend the meeting and discuss iPlant’s Semantic Web Platform at the ‘Workshop on Semantics in Geospatial Architectures: Applications and Implementation.’

The iPlant Collaborative Semantic Web Platform

Geospatial semantics has huge promise. Yet implementing semantics in large infrastructures is challenging. Early implementers face significant obstacles in migrating from research-grade proof-of-concept applications to production-grade, value-added platforms. Even a well-informed perception of the promise of semantics may cloud inconvenient “details” that significantly hinder operational maturity. Yet the promise is real, and the complexity of today’s earth science challenges implies that computational semantics has an important role. To get from promise to realization, we need a sober understanding of the challenges and solutions.

Cyberinfrastructure semantics is challenging because there is no generally adopted technology stack that integrates the various technology layers and social norms into a readily accessible platform for the end-user. Thus semantic technologies—from RDF (Resource Description Framework) and OWL (Web Ontology Language) to pseudo-semantic ontologies such as OBO (Open Biological and Biomedical Ontologies), Darwin Core, schema.org, OGC (Open Geospatial Consortium), and LOD (Linked Open Data)—exist in a disjointed ecosystem of ad hoc installations and social contracts. Indeed, the implied semantics inherent in making any individual system operate often outweigh the explicit semantics that are needed for computational and integrative maturity. The iPlant Collaborative—an NSF-funded large cyberinfrastructure for the plant sciences—approaches this challenge with a three-tier architecture.
At the Foundational layer is a tight collaboration with NSF XSEDE resources (Extreme Science and Engineering Discovery Environment; https://www.xsede.org). This delivers world-class high performance computing clusters (“big iron”) at the peta-FLOPS and petabyte scale [O(10^15) floating-point operations per second and bytes of storage, respectively]. The next tier is an Enterprise layer, consisting of a Web-accessible Discovery Environment and virtual machine farm. The former delivers a breadth of applications (approximately 300 bioinformatic applications accessible in a virtual desktop interface), while the latter delivers depth (scientists and labs can configure customized virtual machines with specific software and workflows). The final tier is the semantic layer of iPlant’s production-grade semantic platform using SSWAP: Simple Semantic Web Architecture and Protocol. SSWAP (http://sswap.info) uses open, Just-In-Time ontologies and transaction-time OWL reasoning to bridge Foundational resources with third-party Web sites and distributed scientific
offerings. SSWAP is a light-weight OWL protocol that allows any Web resource to describe its offering—its mapping of some input to some output—in simple, first-order description logic. iPlant runs a semantic Discovery Server that allows users to “discover” these resources, send data, invoke and execute services, and daisy-chain services into semantic pipelines. Visitors to third-party Web sites can click a button and have requests sent to iPlant for real-time semantic service discovery and invocation. Actual Web service execution is performed at the separate, distributed semantic Web service sites. Thus iPlant’s Semantic Web Platform is a semantic broker performing both vertical and horizontal semantic integration: it is not simply a feeder of data into iPlant, but an integrator of third-party and/or iPlant semantic resources across the Web. For the geospatial context, iPlant collaborators at TreeGenes have implemented a Web resource called CartograTree for tree scientists. Scientists can visually select tree samples as displayed by their lat/long coordinates and then send data into just-in-time TreeGenes and iPlant semantic pipelines. A worked example is at http://sswap.info/example.
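The discovery and daisy-chaining behavior described above can be sketched as a broker that matches service input and output types. This is a minimal stand-in, not the SSWAP protocol itself; all service and type names below are hypothetical.

```python
from collections import deque

# Each service advertises the type of its input and output (a drastically
# simplified stand-in for an OWL description of a resource's offering).
SERVICES = [
    {"name": "LocateSamples", "input": "TreeSample",    "output": "Coordinates"},
    {"name": "FetchClimate",  "input": "Coordinates",   "output": "ClimateRecord"},
    {"name": "TraitAnalysis", "input": "ClimateRecord", "output": "TraitReport"},
]

def discover(input_type):
    """Find services that accept the given input type."""
    return [s for s in SERVICES if s["input"] == input_type]

def chain(start_type, goal_type):
    """Breadth-first search for a pipeline of services from start to goal."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        t, path = queue.popleft()
        if t == goal_type:
            return path
        for s in discover(t):
            if s["output"] not in seen:
                seen.add(s["output"])
                queue.append((s["output"], path + [s["name"]]))
    return None  # no pipeline connects the two types

pipeline = chain("TreeSample", "TraitReport")
```

A real semantic broker would match types by OWL subsumption reasoning rather than string equality, which is what makes transaction-time reasoning the hard part of the problem.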
Semantic Portals for Semantic Spatial Data Infrastructures
Francis Harvey
University of Minnesota
[email protected]
This position paper suggests that a challenge in developing semantic interoperability for next-generation spatial data infrastructures lies in the creation of portal-level semantics. What this means is that architectures have to conceptually blend work on semantic interoperability with portal designs that support domain needs and requirements. Why? Portals support specific applications or ranges of applications; semantic interoperability work, however, has focused on data-set-level documentation and operationalization. Merging these two approaches leads to architectures that support domain semantics through terminological and interface bridges that connect to robust data-set-level description languages. This approach seems to offer a helpful way to resolve the current arms race in portal building and to harness the strengths of semantic interoperability solutions.

The idea comes as the University of Minnesota is beginning to develop an interoperable data management system for geospatial data. A large research university, with over 52,000 students and funded research projects totaling over $749 million in 2012, UMN produces practically every conceivable kind of spatial data. Spatial and temporal resolutions, object footprints, semantics, etc. vary enormously. Second, disciplinary and legal requirements lead to a broad range of data practices across the sciences. All attempts to the contrary, it seems likely that a large, and perhaps ultimately unknowable, number of portals to facilitate researcher access to data resources will develop. Getting ahead of this development and providing a suite of portals that supports researcher needs to ingest, edit, display, search and visualize semantic data in a user-friendly and meaningful way seems a wise strategy in an era of diminishing resources. Indeed, how can an information infrastructure support the diversity of a research university?

How can it do this especially when the nature of research encourages a multiplicity of management approaches and organizations of research data? Lessons from experiences with spatial data infrastructures suggest that multiple means to participate, balancing researcher control and institutional management, offer the best concepts for architectures that support data sharing without encumbering researchers with bureaucracy. The challenge lies in creating user-friendly interfaces to display, browse and query data while the underlying Semantic Web technology remains extremely obtuse for users unfamiliar with the RDF triple format. Instead of attempting to create a single universal geospatial data portal for the university, we are exploring the concept of supporting multiple portals that gain access, through a linked open data design, to metadata and data held by researchers or archived by the institution. This holds similarities with the Semantic Web portals proposed by a number of researchers, including Ding et al. (2010). We are currently exploring this idea with participants in the U-Spatial project through a user-oriented design process. The concept described in this brief paper is an initial idea; it will certainly be altered through the design process before implementation begins.
Dave Kolas
Raytheon BBN Technologies
I would like to attend the workshop in order to understand and contribute to the current state of
the art in geospatial Semantic Web systems. While I do not have a particular system of interest
to present at this time, I have experience both as the initial developer and current maintainer of
the spatial indexing support in BBN’s triple store Parliament and as the co-chair of the OGC
GeoSPARQL working group.
While I could certainly present introductory GeoSPARQL information, or information about
using Parliament for GeoSPARQL, it might be more interesting at this point to facilitate a
discussion about implementation issues in these types of systems. Presentations about
semantic integration often focus on high-level structure and prototype systems. If we had a
group discussion about all of the difficult parts of actual implementation, it is possible that we
could push progress forward significantly. The result might be the community getting closer to
more deployed systems.
Enabling Semantic Search in Geospatial Metadata Catalogue to Support Polar Sciences
Wenwen Li1 and Vidit Bhatia2
1GeoDa Center for Geospatial Analysis and Computation, School of Geographical Sciences and Urban Planning, Arizona State University, Tempe, AZ 85287
2Department of Computer Science, Arizona State University, Tempe, AZ 85287 {wenwen, vidit.bhatia}@asu.edu
Studies of polar regions have become increasingly important in recent years because of (1) increasing interest in mining and natural resource exploration; (2) the sensitivity of both poles to human activities and to global, environmental, and climate changes; and (3) the role of polar regions as key drivers of the Earth's climate. In May 2013, the White House released the “President’s National Strategy for the Arctic Region” and identified “increasing understanding of the Arctic through scientific research and traditional knowledge” and “making decisions using the best available information” as the overarching stewardship objectives to achieve in the coming decade.

Fortunately, we are entering the era of big data. Pervasive technologies for Earth observation, such as sensor networks, high-resolution telescopes and polar satellites, enable the retrieval of large amounts of polar data to accelerate the scientific discovery process. Several data centers, including ACADIS (https://www.aoncadis.org), NSIDC (http://nsidc.org/data/search/data-search.html) and the Antarctic and Southern Ocean Data Portal (http://www.marine-geo.org/portals/antarctic/), have been established to share these available resources. A metadata catalogue is usually provided by these portals to support the discovery of data of interest through a keyword-based search interface. Currently, Lucene-based indexing is widely used in these portals, and this text-based search approach hinders the retrieval of semantically related datasets whose content is described using a different keyword set from a user’s query. To enable intelligent search and a smart connection between an end user and the dataset he or she most needs, semantic search comes into play. This search strategy can be categorized into two classes: ontology-based semantic expansion and smart search based on knowledge mining.
The ontology-based approach can be considered a top-down approach: possible semantic linkages are populated by domain experts and encoded in a machine-understandable format, and a user’s query is then expanded by traversing these predefined semantic linkages/relationships. This approach assumes that the semantic relationships in the data can be well captured in advance. However, different people tend to have different perspectives on how the ontology should be established, and it is extremely difficult to build a complete knowledge base to serve various search purposes. To overcome this issue, in this work we employ a bottom-up approach which relies on mining the dataset itself to discover the latent semantic relationships between keywords/terms in the metadata corpus. A latent semantic indexing technique combined with the Paice stemming algorithm is applied on top of the Lucene index to further improve the search results. A new ranking algorithm, based on a revised cosine similarity and a two-tier ranking scheme, ensures the high precision of the top search results. In addition, we integrated this approach into a popular metadata catalogue, GeoNetwork (http://geonetwork-opensource.org/), to broadly share this semantic search capability with peers. We expect this work to greatly enhance the capability of data search in existing polar data portals and geospatial data discovery at large.
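The bottom-up idea can be illustrated with a toy latent semantic indexing (LSI) example: after a rank-reduced SVD of the term-document matrix, a query can score a record highly even when it shares no literal keyword with it. The tiny corpus below is invented; a real system would index full metadata records and apply stemming first.

```python
import numpy as np

docs = [
    "sea ice extent arctic ocean",       # d0: contains the query terms
    "arctic ocean temperature warming",  # d1: related, but no query term
    "volcano magma eruption",            # d2: unrelated
]
vocab = sorted({w for d in docs for w in d.split()})
# Term-document count matrix (terms as rows, documents as columns).
A = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

# Rank-k SVD projects terms and documents into a latent space where
# co-occurring terms (e.g. "ice" and "arctic") line up.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk = U[:, :k]
doc_vecs = Uk.T @ A  # documents folded into the latent space

def search(query):
    """Cosine similarity of the query to each document in latent space."""
    q = np.array([query.split().count(w) for w in vocab], float)
    qk = Uk.T @ q
    sims = []
    for j in range(doc_vecs.shape[1]):
        d = doc_vecs[:, j]
        denom = np.linalg.norm(qk) * np.linalg.norm(d)
        sims.append(float(qk @ d / denom) if denom else 0.0)
    return sims

sims = search("sea ice")
# d1 contains neither "sea" nor "ice", so plain keyword matching would
# score it zero; in the latent space it still scores high, while the
# unrelated d2 scores near zero.
```

This is the retrieval behavior the text-based Lucene search cannot provide on its own, and the reason an additional ranking layer is needed to keep precision high among the top results.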
Developing Semantics Rules using Evolutionary Computation for Information Extraction from Remotely Sensed Imagery
Henrique Momm
Assistant Professor Department of Geosciences
Middle Tennessee State University [email protected]
The difference between the low-level information extracted by traditional pixel-based classification methods and the high-level information extracted by a human analyst is often referred to as the “semantic gap”. Human analysts use a complex combination of different image cues such as color (spectral information), image texture, object geometry (geometry of image regions), and context (relationships between image regions). Because human analysis of large areas, and often of multiple periods of time (multiple images), is costly and time consuming, scientists have recognized the importance of developing more sophisticated semi-automated or automated methods to convert large quantities of imagery into actionable information. The challenge resides in multifaceted problems where the relationships between image regions are too complex to be defined by explicit programming, and therefore stochastic algorithms are being investigated as a plausible alternative.

Evolutionary computation algorithms were integrated with standard image processing and unsupervised clustering algorithms to derive individual image cues in a “learn-from-examples” mode. Genetic programming was selected as the evolutionary engine because these methods represent candidate solutions as mathematical equations (human readable), do not require assumptions about target data statistics, and can develop robust models even when the relationships among parameters are not fully understood. The principal objective is to bridge the semantic gap by sub-dividing the overall information extraction task into sequential steps. In the first steps, the evolutionary framework derives candidate solutions based on spectral and texture image cues through an optimized search for spectral transformations and image texture operators (or sequences of texture operators) that maximize the influence of the feature of interest and minimize the influence of the remaining image background.

Based on the findings of the initial steps, the evolutionary framework is used to evolve solutions that identify features of interest based on geometric properties of image regions. Future research opportunities are also discussed herein. Potential developments should include additional steps to investigate ontological relationships between features (relationships between image regions). The efficient, optimized learn-from-examples schema of evolutionary computation algorithms could be used to generate the most appropriate ontology representation to replicate our ability to perceive spatial relationships. Enhancements should also be made to combine all individual image-cue solutions into a single decision-making procedure, just as human analysts do. Finally, a database of semantic rules should be implemented and shared with the scientific community, allowing for collaborative contributions and enhancements. I am interested in presenting.
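The "learn-from-examples" search for a spectral transformation can be sketched with a minimal evolutionary loop: expression trees over two bands are evolved to maximize between-class separation. The synthetic pixels, the fitness measure, and the mutation scheme are invented for illustration; a real system would run full genetic programming over spectral and texture operators on actual imagery.

```python
import random

random.seed(42)

# Synthetic two-band training pixels: a feature class vs. background.
def make_pixels(n, b1_mu, b2_mu):
    return [(random.gauss(b1_mu, 0.05), random.gauss(b2_mu, 0.05)) for _ in range(n)]

FEATURE    = make_pixels(60, 0.2, 0.6)   # vegetation-like: low b1, high b2
BACKGROUND = make_pixels(60, 0.5, 0.4)

OPS = {"add": lambda a, b: a + b,
       "sub": lambda a, b: a - b,
       "div": lambda a, b: a / b if abs(b) > 1e-6 else 0.0}  # protected divide

def evaluate(expr, px):
    """Evaluate an expression tree on one pixel (b1, b2)."""
    if expr == "b1": return px[0]
    if expr == "b2": return px[1]
    op, left, right = expr
    return OPS[op](evaluate(left, px), evaluate(right, px))

def fitness(expr):
    """Between-class separation of the transformed values (higher is better)."""
    def stats(pixels):
        vals = [evaluate(expr, p) for p in pixels]
        mu = sum(vals) / len(vals)
        sd = (sum((v - mu) ** 2 for v in vals) / len(vals)) ** 0.5
        return mu, sd
    m1, s1 = stats(FEATURE)
    m2, s2 = stats(BACKGROUND)
    return abs(m1 - m2) / (s1 + s2 + 1e-9)

def random_expr(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice(["b1", "b2"])
    op = random.choice(list(OPS))
    return (op, random_expr(depth - 1), random_expr(depth - 1))

def mutate(expr):
    """Replace the expression, or one of its subtrees, with a random subtree."""
    if expr in ("b1", "b2") or random.random() < 0.3:
        return random_expr(2)
    op, left, right = expr
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

# Elitist evolution, seeded with the raw bands so the evolved transform
# can never score worse than a single band on its own.
baseline = max(fitness("b1"), fitness("b2"))
population = ["b1", "b2"] + [random_expr() for _ in range(20)]
for _ in range(30):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]
best = max(population, key=fitness)
```

Because candidate solutions are readable expression trees, the best transform can be inspected directly, which is one of the reasons genetic programming was chosen over opaque learners.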
Spatial Semantics enhanced Geoscience Interoperability, Analytics, and Applications Krishnaprasad Thirunarayan and Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing Wright State University, Dayton, OH-45435 [email protected], [email protected]
We present our research ideas for developing cyberinfrastructure for Geoscience
applications developed in the context of the EarthCube initiative, and our NSF-sponsored work on incorporating spatial-temporal-thematic semantics for enhanced querying and feature extraction from sensor data streams. (1) Semantics-empowered Cyberinfrastructure for Geoscience applications Rapidly maturing semantic technologies, based in part on Semantic Web standards, have the potential to increase opportunities for interdisciplinary research by providing support and incentives for sharing, publishing, accessing and discovering heterogeneous data. Our thesis is that associating machine-processable lightweight semantics with the long tail of science data can overcome challenges associated with data discovery, integration and interoperability caused by data heterogeneity. In order to demonstrate this, we propose to develop cyberinfrastructure (CI) utilizing lightweight semantic capabilities to serve individual researchers. Specifically, the focus is on ease of use, low upfront cost, and shallow semantics that appeals to, and is most likely to be used by the broad community of geoscientists. The choice of using controlled vocabularies and lightweight ontologies, as compared with using formal ontologies in OWL, reduces complexities and training efforts, enabling wider and faster adoption by scientists not skilled in computer science techniques. We propose to use existing, community-ratified and enhanced ontologies that scientists can employ with minimal training to easily annotate (tag) their data, publish it, and discover relevant data in support of scientific discoveries. Coarse-grained annotations can facilitate semantic search, while fine-grained annotations and extraction can be used to create Linked Open Datasets (LOD). 
Using LOD, which is increasingly being adopted by open government and open science initiatives, data can be translated into a form that is readily available, reusable, and amenable to automatic processing, while supporting the conceptual richness of data representation. Our research is aligned with the National Science Foundation's EarthCube initiative.

(2) Expressive search and integration using Geospatial information
We have developed expressive extensions to RDF and SPARQL that associate spatio-temporal information with triples via annotations and employ rich operators to support inferencing [1]. This framework, extended with geospatial knowledge to support spatial semantics, can enable interoperability and complex analysis [2].
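A minimal illustration of the annotated-triple idea, with plain Python standing in for the actual RDF/SPARQL extension; the subjects, predicates, and intervals are invented.

```python
# Sketch: each triple carries a temporal annotation (a validity interval),
# and queries filter by interval overlap rather than by triple pattern alone.

from collections import namedtuple

# A triple annotated with a validity interval (years, for simplicity)
Annotated = namedtuple("Annotated", "subject predicate obj start end")

STORE = [
    Annotated("parcel42", "landCover", "forest",  1990, 2005),
    Annotated("parcel42", "landCover", "urban",   2005, 2020),
    Annotated("parcel7",  "landCover", "wetland", 1980, 2020),
]

def query(predicate, start, end):
    """Match triples whose annotation interval overlaps [start, end]."""
    return [t for t in STORE
            if t.predicate == predicate and t.start <= end and start <= t.end]

# Which land cover statements were valid during 2000-2003?
for t in query("landCover", 2000, 2003):
    print(t.subject, t.obj)
```

The overlap test in `query` is the essential extra operator: a plain triple pattern would return all three statements, while the annotation-aware query excludes the urban cover that only becomes valid in 2005.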
In the context of the Semantic Sensor Web [3], to process multimodal sensor data streams, we have used spatio-temporal context in the Semantic Sensor Observation Service (SemSOS) to aggregate and combine primitive weather sensor data into weather features, and have exploited the Geonames portion of LOD to map place names to GPS coordinates, to locate relevant sensors, and to provide easy-to-use, natural query interfaces [4].

[1] http://knoesis.org/research/semweb/projects/stt/
[2] http://knoesis.org/library/resource.php?id=903
[3] http://knoesis.wright.edu/research/semsci/application_domain/sem_sensor/
[4] http://www.slideshare.net/patniharshal/real-time-semantic-analysis-of-streaming-sensor-data
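The SemSOS-style lookup chain (place name to coordinates to nearest sensor) can be sketched as follows; the gazetteer dictionary is a stub standing in for a Geonames lookup, and all coordinates and sensor ids are illustrative.

```python
# Sketch: resolve a place name to coordinates, then find the nearest sensor
# by great-circle distance. Data below is invented for illustration.

from math import radians, sin, cos, asin, sqrt

GAZETTEER = {  # stand-in for a Geonames query: name -> (lat, lon)
    "Dayton, OH": (39.7589, -84.1916),
    "Columbus, OH": (39.9612, -82.9988),
}

SENSORS = {  # sensor id -> (lat, lon), hypothetical weather stations
    "ws-101": (39.76, -84.19),
    "ws-205": (39.96, -83.00),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))

def nearest_sensor(place):
    """Locate the sensor closest to a named place."""
    coords = GAZETTEER[place]
    return min(SENSORS, key=lambda s: haversine_km(coords, SENSORS[s]))

print(nearest_sensor("Dayton, OH"))
```

The place-name resolution step is what makes the query interface "natural": a user can ask about weather near a city rather than supplying coordinates.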
The Need to Determine Ontology System Requirements for Online Graduate Students
Dalia Varanka
Johns Hopkins University
The Johns Hopkins (JHU) Online Master of Science in Geographic Information Systems (GIS)
offers an entirely online course in Geospatial Ontology and Semantics. Few courses on this
subject have been taught; one exception is the course at James Madison University included in the
INTEROP project. The initial architectural components for this program are in place, but they face
design and implementation challenges before new ontologists can be more fully educated. This brief paper
discusses the state of semantic technology architecture intended to support the geospatial
ontology education process. Though the announced INTEROP workshop objectives specifically
mention spatial data infrastructure (SDI) requirements, these SDIs are intended to be used by
educational institutions as one group of stakeholders. Thus, the identification of educational
objectives is useful to government designers.
The course aims to prepare JHU program graduates with skill sets for a role such as 'Data
Analyst.' The term Data Analyst here denotes someone with a thorough understanding of the
system, especially its data and software, who can expertly solve geospatial information
manipulation and knowledge acquisition tasks. Because students are temporary 'residents' of a
broader academic program, extensive architectural design is not left for them to build and
support, though the system must be easily navigable. Presently (2013), students download
Protégé (BMIR 2013) to their personal computers and are provided with a remote connection to
a SPARQL endpoint, supporting the GeoSPARQL standard, at the U.S. Geological Survey. A
cloud-based (Geo)SPARQL endpoint implementation is being built on JHU networks, but is not
yet operating for semantic technology architecture.
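As an illustration of what a student might issue against such an endpoint, the sketch below builds a GeoSPARQL query. The graph structure and bounding polygon are hypothetical; only the `geo:` and `geof:` namespaces and the `geof:sfWithin` filter function come from the GeoSPARQL standard.

```python
# Sketch of a GeoSPARQL query: select features whose WKT geometry lies
# within a bounding polygon. Endpoint, graph structure, and polygon are
# illustrative assumptions, not the actual USGS data model.

WKT_WINDOW = "POLYGON((-78.9 38.0, -78.9 39.1, -77.7 39.1, -77.7 38.0, -78.9 38.0))"

query = f"""
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?feature ?wkt WHERE {{
  ?feature geo:hasGeometry ?g .
  ?g geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt,
         "{WKT_WINDOW}"^^geo:wktLiteral))
}}
"""

print(query)
```

A query like this is a useful teaching unit on its own: it exercises the vocabulary (`geo:hasGeometry`, `geo:asWKT`), a typed literal, and a topological filter function in a few lines.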
In addition to achieving course objectives, a primary aim of online education is to involve the
student at a high level of interaction, both technically and through discussion. Thus
opportunities are needed for testing design ideas and experiencing immediate feedback. Online
educational systems such as Blackboard have advanced capabilities for technical exchanges, but
qualitative methods are less well researched; such exchanges are crucial in semantic architecture,
which leverages a range of philosophical, linguistic, geographical, and social skills. These skills
are presently the domain of discrete experts, contributing through various ontology media such
as upper ontologies, computational linguistics, logic, social media, and information models.
Educational software designs are required for the cohesive integration of these approaches for
the student. Lastly, these approaches must be shown to integrate with other GIS courses.
An additional objective for the JHU course is to prepare students for potential doctoral-level
research, whether on their own or as part of a research team. Advanced applied ontology skills
would require ontology modeling and linked-data collaboration, sometimes at the international
level. Initially, JHU students have access to Internet-based semantic technology projects.
Unfortunately, solutions for ontology modeling are very hard to find.
The expansion of semantic technology for the benefit of science will require investment in
university information science curricula. The widely used Esri ArcGIS software does not lend
itself to semantic technology, though conversion programs between Esri data formats and RDF
are available. Commercial software with RDF capabilities will probably only be purchased if
semantic technology shows itself to be more capable and more popular than it is at present.
Open source digital solutions present the most promising area of semantic technology
development for university courses.
REFERENCES
Albrecht, J., Derman, B., and Ramasubramanian, L. 2008. Geo-ontology Tools: the Missing
Link. Transactions in GIS, 12 (4): 409-424.
Stanford Center for Biomedical Informatics Research (BMIR). Protégé. Stanford University
School of Medicine. Accessed September 19, 2013 at: http://protege.stanford.edu/.
James W. Wilson, PhD
James Madison University
I would like to attend and can present at the Workshop on Semantics in Geospatial Architectures:
Applications and Implementation. I believe that semantics will play an ever-increasing role in the
development of SDIs. Finding, visualizing, and understanding geospatial data and analytical results is
becoming ever more important and ever more difficult given the growing volume of
heterogeneous data and systems. In very controlled environments (e.g. within a government
organization), standardized ways of storing and encoding data can be accomplished, but not so in the
open environment of the Internet. The development of formal ontologies in different domain areas
can aid in bridging between knowledge areas, and semantic technologies can provide a framework
for interacting with diverse SDI components.
As part of the NSF INTEROP team, I have led the effort to develop an Internet-based GUI to create
GeoSPARQL queries and visualize the results in a web-based map. The tool, GeoQuery, has been
programmed by Dr. Ralph Grove of James Madison University (who is not able to attend the
workshop), and is loosely based on a tool developed by the Center for Excellence in Geographic
Information Science at the USGS. At present, GeoQuery queries a Parliament SPARQL endpoint at
JMU that contains data for the Shenandoah River watershed; however, the tool can be configured by
the user to work on any SPARQL endpoint. I can demonstrate this tool at the workshop if an Internet
connection is available at the location.
In the current testing environment, GeoQuery is used to search and visualize spatial data that is
stored in a triplestore. The tool could also be used to query and visualize metadata that represents
geospatial data, and Dr. Grove and I, along with Dr. Steve Whitmeyer (a geologist at JMU),
have had initial conversations with the GeoPortal team at ESRI to look at ways to enhance the
semantic abilities of their open source Geoportal software. Three areas that have been identified for
possible future discussions are 1) using GeoQuery to query their rudimentary SPARQL endpoint, 2)
invoking an ontology in the search, and 3) invoking an ontology to add meaning to the query results.
Another area that has not been discussed yet is the possibility of using GeoQuery to access data not
stored in a triplestore. Since GeoQuery uses a standards-based mapping interface (OpenLayers), it
could be modified to include querying and visualizing data from standard web-based mapping
applications (e.g. OGC WMS & WFS, ESRI ArcGIS rest services) as well.
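As a sketch of that possibility, the following builds a WFS 2.0 GetFeature request URL that such an extension could issue; the endpoint and feature type name are placeholders, while the request parameters follow the WFS 2.0 key-value-pair encoding.

```python
# Sketch: construct a WFS 2.0 GetFeature request for a bounding box.
# Endpoint URL and type name are placeholder assumptions for illustration.

from urllib.parse import urlencode

def wfs_getfeature_url(endpoint, type_name, bbox):
    """Build a WFS 2.0 GetFeature request URL (KVP encoding) for a bbox."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "bbox": ",".join(str(c) for c in bbox),
        "outputFormat": "application/json",
    }
    return f"{endpoint}?{urlencode(params)}"

url = wfs_getfeature_url(
    "https://example.org/geoserver/wfs",  # placeholder endpoint
    "hydro:shenandoah_streams",           # placeholder feature type
    (-78.9, 38.0, -77.7, 39.1),           # minx, miny, maxx, maxy
)
print(url)
```

Because both the triplestore results and a WFS response can be rendered as vector layers in OpenLayers, mixing the two sources would mainly be a matter of adding a second query path behind the same map interface.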