Decision support on distributed computing environment


Angela Olasz & Binh Nguyen Thai
Institute of Geodesy, Cartography and Remote Sensing (FÖMI), Department of Geoinformation

Department of Cartography and Geoinformatics, Eötvös Loránd University

[email protected], [email protected]

Abstract

It is trivial that data is crucial for the geospatial sector, with a focus on data acquisition, processing, analysis and visualization. There is an exponential increase in the amount of available geospatial information; however, its formats are not unified. The geospatial community, including all segments such as industry, governmental institutes, foundations, authorities, and the academic world itself, has to implement solutions for both data storage and processing. This requires new approaches for the automatic extraction of information from large raw datasets. In archaeological cultural resource management (CRM) there is also a shift from setting up better data acquisition techniques to generating proper modelling methods and cross-validating model results; moreover, the available input data formats for each type of model parameter layer are highly heterogeneous. In archaeological site modelling, distributed file systems and parallel processing environments are also considered a new technical implementation, with new challenges to be solved. Such an environment, e.g. Apache Hadoop, offers the processing of large data sets across clusters of computers using the simple MapReduce programming model. IQmulus is a potential distributed environment in which users can achieve rapid responses to their needs, not only on their own datasets but on others as well, along with distributed processing power for pre-processing and data analysis.

Introduction

It is an unquestionable fact that data is of crucial importance for the geospatial sector, with emphasis on the aspects of data acquisition, distribution, processing, analysis and visualization. There is an exponential increase in the amount of geospatial information available in highly heterogeneous formats. Firstly, in airborne surveys, consider for example the expansion of UAVs (Unmanned Aerial Vehicles) and the hundreds of different sorts of instruments that can be installed on them. Likewise, there are innovations, for instance, in mobile-mapping systems with different kinds of data acquisition techniques combining LiDAR with digital imagery, allowing users to jointly acquire laser point clouds and imagery and load them into GIS systems for applications such as inventories, maintenance and asset management from their office. All these techniques record lots of data, yet in different data models and representations; therefore, the resulting datasets require harmonization and integration before meaningful information can be derived from them. Hereinafter, we discuss predictive modelling methods with current technologies, followed by a brief introduction to distributed systems and the Apache Hadoop distributed environment, and finally a short introduction to the IQmulus project as a solution for a distributed decision support system.

    Objectives

Recent solutions

As input data, all models use environmental factors as thematic layers or derived geostatistically processed data, such as: Elevation, Slope, Aspect, Aspect (north/south axis), Aspect (east/west axis), Solar radiation, Distance to water, Vertical distance to water, Cost distance to water, Distance to water confluences, Topographic variation (DEM, DSM), Soils (proximity to pottery clay), Land use, Distance to historic trading paths, Proximity to firewood, Proximity to transport routes (Wescott and Mehrer 2007).
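As an illustration of how such parameter layers are typically combined into a predictive surface, a minimal multi-criteria overlay can be sketched as follows; the layer names, scores and weights are invented examples, not values from the cited model:

```python
# Minimal sketch of a weighted multi-criteria overlay: each factor layer is a
# grid of suitability scores in [0, 1]; the predictive surface is their
# weighted sum. Layer names and weights are hypothetical, for illustration only.

def weighted_overlay(layers, weights):
    """Combine same-sized score grids into one suitability grid."""
    first = next(iter(layers.values()))
    rows, cols = len(first), len(first[0])
    total = sum(weights.values())
    result = [[0.0] * cols for _ in range(rows)]
    for name, grid in layers.items():
        w = weights[name] / total          # normalise weights to sum to 1
        for r in range(rows):
            for c in range(cols):
                result[r][c] += w * grid[r][c]
    return result

# Two tiny 2x2 example layers: slope suitability and distance-to-water suitability.
layers = {
    "slope":             [[1.0, 0.5], [0.2, 0.0]],
    "distance_to_water": [[0.8, 0.8], [0.4, 0.1]],
}
weights = {"slope": 2.0, "distance_to_water": 1.0}

surface = weighted_overlay(layers, weights)
print(surface)  # each cell is a weighted mean of the factor scores
```

In a real workflow the grids would come from GIS raster layers of identical extent and resolution; the structure of the computation stays the same.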

Decisions can then be made by comparing objectives or, at the lowest level of the hierarchy, by comparing attributes. Attributes are measurable quantities or qualities of a geographical entity or of a relationship between geographical entities. An attribute is a concrete descriptive variable; an objective is a more abstract variable with a specification of the relative desirability of that variable. The attributes used for archaeological predictive modelling are usually a limited number of environmental variables; they can be supplemented with expert knowledge on less measurable variables. In the model created by Soonius and Ankum (1990) (see Figure 1), the environmental variables used were selected by means of a χ² test using the known archaeological sites (Wescott and Mehrer 2007).

    Figure 1: The hierarchical structure of objectives. Source: Soonius and Ankum 1990

The widely used modelling methods are evolving along with the technical progress of data-mining techniques. The modelling methods are based on the following: Bayesian statistics (weights of evidence), regression analysis, multi-criteria analysis, supervised classification, unsupervised classification (cluster analysis), fuzzy logic, artificial neural networks and, recently, random forest modelling.
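The weights-of-evidence method listed above can be sketched in a few lines: for each binary evidence layer it derives a positive and a negative weight from site counts. The counts below are invented for illustration:

```python
import math

# Weights of evidence for one binary evidence layer (e.g. "close to water"):
#   W+ = ln( P(evidence | site)    / P(evidence | no site) )
#   W- = ln( P(no evidence | site) / P(no evidence | no site) )
# The cell counts below are invented, purely illustrative.

def weights_of_evidence(sites_with, sites_without, nonsites_with, nonsites_without):
    sites = sites_with + sites_without
    nonsites = nonsites_with + nonsites_without
    w_plus = math.log((sites_with / sites) / (nonsites_with / nonsites))
    w_minus = math.log((sites_without / sites) / (nonsites_without / nonsites))
    return w_plus, w_minus

# 80 of 100 known sites lie close to water, but only 2000 of 10000 background cells do.
w_plus, w_minus = weights_of_evidence(80, 20, 2000, 8000)
print(w_plus, w_minus)  # a positive W+ means the evidence favours site presence
```

Summing such weights over independent evidence layers gives the posterior log-odds of site presence per cell, which is what makes the method attractive for raster-based predictive maps.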

In the modelling process, the statistical correspondence, either negative or positive, between the input data (locations of the archaeological sites) and the different types of raster data (digital elevation model, soil map) is evaluated.

Distributed approach

Data may arrive from different sources independently and in different formats, resulting in a big heterogeneous set of data. Relational data stores are not prepared for such cases. However, most of the time we are curious whether there is any relation or bond between these data, hence we are looking for interrelationships among data sets. A further characteristic of such data is that it is stored permanently and seldom modified.

In order to use or set up Hadoop, decision makers and developers in particular must know the boundaries of the Hadoop framework. Let us start with its disadvantages:

- It is not a distributed relational database.
- It has security issues (no data caging).
- It is not easy to configure, maintain and use.
- Data management in Hadoop is different from an RDBMS.
- Linux and Java knowledge is needed to understand and use Hadoop.

If any of the above-mentioned criteria is a must in the system requirements, then Hadoop is not the right choice; otherwise, you will get the following benefits by choosing Hadoop:

- designed for storing and querying large data sets;
- archived data can be resumed and used at any time;
- real-time data analysis support;
- batch processing;
- support for ETL (extract-transform-load) processes.

Results

The IQmulus research project's objective is to enable the use of large, heterogeneous geospatial datasets (Big Geospatial Data) for better decision making through a high-volume fusion and analysis information management platform (http://www.iqmulus.eu). The project, funded by the 7th Framework Programme of the European Union, has now completed the first year of its four-year duration. To ensure user-driven development, it implements two test scenarios: Maritime Spatial Planning, and Land Applications for Rapid Response and Territorial Management. The contribution of a large number of users from different geospatial segments, application areas, institutions and countries has already been achieved. The incorporated project partners represent numerous different facets of the geospatial world, ensuring a value-creating process that requires collaboration among academia, authorities, national research institutes and industry. Providing improvements for researchers, developers, academic staff and students allows them to share knowledge and to implement new solutions, methods and ideas in the field of geoinformation through the development and dissemination of a new platform. Furthermore, the research project offers new practices for end users, following the continuous evolution of big geospatial data management requirements. As end users are among those providing geospatial support for decision makers, the whole decision-making process can benefit from these improvements and the adoption of new practices.

Services consist of algorithms and workflows focusing on the following aspects: spatio-temporal data fusion, feature extraction, classification and correlation, multivariate surface generation, and change detection and dynamics. In the following phases, system integration and testing will be carried out, and the IQmulus system will be deployed as a federation of these services based on the two scenarios mentioned above. The actual deployment will be based on a distributed architecture, with platform-level services and also basic processing services invoked from clusters and conventional computing environments. Concerning workflow definition and execution, the concept is to develop a set of domain-specific languages that makes the definition of several types of spatial data processing and analysis tasks simpler. They can be compiled to run on parallel processing environments, such as GPGPUs, Cloud stacks, or MapReduce implementations such as Hadoop. The compiled algorithms can be packaged as declarative services to run on the respective environments they have been compiled for. The Apache Hadoop open-source framework has already been chosen as a basis for architecture development. This framework offers the processing of large data sets across clusters of computers using simple programming models. It supplies next-generation cloud-based thinking and is designed to scale up from single servers to hundreds of workstations offering local computation and storage capacity. A Hadoop Distributed File System (HDFS) has already been installed in a cloud environment, providing high-throughput access to application data. The Hadoop MapReduce system for parallel processing of large data sets is currently under implementation. Thanks to this solution, users can achieve rapid responses to their needs on their own datasets in cloud processing by exploiting the system's capacities.
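The MapReduce programming model itself can be illustrated without Hadoop. The toy, single-process sketch below (not the Hadoop API) counts points per grid cell: mappers emit (cell, 1) pairs, the shuffle groups them by key, and reducers sum each group:

```python
from collections import defaultdict

# Toy illustration of the MapReduce model: count points per grid cell.
# Mappers emit (cell, 1) pairs; the shuffle groups them by key; reducers sum
# each group. Coordinates are invented example data.

def mapper(point, cell_size=10):
    x, y = point
    return (int(x // cell_size), int(y // cell_size)), 1   # key = grid cell

def reducer(key, values):
    return key, sum(values)

def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:                                  # map + shuffle
        key, value = mapper(record)
        groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())   # reduce

points = [(1, 2), (3, 9), (12, 4), (15, 8), (11, 1)]
counts = map_reduce(points, mapper, reducer)
print(counts)  # {(0, 0): 2, (1, 0): 3}
```

Because each map call and each reduce group is independent, Hadoop can distribute both phases across a cluster; that independence is what the framework exploits for scale.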

Despite all the features provided by the Hadoop ecosystem, large raster-based data storage and processing is not efficient on HDFS. In order to solve this problem, further research and investigation is needed into applying segmentation and clustering algorithms to raster images based on HDFS and the MapReduce method.
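One possible direction, sketched here as an assumption rather than as the project's actual implementation, is to split a raster into fixed-size tiles so that each tile becomes an independent record for distributed processing:

```python
# Sketch: split a raster into fixed-size tiles so each tile can be handled as
# an independent record (e.g. by a MapReduce job). Pure-Python illustration;
# real pipelines would use GDAL/rasterio and HDFS-friendly container formats.

def split_into_tiles(raster, tile_size):
    """Yield ((row_off, col_off), tile) pairs covering the whole raster."""
    rows, cols = len(raster), len(raster[0])
    for r in range(0, rows, tile_size):
        for c in range(0, cols, tile_size):
            tile = [row[c:c + tile_size] for row in raster[r:r + tile_size]]
            yield (r, c), tile

raster = [[r * 4 + c for c in range(4)] for r in range(4)]   # 4x4 demo grid
tiles = dict(split_into_tiles(raster, 2))
print(sorted(tiles))   # offsets: [(0, 0), (0, 2), (2, 0), (2, 2)]
print(tiles[(2, 2)])   # bottom-right tile: [[10, 11], [14, 15]]
```

Segmentation or clustering could then run per tile in the map phase, with a reduce phase merging results across tile borders, which is where the real difficulty of raster processing on HDFS lies.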

Final thoughts

In conclusion, by working on the above-mentioned open issues, IQmulus aims at the development and integration of an infrastructure and platform that will support critical decision-making processes in geospatial applications, using open-source solutions at the first opportunity. It is obvious that IQmulus is not applicable to every decision support problem. Achievements and further development directions, with benchmark data, will be presented in the paper through a concrete application example arising from user requirements.

References

[1] Ducke B. Archaeological predictive modelling in intelligent network structures. In M. Doerr and A. Sarris (eds.), Proceedings of the 29th CAA Conference held at Heraklion, 2002.

[2] Han J. and Kamber M. Data Mining: Concepts and Techniques. Morgan Kaufmann, USA, pages 105–124, 2001.

[3] Kohler T. A. and Parker S. C. Predictive models for archaeological resource location. In M. B. Schiffer (ed.), Advances in Archaeological Method and Theory 9, pages 397–452, 1986.

[4] Cohn M. User Stories Applied: For Agile Software Development. 2004.

[5] Wescott K. L. and Mehrer M. W. GIS and Archaeological Site Location Modeling. Taylor and Francis, FL, pages 58–89, 2007.

[6] Märker M. and Heydari-Guran S. Application of data-mining technologies to predict Palaeolithic site locations in the Zagros Mountains of Iran. Online Proceedings of the 37th International Conference (CAA), 2009.

[7] Deeben J., Hallewas D. P. and Maarleveld Th. J. Predictive modelling in archaeological heritage management of the Netherlands: the Indicative Map of Archaeological Values (2nd generation). pages 125, 2002.

[8] Verhagen P. Case Studies in Archaeological Predictive Modelling. Leiden University Press, pages 13–29, 2007.

[9] Verhagen P. Archaeological Prediction and Risk Management. Leiden University Press, 2009.

[10] IQmulus Project. https://www.iqmulus.eu/, 2012.

[11] Crutchley S. The light fantastic: using airborne LiDAR in archaeological survey. In Wagner W. and Székely B. (eds.), ISPRS TC VII Symposium: 100 Years ISPRS, 2010.

[12] Robertson S. and Robertson J. C. Mastering the Requirements Process, 2nd edition. 2006.

[13] Wheatley D. and Gillings M. Spatial Technology and Archaeology: The Archaeological Applications of GIS. Taylor and Francis, FL, 2002.