
City University

Distributing Aggregated Census Data to higher geographic resolutions using GIS datasets

Ian David Scott

January 2009

Submitted in partial fulfilment of the requirements for the degree of MSc in Geographic Information Systems


Abstract

Many companies and governments collect census-style data and use it for various forms of analysis. When the results are released to clients or the public, the companies and governments aggregate the census data to protect their own investments and the privacy of the population under census. Disaggregating this data maintains its anonymity while allowing clients and the public to perform more detailed and spatially accurate analyses. This dissertation looks at methods for disaggregating census data and insurance risk figures on a worldwide scale using freely available GIS datasets. It also provides an automatic Disaggregation Tool that runs inside ESRI ArcGIS (version 9.2 and above, with the Spatial Analyst extension), along with instructions on how to customise it for different locations and different types of insurance risk. The Disaggregation Tool uses features from a DEM dataset, a Slope dataset, a Land Use/Cover dataset and an Impervious Surface Areas dataset to disaggregate residential, commercial, agricultural and industrial insurance risks based on shapefile boundaries. The resulting dataset is provided as an ASCII file in a user-defined gridded format and resolution. The Disaggregation Tool has been configured for commercial insurance risks and has templates for four other insurance risk categories: apartments, agricultural, industrial and single-family homes.

Keywords: Disaggregation, Distribution, ArcGIS, Remote Sensing, census, GIS, insurance risk


Dedication

To my little Sister Moira I am honoured to be your brother

You touched so many lives and shone so brightly I will miss you so much


Table of Contents

Title Page
Abstract
  Keywords
Dedication
Table of Contents
Table of Diagrams, Graphs and Pictures
Acknowledgements
Introduction
Literature Review
Source Data
  USGS Digital Elevation Model (DEM)
  Derived Slope Dataset
  USGS Land Use/Land Cover
  NOAA/NGDC Impervious Surface Areas (ISA)
  Optional Miscellaneous Data Layer
A Guide to Disaggregating Data
Methods of Disaggregation
Detailed Analysis of a Grid Cell
Methodology for Automatic Disaggregator
Obtaining the Most Accurate Distribution of Risks
Determining Remap Values
Analysing Results
The Disaggregator: Program Operation
  Step 1
  Step 2
  Step 3
  Step 4
  Program Finished
  Default Program Assumptions
The Disaggregator: Program Order of Operation
Recommendations for Enhancement
References
Appendix
  The Disaggregator: subroutine and function details


Table of Diagrams, Graphs and Pictures

Figure 1 - USGS Land Use/Land Cover System Legend (Modified Level 2)
Figure 2 - Hypothetical District
Figure 3 - Pixelated District
Figure 4 - District Land Cover
Figure 5 - District Suitability
Figure 6 - Normalised District
Figure 7 - Disaggregated District
Figure 8 - Aerial Photograph of One Grid Cell (Google Maps)
Figure 9 - East Side - Looking East (Microsoft Live Earth 1)
Figure 10 - North Edge - Looking South (Microsoft Live Earth 2)
Figure 11 - Centre - Looking North (Microsoft Live Earth 3)
Figure 12 - Japan (Shown as Land Use / Land Cover)
Figure 13 - Graphs of Total Commercial Risks in Japan Plotted against Topology
Figure 14 - Commercial Risks in Japan Plotted against Topology by Unit Area
Figure 15 - Tokyo and Surrounding Districts
Figure 17 - DEM and Slope Datasets
Figure 18 - Graph of Commercial Risks against DEM by unit area
Figure 19 - Graph of Commercial Risks against Slope by unit area
Figure 20 - Land Use/Cover and Modified Land Use/Cover datasets
Figure 21 - Graph of Commercial risks against Land Use
Figure 22 - ISA and Smoothed ISA datasets
Figure 23 - Graph of Commercial Risks against Impervious Surface Areas
Figure 24 - Tokyo: Surveyed and Disaggregated Commercial Risks
Figure 25 - ArcMap with Disaggregation Tool
Figure 26 - Disaggregation Tool Step 1
Figure 27 - Disaggregation Tool Step 2
Figure 28 - Disaggregation Tool Step 3
Figure 29 - Disaggregation Tool Step 4
Figure 30 - Disaggregation Tool Finished
Figure 31 - ArcMap showing Step 3 raster output


Acknowledgements

I would like to thank my fiancée Amanda, whose support has been unwavering and whose belief in me has kept me going throughout my Master's degree, but mainly I would like to thank her for saying yes. Thanks also to my family, who have been so enthusiastic in supporting and encouraging me throughout my degree, as well as looking after all my belongings while I was enjoying the best Boston has to offer. Thanks to my cat, Esme, for only biting me when I had been working so long I forgot to feed her. Many thanks to AIR Worldwide Corporation and especially to the Exposures Group for making me feel welcome in Boston during my internship and helping me design and test my disaggregation program. Thanks to my supervisor Professor Jonathan Raper, especially for his assistance when I had to return to the UK unexpectedly, and to all the staff in the School of Informatics and Placement Unit who assisted me while on my internship. Lastly, thanks to my friends, both in City University and outside, who knew when to distract me from my work and when to let me get on with it.


Introduction

When census data is made available to the public, the data is aggregated to different levels such as postcode or county level. This is done to provide a degree of anonymity to the people represented by the census. Aggregated census data is bought and analysed by insurance and reinsurance companies as part of their policy-pricing procedures. Aggregated data can give reasonably accurate results, but due to the large amounts of money being allocated based on these results, companies are naturally keen to reduce the level of aggregation as much as possible. Disaggregating aggregated census data can lead to a more accurate representation of the spatial details of a population. The disaggregation process utilises datasets of the geographic locations covered by the census data and attempts to distribute population counts to the most likely areas. The census data used with this dissertation's program consists of the number of insurance risks of certain types. In some cases these will be buildings, such as the number of single-family homes, while in other cases there could be multiple risks within a single building, for example apartments within a block of flats or commercial units within an office block. The insurance policies and premiums for these risks are calculated by insurance companies. The spatial location of these risks is a large factor in these policies, especially when considering potential loss from natural disasters. Insurance companies aggregate their own data before passing it on to reinsurance companies as a way of protecting their commercial knowledge base. Reinsurance companies attempt to disaggregate this data themselves to better measure the underlying risks and model potential loss situations (such as hurricanes, floods, terrorism, etc.).

This dissertation provides an automatic method for spatially disaggregating risks by analysing topological datasets (such as elevation and impervious surface area) to determine the likely spatial locations of the risks. The program is called a Disaggregation Tool (or a Disaggregator). An aggregated collection of risks might be located within a large spatial boundary area (such as a county or district). Catastrophe modelling programs cannot determine the exact locations of these risks, so they are assigned to the central point of the area or uniformly distributed across it. The Disaggregation Tool distributes the risks inside the boundary areas depending on the topological features. The AIR Worldwide (AIR 2008) Exposures group has previously attempted to create a disaggregation methodology for manual user operation of ArcGIS functions. The early results indicated the disaggregation was feasible but time-consuming: the procedure would have to be run numerous times, weighting the different inputs to a variety of degrees, before an optimum output could be reached. The early data sources were confined to a small area of Central America and were highly optimised for this area alone. For this dissertation, the scope was to create an automatic program running inside ArcGIS that would utilise freely available worldwide datasets and an AIR-provided shapefile (outlining the district boundaries) with census data (i.e. the risks), to output an integer raster dataset plus an ASCII file of x,y data points, in a user-defined resolution not greater than 1/120 of a decimal degree, where each pixel's value indicates the number of risks at that spatial location.


Literature Review

Disaggregating aggregated census data is performed in many industries with a wide variety of methods. These methods are tailored to the available input data, the purpose of the resulting data and the degree of accuracy required. An example is an advertising firm: it might require details of the distribution of age groups over a city so it can better target its clients' products. In contrast, a collection of retail companies planning to open a new collection of stores outside of town in a new retail park could need data indicating car ownership and population density for the surrounding areas to help them find the optimum location for the retail park. In either situation, the clients (i.e. the advertising firm or the retail companies) must weigh the potential monetary benefits of buying expensive, accurate, minimally aggregated census data against buying cheaper, more generally aggregated data. More accurate data could provide better spatial locations, but the cost of the data might outweigh any benefits. The AIR Exposures group is primarily interested in determining insurance risk locations to enter into their catastrophe modelling programs: more accurate locations mean more accurate loss predictions. Hofstee and Islam (2004) explain the need for accurate population location data, as well as the non-uniform distribution of people inside census wards. They compare satellite imagery and aerial photographs with surveyed ground data and conclude that, for the relatively small areas they were studying, ground surveys are necessary to gather specific building information. This dissertation is focused on providing a disaggregation method for areas where no ground surveying has been performed. Voss, Long and Hammer (1999) describe two spatial interpolation techniques that use road networks to transfer aggregated demographic characteristics from one type of geographic boundary (i.e. the geographic hierarchy of the U.S. Census) to another (e.g. watersheds) under conditions of "spatial incongruity". The road-network-based disaggregation has some very positive aspects: automatically locating clusters of nodes, i.e. road intersections, should correspond with high densities of people. Voss et al. describe developing this in an earlier version of ArcMap (v3.1). This by itself would make it an ideal partner for the Disaggregation Tool in this dissertation, except for problems with the source data. The only freely available worldwide road network is the VMap Level 0 dataset (based on the Digital Chart of the World). The problem is that at a scale of approximately 1km, the intersections imply a level of accuracy ('x' marks the spot, so to speak) while the data has an error range of 1km. If the VMap Level 1 data (at 0.25km resolution) is ever made freely available, it would be of great benefit to the Disaggregator using some of Voss et al.'s techniques. This is a similar case with Xie (1995), who details procedures for using road network intersections, but the output accuracy is only as good as the input data. The AIR Exposures group decided to pursue these methods only if the more accurate road maps became readily available. Many of the papers on disaggregation focus on relatively small areas with high-resolution datasets and aerial photography. Many of these techniques give disaggregated data with high degrees of accuracy, but the methodologies are not easily transferable to different worldwide locations. Miller et al. (2002) do not try to disaggregate census data, but they do compile a database of worldwide datasets with the aim of combining them to determine water and food balances in Africa, to better prepare for emergencies. Many of the worldwide datasets listed by Miller et al. were considered for inclusion in this dissertation's Disaggregation Tool before the three source datasets and one derived dataset were chosen.


Source Data: Review and Preparation

USGS Digital Elevation Model (DEM)

Global 30 Arc-Second Elevation Dataset (GTOPO30 website)

The DEM dataset has three uses:

• Preventing the Disaggregator from distributing risks to high elevations that are unlikely to contain commercial, residential or industrial risks.

• Forming the source for creating a slope dataset. The slope dataset will be used for a similar purpose to the first elevation use.

• Preventing the Disaggregator from distributing risks to large water bodies. This function is also performed by other datasets.

The source is provided broken into 33 tiles (6 covering the Antarctic, 27 covering the rest of the world) to allow for easier downloading (if someone does not want the entire world, they can download just the local tiles needed). The data has a horizontal grid spacing of 30 arc-seconds, which is approximately 1km; due to the geographic projection, the actual distance changes depending on latitude. Each tile is archived as one file, e.g. w060n90.tar.gz. Each archive contains 8 files, e.g.: W060N90.DEM, W060N90.DMW, W060N90.GIF, W060N90.HDR, W060N90.PRJ, W060N90.SCH, W060N90.SRC, W060N90.STX

To prepare the files for use in the Disaggregator, the following procedure was taken:

• The files are in Band Interleaved by Line (BIL) format but needed to be renamed from .DEM to .BIL for ArcGIS to recognise them.
• All tiles are opened in ArcGIS -> ArcMap.
• From the ArcToolbox, the Data Management Tools -> Raster -> Mosaic to New Raster function is selected.
• All tiles are added and an output location is specified.
• The output format is set to ESRI GRID.
• The projection is set as GCS_WGS_1984.
• Pixel depth is 32-bit signed integer.
• The cell size is left unchanged at 0.0083333333, 0.0083333333.
• Number of bands is 1.
• Mosaic Method is First (although there should not be any overlapping).
• Mosaic Colormap Mode is left as First.

The resulting raster has the following properties:

• Water denoted by a value of: 55537 or -9999
• Land below sea level denoted by: 65000 to 65536 (where water is 55537) or -200 to -1 (where water is -9999)

The following steps are run:

• All areas marked as water are set to: 0
• All areas marked as land below sea level are set to: 1
• Ensure the resolution is 0.0083333333, 0.0083333333 (1/120)
• Save as a 32-bit unsigned integer GRID-formatted dataset.
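The reclassification steps above can be sketched outside ArcGIS as well. The following NumPy fragment is illustrative only (the actual preparation was done with ArcMap tools); it uses the water value 55537 and the below-sea-level range 65000-65536 quoted above for the unsigned reading of the mosaic.

```python
import numpy as np

def reclassify_dem(dem: np.ndarray) -> np.ndarray:
    """Reclassify a GTOPO30-style unsigned DEM mosaic:
    water -> 0, land below sea level -> 1, other cells unchanged."""
    out = dem.astype(np.int64)
    out[dem == 55537] = 0                      # water mask
    out[(dem >= 65000) & (dem <= 65536)] = 1   # land below sea level
    return out

# Tiny illustrative tile: sea, below-sea-level land, two normal elevations.
tile = np.array([[55537, 65100], [12, 340]], dtype=np.uint32)
print(reclassify_dem(tile))
```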

The resulting dataset, named GRID_DEM, is ready for use as a source for the Disaggregator.

Limitations of USGS DEM

Areas of dry land that are beneath sea level are all set to 1m elevation. This prevents any meaningful slope data or distinguishing features from being shown for these areas. The decision to flatten these areas was taken after discussion with the AIR Exposures team: the decision for distributing residential, commercial or industrial risks to these areas will not be based on elevation, and the slopes in these areas would not be significant enough to affect the results. All that is needed is a land value greater than sea level, 1 metre. The geographic projection means the dataset is not equal-area. The AIR Exposures group did not require equal-area data. The gridded dataset output by the Disaggregator allows risk numbers to be compared without concern for actual area: one grid cell can be compared directly with another, as they have been created using the same format and methodology and represent equal 'units', though not equal area. The DEM was compiled by the USGS over 3 years from 9 source datasets using a variety of methods. The USGS provides (USGS GTOPO30 Website) the following table to show the data sources and the approximate percentage each comprises of the DEM.

Source                                                  % of global land area
Digital Terrain Elevation Data                          50.0
Digital Chart of the World                              29.9
USGS 1-degree DEMs                                       6.7
Army Map Service 1:1,000,000-scale maps                  1.1
International Map of the World 1:1,000,000-scale maps    3.7
Peru 1:1,000,000-scale map                               0.1
New Zealand DEM                                          0.2
Antarctic Digital Database                               8.3

A mask of the world's oceans was also used. The USGS follows this table with a description of the various data sources and an overview of the methods employed to process and combine them. Where the source data was of higher resolution than the target, it was resampled to 30-arc-second horizontal grid spacing before being combined with the other datasets. The USGS do cite a variety of sources detailing their interpolation and other processing techniques to allay any uncertainty about the accuracy of the DEM.

Derived Slope Dataset

During Step 1, the Disaggregator creates a Slope dataset based on the DEM source dataset. The Spatial Analyst ArcObjects command RasterSurfaceOp.Slope() is used with a calculated Z factor. The Z factor is the ratio between the x,y resolution and the units along the z axis: the x,y axes are in decimal degrees based on arcs while the z axis (the elevation) is in metres. The following algorithm (ESRI Support Centre, Z factor) works out the ratio:

• Determine the middle latitude of the area of interest.
• Convert that degree value to radians: 1 degree = 0.0174532925 radians.
• Use the value in radians in the following equation: Z factor = 1.0 / (113200 * cos(<input latitude in radians>))
• Use this calculated Z factor in the slope tool.

The Slope dataset is used to prevent the Disaggregator from distributing risks to areas deemed too steep for the specified type of risk. For example, it is unlikely to find single-family homes on slopes steeper than 20°.
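As a worked example, the Z factor procedure above can be expressed as a short function. This is a sketch only; the constant 113200 and the degrees-to-radians factor are taken directly from the ESRI note quoted above, and the function name is my own.

```python
import math

def z_factor(mid_latitude_deg: float) -> float:
    """Z factor for slope calculation on a degree-based (geographic) DEM,
    following the ESRI procedure quoted above."""
    lat_rad = mid_latitude_deg * 0.0174532925  # degrees -> radians
    return 1.0 / (113200 * math.cos(lat_rad))

# At the equator, cos(0) = 1, so the factor is simply 1/113200.
print(z_factor(0.0))
```

Because the cosine shrinks towards the poles, the factor grows with latitude, matching the limitation noted below: a single Z factor is only exact at one latitude.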


Limitations of Derived Slope

The resulting Z factor is ideal for the centre of the slope map and becomes less ideal further north and south, and an obvious deficiency in the slope map is apparent because of this. If a large area stretches over many degrees of latitude, the resulting slope map will have large inaccuracies towards the top and bottom edges. To address this issue, the Disaggregator creates the slope map after the DEM has been cropped to the area of interest. The AIR Exposures group (the primary users of the Disaggregation Tool) have also been advised of this issue, and it has been recommended to run the Tool on smaller areas to minimise the problem. The resolution of the dataset (approximately 1km) is a large area for a single slope value. Inside a square kilometre there could be a short, steep slope that makes the area undesirable to build in but is not significant enough to affect the overall slope value. Similarly, a series of narrow but steep hills could give a net slope of zero while still being an undesirable building area.

USGS Land Use/Land Cover

(USGS Global Land Cover Characteristics Data Base Version 2.0) 1 file: gusgs2_0ll.img

Using ArcGIS -> ArcMap, this was converted to a 32-bit unsigned integer GRID-formatted dataset with a resolution of 0.0083333333, 0.0083333333 (1/120), which equals the source 30-arc-second horizontal grid. The dataset is part Land Use and part Land Cover; it will be referred to as Land Use throughout this dissertation for brevity. The USGS states (USGS Land Cover Website) that the Land Use dataset was produced from 1-km Advanced Very High Resolution Radiometer (AVHRR) data spanning a 12-month period (April 1992 - March 1993).

Figure 1 - USGS Land Use/Land Cover System Legend (Modified Level 2)

The Land Use dataset has three main uses:

• Emphasising the location of urban areas
• Preventing the Disaggregator from distributing risks to water bodies


• Identifying land covers, other than urban, which contain risks.

Limitations of USGS Land Use

The dataset is based on relatively old data: at least 15 years have passed, and many features of the Land Use could have changed during this time. While it is unlikely for urban areas to shrink, expansion is very likely, as cities tend to grow with world population. The Disaggregator does expand all urban areas during processing, but this does not reflect any measured expansion. The USGS provides details (USGS Land Cover Website) of the data capture and analysis methods, but the resulting Land Use dataset has only one category for urban areas. As the Disaggregator needs to distribute risks such as residential and industrial outside of city centres, more urban categories would be helpful. In its current form, the Land Use can be used to determine city and large town centres, while other data sources are needed to help locate smaller towns, suburban residential areas and industrial parks. There are only 24 categories (plus 'no data'), and of these only one describes urban land. Surveyed data shows many residential and some commercial risks exist in areas classified not as urban but as cropland or grassland, though not in all cropland or grassland areas.

NOAA/NGDC Impervious Surface Areas (ISA)

Global Distribution and Density of Constructed Impervious Surface Areas (NOAA/NGDC ISA Website). ISA in GCS (WGS84), ENVI format (compressed with tar/gzip). 2 files: ngdc_isa_gcs_product.dat.hdr, ngdc_isa_gcs_product.dat. Values are in the range -1.000000 to 100.000000, with the DAT file in ENVI format. The following changes are made:

• All values of -1.000000 are set to -1
• Everything else is multiplied by 1,000,000
• The dataset is set to integer
• The dataset is converted to GRID format
• Ensure the resolution is 0.00833333333333, 0.00833333333333 (1/120)

Converting the file to integer greatly reduces processing time when running the Disaggregation Tool, and multiplying the source by 1,000,000 keeps the source precision. The value of -1 is a uniform mask over bodies of water at sea level and large lakes. The ISA was produced using data from satellite-observed night-time lights (from the years 2000-2001) and population counts. The data production was trained on a 30m-resolution ISA of the USA derived from Landsat images provided by the USGS. The ISA dataset is the most important of the four source datasets.
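The ISA conversion steps can be sketched in NumPy as follows. This is illustrative only (the actual conversion was performed in ArcGIS); the function name is my own, and the scaling factor and water value are those described above.

```python
import numpy as np

def isa_to_integer(isa: np.ndarray) -> np.ndarray:
    """Scale ISA percentages (floats, -1.0 = water mask) to integers,
    keeping six decimal places of source precision."""
    out = np.empty(isa.shape, dtype=np.int64)
    water = isa == -1.0
    out[water] = -1                                # water mask stays -1
    out[~water] = np.round(isa[~water] * 1_000_000)
    return out

# Illustrative values: water, bare ground, partial cover, fully impervious.
sample = np.array([-1.0, 0.0, 12.345678, 100.0])
print(isa_to_integer(sample))
```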

• It provides location and magnitude information of cities, towns and villages where most commercial risks will be located. This in turn can suggest suburban locations for most residential risks and in turn, rural areas for agricultural risks.

• It is used to prevent the Disaggregator from distributing risks to large water bodies (which all have a value of ‘-1’).

Limitations of ISA

The ISA data is based on urban patterns in the USA. It is noted by Elvidge et al. (2007) that while China has more ISA than any other country, it has far less ISA per person than the USA. Elvidge et al. had to make adjustments around the world for different styles of living.


They had to cope with underestimated ISA in areas with lots of trees and overestimations in barren areas. The ISA also used population counts; some governments record these based on where people sleep, which would lead to low numbers in city centres where ISA values are typically high. Elvidge et al. (2007) state "ISA is a function of population, level of economic development, and the availability of surfaces suitable for building". ISA values are less likely to spatially indicate areas of low population and low economic development. This is not a major problem for AIR or insurance companies, as most of the value of their commercial and residential risks will be in high-population areas of high economic development. The night-lights data the ISA is based on has problems such as glare, reflection (especially off ice and water) and 'blooming' (where a bright light is received by numerous pixels in the camera and appears to cover a larger area than in reality) that are difficult to filter out automatically.

Optional Miscellaneous Data Layer

A user-specified data layer. The Disaggregator has been designed to allow a fourth dataset to be included. It must be in the GCS WGS84 geographic projection and in ESRI GRID format. If its format does not match the other source data, the Disaggregator will convert it during an ESRI RasterMathOps.Times() function; the results of this conversion are unpredictable, so it is recommended the user resample their data before running the Disaggregator. The dataset will need to be added as a layer to ArcMap. If it is named grid_misc then the Disaggregator will add its name automatically. The Disaggregation Tool will use the MISC version of the remapping files (e.g. APR-MISC-Remap.txt, COM-MISC-Remap.txt...), which the user must configure before running the Disaggregator.
An example of use: AIR Exposures might buy a more recent, higher-resolution Land Cover dataset that is not worldwide, such as the CORINE Land Cover dataset (discussed in Steinnocher et al. 2006). This recent 100m-resolution Land Cover has multiple categories for urban areas and can significantly improve the Disaggregator's distribution of European risks. When using the CORINE dataset, the user would reduce the USGS Land Use dataset's weighting to near zero while giving the miscellaneous dataset a more significant weighting.
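The reweighting idea can be sketched as a simple weighted combination of suitability grids. The layer names and weight values below are illustrative stand-ins, not the Tool's configured defaults, and the Tool itself performs this with raster operations inside ArcGIS.

```python
import numpy as np

def combine_suitability(layers: dict, weights: dict) -> np.ndarray:
    """Weighted sum of per-layer suitability grids (all the same shape).
    Layers missing from `weights` contribute nothing."""
    total = np.zeros_like(next(iter(layers.values())), dtype=float)
    for name, grid in layers.items():
        total += weights.get(name, 0.0) * grid
    return total

# Illustrative 2x2 suitability grids for two layers.
land_use = np.array([[1.0, 0.0], [0.5, 1.0]])
corine   = np.array([[0.9, 0.1], [0.7, 0.8]])
# Weight the coarse Land Use near zero, the finer CORINE-style layer strongly.
combined = combine_suitability({"land_use": land_use, "misc": corine},
                               {"land_use": 0.05, "misc": 0.95})
print(combined)
```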


A Guide to Disaggregating Data

In the diagram below, the red line is a hypothetical shapefile denoting a political boundary such as a district (Figure 2). The squares represent a raster at a certain resolution. Every pixel whose midpoint lies within the district has been marked with an 'X'. In this example, the district census states there are 5000 residential houses in this district. There are 48 pixels in this district at this resolution. The resulting pixelated district is shown to the right (Figure 3).

Figure 2 - Hypothetical District
Figure 3 - Pixelated District

One method of disaggregation would be to assume an even distribution of residential buildings across the district. In this case, each of the 48 pixels would be labelled with 104.17 residential risks. As the risk models work with integer values, this would be 104 risks per pixel, with 8 risks lost. Another method is to locate the centre point of the district, allocate a large proportion of the risks to that pixel, then distribute the rest in reducing numbers based on their distance from the centre point. Neither of these methods reflects the actual spatial locations of the risks in the real world (except on a few coincidental occasions). Analysing the topology of the land should lead to a more accurate placing of the risks. Figure 4 is a simplified Land Use / Land Cover map of the district and surrounding area. The district boundary is still shown as the black line.
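The arithmetic of the even-distribution baseline can be checked directly:

```python
# Even-distribution baseline from the text: 5000 risks over 48 pixels.
risks, pixels = 5000, 48
per_pixel = risks / pixels        # 104.1666... risks per pixel
int_per_pixel = risks // pixels   # truncated to 104 for integer risk models
lost = risks - int_per_pixel * pixels
print(int_per_pixel, lost)        # 104 risks per pixel, 8 risks lost
```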

Figure 4 - District Land Cover


By looking at the details of the Land Use, accurate predictions about the probable locations of the residential risks can be made. If this were the only dataset on which to base the disaggregation, then the following assumptions could be made.

Assumptions (Land Use: Chance of Residential):
Urban: High (H)
Mountain: Zero (0)
Cropland: Low (L)
Forest: Low (L)
Water: Zero (0)
Cropland 2 pixels from city: Medium (M)
Cropland 1 pixel from urban: Medium (M)
Forest 1 pixel from urban: Medium (M)

The last three assumptions are based on the idea that a Land Use dataset at 1km2 records the majority land use but does not indicate what else occurs in that area. Cities and, to a lesser extent, towns tend to affect the land use surrounding them. Assigning a medium chance of residential risks to the land surrounding the city and town takes into account dispersed sub-urban areas. Residential areas also tend to expand further into open areas of cropland than into forested land. The diagram below (Figure 5) shows the results of these assumptions.

Figure 5 - District Suitability
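The neighbourhood-based assumptions above (low-suitability cropland or forest near an urban pixel upgraded to Medium) can be sketched as follows. The single-letter grid codes and the three-by-three example grid are illustrative, not the tool's own encoding:

```python
# Sketch of the neighbourhood rules: cropland or forest pixels within one
# pixel of an urban pixel are upgraded from Low to Medium suitability.
URBAN, CROPLAND, FOREST, WATER, MOUNTAIN = "U", "C", "F", "W", "M"

land_use = [
    [CROPLAND, CROPLAND, FOREST],
    [CROPLAND, URBAN,    FOREST],
    [WATER,    CROPLAND, MOUNTAIN],
]

base_suitability = {URBAN: "H", CROPLAND: "L", FOREST: "L",
                    WATER: "0", MOUNTAIN: "0"}

def near_urban(grid, r, c, radius=1):
    """True if any pixel within `radius` of (r, c) is urban."""
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
                if grid[rr][cc] == URBAN:
                    return True
    return False

suitability = []
for r, row in enumerate(land_use):
    out_row = []
    for c, cell in enumerate(row):
        s = base_suitability[cell]
        # Low-suitability land near an urban pixel becomes Medium.
        if s == "L" and near_urban(land_use, r, c):
            s = "M"
        out_row.append(s)
    suitability.append(out_row)
```

The "cropland 2 pixels from city" rule would simply call `near_urban` again with `radius=2` for city-class pixels.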

It is worth noting that near the lower central area of the district there is some cropland that does not appear to be near enough to the town to be affected, but it is affected by the parts of the town outside the boundary of the district. This shows the importance of extending the analysis beyond borders where possible. Land use beyond the borders of a district can have far reaching effects on the dispersion of residential and commercial risks. In the full version of the Disaggregation Tool, there are many variations on the low, medium and high suitabilities shown in figure 5. Remapping the topological features of a single pixel area to a single suitability number requires detailed analysis of building location trends. It is likely numerous variations of these remapped suitabilities will be needed during testing, depending on where in the world the district is located. Changing the high, medium and low chances into probabilities and normalising across the district (so all probabilities add up to 100%) produces the next diagram (Figure 6). Distributing the 5000 residential risks leads to the final diagrammatic raster showing the risks disaggregated across the district (Figure 7).
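The normalisation step described above can be sketched directly: suitability scores in a district are divided by their total so they become probabilities summing to 1, and the aggregated risk count is multiplied through. The scores below are illustrative stand-ins for the high/medium/low chances:

```python
# Normalise suitability scores within one district and distribute the
# aggregated risk count. Scores are illustrative, one entry per pixel.
scores = [100, 50, 50, 10, 0, 40]
total_risks = 5000

total_score = sum(scores)
probabilities = [s / total_score for s in scores]   # now sums to 1.0

risks_per_pixel = [p * total_risks for p in probabilities]
```

A pixel with a zero score receives zero risks, and the per-pixel totals still sum to the district's census figure (before any integer rounding).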


Figure 6 - Normalised District
Figure 7 - Disaggregated District

Totalling all the pixel values for houses gives 5003 residential risks. The extra 3 comes from the requirement to have integer values. The 3 extra risks are only a 0.06% increase, which is within the accepted margin of error in the AIR Worldwide Exposures group guidelines for a diagram (the output from step 3 of the Disaggregator). The actual data to be used in AIR’s models must have the exact number of risks, so the Disaggregator will remove 3 risks from the top three pixels when ordered by risk number (the output of step 4). The resulting raster indicates high potential risk areas only in the top left and bottom right corners. Reinsurers need only plan for large losses if natural disasters occur in those areas, meaning lower premiums for the other areas within the district. The full version of the Disaggregator uses a combination of four source datasets (instead of the one in this example) to give greater accuracy. For example, in the land use dataset, large areas are labelled as mountains right next to the city. Impervious surface area, slope and elevation data might indicate the city extends into the mountains but not so far through the forest or croplands. This would lead to a different distribution of the risks and, therefore, different recommendations to the insurers and re-insurers.
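The correction described above, removing the rounding surplus from the largest pixels, can be sketched as follows (the pixel values are illustrative but sum to 5003 as in the example):

```python
# After rounding, the district holds 5003 risks instead of 5000, so one risk
# is removed from each of the top pixels when ordered by risk count.
target = 5000
pixels = [1200, 900, 903, 700, 650, 650]   # illustrative rounded values

surplus = sum(pixels) - target             # 3 extra risks
# Indices of the largest pixels, one removal each.
order = sorted(range(len(pixels)), key=lambda i: pixels[i], reverse=True)
for i in order[:surplus]:
    pixels[i] -= 1

assert sum(pixels) == target
```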


Methods of Disaggregation

Ground Proofing – Walking around an area, counting buildings and investigating how many risks an insurance company has there is very time consuming, expensive and not feasible for a worldwide dataset. Analysing a small area and applying the results elsewhere is more feasible but still impractical due to the distances involved in obtaining a large enough sample set. Such surveyed data can, however, be used to train automatic disaggregation routines.

Aerial Photography and Manual Counting – Google Maps, Microsoft Live Earth, Yahoo Maps and others provide high resolution aerial pictures of a large percentage of the Earth’s surface. Additionally, Google provides street level views for many locations while Microsoft provides oblique angle aerial photos for many locations. By analysing these pictures, the number of buildings in a given area can be determined. An example of this manual process is described below (Detailed Analysis of a Grid Cell) along with the limitations of such an approach. Some companies (Zhu et al. 2008 and Eguchi et al. 2008) are developing software that can automatically analyse these photos to determine the number and type of structures. However, these software products are not freely available, they are still experimental, and the Google and Microsoft images are not freely available to commercial enterprises like AIR Worldwide.

Analyse commercially obtained ground proofed data – Ground proofed data is not freely available, but fortunately this is where AIR Worldwide have moved beyond their original remit of freely available data. A gridded dataset of the whole of Japan, with the number of commercial, residential and agricultural risks in each grid cell, was made available. By mapping this to the same resolution and projection as the source data, an analysis could be performed. The section Analysing Results details this analysis.
Taking the results and optimising the remapping against the Japanese data resulted in a program that could predict risk spatial locations to an AIR acceptable accuracy level (discussed in Analysing Results). The problem with this is that other countries in the world vary in topology from Japan. In Japan, few residential risks occur above an altitude of 400m and there are virtually zero over 1000m; in other countries, such as Chile, there are cities above these altitudes. In Japan, there is a large amount of commercial risk on land use type 8 (code 321, Shrubland) and none on land use type 7 (code 311, Grassland); in fact very little land in Japan is classified as grassland (less than 0.1%). Compare this with New Zealand, where grassland is the majority land use type (approximately 49.1%) and no land is classified as shrubland. This indicates a likelihood for a greater percentage of New Zealand’s commercial risks to exist on grassland than for Japan’s commercial risks to be similarly located. The configurable nature of the program is designed to cope with these differences. The program loads in text config files as specified by the user. The user can define configuration files for specific areas of the world or use the more general ones provided. The weightings can be adjusted at run time to further adjust the output file.
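Loading a remap table from a text config file might look like the following minimal sketch. The "value = weight" file layout shown here is hypothetical; the Disaggregator's actual config format is not specified in this section:

```python
# Minimal sketch of loading a remap table from a text config file.
# The "code = suitability" layout is an assumed format for illustration.
import io

config_text = """\
# land use code = suitability
1 = 100
2 = 70
3 = 65
16 = 0
"""

def load_remap(fh):
    """Parse 'key = value' lines, skipping blanks and # comments."""
    remap = {}
    for line in fh:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split("=")
        remap[int(key)] = float(value)
    return remap

remap = load_remap(io.StringIO(config_text))
```

Keeping the tables in plain text like this is what allows several variants (per country, per risk type) to be maintained and swapped at run time without touching the program itself.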


Detailed Analysis of a Grid Cell

Centre Point: 4.885211E 45.701438N
Locale: Vénissieux, South-East Lyon, France
Grid Cell Size (X,Y): 1/120, 1/120 (decimal degrees)
Geographic Coordinate System: GCS WGS 1984 (datum D_WGS_1984)
ArcGIS Reported Size of Pixel: 0.005843 by 0.008294 (decimal degrees); 650 by 920 (meters)

When dealing with world or country datasets, it is easy to forget just how much actual ground area is being represented and the diversity that can occur over relatively small distances. The source data this dissertation uses has a resolution of 1/120 decimal degrees in the x and y axes. Due to the geographic projection, the actual area represented varies depending on latitude. In the aerial photograph (Figure 8) below, the black rectangle denotes the area covered by one pixel with the centre point specified above, orientated so North is towards the top. The source datasets for this area give the following values:

DEM = 194 meters Slope = 0.34 degrees Impervious Surface Area = 40.42 Land Use = 2 : Dryland Cropland and Pasture

These values suggest a reasonably good suitability for residential buildings, but the lower than average impervious surface area value suggests a lower suitability for commercial buildings. Zooming out in Google Maps shows this area is very close to the city of Lyon. This increases the likelihood of there being commercial and residential buildings located inside the pixel. Examining the aerial photograph, it is possible to see many buildings and to estimate their purposes. For example, to the south-east there are long rectangular buildings with long shadows denoting tall buildings, either offices or residential flats. Considering their surroundings, residential flats are the likely building use. For further clarification, we can look at Figure 9. This is an oblique aerial photograph from the Microsoft Live Earth website of those buildings (looking eastwards). It clearly shows them to be residential flats. Looking towards the north-west of the first photograph, there is a large grey-white concrete rectangular area surrounded by red roofed buildings, and an unusually shaped grey and white building to the west at a road junction. The large concrete area is a cemetery surrounded by residential houses with red roofs. To determine the purpose of the grey and white building, we turn to Microsoft Live Earth again. Figure 10 (looking southwards) shows the grey and white building on the right; it now becomes easier to guess the building is a cinema or theatre (an internet search of the road and town confirms this is in fact a cinema). Figure 11 is an oblique view of a west-central area (looking northwards). On the overhead aerial photograph this appears to be a long rectangular concrete car park, bounded by trees (or other green vegetation) and in turn bounded by red roofed buildings. The oblique view shows this area filled with vans and market stalls. It shows the vegetation is made up of trees and that the red roofed buildings are commercial shops instead of residential buildings. It is also easier to make out the Church to the south-east of the market square.
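The metric size of the 1/120 decimal degree cell quoted at the start of this section can be approximated with a simple spherical-Earth calculation. This is a sketch, not the geodesic computation ArcGIS performs, so the figures differ slightly from the reported 650 by 920 meters:

```python
# Approximate ground size of a 1/120 decimal-degree pixel at the grid
# cell's latitude, using a spherical-Earth approximation.
import math

latitude = 45.701438            # degrees north (the Vénissieux grid cell)
cell = 1.0 / 120.0              # cell size in decimal degrees
metres_per_degree = 111320.0    # approximate metres per degree of latitude

north_south = cell * metres_per_degree                      # roughly 928 m
east_west = north_south * math.cos(math.radians(latitude))  # roughly 648 m
```

The cosine term is why the east-west extent shrinks with latitude while the north-south extent stays nearly constant, which is the distortion the surrounding text describes.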


Figure 8 - Aerial Photograph of One Grid Cell (Google Maps)

This picture also demonstrates the effect of timing. The overhead aerial photograph was taken on a day when the market was not in operation. In fact, the other three oblique photographs of this area available in Microsoft Live Earth do not show a market either. To the north-east of the overhead photograph there is a train station; to the north-west and the south-west there are parts of industrial factories. The south of the pixel houses a park surrounded by residential buildings. In fact, just past the pixel boundaries there exist some commercial office blocks on the south side of the park. Small houses in cul-de-sacs comprise a large section of the east side of the pixel, while larger houses mixed with commercial shops comprise the west of the pixel.


This mix is typical of many suburbs around the world: many different styles and uses of buildings, very often mixed together. To accurately map which buildings appear in any area, actual ground surveying is needed, and even that may miss events like the market shown in the third oblique photograph. This level of detail is beyond the aims of this dissertation. The output of the program works on averages and probabilities. The idea is to give an indication of what exists in any pixel without resorting to the very time consuming process of analysing overhead and oblique aerial photographs or conducting ground surveys.

Figure 9 - East Side - Looking East (Microsoft Live Earth 1)

These multi-occupancy residential flats shown in figure 8 might contain multiple residential risks or might be modelled as one entity. In some cases an insurance company may only be interested in a few of these buildings rather than all of them. There could be many more residential risks in a collection of flats than in single family houses covering a similar area.

Figure 10 - North Edge - Looking South (Microsoft Live Earth 2)

The large concrete cemetery in Figure 9 might lead an automatic disaggregation routine to distribute residential or commercial risks to this area based on impervious surface area data. The cinema is a prime example of a single commercial risk amongst a number of residential risks.


Figure 11 - Centre - Looking North (Microsoft Live Earth 3)

Figure 11 shows a cluster of residential risks (the typically red roofed buildings) which also includes shops and other commercial risks plus the market. This market is only there on certain days; viewing this picture on Microsoft Live Earth facing a different direction shows an empty parking lot. Even viewing from these oblique angles, it is difficult to determine some building uses due to narrow streets, trees and other obstructions. Another problem is determining the actual use of a building. In his seminal paper, Wright (1936) talks about counting houses to determine resident population but warns some houses may be summer homes and not representative of the local population. Similarly, a large number of apartment residential risks could be attributed to one block of flats while in reality the flats could be derelict and not part of the insurance risk portfolio.


Methodology for an Automatic Disaggregator

Preparation Steps:

Step 1: Determine what types of data will aid in the disaggregation of residential, commercial, industrial and agricultural risks. For example, a dataset indicating slope in degrees may indicate areas that are unsuitable for commercial risks. It is highly unlikely a typical high rise office block would be built on a 40 degree slope but highly likely if the ground is flat (i.e. 0 degrees).

Step 2: Locate freely available worldwide datasets. The resolution had to be at least 1km2. If it were any coarser, no meaningful difference could be determined for a higher resolution output.

Step 3: Prepare the source datasets and convert them into similar formats. Due to the size of worldwide datasets, some are provided split into tiles, and datasets come in a variety of projections, formats and resolutions. For example, the GTOPO30 dataset was provided as 33 separate integer tiles while the Impervious Surface Area dataset was just one tile but in floating point format.

Disaggregation Steps:

Step 1: Crop the worldwide datasets to the area being analysed. This significantly reduces processing time later on and increases the accuracy of creating a slope map.

Step 2: Create a slope map of the area under analysis, derived from the DEM.

Step 3: Smooth or enhance features in the source maps as deemed necessary by testing. For example, increasing the size of urban areas on the Land Use data increases their impact during the disaggregation process.

Step 4: Create mask style maps for rural, suburban and metropolitan areas.

Step 5: Re-map the values in each dataset depending on the desired weight and impact on the final results. For example, use the Slope dataset as a mask to prevent any areas over 20 degrees being deemed suitable for commercial risks. Another example is to map the ISA 0-100 values to a logarithmic curve to increase the effect of changing from 0 to 1 to 2 compared to changing from 98 to 99 to 100.

Step 6: Combine the datasets to produce a suitability dataset. Depending on the intended use of the dataset, they can be applied as masks, multiplied together or have their effective weight reduced compared with the other datasets.

Step 7: Using the unique IDs from the shapefile and a user defined resolution, convert the suitability dataset into a series of normalised zones of suitability. This means that in each district area defined by the shapefile, all the suitability numbers become probabilities that sum to 1.

Step 8: Using the risk numbers from the shapefile, multiply them into the normalised zones, thereby allocating each pixel a number representing the estimated amount of risks at that spatial location. This creates a final output raster in floating point format.

Step 9: Create a copy of the output and apply the metropolitan mask. If any values are greater than the maximum for a metropolitan area, reduce them and total the extra risks per district. Combine the original output with the masked metropolitan copy (keeping the original values everywhere that is not metropolitan), then redistribute the extra risks back over the entire district.

Step 10: Repeat Step 9 with the Suburban mask and then the Rural mask.

Step 11: Convert the final raster to integer with all values rounded up or down as appropriate. All values less than 0.5 in each zone should, before conversion to integer, be added together and redistributed throughout their zone.

Step 12: Convert the integer raster to an X,Y ASCII file containing a unique identifier for each pixel that has risks associated with it.


Step 13: Shape size / resolution check. Depending on the shapefile and chosen resolution, some districts may have been lost, either because they did not contain the centre point of any pixel or because, when a district was normalised, the percentages were too low for the number of risks to be distributed as whole numbers. Compare the number of districts in the shapefile with the number represented in the raster dataset. If districts have been lost, create an X,Y ASCII file similar to the one created in Step 12, except that instead of one entry per pixel, there is one entry for the centre point of each lost district, with all risks for that district assigned to the centre point.

Step 14: Create a log file specifying what parameters were used.
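The cap-and-redistribute logic of Step 9 might look like the following sketch. The cap of 1000 risks per metropolitan pixel is a made-up figure for illustration, not a value from the Disaggregator:

```python
# Sketch of Step 9: cap pixel values inside the metropolitan mask at a
# maximum, collect the excess, and spread it back over the whole district.
# The cap of 1000 risks per metropolitan pixel is an assumed figure.
MAX_METRO = 1000.0

pixels = [1500.0, 800.0, 400.0, 300.0]   # risk counts per pixel
is_metro = [True, True, False, False]    # metropolitan mask

excess = 0.0
for i, value in enumerate(pixels):
    if is_metro[i] and value > MAX_METRO:
        excess += value - MAX_METRO
        pixels[i] = MAX_METRO

# Redistribute the excess proportionally over the entire district.
total = sum(pixels)
pixels = [v + excess * (v / total) for v in pixels]
```

Note that the redistribution can push a capped pixel back over the limit, which is one reason the tool repeats the pass for the suburban and rural masks in Step 10. The district total is preserved throughout.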


Obtaining the Most Accurate Distribution of Risks

The disaggregation tool performs a variety of functions and procedures as it distributes risks across the topological features (a complete program flow is outlined in The Disaggregator: Program Order of Operation), but the most significant section is Step 2 of the program. Step 2 remaps the values in each of the four or five source datasets, applies different weights to each and then combines the datasets into a single suitability grid. The suitability grid has a floating point value for each pixel denoting the likelihood of risks being mapped to that location. It is similar to a probability grid, with higher values indicating a greater proportion of risks should be assigned to those locations. The values in each dataset are not directly comparable: land use, for example, is a categorical dataset, so higher numbers do not indicate preference over lower numbers, while the ISA is a quantitative dataset, so a higher value indicates a greater density of impervious surfaces in that pixel. An impervious surface area (ISA) value of 56.7 multiplied by or added to a land use value of 4 will not give a meaningful result. How would the resulting value compare with another area with an ISA value of 56.7 and land use of 17? Is that area more or less likely to contain the risks being distributed? Remapping the DEM, Slope, ISA and land use (plus miscellaneous) dataset values allows them to be combined in a meaningful manner. For example, remapping all the various ranges of values into a 0-100 range means all the values can be divided by 4 (or 5 if the miscellaneous dataset is included) and added together. The resulting suitability grid would have values between 0 and 100, allowing an easy comparison of likelihoods across the map.

Example Remaps:

Land Use:
1 (Urban and Built-Up Land) = 100
2 (Dryland Cropland and Pasture) = 70
3 (Irrigated Cropland and Pasture) = 65
...

16 (Water Bodies) = 0

ISA:
100 = 100
80 – 99 = 95
60 – 79 = 80
...
0 – 3 = 0

A drawback of this method of remapping is that all datasets must be remapped to the same scale. If, instead of adding the remapped datasets together, they are multiplied together, then the only condition is that more suitable areas for risks should have higher values than less suitable areas. Multiplying the datasets together means a value of 0 will take precedence. If the datasets are added together, then all datasets must have a value of 0 for a pixel to be assigned zero suitability. The decision between multiplying or adding the datasets together depends on what form the user requires the end data to take. If the datasets are multiplied together, there will be more areas with zero risks and higher concentrations of large numbers of risks. Adding the datasets results in more areas with a small number of risks spread over them and fewer concentrations of large numbers of risks. After examining the results of both methods, AIR decided multiplying the datasets produced the most useful results for their calculations.
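The multiply-versus-add trade-off can be demonstrated with a short sketch. The pixel values are illustrative, already remapped to the common 0-100 scale:

```python
# Comparing the two combination strategies for remapped datasets.
# Four pixels, each with a remapped land use and a remapped ISA score.
remapped_land_use = [100, 70, 0, 65]
remapped_isa      = [95,  80, 50, 0]

added      = [(a + b) / 2 for a, b in zip(remapped_land_use, remapped_isa)]
multiplied = [a * b       for a, b in zip(remapped_land_use, remapped_isa)]

# Under multiplication, a single zero vetoes the pixel (pixels 3 and 4);
# under addition, the pixel stays alive unless every dataset scores it zero.
```

This is exactly the behaviour described above: multiplication produces more zero-risk pixels and sharper concentrations, which is why AIR preferred it.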


The remapping values are stored as text files that are loaded at run-time. This allows them to be modified separately from the Disaggregator, and numerous versions can be kept and used for different situations and spatial locations. Creating the remap tables involves analysing the source data in great detail and can be time consuming. On some occasions the user will want to use existing remapping tables but give greater or lesser emphasis to each dataset. For this reason, weighting values are assigned to each dataset. These weights multiply the values inside the remapped datasets before they are all combined. As the datasets are combined through multiplication, assigning a weight of zero to one dataset will cancel them all out. If the user wants to effectively remove a dataset from consideration, a very small weighting (e.g. 0.000001) can be assigned while the other datasets have larger weights (e.g. 75). The Disaggregation Tool offers an option to treat the DEM and Slope datasets as masks. No weighting is applied to these datasets and the mask-remap tables only change values to 1 or 0. This will remove (i.e. mask) certain areas from the final suitability grid while not changing the values of areas to be included. By keeping the remapping tables separate from the Disaggregation Tool, the user has the option of running the same source dataset multiple times with different remap tables and then comparing the output. If the tool is run using data for an area where the actual spatial locations of the risks are already known (for example from a ground survey), then the tool’s output can be compared to the surveyed data. The user can modify the remap tables to produce results closer to the surveyed data. These new remap tables can then be used for similar but non-surveyed areas. The key here is that the areas need to be similar; some conditions for one part of the world may not be true elsewhere.
For example, setting the DEM mask to 1200 meters (no risks at higher elevations) might be appropriate for Japan but would be inappropriate for Chile. AIR’s catastrophe modelling programs did not require absolute accuracy for the spatial locations of the risks. What was needed was a marked improvement over using aggregated district level data, so while a set of remap tables based on Japan might not give as accurate results for Chile, they would still give a better source of data for the catastrophe modelling programs than aggregated data. More accurate remap tables for Chile could be produced at a later date. The following section demonstrates the process of comparing surveyed data with disaggregated data.
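The DEM-as-mask option described above reduces to a remap table that sends everything above a threshold to 0 and everything else to 1. A sketch, using the Japan-derived 1200 m threshold mentioned in the text (the elevation and suitability values are illustrative):

```python
# Treating the DEM as a mask: elevations above a threshold become 0,
# everything else becomes 1, and the mask multiplies the suitability grid.
THRESHOLD_M = 1200.0   # the Japan-derived figure; would be raised for Chile

elevations = [15.0, 194.0, 1150.0, 1350.0, 2900.0]
dem_mask = [1 if e <= THRESHOLD_M else 0 for e in elevations]

suitability = [820.0, 640.0, 90.0, 55.0, 10.0]   # illustrative scores
masked = [s * m for s, m in zip(suitability, dem_mask)]
```

Because the mask only holds 0s and 1s, it removes high-altitude pixels without rescaling the surviving suitability values, which is the behaviour the text describes.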


Determining Remap Values

There are various approaches to finding a link between the distribution of risks and the topological datasets. If they are commercial risks, then it can be assumed the majority of them will be in city and town (metropolitan) centres with few in rural areas. Agricultural risks are likely to have the opposite correlation. Residential risks such as single family homes are likely to primarily exist in suburban areas. Defining rural, suburban and metropolitan areas is one approach. If the land use dataset indicates an urban area, the ISA has a value of 100, the DEM gives an elevation of 15 meters and the slope is 2 degrees, all these values would indicate a metropolitan area, and so a large proportion of the commercial risks should be distributed to this spot. But what if all the values were the same except the ISA was only 75? This would indicate more open natural land, with a lower population density and fewer buildings; the area would be more accurately classified as suburban. Detailed ground surveys have been carried out in various areas of the world. Researchers have travelled around finding the exact number of risks in any given area. By comparing the number of risks in a single pixel sized area with the source datasets, a correlation can be determined and used to produce more accurate remapping tables. These tables can then be used to predict risk distribution in areas where there has not been a ground survey or where the data is too expensive. It must be noted that training the remap tables on surveyed data from one area might not be accurate elsewhere in the world. AIR made a surveyed dataset for Japan available for analysis. The dataset displayed the number and location of residential dwellings (risks) across the entire country at a resolution of 1km2.
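The metropolitan/suburban distinction discussed above amounts to a rule over the combined indicators. A deliberately simple sketch; the ISA cut-off of 90 and the 5 degree slope limit are made-up values, not the Disaggregator's own:

```python
# An illustrative rule separating rural, suburban and metropolitan pixels
# from the indicators discussed in the text. Thresholds are assumptions.
def classify(land_use_is_urban, isa, slope_deg):
    """Classify a pixel from its land use flag, ISA value and slope."""
    if not land_use_is_urban:
        return "rural"
    if isa >= 90 and slope_deg < 5:
        return "metropolitan"
    return "suburban"

print(classify(True, 100, 2))   # dense, flat urban core
print(classify(True, 75, 2))    # urban but more open land
```

The text's example falls out directly: an urban pixel with ISA 100 on a 2 degree slope classifies as metropolitan, while the same pixel with ISA 75 classifies as suburban.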

Figure 12 - Japan (Shown as Land Use / Land Cover)

Producing graphs of the number of commercial risks against the various ISA, land use, DEM and slope values helped indicate values to use in producing remapping tables to disaggregate commercial risk locations in Japan. There were various sources of error in using this survey data. The data was provided as a gridded shapefile in the GCS (geographic coordinate system) Tokyo with the D_Tokyo datum. Using ArcGIS, the shapefile was converted to GCS WGS 1984 with the D_WGS_1984 datum. Once converted, the grid cells were 0.01049 by 0.008279 decimal degree rectangles. The shapefile was then converted to a raster at the same resolution as the source datasets (1/120 decimal degree squares). The resampling was done using the nearest neighbour method.


The source datasets and the survey data have all been compiled by different agencies using a multitude of methods and, on occasion, do not line up exactly. Slight discrepancies in pixel centre points could lead to mismatch errors. By using a large sample area, the effects of these errors on the final result should be minimised. The four graphs in figure 13 plot all the commercial risks in Japan against the various topologies (DEM, ISA, Land Use and Slope). The graphs have the same scale on the Y-axis to allow accurate comparisons. In the ISA graph, there is also an ISA Modified data series: the Disaggregation Tool smoothes the ISA source map using a mean function with a 3*3 pixel window before the remapping. Similarly, there is a Land Use Modified data series: the Disaggregation Tool expands the size of urban areas and smoothes other land uses before remapping. Including the modified and non-modified data sources on the same graph allows the effects of these modifications to be observed. The four graphs in figure 14 show the commercial risks of Japan by unit area (each unit area is a 1/120 decimal degree square) plotted against the various topologies. All graphs in the second group have the same scale on the Y-axis. Examining the graphs in figure 13, the most significant peak is on the DEM between sea level (0m) and 100m: there are 7.68 million commercial risks in that category. By comparison, there are only 0.62 million between 101m and 200m; in fact, between 101m and 2900m there are only 1.26 million commercial risks in Japan. The second most significant point is on the Land Use graph: urban areas that have been modified by the Disaggregator (see details in The Disaggregator: Program Order of Operation) have 4.73 million commercial risks, which is more than all other Land Use categories combined.
This would seem to indicate the Disaggregator should focus primarily on distributing commercial risks to low elevation areas and then to urban areas as denoted by the Land Use dataset before considering other topologies. However, examining the second group of graphs, in which commercial risks per unit area are plotted, the significance of the DEM graph appears to vanish. The unit area graphs are based on a 1/120 decimal degree square resolution. This is the same as the worldwide source datasets and is the probable output resolution to be used by AIR. There are 57.7 commercial risks per unit area between 0m and 100m, and in modified urban land use areas there are 285.3 commercial risks per unit area. These amounts are small compared to the modified ISA areas with a value of 96-100, where there are 1032.3 commercial risks per unit area. The effects of smoothing the ISA source data are apparent, as there are only 654.3 commercial risks per unit area on the non-modified dataset. Modifying the dataset allows more emphasis to be placed on the 96-100 ISA values compared to the non-modified dataset. The Land Use dataset gives 467.4 commercial risks per unit area in the non-modified urban category, while the modified Land Use has only 285.8. Modifying the source data has had the opposite effect to the ISA data, meaning less emphasis should be given to urban areas if the modified dataset is used. Through their experience in using Land Use / Land Cover datasets and Impervious Surface Area datasets, AIR has learnt that the ISA data tends to have a greater degree of accuracy for use in their catastrophe models. For more details see the Source Data section.
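The 3*3 mean smoothing applied to the ISA source map can be sketched in pure Python (inside ArcGIS, the Spatial Analyst Focal Statistics tool performs a roughly equivalent neighbourhood mean). Edge pixels here simply average the neighbours that exist inside the grid:

```python
# A 3x3 focal mean over a small grid, like the smoothing the Disaggregation
# Tool applies to the ISA source map before remapping.
def focal_mean(grid):
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Gather the 3x3 neighbourhood, clipped to the grid edges.
            values = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(values) / len(values)
    return out

isa = [[100.0, 100.0, 0.0],
       [100.0, 100.0, 0.0],
       [0.0,   0.0,   0.0]]
smoothed = focal_mean(isa)
```

Smoothing bleeds high ISA values into adjacent low-value pixels, which is why the modified dataset shifts emphasis between the 96-100 band and its neighbours in the graphs above.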


Figure 13 - Graphs of Total Commercial Risks in Japan Plotted against Topology

(Four panels, each with Commercial Units in thousands on the Y-axis: Commercial Units Against DEM, plotted against elevation in meters; Commercial Units against Impervious Surface Areas, with ISA and ISA Modified series; Commercial Units Against Slope, plotted against degrees; and Commercial Units against Land Cover & Land Cover Modified, plotted against the land cover categories.)


Figure 14 - Commercial Risks in Japan Plotted against Topology by Unit Area (1/120 decimal degree square)

(Four panels, each with Commercial Units per unit area on the Y-axis: Commercial Units Against DEM - by Unit Area; Commercial Units against Impervious Surface Areas - by Unit Area, with ISA and ISA Modified series; Commercial Units Against Slope - by Unit Area; and Commercial Units against Land Cover & Land Cover Modified - by Unit Area.)


In figure 13, the ISA series deviates from its curve at ISA values of 1-5. This lump disappears from the per-unit-area ISA graph in figure 14. The lump shows that Japan contains a great many areas with a very low impervious surface index: the likelihood of finding a commercial risk in any one low-index area is very small, but the sheer number of low-index locations produces a larger than expected count of total commercial risks. Examining the slope and DEM graphs shows very few, if any, commercial risks above 15 degrees or above 1200 meters. Because slope and elevation are of low significance compared with ISA and Land Use, masking out areas beyond those boundaries prevents the Disaggregator from unnecessarily spreading commercial risks thinly over large areas. As AIR are more interested in averages than in exact values, very small numbers of risks are not important enough to justify the added computation time the catastrophe models would need for multitudes of low-risk areas. In their paper, Steinnocher et al. (2006) note that the error from losing dispersed settlements when mapping country-level data is comparably small given the total population being mapped. Comparing the Land Use graphs in figures 13 & 14, the significance of land uses other than Urban (such as type 2: Dryland Cropland and Pasture) appears very small. Although quite large numbers of commercial risks are located on other land use types, viewing them by unit area shows a very low correlation; this is because these land uses cover a much larger area than urban land. Simply distributing more commercial risks to other land uses could place the risks in the wrong locations because of the large amount of land labelled as type 2. The ISA graphs are the only ones that show a strong correlation with both the overall quantity and the per-unit-area quantity of commercial risks.


Analysing Disaggregated Results

To test the Disaggregation Tool, AIR provided a shapefile of the districts of Tokyo and surrounding areas populated with the number of commercial risks in each district (Figure 15).

Figure 15 - Tokyo and Surrounding Districts

To compare the results of the Disaggregator with the source data (shapefile) and the survey data, the three diagrams in figure 16 are coloured on the same rainbow scale (red being the smallest and dark blue the largest number of commercial risks). The top diagram shows each district coloured by its total number of commercial risks. Without disaggregating the data, AIR's catastrophe models are forced to interpret the data as uniform amounts across each district. If a catastrophic event were to occur at the easternmost extent of one district, the models would produce loss results as if the entire district had been hit by the event. The second diagram shows the output from the Disaggregator. There are now large areas with zero commercial risks (areas in white) and areas with very low numbers of commercial risks (red and orange). Tokyo can easily be distinguished as the large area of dark and light blue along the centre of the bottom. The suburbs are a combination of light blue and yellow, indicating an inverse relationship between the distance from city and town centres and the number of commercial risks. The third diagram shows the surveyed data converted to raster and reprojected to the geographic coordinate system WGS 1984. The primary distinctions between the Disaggregator's output and the surveyed data are the amount of white space, i.e. zero commercial risks, and the speckled nature of the rest of the data. Disaggregation is not an exact science. The results cannot be perfect, but they can on average be closer to a true representation of spatial locations than district-level aggregated data.

Figure 16 - Census Data, Aggregated, Disaggregated, Surveyed

Visually comparing the Disaggregator's output with the surveyed data (and other maps) indicates the remap tables and weightings need to be adjusted to group the commercial risks more tightly together. More risks should be assigned to the city and town areas instead of the densely forested areas. Determining the difference between the topographical characteristics of the low and high risk areas is essential to improving the remap tables and thus the output of the Disaggregation Tool. The graphs shown in figures 13 & 14 were for the whole of Japan, whereas the shapefile commercial risk data provided was only for Tokyo and the surrounding districts. This does not mean the Japan graphs should be discounted entirely, but producing graphs for the localised area should give more accurate results.

DEM & Slope:

The two diagrams in figure 17 show the DEM and Slope for Tokyo and the surrounding areas. On both diagrams, grey represents No Data areas (areas outside the shapefile's boundaries). Light to dark green indicates low to high elevation on the DEM. Green to yellow to red indicates flat to steep on the slope map.

Figure 17 - DEM and Slope Datasets

By comparing a graph of the surveyed data against elevation with a graph of the Disaggregator's output against elevation, we can see where emphasis could be applied in an attempt to make the Disaggregator's output match the surveyed data.

DEM:

The graph in figure 18 shows a significantly larger number of commercial risks per unit area in the 101-200m range in the surveyed data than in the Disaggregator's results. Discounting the few irregular bumps, both graphs resemble shifted Poisson curves. This type of pattern is common in spatial mapping and can arise seemingly at random. In many countries, people tend to live at lower elevations due to easier transportation links. There are notable exceptions (such as Chile), but there is not enough detail to accurately predict residential and commercial risk locations, only to decide where they occur in tiny quantities. The similarity between the data series is mainly due to the other datasets (ISA and Land Use), as the DEM was used as a mask: areas with elevations greater than 1200 meters were set to 0 while everything else was set to 1. This prevents any commercial risks being located in areas higher than 1200 meters.
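The DEM masking rule just described can be sketched as follows. This is a minimal NumPy illustration; the actual tool applies an equivalent remap table inside ArcGIS Spatial Analyst.

```python
import numpy as np

def dem_mask(dem, limit=1200):
    """Binary mask used in place of DEM weighting: cells above the
    elevation limit (meters) get 0, everything else gets 1, so no
    commercial risks can be placed above the limit."""
    return np.where(dem > limit, 0, 1)
```

Multiplying this 0/1 mask into the suitability grid removes high-elevation cells without otherwise re-weighting the distribution.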


[Chart: Commercial Risks against DEM by unit area — x-axis: Elevation (meters), 100 m bins from 0 to 2700; y-axis: Commercial Risks; series: Actual:DEM-Unit, Result:DEM-Unit.]

Figure 18 - Graph of Commercial Risks against DEM by unit area

Slope:

The graph in figure 19 shows the surveyed data and the Disaggregator's results follow a similar pattern. The surveyed data has nearly double the number of commercial risks in the 0, 1 and 2 degree ranges compared with the Disaggregator's results. The similarity is mainly caused by the other datasets, as the Slope dataset was remapped to a mask: areas with a slope of 20 degrees or less were assigned a value of 1 and everything else was set to 0. This prevents any commercial risks being located in areas steeper than 20 degrees.

[Chart: Commercial Risks against Slope by unit area — x-axis: Degrees (0-29); y-axis: Commercial Risks; series: Actual:Slope-Unit, Result:Slope-Unit.]

Figure 19 - Graph of Commercial Risks against Slope by unit area

If the DEM dataset were not used as a mask but instead had ranges of elevations remapped in a similar manner to the Land Use and ISA datasets, then more emphasis could be given to the lower elevations. Similarly, if the Slope dataset were remapped so that various degrees of slope had more emphasis, the Disaggregator would assign more commercial risks to those areas. However, as shown in the 'Determining Remap Values' section earlier, the DEM and Slope datasets have relatively low significance to the locations of commercial risks compared with ISA and Land Use. This is why it is acceptable to use them only to mask out the extremes of their value ranges.

Land Use / Land Cover:

In Figure 20 the left image is the Land Use map for Tokyo and its surroundings. The red areas show Urban and Built-Up Land, the blue shows bodies of water and the green areas show a mixture of grassland, cropland and forests. (For a complete key, see Figure 1 in Source Data.) The right image shows the results of the Disaggregator modifying the Land Use: urban areas have expanded and water bodies have shrunk. Other areas have been modified too but, as will be shown, these are less important changes.

Figure 20 - Land Use/Cover and Modified Land Use/Cover datasets

The graph below (Figure 21) compares the surveyed risks with the disaggregated risks against both the Land Use and the modified Land Use datasets. Category 1, 'Urban and Built-up Land', stands out as the most significant, and the disaggregated results appear to be lacking in this category. This indicates significantly more weight needs to be given to urban areas.

[Chart: Commercial Risks against Land Use / Cover by Unit Area — x-axis: land use/cover categories; y-axis: Commercial Risks; series: Actual:LU-Unit, Result:LU-Unit, Actual:ModLU-Unit, Result:ModLU-Unit.]

Figure 21 - Graph of Commercial risks against Land Use


Similarly, comparing against the modified land use shows more weight is needed for urban areas, though not as much. Due primarily to the age of the Land Use dataset (see Source Data), the AIR Exposures group were not willing to base too much of the disaggregation on this source data. By running the filter that expanded the urban areas and reduced the water bodies (forming the image on the right of Figure 20), a smoother pattern could be used as an indicator of risk location. The Impervious Surface Area dataset (Figure 22) could then be given the majority of the weighting during the disaggregation process.

Figure 22 - ISA and Smoothed ISA datasets

The left of Figure 22 shows the constructed impervious surfaces around Tokyo, while the right shows the output of the ISA dataset from step 1 of the Disaggregator once the smoothing filter has been applied. While the left image is more accurate, the resulting data does not produce desirable results from the AIR catastrophe models. A smoothed input, paradoxically, results in more accurate loss predictions from the models.

[Chart: Commercial Risks against Impervious Surface Area by unit area — x-axis: ISA Value (-1, 0, then bins of 10 up to 100); y-axis: Commercial Risks; series: Actual:ISA-Unit, Result:ISA-Unit, Actual:ISAmod-Unit, Result:ISAmod-Unit.]

Figure 23 - Graph of Commercial Risks against Impervious Surface Areas

The graph in Figure 23 shows the surveyed and disaggregated commercial risks plotted against the ISA values. There is a very significant increase in commercial risks in ISA areas of 91-100. The disaggregated risks follow this pattern but do not appear to distribute enough commercial risks to this category. In other tests of the Disaggregator, more extreme weightings were applied to the ISA values; the result was more commercial risks clustered in the high ISA value areas.

Figure 24 - Tokyo: Surveyed and Disaggregated Commercial Risks

Visually comparing the surveyed results with the disaggregated results (Figure 24) shows the high concentrations of commercial risks (blue areas) match up quite accurately. The main problem rests in reducing the large areas of few commercial risks (red and orange areas). To do this, some property of the topography of these zero commercial risk areas must be determined. The weightings could then be adjusted and the Disaggregator would give more accurate results. Completely accurate results are not possible due to the required smoothing of the datasets. The problem with making the weightings fit the Tokyo (and surroundings) distribution more closely is that other locations in the world have different characteristics. The primary goal of this dissertation was to create the Disaggregation Tool and include a set of remapping tables (in effect, the weightings) that give a reasonably good match (defined by catastrophe model output) for most worldwide locations and will form a basis for research and development in the AIR Exposures group. The AIR Exposures group will produce remapping tables customised for many specific locations and will then use this general base model for comparison.


The Disaggregator: User Guide

Start ArcMap 9.2 (or above) and open DisaggregationTool.mxd. The following layers need to be loaded into ArcMap before the Disaggregation Tool is started:

Grid_dem : ESRI GRID formatted digital elevation model file
Grid_landluse : ESRI GRID formatted land use file
Grid_isa : ESRI GRID formatted impervious surface area file
Grid_misc (optional) : ESRI GRID formatted file
Boundary Shapefile : ESRI shapefile containing boundary data with unique identifiers and data values
Config.txt : Disaggregation Tool configuration file (NB this will not appear on the ArcMap layers list unless Source view is chosen)

All GRID files and shapefiles should be formatted in the same geographic projection. For file details see the section below. Additionally, the Remap directory should contain the appropriate files (details below). All of the options below can be set by the Config.txt file and will appear as default options when the program is started. To start, click on the Disaggregator icon.

Figure 25 - ArcMap with Disaggregation Tool


Program Main Window:

• Choose Shapefile
• Choose mode from Agricultural, Apartments (Residential), Commercial, Industrial or Single Family Home (Residential)
• Check destination and Remap configuration file directories
• Choose if temporary layers should be closed
• Current status window
• Button to Run All 4 steps

Step 1:

• Confirm source DEM layer name
• Confirm source Land Use layer name
• Confirm source Impervious Surface Area (ISA) layer name
• Choose optional Miscellaneous layer
• Choose coordinates for cropping (manual or based on shapefile) and decide on integer or decimal values
• Button to run step 1: Crop Maps and Create Slope

Figure 26 - Disaggregation Tool Step 1


Step 2:

• Confirm which attribute is being used for remapping
• Decide if DEM and Slope are to be weighted or used as masks
• Confirm layer names for cropped DEM, cropped Slope, cropped Land Use and cropped ISA (and optionally cropped miscellaneous) layers
• Set weighting values for each layer (if you want a layer ignored, set its weighting to a very small positive number, e.g. 0.00000001)
• Choose if temporary layers should be closed
• Button to run step 2: Create Suitability GRID

Figure 27 - Disaggregation Tool Step 2


Step 3:

• Choose the output GRID file name (name will be cropped to 13 characters)
• Choose output pixel size by entering numerator and denominator values
• Confirm suitability grid source layer name (output of step 2)
• Select unique identifier from shapefile fields from a drop down list
• Confirm shapefile value attribute (chosen by Shapefile Attribute Selection mode earlier)
• Confirm the rural, suburban and metropolitan mask layer names
• Choose if temporary layers should be closed
• Button to run step 3: Distribute Risks

Figure 28 - Disaggregation Tool Step 3


Step 4:

• Choose source layer
  This field is populated automatically at the end of step 3 but can be changed if required.
• Confirm the shapefile unique identifier field
  To change this field, change the corresponding field in tab 'Step3'.
• Confirm the shapefile value field
  To change this field, change the corresponding field in tab 'Step3'.
• Confirm the list of boundary shapes missing due to pixel resolution
  This field is populated automatically at the end of step 3. The list shows the unique identifiers for the shapefile boundaries.
• Button to run step 4: Create Final GRID

Figure 29 - Disaggregation Tool Step 4


Program Finished:

• Close Program button visible in upper right
• Open final GRID text file directory button visible in lower left next to status window

Figure 30 - Disaggregation Tool Finished

Once the program has finished, press the 'Close Program' button and accept the confirmation screen. The Disaggregation Tool will close and return control to ArcMap. ArcMap will have 9 new additional layers open (if the close temporary layers option is set), with only the top layer selected. If a miscellaneous layer has been included, there will be an additional layer created.

TokSurE-03 : The final integer raster grid file showing the disaggregated risks (created Step 3)
finalprob : Floating point suitability raster grid file (created Step 2)
isa_metro : Raster grid file showing metropolitan areas (created Step 1)
isa_sub : Raster grid file showing suburban areas (created Step 1)
isa_rural : Raster grid file showing rural areas (created Step 1)
modcropisa : Cropped ISA file that has been 'smoothed' by a 3*3 mean function (created Step 1)
modcroplandu : Cropped (to area specified in step 1) raster Land Use file that has had all urban areas expanded on all sides by one pixel (created Step 1)
cropslope : Cropped (to area specified in step 1) raster Slope file (slope in degrees) (created Step 1)
cropdem : Cropped (to area specified in step 1) raster DEM file (created Step 1)


Figure 31 - ArcMap showing Step 3 raster output

Other Output Files Created:

Location: the directory specified in the status window after step 4 (it can be opened by the button adjacent to the status window in the Disaggregation Tool).

TokSurE-03-Final.txt : X,Y text file of all points from the final integer raster file (TokSurE-03). For details see below.
TokSurE-03-FinalModified.txt : X,Y text file of all points from the final integer raster file (TokSurE-03) with rounding error correction. For details see below.
TokSurE-03-ZoneValueChecker.txt : List of shapefile unique IDs, the total number of risks expected and the total number found before rounding error correction. For details see below.
TokSurE-03-MissingShapes.txt : X,Y text file of all boundary shapes missed due to pixel resolution or very low suitability. For details see below.
TokSurE-03-Log.txt : Log text file of all settings and files used during all steps of the Disaggregation Tool.

Detailed File Information:

Remap Directory Contents

Remapping Files: There are five versions of the seven files below. Shown are the APR (Apartment) files. The other versions are AGR (Agricultural), COM (Commercial), IND (Industrial) and SFH (Single Family Home).

APR-DEM-Mask.txt : Text file to set the DEM elevation value for masking
APR-DEM-Remap.txt : Text file to specify remapping values for DEM
APR-ISA-Remap.txt : Text file to specify remapping values for ISA
APR-LU-Remap.txt : Text file to specify remapping values for Land Use
APR-MISC-Remap.txt : Text file to specify remapping values for Miscellaneous data
APR-SLO-Mask.txt : Text file to set the Slope degree value for masking
APR-SLO-Remap.txt : Text file to specify remapping values for Slope

Additional Files:

LandUseStep1-1-Remap.txt : Text file used in step 1 for modifying Land Use
LandUseStep1-2-Remap.txt : Text file used in step 1 for modifying Land Use
CONFIG.txt : Configuration file for the Disaggregation Tool
CreateRemapFile.xsl : Microsoft Excel file to aid in creating remapping files

Output Files:

TokSurE-03-Final.txt

Example content:

PolyID,GridCellID,Longitude,Latitude,Value
23160,162602052,2.5375,51.0875,194
23160,162702052,2.545833,51.0875,186
23160,162302051,2.5125,51.079167,268

The first line has column headers and each subsequent line represents one pixel from the final integer raster grid (TokSurE-03).

PolyID : Unique identifier of the boundary in the shapefile that holds the pixel's centre point
GridCellID : Derived unique identifier based on the number of pixels from a start point chosen in config.txt. 162602052 means 1626 pixels east and 2052 pixels north of the starting pixel.
Longitude : Decimal longitude of the pixel centre point
Latitude : Decimal latitude of the pixel centre point
Value : Integer value specifying the number of risks disaggregated to this pixel

TokSurE-03-FinalModified.txt

Same format as TokSurE-03-Final.txt.
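The GridCellID packing can be illustrated as below. Note the east*100000 + north rule is inferred from the single example value above and is an assumption, not something the file specification states explicitly.

```python
def grid_cell_id(pixels_east, pixels_north):
    """Assumed packing: the east offset followed by a five-digit
    north offset (inferred from 162602052 reading as 1626 pixels
    east and 2052 pixels north of the starting pixel)."""
    return pixels_east * 100000 + pixels_north
```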

By totalling the number of risks by boundary, the expected risk number and the actual risk number are compared. Any additional risks are removed (one by one from each point in a boundary in order from most risks to least) or if risks are missing, they are added in a similar manner (one by one to each point in a boundary in order from most risks to least).
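That correction pass can be sketched as follows, an illustrative Python version of the described behaviour for a single boundary (the actual tool works directly on the raster output):

```python
def correct_rounding(values, expected):
    """values: point ID -> integer risk count for one boundary;
    expected: the boundary's true risk total from the shapefile.
    Adds or removes single risks, visiting points in order from
    most risks to least, until the totals match."""
    values = dict(values)  # work on a copy
    diff = expected - sum(values.values())
    step = 1 if diff > 0 else -1
    while diff != 0:
        # visit points from most risks to least
        for pid in sorted(values, key=values.get, reverse=True):
            if diff == 0:
                break
            if step < 0 and values[pid] == 0:
                continue  # never take a point below zero risks
            values[pid] += step
            diff -= step
    return values
```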

TokSurE-03-ZoneValueChecker.txt

Example content:

PolyID,Expected,Value
23610,1226,1223
37510,255,255
00606,54,58

The first line has column headers and each subsequent line represents one boundary from the shapefile.

PolyID : Unique identifier of the boundary in the shapefile
Expected : Number of risks in that shapefile boundary
Value : Total number found in that boundary in the file TokSurE-03-Final.txt


When allocating risks to each pixel, the risks are allocated as decimal numbers and then rounded to integers. Rounding the numbers can lead to more or fewer risks than expected. The Disaggregator uses the information shown in this file to produce the TokSurE-03-FinalModified.txt file, ensuring all risks are disaggregated.

TokSurE-03-MissingShapes.txt

Same format as TokSurE-03-Final.txt except for the column headers:

PolyID,NearGridID,Longitude,Latitude,Value

PolyID : Unique identifier of a boundary in the shapefile that was missed because there were no pixel centre points within the boundary, or because the suitability grid combined with the number of risks yielded results too low to disaggregate the data
NearGridID : Derived unique identifier of the pixel closest to the centre point of the missed boundary shape
Longitude : Decimal longitude of the missed boundary centre point
Latitude : Decimal latitude of the missed boundary centre point
Value : Integer value specifying the number of risks of the missed boundary

On some occasions, the boundaries in the shapefile will not contain the centre point of any pixel (the number of such boundaries is very dependent on pixel resolution) and so will not be included in either TokSurE-03-Final.txt or TokSurE-03-FinalModified.txt. In other circumstances, the pixels in some boundary areas might have a very low suitability over a large area combined with a small number of risks to distribute; the result after rounding is a boundary with zero risks, which will likewise be absent from TokSurE-03-Final.txt and TokSurE-03-FinalModified.txt. These missing boundaries are listed in this file along with the longitude and latitude of their centroid and the GridID they would be assigned. The number of risks contained in these boundaries is also recorded.

Default Program Assumptions

• Sea level is at 0 meters
• No buildings in water – Land Use value 16, DEM value -1, ISA value -1
• No buildings over 1200m elevation
• Few or no buildings on Snow and Ice or Tundra – Land Use values 20-24
• No buildings on slopes greater than 20°
• Gas flares (and related) have been effectively masked out during ISA creation
• Setting ground below sea level to an elevation of 1m will not adversely affect the results – no slopes over 20° in these areas


The Disaggregator: Program Order of Operation

Program: Step 1

Crops maps to area, either by the maximum extent from a shapefile (with a buffer of 0.1) or by user-entered input. Crops the DEM, Land Use and Impervious Surface Area (ISA) datasets (and the Misc layer if selected). If the user has specified to only use whole integer values, no buffer is applied.

Creates the Slope map. A z-buffer is needed as the x,y coordinates are in different units to z. Find the latitude mid-point and work out the z-buffer for use with slope (1 degree = 0.0174532925 radians):

    DegToRad As Double = 0.0174532925
    dblResult = Ymax - Ymin
    dblResult = Ymax - (dblResult / 2.0)
    dblResult = dblResult * DegToRad
    Z-buffer = 1.0 / (113200 * Cos(dblResult))

Slope is created in degrees, from 0 to 90.
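The z-buffer calculation above can be restated in Python as a minimal sketch (113200 is the metres-per-degree constant used in the tool's formula; the real tool computes this in VBA before calling the Spatial Analyst Slope function):

```python
import math

def z_buffer(ymin, ymax):
    """Z-factor for computing slope from a DEM whose x,y units are
    decimal degrees but whose z unit is metres."""
    # Mid-latitude of the cropped area, in decimal degrees
    mid_lat = ymax - (ymax - ymin) / 2.0
    return 1.0 / (113200 * math.cos(math.radians(mid_lat)))
```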

Adjusts Land Use:

a) Reclass: code 16 (Water Bodies) is reclassed as 25.
b) Apply a single pass 3*3 MINIMUM window. This increases the size of Urban areas while decreasing Water Bodies: urban areas tend to grow, and buildings tend to cluster near water.
c) Reclass: code 25 (Water Bodies) is reclassed back to 16.
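A minimal NumPy sketch of the 3*3 minimum window (illustrative only; the tool performs this with ArcGIS Spatial Analyst). Because Urban has the lowest land-use code (1), a minimum filter expands urban areas by one pixel on all sides:

```python
import numpy as np

def min_filter_3x3(grid):
    """Single-pass 3x3 minimum window; edges are padded with each
    border cell's own value so the output keeps the input shape."""
    p = np.pad(grid, 1, mode="edge")
    rows, cols = grid.shape
    # One shifted view per position in the 3x3 neighbourhood
    windows = [p[i:i + rows, j:j + cols]
               for i in range(3) for j in range(3)]
    return np.minimum.reduce(windows)
```

Temporarily reclassing water (16) to 25 before this pass makes the filter shrink water bodies as well, since 25 is then the largest code in play.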

Adjusts ISA:

Apply a single pass 3*3 MEAN window. This smooths out the values and allows surrounding areas to have an effect upon each other. The AIR Exposures group require smoothed results for its models.

Creates Rural, Suburban and Metropolitan masks:

Using the smoothed ISA file, three 'mask' rasters are created. NB: the values below are examples only; the actual values used are loaded from the CONFIG.TXT file at run time.

Rural Mask = ISA values of 0 to 30
Suburban Mask = ISA values of 31 to 80
Metropolitan Mask = ISA values of 81 to 100
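The thresholding can be sketched as below (illustrative NumPy, using the example thresholds above; the real thresholds come from CONFIG.TXT):

```python
import numpy as np

def isa_masks(isa, rural_max=30, suburban_max=80):
    """Build three 0/1 masks from the smoothed ISA grid."""
    rural = ((isa >= 0) & (isa <= rural_max)).astype(int)
    suburban = ((isa > rural_max) & (isa <= suburban_max)).astype(int)
    metro = (isa > suburban_max).astype(int)
    return rural, suburban, metro
```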


Program: Step 2

Choice: treat the DEM and Slope datasets as masks or as weighted datasets.

If treating as datasets: DEM, Slope, Land Use and ISA are each given a weighting (weights multiply the remapped values and do not have to total 100).

If treating as masks: Land Use and ISA are each given a weighting (weights multiply the remapped values and do not have to total 100).

If the Miscellaneous layer is included, it is treated in the same manner as Land Use and ISA.

Each dataset is reclassed based on the remap text files, e.g. C:\Work\Code\Data\ReMapTables\COM-ISA-Remap.txt.

If treating DEM and Slope as datasets (e.g. C:\Work\Code\Data\ReMapTables\COM-DEM-Remap.txt):

DEM:
-10000 0 : 0
1 2000 : 10
2000# 3000 : 2
3000# 9000 : 0

Slope:
0 19# : 10
19.1 24# : 3
24.1 29# : 2
29.1 44# : 1
44.1 90# : 0

If treating DEM and Slope as masks (e.g. C:\Work\Code\Data\ReMapTables\COM-DEM-Mask.txt):

DEM:
-10000 0 : 0
1 1200 : 1
1200# 90000 : 0

Slope:
0 20 : 1
20# 90# : 0

Land Use / Land Cover (e.g. C:\Work\Code\Data\ReMapTables\COM-LU-Remap.txt):

1 1 : 4
2 2 : 2
3 3 : 2
4 4 : 0
5 8 : 1
9 9 : 0
10 15 : 1
16 24 : 0
100 100 : 0

ISA (e.g. C:\Work\Code\Data\ReMapTables\COM-ISA-Remap.txt):

-1# -1 : 0
0 0 : 3
0# 1000000 : 19
1000000# 2000000 : 30
2000000# 3000000 : 39
3000000# 4000000 : 46
4000000# 5000000 : 52
5000000# 6000000 : 57
....
98000000# 99000000 : 700
99000000# 100000000 : 703

Once the datasets have been remapped, the following calculations take place.

If treating DEM and Slope as datasets: the reclassed DEM, Slope, Land Use and ISA datasets are each multiplied by their respective weightings, then all are multiplied together (Misc layer optional).

If treating DEM and Slope as masks: the reclassed Land Use and ISA datasets are each multiplied by their respective weightings, then all layers are multiplied together (Misc layer optional).

The resulting dataset then has a single pass 3*3 MEAN window applied. This produces a smoother suitability grid on which to distribute the risks; the AIR Exposures group require smoothed results for its models. The resulting grid is called a suitability grid. It is saved with the name 'finalprob' (specified in the CONFIG.TXT file) and is ready for step 3.
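The weight-and-multiply step can be sketched as below (illustrative NumPy; the tool performs the equivalent with Spatial Analyst map algebra on rasters):

```python
import numpy as np

def suitability(layers, weights):
    """Each reclassed layer is scaled by its weight, then all the
    weighted layers are multiplied together cell by cell. Weights
    do not have to total 100; a 0/1 mask layer simply zeroes out
    unsuitable cells wherever it is 0."""
    result = np.ones_like(layers[0], dtype=float)
    for layer, weight in zip(layers, weights):
        result *= layer * weight
    return result
```

Because the layers are multiplied rather than summed, a zero in any one layer (e.g. a mask) removes the cell entirely, which matches the masking behaviour described above.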

Program: Step 3

The source suitability grid is resampled to the specified grid size (resolution) using bilinear interpolation. Creates temporary grid "a".

The rural, suburban and metropolitan masks are resampled to the specified resolution. Creates temporary grids "rm", "sm" and "mm".

The shapefile, along with a value field (e.g. NUM_HOUSES), is converted to a raster at the specified resolution. Creates temporary grid "b".

The shapefile, along with a unique identifier field (e.g. KUID), is converted to a raster at the specified resolution. Creates temporary grid "c". A check is run to determine if any districts from the shapefile are missing from the raster; this can occur if a district does not contain the centre point of any pixel.

All negative values in "a" are set to 0 (a failsafe, as there should not be any negative numbers in the suitability grid). Creates temporary grid "d".

The Spatial Analyst tool 'Zonal Statistics' then uses the unique ID raster "c" and grid "d" to sum all values of "d" within each zone defined by "c". Creates temporary grid "e".

Dividing grid "d" by grid "e" results in a grid normalised by district: all pixels within a district add up to one. Creates temporary grid "f".

Multiplying the value raster "b" by the normalised grid "f" results in a floating point raster. Creates temporary grid "g".

Floating point raster "g" has 0.5 added to all values, then it is converted to an integer grid (rounding down all values). The resulting raster grid values are the number of houses (or whatever is described by the value field in "b") in each grid cell location.


Creates temporary grid "h".

If there are missing districts, the program creates a list of their unique identifiers (e.g. KUID), builds an SQL statement and displays it in Step 4 and the log file.

Using the masks "rm", "sm" and "mm" and the grids "c", "f", "g" and "h", the Disaggregator attempts to limit the number of risks per pixel depending on whether they are in a rural, suburban or metropolitan area. The remaining risks are redistributed through the district. The program only applies this filter once, so pixels can end up with more risks than the limit if the district is densely packed with risks. Creates temporary grids "i" and "j".

When rounding the floating point values to integers, any values less than 0.5 are set to 0 and lost. The program adds up these lost risks and redistributes them throughout the districts, using raster grids "i", "j", "c" and "f". Creates temporary grid "k".

Finally, the "k" grid is saved to a CONFIG.TXT specified directory with a name denoting its origin based on a user-specified stem. There is a limit of 13 characters for ESRI GRID names (no extension); the Disaggregation Tool will cut a user-supplied name to 13 characters. If the user-supplied name is shorter than 13 characters, the Tool will append details of the pixel resolution to lengthen the file name.
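The normalise-and-allocate core of step 3 (grids "d" through "h") can be sketched as below, an illustrative NumPy version of what the tool does with Spatial Analyst rasters:

```python
import numpy as np

def allocate_risks(suitability, zones, totals):
    """suitability: float grid of suitability values (grid "d");
    zones: integer grid of district unique IDs (grid "c");
    totals: district ID -> number of risks (value raster "b").
    Returns an integer grid of risks per cell (grid "h")."""
    out = np.zeros(suitability.shape, dtype=float)
    for district_id, total in totals.items():
        mask = zones == district_id
        zone_sum = suitability[mask].sum()   # zonal statistics (grid "e")
        if zone_sum > 0:
            # Normalise within the district (grid "f"), then scale
            # by the district's risk count (grid "g")
            out[mask] = suitability[mask] / zone_sum * total
    # Add 0.5 and round down to get integer risk counts (grid "h")
    return np.floor(out + 0.5).astype(int)
```

The rounding step here is exactly why the later per-district correction pass is needed: adding 0.5 and flooring can leave a district with slightly more or fewer risks than its shapefile total.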

For example, if the user types “TokSur” with grid size “0.00833333333” and selects commercial weightings, the following grid is created: Creates final permanent grid “TokSurE-03”
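The naming rule can be sketched as below (a Python sketch; the exact suffix format is an assumption inferred from the “TokSurE-03” example, where “E-03” reflects a cell size on the order of 1e-3 degrees):

```python
# Hypothetical sketch of the output-name rule: ESRI GRID names are limited
# to 13 characters, so long stems are cut and short stems get a suffix
# derived from the cell size. The suffix scheme is an assumption inferred
# from the "TokSur" + 1/120 degrees -> "TokSurE-03" example.
import math

def output_grid_name(stem, cellsize):
    if len(stem) >= 13:
        return stem[:13]                         # cut long names to the limit
    exponent = math.floor(math.log10(cellsize))  # 1/120 degrees -> -3
    return (stem + "E%03d" % exponent)[:13]      # e.g. "TokSur" -> "TokSurE-03"

print(output_grid_name("TokSur", 1.0 / 120.0))
```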

If the checkbox is selected (selected by default), all temporary grids are then closed. In addition to the temporary grids named above, numerous other grids are created but not passed between functions.

Program: Step 4

The output raster from step 3 (e.g. “TokSurE-03”) is converted to an ASCII file as required by the AIR Exposures group; the file format is described below. The program converts the output from step 3 to an ESRI defined ASCII file and reads in the header information:

ncols: Number of columns in the raster
nrows: Number of rows in the raster
xllcorner: Longitude of the lower left corner of the raster
yllcorner: Latitude of the lower left corner of the raster
cellsize: Cell size of the raster (user specified in step 3)
NODATA_value: Value of a pixel if outside the shapefile’s boundaries

The program then analyses the rest of the ASCII file. One line is read from the file. Single characters are read and appended together until a space or the end of the line is reached. If this string does not equal the NODATA value of zero, it represents a pixel to be saved. Using the ncols, nrows, xllcorner, yllcorner, cellsize, current row and current column, an entry for the pixel is stored in a dynamic array ( m_ArrayOfValues() ). Five fields are stored for each pixel (and for the AIR Exposures ASCII file).

PolyID, GridCellID, Longitude, Latitude and Value Value: This is the string read from the ASCII file. It represents the number

of risks at that location.


Longitude: The current column number minus 0.5, multiplied by the cellsize and then added to the xllcorner value gives the longitude of the centre point of the pixel.

Latitude: NB: The current row is counted from the top of the file. The number of rows minus the current row minus a half, multiplied by the cellsize and then added to the yllcorner value gives the latitude of the centre point of the pixel.
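Both centre-point formulas can be expressed directly (an illustrative sketch; the 1-based column and 0-based top-down row conventions are assumptions read from the text, and `cell_centre` is a hypothetical name):

```python
# Sketch of the centre-point formulas above, assuming columns counted
# from 1 at the left and rows counted from 0 at the top of the file.
# Header field names follow the ESRI ASCII grid format.

def cell_centre(col, row, ncols, nrows, xllcorner, yllcorner, cellsize):
    lon = xllcorner + (col - 0.5) * cellsize           # column to longitude
    lat = yllcorner + (nrows - row - 0.5) * cellsize   # top-down row to latitude
    return lon, lat

# centre of the top-left cell of a 10 x 10 grid anchored at (139.0, 35.0)
print(cell_centre(1, 0, 10, 10, 139.0, 35.0, 0.1))
```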

GridCellID: AIR Exposures’ unique ID for the pixel. It encodes the number of columns from m_XSTART (the i value), at least one separating zero, then the number of rows from m_YSTART (the j value). The AIR Exposures team currently has this origin defined in the CONFIG.txt file as 11° West, 34° North. To deconstruct the i and j values, take the integer result of dividing the GridCellID by 100,000 (specified in CONFIG.txt); this gives the i value. Taking the modulus of the GridCellID and 100,000 gives the j value. One divided by the cellsize, multiplied by the difference between the xllcorner and m_XSTART and added to the current column number, gives the i value. One divided by the cellsize, multiplied by the difference between the yllcorner and m_YSTART and added to the number of rows minus one less than the current row number, gives the j value. The m_IJMODIFIER is used to space the i and j values apart. It ensures dividing the GridCellID by 100,000 always gives the i value whether the j value is 5 or 50,000. The GridCellID has been designed to be large enough to hold values for the entire world at a grid cell resolution of 1/120 decimal degrees (the resolution of the source data).
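A minimal sketch of the packing scheme (Python for illustration; only the 100,000 modifier comes from the text, and the helper names are hypothetical):

```python
# Sketch of the GridCellID scheme: the column offset i and row offset j
# from the fixed origin (m_XSTART, m_YSTART) are packed into one integer
# using the m_IJMODIFIER of 100,000 named in CONFIG.txt, so integer
# division and modulus recover them exactly.
IJMODIFIER = 100_000

def encode(i, j):
    return i * IJMODIFIER + j          # i in the high digits, j in the low

def decode(grid_cell_id):
    return grid_cell_id // IJMODIFIER, grid_cell_id % IJMODIFIER

cell_id = encode(1234, 567)
print(cell_id, decode(cell_id))
```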

PolyID: This processor intensive section locates the shapefile district that contains the centre point of the current pixel and saves the shapefile district’s unique identifier. To improve search times, all shapefile districts have been pre-loaded into an array. An ESRI point feature is created using the pixel centre point’s longitude and latitude. Then the ‘contains’ spatial query is run on the last district to contain a pixel. If the current pixel does not fall within that district, every district is checked until the appropriate one is found. This process also populates another array ( m_ZoneCountingArray() ) that is later used to check if all risks from all districts have been distributed.
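The search strategy can be sketched like this (illustrative only: axis-aligned rectangles stand in for shapefile polygons, and the real tool runs ESRI’s ‘contains’ spatial query against the pre-loaded district array):

```python
# Sketch of the PolyID lookup: test the district that contained the
# previous pixel first, and only fall back to scanning every district on
# a cache miss. Districts are modelled as rectangles purely for brevity.

def make_locator(districts):  # districts: {poly_id: (xmin, ymin, xmax, ymax)}
    last = [None]             # remember the last matching district between calls

    def contains(box, x, y):
        xmin, ymin, xmax, ymax = box
        return xmin <= x <= xmax and ymin <= y <= ymax

    def locate(x, y):
        if last[0] is not None and contains(districts[last[0]], x, y):
            return last[0]                       # cache hit: no full scan
        for poly_id, box in districts.items():   # fall back to a full scan
            if contains(box, x, y):
                last[0] = poly_id
                return poly_id
        return None                              # pixel centre in no district

    return locate

locate = make_locator({"A": (0, 0, 1, 1), "B": (1, 0, 2, 1)})
print(locate(0.5, 0.5), locate(0.6, 0.5), locate(1.5, 0.5))
```

Because neighbouring pixels usually fall in the same district, the cached check succeeds for most pixels and the full scan is rare.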

Once the ASCII format raster file has been read, the array ( m_ArrayOfValues() ) is used to create the AIR Exposures ASCII file (e.g. TokSurE-03-Final.txt ). Then the array is sorted by district, then by number of risks. If all the risks have not been distributed throughout a district (caused by rounding errors), then the extra or missing risks are added/removed one at a time from each pixel, starting with the pixel with the greatest number of risks. This sorted and modified array is then used to create another AIR Exposures ASCII file (e.g. TokSurE-03-FinalModified.txt ). The program then creates an array ( m_ArrayOfMissingShapes() ) of all shapefile districts that have not been included in the earlier files. These are all districts that did not contain the centre point of any pixel, plus all districts with suitabilities so low that rounding to integer values left them with zero risks allocated. This array is used to create a third AIR Exposures ASCII file (e.g. TokSurE-03-MissingShapes.txt). The longitudes and latitudes are of the centre points of the districts.
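A sketch of the per-district correction (hypothetical Python; the real tool operates on the sorted m_ArrayOfValues() array, and the round-robin spread over pixels is an assumption consistent with the one-at-a-time description above):

```python
# Illustrative sketch of the rounding-error correction: after integer
# rounding, a district's pixel counts may not sum to its true total, so
# the surplus or shortfall is spread one risk at a time, starting with
# the most heavily loaded pixels.

def fix_district(pixel_counts, true_total):
    counts = sorted(pixel_counts, reverse=True)  # greatest number of risks first
    diff = true_total - sum(counts)
    step = 1 if diff > 0 else -1
    i = 0
    while diff != 0:
        counts[i] += step            # add or remove one risk from this pixel
        diff -= step
        i = (i + 1) % len(counts)    # move to the next pixel in sorted order
    return counts

print(fix_district([5, 3, 1], 11))   # two risks short of the true total
```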


A file is created containing a list of all shapefile districts, with number of risks to be distributed and the number that were distributed in the first file (e.g. TokSurE-03-Final.txt ). Finally the program creates a log file with all run time configuration and layer information.

Recommendations for Enhancement

• Currently the remapping tables are optimised for commercial risks. New tables should be researched and designed for the other four categories currently available (Apartments, Agriculture, Industrial and Single Family Homes).

• Analyse surveyed data for multiple locations to refine and modify remapping tables for different locations around the world.

• Increase the number of categories (and add relevant remapping files).

• Increase the number of source datasets, ensuring each new dataset has relevant remapping tables.

• Locate higher resolution source datasets.

• Move the program out of ArcMap and recode it in a lower level language to allow compiling and faster run times.

• Once high resolution worldwide road maps become available, implement road intersection node analysis as part of the disaggregation methodology.


References

AIR Worldwide Corporation (2008). Internship and project sponsor. http://www.air-worldwide.com 14th July 2008 – 6th January 2009

Eguchi, R.T., Huyck, C.K., Ghosh, S. and Adams, B.J. (2008). The Application of Remote Sensing Technologies for Disaster Management. Presented at: The 14th World Conference on Earthquake Engineering, October 12-17, 2008, Beijing, China

Elvidge, C.D., Tuttle, B., Sutton, P., Baugh, K.E., Howard, A.T., Milesi, C., Bhaduri, B.L. and Nemani, R. (2007). Global Distribution and Density of Constructed Impervious Surfaces. Sensors, ISSN 1424-8220 (online). http://www.ngdc.noaa.gov/dmsp/pubs/ISAglobal_20070921-1.pdf

ESRI Support Centre – Z-Buffer Algorithm. HowTo: Create a hillshade or slope using data in Geographic coordinates. Article ID: 29366. http://support.esri.com/index.cfm?fa=knowledgebase.techarticles.articleShow&d=29366 (Last visited 19 January 2009)

Hofstee, P. and Islam, M. (2004). Disaggregation of Census Districts: Better Population Information for Urban Risk Management. 25th Asian Conference on Remote Sensing, Chiang Mai, Thailand, 2004

Google Maps. Aerial photograph of 4.885211E 45.701438N. http://maps.google.com/maps?f=q&hl=en&geocode=&q=4.885211E+45.701438N&sll=37.0625,-95.677068&sspn=35.768112,56.601563&ie=UTF8&t=k&z=16&g=4.885211E+45.701438N (Last visited 19 January 2009)

Microsoft Live Earth 1. http://maps.live.com/default.aspx?v=2&FORM=LMLTCP&cp=rmh85jhd6kjg&style=o&lvl=1&tilt=-90&dir=0&alt=-1000&scene=11064399&phx=0&phy=0&phscl=1&encType=1 (Last visited 19 January 2009)

Microsoft Live Earth 2. http://maps.live.com/default.aspx?v=2&FORM=LMLTCP&cp=rmhmyrhd678d&style=o&lvl=1&tilt=-90&dir=0&alt=-1000&scene=11064362&phx=0&phy=0&phscl=1&encType=1 (Last visited 19 January 2009)

Microsoft Live Earth 3. http://maps.live.com/default.aspx?v=2&FORM=LMLTCP&cp=rmh0x3hd68rs&style=o&lvl=1&tilt=-90&dir=0&alt=-1000&scene=11064408&phx=0&phy=0&phscl=1&encType=1 (Last visited 19 January 2009)

Miller, J.B., Cunningham, D., Koeln, G., Way, D., Metzler, J. and Cicone, R. (2002). Global Database for Geospatial Indicators. ISciences, L.L.C. http://www.isciences.com/assets/pdfs/GI_GlobalDatabase.pdf (Last visited 21 January 2009)

NOAA/NGDC ISA. National Oceanic and Atmospheric Administration (NOAA), National Geophysical Data Center (NGDC), Earth Observation Group – Defense Meteorological Satellite Program (DMSP). Global Distribution and Density of Constructed Impervious Surface Areas (ISA). http://www.ngdc.noaa.gov/dmsp/download_global_isa.html Direct file link: ftp://ftp.ngdc.noaa.gov/DMSP/web_data/isa/ngdc_isa_gcs_product.tgz (Last visited 19 January 2009)

Steinnocher, K., Weichselbaum, J. and Köstl, M. (2006). Linking Remote Sensing and Demographic Analysis in Urbanised Areas. Presented at: First Workshop of the EARSeL Special Interest Group on Urban Remote Sensing, Humboldt-Universität zu Berlin, Germany, 2-3 March 2006. http://www.earsel.org/workshops/SIG-URS-2006/PDF/Session1_steinnocher.pdf (Last visited 21 January 2009)

USGS. Global Land Cover Characteristics Data Base Version 2.0. Geographic projection file link: http://edc2.usgs.gov/glcc/tabgeo_globe.php Documentation: http://edc2.usgs.gov/glcc/globdoc2_0.php (Last visited 19 January 2009)

USGS. Global 30 Arc-Second Elevation Dataset (GTOPO30). http://eros.usgs.gov/products/elevation/gtopo30.html Documentation: http://eros.usgs.gov/products/elevation/gtopo30/README.html (Last visited 19 January 2009)

Voss, P.R., Long, D.D. and Hammer, R.B. (1999). When Census Geography Doesn’t Work: Using Ancillary Information to Improve the Spatial Interpolation of Demographic Data. Center for Demography and Ecology, University of Wisconsin-Madison. Working Paper No. 98-107. http://www.ssc.wisc.edu/cde/cdewp/99-26.pdf (Last visited 21 January 2009)

Wright, J.K. (1936). A Method of Mapping Densities of Population: With Cape Cod as an Example. Geographical Review, Vol. 26, No. 1 (Jan., 1936), pp. 103-110. American Geographical Society. http://www.jstor.org/pss/209467 (Last visited 21 January 2009)

Xie, Y. (1995). The Overlaid Network Algorithms for Areal Interpolation Problem. Computers, Environment and Urban Systems, Vol. 19, No. 4, pp. 287-306

Zhu, M., Feng, Z., Tong, H., Liu, Y. and Yang, Y. (2008). Study on Digital Earthquake-before Damage Evaluation. Presented at: The 14th World Conference on Earthquake Engineering, October 12-17, 2008, Beijing, China


Appendix The Disaggregator: subroutine and function details The Disaggregation Tool was originally designed as three, then four, separate programs. After working versions of the four programs were created and tested, the AIR Exposures group requested that the programs be combined into one application to allow unattended execution of all four steps. The program and code structure reflect the original design of four separate programs. To view the actual code, load the DisaggregationTool.mxd file into ArcMap and enter the Visual Basic editor. Program – Initialisation ThisDocument Used primarily for setting program constants which are read from a configuration text file. This file MUST be loaded into ArcGIS. Check the Table Of Contents, ‘Source’ view for CONFIG.txt The CONFIG.txt file must be formatted and contain the following fields: NB: No quotes allowed. No commas except between field names. Each constant must be on a new line Example contents of CONFIG.txt

"Constant_Name", "Value", "Description" ,, Blank lines and comments like this are ignored SUGGESTED_SHAPE_FILE_NAME,Tokyo_surroundings_shape,Default name of shapefile to use SUGGESTED_SHAPE_FILE_ID_FIELD,4,Number of column in shapefile holding Default unique identifier SUGGESTED_GRID_NAME,TokSur,Stem of name for final output file SUGGESTED_WEIGHT_DEM,5,Default weighting for DEM SUGGESTED_WEIGHT_SLO,5,Default weighting for Slope SUGGESTED_WEIGHT_LU,20,Default weighting for LandUse SUGGESTED_WEIGHT_ISA,80,Default weighting for ISA SUGGESTED_PIXEL_NUMERATOR,1,Numerator of pixel resolution SUGGESTED_PIXEL_DENOMINATOR,120,Denominator of pixel resolution DESTINATION_PATH_TOP,C:\Work\Code\Data\Output\,Location for files DESTINATION_PATH1,Step1\,Subdirectories for each program step ……

Functions and Subroutines: NB: When directories are checked and more than 100 subdirectories exist, the program calls ThisDocument.OpenDirectory() NB: Most functions call frmDisaggregator.AddLogEntry() multiple times during operation to populate the log array Public Function CheckOutSpatialAnalystLicense() As Boolean Returns TRUE if able to obtain the Spatial Analyst License. Public Sub LoadConfigFromFile() Loads constant values from CONFIG.txt file Public Sub Assign_Values(choice As Integer) Copies the constant values to hidden labels and text fields in each of the 3 forms


Private Sub UIButtonControl4_Click() Calls LoadConfigFromFile() and CheckOutSpatialAnalystLicense() then loads the frmDisaggregator

Public Function CreateFltConstant(value As Double) As IGeoDataset Returns a GeoDataset (raster) with ‘value’ in every cell Public Function DirExists(ByVal DName As String) As Boolean

Returns TRUE if the specified directory exists Public Function FileExists(FileName As String) As Boolean

Returns TRUE if the specified file exists Public Sub OpenDirectory(path As String) Uses (Windows) explorer to open the specified directory separately to ArcGIS Program: frmDisaggregator Opened, waiting for user commands List of Routines and Functions Private Sub chkbDEMSlopeMask_Click() Controls if the user wants to treat the DEM and Slope layers as masks Private Sub chkbMiscLayer_Change() Private Sub chkbMiscLayer_Click() Private Sub chkbMiscLayer1_Click() Private Sub chkbMiscLayer2_Click() These subroutines allow the user to include an optional miscellaneous source layer Private Sub chkbShapefileCoords_Click() Private Sub chkbShapefileCoords2_Click()

Allows the user to instruct the program to crop the source layers to the bounding rectangle of the shapefile

Private Sub chkbWholeValuesOnly_Click() Converts the ‘crop to’ coordinates to integers. Rounding function always increases total area to be cropped i.e.: -34.7°, -23.2° by 4.2°, 2.8°

Becomes: -35°, -24° by 5°, 3° Private Sub chkbxCloseTemp_Click() Private Sub chkbxCloseTemp2_Click() Private Sub chkbxCloseTemp3_Click() Instructs the program to close temporary layers after each step completes Private Sub cmdFinalDir_Click() Opens the directory containing steps 3 and 4’s output Private Sub HideLastButton() Hides the cmdFinalDir button Private Sub cmdCheckDir1_Click() Opens the step1 output directory Private Sub cmdCheckDir2_Click() Opens the remap files directory Public Sub cmdCloseProgram_Click() Closes the Disaggregation Tool Private Sub cmdRunEveryThing_Click() After checking for confirmation, runs all four program steps. Private Sub cmdShowLog_Click() Opens frmLog Public Sub CloseLogWindow() Closes frmLog Private Sub cmdStep1_Click() Creates log file if it doesn’t exist. Runs step 1 of the program Private Sub cmdStep2_Click() Creates log file if it doesn’t exist.


Runs step 2 of the program Private Sub cmdStep3_Click() Creates log file if it doesn’t exist. Runs step 3 of the program Private Sub cmdStep4_Click() Creates log file if it doesn’t exist. Runs step 4 of the program Private Sub comboAttSel_Change() Reads in the user’s selection for which of the 5 modes to run (AGR,APR,COM,IND,SFH) Private Sub comboShapeFile_Change() Informs the program of a change in specified shapefile Private Sub comboShapeID_Change() Informs the program of a change in specified shapefile unique identifier Private Sub txt4MissingShapes_Change() Informs the program of a change in the list of missing shapefile districts Private Sub txtNWCoorY_Change() Informs the program of a change in the Northern Y coordinate Private Sub txtPixeldenominator_Change() Private Sub txtPixelnumerator_Change()

Informs the program of a change in the pixel resolution denominator or numerator and calls for the recalculation of the decimal value for the pixel resolution.

Private Sub txtSECoorY_Change() Informs the program of a change in the Southern Y coordinate Private Sub FindShapeFiles() Populates the user selectable list of shapefiles from ArcMap’s table of contents. Calls IsFeature() Private Function IsFeature(pLayer As ILayer2) As Boolean Determines if a layer is a shapefile Private Sub txtShapefile_Change() Informs the program of a change in user selected shapefile

Analyses the shapefile and populates the drop down list of possible unique identifiers (all attributes of the shapefile)

Private Sub txtShapeValue_Change() Informs the program of a change in the shapefile value field to be disaggregated Private Sub UserForm_Initialize() Calls for frmStep1 to be loaded and hidden.

Once frmStep1, frmStep2, frmStep3 and frmStep4 are loaded, loads the default values from ThisDocument.

Private Sub UserForm_QueryClose(Cancel As Integer, CloseMode As Integer) Asks for confirmation when the user clicks either the cmdCloseProgram button or the Windows supplied Red ‘X’ button (upper right corner).

Private Sub UserForm_Terminate() Unloads all forms and deletes the log file in memory before closing the disaggregator. Public Function CreateSubDirectories(PathName As String, Part2Name As String) As String Creates directories for output and temporary files Public Function AddLogEntry(field As String, value As String, overWrite As Boolean) As Boolean Adds an entry to the log file array. If the entry already exists, it can be overwritten Public Function ReadLogFile(field As String) As String Returns the specified value by log entry name from the log array Public Function ReadLogFileFieldByNumber(fieldNumber As Integer) As String Returns the name of the log entry specified by number position Public Function ReadLogFileValueByNumber(fieldNumber As Integer) As String


Returns the value of the log entry specified by number position Public Function OutputLogFileToDisk(stem As String, ProgramFinished As Boolean) As String Creates a file version of the log array Public Sub DeleteLogFile() Deletes the log array from memory Public Sub CreateLogFile() Creates a new log array with only column headers Program: Step 1 – frmStep1 List of Routines and Functions Normal Program Order: Private Sub UserForm_Initialize()

Runs when form, frmStep1, is opened. Calls ThisDocument.CheckOutSpatialAnalystLicense() Opens frmStep2 and sets it to hidden. Sets default values from the CONFIG.txt file Attempts to identify the DEM, LandUse and ISA layers Conditions: Name contains ‘grid’ and one of ‘dem’, ‘landu’ and ‘isa’ Places layer names in txtSourceDEM, txtSourceLU and txtSourceISA Displays notification if layers are not found or if too many layers with similar names are found

Private Sub chkbShapefileCoords_Click() If the Checkbox is selected, the txtNWCoorX, txtNWCoorY, txtSECoorX and txtSECoorY boxes are grayed out. The specified shapefile is queried for its envelope (dimensions). A buffer of 0.1 is added to these and the values entered into the txtNWCoorX, txtNWCoorY, txtSECoorX and txtSECoorY boxes.

Private Sub txtNWCoorY_Change() Runs when the value in txtNWCoorY is changed. Calls CalculateZBuffer() with txtNWCoorY and txtSECoorY Private Sub txtSECoorY_Change() Runs when the value in txtSECoorY is changed. Calls CalculateZBuffer() with txtNWCoorY and txtSECoorY Private Sub chkbSourceMisc_Click() Unlocks Miscellaneous layer name and includes with Cropping Private Sub cmdCheckDir_Click() Calls OpenDirectory() with the Cropped directory path name (txtPathName) Public Sub cmdStep1_Click()

Locks the form from user input. Calls cmdCropMaps_Click() Calls cmdCreateSlope_Click() Calls cmdModifyLandUse_Click() Calls cmdModifyISA_Click() Calls RemoveLayers() Unlocks the form for user input Makes the cmdCloseProgram button visible

Private Sub cmdCloseProgram_Click() Unloads form frmStep1 Sub routines grouped by common function: Private Sub cmdCropMaps_Click()


Creates the Cropped directory or a new incremented subdirectory if the top level already exists. Calls CropArea() with txtNWCoorX, txtSECoorY, txtSECoorX and txtNWCoorY

Private Sub CropArea(dblXMin As Double, dblYMin As Double, dblXMax As Double, dblYMax As Double)

Uses dblXMin, dblYMin, dblXMax and dblYMax to create an Envelope. This Envelope specifies what dimensions the cropped datasets will have. txtSourceLU is used to find the LandUse Layer txtSourceISA is used to find the ISA Layer txtSourceDEM is used to find the DEM Layer txtSourceMISC is used to find the MISC Layer (optional) CropLayer() is called with each layer (in turn), the layer’s directory path and the Envelope The resulting cropped layers are named and added to the Table Of Contents. IncrementLayers1() is called with the layer’s identifying name. The new cropped LandUse layer’s name is copied into txtCropLU The new cropped ISA layer’s name is copied into txtCropISA SaveLayer() is called with the DEM and MISC (optional) table of contents position indicators, m_cropdem, the DEM id, CDEM_ID, a boolean stating the temporary layer should be closed, TRUE, and an indicator stating which form’s IncrementLayers() should be called, 1.

Private Function CropLayer(input_raster As IRaster, envel As IEnvelope, temp_path As String) As Raster

Takes the input_raster and applies RasterExtractionOp.Rectangle() with the envel and TRUE (to instruct the operation to save the inside of the rectangle). Returns the cropped raster.

Private Sub cmdCreateSlope_Click() Calls CreateSlope() with the cropped DEM name (txtCropDEM) Private Sub CreateSlope(input_file As String)

Creates a new subdirectory within the TEMP directory to prevent overwriting & duplicate temporary file naming errors. Calls CalculateZBuffer() with the North/South extent of the cropped DEM layer. Applies RasterSurfaceOp.Slope() with the cropped DEM layer, the command to produce a slope map in degrees, esriGeoAnalysisSlopeDegrees, and the value in txtZBuffer IncrementLayers1() is called with the Slope ID SaveLayer() is called with the Slope table of contents position indicator, m_cropslope, the Slope id, CSLOPE_ID, a boolean stating the temporary layer should be closed, TRUE, and an indicator stating which form’s IncrementLayers() should be called, 1.

Private Sub CalculateZBuffer(max As Double, min As Double) Calculates the appropriate Z-buffer for use in the Create Slope command based on the midpoint between the given latitudes (max and min). It uses the procedure:

Determine what the middle latitude of the area of interest is. Convert that degree value to radians. 1 degree = 0.0174532925 radians Use the value in radians in the following equation: Z factor = 1.0 / (113200 * cos(<input latitude in radians>)) Use this calculated Z factor in the slope tool.
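The quoted procedure as executable code (an illustrative sketch; the function name is hypothetical and the constants are exactly those given above):

```python
# The ESRI Z-factor formula quoted above, as a one-line function. For a
# DEM in geographic coordinates, one degree of longitude spans fewer
# metres at higher latitudes, so the slope tool needs this correction.
import math

def z_factor(mid_latitude_degrees):
    radians = mid_latitude_degrees * 0.0174532925   # degrees to radians
    return 1.0 / (113200 * math.cos(radians))

print(z_factor(35.7))   # e.g. the mid-latitude of a Tokyo-area crop
```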

From the ESRI Support Centre Z-Buffer article. Private Sub cmdModifyLandUse_Click()

Calls ModifyLandUse() with txtCropLU


Private Sub ModifyLandUse(input_file As String) Creates a new subdirectory within the TEMP directory to prevent overwriting & duplicate temporary file naming errors. The cropped LandUse layer, as specified in input_file is opened. Creates a remapping table from the initial LandUse remapping file: C:\Work\Code\Data\ReMapTables\ LandUseStep1-1-Remap.txt Applies RasterReclassOp.ReclassByRemap() on the cropped LandUse layer with the remapping table and FALSE to instruct the operation to not retain missing values from the remapping table Runs RasterNeighborhoodOp.FocalStatistics() using the esriGeoAnalysisStatsMinimum command and a 3*3 cell window (RasterNeighborhood). Net effect of increasing the size of the lowest value (Urban). Negative effect of a slight shifting of other data. Creates a remapping table from the initial LandUse remapping file: C:\Work\Code\Data\ReMapTables\ LandUseStep1-2-Remap.txt Applies RasterReclassOp.ReclassByRemap() on the cropped LandUse layer with the remapping table and FALSE to instruct the operation to not retain missing values from the remapping table. IncrementLayers1() is called with the modified LandUse id, CMLU_ID. SaveLayer() is called with the Modified LandUse table of contents position indicator, m_croplu, the Modified LandUse id, CMLU_ID, a boolean stating the temporary layer should be closed, TRUE, and an indicator stating which form’s IncrementLayers() should be called, 1.

Private Sub cmdModifyISA_Click() Calls ModifyISA() with txtCropISA Private Sub ModifyISA(input_file As String)

Creates a new subdirectory within the TEMP directory to prevent overwriting & duplicate temporary file naming errors. The cropped ISA layer, as specified in input_file is opened. Runs RasterNeighborhoodOp.FocalStatistics() using the esriGeoAnalysisStatsMean command and a 3*3 cell window (RasterNeighborhood). Net effect of smoothing the data. Negative effect of a slight reduction in accuracy. IncrementLayers1() is called with the modified ISA id, CMISA_ID. SaveLayer() is called with the Modified LandUse table of contents position indicator, m_cropisa, the Modified ISA id, CMISA_ID, a boolean stating the temporary layer should be closed, TRUE, and an indicator stating which form’s IncrementLayers() should be called, 1. Calls CreateRuralSuburbMetro() three times. Once with the Rural ISA values, once with the Suburban ISA values and finally with the Metropolitan ISA values. The SaveLayer() routine is called for each resulting layer.

Private Function CreateRuralSuburbMetro(pRaster As IRaster, min_val As Long, max_val As Long) As IGeoDataset

Creates a copy of the pRaster then reclassifies it as a mask (0 and 1 values) based on the min_val and max_val inputs.
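The reclassification can be sketched as follows (illustrative Python; plain nested lists stand in for rasters, and the ISA band values used in the example are hypothetical):

```python
# Sketch of CreateRuralSuburbMetro's reclassification: cells whose ISA
# value falls inside [min_val, max_val] become 1, everything else 0,
# yielding the rural/suburban/metropolitan masks used later in Step 3.

def to_mask(grid, min_val, max_val):
    return [[1 if min_val <= v <= max_val else 0 for v in row] for row in grid]

isa = [[2, 15, 40],
       [55, 8, 30]]
print(to_mask(isa, 10, 50))   # a hypothetical 'suburban' ISA band
```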

Private Sub RemoveLayers() Removes the two temporary layers created by cmdRunAll_Click() Generic Subroutines and functions

Private Sub IncrementLayers1(newlayer As String) Keeps track of each layer’s position in ArcGIS’ table of Contents whenever a new layer is added

Function SaveLayer(layer_number As Integer, layer_name As String, remove_temp As Boolean, form As Integer) As Boolean


Cuts the layer_name to less than 14 characters Saves the layer at Table Of Contents position layer_number in ESRI GRID format with name as specified by layer_name. Uses the IRasterBandCollection.SaveAs() function If remove_temp is TRUE, the old layer is closed. IncrementLayers1() with the layer_name is called. Returns TRUE if completed successfully.

End of Program: Step 1 Program: Step 2 – frmStep2 List of Routines and Functions Private Sub UserForm_Initialize()

Runs when the form, frmStep2, is opened. Opens frmStep3 and sets it to hidden Loads default values

Private Sub chkbDEMSlopeMask_Click() When true, the weighting windows for DEM and Slope are grayed out Private Sub chkbMiscLayer_Click() When true, the miscellaneous layer windows are unlocked and are editable Private Sub cmdStep2_Click () Locks the form from user input while it calls CreatingGRID () Private Sub cmdFinished_Click() Unloads the form frmStep2 Private Sub CreatingGRID ()

Checks the layer weights (2 or 4 depending on chkbDEMSlopeMask) are greater than zero. If blank, sets them to 0.000001 If chkbMiscLayer then Misc layer weight is also checked Creates a new subdirectory within the TEMP directory to prevent overwriting & duplicate temporary file naming errors. Creates a new subdirectory within the DESTINATION_PATH directory to prevent overwriting & duplicate temporary file naming errors. Calls CheckLayers() Calls ReClassifyLayers() with names for the reclassed layers Calls CombineLayers() with names for the reclassed layers Calls CloseTempLayers()

Private Sub CreatingComGRID() Checks the commercial weights (2 or 4 depending on chkbDEMSlopeMask) add up to 100% If chkbMiscLayer then Misc layer is included in the weight totals. Creates a new subdirectory within the TEMP directory to prevent overwriting & duplicate temporary file naming errors. Creates a new subdirectory within the DESTINATION_PATH directory to prevent overwriting & duplicate temporary file naming errors. Calls CheckLayers() Calls ReClassifyLayers() with names for the reclassed layers Calls CombineLayers() Calls CloseTempLayers()

Private Function CheckLayers() As Boolean

Reads in layer names from txtDEMLayer, txtLandUseLayer, txtISALayer and txtSlopeLayer and locates them in the Table Of Contents (also txtMisLayer if selected)


Private Sub ReClassifyLayers(strDem As String, strSlo As String, strLu As String, strISA As String, strMIS As String)

Processes the 4 layers DEM, Slope, LandUse, ISA (and Misc if selected) Calls ReclassifyToOne() with the DEM layer and the number 1 to indicate DEM. If the DEM is not being treated as a mask (chkbDEMSlopeMask), uses RasterMathOps.Times() along with a constant raster created by ThisDocument.CreateFltConstant() to multiply the DEM values by the weighting stated in txtDEMWeightRes or txtDEMWeightCom The new layer is added to the table Of Contents and IncrementLayers2() is called with the new layer’s name. Calls ReclassifyToOne() with the Slope layer and the number 2 to indicate Slope. If the Slope is not being treated as a mask (chkbDEMSlopeMask), uses RasterMathOps.Times() along with a constant raster created by ThisDocument.CreateFltConstant() to multiply the Slope values by the weighting stated in txtSlopeWeightRes or txtSlopeWeightCom The new layer is added to the table Of Contents and IncrementLayers2() is called with the new layer’s name. Calls ReclassifyToOne() with the LandUse layer and the number 3 to indicate LandUse. Uses RasterMathOps.Times() along with a constant raster created by ThisDocument.CreateFltConstant() to multiply the LandUse values by the weighting stated in txtLandUseWeightRes or txtLandUseWeightCom The new layer is added to the table Of Contents and IncrementLayers2() is called with the new layer’s name. Calls ReclassifyToOne() with the ISA layer and the number 4 to indicate ISA. Uses the resulting IGeoDataset in RasterMathOps.Times() along with a constant raster created by ThisDocument.CreateFltConstant() to multiply the ISA values by the weighting stated in txtISAWeightRes or txtISAWeightCom The new layer is added to the table Of Contents and IncrementLayers2() is called with the new layer’s name.

Public Function ReclassifyToOne(pRasterLayer As IRasterLayer, choice As Integer) As IGeoDataset

Depending on which attribute from frmDisaggregator.m_AttSelection is selected, the value of the ‘choice’ argument and also chkbDEMSlopeMask, this creates a remapping table from the appropriate reclassification text file from the specified remapping directory e.g. “C:\Work\Code\Data\ReMapTables\”

COM-DEM-Mask.txt, COM-DEM-Remap.txt, COM-SLO-Mask.txt, COM-SLO-Remap.txt, COM-LU-Remap.txt, COM-ISA-Remap.txt, COM-MISC-Remap.txt

Using the RasterDescriptor.Create() and the Remapping table, RasterReclassOp.ReclassByRemap() is run on the ‘pRasterLayer’ The result is returned as an IGeoDataset

Private Function CombineLayers() As Boolean If the DEM and Slope layers are not being used as masks, they are multiplied by their respective weights. Then the reclassed DEM and Slope layers are combined using the RasterMathOps.Times() function to create a temporary layer.

IncrementLayers2() is called with an argument stating the new layer is not being tracked in the Table of Contents. The function then multiplies the reclassed LandUse and ISA layers by their respective weights before using the RasterMathOps.Times() function to create another temporary layer. IncrementLayers2() is called with an argument stating the new layer is not being tracked in the Table of Contents.


If a miscellaneous layer is being used, it is multiplied by its weighting and then combined with the LandUse and ISA layers using the IRasterMathOps.Times() function. The two temporary layers are then combined into a third temporary layer using the RasterMathOps.Times() function. IncrementLayers2() is called with an argument stating the new layer is not being tracked in the Table of Contents. To reduce the values to more easily recognisable levels, ThisDocument.CreateFltConstant() is used with the third temporary layer in RasterMathOps.Divide() to form a fourth temporary layer. This layer is then smoothed with the RasterNeighborhoodOp.FocalStatistics() function using the esriGeoAnalysisStatsMean command and a 3*3 cell window (RasterNeighborhood). After the DESTINATION_PATH has been checked, the fourth temporary layer is saved as a permanent layer using the IRasterBandCollection.SaveAs() function. IncrementLayers2() is called with an argument stating the new layer is not being tracked in the Table of Contents.
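The multiply-then-smooth combination can be sketched as follows (an illustrative Python sketch using nested lists in place of rasters; the edge cells here simply average a smaller window, which may differ from FocalStatistics’ edge handling, and all layer values are hypothetical):

```python
# Sketch of CombineLayers(): weighted, reclassified suitability layers are
# multiplied together cell by cell, and the result is smoothed with a 3x3
# mean filter, mirroring FocalStatistics with esriGeoAnalysisStatsMean.

def times(a, b):
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def focal_mean(grid):
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            row.append(sum(window) / len(window))   # mean of the 3x3 window
        out.append(row)
    return out

dem_slope = [[1.0, 1.0], [0.5, 1.0]]   # hypothetical weighted DEM x Slope
lu_isa    = [[0.25, 0.75], [0.5, 0.5]]  # hypothetical weighted LandUse x ISA
print(focal_mean(times(dem_slope, lu_isa)))
```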

Private Sub CloseTempLayers()

If chkbxCloseTemp is true, closes the 7 temporary layers created (8 if a miscellaneous layer is included).

Public Sub IncrementLayers2(newlayer As String)

Keeps track of each layer’s position in ArcGIS’s Table of Contents whenever a new layer is added.

End of Program: Step 2

Program: Step 3 – frmStep3

List of Routines and Functions

Private Sub UserForm_Initialize()

Runs when the form, frmStep3, is opened. Opens frmStep4 and hides it. Loads default values

Private Sub cmdStep3_Click()

Locks the program from user input.

Calls CheckInput(); if anything is missing, informs the user and closes the program. Sets the source layer from txt3SourceName. Calls CalculateFinalGrid() with the source layer name. Re-enables the form and makes cmdFinished visible.

Private Sub cmdFinished_Click()

Unloads the form frmStep3.

Private Function CheckInput() As Boolean

Returns TRUE if the text boxes have values in them.

Private Sub CalculateFinalGrid(strSource As String)

Calls ChangeResolution() with the source layer name, pixel size and new temporary layer name “a”.
Calls ChangeResolution() with the rural mask name, pixel size and new temporary layer name “rm”.
Calls ChangeResolution() with the suburban mask name, pixel size and new temporary layer name “sm”.
Calls ChangeResolution() with the metropolitan mask name, pixel size and new temporary layer name “mm”.

Calls FeatureToRaster() with the shapefile name, pixel size, database value field name and new temporary layer name “b”.
Calls FeatureToRaster() with the shapefile name, pixel size, database unique identifier field name and new temporary layer name “c”.
Calls CheckShapeFileResolution() with the shapefile location and the location of “c”.
Calls RemoveNegativeValues() with the temporary layer position for “a” and new temporary layer name “d”.
Calls Normalise() with the “c” and “d” layer positions and new temporary layer names “e” and “f”.
Calls CalculateBuildingNumber() with the “b” and “f” layer positions and new temporary layer names “g” and “h”.
If there are missing districts, calls AllocateMissingFeatures() with the shapefile location, “c”, “g”, “h” and the shapefile value attribute name.
Calls CheckMaxLevels() with “g”, “h”, “c”, “f”, “rm”, “sm”, “mm”, “i” and “j”.
Calls WholeMapRedistribute() with “i”, “j”, “c”, “f” and new temporary layer name “k”.
Calls SaveLayer() with the “k” layer position and name.
Calls CloseTempLayers().
Makes cmdFinished visible and unlocks the form to user input.

Private Sub ChangeResolution(source As String, resolution As Double, target As String)

Locates and opens the “source” layer. Using RasterExtractionOp.Rectangle() with the IRasterAnalysisEnvironment methods SetExtent and SetCellSize, the extent is kept the same but the resolution is set to txtPixelSize. The raster ResampleMethod is set to RSP_BilinearInterpolation. A new layer is added with the name specified by “target”. IncrementLayers3() is called with “target”.

Private Sub FeatureToRaster(shapefile As String, resolution As Double, fieldname As String, target As String)

The FeatureClassDescriptor.Create() function is used with the “shapefile” layer and the “fieldname”. The TEMP_WORK_DIR is checked. RasterConversionOp.ToRasterDataset() is called on the FeatureClassDescriptor while the environment cell size is set to txtPixelSize. The resulting temporary raster layer is added and named with the value in “target”. IncrementLayers3() is called with “target”.

Private Sub ConvertLayerToInt(floatingpoint_raster As Integer, raster_name As String)

RasterMathOps.Int() is called on the “floatingpoint_raster”. The new layer is named “raster_name” and the “floatingpoint_raster” is closed.

Private Function CheckShapeFileResolution(ShapeFile As Integer, IDRaster As Integer) As Boolean

Returns true if all districts in the shapefile have been mapped. Some districts may not be mapped if none of the pixel centre points fall within the districts.

Private Sub RemoveNegativeValues(grid_set As Integer, target As String)

Calls ReclassifyToPositive() with “grid_set”, -1000000 and 1000000. The resulting temporary layer is used with “grid_set” in the RasterMathOps.Times() function. The newly formed temporary raster layer is added to the table of contents and named “target”. IncrementLayers3() is called with “target”.

Private Function ReclassifyToPositive(pRaster As IRaster, lower_boundary As Double, upper_boundary As Double) As IGeoDataset

Uses the RasterReclassOp.ReclassByRemap() function on “pRaster” to set everything from “lower_boundary” to 0 equal to 0, and everything from 0 to “upper_boundary” equal to 1. The resulting raster of 1s and 0s is returned as an IGeoDataset.
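
The mask-then-multiply idea behind ReclassifyToPositive() and RemoveNegativeValues() can be sketched in Python with NumPy. This is a sketch of the logic only, not the ArcObjects implementation; the function name and boundary defaults are taken from the text above.

```python
import numpy as np

def reclassify_to_positive(grid, lower=-1000000.0, upper=1000000.0):
    """Map values in (lower, 0] to 0 and values in (0, upper] to 1,
    mirroring the two ReclassByRemap ranges described above."""
    return np.where(grid > 0, 1, 0)
```

Multiplying the returned 0/1 mask by the original raster (the Times() step in RemoveNegativeValues()) zeroes out negative cells while leaving positive values untouched.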

Private Sub Normalise(zone_source As Integer, value_raster As Integer, mid_target As String, target As String)

This function distributes the values of “value_raster” into zones as specified by “zone_source” so that all values within a zone total 1. RasterZonalOp.ZonalStatistics() is called on “zone_source” and “value_raster” with the command esriGeoAnalysisStatsSum; the resulting layer is named “mid_target”. IncrementLayers3() is called with “mid_target”. RasterMathOps.Divide() is called on “value_raster” and “mid_target”; the resulting layer is named “target” (this is the normalised grid). IncrementLayers3() is called with “target”.
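
The zonal-sum-then-divide step can be sketched in Python with NumPy. This is a sketch under the assumption that zones are integer district IDs with non-zero value totals; it is not the ArcObjects code.

```python
import numpy as np

def normalise(zones, values):
    """Divide each cell by its zone's total so the values within every
    zone sum to 1, mirroring ZonalStatistics(Sum) followed by Divide()."""
    zone_sums = {z: values[zones == z].sum() for z in np.unique(zones)}
    sums = np.empty_like(values, dtype=float)
    for z, total in zone_sums.items():
        sums[zones == z] = total  # broadcast each zone's sum to its cells
    return values / sums
```

Once normalised, multiplying a zone's grid by that district's risk count (the next step, CalculateBuildingNumber()) spreads the count over the district in proportion to the weighting surface.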

Private Sub CalculateBuildingNumber(feature_raster As Integer, normalised_raster As Integer, double_grid As String, integer_grid As String)

RasterMathOps.Times() is called on “feature_raster” and “normalised_raster”. The resulting layer is named “double_grid” and added to the table of contents. IncrementLayers3() is called with “double_grid”. RasterMathOps.Plus() is called on “double_grid” and the constant layer from ThisDocument.CreateFltConstant(), and RasterMathOps.Int() is applied to the resulting layer. This last temporary layer is added to the table of contents and named “integer_grid”. IncrementLayers3() is called with “integer_grid”.

Private Sub AllocateMissingFeatures(ShapeFile As Integer, ShapeRaster As Integer, FinalFloat As Integer, FinalInt As Integer, ShapeValueField As String)

Creates a list of the unique IDs of all districts that have not been mapped.

Calls RetrieveUniqueID() with the ShapeFile, the unique identifier, and the row number of the attribute table.

Private Function RetrieveUniqueID(ShapeFile As Integer, ShapeIDName As String, RowNumber As Long) As String

Returns the unique identifier value of a specific district from the shapefile.

Private Sub CheckMaxLevels(source_layer As Integer, final_layer As Integer, zonal_layer As Integer, normalised As Integer, ruralm As Integer, subm As Integer, metrom As Integer, float_target_name As String, int_target_name As String)

Using the metropolitan mask and maximum values, it calls PartialRedistribution(). The same procedure is run for the suburban mask and then the rural mask. The resulting floating point layer has a constant layer of 0.5 added to it and is then converted to an integer raster.

Private Function PartialRedistribution(source_layer As IRaster, zonal_layer As Integer, normalised As Integer, mask As Integer, temp_name As String, max_value As Double) As IRaster

This function ensures all pixels in the given area (e.g. metropolitan areas) hold no more than the specified maximum number of risks for the area. If extra risks are found, they are redistributed over the entire district. This is only run once and may result in pixels still having more than the maximum number of risks. In some cases this is unavoidable: if a small district has a large number of risks, they may not all fit when maximum risk numbers are strictly adhered to.
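
The single-pass cap-and-respread logic can be sketched in Python with NumPy. This is a sketch of the idea, assuming a flat array of pixel values for one district, a boolean area mask and the district's normalised weights; it is not the VBA implementation.

```python
import numpy as np

def partial_redistribution(values, mask, normalised, max_value):
    """Clip masked pixels at max_value and spread the removed excess over
    the whole district using the normalised weights. Run only once, so
    some pixels may end up over the cap again afterwards."""
    values = values.astype(float).copy()
    over = mask & (values > max_value)
    excess = (values[over] - max_value).sum()  # risks to redistribute
    values[over] = max_value
    return values + excess * normalised
```

Note that the district total is preserved; the example in the test below also shows how a capped pixel can exceed the cap again after the respread, which is the unavoidable case the text describes.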

Private Sub WholeMapRedistribute(source_layer As Integer, final_layer As Integer, zonal_layer As Integer, normalised As Integer, target_name As String)

Analyses the floating point layer during its conversion to an integer layer. Any risk values of less than 0.5 are totalled together by district and redistributed over the entire district.
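
The float-to-integer conversion with per-district recovery of sub-0.5 values can be sketched in Python with NumPy. A sketch only: it assumes flat arrays of pixel values, district IDs and normalised weights, and add-0.5-then-truncate rounding as described for CheckMaxLevels() above.

```python
import numpy as np

def whole_map_redistribute(float_values, zones, normalised):
    """Risk values below 0.5 would vanish when rounded to integers, so
    they are zeroed, totalled per district and respread over the district
    by the normalised weights before the final integer conversion."""
    small = float_values < 0.5
    result = np.where(small, 0.0, float_values)
    for z in np.unique(zones):
        in_zone = zones == z
        lost = float_values[in_zone & small].sum()
        result[in_zone] += lost * normalised[in_zone]
    return np.floor(result + 0.5).astype(int)  # add 0.5, truncate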

Function SaveLayer(layer_number As Integer, layer_name As String) As Boolean

Depending on “layer_name”, different GRID file names are constructed (fewer than 14 characters). The DESTINATION_PATH is checked. Saves the layer at Table of Contents position “layer_number” in ESRI GRID format with the name created earlier, using the IRasterBandCollection.SaveAs() function. Calls IncrementLayers3() with “layer_name”.

Private Sub CloseTempLayers()

If chkbxCloseTemp is true, then the 8 temporary layers are closed.

Private Sub IncrementLayers3(newlayer As String)

Keeps track of each layer’s position in ArcGIS’s Table of Contents whenever a new layer is added.

End of Program: Step 3

Program: Step 4 – frmStep4

List of Routines and Functions

Private Sub UserForm_Initialize()

Runs when frmStep4 is opened.

Public Sub cmdStep4_Run()

Creates the dynamic arrays m_ArrayOfValues(), m_ArrayOfMissingShapes() and m_ZoneCountingArray(). Runs CreateAsciiFile() with the name of Step 3’s final output raster. Runs AnalyseAsciiFile() with the name of the ASCII file created above and the shapefile.

Runs CreateOutputFile() with array m_ArrayOfValues(), the number of values and a filename with “–Final.txt” appended on the end.
Runs ReAllocateValues().
Runs CreateOutputFile() with array m_ArrayOfValues(), the number of values and a filename with “–FinalModified.txt” appended on the end.
Runs LogMissingShapes() with the list of missing shapes from Step 3, the shapefile, the shapefile value field (containing the number of risks) and the shapefile unique identifier field.
Runs CreateOutputFile() with array m_ArrayOfMissingShapes(), the number of values and a filename with “–MissingShapes.txt” appended on the end.
Runs CreateOutputFile() with array m_ZoneCountingArray(), the number of values and a filename with “–ZoneValueChecker.txt” appended on the end.
Runs frmLog.cmdSaveLog_Click().

Private Function CreateAsciiFile(SourceName As String) As Boolean

Using the layer named, runs the RasterConversionOp.ExportToASCII() function.

Private Function AnalyseAsciiFile(SourceTextFile As String, ShapeFile As String) As Boolean

To save processing time during the main loop, creates an array of all districts in the shapefile, m_ArrayOfFeatures(), and an array of relational operators on those districts, m_ArrayOfRelOp(). Opens the ASCII file as a FileSystemObject and reads in the 6 header lines, then reads the rest of the file line by line. For each line it calls AnalyseRow() with the line number and the header information.

Private Sub AnalyseRow(sText As String, currentRow As Long, ncols As Long, nrows As Long, xllcorner As Integer, yllcorner As Integer, cellsize As Double, nodata_value As Long, IJModifier As Long, pInputFC As IFeatureClass)

Reads the row (sText) one character at a time building up each pixel’s value. NO_DATA and zero value pixels are ignored. With valid values, BuildArray() is called with the value, the header information and the current column number.

Private Sub BuildArray(currentString As String, valueNumber As Long, currentRow As Long, ncols As Long, nrows As Long, xllcorner As Integer, yllcorner As Integer, cellsize As Double, IJModifier As Long, pInputFC As IFeatureClass)

Populates the dynamic array m_ArrayValueSize() – as increasing a dynamic array by one line at a time is very processor intensive, this program grows it in large blocks. NB: this is the reason for keeping separate counts of array length rather than using UBOUND. m_ArrayOfValues() has five values per pixel. GridCellID is calculated from the row and column numbers with the xllcorner and yllcorner (see the program description for details). Longitude and latitude for the pixel centre points are calculated and, using RoundingDP(), rounded to 6 decimal places. Value is the number of risks. PolyID is the shapefile district id, determined by calling SpatialQueryOnShapefile().
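
The header parsing and pixel-centre calculation described above can be sketched in Python. The header keys follow the standard ESRI ASCII grid format; the centre-point formula is a sketch consistent with the description (row 0 is the top line of the file, so rows count down from the upper edge), not the original VBA code.

```python
def read_ascii_header(lines):
    """Parse the six ESRI ASCII-grid header lines: ncols, nrows,
    xllcorner, yllcorner, cellsize and NODATA_value."""
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)
    return header

def cell_centre(col, row, nrows, xllcorner, yllcorner, cellsize):
    """Longitude/latitude of the centre of cell (col, row), rounded to
    6 decimal places as RoundingDP() does in the tool."""
    lon = xllcorner + (col + 0.5) * cellsize
    lat = yllcorner + (nrows - row - 0.5) * cellsize
    return round(lon, 6), round(lat, 6)
```

Because the header gives only the lower-left corner while the file is written top row first, the latitude calculation must count rows down from nrows, which is an easy off-by-one to get wrong.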

Private Function RoundingDP(dblNumber As Double, Optional ByVal numDP As Long) As Double

Using the rounding function in VBA produces an unwanted result: it rounds to the nearest even integer. This function rounds in the expected manner (rounds up at 0.5 and above, rounds down below 0.5). The function was supplied by http://vbcity.com/forums/topic.asp?tid=6737
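
The same pitfall exists in Python 3, whose built-in round() also rounds halves to even. A sketch of the arithmetic (round-half-up) behaviour the tool wants; the function name mirrors the VBA routine but the body is illustrative:

```python
import math

def rounding_dp(value, num_dp=0):
    """Round half up: 0.5 and above rounds up, below 0.5 rounds down,
    unlike VBA's Round() (and Python's round()), which round half to even."""
    factor = 10 ** num_dp
    return math.floor(value * factor + 0.5) / factor
```

This add-0.5-then-truncate trick is the same one the tool uses when converting the floating point risk grid to integers.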

Private Function SpatialQueryOnShapefile(ShapeFile As Integer, XCoord As Double, YCoord As Double, fieldName As String, CurrentValue As String, pInputFC As IFeatureClass) As String

Creates a point feature using the x,y coordinates and determines the containing shapefile district using the IRelationalOperator.Contains() function. m_LastShapeFound is the district that contained the previous pixel; it is checked first, as adjacent pixels are likely to be contained by the same district. If m_LastShapeFound is not the correct district, then m_ArrayOfRelOp() is used to cycle through all districts until the containing one is found. Once found, m_ArrayOfFeatures() gives the unique identifier of the district. The m_ZoneCountingArray() is also populated for checking later.

Public Function CreateOutputFile(ArrayOfValues() As String, NumberOfValues As Long, StemFileName As String, EndText As String, Choice As Integer) As String

Creates a text file of all values in the given array and saves to the specified directory and filename. Filename is incremented if there is already an existing file.

Private Sub ReAllocateValues()

Splits m_ArrayOfFeatures() by district. Sorts all values in each district by number of risks in descending order using the SortFiveMemberArray() function. If the total number of risks distributed in a district does not equal the original number from the shapefile, risks are added or removed one at a time from the array, starting with the pixel with the greatest number of risks. Once completed, m_ArrayOfFeatures() is reassembled with the modified and sorted values.
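
The one-at-a-time adjustment that reconciles a district's pixel total with the shapefile total can be sketched in Python. A sketch under the assumption of a simple list of per-pixel risk counts for one district; the real routine works on the five-field string array.

```python
def reallocate_values(pixel_values, target_total):
    """Nudge pixel counts until they sum to the district total from the
    shapefile: add or remove one risk at a time, cycling through pixels
    starting with the one holding the most risks."""
    values = sorted(pixel_values, reverse=True)
    diff = target_total - sum(values)
    step = 1 if diff > 0 else -1
    i = 0
    while diff != 0:
        values[i] += step
        diff -= step
        i = (i + 1) % len(values)  # wrap round if one pass is not enough
    return values
```

Adjusting the largest pixels first keeps the relative shape of the distribution while guaranteeing the district total matches the source data exactly.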

Private Function SortFiveMemberArray(TheArray() As String, NumberValues As Long)

This is a simple bubble sort routine (sorting in descending order of the value field) for the relatively small array of pixels in one district.
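
A Python sketch of the descending bubble sort on the value field. The five-member record layout and the index of the value field are assumptions for illustration; bubble sort is acceptable here because each district holds only a small number of pixel records.

```python
def sort_five_member_array(rows, value_index=2):
    """Bubble sort a list of per-pixel records in descending order of
    the (string-typed) value field at value_index."""
    n = len(rows)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if float(rows[j][value_index]) < float(rows[j + 1][value_index]):
                rows[j], rows[j + 1] = rows[j + 1], rows[j]
    return rows
```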

Private Sub LogMissingShapes(missingShapes As String, ShapeFile As String, valueField As String, idField As String)

Creates a list of unique identifiers for all districts that weren’t included in m_ArrayOfFeatures(). Reads the list one district identifier at a time and uses it to call BuildMissingShapeArray() with the header information from the ASCII file.

Private Sub BuildMissingShapeArray(strValue As String, valueNumber As Long, ShapeLocation As IFeatureClass, valueField As String, idField As String, xllcorner As Integer, yllcorner As Integer, cellsize As Double, IJModifier As Long)

Populates array m_ArrayOfMissingShapes() in a similar manner to BuildArray() except it does not need to call SpatialQueryOnShapefile() as the shapefile value is already known.

End of Program: Step 4

Program: Log – frmLog

List of Routines and Functions

Private Sub cmdCloseWindow_Click()

Closes the log window.

Private Sub cmdEraseLog_Click()

After asking for confirmation, erases the current log from memory.

Public Sub cmdSaveLog_Click()

Saves the current log to the specified directory, incrementing the filename if necessary.

Private Sub UserForm_Initialize()

Runs when frmLog is opened. Calls RefreshLogText().

Public Sub RefreshLogText()

Displays the entire log to the screen.