Geographic Problems (Lecture 1)
Dalarna University
Summary
These lecture notes provide an introduction to geographic problems, categorizing them by scale, intent, and time frame. They also discuss the concept of geographic coordinates and spatial analysis in the context of geographic information systems (GIS).
Full Transcript
Geographic problems (Lecture 1)

Geographic problems are problems that involve an aspect of location, either in the information used to solve them or in the solutions themselves. Forest companies solve geographic problems when they determine the best way to cut, manage, and replant forest and to build new roads to go about their business. National parks are another example: they need to make the parks accessible via roads or pathways, but at the same time protect important natural habitats. Farmers also solve geographic problems when they apply fertilizer according to location and soil, calculating how much fertilizer to use and where.

Geographic problems can be distinguished in three different categories:

Scale (or geographic detail)
- Micro level – the coordinate system of a crystalline structure.
- Local – cities, forests, parks.
- Regional – Baltic runoff, a nationwide flu epidemic.
- Global – global warming, a bird flu pandemic.

Intent (or purpose)
- Practical (urgent) – money, emergencies.
- Curiosity driven (little urgency) – continental drift, glacial retreat, migration patterns.

Time scale
- Operational – the smooth functioning of an organization.
- Tactical – where to cut trees in next year’s forest harvesting plan.
- Strategic – the organization's long-term direction.

The complexity of a geographic problem can of course blur the lines between these divisions.

Geographic coordinates

Geographic refers to the coordinate system of the Earth or, more generally, to any planetary coordinate system in the Universe. Spatial refers to any coordinate system, whether it has geographic dimensions or not. As an example, statistical covariance (a measure of the relationship between two random variables) has spatial dimensions (in variance space). The coordinate system of the Earth is a subset of generic space.
Spatial data are special because they have:
- Dimensions – coordinates that define a location.
- Resolution – ranging from high detail to broad strokes.
- Coordinate system – they can be represented in any coordinate system (on Earth).
- Topology – direction, distance, and ranking of size.

Analysis of any information that refers to coordinates is generally referred to as spatial unless the coordinates are strictly earth-bound, in which case we might want to use the term geographic.

Spatiotemporal - spatial progress in time (or vice versa). Temporal progress is often captured by comparing spatial snapshots taken along a timeline. In the simplest case, temporal change may then be calculated as the difference across snapshots.

The anatomy of GIS

A GIS is an organized collection of:
- Hardware - directly related to the trend of rapidly increasing machine power. The limitation on dimensions (2D to 3D and 4D) is related to the capability of the machine. Here we still have some way to go.
- Software - runs locally on the user's machine. Software development is connected to hardware development.
- Network - today seen as the most important component, because it spreads information and knowledge.
- Data - a digital representation of selected aspects of some specific area of the Earth’s surface or near surface, built to serve some problem-solving or scientific purpose. The relationship to GIS is dynamic: if there is no data, there is nothing for GIS to work with. Too little data impedes the program, while plentiful data lets it flourish.
- People - who design, maintain, supply data, and interpret results. There are two groups: those who work on the functionality of the GIS program itself (enhancing and developing it) and those who focus on what the application can give the user (solving problems, visual representation).
- Procedures - managerial organization.
Management is required to establish procedures, lines of reporting, control points, and other mechanisms for ensuring that GIS activities stay within budget, maintain high enough quality, and generally meet the needs set at any organizational level.

Applications of GIS

There is a huge range of applications of GIS. They include topographic base mapping, socioeconomic and environmental modeling, global and interplanetary modeling, and education. Applications generally set out to fulfill the five M’s of GIS:
- Mapping
- Measurement
- Monitoring
- Modelling
- Management

Geographic representation (L2)

Representation is made via a digital model of some aspect of the Earth's surface. It is useful for depicting scenarios outside our own immediate experience. The representation needs to be simplified by leaving out details: computers can’t handle an infinite amount of data, and the world is complex, revealing ever more detail the closer you look at it (it is fractal).

Simplification is basically made by two separate methods that generate different types of data:
- Spatial averaging by pixel (or tessellation). Here we use pixels whose size we can decide (pixel density, dpi). We can either simplify or add more detail depending on the pixel density: more detail means more pixels. Example: a computer screen has millions of pixels, so if we showed a map of the whole Earth on it, one pixel would cover on the order of 100 square kilometers, and an area the size of Manhattan would occupy about a single pixel.
- Constant value. This utilizes the fact that many attributes remain constant over large areas. For example, the oceans could be simplified to a single value, even though they cover more than two thirds of our world. We can therefore represent areas that have similar or shared attributes as one unit and so reduce the amount of information.
The simplification methods are connected to two fundamental concepts:

Discrete objects - In the discrete object view, the world is empty except where it is occupied by objects with well-defined boundaries. Discrete objects are things that can be counted, like mountains or the residents of an area. Since nothing on the natural Earth looks like a table, it is difficult to imagine how the properties of a river or a mountain should be represented in a table – while the discrete object view works well for some phenomena, it misses the mark badly for others. Natural phenomena such as lakes are hard to count or to set boundaries for (at what degree of "lakeness" does a lake begin?). Weather forecasts originate in models of fields, but are presented in terms of discrete objects: highs, lows, fronts.

Continuous fields - In the continuous field view, the geographic world can be described by a number of variables, each measurable at any point on the Earth’s surface, and changing in value across the surface (across positions). Continuous fields are properties that we can measure at any point in the landscape. With a single variable assigned at every point, we have a scalar field. A vector field assigns two variables, like magnitude and direction, to every point in space. Population density is an example of a continuous field that transforms into the object view when resolution increases. Traffic density is an example of a continuous field defined along a transect.

Continuous fields and discrete objects can both still contain an infinite amount of data.

Data is the plural of datum; a datum is a single item of data.

Geographic objects are identified by their dimensionality. Laponia national park occupies an area, within which bears, represented as points, follow trajectory lines while moving through the park.

Geographic data are built up from atomic elements. At its most primitive, an atom of geographic data (a datum) links a position (in space and time) with some descriptive property.
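The idea of a geographic datum as a link between a position and a descriptive property can be sketched as a small record type. This is only an illustration: the class name `GeoDatum`, its field names, and the sample observation are assumptions made for the example, not part of the lecture.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class GeoDatum:
    """One atom of geographic data: where, when, and what (hypothetical layout)."""
    lon: float        # longitude in decimal degrees, east positive
    lat: float        # latitude in decimal degrees, north positive
    time: datetime    # when the property was observed
    attribute: str    # name of the descriptive property
    value: object     # the value the attribute takes at this position

# Made-up observation: a field near Uppsala recorded as fallow.
obs = GeoDatum(lon=17.64, lat=59.86, time=datetime(2024, 5, 1),
               attribute="arable land", value="fallow")
print(obs.attribute, "=", obs.value, "at", (obs.lon, obs.lat))
```

Larger datasets are then just collections of such records, which is one way to see why both objects and fields must be sampled down to a finite number of data.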
Spatiotemporal (space and time) positions may be represented in many ways. The descriptive property of a geographic object is called its attribute. Attributes have a value. Arable land is an attribute that may take on the values “tilled” or “fallow”.

Geographic attributes are classified as:
- Nominal - a nominal attribute has no rank, no equidistance, and no absolute zero. For example a county: we can’t rank counties, a county is not equally distant (equidistant) from other counties, and a county cannot be zero, because it would then not exist.
- Ordinal - ordinal attributes have ranks, but are not equidistant and have no absolute zero. For example grades: we can get a grade A, B, C, etc., but the distance between the grades cannot be expressed numerically, nor can you get a grade of 0. The lowest result is still a grade, namely F. The grades are symbolic, not actual numbers.
- Interval - interval attributes have ranks and equidistance, but no absolute zero. For example temperature in degrees Celsius. It can be ranked, since a temperature is higher or lower than other temperature measurements (at the same place, date, etc.), and it is equidistant because the values are numeric (38, 39 degrees): the steps between the numbers are equally large going up or down. The scale does have a zero, but not an absolute zero. The zero is a threshold, and values can go below it.
- Ratio - ratio attributes have rank, equidistance, and an absolute zero. For example concentrations: they can be ranked, are equidistant from other values, and have a true zero. The Kelvin temperature scale, which starts at absolute zero, is also a ratio scale.
- Cyclic - cyclic data are measured on a cyclic scale, for example compass directions measured over 360 degrees.

Choropleth - uses only nominal or ordinal scales? “A choropleth map is a type of thematic map that uses colors, shades, or patterns to represent data values associated with specific geographic areas, such as countries, states, counties, or other administrative regions.
The data are typically standardized (e.g., percentages, rates, or ratios) to ensure accurate comparisons between regions of different sizes.”

Isopleth - interval? “An isopleth map is a type of thematic map that uses lines or shaded areas to represent the distribution of a particular phenomenon or variable across a geographic area. These lines or areas connect points with the same value of the variable being represented. Isopleths are lines that join locations with the same value of a specific attribute, such as temperature, air pressure, elevation, or concentration of a substance.”

Raster and vectors

Raster and vector are two methods of representing geographic data (or objects) in digital computers.

Raster - Raster representation divides the world into arrays of cells and assigns attribute values to the cells. Remotely sensed images are most often produced in raster format (e.g. orthophotos). Square cells fit nicely onto a flat surface, whereas distortion is introduced when they are projected onto the Earth’s curved surface. A raster cell keeps a single attribute value – all variation within the cell is lost. The value of a raster cell may be assigned in many ways; the most common are by largest share (the value that covers most of the cell) or by central point (the value at the center of the cell). Although largest-share assignment dominates, elevation is typically assigned by the central point. Rasters are better suited than vectors to showing geographic phenomena, and are good for statistical inference (drawing conclusions from evidence) and modeling. Typical files are remotely sensed images, TIFF, JPG, mosaics, etc.

Vector - In a typical vector representation, curves are captured as points (vertices; a vertex is a point where two or more edges meet) connected with straight lines. Curves are approximated more closely by increasing the density of points. An area is captured as a series of lines in which each line shares a point with the next. Since the lines are straight elements, areas are often referred to as polygons.
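The statement that curves are approximated by increasing the density of points can be checked numerically. The sketch below samples a unit circle with more and more vertices and compares the length of the resulting polygon with the true circumference; the function names and the choice of a circle are assumptions made for illustration.

```python
import math

def circle_as_polygon(n, radius=1.0):
    """Sample n evenly spaced vertices on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def ring_length(vertices):
    """Sum of the straight segment lengths around a closed ring of vertices."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))

true_length = 2 * math.pi  # circumference of the unit circle
for n in (4, 16, 64, 256):
    approx = ring_length(circle_as_polygon(n))
    # The shortfall shrinks as vertex density grows, but never reaches zero.
    print(n, round(approx, 5), round(true_length - approx, 5))
```

The straight-line representation always comes out slightly shorter than the true curve, so the density of vertices is a trade-off between storage and fidelity.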
The term polyline has been coined to describe a curved line represented by a series of straight lines connecting vertices. Representing a curve with a series of vertices is very efficient compared with raster representation, and vectors give freedom when it comes to equidistant representation. Vector data is very detailed and precise. However, many geographic phenomena cannot be represented well as vectors. Typical files are shapefiles (shp) and GeoPackage files (gpkg).

Representing Continuous Fields

“Capturing the linear variation of the field variable over an irregularly shaped triangle (polygon vector representation).”

“Capturing the iso-lines of a surface as digitised lines (polyline vector representation). Isopleth maps are used to visualise phenomena that are conceptualised as fields, and measured on continuous or ratio scales (i.e. elevation, concentration).”

“Each of the above methods succeeds in compressing the potentially infinite amount of data in a continuous field to a finite amount. Unlike the discrete vector representation, the objects used to represent a field are not real, but simply artefacts of the representation of something that is actually conceived as spatially continuous.”

Are you talking about the “above” methods as vector and raster, or capturing the linear variation and capturing the iso-lines?

The paper map

Scale is a key property of a paper map: it is the ratio of distance on the map to distance on the Earth. For digital data, scale does not make sense: there is no distance on a computer's surface, so the ratio breaks down. When a scale is quoted for digital maps, it is the scale of the paper map from which the data were derived.

Generalization

“Reducing the level of detail in geographic data.” “Although simplifying the real world immensely, a map can be perfectly accurate with respect to its specification.
Thus, most standard maps are generalised not only to decrease the amount of generic information, but also to fit some map base specification. There are several methods of generalisation. One of the commonest in GIS is weeding.”

Nature of geographic data

“Since most geographic data represent a mere sample of the real world situation, the scenario will change from one sample to another. This introduces uncertainty that may be addressed by means of the classic statistical standard error. Since most geographic data are continuously distributed in space, they may be considered as topographies. The perhaps most important inherent property of any continuous topography is the presence of autocorrelation. Standard error and autocorrelation constitute the main sources of geographic uncertainty. While standard error simply needs to be minimised, autocorrelation may be utilised for modelling and interpolation purposes. In fact, autocorrelation is one of the most powerful characteristics of a general continuous space. When utilised for modelling purposes, it is often combined with other useful characteristics like correlation and equidistance.”

Geodesy (L3)

Geodesy is the theory of the Earth's shape and size, and how to measure and quantify it. It includes measurements of the Earth's rotation, gravitation, continental drift, angles, distances, etc. This provides data and models for depicting and modeling the world.

Geoid - The geoid is a model of global (imaginary or average) sea level. The surface of the geoid is everywhere perpendicular (at 90 degrees) to the direction of gravity. This means that in mountainous areas the geoid is higher up, because the gravitational pull is stronger there. The geoid has a rugged, uneven surface (because of the uneven distribution of mass). The geoid model is used to measure surface elevations with a high degree of accuracy.

Ellipsoid - A mathematical model that is like a sphere, but flattened. It is not symmetrical in every direction the way a sphere is, and it has a smooth surface.
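How flattened the ellipsoid is can be worked out from two numbers. The sketch below uses the published defining parameters of the WGS84 ellipsoid (semi-major axis and inverse flattening); the variable names are assumptions, and the calculation is only the standard relation b = a(1 - f).

```python
# WGS84 defining parameters
a = 6378137.0            # semi-major (equatorial) axis in metres
inv_f = 298.257223563    # inverse flattening, 1/f

f = 1.0 / inv_f          # flattening
b = a * (1.0 - f)        # semi-minor (polar) axis in metres

print(round(b, 3))       # 6356752.314
print(round(a - b))      # 21385 (metres by which the polar radius is shorter)
```

A difference of about 21 km on a radius of roughly 6400 km is small, which is why a sphere is often an acceptable approximation for rough distance calculations.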
Geographic system of coordinates: longitude and latitude. A full circle (like a compass) has 360 degrees. 1 degree is 60 minutes. 1 minute is 60 seconds.

Meridian: the imaginary line on the Earth that connects the two geographical poles, the north pole and the south pole. It is a line of constant longitude. It runs south to north, but measures position west to east.

Parallel: the imaginary line that circles the Earth, running east to west. A line of constant latitude is called a parallel.

Longitude
- The angular distance of a place east or west of the Greenwich meridian (the prime meridian) on Earth (or other celestial body).
- “Each of the 360 degrees of longitude is divided into 60 minutes and every minute into 60 seconds. Longitude is referred to by degrees East or West, so longitude ranges from 180 degrees West to 180 degrees East.”
- In computers, west longitudes are shown as negative figures (-96 degrees, -50 degrees) and east longitudes as positive figures (18 degrees for Uppsala, 120 degrees).
- Longitude is symbolized with λ (lambda) and thus ranges over [-180 west, 180 east] degrees.
- Longitudinal distance varies with the latitude. At the equator 1 minute is 1852 meters; in Uppsala one minute is only about 930 meters (1852 m times the cosine of the latitude). The distance of 1852 meters is called a nautical mile. A speed of 1 nautical mile per hour is called a knot. 1/10 of a nautical mile is called a cable length.

Latitude
- Horizontal lines measuring position north to south.
- Latitude is often symbolized with φ (phi) and thus ranges over [-90 south, 90 north] degrees from the equator.

Equator: lies halfway between the north and south poles (the middle of the Earth or other planet/celestial body).

There are several global ellipsoids. We normally use WGS84, but there are others like Bessel 1841 and GRS80.
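The degree, minute, and second arithmetic and the way a minute of longitude shrinks with latitude can be sketched as follows. The function names are made up for this example, the Earth is treated as a sphere, and 59.86 degrees north is used as an approximate latitude for Uppsala.

```python
import math

NAUTICAL_MILE_M = 1852.0  # length of one minute of arc at the equator, as in the notes

def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees (W and S negative)."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("W", "S") else value

def longitude_minute_m(lat_deg):
    """Approximate ground length of one minute of longitude (spherical Earth)."""
    return NAUTICAL_MILE_M * math.cos(math.radians(lat_deg))

print(dms_to_decimal(96, 30, 0, "W"))      # -96.5
print(round(longitude_minute_m(0.0)))      # 1852
print(round(longitude_minute_m(59.86)))    # 930
```

At 60 degrees north the cosine is exactly one half, so a minute of longitude there is half a nautical mile; a minute of latitude stays close to one nautical mile everywhere.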
Geodetic datum: an ellipsoid (there are hundreds) + its relation to the geoid + its relation to the Earth’s rotational axis.

Projections and Coordinates

“In the two-dimensional world of a flat surface, a Cartesian coordinate system assigns two orthogonal coordinates to any location. Since it is common to align the Cartesian y-axis with geographic north, the coordinates of a projection on a flat sheet are often termed easting and northing. Thus, map projections transform polar coordinates into Cartesian dittos (projections?).”

Projections by Distortion

“With an exact spheroid definition, the geographic coordinate system is exact. When projected onto a flat surface, like a paper or a computer screen, distortion is introduced. Projections may be invariant either with respect to scale or with respect to direction. A projection cannot simultaneously preserve scale and direction. A direction-invariant projection (angles are kept invariant) is called conformal. This is where we recognise geographic proportions at local and regional scales. Sets the standard for most official maps. Important for navigation. A scale-invariant projection (areas are kept intact) is called equal-area. Applies to global or very small scales. Important for area calculations. Mostly used for special maps.”

Projection by model
- Cylindrical - analogous to wrapping a sheet of paper around the Earth as a cylinder. The normal cylindrical projection is wrapped around the equator. The transverse cylindrical projection is wrapped around a meridian, passing over the poles (north and south).
- Azimuthal - analogous to a flat sheet of paper touching the Earth at a single point, lying on top of, below, or at the side of the Earth.
- Conic - analogous to wrapping a sheet of paper around the Earth as a cone.

Uppsala, at roughly 18 degrees east, falls in UTM zone 33, which spans 12 to 18 degrees east.

GSD (Geographic Swedish Data) (L4)

Different types of maps:
- Topography 250 - for national outline. Covers the whole nation and is good for outline planning (översiktsplanering).
  The skeleton here is water (lakes, streams, etc.). Standard error 50 m. Scale 1:250 000.
- Topography 100 - for roads, transportation, and land use. The skeleton here is infrastructure. Error 50 m. Scale 1:100 000.
- Topography 50 - terrain map. For physical planning, analysis, and presentation. Shows infrastructure, land cover, buildings, streams, water, etc. Scale 1:50 000.
- Topography 10 - property map (economic map). Shows parcels, properties, etc. Scale 1:10 000 or 1:20 000.
- Orthophotos - remotely sensed photos taken from an aeroplane. Utilising digital elevation models, the primary photo is transformed into an orthogonal projection. Covers the nation with extreme resolution. Subdivided into 5 × 5 km sheets, in accordance with the property map.
- Land cover - contains information about land use, land type, and vegetation.
- Vegetation map - basically replaces the terrain map in the far north-west, but is also available in a few other regions.
- Elevation - elevation curves show the elevation as lines with an equidistance of 5, 10, 20, etc. The elevation grid is an ASCII data structure containing the south-west mid-coordinate of a 50 × 50 m raster cell followed by 501 × 501 elevation values given with decimetre precision.

Satellite-based navigation and positioning systems (L5)

- NavStar GPS - American.
- Glonass - Russian.
- Galileo - European.

GPS

- The space segment - satellites. A minimum of 24 satellites (plus reserves) in 6 orbital planes at an altitude of about 20 200 km. The orbital period is 11 hours and 58 minutes, so each satellite appears about 4 minutes earlier every day.
- The control segment - code or phase-shift measurements. Orbital data are calculated in WGS84. Operating data and tracking stations (calculating orbital data).
- The user segment - at least 4 simultaneously visible satellites are required for 3D positioning (including time). 2D positioning requires at least 3 satellites.

The different types of GPS

- Absolute GPS - absolute code in real time. Average error 5 – 10 m. Low accuracy, low cost.
- Differential GPS (DGPS) - relative code in real time. Average error 1 – 2 m. Mid accuracy and cost.
- Static GPS - relative phase with complementary calculation (post-processing). Average error 0.5 – 2 cm.
- Single-station RTK (Real Time Kinematic) - relative phase in real time. Average error 1.5 – 3 cm. High accuracy and cost.
- Network RTK - relative phase in real time. Average error 1.5 – 3 cm. High accuracy and cost.

Land-based augmentation systems - LBAS

SWEPOS: a national grid of permanently installed reference stations for differential GPS.

Satellite-based augmentation systems - SBAS

- WAAS - America
- EGNOS - Europe
- GAGAN - India
- MSAS - Japan

Spectral resolution

- Color: 3 bands (RGB)
- Multispectral: more than 3 bands
- Hyperspectral: 50-200 bands
- Panchromatic: black and white (grayscale)

Cartography and Map Production (L6)

Cartography concerns the art, science, and techniques of making maps and charts. The term map is used for terrestrial areas and chart for marine areas, but they are both maps (rather than map-like visualisations).

It is useful to distinguish between two types of GIS output:
- Formal maps, created according to well-established cartographic conventions, that are used as a reference or communication product (e.g. a 1:50,000 terrain map). There are two basic types of formal maps: reference maps, such as topographic maps from national mapping agencies, that convey general information; and thematic maps that depict specific geographic themes, such as population census statistics, soils, or climate zones.
- Transitory maps and map-like visualisations used to display, analyse, edit, and query geographic information (e.g. results of a database query to retrieve mercury concentrations in freshwater fish throughout the county of Uppland).

Principles of Map Design

Map design is quite a complex procedure requiring the simultaneous optimisation of many variables and harmonisation of multiple methods. Robinson et al.
(1995) define seven controls on the map design process:
- Purpose: The purpose for which the map is being made must determine what is to be mapped and how the information is to be portrayed.
- Reality: The phenomena being mapped must, usually, impose some constraints on map design (e.g. the tremendous North – South length of Sweden).
- Available data: The specific characteristics of data, i.e. the data type, will affect the design.
- Map scale: Scale will control how much data can appear in a map frame, the size of symbols, and much more.
- Audience: Different audiences want different types of information on a map and expect to see information presented in different ways.
- Conditions of use: The environment in which a map is to be used will impose significant constraints (e.g. brightness versus choice of colours).
- Technical limits: The display medium, be it digital or hardcopy, will impact the design process in several ways (e.g. the impact of bandwidth on resolution).

Map Composition

- Map body (the map itself, showing a certain detail, e.g. Ultuna)
- Inset/overview map (where the map body is located in a broader context, e.g. south of Uppsala)
- Title (the place name or what is shown, e.g. Ultuna parcels)
- Legend (symbols and their meaning)
- Scale (1:5 000, scale bar, etc.)
- Direction indicator (north arrow)
- Map metadata (data sources, projection, author, etc.)

Map Symbolization

The data to be displayed on a map should be classified and represented using graphic symbols that conform to well-defined and accepted conventions. Attribute mapping entails use of graphic symbols, which (in two dimensions) may be referenced by points, lines or areas. These basic symbols may be modified in accordance with Bertin’s graphic primitives in order to communicate different types of information.
Cartogram Transformation

Cartograms are maps that lack planimetric correctness (planimetrics is the study of plane measurements, including angles, distances, and areas), and distort area or distance in the interest of some specific objective. The usual objective of a cartogram is to reveal patterns that might not be readily apparent from a conventional map, or, more generally, to promote legibility. Thus, the integrity of the spatial object, in terms of areal extent, location, contiguity, geometry, and/or topology, is made subservient to an emphasis upon attribute values or particular aspects of spatial relations. See example below.

[Figure: cartogram example]

Spatial analysis (Lecture 7: Advanced)

“Spatial analysis is the process by which we turn raw data into useful information, in pursuit of scientific discovery, or more effective decision making.”

“Analytical cartography refers to methods of analysis that can be applied to maps, e.g. via a ruler, to make them more useful and informative…”

Methods of spatial analysis are discussed with respect to six general headings:
- Queries – no changes occur in the database, and no new data are produced.
- Measurements – simple numerical values that describe aspects of geographic data.
- Transformations – simple methods of spatial analysis that change datasets, combining them or comparing them to obtain new datasets, and eventually new insights.
- Descriptive summaries – the equivalent of descriptive statistics.
- Optimisation – normative techniques designed to select ideal locations for objects given certain well-defined criteria.
- Hypothesis testing – inferential statistics.

Queries

The most straightforward way in which reformulation and evaluation of a representation of the real world can take place is through posing spatial queries to ask generic spatial and temporal questions such as:
- Where is.....?
- What is at location.....?
- What is the spatial relation between.....?
- What is similar to.....?
- Where has..... occurred?
- What has changed since.....?
- Is there a general spatial pattern, and what are the anomalies?

Spatial query is articulated through the graphical user interface (GUI, WYSIWYG) paradigm called a “WIMP” interface, based upon Windows, Icons, Menus, and Pointers.

Measurements

Many tasks require measurement from maps: measurement of the distance between two points, and measurement of area, e.g. the area of a parcel of land. Such measurements are tedious and inaccurate if made by hand; measurement using GIS tools and digital databases is fast, reliable, and accurate.

Measurement of length
- A metric is a rule for determining distance from coordinates.
- The Pythagorean metric gives the straight-line distance between two points on a flat plane.
- The Great Circle metric gives the shortest distance between two points on a spherical globe given their latitudes and longitudes.
- Measurements in GIS are often made on horizontal projections of objects, so length and area may be substantially lower than on a true three-dimensional surface.
- The length of a true curve is always longer than the length of its polyline or polygon representation.

Shape measures capture the degree of contortedness (distortion) of areas, relative to the most compact circular shape: the more contorted the area, the higher the shape measure.

Slope and aspect are calculated from a grid of elevations (a digital elevation model). They are calculated at each point in the grid by comparing the point’s elevation to that of its neighbors, usually its eight neighbors, but the exact method varies. In a scientific study, it is important to know exactly what method is used when calculating slope, and exactly how slope is defined.

Transformations

Transformations are simple methods of spatial analysis that change datasets, combining them or comparing them to obtain new datasets, and eventually new insights.
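Looking back at the two distance metrics named under Measurement of length, both can be written in a few lines. This is a sketch under stated assumptions: the function names are made up, the great-circle distance uses the haversine formula (one common way to compute it), and the Earth is treated as a sphere with a mean radius of 6371 km.

```python
import math

def pythagorean(x1, y1, x2, y2):
    """Straight-line distance between two points on a flat plane."""
    return math.hypot(x2 - x1, y2 - y1)

def great_circle(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Shortest distance over a sphere between two latitude/longitude pairs
    given in decimal degrees (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(h))

print(pythagorean(0, 0, 3, 4))                    # 5.0
print(round(great_circle(0, 0, 0, 90) / 1000))    # a quarter of the equator, in km
```

The Pythagorean metric is only appropriate for projected (easting and northing) coordinates over limited areas; applying it directly to longitude and latitude values gives misleading results away from the equator.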
Transformations use simple geometric, arithmetic, or logical rules, and they include operations that convert raster data into vector data, or vice versa. They may also create fields from collections of objects, or detect collections of objects in fields. Transformations apply to vector as well as to raster data.

Buffering (dilation)

The buffer operation is one of the most important transformations available to the GIS user. It creates a new object consisting of the areas within a user-defined distance of an existing object, e.g. to determine the areas impacted by a proposed highway, or to determine the service area of a proposed hospital. It is feasible in either raster or vector mode.

Point-in-polygon transformation

Determine whether points lie inside or outside polygons. As an example of a point-in-polygon operation, let points represent instances of a disease in a population, and the polygons represent reporting zones such as counties. The task is to determine how many instances of the disease occurred in each zone. If polygons overlap, it is possible that a point lies in one, many, or no polygons, depending on its location. The point-in-polygon operation makes sense from both the discrete-object and the continuous-field perspective.

Polygon overlay

There are two cases: for discrete objects and for fields. From the discrete-object perspective, the task of the polygon overlay operation is to determine whether two areas overlap, and to define the area formed by the overlap as one or more new area objects. The discrete-object polygon overlay operation is useful for answering queries such as:
- How much of this proposed clear-cut lies in this riparian zone (shore zone)?
- How much of the projected catchment area of this proposed retail store lies in the catchment of this other existing store in the same chain?

Polygon overlay, field case

Two complete layers of polygons are input, representing two fields that classify the same area (polygon fields), e.g. soil type and land ownership.
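Returning to the point-in-polygon transformation described earlier in this section, the disease-count example can be sketched with a standard ray-casting test. The zone shapes, zone names, and case locations below are invented for illustration, and points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: a point is inside if a ray towards +x crosses the boundary
    an odd number of times. polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical reporting zones (two adjacent squares) and disease cases
zones = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
cases = [(1, 1), (2, 3), (5, 2), (9, 9)]

counts = {name: sum(point_in_polygon(x, y, poly) for x, y in cases)
          for name, poly in zones.items()}
print(counts)  # {'A': 2, 'B': 1}
```

The case at (9, 9) falls in no zone, which matches the remark that a point may lie in one, many, or no polygons.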
The layers are overlaid, and all intersections are computed, creating a new layer. Each polygon in the new layer has both a soil type and a land ownership; the attributes are said to be concatenated (linked). The task is often performed in raster.

Spatial interpolation

Spatial interpolation is a process of intelligent guesswork, in which the investigator (and the GIS) attempt to make a reasonable estimate of the value of a continuous field at places where the field has not actually been observed. Spatial interpolation is an operation that makes sense only from the continuous-field perspective. Spatial interpolation finds applications in many areas, and should ALWAYS be regarded with suspicion (it is guesswork). We’ll consider three methods of spatial interpolation: Thiessen polygons, inverse-distance weighting (IDW), and Kriging. IDW and Kriging are based on the notion of spatial autocorrelation. IDW and Kriging functions are also used for statistical inference and hypothesis testing, and are dealt with in another course.

Descriptive summaries, optimization, and hypothesis testing

Descriptive summaries attempt to capture the nature of geographic distributions, patterns, and phenomena in simple statistics that can be compared through time, across themes, and between geographic areas. Optimisation techniques apply much of the same thinking, and extend it, to help users who must select the best location for services, or find the best route for vehicles, or a host of similar tasks. Hypothesis testing addresses the basic scientific need to be able to generalise results from a small study to a much larger context, perhaps the entire world.

Means and Centroids

The mean is one of a number of measures of central tendency, all of which attempt to create a summary description of a series of numbers in the form of a single number (see white board).
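As a concrete illustration of the inverse-distance weighting (IDW) method named in the Spatial interpolation section, here is a minimal sketch. The power of 2 is a common default rather than something the notes specify, and the sample points and values are invented.

```python
import math

def idw(x, y, samples, power=2.0):
    """Estimate a field value at (x, y) as a weighted mean of the samples,
    with weights proportional to 1 / distance**power."""
    numerator = denominator = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly on a sample: return the observed value
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical observations: (x, y, measured value)
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]

print(idw(0.0, 0.0, samples))            # 10.0 (an observed location)
print(round(idw(5.0, 0.0, samples), 2))  # 16.36 (pulled above 15 by the distant third sample)
```

Because the estimate is a weighted mean, it can never fall outside the range of the observed values, which is one reason to treat interpolated surfaces with the suspicion the notes recommend.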
The mean is only applicable to interval or ratio data; the ordinal analogue is called the median, and the nominal analogue the mode (the commonest value). The mean is a point estimate of the mathematical expectation, which gives the balance point of a probability distribution (the median is the point of 50% probability). Centers are multi-dimensional equivalents to the mean. The centroid is the point that minimises the sum of squared distances, and it is the balance point of the associated multidimensional distribution.

Optimization properties
For any set of numbers along a coordinate line, take any value d and sum the squares of the differences between the numbers and d. This sum is minimised when d is set equal to the mean (Figure 15.4). Of particular interest is the location that minimises the sum of distances, rather than the sum of squared distances, since this could be the most effective location for any service that is intended to serve a dispersed population. The point that minimises total straight-line distance is known as the point of minimum aggregate travel, or MAT. There is no simple mathematical expression for the MAT; instead, its location must be found by iteration, in which an initial guess is successively improved using a suitable algorithm. If the curvature of the Earth's surface is taken into account, the centroid of a set of points must be calculated in three dimensions, perhaps following the great circle metric.

Applications of the MAT (minimum aggregate travel)
Because it minimizes distance, rather than squared distance, the MAT is a useful point at which to locate any central service, e.g., a school, hospital, store, or fire station. Finding the MAT is a simple instance of using spatial analysis for optimization.

Dispersion
The spatial equivalent to the (weighted) standard deviation is the mean distance from the centroid.
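The centroid, the MAT, and the mean distance from the centroid described above can be sketched in a few lines. This is a minimal illustration and not the lecture's own method: it assumes planar (x, y) coordinates and unweighted points, and it uses Weiszfeld's iteration as one possible "suitable algorithm" for the MAT. All function names are hypothetical.

```python
import math

def centroid(points):
    """Mean of the x and y coordinates; minimises the sum of squared distances."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def total_distance(p, points):
    """Sum of straight-line distances from p to every point."""
    return sum(math.dist(p, q) for q in points)

def mean_distance_from_centroid(points):
    """Simple dispersion measure: average distance to the centroid."""
    return total_distance(centroid(points), points) / len(points)

def mat(points, tol=1e-9, max_iter=1000, eps=1e-12):
    """Approximate the point of minimum aggregate travel (Weiszfeld iteration).

    Assumption: distances are clamped to eps so that an estimate landing
    exactly on a data point does not cause a division by zero.
    """
    x, y = centroid(points)  # initial guess
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            w = 1.0 / max(math.hypot(px - x, py - y), eps)
            num_x += w * px
            num_y += w * py
            denom += w
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < tol:
            return (new_x, new_y)
        x, y = new_x, new_y
    return (x, y)

pts = [(0, 0), (4, 0), (0, 3), (10, 10)]
c = centroid(pts)
m = mat(pts)
print("centroid:", c, "total distance:", total_distance(c, pts))
print("MAT:", m, "total distance:", total_distance(m, pts))
```

In this sketch the MAT never has a larger total distance than the centroid, which is the reason it is preferred for locating a central service.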
The sample variance (squared standard deviation) is a point estimator of generic variance that determines uncertainty in statistical inference and hypothesis testing. Variance estimation is only valid for independent data, or for data that have been compensated for (spatial) dependence such as autocorrelation. Methods for filtering out spatial autocorrelation utilise inverse-distance weighting and/or Kriging.

Spatial dependence
There are many ways of measuring this very important summary property. Autoregressive Integrated Moving Average (ARIMA) models apply in cases of equidistant data (as in a raster). The semi-variogram works on any set of point data, irrespective of the distance between observations.

Descriptions of Pattern – unlabeled points
One of the questions most commonly asked about distributions of points is whether they display a random pattern, in the sense that all locations are equally likely to contain a point, or whether some locations are more likely than others, and particularly whether the presence of one point makes other points either more or less likely in its immediate neighbourhood. This leads to three possibilities. The pattern is random (points are located independently, and all locations are equally likely). The pattern is clustered (some locations are more likely than others, and the presence of one point may attract others to its vicinity). The pattern is dispersed (the presence of one point may make others less likely in its vicinity).

Establishing the existence of clusters is often of great interest, since it may point to possible causal factors (like the incidence pattern of cholera studied by Dr. John Snow). Dispersed patterns are the typical result of competition for space, as each point establishes its own territory and excludes others. It is helpful to distinguish between two kinds of processes responsible for point patterns.
First-order processes involve points being located independently, but may still result in clusters because of varying point density (like around the cholera-infected well discovered by Dr. John Snow). Second-order processes involve interaction between points, and lead to clusters when the interactions are attractive in nature (vultures attracting other vultures around a carcass), and to dispersion when they are competitive or repulsive (the spatial distribution of competitive solitary animals).

Descriptive statistics of point patterns - the K function
The K function captures how the density of points varies with distance away from a reference point, by comparing it to what would be expected in a random distribution of points.

Fragmentation statistics
Fragmentation statistics measure the patchiness of data sets, e.g., of vegetation cover in an area. They are useful in landscape ecology, because of the importance of habitat fragmentation in determining the success of animal and bird populations: populations are less likely to survive in highly fragmented landscapes. The size distribution of patches, given as a histogram of patch size, is a useful indicator of fragmentation.

Optimization
The basic idea for optimisation in GIS is to analyse patterns not for the purpose of discovering anomalies or testing hypotheses about processes, as in previous sections, but with the objective of creating improved designs. The objectives might include the minimisation of travel distance, or of the costs of constructing some new development, or the maximisation of someone's profit. Design methods are often implemented as components of systems built to support decision making, so-called spatial-decision support systems, or SDSS.

Optimizing point locations
The MAT (minimum aggregate travel) is a simple case: one service location and the goal of minimizing total distance traveled. The operator of a chain of convenience stores or fire stations might want to solve for many locations at once. Where are the best locations to add new services?
Which existing services should be dropped?

Optimum paths
The task is to find the best path across a continuous cost surface between a defined origin and destination, so as to minimize total cost. The cost may combine construction, environmental impact, land acquisition, and operating cost. The method is used to locate highways, power lines, and pipelines, and it requires a raster representation. The hydrologic flow of water through a landscape follows the path of steepest descent, that is, the path of maximum potential-energy loss, which is given by the maximum elevation drop per distance travelled.

Statistical inference and hypothesis testing
In statistical inference and hypothesis testing, in principle all methods for the necessary estimation of uncertainty require independent observations. In a continuous surface, this condition is rarely met (because of the presence of autocorrelative structures). Actual methods of statistical inference and hypothesis testing therefore focus on the problem of compensating for the presence of covariance (parametric methods), or on building methods so robust that the presence of covariance does not matter (non-parametric methods). As a matter of fact, advanced statistics are all about coping with the presence of covariance structures. In the parametric case, identification and compensation techniques are often based on ARIMA and/or Kriging methodology.
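Before applying such methods, it helps to see how far a data set departs from independence. The semi-variogram mentioned under spatial dependence is one way to do this, and it also underlies Kriging. The sketch below is a minimal illustration under stated assumptions: it uses the classical estimator, gamma(h) = sum of squared value differences over all point pairs in a distance bin, divided by twice the number of pairs, with planar coordinates and equal-width distance bins. The function and parameter names are hypothetical.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, bin_width, max_dist):
    """Return {bin_centre: semivariance} for point pairs up to max_dist apart.

    points: list of (x, y) tuples; values: observed field value at each point.
    Pairs are grouped into distance bins [k * bin_width, (k + 1) * bin_width).
    """
    sq_diff_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])
            if h > max_dist:
                continue
            k = int(h // bin_width)
            sq_diff_sums[k] += (values[i] - values[j]) ** 2
            pair_counts[k] += 1
    return {
        (k + 0.5) * bin_width: sq_diff_sums[k] / (2 * pair_counts[k])
        for k in sorted(pair_counts)
    }

# Four observations along a line with a steadily increasing value.
pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
vals = [0.0, 1.0, 2.0, 3.0]
for centre, gamma in empirical_semivariogram(pts, vals, 1, 3.5).items():
    print(centre, gamma)
```

A semivariance that rises with distance, as in this toy example, means that nearby observations are more alike than distant ones, so the observations cannot be treated as independent.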