GIS and Spatial Data
Is there any spatial data? What does spatial data mean? Is any data, information, inherently spatial? These questions have been bugging me for some time now. Perhaps these questions seem bizarre to raise amongst GIS professionals, but time has only deepened my doubts. I want to examine these questions outside of the pedestrian contexts of “turf protection,” academic condescension, and vendor marketing spiels. I think that by examining these ideas, we get to the heart of what a GIS is supposed to do, and we can then do it better and more efficiently.
The question is this: What is a piece of spatial data? How do we know it when we see it? The answers I have come upon all follow this pattern: spatial data is information that includes or pertains to descriptions of the “where” of a thing, not just the “what.” Often, people make reference to information about “location” or even specific coordinates.
Okay, this is obvious, no? I am not arguing that the entire discipline of geography is based on an illusion, the illusion that space and location exist. (Although there is a lively historical debate about just what the nature of the discipline we call geography is.) And at bottom, let’s face it, nobody knows what space is anyway, but we do know that things are not all in the same place, and we assume that they are disposed about the universe “in space,” or that they “have space between them,” or something like that.
Maybe we should speak less about spatial data and more about place data since place is specified with location coordinates, while space is never specified by anything we in GIS do, but that’s not what I’m puzzled about, just an aside. Really, I’m wondering why place or location is in any way a special thing that can’t be handled the way we handle any other sort of information bits. Which kind of pulls the rug out from under the GIS software industry, doesn’t it?
So, let’s say we have a table with all the national capitals of the world. Is that a spatial/place data set? If the first column is the name of the city and the second is the country of which it is the governmental seat, isn’t that telling us that the city is IN that country? So a simple tabular data set is a spatial data set.
The argument then goes that you couldn’t take that table of cities and tell how far apart they are, or which one is in a country next to some other country, and so on. Obviously, we can answer the first question by including the coordinates of each city in two columns (or one). The second question requires more information: the polygons of the national boundaries.
But my point here is that we are simply introducing more tabular data with more attributes, and relations among tables, nothing inherently spatial at all that can’t be treated like any other type of data. Coordinates are simply numerical data attributes for a specific record. Querying for proximity is simply another variant on SQL set inquiries. Why is asking
SELECT * FROM CITIES WHERE DISTANCE FROM BOSTON > 500
any different from
SELECT * FROM CITIES WHERE TIME_ZONE <> 'EST' ?
The latter query, for certain data tables, might return the very answer required by the first. (It would depend on the data table, and I’m not saying this is a good solution, just that it could be a solution.) That is, all the cities in the USA that are not on Eastern Standard Time are likely to be more than 500 miles from Boston. Of course, that leaves out Florida and a lot of other places, BUT if the database didn’t have cities from those states in the first place, the query would serve fine.
And in the first query, there isn’t anything mysterious going on in that DISTANCE request. Just a lot of number crunching on (x, y) coordinate pairs to find the set of records that meets the criterion. If the query were
SELECT * FROM CITIES WHERE STATE_NAME <> 'MA'
we would get the set of cities not in Mass. A spatial query without any geometry at all! And how is it accomplished? Does anyone care? Codd and Date, the theorists of relational databases, don’t care. They say that the software mechanics of how these questions are answered are irrelevant (to them)! We all know there is a lot of parsing and crunching of text data to check the STATE_NAME for each city against ‘MA’ and find the set that matches!
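To make the comparison concrete, here is a minimal Python sketch, not taken from any actual GIS product, in which the DISTANCE request is just an ordinary function registered with SQLite and applied to coordinate columns. The table layout, the sample cities with rough coordinates, and the helper names (distance_miles, cities_farther_than, cities_outside_state) are all assumptions made up for illustration.

```python
import math
import sqlite3


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two lat/lon points."""
    radius = 3958.8  # approximate mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it like any other function.
conn.create_function("DISTANCE", 4, distance_miles)
conn.execute(
    "CREATE TABLE cities (name TEXT, state_name TEXT, lat REAL, lon REAL)"
)
# Hypothetical sample rows; coordinates are approximate.
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?, ?)",
    [
        ("Boston", "MA", 42.36, -71.06),
        ("Worcester", "MA", 42.26, -71.80),
        ("New York", "NY", 40.71, -74.01),
        ("Chicago", "IL", 41.88, -87.63),
        ("Los Angeles", "CA", 34.05, -118.24),
    ],
)


def cities_farther_than(conn, origin, miles):
    """Cities more than `miles` from `origin`: arithmetic on coordinate columns."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM cities AS c JOIN cities AS o ON o.name = ?
        WHERE DISTANCE(o.lat, o.lon, c.lat, c.lon) > ?
        ORDER BY c.name
        """,
        (origin, miles),
    )
    return [name for (name,) in rows]


def cities_outside_state(conn, state):
    """Cities not in `state`: a 'spatial' question answered by a text comparison."""
    rows = conn.execute(
        "SELECT name FROM cities WHERE state_name <> ? ORDER BY name", (state,)
    )
    return [name for (name,) in rows]
```

Calling cities_farther_than(conn, "Boston", 500) and cities_outside_state(conn, "MA") both go through the same SELECT machinery; the only difference is whether the predicate compares numbers or text.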
My point being, there seems to be no such thing as inherently “spatial” data, only data for which place or location is an attribute. Which would seem to mean that the usual set operations of an RDBMS are adequate for doing what we need to do. Which would seem to mean that GIS is a distraction, and that we should be thinking about:
1) Relational Database Systems
2) Geography
You are doing that already, you say? A strawman? Then why do so many people in IT departments sniff that GIS is just a tool for making “pretty pictures”?
Okay, I guess you catch my drift. Comments?

July 24, 2008 at 11:48 am
Not sure what your background is in GIS, but maybe you should look at some books on what makes spatial data different.
Try “GIS: A Computing Perspective” by Michael Worboys and Matt Duckham for a thorough introduction to the factors that distinguish spatial data processing from other kinds of data processing.
If you’re interested in spatial databases, the book to read is probably “Spatial Databases” by Philippe Rigaux et al, which goes into detail on the characteristics of spatial data and how they are stored and manipulated in databases.
Spatial SQL is just the tip of the iceberg, and I’m not sure myself how far spatial data fits easily into the relational paradigm at all. In theory, a single point can have an infinite number of spatial relationships, for example, and how do you even define a single point unless you first define a minimum resolution - otherwise your “point” could contain even smaller points ad infinitum, or at least down to the Planck length!
FWIW I’m just a GIS student (but with 20 years experience as a database developer), and I definitely do not think GIS is “just a tool for making pretty pictures”.
Serious spatial analysis requires an understanding of the underlying data, the distinctive nature of spatial relationships and how they interact, and often also of the specific domain concerned - social geography, geology etc. It’s certainly true that some GIS professionals are a bit prickly about people encroaching on their turf, but in fairness, they are probably just sick of people assuming GIS is nothing more than Google Maps or a bit of nice artwork. The more I learn about what they do, the more I can sympathise with their grumpiness!
July 24, 2008 at 4:55 pm
I’ve worked with GIS for almost twenty years, so I’ve been on all sides of this thing.
I’m not clear on your argument about resolution and a “single point.” A point is defined by two numbers, that’s all. Whether a point is a good representation of a physical feature, e.g. a city, is another question.
I agree with you about what is required for “serious spatial analysis,” but I’m not talking about what geographers do. I’m talking about what the tools they use do to allow them to do it.
Your suggestion that I should “look at some books on what makes spatial data different” is a good one, and I will…again. Surely you see, however, that the nature of your suggestion does seem to beg the question I am asking by assuming that Spatial Data exists sui generis.
I will post more here later after I have digested some of the specific texts that people seem to believe are important and relevant.
July 24, 2008 at 6:07 pm
Man, you are one mixed up guy, aren’t you?
July 24, 2008 at 6:30 pm
I agree with the Dude. You appear to have a widely developed intellect, but you have a difficult time translating those thoughts with the written word. I’d be curious as to your exposure of “almost twenty years” with GIS. Unfortunately IMHO, your post sounds like a strangely rambling tirade.
July 24, 2008 at 11:48 pm
Dude, am I mixed up? I think I’m quite rational. Maybe you just don’t share my tastes, but this post is distinct from the rest of my blog. That’s why I put it on a different page. Venture into my Perplexity at your peril…
WaitWhat? Difficult time with the written word? I’m no Shakespeare, but I think I can write expository prose fairly well. Tirade? I tried very hard to keep my tone even and open minded. If you want a tirade, believe me, I know how to give one. I asked for comments, critiques, ideas, and you give me condescension…
As for my “exposure,” I have degrees in civil engineering and geography, and I have worked as an engineer for over 20 years. Most of that has involved mapping infrastructure, real estate, environmental pollution, and the like, particularly in cities, and for the last 15 years, that has meant using GIS tools. I work with ESRI, program in Delphi, VBA, ArcObjects, MapObjects, TatukGIS - I use Access and Firebird, Oracle, etc. etc…on and off the Web. Does this qualify me as a “real GIS professional?” In addition, much of my work has been involved with creating GIS interfaces for environmental models that produce prodigious amounts of data over very large domains and through time, so I’ve been thinking about this question of what data is and isn’t for some time.
I also have a degree in the History of Art, and a life-long interest in philosophy. For this reason, I find that I am intrigued by the fundamentals of visual communication and the logic of concepts. This is what I am addressing here. I have no objection to people who find my curiosities boring or irrelevant, but I think they should try and understand them before dismissing them as wrong.
My post is purposely a bit rambling - I’m sketching out a position and trying to evoke responses.
July 25, 2008 at 4:20 pm
lichanos: “Surely you see, however, that the nature of your suggestion does seem to beg the question I am asking by assuming that Spatial Data exists sui generis.”
Well, you sound like you have more experience of GIS than me, but my own impression as a 20-year database veteran is that spatial data definitely does exist as a distinct category of data, not so much in terms of its simple representational elements - vector point/polygon, raster etc - but in the way those elements relate to each other in specifically spatial ways.
Sticking to this idea of “spatial data” as a category, there are a whole set of spatial relationships - containment, intersection etc - that apply to spatial data and not to other data, and these relationships often exist at multiple levels simultaneously, as you know.
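As one illustration of such a relationship, containment of a point in a polygon can be sketched with the standard ray-casting test. This is a minimal Python sketch under stated assumptions: the function name and the square used in the example are made up for illustration, the polygon is a simple list of (x, y) vertices, and points lying exactly on an edge are not handled specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses.

    `polygon` is a list of (x, y) vertices in order; the last vertex is
    implicitly joined back to the first. An odd number of crossings means
    the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical example: a 4 x 4 square "country" and two "cities".
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
city_inside = point_in_polygon(2, 2, square)
city_outside = point_in_polygon(5, 2, square)
```

Whether this counts as something different in kind from a string comparison, or just another predicate evaluated per record, is the question under debate in this thread.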
In practical terms, that’s why spatial data needs a distinct approach for things like indexing e.g. to include the concept of “near” within the index implementation. Storage is another issue with different practical implications - polygons as sets of points? multi-dimensional data such as time series etc? store rasters as individual cells? raster cell dimensions as points/polygons? etc etc. And the process of navigating spatial relationships also requires specifically “spatial” implementations. How do you define and implement a “spatial JOIN” operation for example?
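A rough idea of what building “near” into an index might look like is a uniform grid: points are bucketed by cell, and a proximity search only inspects the cells that the search circle can touch. This is a minimal Python sketch, not how any particular database implements its spatial index (production systems more commonly use structures such as R-trees); the function names and sample points are assumptions for illustration.

```python
import math
from collections import defaultdict


def build_grid_index(points, cell_size):
    """Bucket named points into square grid cells of side `cell_size`."""
    index = defaultdict(list)
    for name, (x, y) in points.items():
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        index[cell].append(name)
    return index


def near(points, index, cell_size, x, y, radius):
    """Names of points within `radius` of (x, y), checking only nearby cells."""
    reach = math.ceil(radius / cell_size)  # how many cells the radius can span
    cx, cy = math.floor(x / cell_size), math.floor(y / cell_size)
    hits = []
    for i in range(cx - reach, cx + reach + 1):
        for j in range(cy - reach, cy + reach + 1):
            for name in index.get((i, j), []):
                px, py = points[name]
                # Exact distance check on the few candidates the grid returns.
                if math.hypot(px - x, py - y) <= radius:
                    hits.append(name)
    return sorted(hits)


# Hypothetical sample data.
points = {"a": (0.5, 0.5), "b": (1.5, 0.5), "c": (9.0, 9.0)}
index = build_grid_index(points, cell_size=1.0)
close_by = near(points, index, 1.0, 0.0, 0.0, 2.0)
```

A B-tree on a single column cannot express this kind of two-dimensional neighbourhood directly, which is one reading of why a distinct index implementation is needed; the data in the buckets is still just records.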
So it seems to me there are lots of things that characterise/distinguish “spatial” data, justifying the use of “spatial data” as a general term.
See the Rigaux book for much more on this - written by people much smarter than me!
July 25, 2008 at 4:37 pm
One more thought:
In theory all data stored in a computer is just a lot of binary code, and the computer treats it all as a bunch of 1s and 0s, just as you imply with your remarks about “number crunching” to work out distance etc. But the operations that are performed on that data are different depending on its meaning at a higher level. We sort data differently depending on whether it’s numeric or character, for example.
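The sorting remark is easy to see in a few lines of Python. This is only an illustrative sketch with made-up values: the same three items come out in a different order depending on whether they are compared as characters or as numbers.

```python
values = ["10", "9", "2"]

# Compared as text, "10" sorts before "2" because "1" < "2" character by character.
as_text = sorted(values)

# Compared as numbers, the same items are ordered by magnitude.
as_numbers = sorted(values, key=int)
```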
So “spatial” data is essentially another distinctive category of data at a higher level. It can be reduced to a bunch of numeric coordinates or even binary codes, but we need specific “spatial” logic to store and manipulate “spatial” data in a meaningful way.
I guess it’s like the natural sciences. We can reduce everything to “just” physics (or even “just” mathematics), but in the real world we need other approaches with their own methods and theoretical structures, such as chemistry, biochemistry, ecology and of course geography, to make meaningful use of that knowledge at different levels.
Which is exactly what we are doing when we apply specific methods for handling “spatial” data.
July 25, 2008 at 5:48 pm
ChrisW:
Thank you for your comments. They are very helpful and illuminating for me. I would suggest, however, that you agree with me, even if you don’t know it! You are describing exactly the approach that I would like to see more of, i.e. treating “spatial” data as data that simply requires special treatment to make meaningful sense of it. BUT at the machine level, AND at the abstract level, it’s just…data.
Databases are supposed to mimic, however crudely, the processes of human thought by which we make relationships among bits of information. To the extent that data is spatial we must have different operations than the ones that we use for string records. But in the end, we are programming machines to do SET OPERATIONS on RECORDS. We don’t treat numerical records the same way as text records, but we don’t always talk about them as though they are in two entirely different realms either.
This is all my argument amounts to. To me, the fact that data is operated on differently by algorithms seems unimportant. I know there are volumes about all these differences, but that, to me, seems logically secondary to the proposition that it’s all just records…with nothing inherently spatial about them…
Maybe it’s trivial, but I personally feel it has implications for the community of users. The idea that GIS and Spatial Data are so different and special is partly responsible for the underuse of the technology. I think it has implications for how it should be taught and, even more important, for how people should be trained. And finally, it is a useful antidote to years of marketing by vendors who have had a vested interest in making GIS seem like a unique-thing-in-itself. The effect of this has been to “raise” a generation of users who think that ESRI=GIS=Spatial Data. This is changing now, thank goodness, largely as a result of Google Earth and open source applications.
Over the years, the only data that has seemed to me to be inherently spatial is raster data, and I’m not sure about that. I’m thinking that a raster is more a model of the world, the way that a globe is a model of the earth, than a map of it. That is, in essence, a raster is a piece of terrain shrunken into a small analog form. Analog is the key. Databases are not analog, which is why, I believe, Codd and Date make a point of saying that the actual software implementation of the system is irrelevant to their principles.
Anyway, I ordered the Worboys and Duckham book…
Cheers
July 26, 2008 at 9:14 am
Yes, I would agree that it’s all “just” data, but that is a trivial point in itself.
Data only becomes information when we add meaning to it, and in this case the meaning relates to the data’s spatial context, just as, say, actuarial data or agricultural data only has meaning when treated in its particular context. The number “123456” has no intrinsic meaning until we know what it refers to - a population count, crop output per hectare, an altitude measurement or a grid reference. Spatial data is data with a spatial meaning and context (tautological but true!), which determines the appropriate operations and theoretical framework for using that data.
As for rasters, I disagree with your description of them as “analog”. Analog data might be an old vinyl recording, where the output signal is directly related (through an essentially mechanical process) to the original input signal. Digital data - as on a CD - is digitally sampled and represents a discrete set of data points e.g. measuring particular frequency bands to a certain degree of resolution. By its nature, it loses information from the real-world signal because it merges values within each sampling frequency band (for example), but we try to make the resolution small enough that we won’t notice the missing information, without swamping our systems with huge volumes of data.
So raster data may measure “fields” (Worboys’ term), just as sound recordings do, but just like a CD the actual data is created as a set of discrete samples with a particular resolution. A raster cell has a fixed size (its spatial resolution) and a fixed range of sampled values, just like a digital frequency band on a CD.
True, GIS often uses a set of raster data to construct a representation of the original “analog” field, such as an elevation model, but you cannot recreate the original value for points between/within your raster cells, only generate a value based on the subset of data in your raster, using various statistical techniques that are taken to produce a reasonable approximation to the likely real-world value. If your cells are too big, or your data measurements too coarse-grained, you may miss significant real-world features. The real-world is “analog”, but the data is digital these days, and a model is not the same as the thing being modelled, whether it’s a statistical population model, an investment strategy model or a DEM.
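To illustrate the point about generating, rather than recovering, values between cells, here is a minimal Python sketch using bilinear interpolation. Bilinear interpolation is simply one easy technique chosen for the sketch, not necessarily what any given GIS package applies; the function name and the tiny 2 x 2 elevation grid are assumptions for illustration, and the query point is assumed to lie within the grid.

```python
def bilinear(grid, x, y):
    """Estimate a value at fractional position (x, y) from the four nearest samples.

    grid[row][col] holds the sample taken at integer position (col, row).
    The result is an estimate built from the stored samples only; whatever
    the real surface did between them is not in the data.
    """
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(grid[0]) - 1)  # clamp at the right edge
    y1 = min(y0 + 1, len(grid) - 1)     # clamp at the bottom edge
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


# Hypothetical elevation samples.
dem = [
    [10, 20],
    [30, 40],
]
estimate = bilinear(dem, 0.5, 0.5)
```

A narrow ridge or pit sitting between the four samples would leave no trace in the estimate, which is the sense in which cells that are too big can miss real-world features.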
As for rasters in databases, Oracle allows you to store raster data down to the level of individual cells and frequency bands for each cell, which reflects your own view that it’s all “just” relational data. This is after all what the raster file itself does, and in theory you could store the same data in a spreadsheet. Or on a big piece of paper.
But I’m somewhat dubious as to how far you can combine data from different rasters held in the Oracle style, without a lot of processing to ensure that every cell is truly comparable across all your raster sets. Most relational databases assume that each table contains comparable records and attributes, but this may not be true of raster cell data. I don’t know.
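As a sketch of what cell-level storage and cross-raster comparison might look like, the Python example below keeps one row per cell in SQLite and joins two rasters on their cell positions. This is not Oracle's actual raster schema; the table layout, column names, raster ids and values are all made up for illustration, and the join is only meaningful under the assumption that both rasters share the same grid origin and cell size.

```python
import sqlite3


def load_raster(conn, raster_id, grid):
    """Store a small raster as one relational row per cell."""
    rows = [
        (raster_id, r, c, float(value))
        for r, line in enumerate(grid)
        for c, value in enumerate(line)
    ]
    conn.executemany("INSERT INTO raster_cells VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE raster_cells (
        raster_id TEXT,
        cell_row  INTEGER,
        cell_col  INTEGER,
        value     REAL,
        PRIMARY KEY (raster_id, cell_row, cell_col)
    )
    """
)
# Hypothetical elevation rasters for two years on the same 2 x 2 grid.
load_raster(conn, "dem_2007", [[10, 12], [14, 16]])
load_raster(conn, "dem_2008", [[11, 12], [13, 18]])

# Per-cell change between the two rasters, expressed as an ordinary self-join.
change = conn.execute(
    """
    SELECT a.cell_row, a.cell_col, b.value - a.value
    FROM raster_cells AS a
    JOIN raster_cells AS b
      ON a.cell_row = b.cell_row AND a.cell_col = b.cell_col
    WHERE a.raster_id = 'dem_2007' AND b.raster_id = 'dem_2008'
    ORDER BY a.cell_row, a.cell_col
    """
).fetchall()
```

If the two rasters had different resolutions or origins, matching on row and column would silently compare cells that cover different ground, so a resampling step would have to come first.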
I also wonder whether this effort is necessarily justified in many cases. After all, with vast amounts of satellite imagery being delivered every day, many rasters are “obsolete” within days of being created. And this ignores the added work required to create/maintain the time series for each raster/cell.
So in practical terms many people still prefer just to manage their raster data not as individual cells but at a higher level as stored BLOBs or files, partly because it’s easier (and often quicker) and probably also because it makes it easier to retain the spatial context and coherence of the raster datasets, which is also important.
As for “GIS”, it seems a perfectly reasonable term to me, just like “statistical” or “financial” information systems, and each application domain places particular requirements on the technologies we use to support them. Spatial data has particular characteristics requiring particular approaches, so “GIS” is as good a way as any to recognise this. Although I would agree that this should not be seen as a rigidly fixed or narrowly defined term, as the range of spatial technologies and applications is constantly changing and may overlap with other domains in different ways.
Sure, there’s probably a certain amount of marketing hype and deliberate obfuscation by some groups out to protect their turf, but that’s true of any field. And I think it’s true that some GIS people focus exclusively on the “G” and tend to forget what the “IS” stands for.
But it’s also true that each domain requires specialist knowledge to make best use of the tools and information. The data on your medical records is “just” data, but you’d probably rather the people using it were medically qualified than just a bunch of grizzled old database developers like yours truly!
July 27, 2008 at 11:50 am
There’s nothing special about spatial. It’s all about autocorrelation - “spatial data” is just our most obvious example of dealing with this thorny area with statistics. There’s no end of multidimensional statistical problems, but the familiarity of XY gets us these specialty arenas.
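One common way to put a number on spatial autocorrelation is Moran's I, which compares each value's deviation from the mean with the deviations of its neighbours. The Python sketch below is a minimal illustration with binary neighbour weights on a one-dimensional chain of cells; the function names and sample values are assumptions for illustration, and a real analysis would choose the neighbourhood and weights with more care.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights.

    `neighbours` maps each index i to the list of indices j counted as
    adjacent to i (weight 1); every other pair has weight 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0        # sum of dev[i] * dev[j] over all neighbour pairs
    weight_total = 0   # sum of all weights
    for i, js in neighbours.items():
        for j in js:
            cross += dev[i] * dev[j]
            weight_total += 1
    variance_sum = sum(d * d for d in dev)
    return (n / weight_total) * (cross / variance_sum)


def chain_neighbours(n):
    """Each cell in a row of n cells neighbours the cells immediately beside it."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


chain = chain_neighbours(5)
smooth = morans_i([1, 2, 3, 4, 5], chain)       # similar values sit together
alternating = morans_i([1, 5, 1, 5, 1], chain)  # neighbours are dissimilar
```

The smooth gradient gives a positive value and the alternating pattern a negative one; the same calculation works for any attribute once a neighbour relation is defined, XY or otherwise.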
July 27, 2008 at 11:52 am
And dude, if you think a “point” is defined by two numbers you are insane. A “point” is 0D, it can have any number of attributes defining its position.
July 27, 2008 at 6:34 pm
mdsummer:
Yep, I guess I’m insane. At least I’m not rude. I was speaking about points relevant to maps. Could you expand on your remark, please?
July 30, 2008 at 1:05 am
ChrisW:
Thanks for your many ideas and arguments. I just got my copy of Worboys and Duckham, and I will post again after I have gone through it.
August 5, 2008 at 10:25 pm
I’ve worked with several GIS implementations that store spatial data in tabular form. Not just coordinates, but also connectivity, topology and spatial indexes.
Actually, my job was to migrate these “old” systems to a “new” system that uses UDTs/BLOBs to store spatial data.
I’m convinced that you can model any kind of data in a relational database the “Codd way”. That is, using normalized tables and columns storing atomic values. I’m also convinced that using UDTs should be avoided if possible. So why GIS platforms based on UDTs are gaining popularity is a very interesting question.
Are spatial capabilities really so special that we need UDTs to handle them?
I see some benefits, but also a lot of problems.
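The two storage styles can be contrasted in a small Python sketch using SQLite. This is an assumption-laden illustration rather than a description of any GIS platform: the table names, the rectangle, and the packed-doubles BLOB layout are all made up here. With one vertex per row, plain SQL can answer a question such as the bounding box; with the geometry packed into a BLOB, the database sees only opaque bytes and the application (or a type-specific function) has to decode them.

```python
import sqlite3
import struct

# Hypothetical polygon: a 4 x 3 rectangle.
vertices = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

conn = sqlite3.connect(":memory:")

# Style 1: normalized table, one atomic value per column, one vertex per row.
conn.execute(
    """
    CREATE TABLE polygon_vertices (
        polygon_id INTEGER, seq INTEGER, x REAL, y REAL,
        PRIMARY KEY (polygon_id, seq)
    )
    """
)
conn.executemany(
    "INSERT INTO polygon_vertices VALUES (1, ?, ?, ?)",
    [(seq, x, y) for seq, (x, y) in enumerate(vertices)],
)
# The bounding box falls out of ordinary aggregate functions.
bbox = conn.execute(
    "SELECT MIN(x), MIN(y), MAX(x), MAX(y) "
    "FROM polygon_vertices WHERE polygon_id = 1"
).fetchone()

# Style 2: the whole geometry as one BLOB of little-endian doubles.
conn.execute("CREATE TABLE polygon_blobs (polygon_id INTEGER PRIMARY KEY, geom BLOB)")
packed = struct.pack(f"<{2 * len(vertices)}d", *[c for v in vertices for c in v])
conn.execute("INSERT INTO polygon_blobs VALUES (1, ?)", (packed,))

# SQL alone cannot look inside the BLOB; decoding happens in application code.
raw = conn.execute("SELECT geom FROM polygon_blobs WHERE polygon_id = 1").fetchone()[0]
flat = struct.unpack(f"<{len(raw) // 8}d", raw)
unpacked = list(zip(flat[0::2], flat[1::2]))
```

The normalized form keeps everything queryable with standard set operations at the cost of many rows per feature; the BLOB form keeps a feature in one place but moves the geometry logic out of the relational layer.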