20/9/2023

Tiling GB sectors... by hand. (pt1)

At a time when the stunning advances of AI are ever present in the news, I have decided to go against the grain. For the last 6 months we've been expanding our internal postcode risk file to get it ready for commercial release, and along the way I struck upon an idea - could I categorise every postcode sector in Great Britain? And, more importantly, could I do it manually?


The reason I feel that there is still a place for this sort of analysis is not that I doubt the capability of AI to undertake a task, but that I doubt my ability to ask it the right question. How do I ask any automated system to categorise data when I don't even have an idea of how I want it to be categorised? It quickly becomes apparent that this task is not just about assigning predictive data to enrich a postcode file, but about enriching my knowledge and understanding enough to define the problem in the first place!


But let's start at the beginning - GeoPandas.

London by dwelling type cluster.


GeoPandas

A tool that I rarely see insurers making enough use of is an interactive map. Given that your postcode risk ratings can be so crucial in your ability to set a good price for a customer (our market premium prediction model sees our Pebbles Accident Risk rating as the most predictive feature), it surprises me that far more time is not spent on the technical allocation of data and features at an area level.


I think that one reason for this is that these sorts of tools are just not that common. GIS software is often expensive, and inaccessible if you are not an expert in it. I would still advise that insurance companies invest in this activity, but it would help if there were a low-cost option within existing skillsets. And there is. GeoPandas is a Python module that allows you to easily build maps from shapefiles and attach dataframe information for visual interrogation.
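As a rough illustration of that workflow, here is a minimal sketch of attaching ratings to sector shapes and colouring a map by them. The square "sector" shapes, the `accident_risk` column and its values are all made up for the example; in practice the shapes would come from a sector boundary file via `gpd.read_file(...)`. The interactive view uses `GeoDataFrame.explore`, which also needs folium, matplotlib and mapclassify installed.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Toy stand-ins for postcode sector polygons (assumption: real shapes would
# be loaded from a boundary file, e.g. gpd.read_file("sectors.shp")).
sectors = gpd.GeoDataFrame(
    {"sector": ["TA8 1", "TA8 2", "TA9 3"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

# Hypothetical ratings table keyed on the same sector code (values invented).
ratings = pd.DataFrame(
    {"sector": ["TA8 1", "TA8 2"], "accident_risk": [3, 7]}
)

# A left join keeps every shape, so unrated sectors show up as gaps (NaN).
risk_map = sectors.merge(ratings, on="sector", how="left")


def draw_risk_map(gdf):
    """Return an interactive folium map coloured by accident_risk (red is worse)."""
    return gdf.explore(column="accident_risk", cmap="RdYlGn_r", legend=True)
```

Calling `draw_risk_map(risk_map).save("risk_map.html")` should then write a map that can be opened in a browser, panned and zoomed.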

And the interactive maps look great. Here's a zoomed-out view of our motor accident risk map (red is worse).

Pebbles Private Car Risk Accident 2023 snapshot.


Training

One thing that both AI and I need is some training on what we are trying to build. But the advantage that I have (I think!) is that I can figure things out as I go along. AI, especially LLMs, seem excellent at figuring things out and connecting the dots, but I've yet to see them start with a vague concept and drill down to create something new and then build against it. But this is where I am. I have a vague notion of what I want, but how to turn that ambition into a task?


The approaches, however, are the same. I need to draw upon my own experiences and then I need to really just dive in and see how things go, correcting my own errors as I go along to improve what I am doing. Can AI ask itself questions? If not now, then surely it is just a matter of time.


So, to the practical. The first thing I need is to define what the categories are supposed to represent, and here I think I need to go with as plain English a description as possible. If I were to randomly ask someone who lived in a given sector what the single most notable feature of their area was, what would they say? I think this is the only way to get such a broad range of views. I'm not asking people a set of questions like urban density or transport links or distance to the nearest fire station (incidentally, one of my favourite rating factors). I'm asking them to give me a single viewpoint that encompasses everything.


Second, I need to break the task down to make it manageable. If I start too big I will get muddled. For this reason I decided to start with the coastline. That way everything should more or less be a range on a single scale (note from future self: it wasn't), which allows me to start quickly.


And finally, I need as much information as possible (even if it is biased). I loaded up my GeoPandas map in one window, with some Pebbles data as a colour overlay. In another, Google Maps with the satellite image layer. And finally some personal experience - the most local beach to where I grew up and spent many a dreary summer, Burnham-on-Sea. Population 16,315. Postcode Sector TA8 1. Ok, small town, but it is a town. On the coast. Boom, category C2. I was off to a start.

It's basically fraudulent advertising at this point.


Early Lessons

My original intention was to categorise the coast into 3 bands (rural, town, city), but by the time I had made it around Cornwall and reached Bournemouth I realised I had a problem. Those categories (C1, C2, and C3) were not granular enough to meet my needs, and I needed to add two more. C1 was supposed to represent small towns, but there were so many postcode sectors with not even that present that I needed to keep C1 just for the most rural of areas. C2-C3 then needed to be stretched a little more to become C2-C4. Finally, I needed a new category, C5, purely to represent docks.

But then what about HMNB Devonport in Plymouth? Well, it could be a dock, but its military nature makes me think it would be sufficiently different. So cue our second categorical range, F for Forces. And I'll probably need army bases (F1) and air bases (F2) in addition to naval bases (F3).
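One way the scheme so far could be written down is as a simple lookup. This is only a sketch: the labels for C3 and C4 are assumed (the text only says the town-to-city range was stretched), the names `CATEGORIES`, `assignments`, `describe` and `add_assignment` are illustrative, and the only real assignment is TA8 1.

```python
# Sketch of the category scheme described so far. Labels for C3 and C4 are
# assumptions; the text only says the town-to-city range was stretched.
CATEGORIES = {
    "C1": "coast: most rural",
    "C2": "coast: small town",
    "C3": "coast: larger town (assumed label)",
    "C4": "coast: city (assumed label)",
    "C5": "coast: docks",
    "F1": "forces: army base",
    "F2": "forces: air base",
    "F3": "forces: naval base",
}

# Manual assignments, keyed by postcode sector.
assignments = {"TA8 1": "C2"}  # Burnham-on-Sea


def describe(sector):
    """Return the plain-English label for a sector, or flag it as still to do."""
    code = assignments.get(sector)
    if code is None:
        return "uncategorised"
    return f"{code} ({CATEGORIES[code]})"


def add_assignment(sector, code):
    """Record a category, rejecting codes that are not in the scheme yet."""
    if code not in CATEGORIES:
        raise ValueError(f"unknown category {code!r}")
    assignments[sector] = code
```

Rejecting unknown codes is deliberate here: when the scheme itself keeps changing, a new category has to be added to the lookup before it can be used, which keeps the definitions and the assignments in step.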


But possibly the most important lesson was that it was right to just get started. All too often we can try to work out the perfect answer before we commit to the work, but as data scientists, as analysts in general, information is our commodity. We can't expect to have it all at the start. We need to adapt as we learn.

And Somerset / Devon / Cornwall / Dorset are reasonably similar places. I know there's a lot more to come as we head around the coastline. And I'm excited about what new things I will find along the way.


If only I didn't need to be 15+ years into a career and have started my own company to be given the freedom to do a project like this.
