https://pds.blog.parliament.uk/2016/11/04/sharing-lists-of-lists/

Sharing lists of lists

The Parliament data platform is designed to sit between internal-facing business applications and the new public website. The business applications generate, manage and report the data needed to run the two Houses and the committees. The data platform provides a place to reconcile, aggregate, cross-link and subject-index the data. And the website will make the data available as HTML, CSV, JSON and any other flavour people might find useful.

For the last few months we've been sketching out the details of what services the data platform needs to provide, getting the infrastructure in place and doing tech spikes with the new website team. Now we're beginning to dig into the meat of the matter: exploring the data we need to provide versus the data the business is capable of providing to us, and how we close that gap.

Having sketched out the overarching domain model, we're now biting off tiny chunks and zooming into specific areas to design a data model that's capable of meeting the needs discovered by the website user research, the feedback from data.parliament users, the ODI report on Parliamentary open data, the recent community event at Newspeak House and the user needs we'll gather through ongoing research.

As we do this we're also auditing existing systems to check the integrity of the data and how much work we need to do to reshape it, to match the model, to meet the needs.

We've spent the last couple of weeks looking at the Members' Names Information Service (MNIS), a fairly core dataset of houses, memberships, members, their roles in government, opposition, parliament, committees etc. So far we've found:

  1. We have some data that matches the model that meets the needs. This is fine.
  2. We have some data that looks similar on the surface but the modelling is too simplistic to provide the service we need to provide. This is tricky. Expanding the data model to be more descriptive makes it more useful to more users but increases the workload of the people responsible for entering the data. Finding the sweet spot between the needs of users (internal and external) and the capability of the business to provide the data feels like the majority of the work for the next few years.
  3. We have missing data but some idea of where to find it. Or the usual problem of business semantics trapped inside typography trapped inside documents.
  4. We have missing data but no real idea of where to find it. This is particularly true for historical data, which suggests we need to work more closely with research and academia to backfill some of our records.
  5. We have data in some form that's maintained inside Parliament but probably shouldn't be.

The last one is interesting. Just looking at MNIS we have lists of constituencies (current and historical), election results (real and notional), government departments, government positions, government roles, political parties etc. None of which seem particular to Parliament and none of which Parliament should be the canonical source for.

Which raises the question of how many public (and private? and third?) sector organisations are maintaining reference data for things like government departments? How well maintained and reliable are those data sets? How out of sync do they get? What's the cumulative set of typos across the sources? How many shaky services are being built on top of wonky reference data? How many people are being employed to maintain them and build and maintain the systems to support them? How much does that cost? And what are the opportunity costs for advising, scrutinising, auditing, reporting on and interoperating with government when we can't even agree on identifiers and definitions of government departments?
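To make the "out of sync" and "cumulative typos" questions a bit more concrete, here is a minimal sketch of how two organisations might compare their copies of the same reference list. Everything in it is an assumption for illustration: the function names, the 0.85 similarity cutoff and the sample department names (including the deliberate typo) are made up, and none of it reflects how MNIS or any real register works.

```python
from difflib import get_close_matches


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(name.lower().split())


def compare_lists(ours: list[str], theirs: list[str], cutoff: float = 0.85) -> dict:
    """Compare two reference lists and report overlaps, near-misses and gaps."""
    ours_by_key = {normalise(n): n for n in ours}
    theirs_by_key = {normalise(n): n for n in theirs}

    shared_keys = ours_by_key.keys() & theirs_by_key.keys()
    only_ours = set(ours_by_key) - shared_keys
    only_theirs = set(theirs_by_key) - shared_keys

    # Names that nearly match are probably the same thing with a typo.
    probable_typos = []
    for key in sorted(only_ours):
        match = get_close_matches(key, sorted(only_theirs), n=1, cutoff=cutoff)
        if match:
            probable_typos.append((ours_by_key[key], theirs_by_key[match[0]]))
            only_theirs.discard(match[0])
    matched_ours = {normalise(a) for a, _ in probable_typos}

    return {
        "shared": [ours_by_key[k] for k in sorted(shared_keys)],
        "probable_typos": probable_typos,
        "only_ours": [ours_by_key[k] for k in sorted(only_ours - matched_ours)],
        "only_theirs": [theirs_by_key[k] for k in sorted(only_theirs)],
    }


# Hypothetical sample data, not taken from any real system.
ours = ["Department of Health", "Home Office", "Ministry of Defence", "HM Treasury"]
theirs = ["Department of Helath", "home  office", "Ministry of Defence", "Cabinet Office"]

result = compare_lists(ours, theirs)
for label, items in result.items():
    print(label, items)
```

Matching on names is exactly the fragile part. With shared identifiers from an authoritative list, the fuzzy matching step would not be needed at all.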

All of which obviously brings up the GDS work on registers - authoritative lists you can trust. If the public sector could agree on how responsibility for maintaining registers gets split between the assorted organisations we could all save a lot of time, trouble and money on maintaining tottering towers of wonky data and shaky services.

So I thought it might be interesting to start a list of the lists of things Parliament has. For now it's a bit of a baby list but we'll keep adding to it as we explore more systems. I also thought it would be interesting if other organisations shared their lists of lists. And maybe we could arrange a meetup to compare our lists and see how much data and work we're duplicating. Probably there's a better way. Possibly someone else is already doing the legwork here. But the Parliament list of lists is on GitHub so feel free to add more.
