About the project

Background

When I joined the Guild of One-Name Studies (GOONS), I took seriously its aim "to promote the preservation and publication of one-name genealogical data, and to maximise its accessibility to interested members of the public". Hence, I became very interested in the internet as the ideal medium for providing widespread access to data.

I had made patchy use of computers for storing genealogical data since the early 1980s, but now I wanted to make the bulk of my research available on the Web. Initially, I put a lot of information into word-processed documents and used built-in filters (e.g. menu items such as "Save as HTML") to convert it to web pages. However, I was very dissatisfied with the results and ended up preparing most of the web pages by hand. The trouble is that this creates a maintenance problem: when you want to add a new item, you have to add it both to your 'master' information in a word-processor document and to the relevant web page. This provides plenty of opportunity for inconsistencies to arise between the two versions.

I tried several commercial genealogy packages to see if they would help, but none of them seemed to be the right tool for the job. The problem seemed to be that they start at 'the wrong end'! For a conventional genealogist, the central body of data is the people, who are linked together through relationships. Sources are referenced to support assertions about these relationships. One-name genealogy (for me, at least) seems to start at the other end of the problem: I have a central body of source information that refers to the name but can only gradually be associated with particular people.

It was this need to cope not just with source data but also with information about individuals that proved the undoing of my original website. I started to create web pages that included biographical information about individuals and, naturally, it made sense to link this information to a transcript of its source. So, if I state that an individual was married on a particular date, it is sensible to link to the place on the website where I have transcribed that person's marriage certificate. What's more, hypertext lends itself very well to this kind of linking... up to a point. And that point is one of complexity. As you add more records and more individuals, the number of links between them becomes very large, and this exposes a weakness of HTML: the information about how items relate to one another is scattered across individual pages and cannot easily be managed as a whole. For example, if I edit a particular page, there is no easy way to discover which pages link to it and whether those links need to be updated to reflect the changes.
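To make the "which pages link here?" problem concrete: because each link lives inside the page it starts from, answering the question means scanning every page and inverting its outgoing links. The following is a minimal Python sketch, assuming the pages are held in memory as a dict mapping filenames to HTML; the page names are hypothetical and the regular expression only handles simple `href="..."` attributes.

```python
import re
from collections import defaultdict

# Matches href="..." in simple, well-formed HTML, stopping before any '#fragment'.
# A sketch only: a real site would need a proper HTML parser.
HREF_RE = re.compile(r'href="([^"#]+)')

def build_backlinks(pages: dict[str, str]) -> dict[str, set[str]]:
    """Invert each page's outgoing links into an index of incoming links."""
    backlinks: dict[str, set[str]] = defaultdict(set)
    for source, html in pages.items():
        for target in HREF_RE.findall(html):
            backlinks[target].add(source)
    return backlinks

# Hypothetical pages: two biographies both pointing at one transcript page.
pages = {
    "john.html": '<a href="marriages.html#m12">marriage certificate</a>',
    "mary.html": '<a href="marriages.html#m12">marriage</a> <a href="john.html">husband</a>',
    "marriages.html": "<p>Transcripts of marriage certificates</p>",
}
index = build_backlinks(pages)
print(sorted(index["marriages.html"]))  # ['john.html', 'mary.html']
```

The index has to be rebuilt from scratch whenever any page changes, which is exactly the kind of bookkeeping that a database can hold centrally instead.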

The database project: 2000-2001

By this stage, I knew that I needed a more sophisticated technology and decided to experiment with a Relational Database Management System (RDBMS): a software data repository that stores information in tables and allows it to be accessed and indexed in complex ways. Plenty of people use databases to store indexes, but my plan was to store the entire contents of the website and, at the click of a mouse, regenerate the website automatically every time I updated any records. So, in about 2000, I chose an RDBMS (MySQL) and a programming language (Perl) and set about representing the entire contents of the website as tables of data.
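The core idea, that the database is the master copy and every web page is derived from it, can be sketched in a few lines. This is not the original MySQL/Perl code: it is a minimal Python illustration using the standard `sqlite3` module as a stand-in for MySQL, and the `people` table, its columns and the sample names are hypothetical.

```python
import sqlite3
from html import escape

def regenerate_site(conn: sqlite3.Connection) -> dict[str, str]:
    """Rebuild every person page from the database; returns {filename: html}."""
    pages = {}
    for person_id, name, born in conn.execute(
        "SELECT id, name, born FROM people ORDER BY id"
    ):
        # Each row becomes one page; escape() keeps names safe inside HTML.
        pages[f"person{person_id}.html"] = (
            f"<h1>{escape(name)}</h1><p>Born: {escape(born or 'unknown')}</p>"
        )
    return pages

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, born TEXT)")
conn.executemany(
    "INSERT INTO people (id, name, born) VALUES (?, ?, ?)",
    [(1, "John Gumbleton", "1801"), (2, "Mary Gumbleton", None)],
)
site = regenerate_site(conn)
print(site["person2.html"])  # <h1>Mary Gumbleton</h1><p>Born: unknown</p>
```

Because pages are only ever produced from the tables, an edit to a record cannot leave a stale copy behind: the next regeneration picks it up everywhere.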

The design of the database was divided, conceptually, into two main areas, which stored information about records and about people respectively. These two categories were reflected in the design of the published website, which had two major sections: the set of Gumbleton records and the details of people. Each type of record (baptisms, for example) was represented within the database as a table of data, with a corresponding web page template. The data extraction software assembled the web pages from these tables and templates.
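The "one table plus one template per record type" arrangement might look roughly like this. It is a minimal Python sketch, not the original software: the template uses the standard library's `string.Template`, and the baptism fields, template and sample rows are hypothetical.

```python
from html import escape
from string import Template

# Hypothetical row template for one record type (baptisms);
# each record table would have its own template like this.
BAPTISM_ROW = Template(
    "<tr><td>$date</td><td>$child</td><td>$parents</td><td>$parish</td></tr>"
)

def render_record_page(title: str, rows: list[dict], row_template: Template) -> str:
    """Assemble one record-type page: every table row rendered through the row template."""
    body = "\n".join(
        # Escape each field so characters such as '&' are valid in HTML.
        row_template.substitute({k: escape(str(v)) for k, v in row.items()})
        for row in rows
    )
    return f"<h1>{escape(title)}</h1>\n<table>\n{body}\n</table>"

# Hypothetical sample rows standing in for the baptisms table.
baptisms = [
    {"date": "1820-03-05", "child": "William", "parents": "John & Mary", "parish": "Parish A"},
    {"date": "1822-07-14", "child": "Anne", "parents": "John & Mary", "parish": "Parish A"},
]
page = render_record_page("Gumbleton baptisms", baptisms, BAPTISM_ROW)
print(page)
```

Keeping the layout in the template means the look of every baptism page can be changed in one place without touching the data.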

Details of people were stored differently, with one central table that stored the main information, such as name, birth and death dates and occupations, for each individual. Again, there were HTML templates that determined how this information would be rendered as web pages. Each person's data also included (if known) the identity of their parents, and the program that generated the 'people pages' was able to find the parents and create hyperlinks to them. Because these parent links existed, the program was equally able to locate each individual's children and include references to them on the parent's web page. Hence, each person's information did not need to include references to multiple children, so long as each child's information referred to its two parents.
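The trick of storing only parent links and deriving children from them can be shown in a few lines. This is a minimal Python sketch with hypothetical field names (`father_id`, `mother_id`) and sample people; in SQL the same lookup would be a query along the lines of `WHERE father_id = ? OR mother_id = ?`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    id: int
    name: str
    father_id: Optional[int] = None  # hypothetical parent-link fields
    mother_id: Optional[int] = None

def children_of(person_id: int, people: list[Person]) -> list[Person]:
    """Find children by looking for people whose parent links point at person_id."""
    return [p for p in people if person_id in (p.father_id, p.mother_id)]

# Hypothetical sample data: only the children record their parents.
people = [
    Person(1, "John"),
    Person(2, "Mary"),
    Person(3, "William", father_id=1, mother_id=2),
    Person(4, "Anne", father_id=1, mother_id=2),
]
print([c.name for c in children_of(1, people)])  # ['William', 'Anne']
```

Each relationship is stored exactly once (on the child), so there is no separate list of children that could drift out of step with it.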

As well as the two main sets of tables, relating to records and people, there was a third table that connected the two: the 'Events' table. A particular record may be associated with a number of events: for example, a marriage record may represent an event for the bride, the groom and each of the witnesses. Entries in the events table provided the link between a record and the person to whom the event related. When the web page for an individual was generated, the program searched for all the events that referred to that person and summarised them on the page.
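A linking table of this kind might be laid out as follows. This is a minimal Python/`sqlite3` sketch, not the original MySQL schema: the table and column names are hypothetical, and the `record_type` column is an assumption added so that one events table can point into several record tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE marriages (id INTEGER PRIMARY KEY, date TEXT, parish TEXT);
    -- One row per (record, person) pair; 'role' says how the person appears.
    CREATE TABLE events (
        record_type TEXT,      -- which record table record_id refers to
        record_id   INTEGER,
        person_id   INTEGER REFERENCES people(id),
        role        TEXT
    );
    INSERT INTO people VALUES (1, 'John'), (2, 'Mary'), (3, 'Thomas');
    INSERT INTO marriages VALUES (1, '1845-06-12', 'Parish A');
    -- One marriage record yields three events: groom, bride and witness.
    INSERT INTO events VALUES
        ('marriage', 1, 1, 'groom'),
        ('marriage', 1, 2, 'bride'),
        ('marriage', 1, 3, 'witness');
""")

def marriage_events_for(person_id: int) -> list[tuple]:
    """Return (date, parish, role) for every marriage record linked to this person."""
    return conn.execute(
        """SELECT m.date, m.parish, e.role
           FROM events e JOIN marriages m ON m.id = e.record_id
           WHERE e.record_type = 'marriage' AND e.person_id = ?
           ORDER BY m.date""",
        (person_id,),
    ).fetchall()

print(marriage_events_for(3))  # [('1845-06-12', 'Parish A', 'witness')]
```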

To summarise, then: whenever a page was generated for an individual, the program stitched together the basic information about that person, references to any children who identified this person as a parent, and any events that identified this person. All this data was then rendered by means of a configurable template.
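Putting the three ingredients together, a person page could be generated roughly as below. This is a minimal Python sketch under stated assumptions: people and events are plain dicts, and the field names (`father`, `mother`, `person`), the `person<id>.html` file naming and the template are all hypothetical.

```python
from html import escape
from string import Template

# Hypothetical, configurable page template for an individual.
PERSON_TEMPLATE = Template(
    "<h1>$name</h1>\n<p>Born: $born</p>\n"
    "<h2>Children</h2>\n<ul>$children</ul>\n"
    "<h2>Events</h2>\n<ul>$events</ul>"
)

def build_person_page(person: dict, people: list[dict], events: list[dict],
                      template: Template = PERSON_TEMPLATE) -> str:
    # 1. Children: anyone whose parent links point at this person.
    children = [p for p in people
                if person["id"] in (p.get("father"), p.get("mother"))]
    # 2. Events: every event row that identifies this person.
    own_events = [e for e in events if e["person"] == person["id"]]
    # 3. Render basic details, children and events through the template.
    return template.substitute(
        name=escape(person["name"]),
        born=escape(person.get("born") or "unknown"),
        children="".join(
            f'<li><a href="person{c["id"]}.html">{escape(c["name"])}</a></li>'
            for c in children
        ),
        events="".join(
            f"<li>{escape(e['date'])}: {escape(e['description'])}</li>"
            for e in own_events
        ),
    )

# Hypothetical sample data.
people = [
    {"id": 1, "name": "John", "born": "1801"},
    {"id": 2, "name": "Mary"},
    {"id": 3, "name": "William", "father": 1, "mother": 2},
]
events = [{"person": 1, "date": "1845-06-12", "description": "Married Mary"}]
page = build_person_page(people[0], people, events)
print(page)
```

Passing the template in as a parameter is what makes the output "configurable": a different layout needs a different template, not different code.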

How well did it work?

So how well did this work? Some things worked well: Some things didn't work so well:

Software update: 2005

About four years on from the original database-driven site, I made some improvements, such as the ability to mark individuals and items of data as 'Private'. When the website was generated, I could then choose whether to include or exclude the private data. Hence, I could now have two versions: the online one, which excludes details of recent generations, and my own 'research' version, which is not accessible on the internet but has everything visible. However, the problems of 'maintainability' persisted.
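One way such a 'Private' flag could drive two builds from the same data is a filtering step before page generation. This is a minimal Python sketch, not the actual implementation: the `private` key and record layout are hypothetical, and the rule that an event is hidden when its person is private is an assumption.

```python
def select_for_build(people: list[dict], events: list[dict], include_private: bool):
    """Return the (people, events) to publish in one build of the site."""
    if include_private:
        return people, events  # 'research' version: everything visible
    visible_people = [p for p in people if not p.get("private", False)]
    visible_ids = {p["id"] for p in visible_people}
    # Assumption: hide an event if it is itself private OR belongs to a private person.
    visible_events = [e for e in events
                      if not e.get("private", False) and e["person"] in visible_ids]
    return visible_people, visible_events

# Hypothetical sample data.
people = [
    {"id": 1, "name": "John"},
    {"id": 5, "name": "Living relative", "private": True},
]
events = [
    {"person": 1, "description": "Baptised"},
    {"person": 1, "description": "Private note", "private": True},
    {"person": 5, "description": "Born"},
]
online_people, online_events = select_for_build(people, events, include_private=False)
research_people, research_events = select_for_build(people, events, include_private=True)
```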

Move to a web applications platform: 2014

Towards the end of 2013, and still with a backlog of changes to be made, I decided it was time for a "technology refresh". The Internet had moved on a lot in the 14 years since I began the project, and there were now several very good frameworks available for building web applications. I looked at two of the open source options: 'Ruby on Rails' and 'Django'. Both were well suited to the task in hand, but I ended up choosing Django, mainly because it includes a built-in admin application that automatically generates administration screens for each database table (model) registered with it. One long-running problem solved before I even began!

Django is a very modular system which allows the whole task of representing all my data to be broken down into manageable 'apps', each of which could be built and tested separately. Also, a great deal of the complex software for extracting and manipulating data from the database and rendering it in a template is provided as part of the framework. Most of what I have had to do is a matter of configuring the system for my needs rather than coding applications.

Much of the database structure has been left unaltered, though a few changes have been made:

Django works by processing each request for a web page and constructing the page on the fly, according to the rules that I've programmed in and the content of the database. So, at home, my PC acts as a private web server running Django. Every time I look at a Gumbleton page in my web browser, Django recreates it from the database. In an ideal world I would run an identical copy as the public version of the website. However, funds are limited and the hosting package that I pay for does not support the Django platform. Therefore, I periodically use a kind of "robotic web client" to retrieve the pages from my Django-based website and store them as "static" web pages, which I then upload to the main Gumbleton site. This may sound complex but is actually quite simple: I use a program called Wget, which can be used to clone websites. It takes less than half a minute to navigate the entire Django-based site and save it as a set of web pages, ready for upload.
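What a mirroring client does can be sketched as a breadth-first crawl that stays on one host. With GNU Wget this is typically done with options such as `--mirror --convert-links --adjust-extension --page-requisites --no-parent`, though the exact command used for this site isn't recorded here. The Python sketch below is only an illustration, not a replacement for Wget: it assumes the local Django server is at `http://127.0.0.1:8000/` (Django's default development address), takes a `fetch` function as a parameter so it can be exercised without a network, and omits saving files and rewriting links.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)

def mirror(start_url: str, fetch) -> dict[str, str]:
    """Breadth-first crawl of a single host; returns {url: html}.

    `fetch` is any callable that takes a URL and returns the page's HTML.
    """
    host = urlparse(start_url).netloc
    saved: dict[str, str] = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in saved:
            continue  # already fetched via another link
        saved[url] = fetch(url)
        collector = LinkCollector()
        collector.feed(saved[url])
        for href in collector.links:
            # Resolve relative links and drop '#fragment' parts so each page is fetched once.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == host and target not in saved:
                queue.append(target)  # stay on the local site; ignore external links
    return saved

# Hypothetical three-page site, with one external link that must not be followed.
fake_site = {
    "http://127.0.0.1:8000/": '<a href="/people/1/">John</a> <a href="/records/">Records</a>',
    "http://127.0.0.1:8000/people/1/": '<a href="/">Home</a> <a href="https://example.org/">Elsewhere</a>',
    "http://127.0.0.1:8000/records/": '<a href="/people/1/#events">John</a>',
}
pages = mirror("http://127.0.0.1:8000/", fake_site.__getitem__)
print(sorted(pages))
```

Against a real local server, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8")`.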

So far, I have concentrated on moving the existing site onto the newer technology platform, but the visual appearance has changed very little. My next job is checking and updating a lot of the actual data... I think I mentioned that backlog of updates! When all that is done, I'll think about the overall appearance and style of the site.

Any feedback is much appreciated, and thanks to everyone who has commented or contributed over the years!

Steve West

1. If you really want to know the gory details, the 'list of individuals' is, in database speak, a "many-to-many" field. For such fields, Django automatically creates an intermediate table that connects the information records to the people they relate to. This intermediate table performs, in many ways, the same function as my events table. The main difference is that Django creates and maintains it silently, without my really being aware of its existence.
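For the curious, the intermediate table behind a many-to-many field has roughly the following shape: one row per (record, person) pair, holding two foreign keys. Django generates and names it automatically (based on the app, model and field names); the Python/`sqlite3` sketch below builds an equivalent table by hand, and the table names, column names and sample data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE record (id INTEGER PRIMARY KEY, description TEXT);
    -- Junction table: one row per (record, person) pair, much like the old events table.
    CREATE TABLE record_individuals (
        id        INTEGER PRIMARY KEY,
        record_id INTEGER NOT NULL REFERENCES record(id),
        person_id INTEGER NOT NULL REFERENCES person(id),
        UNIQUE (record_id, person_id)
    );
    INSERT INTO person VALUES (1, 'John'), (2, 'Mary'), (3, 'Thomas');
    INSERT INTO record VALUES (10, 'Marriage certificate');
    INSERT INTO record_individuals (record_id, person_id) VALUES (10, 1), (10, 2), (10, 3);
""")

def people_on_record(record_id: int) -> list[str]:
    """Follow the junction table from a record to the people it mentions."""
    return [name for (name,) in conn.execute(
        "SELECT p.name FROM person p JOIN record_individuals ri ON ri.person_id = p.id "
        "WHERE ri.record_id = ? ORDER BY p.id", (record_id,))]

def records_for_person(person_id: int) -> list[str]:
    """Follow the same table in the other direction, from a person to their records."""
    return [desc for (desc,) in conn.execute(
        "SELECT r.description FROM record r JOIN record_individuals ri ON ri.record_id = r.id "
        "WHERE ri.person_id = ? ORDER BY r.id", (person_id,))]

print(people_on_record(10))  # ['John', 'Mary', 'Thomas']
```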


Disclaimer: the owner of this website assumes no responsibility or liability for any injury, loss or damage incurred as a result of any use or reliance upon the information and material contained within or downloaded from this website. I have taken considerable care in preparing information and materials which are displayed in this website. However, I do not provide any warranty concerning the accuracy or completeness of any information contained herein.