About the project
Background
When I joined the Guild of One Name Studies (GOONS), I took seriously its aim "to promote the preservation and publication of one-name genealogical data, and to maximise its accessibility to interested members of the public". Hence I became very interested in the use of the internet as the ideal medium for providing widespread access to data.
I had made patchy use of computers for storing genealogical data since the early 1980s, but now wanted to make the bulk of my researches available on the Web. Initially, I put a lot of information into word-processed documents and used built-in filters (eg menu items such as "Save as HTML") to convert the information to web pages. However, I was very dissatisfied with the results and ended up preparing most of the web pages by hand. The trouble is, this creates a maintenance problem: when you want to add a new item, you have to add it to your 'master' information, in a word-processor document, and also into the relevant web page. This provides plenty of opportunity for inconsistencies to arise between the two versions.
I tried several commercial genealogical packages, to see if they would help, but none of them seemed to be the right tool for the job. The problem seemed to be that they start at 'the wrong end'! For a conventional genealogist, the central body of data is the people who are linked together through relationships. Sources are referenced to support assertions about these relationships. One-name genealogy (for me, at least) seems to start at the other end of the problem: I have a central body of source information which refers to the name but can only gradually be associated with particular people.
It has been this need to cope, not just with source data, but also with information about individuals, that has been the undoing of my current website. I started to create web pages that included biographical information about individuals and, naturally, it made sense to link this information to a transcript of its source. So, if I am stating that an individual was married on a particular date, it would be sensible to provide a link to the place on the website where I have transcribed that person's marriage certificate. What's more, hypertext lends itself very well to this kind of linking... up to a point. And that point is one of complexity. As you add more records and more individuals, the number of links between them becomes very large and starts to reveal a weakness of HTML encoding: namely that all the information about the relationships between items of information is distributed between pages and cannot be easily managed. So, for example, if I edit a particular page, there is no easy way to discover which pages link to it and whether these links will need to be updated to reflect the changes to the page.
The database project: 2000-2001
By this stage, I knew that I needed a more sophisticated technology and decided to experiment with a Relational Database Management System (RDBMS): a software data repository which stores information in tables and allows it to be accessed and indexed in complex ways. Plenty of people use databases to store indexes, but my plan was to store the entire contents of the website and, at the touch of a mouse, to be able to regenerate the website automatically, every time I updated any records. So, in about the year 2000, I chose an RDBMS (MySQL) and a programming language (Perl) and set about representing the entire contents of the website as tables of data.
The design of the database divided, conceptually, into two main areas which stored information about records and people, respectively. These two categories were reflected in the design of the physical website, which had two major items of contents: the set of Gumbleton records and the details of people. Each type of record (baptisms, for example) was represented within the database as a table of data and a web page template. The data extraction software assembled the webpages from these records and templates.
Details of people were stored differently, with one central table that stored the main information, such as name, birth and death dates and occupations, for each individual. Again there were HTML templates which determined how this information would be rendered as web pages. Each person's data also included (if known) the identity of their parents, and the computer program that generated the 'people pages' was able to find the parents and create hyperlinks to them. Because of the existence of these parent links, the program was equally able to locate each individual's children and include references to them from the parent's web page. Hence, each person's information did not need to include references to multiple children, so long as each child's information referred to its two parents.
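The parent-link idea can be illustrated with a minimal sketch. This is not the original Perl and MySQL code: it uses Python with an in-memory SQLite database, and the table name, column names and sample rows are all hypothetical, chosen only to show how children can be found without ever being stored on the parent.

```python
import sqlite3

# Hypothetical schema: the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        father_id INTEGER REFERENCES person(id),
        mother_id INTEGER REFERENCES person(id)
    )
""")
# Invented placeholder rows; each child names its parents,
# but no parent row lists its children.
conn.executemany(
    "INSERT INTO person (id, name, father_id, mother_id) VALUES (?, ?, ?, ?)",
    [
        (1, "Father (example)", None, None),
        (2, "Mother (example)", None, None),
        (3, "First child (example)", 1, 2),
        (4, "Second child (example)", 1, 2),
    ],
)


def children_of(conn, person_id):
    """Return the names of everyone who names person_id as a parent."""
    rows = conn.execute(
        "SELECT name FROM person "
        "WHERE father_id = ? OR mother_id = ? ORDER BY id",
        (person_id, person_id),
    )
    return [name for (name,) in rows]


print(children_of(conn, 1))
```

The list of children is derived by a query each time it is needed, so there is only one place (the child's own row) where the relationship has to be kept up to date.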
As well as the two main sets of tables, relating to records and people, there was a third table that connected the two: the 'Events' table. A particular record may be associated with a number of events: for example, a marriage record may represent an event for the bride, the groom and the witnesses. Entries in the events table provided the link between the record and the person to whom the event relates. When the web page for an individual was generated, the program searched for all the events that referred to them and summarised these on the page.
To summarise, then, whenever a page was generated for an individual, the program stuck together the basic information about that person, plus references to any children who identified this person as a parent, plus any events that identified this person. All this data was then rendered by means of a configurable template.
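As a rough sketch of that page-assembly step, the following Python example combines the three ingredients (basic details, children and events) and renders them through a template. It is an illustration under assumptions, not the site's real code: the schema, the `role` column, the template text and the sample rows are all invented, and SQLite stands in for MySQL.

```python
import sqlite3
from html import escape
from string import Template

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; names and columns are illustrative only.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                         father_id INTEGER, mother_id INTEGER);
    CREATE TABLE record (id INTEGER PRIMARY KEY, title TEXT);
    -- One row per (record, person) pair, e.g. bride, groom or witness.
    CREATE TABLE event (record_id INTEGER, person_id INTEGER, role TEXT);

    INSERT INTO person VALUES (1, 'Groom (example)', NULL, NULL);
    INSERT INTO person VALUES (2, 'Child (example)', 1, NULL);
    INSERT INTO record VALUES (10, 'Marriage record (example)');
    INSERT INTO event VALUES (10, 1, 'groom');
""")

# The template is kept apart from the data, so it can be swapped
# for another one without touching the database.
PAGE = Template(
    "<h1>$name</h1>\n<p>Children: $children</p>\n<p>Events: $events</p>"
)


def person_page(conn, person_id):
    """Combine basic details, children and events into one rendered page."""
    (name,) = conn.execute(
        "SELECT name FROM person WHERE id = ?", (person_id,)
    ).fetchone()
    children = [escape(n) for (n,) in conn.execute(
        "SELECT name FROM person "
        "WHERE father_id = ? OR mother_id = ? ORDER BY id",
        (person_id, person_id),
    )]
    events = [escape(f"{role} in {title}") for (role, title) in conn.execute(
        "SELECT e.role, r.title FROM event AS e "
        "JOIN record AS r ON r.id = e.record_id "
        "WHERE e.person_id = ? ORDER BY r.id",
        (person_id,),
    )]
    return PAGE.substitute(
        name=escape(name),
        children=", ".join(children) or "none",
        events="; ".join(events) or "none",
    )


print(person_page(conn, 1))
```

Swapping `PAGE` for a different template would change the output format without any change to the queries.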
How well did it work?
So how well did this work? Some things worked well:
- The ease with which people and records can be linked together was highly successful and has proved to be useful as an analysis tool, as well as a presentation tool. Essentially, it allowed me to 'tick off' which records relate to which people and see what is left over at the end!
- The technology coped easily with the scale of this project and could cope with a massively larger job. The MySQL product also copes happily with all kinds of content types, including variable-length text, pictures, template files, etc. Basically, anything related to the website can be stuffed into the database. In the early stages of the project, I was too ready to do this and, as things evolved, I began to use pointers in the database to access additional files such as some of the large text files and pictures.
- The separation of data and templates meant that I could easily generate HTML web pages, using one set of templates and word-processor-friendly documents using a different set, at the press of a button. Also, I initially kept a set of templates which would allow the website to be generated in exactly the same format as the pre-database days, so I could always revert to that approach if I wanted to...but I have never used them!
- An unplanned benefit was that it provides some protection against the rip-and-run merchants who copy people's entire websites. I have never seen this as a huge threat (I put the information there for people to use it) but, without the behind-the-scenes database, it would be very difficult for anyone to rip off the website in any usable form. However, I would emphasise that this is not an issue for me and, if anyone wants a copy of the entire database, they are welcome to it!
Other things worked less well:
- The database was quite complex, and managing it was still an issue. Adding a new record entailed linking it to an appropriate parent record, creating the events that relate to it and identifying which individuals those events relate to. That was a manual process: I always intended to write programs to implement administration screens with simple input forms to simplify this process. However, I have never built them and have continued to 'knife and fork' the data into the database. This has left me with a backlog of updates that need to be done, but haven't been.
- I haven't been able to maintain a completely clean separation between representation and content (ie between templates and data). This is largely because it is highly convenient to embed, within the data, markers that the programs can translate into links.
- I have not really worked out how much of the data's structure should be built into the database. For example, when dealing with items such as GRO index entries, it makes sense to use a data table that is specialised for this purpose, with a 'year' field, 'quarter', 'name' and so on. Other records are very much one-offs — the royal grant of a chest of arrows to William Gomeldon, for example, hardly cries out for a specialised data table. So the dilemma is how much to specialise the data (and be able to manipulate particular fields, such as 'years') and how much to generalise (so that diverse records can at least be treated consistently). I have ended up with a compromise, where some types of record are treated as special cases (GRO index entries, census entries, etc) whilst others are treated as 'Title + Text', where the text is completely opaque and such things as dates, names and places cannot be inferred from its content.
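The trade-off between specialised and 'Title + Text' records can be sketched in a few lines of Python. The class names, fields and sample values below are hypothetical and are not taken from the actual database; the point is only that a structured field such as a year can be queried, whereas a date buried in opaque text cannot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenericRecord:
    """'Title + Text' record: the text is opaque to the software."""
    title: str
    text: str

    def sort_year(self) -> Optional[int]:
        return None  # no structured date is available


@dataclass
class GroIndexEntry(GenericRecord):
    """Specialised record whose fields are stored separately."""
    year: int
    quarter: int
    name: str

    def sort_year(self) -> Optional[int]:
        return self.year


def records_in_year(records, year):
    """Select records by year; only specialised records can ever match."""
    return [r for r in records if r.sort_year() == year]


# Invented sample data, for illustration only.
records = [
    GroIndexEntry("GRO index entry (example)", "",
                  year=1900, quarter=2, name="Example Name"),
    GenericRecord("Royal grant (example)",
                  "Free text that may mention the year 1900."),
]

print([r.title for r in records_in_year(records, 1900)])
```

Both kinds of record share `title` and `text`, so they can be listed and rendered consistently, but the generic record is invisible to the year query even though its text mentions the same year.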
Software update: 2005
About 4 years on from the original database-driven site, I made some improvements, such as the ability to mark individuals and items of data as 'Private'. When the website was generated, I could then select whether to include or exclude the private data. Hence, I could now have two versions: the online one, which excludes details of recent generations; and my own 'research' version which is not accessible on the internet but has everything visible. However, the problems of 'maintainability' have persisted.
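A minimal sketch of the 'Private' switch, assuming a simple boolean flag on each item, might look like this in Python. The `Person` class, the `private` field and the `include_private` parameter are invented names for illustration, not the ones used on the site.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    private: bool = False  # hypothetical flag marking recent generations


def people_to_publish(people, include_private=False):
    """Return the people to render for one generated version of the site."""
    return [p for p in people if include_private or not p.private]


# Invented sample data.
people = [
    Person("Ancestor (example)"),
    Person("Living relative (example)", private=True),
]

online = people_to_publish(people)                           # public website
research = people_to_publish(people, include_private=True)  # offline copy

print([p.name for p in online])
print([p.name for p in research])
```

Because the choice is made at generation time, the same data can feed both versions without keeping two copies of anything.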
Move to a web applications platform: 2014
Towards the end of 2013, and still with a backlog of changes that needed to be made, I decided it was time for a "technology refresh". The Internet had moved on a lot in the 14 years since I began the project, and there were now several very good frameworks available for building web applications. I looked at two of the open source options: 'Ruby on Rails' and 'Django'. Both were readily applicable to the task in hand but I ended up choosing Django, mainly because it includes a built-in admin application which automatically generates administration screens for every database table that is created. One long-running problem solved before I even began!
Django is a very modular system which allows the whole task of representing all my data to be broken down into manageable 'apps', each of which could be built and tested separately. Also, a great deal of the complex software for extracting and manipulating data from the database and rendering it in a template is provided as part of the framework. Most of what I have had to do is a matter of configuring the system for my needs rather than coding applications.
Much of the database structure has been left unaltered, though a few changes have been made:
- The 'Events' table has always been a thorn in my flesh. It seemed like a really useful idea to have a database table that provided the link between records and people. However, it became a maintenance nightmare because any change to records or individuals required checking and updating the events that link them together. In the new version of the data model, I have dispensed with the Events table and, instead, have tagged on some event-related fields to every record type, together with a list of the people who are referred to in the record (see note 1).
- I've made several simplifications to the structure so that I need fewer tables in the database.
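For readers curious about what a "many to many" link involves (see note 1), here is a hedged sketch written directly in SQL via Python's `sqlite3` module. It is not Django's generated schema: the table `record_people`, its columns and the sample rows are assumptions made for illustration. In Django itself the equivalent would normally be declared with a `ManyToManyField` on the record model, and the framework would manage the intermediate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema; names are illustrative, not Django's own.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    -- Event-related fields now live on the record itself.
    CREATE TABLE record (id INTEGER PRIMARY KEY, title TEXT, event_date TEXT);
    -- Intermediate table: one row per (record, person) pair.
    CREATE TABLE record_people (
        record_id INTEGER NOT NULL REFERENCES record(id),
        person_id INTEGER NOT NULL REFERENCES person(id),
        PRIMARY KEY (record_id, person_id)
    );

    INSERT INTO person VALUES (1, 'Bride (example)');
    INSERT INTO person VALUES (2, 'Groom (example)');
    INSERT INTO record VALUES (10, 'Marriage record (example)', '1900-01-01');
    INSERT INTO record_people VALUES (10, 1);
    INSERT INTO record_people VALUES (10, 2);
""")


def people_in(conn, record_id):
    """Names of the people referred to in one record."""
    rows = conn.execute(
        "SELECT p.name FROM person AS p "
        "JOIN record_people AS rp ON rp.person_id = p.id "
        "WHERE rp.record_id = ? ORDER BY p.id",
        (record_id,),
    )
    return [name for (name,) in rows]


def records_for(conn, person_id):
    """Titles of the records that refer to one person."""
    rows = conn.execute(
        "SELECT r.title FROM record AS r "
        "JOIN record_people AS rp ON rp.record_id = r.id "
        "WHERE rp.person_id = ? ORDER BY r.id",
        (person_id,),
    )
    return [title for (title,) in rows]


print(people_in(conn, 10))
print(records_for(conn, 2))
```

The intermediate table holds nothing but the pairing, which is why a framework can create and maintain it automatically.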
So far, I have concentrated on moving the existing site onto the newer technology platform, but the visual appearance is very little changed. My next job is checking and updating a lot of the actual data... I think I mentioned that backlog of updates! When all that is done, I'll think about the overall appearance and style of the site.
Any feedback is much appreciated, and thanks to everyone who has commented or contributed over the years!
1. If you really want to know the gory details of this, the 'list of individuals' is, in database speak, a "many to many" field. For such fields, Django automatically creates an intermediate table that connects together the information records and the people they relate to. This intermediate table performs, in many ways, the same function as my events table. The main difference is that Django creates and maintains it silently, without my really being aware of its existence.
Disclaimer: the owner of this website assumes no responsibility or liability for any injury, loss or damage incurred as a result of any use or reliance upon the information and material contained within or downloaded from this website. I have taken considerable care in preparing information and materials which are displayed in this website. However, I do not provide any warranty concerning the accuracy or completeness of any information contained herein.