Ode to a dead project

leif | main | Wednesday, May 7th, 2008

I just visited Sourceforge with the intention of declaring my “Urdabrunn” project as dead. But instead, I decided to announce that the project has moved to my own domain, Solumslekt. The Exodus project is the natural heir and successor of Urdabrunn, and it seems natural to host it here.

The problem with the Urdabrunn project was that it was far too ambitious, like so many other dead projects on Sourceforge. It’s of course much easier to concoct grand plans than to do the actual work.

I did write some routines in Python for the Urdabrunn project, but the major work was the construction and migration of the database itself and the parts of the business logic that I wrote in plpgsql. That codebase has survived to the present day. When I decided to “go incremental” with my humble PHP reports, the interface took off in a totally different direction from what I had imagined at the time I wrote the “Into the Blue” article and submitted the Urdabrunn project to Sourceforge.

For my own part, I feel that Exodus gives me what I need from a genealogy software package. For my personal use, it’s stable to the extent that several months may pass between each time I do even small incremental changes to the codebase.

During the last week, however, I have been in a flurry of refactoring and adding translation strings. That’s because I finally have been engaged in a serious discussion about the project. There is a big difference between writing software for your own use, and having another person trying it out and asking intelligent questions.

Restructured ’sources’ table

leif | main | Thursday, May 1st, 2008

Thanks to my friend Bill, who has had the guts and perseverance to actually take my heap of code for a spin, I have finally gotten around to clean up the act a bit. I’ve i18n’d the source_manager.php and the source_edit.php. In addition, I have done a major restructuring of the ’sources’ table that I’ve been pondering for a long time, and I figured now was the time to do it. I have dropped the column small_text, and renamed the large_text column to source_text. The idea with the small_text field was originally to provide a kind of ’synopsis’. But soon I started to use it for ‘hidden text’ instead. I have now changed the method for inputting hidden text in sources as follows: If you enclose any text in {curly braces}, it will not show up anywhere but in the Source Manager and the Source Edit screens. I think that is a more intuitive method. In addition, it simplifies the model.

I also found a bug in the previous version of source_edit.php; it didn’t show the contents of the small_text field, and it didn’t update it either.

In addition I have run regression tests and fixed a number of roadblocks. In its present version it should work almost out of the box, provided you do as outlined in the README.txt.

Sort_order on participants

leif | main | Friday, December 21st, 2007

I’m slowly but steadily moving away from the TMG ‘principals’ concept towards an open-ended ‘participants’ model. Today, I’ve made a couple of important steps on that road.

When you add participants to a multi-person event such as a census record, it’s desirable to have them listed in the order they appear in the census. To that end, I have added a sort-order column to the participants table:

ALTER TABLE participants
    ADD COLUMN sort_order INTEGER NOT NULL DEFAULT 1;

In the Source Manager, the code displaying events associated with a source has been changed from displaying ‘principals’ to a list of participants:

$event = $row['event_id'];
$p_handle = pg_query("select person_fk from participants
                   where event_fk = $event order by sort_order");
while ($parts = pg_fetch_row($p_handle))
    $participant_list[] = linked_name($parts[0], './family.php');
...
echo '<td>' . join($participant_list, ', ') . '</td>';

In the case of marriages, I like to list the male first. It may be old-fashioned and it might as well be the other way around; the point is that I like consistency. To that end, I decided to set the sort order to 2 for all female participants in marriages.

pgslekt=> update participants set sort_order=2 where event_fk in
pgslekt-> (select event_id from events where tag_fk = 4) and get_gender(person_fk) = 2;
UPDATE 4270

I don’t think that the sort_order column has any effect on other parts of the application — yet. But I’m planning an interface to events that will add any number of participants as well as their internal sort order. Currently, I’m doing too much from the psql command line. That’s a cumbersome and potentially dangerous habit.

The beauty of views

leif | main | Thursday, December 20th, 2007

Recently, I was asked if I had any ‘reports’ for my database. As this is a SQL database, you can of course add any number of reports that you want, and the concept of views comes in very handy in this respect. As an example, I often have to search in the database for married couples when I enter a baptism for a child, and I’m not entirely sure of who the parents are. Today, I set up a view to simplify this search:

pgslekt=> CREATE OR REPLACE VIEW couples AS
pgslekt-> SELECT
pgslekt-> sort_date,
pgslekt-> person AS p1,
pgslekt-> get_person_name(person) AS p1n,
pgslekt-> spouse AS p2,
pgslekt-> spouse_name AS p2n
pgslekt-> FROM
pgslekt-> marriages;
CREATE VIEW

And then it’s as simple as:

pgslekt=> select * from couples where p1n like 'Abr%' and p2n like 'Mart%';
 sort_date  |  p1   |               p1n                |  p2   |                  p2n
------------+-------+----------------------------------+-------+----------------------------------------
 1731-09-23 | 11182 | Abraham Zachariassen Venstøp     | 11183 | Marthe Danielsdatter Ballestad
 1758-08-19 |    57 | Abraham Jonsen Bjørntvet         |   493 | Marthe Andersdatter Veholt / Bjørntvet
 1815-01-01 |  1952 | Abraham Jensen Århus, Gjerpen    |  1953 | Marthe Gundersdatter Grøtsund
 1820-10-27 |   290 | Abraham Jonsen Moland / Grønnmyr |   495 | Marthe Johannesdatter Hustvetstrand
(4 rows)

Views often build upon other views. The ‘couples’ view is built upon the view ‘marriages’:

CREATE OR REPLACE VIEW marriages AS
SELECT
    person,
    event,
    event_date,
    sort_date,
    get_place_name(place) AS place_name,
    get_principal(event,person) AS spouse,
    get_person_name(get_principal(event,person)) AS spouse_name
FROM principals
WHERE tag_type=4
ORDER BY sort_date;

And the ‘marriages’ is again dependent on yet another view, ‘principals’:

CREATE OR REPLACE VIEW principals AS
-- find principals
-- in case of mixed-gender events (eg marriages), male is listed first
SELECT
    participants.person_fk AS person,
    events.event_id AS event,
    events.place_fk AS place,
    events.event_date AS event_date,
    events.sort_date AS sort_date,
    events.tag_fk AS tag_type
FROM
    events, participants
WHERE
    events.event_id = participants.event_fk
AND
    participants.is_principal IS TRUE
ORDER BY get_gender(participants.person_fk);

The ‘events’ and ‘participants’ referenced from ‘principals’ are real, physical tables.

In this way, you can build astonishingly complex reports by just picking a few attributes from one or more prebuilt views and functions.

Sort order on sources

leif | main | Friday, November 2nd, 2007

A few months ago, I made a change to the data definition that I somehow never got around to document. It’s an additional column called sort_order in the sources table:

    ALTER TABLE sources
        ADD COLUMN sort_order INTEGER NOT NULL DEFAULT 1;

This column is referenced from several places in the PHP code, but was missing from the datadef.sql until now. So if anybody have had trouble with getting the program up and running because of this, I’m sorry.

The “Exodus” article

leif | main | Thursday, October 4th, 2007

As a recognition that the Exodus article still is the main documentation of my project, I’ve given it a major overhaul. I’ve incorporated some of the entries from this blog, rephrased some sentences, and moved bits and pieces around. I hope that it has become clearer in the process.

Events with one, two or multiple participants

leif | main | Wednesday, September 26th, 2007

I’ve never really come to terms with the TMG notion of “principals” and “witnesses”. The idea is that eg. for a christening you can enter the child as “principal”, and the godparents as “witnesses”. Through the years, I’ve never done much more than playing around with this concept, as it’s both relatively time-consuming to enter witnesses, (IMO) of little use, and potentially cluttering up the person view with irrelevant details.

However, my data model still carries along this notion as a legacy from TMG. The participants table still has the column “is_principal”, but I’ve moved further and further away from actually using the value of this column for much. Instead, I’ve come to realize that there are really three kinds of events:

  1. The “one-person” kind, like birth and death.
  2. The “two-persons” kind, like marriage and divorce. This is also the category for my own “identical”, “probably identical”, and “confused” events.
  3. The “multiple-persons” kind, where you can have any number of participants. In other programs, this concept may be known as “shared notes” or “history tags”.

As a consequence of this realization, I’ve added another table:

CREATE TABLE tag_types (
    tag_type_id INTEGER PRIMARY KEY, description VARCHAR(20)
);

insert into tag_types values (1, 'single');
insert into tag_types values (2, 'double');
insert into tag_types values (3, 'multiple');

And a foreign key in the tags table:

ALTER TABLE tags
    ADD COLUMN tag_type_fk INTEGER
        REFERENCES tag_types (tag_type_id);

With this approach, you can forget all about “principals” and “witnesses”. A birth will have exactly one participant. A marriage will have two participants — that is, provided that it’s correctly entered in the first place. A “note” may have any number of participants, and it will show up on the person view of any of them.

One benefit of this is that it simplifies the layout of the person view. But even more, it enables me to differentiate the handling of events according to their nature. I’ve just started to explore the interface issues, but I expect that some aspects of the program will change a lot in the next few months.

AJAX and PHP

leif | main | Sunday, September 9th, 2007

Yesterday I received a book called “AJAX and PHP” that I ordered a couple of weeks ago. It’s a nice little book; in a couple of hours it enabled me to implement a little detail that I’ve struggled with for a long time: How to update the text to the right of the source number field whenever I enter a source number. I even think I understand how it works :) I’ll recommend this book for everyone who has a working knowledge of PHP and database interaction, and needs to come to terms with AJAX.

The update involves four files: form_header.php, form.js, forms.php, and srctxt.php. The form.js is a slightly modified version of the quickstart.js from the first example of the book. Because the source texts frequently contain XHTML elements, I had to change the response from a DOM ‘firstChild’ element to pure text. Besides, I had already figured out that I would use an “onchange” event to trigger the request. Here’s the code from forms.php that sets up the source input field and displays the source text:

echo "<tr><td>$_Source</td><td><input type=\"text\" size=\"10\" ";
echo "name=\"source_id\" value=\"$source_id\" onchange=\" showText(this.value)\">";
echo "<span id=\"nodeText\">$source_text</span></td></tr>\n";

The two JavaScript snippets that do the work are amazingly simple:

function showText(str) {
    if (xmlHttp.readyState == 4 || xmlHttp.readyState == 0) {
        xmlHttp.open("GET", "srctxt.php?srcid=" + str, true);
        xmlHttp.onreadystatechange = handleServerResponse;
        xmlHttp.send(null);
    }
    else
        setTimeout('process()', 1000);
}

function handleServerResponse() {
    if (xmlHttp.readyState == 4) {
        if (xmlHttp.status == 200) {
            response = xmlHttp.responseText;
            document.getElementById("nodeText").innerHTML = response;
            setTimeout('process()', 1000);
        }
        else {
          alert("There was a problem accessing the server: " + xmlHttp.statusText);
        }
    }
}

And the php script that generates the response is almost boringly trivial:

require "settings/settings.php";
require "functions.php";

$srcid = $_GET['srcid'];
if ($srcid)
    echo ' ' . get_source_text($srcid);

That’s all, folks.

Avoiding mixups

leif | main | Friday, August 31st, 2007

A very common mistake in genealogy is to attribute a source to the wrong person. An even more embarrassing mistake is to attribute the same source to two different persons, eg. using the same burial entry twice for different persons with a common name. In order to avoid such embarrassments, I decided to add a unique constraint to my sources table:

pgslekt=> ALTER TABLE sources ADD CONSTRAINT unique_source UNIQUE (parent_id,small_text,large_text);
NOTICE: ALTER TABLE / ADD UNIQUE will create implicit index "unique_source" for table "sources"
ERROR: could not create unique index
DETAIL: Table contains duplicated values.

Uh-oh. That means that I have to do a cleanup, and resolve those conflicting data. But that is of course why I want to enforce this constraint in the first place. But how do I find the duplicates? After a lot of trial and error, I finally succeeded with my first self-join:

CREATE OR REPLACE VIEW duplicate_sources AS
SELECT a.source_id, a.parent_id, a.small_text, get_source_text(a.source_id)
FROM sources AS a, sources AS b
WHERE a.parent_id=b.parent_id
AND a.large_text=b.large_text
AND a.small_text=b.small_text
AND a.source_id <> b.source_id;

I found and resolved several duplicate entries. A couple of them were unused, or rather orphaned by updates, and should have been removed anyway.

pgslekt=> ALTER TABLE sources ADD CONSTRAINT unique_source UNIQUE (parent_id,small_text,large_text);
NOTICE: ALTER TABLE / ADD UNIQUE will create implicit index "unique_source" for table "sources"
ALTER TABLE

Success. Now, I try to enter the same burial for two different persons. The screen fills up with error messages:

Warning: pg_query() [function.pg-query]: Query failed: ERROR: duplicate key violates unique constraint “unique_source” in /home/leif/public_html/exodus/functions.php on line 416

Warning: pg_query() [function.pg-query]: Query failed: ERROR: current transaction is aborted, commands ignored until end of transaction block in /home/leif/public_html/exodus/functions.php on line 419

Warning: pg_query() [function.pg-query]: Query failed: ERROR: current transaction is aborted, commands ignored until end of transaction block in /home/leif/public_html/exodus/functions.php on line 32

etc.

The regular “end-user” might find this kind of output rather unsettling. However, as I’m both the developer and the (AFAIK) sole user of this program, I find it perfectly all right. It just goes to show the robustness of the database, as the entire transaction is aborted. There’s no risk of inconsistent data being entered into the database.

This kind of constraint is of course only possible if you a) have a data model that allows you to enforce it, and b) enter your source transcripts in a consistent manner. As I usually cut and paste source transcripts from a separate text file into the database, it works reasonably well. But with just one space extra, or a period instead of a comma, a duplicate entry may of course slip through to the database. At a later stage, I may add some “fuzziness” to the constraint.

Google search for «Genealogy Data Model»

leif | main | Friday, July 20th, 2007

For a long time, the most frequent search string resulting in a visit at my web site has been “Genealogy Data Model”. On a monthly basis, I have around 10 hits with that particular string. Along with similar strings, it accounts for around 15 hits a month. Here’s the relevant entries from tonight’s Webalizer “Top 20″ search strings so far this month, containing references to data models and database design:

1 9 genealogy data model
2 3 entity relation diagram
3 3 genealogical database model
5 2 cd collection erd
6 2 celko google groups nested set tree moves
7 2 database model design genealogy
8 2 e-r diagram for house works
9 2 entity relation diagram public relations online
10 2 genealogical data model
11 2 genealogy : database : design
12 2 gentech data tmg
15 2 lage slektstre (Norwegian: “make family tree”)
17 2 open source genealogy database
18 2 slektstre (Norwegian: “family tree”)
20 2 textual data presentation

That’s quite impressive. 15 out of 20 search strings relates to database design, which actually is a very peripheral part of this site’s contents.

If I search for “genealogy data model” on Google, the top result is my page Forays Into Genealogy Data Base Modelling Part I. It doesn’t say much in the way of genealogy data models, it’s just some general thoughts that I harbored at the time it was written, about two years ago, along with my Perl script for dumping the TMG database. The second Google hit is also my second “Forays” article, called “Into the Blue”. In that article, I actually have an Entity Relationship Diagram or ERD of the data model, as well as quite detailed descriptions of the entities.

This page, ranking as number 7 on the same search result page, is probably responsible for several of the hits. It states: “I used to have a link here to Gendex which had a standard reference for database design for Genealogy. It’s gone now […] and I can’t find a suitable replacement. If you want to find one, Google for ‘genealogy data model’.”

My data model has little in common with traditional GDMs, and due to the continuing hegemony of “GEDCOM-compatible” data formats, it probably hasn’t much to offer to the average family tree researcher. However, I do believe that people like me, who focus on one-place studies, where you typically make exhaustive use of relatively few sources, might find the hierarchical organization of sources and citations useful.

Despite the apparent interest in genealogy data models, the responses I’ve been receiving on this blog as well as per e-mail are very few, and nobody seems willing to discuss the details. Nobody has put up a link to my pages. Each time I’m trying to initiate an online discussion about the issue, it dries out almost instantly.

This is not a complaint. Actually, I don’t mind if nobody else cares about my approach to the structuring of genealogical data. But I’m a little perplexed about the high rank on Google, and the resulting hits. If there’s that much interest in “Genealogy Data Models”, how come nobody wants to discuss them?

| Next Page »

Powered by WordPress | Theme by Roy Tanck