Solumslekt

Forays Into Genealogy Data Base Modelling

Part I: The Master Genealogist | Part II: Into the Blue | Part III: Exodus

Part I - The Master Genealogist

I really like the TMG data model a lot. TMG is an abbreviation of The Master Genealogist, the program I've been using for my genealogy research since 1997. The data model is very flexible, and is still kind of "state of the art" for commercial genealogy applications. It accomodates what I consider to be the baseline of what a genealogy database should cover:

  1. Unlimited number of records per person.
  2. Unlimited number of source citations per record.
  3. Unlimited space for textual input.
  4. Separate and user-maintainable structured source register.
  5. Separate and user-maintainable structured place register.
  6. Unlimited number of record types (also user-defined).
  7. Unlimited number of source types (also user-defined).

Besides, the developer Bob Velke is an extraordinarily sympatic fellow. He has written a description of the data model (well, actually it's about the file structure, but it makes the underlying data model very transparent) that is publicly available as a zipped RTF file: tmgfstr.zip. It describes the structure of version 3.6. As of early 2005, the current version is at 6.0, but the basic structure hasn't changed very much.

My main gripe with TMG is that it's a Windows application, based on FoxPro. I've been running Linux for some time now, and want a genealogy app that will run on that platform. There are a couple of decent Linux genealogy programs, Gramps and LifeLines. However, they are not comparable to TMG, not by a long shot. For a couple of years now, I've been "doing my genealogy" running TMG version 4.0d on my ancient 266MHz laptop with Windows 2000 Pro on it. It's not an entirely satisfying solution to say the least, and thus I'm seriously considering writing my own application. At first, it would be a fairly raw porting of the TMG database, table for table. I've already written a Perl script that makes a SQL dump of vital parts of the database for my Web presentation software. That presentation is written in PHP4, with MySQL as the database engine.

With the power of present day computers, there is no reason to go for anything less than a real RDBMS database engine if you consider building a new genealogy database application from scratch. The thing I have in mind, is a business-class three-tiered application:

  1. In the bottom, a full-blown RDBMS. Preferably the 800-pound gorilla of open source database engines, the PostgreSQL. Its stored procedures makes it very attractive to implement most of the data-manipulation logic directly at the database level.
  2. A middleware layer, mostly written in Python, taking care of whatever program logic that can't be handled by the database, up to and including input validation. Every data entry and retrieval should be communicated from the client to this layer as a named function with a list of key/value parameters -- a "dictionary" in the Python lingo. It should probably be thought of as an API. Over time, I imagine that a lot of plugins will crawl into this layer, developing a kind of "Artificial Intelligence" that might become almost like a personal assistant doing all the menial work.
  3. One or more rather light-weight frontends, such as a pure browser interface, a [insert your favourite OS here] desktop GUI, as well as a textual command-line interface.

With this kind of architecture, the three main layers of the application might reside on three different computers, as is normal with business database applications. It goes almost without saying that it should be able to handle multi-user projects, having a central database server accessible via the Internet. But it might as well run on a single-user system.

The Data Model

Names, Relations, Identities

The fundamental element of a genealogy data model should probably be the source fragment: Any written or oral fragment from one source describing an event with a unity in space and time, having some genealogical significance for one or more persons explicitly or implicitly mentioned in the fragment. It may be just a few words, such as the terse description of a burial in a parish record, or several pages long, such as a probate. However, any single source fragment, such as the entry of a christening in a parish record, may be the basis of any number of corollaries in the way they must be registered in a genealogy database. (An advanced application should perhaps be able to record these corollaries in a semi-automatic way directly from a transcript of the source document.) For instance, an event concerning the actual christening, may be registered for the child who is the primary person of this source document. In addition, we may be able to register relations to a father and a mother for this child. We may also register a name for this child, as well as for the father, the mother, and any other persons who are mentioned in the same document. Last, but not least, the child's sex should be recorded.

The four terms highlighted in the previous paragraph, - event, relation, name, and sex, - are fundamentally different, and must be treated in different ways when recorded in a database.

An event is something that has a location in time and space. Thus, an event record may contain references to a date and a place. However, due to the nature of the matter, it is important that the model is able to handle ambiguous and incomplete data in a flexible way, including insufficient information about time and place.

The Identity Problem

Anyone with a genuine interest in genealogy data modelling, should study the GENTECH Genealogical Data Model. It's an excellent piece of work. I certainly don't think it's "The last word in genealogy data modelling", but it raises some very relevant points about the research process itself, and the all-important difference between evidence and conclusions.

There is an underlying - and, in my view, rather naïve - assumption in current genealogy software, taking for granted that there exists a real person behind what we find in the sources, by binding all events, names, relations, and attributes to a single person key.

We should always bear in mind that in historical research, we're not actually dealing with physical persons, but with documents. A John Smith who is christened in 1733 is not necessarily the same John Smith that marries Jane Doe in the same parish in 1758. Nevertheless, when we want to enter these two disparate facts, current genealogy software forces us to either register two different "persons", or join the two source fragments to the same person ID. Thus, the software forces us to reach a conclusion about identity even before we can enter our evidence into the database. In my opinion, that is putting the cart before the horse. An advanced genealogy database should build a person as a kind of tree structure, deriving a "super-person" from several "sub-persons".

For now, I'll let the actual modelling of such a structure remain unresolved. There's also the problem that if you implement a model that differs radically from your existing database, most of your previous work won't fit in the new model, and you may spend a massive amount of time on transformation and cleanup of your old data. That, of course, is not necessarily a bad thing. However, the project that's taking priority, is to migrate my database from TMG to a home-grown application. The first incarnation has to duplicate most of TMG's basic internal layout and functionality.

The tmgdump script

As it's pretty self-documenting, and also contains my own main effort so far of documenting the transition from TMG to a SQL-based data format, I'll publish my tmgdump.pl file here. I'm currently working on a deeper transformation of the data to a PostgreSQL database, but the foundations still lie in this script:

#! /usr/bin/perl

# tmgdump.pl
# stable version 0.6, 2004.07.30
#
# 2002-2004 Leif B. Kristensen (leif at solumslekt dot org)
#
# Thanks to:
# Bob Velke for his positive attitude towards "third-party development"
# John Cardinal for some very helpful hints regarding some of the more
# esoteric ins and outs of the TMG database
# Jan Pazdziora for the invaluable XBase module
#
# *** Standard GPL Legal Stuff ***
#
# This program is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software
# Foundation; either version 2 of the License, or (at your option) any later
# version.
#
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
# details.
#
# You should have received a copy of the GNU General Public License along with
# this program; if not, write to the
# Free Software Foundation, Inc.,
# 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
#
# *** End of GPL Legal Stuff ***
#
# The Purpose: convert TMG (4.0d) FoxPro dataset to MySQL source files.
#
# The resulting SQL command files are not intended for a production database,
# but as a presentation-only database on a Web server running MySQL and PHP.
# However, this code is under continuous development, and some parts of it may
# actually be intended for a production environment.
#
# The following tables are dumped by this program:
# $.dbf - the person table                person
# f.dbf - the relationship table          relation
# g.dbf, fpt - the event table            event
# n.dbf, fpt - the name table             name
# p.dbf, fpt - the place table            place
# s.dbf, fpt - the citation table         citation
# t.dbf - the event type table            event_type
#
# In addition, the major source table msource (*m.dbf, *m.fpt) must be maintained
# manually. Due to both its complexity and its fairly static nature, I have not
# implemented an automatic conversion routine. A TMG List of All Sources
# (Bibliographic) written to an ASCII comma delimited file is an excellent starting
# point for this table. See the example file msource.sql for further documentation.
#
# A viable, permanent approach might be to store the literal sources in the Memo
# field of the "M" table, which could be extracted automatically.
#
# DBD::XBase.pm by Jan Pazdziora
# http://search.cpan.org/author/JANPAZ/DBD-XBase-0.231/lib/XBase.pm
#
################################################################################

use DBD::XBase;
use Text::ParseWords;

# create header info for the sql files
# adapted from Perl Cookbook p. 73

($day, $month, $year) = (localtime)[3,4,5];
$header_msg = sprintf("MySQL source file created by tmgdump.pl on %04d-%02d-%02d\n",
                        $year+1900, $month+1, $day);

# set $tracking to 1 if you want verbose output.

$tracking=0;

################################################################################
#
# Functions
#
################################################################################

sub date_cnvt {
################################################################################
# A "date" in TMG is a composite of a qualifier and two dates.
# The qualifier can take the following values:
#     0=before (date1),
#     1=say (date1),
#     2=ca. (date1),
#     3=exact (date1),
#     4=after (date1),
#     5=between (date1) and (date2),
#     6=(date1) or (date2),
#     7=from (date1) to (date2)
#
# The date part is on the form YYYYMMDD, and thus quite SQL-compatible.
#
# In this sub, I convert dates from the proprietary date format of TMG to a
# standard SQL format, with a one byte 'qualifier' field and two date fields.
# Note that free-format or irregular dates are not supported. Change this to
# your liking.
################################################################################
    my $str = shift;
    if (!$str) {                        # date is blank
        $str = "3\',\'00000000\',\'00000000";
        return $str;
    }
    if ($str =~ /^0/) {                    # date is irregular
        print "Date $str is irregular.\nPlease substitute with a valid date (YYYYMMDD): ";
        chomp($dato = <STDIN>);
        $str = sprintf ("3\',\'%s\',\'00000000", $dato);
    }
    else {
        $str =~ s/\d(\d{8})\d(\d)(\d{8})\d{2}/$2\',\'$1\',\'$3/g;
    }
    return $str;
}

sub sort_date_cnvt {
################################################################################
# Here, the TMG date construct is replaced with a single, regular date field.
# If the sort date is not a *regular date*, this routine will break!
################################################################################
    my $str = shift;
    if (!$str) {                        # date is blank
        $str = "00000000";
        return $str;
    }
    else {
        $str =~ s/\d(\d{8})\d(\d)(\d{8})\d{2}/$1/g;
    }
    return $str;
}

sub cleanup {
    my $str = shift;
    $str =~ s/(['"])/\\$1/g;            # escape single and double quotes
    $str =~ s/\r\n/<br \/>/g;           # replace newlines with xhtml line breaks
    $str =~ s/^\[:CR:\]\[:CR:\]//g;     # strip line breaks at start of paragraph
    $str =~ s/\[:CR:\]/<br \/>/g;       # replace other line break codes
    $str =~ s/\[:NB:\]/&nbsp;/g;        # replace non-breaking space code
    $str =~ s/\[ITAL:\]/<em>/g;         # replace italic start tag
    $str =~ s/\[:ITAL\]/<\/em>/g;       # replace italic end tag
    $str =~ s/\[BOLD:\]/<strong>/g;     # replace bold start tag
    $str =~ s/\[:BOLD\]/<\/strong>/g;   # replace bold end tag
    $str =~ s/\{\[HTML:\]//g;           # strip {[HTML:] start tag
    $str =~ s/\[:HTML\]\}//g;           # strip [:HTML]} end tag
    $str =~ s/\[:TAB:\]//g;             # strip [:TAB:] tag
    $str =~ s/\[HID:\].+\[:HID\]//g;    # strip code delimited hidden text
    $str =~ s/^-.+//g;                  # strip anything starting with a minus
    $str =~ s/\{.+\}//g;                # strip curly braces delimited sensitive text
    $str =~ s/(\[L=ENGLISH\]).*(\[L=NORWEGIAN3\])/$2/g; # strip English sentence
    $str =~ s/\[R=/€\[R=/g;             # prepare role sentences for PHP explode()
    return $str;
}

sub strip_e { # strip witness memo for all but my custom sentence
    my $str = shift;
    $str =~ s/(['"])/\\$1/g;            # escape single and double quotes
    $str =~ s/(\[L=ENGLISH\]).*(\[L=NORWEGIAN3\])/$2/g; # strip English sentence
    return $str;
}

sub nor_phon {
############################################################################
# Norwegian fonetix
# Don't forget to synchronize changes in this sub with norphon.pl!
# As the Soundex algorithm is useless for Norwegian 18th century spelling,
# I've had to implement my own routine for phonetic search. However, there's
# still room for improvements here.
############################################################################
    my $str = shift;

    $str =~ s/\s.+//g;                    # strip compound names

    $str = uc($str);                    # convert to uppercase
    $str =~ s/æ/Æ/g;
    $str =~ s/ø/Ø/g;
    $str =~ s/å/Å/g;

    $str =~ s/SDATTER/S/g;              # Trim patronyms
    $str =~ s/SEN$/S/g;                 #  -----"-----
    $str =~ s/AA/O/g;                   # aa, å -> o
    $str =~ s/Å/O/g;
    $str =~ s/Æ/E/g;                    # æ -> e
    $str =~ s/Y/I/g;                    # y -> i

    $str =~ s/PH/F/g;                   # Stephen -> Stefen ( --> STFN)
    $str =~ s/FF/F/g;                   # Steffen -> Stefen ( --> STFN)
    $str =~ s/^IS/JS/g;                 # Isach --> JSAK
    $str =~ s/^SIR/SIGR/g;              # Siri <=> Sigri (!= Sigurd)
    $str =~ s/^SIVER/SJR/g;             # Siver <=> Sjur
    $str =~ s/^SIVERT/SJR/g;            # Sivert <=> Sjur
    $str =~ s/^SIGUR/SJR/g;             # Sigur <=> Sjur
    $str =~ s/HN/N/g;                   # John --> JON
    $str =~ s/TH/T/g;                   # Thor --> TOR
    $str =~ s/CHI/KJ/g;                 # Torchild --> TORKJL
    $str =~ s/CHE/KJ/g;                 # Michel --> MKJL
    $str =~ s/KKE/KJ/g;                 # Mikkel --> MKJL
    $str =~ s/KIEL/KJL/g;               # Kield -> Kjeld ( --> KJL)
    $str =~ s/CH/K/g;                   # Christen -> Kristen ( --> KRSTN)
    $str =~ s/CA/KA/g;                  # Carl -> Karl ( --> KAL)
    $str =~ s/RL/L/g;                   # Thorleif <=> Tollef <=> Tolf ( --> TLF)
    $str =~ s/W/V/g;                    # Wilhelm -> Vilhelm
    $str =~ s/V$/F/g;                   # Ellev <=> Ellef ( --> LF)
    $str =~ s/VS/FS/g;                  # same
    $str =~ s/CE/SE/g;                  # Cesilie -> Secilie ( --> SSL)
    $str =~ s/CI/SI/g;                  # Secilie -> Sesilie ( --> SSL)
    $str =~ s/CO/KO/g;                  # Cornelius -> Kornelius ( --> KORNL)
    $str =~ s/CU/KU/g;                  # Curt -> Kurt ( --> KRT)
    $str =~ s/C/K/g;                    # Isac -> Isak ( --> SAK)
    $str =~ s/GI/J/g;                   # Giert -> Jert ( --> JRT)
    $str =~ s/GJ/J/g;                   # Gjert -> Jert ( --> JRT)
    $str =~ s/GAR/GR/g;                 # Gregar <=> Greger, Margareta <=> Margrete
    $str =~ s/AHA/A/g;                  # Abraham --> ABRAM
    $str =~ s/ELBR/BR/g;                # Engelbret -> Engebret ( --> NGBRT)
    $str =~ s/GT/T/g;                   # Ingebrigt -> Ingebrit ( --> NGBRT)
    $str =~ s/IG/IK/g;                  # Tyge <=> Tyke ( --> TK)
    $str =~ s/DL/L/g;                   # Gudlaug <=> Gullaug
    $str =~ s/DV/LV/g;                  # Hedvig <=> Helvig
    $str =~ s/MOG/MO/g;                 # Mogens -> Mons
    $str =~ s/^GE/JE/g;                 # Gert -> Jert,
    $str =~ s/LGE/LJE/g;                # Helge -> Helje
    $str =~ s/RST/ST/g;                 # Kirsten -> Kisten ( --> KST)
    $str =~ s/RD/R/g;                   # Tord -> Tor ( --> TOR)
    $str =~ s/DT/T/g;                   # Ovedt -> Ovet ( --> OVT)
    $str =~ s/Z/S/g;                    # Zakarias -> Sakarias ( --> SAKAR)
    $str =~ s/LH/L/g;                   # Vilhelm -> Vilelm ( --> VLLM)
    $str =~ s/LM/M/g;                   # Vilelm -> Vilem ( --> VLLM)
    $str =~ s/LL/L/g;                   # Villum ->Vilum ( --> VLM)
    $str =~ s/IDS/IS/g;                 # Sidsel -> Sisel ( --> SSL), men
    $str =~ s/ADS/ATS/g;                # Mads -> Mats ( --> MATS)
    $str =~ s/SS/S/g;                   # Sissel -> Sisel ( --> SSL)
    $str =~ s/LD/L/g;                   # Aasold -> Aasol ( --> OSL)
    $str =~ s/SOL/SL/g;                 # Åsold --> OSL
    $str =~ s/ULF/L/g;                  # Åsulf --> OSL
    $str =~ s/BIØ/BJØ/g;                # Biøn -> Bjøn ( --> BJN)
    $str =~ s/ØRN/ØN/g;                 # Bjørn -> Bjøn ( --> BJN)
    $str =~ s/ND/N/g;                   # Anders -> Aners ( --> ANRS)
    $str =~ s/EA//g;                    # Andreas --> ANRS
    $str =~ s/NN/N/g;                   # Anne -> Ane ( --> AN)
    $str =~ s/ORGI/ORJ/g;               # Torgius -> Torjus ( --> TORJS)
    $str =~ s/ORI/ORJ/g;                # Torius -> Torjus ( --> TORJS)
    $str =~ s/IUS//g;                   # Julius -> Jul ( --> JOL)
    $str =~ s/IE/I/g;                   # Kiesten -> Kisten ( --> KST)
    $str =~ s/IA/I/g;                   # Mat(th)ias --> MATS, Maria --> MAR
    $str =~ s/KIT/KJT/g;                # Kittil <=> Kjetil
    $str =~ s/ANB/AMB/g;                # Anborg -> Amborg
    $str =~ s/RG$/R/g;                  # Amborg -> Ambor
    $str =~ s/RGS$/RS/g;                # Amborg -> Ambor
    $str =~ s/BOR/BR/g;                 # Ambor --> AMBR, Ingebor --> NGBR
    $str =~ s/NHI/NI/g;                 # Gunhil -> Gunil ( --> GNL)
    $str =~ s/KAREN/KAR/g;              # Karen --> KAR
    $str =~ s/MAREN/MAR/g;              # Maren --> MAR
    $str =~ s/K[IE]STEN/KST/g;          # Kisten, K(i)ersten, Kirsti etc --> KST
    $str =~ s/LISAB/LSB/g;              # Elisabet <=> Lisbet --> LSBT
    $str =~ s/SIMON/SMN/g;              # Simen, Simon --> SMN
    $str =~ s/GUN[AE]R/GNR/g;           # Gunder, Gunnar --> GNR
    $str =~ s/^AUG/OK/g;                # Augon --> OKN
    $str =~ s/O[GK][EO]N/OKN/g;         # Augon, Ougen --> OKN
    $str =~ s/[AO]UEN/OKN/g;            # Auen, Ouen --> OKN
    $str =~ s/OVI/V/g;                  # Lovise --> LVS
    $str =~ s/OUI/V/g;                  # Louise --> LVS
    $str =~ s/L[AOU][FV]/L/g;           # Olav, Olaf, Olof, Oluf --> OL
    $str =~ s/KNUD/KNT/g;               # Knud, Knut --> KNT
    $str =~ s/TT/T/g;                   # Petter -> Peter
    $str =~ s/PE[DT]ER/PR/g;            # Petter, Peter, Peder, Per --> PR
    $str =~ s/ØFR/ØR/g;                 # Søfren -> Søren ( --> SRN)
    $str =~ s/MM/M/g;                   # Tommes -> Tomes ( --> TOMS)
    $str =~ s/TOMAS/TOMS/g;             # Tomas --> TOMS
    $str =~ s/AUL/OL/g;                 # Paul --> POL
    $str =~ s/OUL/OL/g;                 # Poul --> POL
    $str =~ s/POVEL/POL/g;              # Povel --> POL
    $str =~ s/RGR/R/g;                  # Margrete <-> Marte
    $str =~ s/RJT/RT/g;                 # Birgitte, Bergit <-> Berte ( --> BRT)
    $str =~ s/KK/K/g;                   # Michel <-> Mikkel ( --> MKJL)
    $str =~ s/NUN/NON/g;                # Anund --> ANON
    $str =~ s/MUN/MON/g;                # Amun --> AMON
    $str =~ s/[AO]NON/ANON/g;           # Amund, Aamund, Aanund, Ommund --> ANON
    $str =~ s/BARBARA/BARBR/g;          # Barbara --> BARBR
    $str =~ s/BARBRO/BARBR/g;           # Barbro --> BARBR


    $str =~ s/([EIUØ])//g;              # strip e, i, u, ø (but keep a, o)

    $str =~ s/LL/L/g;                   # Ledvor <=> Levor (LELVOR -> LLVOR -> LVOR)
    $str =~ s/KSTOL/KSTL/g;             # Kiøstold <=> Kiøstel ( --> KSTL)
    $str =~ s/KJST/KST/g;               # Kjøstol --> KSTL, Kjersti --> KST
    $str =~ s/MOKNS/MONS/g;             # Mogens <=> Mons
    $str =~ s/BRNJL/BRNL/g;             # Brynjul <=> Brynild
    $str =~ s/RTS/RS/g;                 # La(u)r(i)ts --> LARS
    $str =~ s/^AN$/ANNE/g;              # Anne
    $str =~ s/^ANA$/ANNE/g;             # Anne

    $str =~ s/D$//g;                    # Strip D at end of words
    $str =~ s/A$//g;                    # Strip A at end of words

    return $str;
}

sub name_cnvt {
################################################################################
# TMG name structure:
# Name parts are contained in one string delimited with $!& in this order:
# Surname (0), Prefix (1), Given (2), Suffix (3)
#
# This division is a little awkward regarding old Norwegian naming practice, and
# I have used the Surname field mainly for recording patronyms, and the Suffix
# field for toponyms (ie. "farm names").
################################################################################

    my $str = shift;
    $str =~ s/\$!&//g;                # replace TMG field delimiter $!& with €
    $str =~ s/(['"])/\\$1/g;           # escape single and double quotes
    my @name = parse_line("",1,$str); # split name block into array values
    return sprintf("%s\',\'%s\',\'%s\',\'%s\',\'%s\',\'%s",
                    $name[1], $name[2], nor_phon($name[2]), $name[0],
                        nor_phon($name[0]), $name[3]);
}

sub place_cnvt {
    # In TMG, place names are contained within a single memo field which is
    # delimited in the same way as person names.
    my $str = shift;
    $str =~ s/\$!&//g;                # replace TMG field delimiter $!& with €
    $str =~ s/(['"])/\\$1/g;           # escape single and double quotes
    my @name = parse_line("",1,$str); # split place block into array values
    return sprintf("%s\',\'%s\',\'%s\',\'%s\',\'%s",
                    $name[1], $name[2], $name[3], $name[4], $name[5]);
}

sub bool_cnvt {
    # This may be a less than optimal way to represent boolean values in a SQL
    # database, however it is chosen for clarity and consistence with the flags.
    my $str = shift;
    if ($str == '0') {
        return 'F';
    }
    else {
        return 'T';
    }
}

################################################################################
#
# Main program starts here
#
################################################################################

# Replace SLEKTA_ with the name of your dataset, ie. the first 7 chars of the
# TMG table names. Alternately, you may provide the name as an argument on the
# command line: tmgdump.pl ${TMGDATADIR}/SLEKTA_

if (!$ARGV[0]) {
    $dataset = 'SLEKTA_';
    print "Assuming $dataset as name of dataset.\n";
}
else {
    $dataset = $ARGV[0];
}

################################################################################
#
# processing $.dbf
# (person table)
#
# ------------------------------------------------------------------------------
# This table holds "conclusional" data about each person. Its primary function
# is to provide a lookup table for eg. quick generation of pedigrees. Apart from
# the 'sex' and 'living' flags, data are never entered directly into the person
# table. The father_id and mother_id are derived from the respective _primary_
# relationships entered in the relation table. Likewise, the birth and death
# dates are derived from the _primary_ birth and death events of the event table.
#
# A more mature data model should probably get rid of this table altogether.
# It's really a legacy from lineage-linked programs, and doesn't belong in a
# purely event-based data model. Because of this, TMG may be considered a hybrid
# between the two.
#
# Here is perhaps the place to mention that the structure of a FoxPro table may
# be inspected with the utility dbfdump that accompanies the DBD::XBase module.
# Just run the command dbfdump --info <tablename>. Enjoy!
#
# The following is the official description of the '$' table from TMGFSTR.RTF,
# although I have replaced the descriptions for the unused fields with the legend
# (Not used). Confer to the original doc for description of these fields.
#
# ------------------------------------------------------------------------------
# Layout of Person File (_.DBF or $.DBF)
#
# "The record length of this file will vary with the addition of user-defined
# flags." (Implying that creation of user-defined flags will need to do an
# _alter_table_ routine. This is usually considered bad practice, and should
# probably be regarded as a flaw in the data model. LBK.)
#
# Field  Field Name  Type       Width
#    1  PER_NO      Numeric       10    Unique ID Number - Assigned sequentially
#                                        by the system
#    2  FATHER      Numeric       10    ID Number of Primary Father
#    3  FATHSURE    Character      1    (Not used)
#    4  MOTHER      Numeric       10    ID Number of Primary Mother
#    5  MOTHSURE    Character      1    (Not used)
#    6  LAST_EDIT   Date           8    Date this person was last edited
#    7  REFERENCE   Character     12    (Not used)
#    8  SPOULAST    Character      3    (Not used)
#    9  SCBUFF      Character     10    (Not used)
#   10  PBIRTH      Character     30    Primary birth date
#   11  PDEATH      Character     30    Primary death date
#   12  SEX         Character      1    Sex    (?/M/F)
#   13  LIVING      Character      1    Living (?/Y/N)
#   14  BIRTHORDER  Character      2    (Not used)
#   15  MULTIBIRTH  Character      1    (Not used)
#   16  ADOPTED     Character      1    (Not used)
#   17  ANCE_INT    Character      1    (Not used)
#   18  DESC_INT    Character      1    (Not used)
#   19+ (FLAGNAME)  Character      1    (Not used)
#
################################################################################

$dbf_name = $dataset.'$.DBF';
$sql_name = "person";

my $table = new XBase $dbf_name or die XBase->errstr;

open (OUTFILE, ">", "$sql_name.sql");

print OUTFILE <<EndOfText;
# $sql_name.sql
# $header_msg

DROP TABLE if exists $sql_name;
CREATE TABLE $sql_name (
    person_id mediumint unsigned not null auto_increment,
    father_id mediumint unsigned default '0',
    mother_id mediumint unsigned default '0',
    last_edit date not null,
    pb_qual tinyint unsigned default '3',
    pb_date1 date not null,
    pb_date2 date not null,
    pd_qual tinyint unsigned default '3',
    pd_date1 date not null,
    pd_date2 date not null,
    sex ENUM('?','M','F') not null,
    living ENUM('?','Y','N') not null,
    PRIMARY KEY(person_id)
) TYPE=InnoDB;


EndOfText

print "Processing $dbf_name ... ";
if ($tracking) {
    print ("\n");
}
my $cursor = $table->prepare_select("PER_NO","FATHER","MOTHER","LAST_EDIT","PBIRTH","PDEATH","SEX","LIVING");
while (my @record = $cursor->fetch) {
    if ($tracking) {
        printf ("Reading record %d\r", $cursor->last_fetched);
    }
    print OUTFILE "insert into $sql_name values (";
    for ($i=0;