<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss'><id>tag:blogger.com,1999:blog-12028427</id><updated>2009-02-20T20:34:41.653-06:00</updated><title type='text'>Tincat Mewsings</title><subtitle type='html'>Mewsings about software development, with a focus on data modeling.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.tincat-group.com/mewsings/atom.xml'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/'/><author><name>--dawn</name><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>20</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-12028427.post-7467807094717973418</id><published>2009-01-09T09:34:00.005-06:00</published><updated>2009-01-09T09:59:03.822-06:00</updated><title type='text'>New Year, New Blog</title><content type='html'>As you might have noticed, I did not continue my 2006 blog much into 2007. I have a draft of a new blog entry from back then, entitled "Property Values Flourish" about modeling data with multi-valued attributes. Alas, I did not complete it, attended to other work and started organizing a new company later that year.

I have started to blog about my current project as it unfolds this year. The first entry of the &lt;a href="http://snupnow.wordpress.com/"&gt;Persistence Matters blog&lt;/a&gt; is &lt;a href="http://snupnow.wordpress.com/2009/01/01/"&gt;here&lt;/a&gt;. Some of you might find it worthwhile and fun to follow this project. 

Thank You for your readership of this Tincat Mewsings blog. I was amazed how many people from how many countries visited. I just checked and it says there have been 17,504 "absolutely unique visitors" according to google anayltics. My goal original goal was that 100 people read my Is Codd Dead? posting. Of course, my other goal was that I help to change the industry. I doubt whether my one small voice made any difference at all, but I do feel confident that enough of the industry is invested in working with persisting logical lists, for example, that my voice is not needed.

I don't think the momentum has carried into college database classrooms extensively to date, however. If you are a college professor, teaching database courses, it might be fun ("fun" being a relative term) to add entries from the Is Codd Dead? blog series to the reading that students will do. Even if you disagree with any of my points, it will provide balance to the typical diet of only relations, no lists, for breakfast, lunch, and dinner.

Have a fantastic 2009!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-7467807094717973418?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/7467807094717973418/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=7467807094717973418&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/7467807094717973418'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/7467807094717973418'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2009/01/new-year-new-blog.html' title='New Year, New Blog'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-2495295705068370271</id><published>2007-01-02T12:05:00.000-06:00</published><updated>2007-01-14T20:33:26.160-06:00</updated><title type='text'>OTLT: Metadata Piece Not Apartheid</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;a href="#" onMouseOver="document.worms.src='http://www.tincat-group.com/images/live-bait.gif';" onMouseOut="document.worms.src='http://www.tincat-group.com/images/vended-bait.gif';"&gt; &lt;img src="http://www.tincat-group.com/images/vended-bait.gif" alt="Can of Worms" title="Can of Worms Vending" name="worms" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;
Yes, I am going to reopen a can of worms.  I first saw the acronym &lt;em&gt;OTLT&lt;/em&gt;, for &lt;em&gt;One True Lookup Table&lt;/em&gt;, when I read &lt;a href="http://www.dbazine.com/ofinterest/oi-articles/celko22"&gt;this Celko article&lt;/a&gt;.  The article pretty much made sense to me then, as it does now.  I would not recommend the use of &lt;acronym title="One True Lookup Table"&gt;OTLT&lt;/acronym&gt; to anyone wishing to use an &lt;acronym title="Structure Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt; properly as a &lt;dfn title="Yes, I know there are no pure relational databases"&gt;relational database&lt;/dfn&gt;. As a few of you might guess, I will suggest that some might be well-served using something other than a proper &lt;acronym title="Structured Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt; or using an &lt;acronym title="Structured Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt; other than properly.   
&lt;/p&gt;
&lt;p&gt;
Celko writes "I am going to venture a guess that this idea came from OO programmers" in the above mentioned article.  However, &lt;acronym title="One True Lookup Table"&gt;OTLT&lt;/acronym&gt; (or &lt;acronym title="One True Lookup File"&gt;OTLF&lt;/acronym&gt;) is a design pattern I encountered before I ever heard of &lt;acronym title="Object-Oriented Programming"&gt;OOP&lt;/acronym&gt;, or design patterns for that matter.  It even predates our use of the term &lt;em&gt;table&lt;/em&gt; in database design, and might be called a code file or validation file, for example.  It is at the point that this design pattern was brought into the world of relational tables where it switched from being a pattern to an &lt;a href="http://en.wikipedia.org/wiki/Anti-pattern"&gt;anti-pattern&lt;/a&gt;.  
&lt;/p&gt;
&lt;p&gt;
&lt;dfn title="And likely even to this day, truth be told"&gt;Many moons ago&lt;/dfn&gt;, this code file pattern was employed with implementations using indexed sequential files, e.g. &lt;dfn title="Reminiscing about Pr1mes is FUTIL, right?"&gt;MIDAS files on Primos&lt;/dfn&gt; or &lt;acronym title="Virtual Sequential Access Method or Virtual Storage Access Method"&gt;VSAM&lt;/acronym&gt; files on &lt;acronym title="Multiple Virtual Storage or Multiple Virtual System"&gt;MVS&lt;/acronym&gt; on an &lt;acronym title="International Business Machines"&gt;IBM&lt;/acronym&gt; 3081, and I'm pretty sure it was around well before my encounters with it.
&lt;/p&gt;
&lt;p&gt;
In case you are not familiar with this pattern, you can see typical values in the picture below.  It represents a simple &lt;dfn title="All functions are relations, by definition"&gt;function&lt;/dfn&gt;--put a multipart key of code_type and code_value into the vending machine to retrieve the code_description.  These attribute names, as provided by Celko in this article, are pretty much what I recall, with the exception of &lt;dfn title="Once upon a time there were caps-only terminals"&gt;all caps&lt;/dfn&gt; and dashes in place of underscores.
&lt;/p&gt;
&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/OTLT3.gif" alt="OTLT" title="OTLT" /&gt;
&lt;/div&gt;
&lt;p&gt;
Once upon a time, data were not &lt;dfn title="Where every entity has the same standing, as if playing the same role, in the data model"&gt;democratized&lt;/dfn&gt;, and we could treat some differently than others within the data model.  Now we talk about attributes having types, such as varchar or int, while tables are all seen as somewhat of the same type, i.e. Relation.  [Yes, these Relations are of type Customer, Order, Person, or whatever, but I only state that so you do not wander off-course on that yourself.]  Our profession once spoke of file types of a different ilk.  Back when the database was perceived as a set of &lt;a href="http://en.wikipedia.org/wiki/Computer_file"&gt;files&lt;/a&gt;, rather than &lt;a href="http://en.wikipedia.org/wiki/Table_%28database%29"&gt;tables&lt;/a&gt;, these files were identified with various &lt;em&gt;type&lt;/em&gt; designations.  There might be files of type:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;master (named with nouns corresponding to strong entities)
&lt;/li&gt;&lt;li&gt;
transaction (events)
&lt;/li&gt;&lt;li&gt;
log
&lt;/li&gt;&lt;li&gt;
history
&lt;/li&gt;&lt;li&gt;
parm
&lt;/li&gt;&lt;li&gt;
control
&lt;/li&gt;&lt;li&gt;
code (we will focus on this one)
&lt;/li&gt;&lt;li&gt;
&lt;dfn title="and so forth, in Dutch"&gt;enzovoort&lt;/dfn&gt;.  
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
We will zero in on the &lt;em&gt;code files&lt;/em&gt;.  Historically, some systems used separateness, or &lt;em&gt;apartheid&lt;/em&gt; in Dutch, for each type of code.  Other systems split out the really large or more complex code files as separate files but kept the rest of the codes in a single piece, pouring the data from all other potential code files into one big validation file.  One rule of thumb that I recall was that if there were more than a thousand entries, we were more likely to put such codes in their own file (so as not to adversely affect the performance of all applications doing lookups).  
&lt;/p&gt;
&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/nosql.gif" alt="No SQL" title="No SQL, thanks Tom DeL" /&gt;
&lt;/div&gt;
&lt;p&gt;
Let's look at some of the objections to using this pattern with an &lt;acronym title="Structured Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt; to see how these play out in the case where we are using another &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; solution that is not strictly a &lt;acronym title="Structured Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt;, such as Pick (see &lt;a href="http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html"&gt;Is Codd Dead?&lt;/a&gt; for a list of Pick databases).  Near the end of Celko's article, he writes that it is a "data modeling principle that a well-designed table is a set of things of the same kind instead of a pile of unrelated items."  When pouring all of our code types, values, and descriptions together into one component, are we really talking about unrelated items?  They certainly exhibit a similar pattern. 
&lt;/p&gt;
&lt;p&gt;
Relational databases, or approximations thereof, split data into two distinct groupings often referred to as &lt;a href="http://en.wikipedia.org/wiki/Data"&gt;data&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Metadata#Relational_database_metadata"&gt;metadata&lt;/a&gt;.
So what are &lt;em&gt;code files&lt;/em&gt;--data or metadata?  These code files are not quite the same as typical user data.  A table such as MaritalCodes doesn't relate to a traditional business proposition or entity quite like the Customer table does.  With modeled propositions for a business, it is common for purposes of data quality, ease of data entry, compact representation for reports, and &lt;dfn title="Way back when, there was a more significant difference between storing 'NJ' compared to 'New Jersey' millions of times"&gt;storage efficiency&lt;/dfn&gt; to encode values with abbreviations, aka &lt;em&gt;codes&lt;/em&gt;, that stand for the original value within a proposition.  With the value encoded in a &lt;em&gt;code file&lt;/em&gt;, we can then look up the longer description when needed.  
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Code files host data that is both data and metadata, or tween data.&lt;/blockquote&gt;
&lt;p&gt;
I'm guessing you understand why and how a &lt;em&gt;code file&lt;/em&gt; in the old model translates to a &lt;em&gt;lookup table&lt;/em&gt; in an &lt;acronym title="Structured Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt;.  Such lookup data, equating an encoded term with a description, is also validation data.  So, we might also call this a &lt;em&gt;validation table&lt;/em&gt;.  It is clearer now with &lt;acronym title="Graphical User Interface"&gt;GUI&lt;/acronym&gt;s that lookups for valid codes and data validation are two sides of the same coin.  The list of valid codes is often used to populate a &lt;acronym title="Graphical User Interface"&gt;GUI&lt;/acronym&gt; widget so there is no chance of any other values being entered by the user.  The values in these old code files might also compose a list for a check constraint.  Code files host data that is both data and metadata, or what we might term &lt;dfn title="serving as both data and metadata"&gt;tween data&lt;/dfn&gt;. 
&lt;/p&gt;
&lt;p&gt;
Unlike hard-coded check constraint values, our code files often have data entry forms that permit some of the code types to have the valid list of entries maintained by the end user.  They are part of the data integrity function of constraint handling.  They are metadata that users see, some of which serves as data too.  They are used for validation as well as representation, turning codes into descriptions when needed.  Such codes serve a purpose of standardizing attribute values, and indicating valid entries throughout a software system. 
&lt;/p&gt;
&lt;p&gt;
Systems that store metadata as data, which includes most software written to be implemented using multiple &lt;acronym title="DataBase Management Systems"&gt;DBMS&lt;/acronym&gt; solutions, might indicate that the marital_status attribute of the Person relation is validated using the Marital_Status code type in the Validation_Codes or One_True_Lookup table.  
&lt;/p&gt;
&lt;p&gt;
Other metadata that might be required to be housed as data includes attribute names and file names, for example, for building business rules.  &lt;dfn title="serving as both data and metadata"&gt;tween data&lt;/dfn&gt; is both &lt;em&gt;data&lt;/em&gt; and &lt;em&gt;metadata&lt;/em&gt; and it is &lt;strong&gt;everywhere&lt;/strong&gt;.  So, how should we model such &lt;dfn title="serving as both data and metadata"&gt;tween data&lt;/dfn&gt;?
&lt;/p&gt;
&lt;div class="showoff"&gt;
We might need to model enough &lt;dfn title="serving as both data and metadata"&gt;tween data&lt;/dfn&gt; to specify rules such as
&lt;p /&gt;
if (order_amount &gt; 1500) requires_authorization = "Yes"
&lt;/div&gt;
&lt;p&gt;
These files are part of the application system architecture.  Just as we do when designing user database &lt;dfn title="things"&gt;entities&lt;/dfn&gt;, when looking at the entities involved in developing and maintaining the application, some standard entities such as Attributes, Files (or maybe Tables or Relations), Rules, and Validations might make their way into every application.  These entities are not specific to the domain for this application and are required because this is a software application, not because this business is in the xyz vertical market.  
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Tween data is both &lt;em&gt;data&lt;/em&gt; and &lt;em&gt;metadata&lt;/em&gt; and it is &lt;strong&gt;everywhere&lt;/strong&gt;&lt;/blockquote&gt;
&lt;p&gt;
How might one make a design decision about whether to pour all such codes together in &lt;acronym title="One True Lookup Table"&gt;OTLT&lt;/acronym&gt;, rather than separating them out?  Let's see, we want to have &lt;dfn title="Or is it the other way around ;-)"&gt;tight  cohesion and loose coupling&lt;/dfn&gt;, right?  We have the option of modeling a single proposition with a type of code, a value for that code and a description.  The implementation of this proposition can then be used for dropdowns and other GUI features as well as validation and represenation.  We can otherwise model a set of propositions that all sound remarkably similar with codes and their descriptions, elevating the type of code to the name of a Table rather than having it as an attribute in our Validation Table.  But after designing the n-th such table, my patience starts to &lt;acronym title="Don't Repeat Yourself"&gt;DRY&lt;/acronym&gt; up (see &lt;a href="http://en.wikipedia.org/wiki/Don't_repeat_yourself"&gt;the DRY wikipedia entry&lt;/a&gt;).
&lt;/p&gt;
&lt;p&gt;
Admittedly, while we then have our codes together, these codes relate to different things.  That will be a problem if we want to add an attribute for one of these types and not for another, while splitting them out is more problematic if we opt to do something consistently across the board with all of our codes and must repeat the procedure for each such table.  The former is typically mitigated by a) using tools where refactoring the database when requirements change is standard fare (that is, using an agile database, such as IBM U2, Revelation, Intersystems Cache', or OpenQM) or b) adding an attribute for further classification of a code, such as code_classifier.  Different values for the code_classifier can be used to control the logic performed.  
&lt;/p&gt;
&lt;p&gt;
Let's look at more of the objections to the use of &lt;acronym title="One True Lookup Table"&gt;OTLT&lt;/acronym&gt; with an assumption that we are using a MultiValue, aka Pick, DBMS.  Not coming from the world of 80-column card input (see picture in &lt;a href="http://www.tincat-group.com/mewsings/2006/07/to-whom-should-size-matter.html"&gt;To Whom Should Size Matter?&lt;/a&gt;), Pick handles variable length data really well.  In fact, neither types nor lengths are enforced by the &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt;, prompting me to initially conclude that it could not be considered a &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt;, an opinion I have reconsidered.  In any case, this objection, while relevant to &lt;acronym title="Structured Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt;'s is irrelevant in other &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt;'s, including Pick.
&lt;/p&gt;
&lt;p&gt;
Another objection is that software is capable of putting the wrong data in for the code_type, so that a state abbreviation might become a marital status, for example.  Well, as it turns out software can put a multitude of incorrect values into a database.  It has a job not to do that, and while &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt;-specified constraints might contribute to one kind of data quality, they certainly do not ensure data quality.  As an aside, I would argue that they have even contributed to poor quality software, forcing developers and end-users alike to play games to try to match values to type, even where ill-advised.  The bottom line on this one is that if your &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; does not enforce types, there is no related problem with this pattern.  One of the code_type values could be Valid_Code_Types where the application that permits the addition of code_values for all code_types also permits new values here, with appropriate security, of course.
&lt;/p&gt;
&lt;p&gt;
I don't like discussing performance, but I definitely recognize that many-a software project has failed due to inadequate performance.  Celko brings up the fact that having to search through a larger table is less efficient than through a smaller one when finding a particular value or a set thereof.  As long as we do not make the file or table (whichever is being used) &lt;dfn title="this value has increased over the years"&gt;way too large&lt;/dfn&gt;, this file of validation codes is an obvious choice to have in cache throughout the run-time of a software application.  
&lt;/p&gt;
&lt;p&gt;
There is no doubt that there are tasks one can perform that are slower and some that are faster using one pattern or another, so obviously there is room for taking into consideration the type of activity that will be performed with codes in any given system.  Using Pick, there are even some who design the big code validation file to have one item (aka record) per code type, with &lt;dfn title="nested multi-attribute lists, like tables"&gt;associated multivalues&lt;/dfn&gt; for the valid values and descriptions.  One can then have one disk read to suck in the entire validation list for a particular attribute, such as the entire set of state codes and their descriptions.
&lt;/p&gt;
&lt;p&gt;
One of the features I like best about using one big code validation file is that you can write one maintenance &lt;dfn title="the component formerly known as a screen"&gt;form&lt;/dfn&gt; for this file and it handily covers all of these, even future validation code types.  Sure, you will also need a code file for what code files are permitted and you might want to apply some different validations for different types of codes (more common in theory than in practice, I suspect).
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Companies...with a need for a lot of tween data might be well-served using SQL-DBMS tools as if they were file systems&lt;/blockquote&gt;
&lt;p&gt;Some might be convinced that when using something other than a traditional &lt;acronym title="Structured Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt;, it might sometimes make sense to model the &lt;dfn title="serving as both data and metadata"&gt;tween data&lt;/dfn&gt; in a way that acknowledges that it also serves as metadata.  But this could also be relevant to &lt;acronym title="Structured Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt; users.  I will end with a heresy (as if I haven't been spouting such all along, eh?). Companies and developers with a need for a lot of &lt;dfn title="serving as both data and metadata"&gt;tween data&lt;/dfn&gt; might be well-served using &lt;acronym title="Structured Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt; tools as if they were file systems, ignoring some of the relational dogma.  If you are minimizing the amount of metadata that you duplicate by putting it both in the &lt;acronym title="Structured Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt; metadata catalog and in the database itself by minimizing your use of &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; constraints and the like, you really can use the &lt;acronym title="One True Lookup Table"&gt;OTLT&lt;/acronym&gt; pattern.  Of course, you should be aware of the tradeoffs in doing so.  
&lt;/p&gt;
&lt;p&gt;
While I recognize this is not the weighty decision that Carter lays out in his controversial book &lt;em&gt;Palestine: Peace Not Apartheid&lt;/em&gt;, for any given software metadata design for code files, you simply need to decide whether to go with a single cohesive piece of &lt;dfn title="serving as both data and metadata"&gt;tween data&lt;/dfn&gt; or whether you favor relational &lt;dfn title="separateness in Dutch"&gt;apartheid&lt;/dfn&gt;.
&lt;/p&gt;
&lt;p id="seecomments"&gt;&lt;a href="http://www.tincat-group.com/mewsings/2007/01/otlt-metadata-piece-not-apartheid.html#seecomments"&gt;See comments.&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-2495295705068370271?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/2495295705068370271/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=2495295705068370271&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/2495295705068370271'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/2495295705068370271'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2007/01/otlt-metadata-piece-not-apartheid.html' title='OTLT: Metadata Piece Not Apartheid'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-116285562150056859</id><published>2006-11-06T17:26:00.000-06:00</published><updated>2007-01-08T04:48:36.679-06:00</updated><title type='text'>Cowboys with Promiscuous Databases</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/database-cowboy.gif" alt="Database Cowboy"  title="Database Cowboy" /&gt;
&lt;/div&gt;
&lt;p&gt;
In Northwest Iowa we have lots of cows but no &lt;dfn title="I will use this as a gender-neutral designation"&gt;cowboys&lt;/dfn&gt;.  We have cattle farms that, as best I can tell, are much like &lt;dfn title="But pig farms smell much worse"&gt;hog farms&lt;/dfn&gt;.
&lt;/p&gt;
&lt;p&gt;
Somewhere &lt;dfn title="Yes, that would be South Dakota"&gt;between Iowa and Wyoming&lt;/dfn&gt;  the backdrop changes from cattle &lt;em&gt;farmers&lt;/em&gt; to cattle &lt;em&gt;ranchers&lt;/em&gt;.  We move from &lt;dfn title="Don't remind my youngest daughter of these"&gt;confinement lots&lt;/dfn&gt; to more open ranges.  Where the land is fertile, it is farmed.  The cows remain &lt;em&gt;rounded up&lt;/em&gt; and typically crowded together.  Farm land is planned out, designed, and structured.  Farmers designate areas for corn, soy, or cows.  They work the land, toiling over its makeup throughout the seasons. 
&lt;/p&gt;
&lt;p&gt;
I'm no expert, but it seems that ranches are left with a more natural order.  As you head towards the mountain regions, the rockier unfarmed land is available for cattle.  With less control on their movements, these cattle spread out a bit more. Cowboys round them up and prompt them to move as a group to another location when needed.
&lt;/p&gt;
&lt;div class="inlinepic"&gt;
&lt;img src="http://www.tincat-group.com/images/cows-iowa.gif" alt="Iowa Cows"  title="Iowa Cows" /&gt;
&lt;h6&gt;Iowa cattle farm above, Wyoming cattle range below&lt;/h6&gt;
&lt;/div&gt;
&lt;div class="inlinepic"&gt;
&lt;img src="http://www.tincat-group.com/images/cows-wyoming.gif" alt="Wyoming Cows"  title="Wyoming Cows" /&gt;
&lt;/div&gt;
&lt;blockquote class="pullquote"&gt;Promiscuous means consisting of elements brought together without order&lt;/blockquote&gt;
&lt;p&gt;
We might say that the confinement lot is organized, constrained, and controlled in accordance with economic and other mathematical principles, while the ranch provides a more &lt;dfn title="see definition below"&gt;promiscuous&lt;/dfn&gt; landscape.  The ranch is also organized, mind you, but a natural order arises from the land.  Cowboys interact with the landscape to herd the cattle, performing tasks that might have been designed into a confinement lot.
&lt;/p&gt;
&lt;p&gt;
Using the second definition from dictionary.com, &lt;em&gt;promiscuous&lt;/em&gt; means &lt;em&gt;consisting of parts, elements, or individuals of different kinds brought together without order.&lt;/em&gt;
&lt;/p&gt;
&lt;p&gt;
Lest there be confusion, let me state that as we turn our heads from cows to data, I will interpret the word &lt;em&gt;order&lt;/em&gt; in this definition to refer broadly to the organization and structuredness of a database, not to the ordering of attributes or rows.  
&lt;/p&gt;
&lt;div class="inlinepic"&gt;
&lt;img src="http://www.tincat-group.com/images/real-cowboys.gif" alt="Real Cowboys"  title="Real Cowboys" /&gt;
&lt;h6&gt;Real cowboys, Wyoming Nov 2006&lt;/h6&gt;
&lt;/div&gt;
&lt;div class="inlinepic"&gt;
&lt;img src="http://www.tincat-group.com/images/cow-parade.gif" alt="Cow Parade"  title="Cow Parade" /&gt;
&lt;/div&gt;
&lt;p&gt;
Chose from among the most common dictionary definitions of the term &lt;em&gt;database&lt;/em&gt;, such as &lt;em&gt;a collection of related facts&lt;/em&gt; or perhaps one that requires the use of a computer, if you prefer.  Most readers will likely be familiar with the process of designing databases by employing relational modeling, given that this is taught in college courses as well as on the job.  The design is organized, constrained, and controlled according to mathematical principles from set theory and first order predicate logic.  Like the cattle farmer's land, a model could be drafted showing how the database designer will structure the database.  Structure of this sort attracts certain personalities (farmers?) and not others (ranchers?).  You might guess that I, in particular, feel more at home on the range.
&lt;/p&gt;
&lt;p&gt;
Any model other than a relational data model might seem to the software development profession as promiscuous.  I chose this derogatory, yet enticing, term in part because of the seeming unorderedness, comparatively, of legacy data models.  This is not unlike the seeming unorderedness of a cattle ranch compared to the more obvious structure of the cattle farm.  I also like using the term &lt;em&gt;promiscuous&lt;/em&gt; here because our profession currently sees these alternative database tools as improper, even if increasingly enticing.  I predict that our industry will be seduced by something resembling these legacy databases enough to switch to considering not-really-relational databases as mainstream again in the future, especially as SQL becomes less attractive as our interface language to data.
&lt;/p&gt;
&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/legacy-data.gif" alt="Legacy Photo Sign"  title="Wall Drug Legacy Photo Sign" /&gt;
&lt;/div&gt;
&lt;p&gt;
Note that although I do need a term for the databases about which I have been writing, often referred to as &lt;em&gt;embedded&lt;/em&gt; in marketing literature, I will likely not latch onto this term as that would put me in the uncomfortable position of endorsing promiscuity.  I can live with that discomfort for this one blog entry.
&lt;/p&gt;
&lt;p&gt;
Ask a rancher or cowboy how they divide up the land, and they might suggest that the land divides itself.  The water is here, the grass there, and the rocks up this way.  They might draw or paint you a picture.  Ask a database cowboy, one working with a more promiscuous database than those based on the relational model, how they model the data and you might hear an analogous response.  The data orders itself.  
&lt;/p&gt;
&lt;p&gt;
Ask what steps a database cowboy takes to design a database, and you will likely hear that the first step is to have a good understanding of the landscape, the business.  Then you define the scope of your project, putting a fence around it, and then you record what you see inside that fence.  By looking at the landscape, you can make a computerized model of this reality for your database implementation.  The implementation is a model of the business, not unlike a painting of the range.
&lt;/p&gt;
&lt;p&gt;
I am well aware this scenario generates laughter from some, ridicule from others, as it sounds so unscientific.  But as my colleague Anthony Youngman (aka Wol) would suggest, relational modeling is mathematical but not very scientific.  The &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; imposes an order using mathematical terms such as predicate and relation, typically avoiding terms matching the problem domain such as &lt;em&gt;thing&lt;/em&gt;, &lt;em&gt;entity&lt;/em&gt;, &lt;em&gt;property&lt;/em&gt;, &lt;em&gt;empty&lt;/em&gt;, and &lt;em&gt;list&lt;/em&gt;, terms used by cowboys working with promiscuous databases.  Relational modeling includes putting data into Nth normal form, while the database cowboy knows the land and paints what &lt;dfn title="or she, but this flowed better, please forgive"&gt;he&lt;/dfn&gt; sees. 
&lt;/p&gt;
&lt;p&gt;
For anyone confused by the imprecision of this description, perhaps the Jayne VanDoe example in the &lt;a href="http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html"&gt;Is Codd Dead?&lt;/a&gt; mewsing provides more hints.  By the way, regarding &lt;em&gt;science&lt;/em&gt; and &lt;em&gt;databases&lt;/em&gt;, have the terms &lt;em&gt;relational model&lt;/em&gt; and &lt;em&gt;experiment&lt;/em&gt; ever made it into the same sentence?  We need to return to the science and art, the craft, of databases, modeling by painting what we see and testing our models over time.  I'll grant that there is a need for more emperical data related to the effectiveness and resources required over time for all varieties of databases.
&lt;/p&gt;
&lt;p&gt;
At the risk of repeating what I have said in earlier blog entries, but for the sake of any new readers, I will briefly suggest three features that might distinguish a seemingly promiscuous database from one that more closely implements the relational model.
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;acronym title="Two-valued Logic"&gt;2VL&lt;/acronym&gt;
&lt;p&gt;Most, if not all, languages that work with the data employ two-valued logic.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;acronym title="Non-first Normal Form"&gt;NF2&lt;/acronym&gt;
&lt;p&gt;The data need not be in what has traditionally been called first normal form.  Attribute values may be arrays or multivalues.
&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;Contraints as data
&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/wall-drug-saloon-girl2.gif" alt="Saloon Girl"  title="Saloon Girl" /&gt;
&lt;/div&gt;
&lt;p&gt;This one needs a sweet acronym and a better description, but the idea is that constraints related to attribute types are typically specified with data, rather than with metadata, and are enforced outside of a &lt;acronym title="Database Management System"&gt;DBMS&lt;/acronym&gt;, rather than by one.  Rather promiscuous, wouldn't you say?
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
Let's take a look at legacy databases.  As it turns out, the data handled with/in/using databases termed &lt;em&gt;legacy&lt;/em&gt; is current, not primarily legacy, data. While it has been the conventional wisdom, that some would say has been proven, to migrate from legacy databases to SQL-DBMS tools, signs point to a return of such proven approaches as the use of two-valued, rather than three-valued logic.  Additionally, more and more work with databases is done by developers without the tired, old 1NF requirement, often by way of object-relational or XML-RM mappings.  There is reason to suggest that the future of data modeling resembles the past, the data modeling done by our current database cowboys.
&lt;/p&gt;
&lt;p&gt;
In case you are asking &lt;em&gt;Where's the beef?&lt;/em&gt; (perhaps you cannot see the pictures herein), the next blog entry will start looking at specific design patterns used when designing for one model of promiscuous databases, the Pick/MultiValue databases.  While I am not as familiar with other not-really-relational models, it is very likely that these best practices will translate to best practices in many other environments as well.  And, as always, I fear they are apt to irritate or even infuriate some &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; enthusiasts.  Heigh ho.
&lt;/p&gt;
&lt;p&gt;
Although cowboys typically prefer using an apprentice approach with new recruits, a cowboy handbook might be in order so that the next generation of cowboys can learn from the best practices of those who have gone before.  While it once looked like these cowboys were a dying breed, with the new &lt;dfn title="I heard this phrase in an ad"&gt;wild, wild west of the internet&lt;/dfn&gt;, database cowboys look like they will be around for as long as the farmers.  In the next blog entry, I will have something for you to sink your teeth into.  The least we can do is pass along some tips from seasoned cowboys on how they have been saddling promiscuous databases for the past half-century.
&lt;/p&gt;
&lt;div class="inlinepic"&gt;
&lt;img src="http://www.tincat-group.com/images/cowboy-cafe.gif" alt="Cowboy Cafe"  title="Cowboy Cafe" /&gt;
&lt;/div&gt;

&lt;a href="http://www.tincat-group.com/mewsings/2007/01/otlt-metadata-piece-not-apartheid.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-116285562150056859?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/116285562150056859/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=116285562150056859&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/116285562150056859'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/116285562150056859'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/11/cowboys-with-promiscuous-databases.html' title='Cowboys with Promiscuous Databases'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-115401921366092046</id><published>2006-07-27T11:52:00.000-05:00</published><updated>2007-01-08T04:46:43.855-06:00</updated><title type='text'>To Whom Should Size Matter?</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/tetons.gif" alt="Grand Tetons"  title="Grand Tetons" /&gt;
&lt;/div&gt;
&lt;p&gt;
The Grand Tetons are incredibly impressive.  They are big, really big.  Combining business and pleasure, with one trip this year I went from Iowa to Washington D.C. and with another from Iowa to Washington State, both by car.  I was able to see people and places, taking pictures from coast to coast.  I love mountains and water, the bigger the better.  In my home state of Michigan, the Great Lakes are great because of their size.  Of course there are very impressive small points of interest too.  &lt;em&gt;When it comes to physical things, size matters.&lt;/em&gt;
&lt;/p&gt;
&lt;p&gt;
Physical objects differ from words in that respect.  It makes sense to care about the height of a mountain, the size of a portion of food, the length of a skirt, or the size of a vehicle.  It rarely makes sense to care about the size of a word, with word games being an exception.  
&lt;/p&gt;
&lt;div class="inpic"&gt;
&lt;img src="http://www.tincat-group.com/images/maxlength.gif" alt="Max Length in Game"  title="Max Length in Game" /&gt;
&lt;/div&gt;
&lt;p&gt;
You might recall that feeling when you want to play all seven letters in Scrabble, maybe even on a triple word score, but it doesn't fit.  The board is not long enough, or there are not enough open squares to play the word.  Rats!
&lt;/p&gt;
&lt;p&gt;
A similar game is often played by users of database management systems and related applications.  The word you want stored in the database is just a little too long for the data entry screen to handle or the &lt;acronym title="
DataBase Management System"&gt;DBMS&lt;/acronym&gt; to accept.  Some &lt;acronym title="
DataBase Management System"&gt;DBMS&lt;/acronym&gt; tools require or at least strongly encourage size constraints on data attribute values.  Others do not.  &lt;em&gt;To an &lt;acronym title="Structured Query Language"&gt;&lt;em&gt;SQL&lt;/em&gt;&lt;/acronym&gt;-&lt;acronym title="
DataBase Management System"&gt;&lt;em&gt;DBMS&lt;/em&gt;&lt;/acronym&gt;, size matters.&lt;/em&gt;
&lt;/p&gt;
&lt;p&gt;
Focusing on this one specific type of constraint, let's take a look at reasons a software development team might design using a maximum length constraint (maxlen) rather than permitting variable lengths for values of a specific attribute.  
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Input&lt;/strong&gt;
&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/computer-card.gif" alt="80 column card"  title="80 column card" /&gt;
&lt;/div&gt;
&lt;p&gt;Perhaps the input device and related software provides for limited space for input data.  The punch card is a good example.  While variable length attribute values could be encoded on punch cards using delimiters between values such as deBoer,Reta,piano, to ensure that a fixed list of attribute values for each person fits on a single card, we might design the musical instrument value to be in card columns 33 through 46.  The &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; for all values of that attribute would then be 14.
&lt;/p&gt;
&lt;p&gt;
Developers once had to design for input from &lt;dfn title="Using matrix rows x columns not screen res width x height"&gt;23 x 80 character screens&lt;/dfn&gt; too&amp;#8212;remember those?  Even with a fixed screen size, however, users could be given scrolling entry fields so entered data could be larger than what is shown on the screen at any given time.
&lt;/p&gt;
&lt;p&gt;We should consider whether we would want the restrictions of an input device to dictate database-level constraints.  We might also consider that today's typical screen entry or data exchange has no such issues.  While screen size is still a factor for designers, this technology requirement need not whittle down our original business requirements.  I feel comfortable relegating this reason for a &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; to the past.
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Output&lt;/strong&gt;
&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/greenbar.gif" alt="greenbar"  title="Greenbar" /&gt;
&lt;/div&gt;
&lt;p&gt;
I still have my nifty 132-column ruler for designing greenbar reports.  When we could fit only 132 characters in a fixed-length font on a single line of a report printed on &lt;dfn title="green and white striped paper, still used in the industry, although not nearly as popular as it once was."&gt;greenbar&lt;/dfn&gt;, we naturally cared about attribute sizes.  When designing paper forms such as payroll advices, it makes sense to care about the &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; for names compared to the number of available characters on the form.  The representation of the value of any particular attribute might be constrained by the design of a report.  Should we also constrain the actual database values and not just the representation of those values?  Perhaps there is a reason to do so in some cases, such as the &lt;em&gt;codes&lt;/em&gt; discussed next, but, in general, no.
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lookup Table&lt;/strong&gt;
&lt;p&gt;
When modeling data, we often swap out some terms within a proposition for succinct, consistent codes.  For example, start with a proposition such as:
&lt;/p&gt;
&lt;p&gt;
Jan (first) VanDoe (last) lives in Illinois (state).
&lt;/p&gt;
&lt;p&gt;
It is typical to model the predicate for this proposition with two base relations, Person(first,last,stateCode) and State(stateCode,stateName). The original predicate is then modeled with a view that joins these two giving PersonView(first,last,stateName).  In this example, a code of &lt;em&gt;IL&lt;/em&gt; would be in the Person relation, mapping to the name &lt;em&gt;Illinois&lt;/em&gt; in the State relation.
&lt;/p&gt;
&lt;p&gt;
This is a common pattern in data modeling.  In their &lt;a href="http://www.amazon.com/gp/product/0321293533/sr=8-1/qid=1154019683/ref=pd_bbs_1/103-3555016-3417451?ie=UTF8"&gt;&lt;em&gt;Refactoring Databases&lt;/em&gt; book&lt;/a&gt;, Ambler and Sadalage label a change of this nature to an existing schema as &lt;em&gt;Add Lookup Table&lt;/em&gt;.  So instead of calling this the &lt;em&gt;code file&lt;/em&gt; pattern, a term used back when we also talked about &lt;em&gt;master files&lt;/em&gt; and &lt;em&gt;transaction files&lt;/em&gt;, I'm OK with calling this the Lookup Table pattern, while still referring to the attribute (such as stateCode above) as a &lt;em&gt;code&lt;/em&gt;.
&lt;/p&gt;
&lt;p&gt;
Is there a good reason to constrain the length of codes?  Yes.  Lookup tables are all about constraining propositions to aid in conformity.  This helps with accuracy of captured data and ease in performing analysis, for example.  Placing a &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; on the code does not further constrain nor alter the original proposition derived from business requirements.  There are advantages to developers in fixing a &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt;, making such a design attractive in cases where it cannot adversely affect data integrity, such as truncated values, or unnecessarily restrict flexibility.  
&lt;/p&gt;
&lt;p&gt;
Now comes the question of who should care about the &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; on codes for lookup tables.  Have you ever seen a case where there is a requirement to change the length of a code?  If not, I've seen enough such cases for both of us.  In every case I can recall, the change was to a larger &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt;, although I can imagine the other scenario.  In the rare case of changing to a smaller &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; on a code, there must be a mapping from the longer attribute values to the new shorter &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt;, whether truncation or a fancier algorithm, and the existing database would likely need to be modified so it fits within the new constraints.  
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;There are DBMS products that tightly couple the logical constraints with the physical implementation&lt;/blockquote&gt;
&lt;p&gt;
In the much more common case of needing to upsize the &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt;, there is no similar requirement to change the existing database.  The new requirement is less restrictive than the former requirement, so the database is logically already in compliance with the new &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt;.  It might even be feasible to roll such a change in constraints out to users along with other maintainable business rules.  Unfortunately, there are &lt;acronym title="
DataBase Management System"&gt;&lt;em&gt;DBMS&lt;/em&gt;&lt;/acronym&gt; products that tightly couple the logical constraints with the physical implementation of the database.  Such products actually make computer resource allocations based on the logical &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; constraints!  
&lt;/p&gt;
&lt;p&gt;
In fact, every &lt;acronym title="Structured Query Language"&gt;&lt;em&gt;SQL&lt;/em&gt;&lt;/acronym&gt;-&lt;acronym title="
DataBase Management System"&gt;&lt;em&gt;DBMS&lt;/em&gt;&lt;/acronym&gt; product I have seen requires some maintenance activity on the physical database simply because of an increase to a &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt;, a relaxing of the constraint.  While there are products that have no such issue, I'm guessing that most of those pre-date the introduction of the relational model.  Coincidence?  [I have been corrected on this.  Apparently this is not the case for Oracle, so I would guess others as well.  I'm curious now what an average cost estimate might be for a requirement to increase a maxlen in a schema since I'm guessing estimates might have influenced my assumption.  &lt;dfn title="This might be a foreign language to some DBAs"&gt;Regression testing&lt;/dfn&gt; might be the biggest part of the estimate.] 
&lt;/p&gt;
&lt;p&gt;
Back to our codes where we want to design in a &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt;, where should this &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; be encoded?  While some might be inclined to specify this constraint to the &lt;acronym title="
DataBase Management System"&gt;&lt;em&gt;DBMS&lt;/em&gt;&lt;/acronym&gt;, you can see the danger in cases where such a &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; specification is used for the dual purpose of physical implementation design.  I recently saw a &lt;acronym title="Structured Query Language"&gt;&lt;em&gt;SQL&lt;/em&gt;&lt;/acronym&gt;-&lt;acronym title="
DataBase Management System"&gt;&lt;em&gt;DBMS&lt;/em&gt;&lt;/acronym&gt; schema example where every attribute was designated with a &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; of 50, with 50 being larger than the max logical &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; for all attributes.  Such an approach mitigates the issues with the tight coupling of logical and physical.  
&lt;/p&gt;
&lt;p&gt;
Where should the logical &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; be placed in a logical data model?  Given that all &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt; and web service validation routines need access to this constraint information, it can be made available by being captured as other business rules might be, in the database itself.  I suggest that the database proper (rather than the schema) is a good place for specifying all business constraints.  There must then be enforcement of the use of standard &lt;acronym title="Create, Read, Update, Delete"&gt;CRUD&lt;/acronym&gt; services that employ such constraint logic for all software maintaining the database.  This provides more flexibility for the business and I suggest the cost of ownership is lower than when any business rules are specified to the &lt;acronym title="
DataBase Management System"&gt;&lt;em&gt;DBMS&lt;/em&gt;&lt;/acronym&gt;.
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Processing&lt;/strong&gt;
&lt;blockquote class="pullquote"&gt;As with punch cards, size matters to COBOL&lt;/blockquote&gt;
&lt;p&gt;I'll toss in a short note that if software is being developed with one of the older languages for processing data that require &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; on variables (e.g. COBOL), then you are stuck specifying the &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; to your software components for the purpose of reading or writing data.  As with punch cards, size matters to COBOL.  As sweet as I suspect it can still be to write business applications in COBOL, I'm comfortable making this an historical footnote.
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Performance&lt;/strong&gt;
&lt;p&gt;
Some might say that decoupling logical and physical &lt;dfn title="maximum length constraints"&gt;maxlens&lt;/dfn&gt; could adversely affect performance.  If that is the case, I would suggest that the industry move to a data model that does not have this defect.  Yes, of course I know that the computer is a finite resource and there is a need for some information for performance tuning and optimization. But(3) it(2) is(2) not(3) essential(9) that(4) each(4) individual(10) attribute(9)...you get the point, right?
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Space&lt;/strong&gt;
&lt;p&gt;
Similarly, if the amount of disk or other resources is adversely affected by decoupling logical and physical constraints, I would suggest the industry move to a data model that does not have this defect.
&lt;/p&gt;
&lt;p&gt;
While the models I know that fit the bill do have other defects, including logical/physical coupling in other respects, they provide greater flexibility for those changes that companies are likely to make to their data over time.  That is because they take physical implementation information that is irrelevant to the user and the business requirements and require it of the logical model rather than the other way around.  For example, because such things as the order of stored attributes is irrelevant to the business user, there need not ever be a requirement to change that order in either the logical data model or the physical implementation.
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real Input or Output Business Requirement&lt;/strong&gt;
&lt;p&gt;
Changing the name, my cousin told me how she has let the relevant parties know that she absolutely does not want her hyphenated last name of Oostendorp-Holland to be truncated on documents, credit cards, etc as &lt;em&gt;Oostendorp-Ho&lt;/em&gt; again, pa-lease.  Given that she is &lt;dfn title="Yes, she is wealthy"&gt;an influential person&lt;/dfn&gt;, I'm guessing these parties are trying to comply, but it might be too costly for them to do so.  
&lt;/p&gt;
&lt;p&gt;
What should a company do when there is a business case for having a maximum length on a data value?  They should typically not ditch the unrestricted real value for an attribute, but add in another attribute for the representation of the actual value at a shorter length.  In this case, they could then keep the actual name value and also have a constrained value.  They could put Oo-Holland or Oostendorp or any number of other values into the constrained version.  This would only be used in an override situation (ignoring &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; issues with NULL handling here) where the original value as written or as truncated would not suffice.  Additionally, if the constraint is on a derived value, such as the first name plus the last name (which happens to be the case in the example cited), the revised full name, with associated &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt;, would be a separate attribute.
&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote class="pullquote"&gt;Relational theory does not speak to maxlen, but every implementation of the relational model leads developers to make extensive use of such.&lt;/blockquote&gt;
&lt;p&gt;We have seen some of the reasons why a designer might design in a &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt;, with related commentary on these.  Why focus so keenly on this one type of constraint?  I would venture that the application of the &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; in places where the proposition should have no such constraint, in addition to the tight coupling between the logical and physical &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt;, has cost the industry and individual organizations a bundle.  It is but one area related to constraint handling where we have been bleeding, but one about which I hear little mention&amp;#8212;and no noise, no fix, I'm guessing.
&lt;/p&gt;
&lt;p&gt;
Relational theory is predicate logic and set theory.  Relational theory does not speak to &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt;, but every implementation of the relational model leads developers to make extensive use of such.  It has been suggested that if &lt;dfn title="maximum length constraint"&gt;maxlen&lt;/dfn&gt; were removed from &lt;acronym title="relational model"&gt;RM&lt;/acronym&gt; implementations, the solution would be less optimal in some way.  While this has nothing to do with relational theory, it does seem to have something to do with existing relational models and implementations of such, particularly the &lt;acronym title="Structured Query Language"&gt;&lt;em&gt;SQL&lt;/em&gt;&lt;/acronym&gt;-&lt;acronym title="
DataBase Management System"&gt;&lt;em&gt;DBMS&lt;/em&gt;&lt;/acronym&gt;.  It might be the case that in order to resolve this, &lt;acronym title="
DataBase Management System"&gt;&lt;em&gt;DBMS&lt;/em&gt;&lt;/acronym&gt; implementations would be well-served to permit attribute order to be specified, or reject some other tenet of the relational model.  To whom should size matter?  Not the &lt;acronym title="
DataBase Management System"&gt;&lt;em&gt;DBMS&lt;/em&gt;&lt;/acronym&gt;.
&lt;/p&gt;

&lt;div class="inlinepic"&gt;
&lt;img src="http://www.tincat-group.com/images/vacation.gif" alt="McCall, Idaho"  title="McCall, Idaho" /&gt;
&lt;/div&gt;

&lt;a href="http://www.tincat-group.com/mewsings/2006/11/cowboys-with-promiscuous-databases.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-115401921366092046?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/115401921366092046/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=115401921366092046&amp;isPopup=true' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/115401921366092046'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/115401921366092046'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/07/to-whom-should-size-matter.html' title='To Whom Should Size Matter?'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-114869871623163205</id><published>2006-05-26T21:58:00.000-05:00</published><updated>2006-11-12T17:58:03.857-06:00</updated><title type='text'>Constraints Factored In and Out</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/tie.gif" alt="Neck Tie Constraint"  title="Neck Tie Constraint" /&gt;
&lt;/div&gt;
&lt;p&gt;
It's time to factor constraints into the equation.  A constraint is any component of software, including code, metadata, or data, designed to limit possibilities.  Constraints implement "business" rules such as &lt;dfn title="alas, it is true; heavy sigh"&gt;&lt;em&gt;only men may be elders in churches in Sioux Center, IA&lt;/em&gt;&lt;/dfn&gt;.  
&lt;/p&gt;
&lt;p&gt;
Every aspect of a software product is constraining.  Remove all constraints from software and you no longer have software.  While I will use the term &lt;em&gt;constraint&lt;/em&gt; with a broad definition, some use it only related to constraint specifications formed as propositions (logical predicates, to be precise).    Some prefer or even insist upon using a declarative, rather than &lt;acronym title="Object-Oriented"&gt;OO&lt;/acronym&gt; or procedural, language for constraints.  While I see the charm in that approach, I will be almost programming-language-agnostic in this mewsing.
&lt;/p&gt;
&lt;p&gt;
From a theory perspective, we can model both data and constraints as propositions, then use predicate logic to &lt;em&gt;&lt;strong&gt;and&lt;/strong&gt;&lt;/em&gt; these propositions.  For example, if we are defining a person &lt;strong&gt;P&lt;/strong&gt; to the computer, and we are collecting a birthdate for that person, we might refine our definition of a person using a constraint &lt;strong&gt;Q&lt;/strong&gt; that specifies a living person's age to be &amp;lt;= 140.  To validate our data, we can then ask the question &lt;strong&gt;P &lt;dfn title="and"&gt;^&lt;/dfn&gt; Q?&lt;/strong&gt;  
&lt;/p&gt;
&lt;p&gt;
An aside: Predicate logic is relevant to the choice of data models.  &lt;em&gt;Simple propositions&lt;/em&gt; (e.g. those without lists) require only first order predicate logic, which makes for a simpler theory than if using higher order logic.  However, it is simpler for &lt;dfn title="And it's all about me, Al Franken"&gt;me&lt;/dfn&gt; to work with property lists, for example, than to normalize as described in &lt;a href="http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html"&gt;Is Codd Dead?&lt;/a&gt;.  We will have to bring &lt;a href="http://en.wikipedia.org/wiki/Occam's_Razor"&gt;Occam&lt;/a&gt; in for that debate at some point so he can tell us whether he is relevant to this discussion or not.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;A sculptor wields the chisel, and the stricken marble grows to beauty. &lt;font size="-2"&gt;WILLIAM CULLEN BRYANT&lt;/font&gt;&lt;/blockquote&gt;
&lt;p&gt;
No matter how a software product implements one constraint or another, there are two parts to any software constraint: a specification and a service.  A &lt;strong&gt;constraint specification&lt;/strong&gt; is developed using a computer language, whether a general purpose language such as Java, a database sublanguage such as &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;, a declarative rules language such as &lt;acronym title="Object Constraint Language"&gt;OCL&lt;/acronym&gt;, or a homegrown or vendor-supplied proprietary language, perhaps specified using &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; documents.  
&lt;/p&gt;
&lt;p&gt;
A full range of acronyms could also be relevant to the development of a &lt;strong&gt;constraint service&lt;/strong&gt; that applies a constraint and returns a result.  The constraint service reads in the constraint as part of the input (Q above) along with whatever needs to be verified against the constraint (P) and performs the test (P ^ Q?), taking appropriate action based on the test result.
&lt;/p&gt;
&lt;blockquote&gt;Note re terminology, feel free to skip: Constraint services are often referred to as validation services, which is somewhat narrower.  To clarify, I'm trying to use the term &lt;em&gt;rules&lt;/em&gt; for the conceptual realm, the analysis aspects of the project, with &lt;em&gt;constraints&lt;/em&gt; as the implementation of those rules in the software.  Some rules' implementations, aka &lt;em&gt;constraints&lt;/em&gt; by this terminology, do not constrain but assist, perhaps providing a suggestion, for example.  The rest of the software to implement a constraint, other than the specification is what I am referring to as a &lt;em&gt;constraint service&lt;/em&gt;, even in these cases when it does not constrain the user or the data.  So, it might be better to skip the term &lt;em&gt;constraint&lt;/em&gt; altogether and speak only of &lt;em&gt;rules&lt;/em&gt;.  Clear as mud?  For the examples here, consider all mention of constraint services to be validation services even though I am using a broader term.
&lt;/blockquote&gt;

&lt;blockquote class="pullquote"&gt;There are two parts to any constraint: a specification and a service. &lt;/blockquote&gt; 
&lt;p&gt;Sometimes the specification and service are interwoven, tightly coupled.  If a user enters a birthdate, there might be code similar to &lt;em&gt;age = (&lt;dfn title="The exact algorithm depends on how the language handles dates"&gt;today - birthdate)/365&lt;/dfn&gt;; if age &amp;gt; 140 &lt;dfn title="I used 'show' hoping that would make it clear this is pseudocode"&gt;show&lt;/dfn&gt; errorMessage&lt;/em&gt;.  Alternatively, there could be a constraint similar to &lt;em&gt;(today - birthdate)/365 &amp;lt;= 140&lt;/em&gt; specified as input to a constraint service, or &lt;em&gt;rules engine&lt;/em&gt;.  The latter is typically the case when contraints are applied by a &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; while the former is often the case when a constraint is applied to user input.  Implementation of the same constraint is often partitioned differently in different components of a single software product.
&lt;/p&gt;
&lt;p&gt;
End-users, rather than developers, might maintain a constraint.  We could give our users a means of changing the male-only elders constraint, for example.  This could be implemented by something as simple as a system-wide WOMEN_ELDERS Yes/No flag or as complex as a general engine that interprets specifications such as &lt;em&gt;if Person's gender = 'F' then mayBeElder = false&lt;/em&gt;.  When an end-user might need to change a constraint, the specification is &lt;em&gt;data&lt;/em&gt;, even if it is also &lt;em&gt;code&lt;/em&gt;, often stored in a database.
&lt;/p&gt;
&lt;p&gt;
Moving along, have you seen the following?  Web-based software that includes constraints 1) coded in JavaScript to get that quick response time when verifying entered data in the browser environment; these constraints are also 2) specified and applied in a language associated with the web or app server; additionally, such constraints are 3) specified to and applied by the &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; on a database server.  There are three separate specifications and three separate services for the same constraint.  For example, our &lt;em&gt;age &amp;lt;= 140&lt;/em&gt; constraint might be specified using the JavaScript language, using a &lt;acronym title="Object Constraint Language"&gt;OCL&lt;/acronym&gt; in &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt;, and using &lt;acronym title="Structure Query Language"&gt;SQL&lt;/acronym&gt;.  The associated constraint services might be coded in JavaScript and Java with the third being a component of a proprietary &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt;.  Wow!  All that and we are able to ensure that our software only models people who are not older than 140 years old.  
&lt;/p&gt;
&lt;p&gt;
I hesitate to mention that the birthdate might have been entered by someone working for another company whose software already validated this data up and down before passing it to your web service so you can do likewise, perhaps before passing it on.     We verify, verify, then verify again to &lt;em&gt;ensure&lt;/em&gt; quality data, of course, provided the date was entered correctly and no one (in Hollywood) lies about their age.  Perhaps this is another example of measuring with a micrometer and cutting with an axe.
&lt;/p&gt;
&lt;p&gt;
This approach to constraints results in bloated software with high maintenance costs.  It is a sorry state, indeed, but I'll admit that I take a small amount of delight in the irony that the database world claims concern about redundant data while being a primary player in the repeated specification of the same constraint. 
&lt;/p&gt;
&lt;div class="inlinepic"&gt;
&lt;img src="http://www.tincat-group.com/images/factoring2.gif" alt="Factoring out Constraint Management"  title="Factoring out Constraint Management" /&gt;
&lt;/div&gt;
&lt;p&gt;
Why don't we validate our data once, so we avoid the triple-specification and triple-code for services situation?  The simple answer is &lt;em&gt;trust&lt;/em&gt;.  Given JavaScript and a web browser UI as implemented today (where a user could use &lt;a href="http://greasemonkey.mozdev.org/"&gt;Greasemonkey&lt;/a&gt; to change data after it has been validated, for example), a middle tier cannot trust that data verified in the browser is the same data it sees.  While there could be more trust between application code in a middle tier and the &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt;, there is no automated mechanism for certifying an application so that a &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; could have a security feature that is able to accept data from a trusted source.
&lt;/p&gt;
&lt;p&gt;
Without thinking about the possibility of certificates and signatures for each piece of data that has been verified, we are not going to stop the redundancy of repeatedly performing the verification. There is potential to eliminate the &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; verification if the application software is owned by the same organization and adequate quality assurance is done for the &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; to accept the validations of the middle tier (and why, pray tell, wouldn't that be the case?)  But in order to keep some folks happy, let's just say that all of these validations must transpire.  It surely is not also required that each constraint specification and the code for each service be written in different languages, right?
&lt;/p&gt;
&lt;p&gt;
In designing software, we partition our solution, modeling using metaphors and related implementations for such things as objects, services, functions, or sets.  We factor or refactor our software solutions so that we pull out frequently-used components for reuse.  If we have encoded a constraint such as &lt;em&gt;age &amp;lt;= 140&lt;/em&gt;, we could, in theory, reuse both the specification and the validation service wherever it is needed.  Deciding what to factor out in the overall scope of a software project, how to partition our software, is part of the software design process.
&lt;/p&gt;
&lt;p&gt;
What keeps us from factoring out constraint specifications and constraint services?  If a &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;-&lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; tool is part of the software solution, constraints might be encoded in the database schema.  With most &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; tools it is not feasible, whether for performance or other reasons, to reuse the specification and constraint services of the &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; throughout related applications.  Even if it were, it is unlikely that all relevant constraints would or should be implemented there.
&lt;/p&gt;
&lt;p&gt;
Some organizations choose to put a minimum number of constraints in the &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt;.  I hate to say it outloud as I know it is not a popular opinion, but I favor restricting the &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; schema to a bare minimum of constraints any time the same organization that owns the database also controls the applications that update it.  I have yet to see a case where one organization permits another to write directly to its databases (although there might be such), so this is pretty much always my recommendation.  
&lt;/p&gt;
&lt;p&gt;
Back-end constraint services can be packaged with the organization's &lt;acronym title="Create, Read, Update, Delete"&gt;CRUD&lt;/acronym&gt; services used by all applications.  The database validations and related manipulations can then use the same constraints as the applications.  While working with database management systems lacking even foreign key constraints, it was once troubling how much cost savings there seemed to be using that development environment.  I would never have guessed then that I would end up recommending such a strategy.  Now I see that these constraints were still in the overall solution, even if lacking in the &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; schema.
&lt;/p&gt;
&lt;p&gt;
On the &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt; front, if Javascript is required, it could potentially be generated from app server languages as is done with the new Google Web Toolkit for &lt;acronym title="Asynchronous JavaScript And XML"&gt;AJAX&lt;/acronym&gt; development and Ruby on Rails &lt;acronym title="Asynchronous JavaScript And XML"&gt;AJAX&lt;/acronym&gt; efforts, although debugging such generated code could be very unpleasant.  I would prefer that validations for the &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt; be performed in the middle tier, with JavaScript using &lt;acronym title="Asynchronous JavaScript And XML"&gt;AJAX&lt;/acronym&gt; for asynchronous validations.  When the constraint directly affects the &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt; widget, such as permitting a selection from a drop-down list, JavaScript will still need to get such constraint data to populate the &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt;, but it need not duplicate it.
&lt;/p&gt;
&lt;p&gt;
We could certainly get closer to having a single specification for a constraint and a single service that validates based on the constraint than what is often the case in large software development efforts.  Is it time to refactor our solutions so as not to lock constraint specifications and related services within the &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; schema?  Factoring constraints into our software design might just mean factoring them out.
&lt;/p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/07/to-whom-should-size-matter.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-114869871623163205?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/114869871623163205/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=114869871623163205&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114869871623163205'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114869871623163205'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/05/constraints-factored-in-and-out.html' title='Constraints Factored In and Out'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-114609206296532978</id><published>2006-04-26T17:54:00.000-05:00</published><updated>2006-11-12T17:58:03.457-06:00</updated><title type='text'>Surf and Turf Reporting</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/surf-turf.gif" alt="Surf and Turf"  title="Surf and Turf" /&gt;
&lt;/div&gt;
&lt;p&gt;This is a continuation of the previous mewsing, &lt;a href="http://www.tincat-group.com/mewsings/2006/04/consuming-frozen-data.html"&gt;Consuming Frozen Data&lt;/a&gt;.  &lt;dfn title="Or, if there isn't, there should be"&gt;There is a new wave&lt;/dfn&gt; of surf and turf reporting in the business intelligence community.  
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;3. Surf and Turf Reporting&lt;/li&gt;
&lt;p&gt;Our final option is a blend of live, fluid data (surf) and extracted, fixed data (turf).  Some of the advantages of reporting in the same environment as your production &lt;acronym title="On-Line Transaction Processing"&gt;OLTP&lt;/acronym&gt; system and some of the advantages of reporting against data marts or extracted data are present with this option.  Additionally, any new data (or data that were not originally extracted) is automatically accessible for reporting in combination with data that has been extracted.  
&lt;/p&gt;
&lt;p&gt;
The surf and turf strategy lays an &lt;dfn title="Please don't groan, I still like this term"&gt;&lt;em&gt;agile&lt;/em&gt;&lt;/dfn&gt; foundation for analytical reporting.  I am a fan of this approach; however, the pros will also be able to determine the cons of various options once we dive into the descriptions.
&lt;/p&gt;
&lt;p&gt;
It should come as no shock to those of you who have been reading these mewsings that &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; and MultiValue database environments have different approaches to surf and turf.  Fortunately, each permits the user to perform steps that somewhat simulate the approach of the other.  Users often stick to the first approach they find that works, however.  So the first of the below descriptions is what &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; developers are more inclined to do and the second is a technique &lt;dfn title="MultiValue developers"&gt;Pickies&lt;/dfn&gt; often employ. 
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Materialized Views
&lt;p&gt;
An &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; &lt;em&gt;View&lt;/em&gt; can be defined as a stored &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; query.  A &lt;em&gt;materialized View&lt;/em&gt; is the persisted result set of the query.  For those cases where an &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; View can be used to specify the data to be extracted from a live system, running the appropriate commands to create a table based on this view results in a &lt;em&gt;materialized view&lt;/em&gt;.  Most Pick environments, such as &lt;acronym title="IBM UniData and UniVerse"&gt;U2&lt;/acronym&gt;, provide a means to accomplish a similar feat without use of &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;.  The materialized view may then be used as any other base table in a schema. 
&lt;/p&gt;
&lt;p&gt;
Using this approach, a college or university can extract registered student data without including any accounts receivable information.  Then if someone decides to identify clusters of students who failed to pay in full at the time of registration over the past ten years, the extracted data could be joined to the live accounts receivable data, including historical transactions, to perform this analysis.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;We don't need to measure with a micrometer before we cut with an axe.&lt;/blockquote&gt; 
&lt;p&gt;
Reporting is done in support of decision-making.  If a decision requires that only data frozen at a particular point in time be used as input, then that decision and related reporting must happen only after there has been a project to identify and extract the required data.  There is a cost to this which is sometimes justified.  We, as a profession, want to provide the very best data possible for decision-making, but we must also be business-savvy in our cost-benefit analyses.  
&lt;/p&gt;
&lt;p&gt;
Very often a small subset of the data required for a particular decision must come from data frozen at a specific point in time, while additional live data contributes a &lt;em&gt;sufficient&lt;/em&gt; approximation to make an informed decision.  There is a lot of data that rarely or never changes.  If it has changed since the time of the extract, it might even be a correction.  There is a time to cringe at the words &lt;em&gt;good enough&lt;/em&gt; and a time to recognize that quality includes fiscal stewardship.  We don't need to measure with a micrometer before we cut with an axe. 
&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Savedlists
&lt;p&gt;
&lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; developers and end-users alike often decouple the restriction process, narrowing down the input for a report to a fixed set of primary keys, for example, from the projection process which yields a set of attributes related to the identified keys.  In Pick, the restriction process is called the &lt;em&gt;selection&lt;/em&gt;.  While these two processes can be handled together in the query language, Pick doesn't work with result sets, in general, but with select lists &lt;acronym title="also known as"&gt;a.k.a.&lt;/acronym&gt; &lt;em&gt;savedlists&lt;/em&gt;, the result of restricting the data to a set of &lt;em&gt;records&lt;/em&gt;.  These savedlists typically store the keys to the entity (file) of interest.  For example, a school might keep a savedlist with the student IDs for all students registered for the fall of 2005 and another for the spring of 2006.
&lt;/p&gt;
&lt;p&gt;
These savedlists are, effectively, materialized views of primary key data.  It is similar to a snapshot in time of an index on a subset of rows.  After executing a SELECT statement, a SAVE-LIST or SAVE.LIST command will save the resulting keys under the name provided.  Then after retrieving a savedlist (GET.LIST or GET-LIST), the next command or query is executed only against that subset of keys.  While selection criteria is not always saved as it could otherwise be acted upon immediately, when it is stored in a savedlist we have a unique minimalist Pick twist on surf and turf reporting. 
&lt;/p&gt;
&lt;p&gt;
While materialized views are not always heavily used in &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; shops, SAVE-LIST and GET-LIST are bread and butter for &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; shops.  It greatly improves performance if you already have a savedlist with which to start, much as an index does when reporting on live data.  Unlike an index, the savedlists are not updated once they have been built. They are minimalist data marts used to run a set of reports against a consistant set of IDs.  
&lt;/p&gt;
&lt;p&gt;
I have seen colleges use only savedlists for their census data extracts, saving these while running their census reports using option 1 from the &lt;a href="http://www.tincat-group.com/mewsings/2006/04/consuming-frozen-data.html"&gt;last mewsing&lt;/a&gt;.  If any other questions come up related to this census, they consider the live data, including historical records, adequate for the related decision-making.  I'll admit that I would advise against this approach for census data, as I think the benefits outweigh the costs for saving some of the important attribute values at the time of the census.  In this case a materialized view would likely be a better strategy.
&lt;/p&gt;
&lt;p&gt;Aside from census data, a registrar's office could create a fresh savedlist of registered students on a nightly basis, and other departments on campus could use that savedlist for their work too.  No one other than the registrar's office would then need to understand the criteria for determining who, precisely, is registered for this term.  One advantage to using savedlists rather than full data marts is that these savedlists can be fed to update processes as well as reporting processes.  Depending on the application software, a savedlist could be entered into a process that would set the graduation date for everyone stored in a savedlist, for example.  
&lt;/p&gt;
&lt;p&gt;
Another advantage of savedlists is that often end-users can decide what they want to put in a savedlist.  Of course there are different approaches to security in different shops, but in places where end-users have such access, these power users are empowered with respect to their data.  Without actually creating new tables or files (savedlists are handled differently) and without writing data extraction procedures, MV end-users are often using a surf and turf strategy with their data from &lt;dfn title="day ta"&gt;day to&lt;/dfn&gt; day.
&lt;/li&gt;
&lt;li&gt;Materialized Attributes
&lt;p&gt;
With a current industry focus on operational data stores which permit queries against current or near current data values, there is a return to thinking about how to do reporting against your production data banks without extracting data.  Although such redundancy was once considered poor form, storing data derived from other data could be just the ticket to improved reporting performance.  Aggregates and other derivations are costly when they must be performed on a large data set all for one report.  If they can be computed in the background whenever there is a change that would prompt such, the added cost of integrity for the redundant data might be more than balanced by the savings in reporting.  Performance against the live data can then be as good as from a data mart.
&lt;/p&gt;
&lt;p&gt;
For example, if we materialize the attribute CURRENT_GPA for each student, we can run reports that require this data without having to read through all courses for all terms for each student in order to compute the GPA on the fly.  A policy of no redundant data, held tightly in many shops, is one factor prompting huge redundancies in the form of data marts and warehouses.  Yes, redundancy does have a cost, and a few well-chosen materialized attributes could save you from the more excessive redundancies found with extracted data. 
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Surf and turf is likely easier to pull off successfully if not constrained by SQL&lt;/blockquote&gt; 
&lt;p&gt;
It surely isn't a new technique, but there is a &lt;dfn title="at least I have a revived interest in it"&gt;revived interest&lt;/dfn&gt; in strategies used to materialize (compute and store) derived attributes.  A colleague pointed me to Sybase as an example of a product that has tools for materializing derived data.  I browsed the information on it without trying it out (lazy or busy, you can choose), and I'm impressed!  With other products developers can get the value of a user-defined function, virtual field, or stored procedure and store it using application code.  They might do this with triggers or CRUD services that adjust the materialized values when the underlying values change.  
&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p&gt;
In case you are wondering where this is headed in relation to the overall topic of these mewsings, I'll give a hint, without yet providing any arguments.  In short, a surf and turf strategy can result in significant savings, but is likely easier to pull off successfully, with smiling end-users, if you are not constrained by a &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;-based environment.  
&lt;/p&gt;
&lt;p&gt;
Getting back to today's topic, I have often been a bit queasy about the added costs vs. benefits for data marts and warehouses.  Each situation is different, but in many cases, this added cost can be reduced significantly by doing surf and turf reporting.
&lt;/p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/05/constraints-factored-in-and-out.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-114609206296532978?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/114609206296532978/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=114609206296532978&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114609206296532978'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114609206296532978'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/04/surf-and-turf-reporting.html' title='Surf and Turf Reporting'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-114554097433996903</id><published>2006-04-20T08:49:00.000-05:00</published><updated>2006-11-12T17:58:02.899-06:00</updated><title type='text'>Consuming Frozen Data</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/frozen-data.gif" alt="Frozen data beverage"  title="Frozen Data Beverage" /&gt;
&lt;/div&gt;
&lt;p&gt;Organizations often freeze data for reporting purposes.  By the way, before it was called &lt;em&gt;business intelligence&lt;/em&gt; (BI), we called all of it reporting, which I am still inclined to do.    
&lt;/p&gt;
&lt;p&gt;
Higher education often freezes data on designated &lt;em&gt;census&lt;/em&gt; dates to get an account of the students who are registered for a particular term.  Government, registrar, and board reports are run against &lt;dfn title="Yes, it should be 'these data'"&gt;this data&lt;/dfn&gt;, with longitudinal analysis showing changes from term to term and year to year.  Cross-tabs of faculty broken down by sex (a favorite example until &lt;em&gt;gender&lt;/em&gt; became the preferred term) and reports of students sliced and diced every which way are run against the census data.
&lt;/p&gt;
&lt;p&gt;
I have observed many approaches to such reporting and will present three distinct methods, two in this entry and one in the next, that are currently employed by organizations.  Many colleges and universities employ more than one of these approaches for their census reporting.
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1. Keep Out Reporting&lt;/li&gt;
&lt;p&gt;
This is the basic old fashioned approach of ceasing all maintenance of registration data, rippling through the campus so a considerable amount of data entry is halted for the time it takes to run reports.  Reports are often run at night to minimize disruption.
&lt;/p&gt;
&lt;p&gt;
Sites recognize that they sometimes need to re-run the reports, perhaps after repairing some data that has not been properly cleansed for these reports. As a slight digression, I'll mention that the preparation of reports often includes specifying or coding the report in connection with cleansing the data.  Derived data prepared specifically for a particular report might be used to virtually cleanse or tailor the data, for example.  Data and code dance together, as is the case with software in general.  
&lt;/p&gt;
&lt;p&gt;
One advantage to freezing the data in this way for the purpose of running reports is that these exact same reports can be run at any time against the live data.  This is helpful when preparing for the census.  Prior to freezing the data, the reports can be run to ensure that data values, derived data, and reports are exactly what is needed for this census.
&lt;/p&gt;
&lt;p&gt;
While these reports will end up on web pages, as &lt;acronym title="Adobe Portable Document Format"&gt;PDF&lt;/acronym&gt; files, or even &lt;dfn title="Computer printer paper with green and white horizontal stripes."&gt;greenbar&lt;/dfn&gt;, sometimes output from the reports is also captured as data.  This data might be &lt;dfn title="I have even seen it retyped into Excel!"&gt;downloaded to Excel&lt;/dfn&gt;, for example, for future use.  That brings us to the next topic.
&lt;/p&gt;
&lt;li&gt;2. Data marts&lt;/li&gt;
&lt;p&gt;Before we had &lt;em&gt;data marts&lt;/em&gt; we had &lt;em&gt;extracts&lt;/em&gt;.  I'll make some not-altogether-standard distinctions between these two terms.  A data mart might be hosted by a &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; other than the one used by the &lt;acronym title="On-Line Transaction Processing"&gt;OLTP&lt;/acronym&gt; system, where extracts were more often hosted by the same &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt;, by the file system of the host &lt;acronym title="Operating System"&gt;OS&lt;/acronym&gt;, or downloaded to client software, such as Excel.  Some systems employing MS SQL Server have &lt;em&gt;data marts&lt;/em&gt; hosted by Oracle and vice versa.  These would not typically be called &lt;em&gt;extracts&lt;/em&gt;.
&lt;/p&gt;
&lt;p&gt;
Another possible distinction is the transformation of the data, sometimes denormalizing or shaping into fact-dimension cubes for the data mart.  With an extract, there might be a simple transformation to drop attributes that are unnecessary for reporting purposes.  When the data is not reshaped at all, the extract is referred to as a &lt;em&gt;snapshot&lt;/em&gt;.  Often for data marts, the &lt;a href="http://www.tincat-group.com/mewsings/2006/01/naked-model.html"&gt;logical data model&lt;/a&gt; is changed considerably.  
&lt;/p&gt;
&lt;p&gt;
Additionally, an extract comes from a single data source, where a data mart could be populated from multiple sources.  There is sometimes a very complex &lt;acronym title="Extract, Transform, and Load"&gt;ETL&lt;/acronym&gt; process for extracting, transforming, and loading data for a data mart, where you are more likely to talk about running the extract.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;As a rule of thumb, the closer a data mart resembles an extract, the less expensive it is.&lt;/blockquote&gt; 
&lt;p&gt;
The biggest distinction just might be whether or not an organization sinks non-personnel dollars into data mart products.  If you buy tools, you have invested in data marts.  If not, you must have extracts.  
&lt;/p&gt;
&lt;p&gt;
Now, I could mention data warehouses as a separate bullet point because they are decidedly different.  But I have rarely recommended anything I call a warehouse due to the required expansive scope of such a project.  I would rather plan for a series of data mart projects that could, over time, be perceived as a data warehouse.  But, yes, there are some warehouses out there in higher ed too.  In any of these cases, the verb is &lt;em&gt;extract&lt;/em&gt; whether the target is an &lt;em&gt;extract&lt;/em&gt;, &lt;em&gt;data mart&lt;/em&gt;, or &lt;em&gt;data warehouse&lt;/em&gt;.
&lt;/p&gt;
&lt;p&gt;
Benefits of extracting data include the ability to run reports against the same set of census data from now on, typically without an adverse effect on the transaction system.  Performance is one of the big reasons people extract data.  The use of different reporting tools is another.  Users of reporting tools such as SAS often like to have the data in a SAS data set, for example.  
&lt;/p&gt;
&lt;p&gt;
Data marts are good for longitudinal analysis.  If you do not freeze your census data, you are unlikely to report the exact same figures in the future, making it hard to compare this year to last.  Additionally, you can model the data to optimize for reporting and aggregating data--cubes, for example.
&lt;/p&gt;
&lt;p&gt;
Data mart projects can range from reasonably priced to grossly expensive.  Techniques  to help minimize the cost include hosting the data mart using the same &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; as the transaction system and even retaining the structure of the data for the data mart.  This can permit the same reporting tools and even the same reports to be run against either live or frozen data.  As a rule of thumb, the closer a data mart resembles an extract, the less expensive it is.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;If systems using a non-relational data model were a standard data source, we could have bigger bang for the buck reporting solutions.&lt;/blockquote&gt; 
&lt;p&gt;
Here's the rub.  This is problematic with both &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; and not-highly-relational, such as MultiValue, databases.  It is rarely advisable for a &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;-&lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; system to retain the data model of the transaction system for the purpose of analytical reporting because the relational model of data is not conducive to high performance, easy-to-specify analytical reporting.  With &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; databases, the data are typically structured to be able to use the same shape for both &lt;acronym title="On-Line Transaction Processing"&gt;OLTP&lt;/acronym&gt; and &lt;acronym title="On-Line Analytical Processing"&gt;OLAP&lt;/acronym&gt;, but you cannot specify such databases as a data source from the full range of reporting tools.  In this case, data marts are often reshaped and rehosted in order to make the data more easily accessible from tools that work exclusively with &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; data sources.  
&lt;/p&gt;
&lt;p&gt;
Can you sense my frustration?  As an industry, we have adopted standards that trap us into all of the high-wire acts and costs we are sinking into reporting.  Getting reports out of our systems was once the easy part!  From a database perspective, it is read-only, for goodness sake.  One thing that changed between then and now is the proliferation of &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; and the relational model.  If systems using a non-relational data model were a standard data source, we could have bigger bang for the buck reporting solutions.  We could run reports against extracts hosted in the same shape and &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; as the source data.  I have seen the benefits for organizations using &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; databases combined with one of the handful of &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt;-specific reporting solutions.  
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;Enter: &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt;.&lt;/strong&gt;  The &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; data model is, effectively, a subset of the &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; data model.  Any query language that reports against multiple &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; documents could, in theory, be pointed at any &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; database, provided the database vendor accomodated such.  But I gotta say, &lt;dfn title="The emerging standard for XML queries"&gt;XQuery&lt;/dfn&gt; is &lt;dfn title="Monte, our Bischon, was a beautiful dog"&gt;dog-ugly&lt;/dfn&gt; compared to the language formerly known as &lt;acronym title="Generalized Information Retrieval Language &amp;amp; System"&gt;&lt;a href="http://www.tincat-group.com/mewsings/2006/02/list-of-girls.html"&gt;GIRLS&lt;/a&gt;&lt;/acronym&gt;.  It does work with multivalued data and employs a two-valued logic, however, so I'm encouraged by that.  If an &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; database provider with adequate resources were on top of this situation, they could help standardize a simplified, but proven, subset of &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; for persistence, with both &lt;acronym title="On-Line Transaction Processing"&gt;OLTP&lt;/acronym&gt; and &lt;acronym title="On-Line Analytical Processing"&gt;OLAP&lt;/acronym&gt; capabilities.
&lt;/p&gt;
&lt;/ul&gt;
&lt;p&gt;
We have now addressed the approaches of reporting against live data and against data that has been extracted for reporting.  I'm hoping that you are thinking that these two approaches cover the mix, because they don't.  We will look at a third approach to consuming frozen data in the next mewsing.
&lt;/p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/04/surf-and-turf-reporting.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-114554097433996903?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/114554097433996903/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=114554097433996903&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114554097433996903'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114554097433996903'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/04/consuming-frozen-data.html' title='Consuming Frozen Data'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-114480995303734491</id><published>2006-04-11T21:45:00.000-05:00</published><updated>2006-11-12T17:58:02.497-06:00</updated><title type='text'>Pascal Loses Wager</title><content type='html'>&lt;p&gt;
I'm on the road right now and stopped in at Sylvia's for the night.  Sylvia decided to google me this afternoon instead of cleaning the guest bathroom.  She was laughing when I arrived after having read how I was &lt;em&gt;infamous&lt;/em&gt;.  She wondered how I had offended this Pascal character after taking at look at &lt;a href="http://www.dbdebunk.com/page/page/2883305.htm"&gt;this&lt;/a&gt; and &lt;a href="http://www.dbdebunk.com/page/page/3081548.htm"&gt;this&lt;/a&gt;. In case these links change on the dbdebunk site, I'll just note that they basically claim that I am stupid, ignorant, and vociferous.
&lt;/p&gt;
&lt;p&gt;
The following is a &lt;a href="http://en.wikipedia.org/wiki/Rebus"&gt;rebus&lt;/a&gt; response to Mr. Pascal's ad hominem attacks.  Hover over pictures to get the word if you don't want to work the puzzle out for yourself.  I hope you enjoy, but if you find it sophomoric, well, you should meet Sylvia.
&lt;/p&gt;
&lt;img src="http://www.tincat-group.com/images/fpletter/deer.jpg" alt="Deer"  title="Deer" /&gt;
&amp;nbsp; &amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/fabian.jpg" alt="Fabian"  title="Fabian" /&gt;
&lt;img src="http://www.tincat-group.com/images/fpletter/pascal.jpg" alt="Pascal"  title="Pascal" /&gt;
&amp;nbsp; &lt;big&gt;,&lt;/big&gt;
&lt;p /&gt;
&lt;p /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/ewe.jpg" alt="Ewe"  title="Ewe" /&gt;
&amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/argh.jpg" alt="Argh"  title="Argh" /&gt;
&amp;nbsp; an &amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/ass.jpg" alt="Ass"  title="Ass" /&gt; &amp;nbsp; &lt;big&gt;.&lt;/big&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;img src="http://www.tincat-group.com/images/fpletter/oui.jpg" alt="Oui"  title="Oui" /&gt;
&amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/awl.jpg" alt="Awl"  title="Awl" /&gt;
&amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/love.jpg" alt="Love"  title="Love" /&gt;
&amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/cod.jpg" alt="Cod"  title="Cod" /&gt; &lt;big&gt;.&lt;/big&gt;
&lt;p /&gt;
&lt;p /&gt;
&lt;img src="http://www.tincat-group.com/images/fpletter/time.jpg" alt="Time"  title="Time" /&gt;
&amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/2.jpg" alt="2"  title="2" /&gt;
&amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/move.jpg" alt="Move"  title="Move" /&gt;
&amp;nbsp; &lt;big&gt;a&lt;/big&gt; &amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/head.jpg" alt="Head"  title="Head" /&gt;
&lt;p /&gt;
&lt;p /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/love.jpg" alt="Love"  title="Love" /&gt; &amp;nbsp; &lt;big&gt;,&lt;/big&gt;
&lt;p /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;
&lt;img src="http://www.tincat-group.com/images/fpletter/dawn.jpg" alt="Dawn"  title="Dawn" /&gt;
&lt;p&gt;
Note: I wrote this a few weeks ago.  I requested input from a few folks regarding this blog.  Roughly half thought it was great and the other half thought it was a bad idea.  I agree with others who have been targeted by Pascal that he and his approach to others in the industry should not be left unaddressed.  I've decided to take the risk and post this because it fits my style to respond, but not respond in kind.  After all, I might be ignorant of many things, but I'm not stupid.
&lt;/p&gt;
&lt;p id="seecomments"&gt;&lt;a href="http://www.tincat-group.com/mewsings/2006/04/pascal-loses-wager.html#seecomments"&gt;See comments.&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-114480995303734491?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/114480995303734491/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=114480995303734491&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114480995303734491'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114480995303734491'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/04/pascal-loses-wager.html' title='Pascal Loses Wager'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-114364558109360198</id><published>2006-03-29T09:19:00.000-06:00</published><updated>2006-11-12T17:58:02.086-06:00</updated><title type='text'>Better to Have No Values</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/no-value.gif" alt="Mailbox, empty?"  title="Mailbox, empty?" /&gt;
&lt;/div&gt;
&lt;p&gt;The topic at hand is NULLS.  Some professionals think the best option for recording missing data is to use a NULL to mean &lt;em&gt;no value&lt;/em&gt;, an approach implemented by many database management systems.  &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; then recognizes missing data when it reads a marker referred to as a NULL.  Working with missing data requires both of these components: how one specifies the missing information to the database and how the database languages work with the missing information.
&lt;/p&gt;
&lt;p&gt;
With the &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; approach where NULL does not refer to a value, but to the lack of a value, one attribute tagged as NULL is not equal to another attribute tagged as NULL.  When you compare two values, x and y, if either of these is NULL, then your comparison is neither true nor false, but a third logical value of &lt;em&gt;I dunno&lt;/em&gt; or UNKNOWN.
&lt;/p&gt;
&lt;p&gt;
In other words, &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; employs a three-valued logic (3VL).  There is some good reasoning behind this.  Before we turn &lt;a href="http://www.tincat-group.com/mewsings/2006/03/with-data-modeling-whats-your-bag.html"&gt;again to bowling scores&lt;/a&gt;, I'll use a very simple example.  If my natural hair color is recorded as missing information and Sylvia's natural hair color is also missing, then surely we do not have enough information to say either that my natural hair color matches Sylvia's, nor that it doesn't.  It's a mystery.  So, if we evaluate the expression &lt;em&gt;Sylvia has the same natural hair color as Dawn&lt;/em&gt; we don't know if it is &lt;em&gt;true&lt;/em&gt; or &lt;em&gt;false&lt;/em&gt;.  We need a third logical value, sometimes represented as &lt;em&gt;unk&lt;/em&gt; or &lt;em&gt;UNK&lt;/em&gt; for &lt;em&gt;unknown&lt;/em&gt;.
&lt;/p&gt;
&lt;p&gt;
A lot of software is built with this strategy for missing values, but there are other options.  The Pick, aka &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt;, query language employs a two-valued logic (2VL), as do most programming languages.  Much has been written about NULLs and n-valued logics, particularly within the context of the relational data model.  The &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; does not require the 3VL of &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;, as author Chris Date has made clear.  I will not revisit those discussions here, contributing only this glimpse into Pick's approach.
&lt;/p&gt;
&lt;p&gt;
Step one in understanding this &lt;acronym title="Two-valued Logic"&gt;2VL&lt;/acronym&gt; is to suspend the idea that NULL is the lack of a value or an indicator for &lt;em&gt;no value&lt;/em&gt;.  In &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt;, a NULL is a value.  There are exceptions as various &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; products address their interface to &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; differently, but I will simplify and focus on what is common to all such products.  Think of a NULL as indicating that there are &lt;em&gt;no values&lt;/em&gt; for a particular variable.  One way to provide a mathematical model for this approach is to perceive the NULL as an empty set, aka null set.
&lt;/p&gt;
&lt;div class="inlinepic"&gt;
&lt;img src="http://www.tincat-group.com/images/null-set.gif" alt="NULL=the empty set"  title="Null=the empty set" /&gt;
&lt;/div&gt;
&lt;p&gt;
This empty set is implemented in &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; as an empty list.  MultiValue solutions permit the value of a variable or the intersection of a row and column (loosly speaking), to be a list of values as indicated in &lt;a href="http://www.tincat-group.com/mewsings/2006/03/with-data-modeling-whats-your-bag.html"&gt;my last mewsing&lt;/a&gt;.  The list could have 0, 1, or Many elements.  If the cardinality is 0, the value is NULL.  If the cardinality is 1, the attribute has a single value.  If the cardinality is greater than 1, it is more obviously a list.  But no matter what the cardinality, the value of an attribute can be modeled as a list, whether empty, single-valued, or multi-valued.
&lt;/p&gt;
&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/bowling-series.gif" alt="Bowling Series Scores"  title="Bowling Series Scores" /&gt;
&lt;/div&gt;
&lt;p&gt;
If there are no values in the list, set, or bag, the value of the &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; attribute for a particular record would be NULL.  If a NULL is encountered by a user when looking at the data, the meaning of that &lt;em&gt;value&lt;/em&gt; is &lt;em&gt;no values&lt;/em&gt;.  So, if Dani has a record of data and the value of her variable &lt;em&gt;scores&lt;/em&gt; is NULL, then the corresponding proposition would be: &lt;strong&gt;Dani has no values for her score&lt;/strong&gt; or &lt;strong&gt;Dani has no bowling scores&lt;/strong&gt;.  It would not be: &lt;strong&gt;Dani's score has no value&lt;/strong&gt;.  Got it?
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Just true or false, that's it, two values and the obvious ones at that.&lt;/blockquote&gt; 
&lt;p&gt;
Because each attribute in Pick does have a value, even if that value is &lt;em&gt;no values&lt;/em&gt;, logical expressions are easily evaluated.  Just &lt;em&gt;true&lt;/em&gt; or &lt;em&gt;false&lt;/em&gt;, that's it, two values and the obvious ones at that.  If there is ever a test for equality, a list of &lt;em&gt;no values&lt;/em&gt; is equal to another list of &lt;em&gt;no values&lt;/em&gt;.  For less than or greater than comparisons, a NULL is less than everything other than another NULL no matter what type of data is compared.
&lt;/p&gt;
&lt;p&gt;
Some might be resistant to this broad use of a NULL value as it gives us no insight into why there are no values for an attribute.  If there is a reason to collect information on why an attribute has no values, additional attributes could be included in a logical data model.  
&lt;/p&gt;
&lt;p&gt;
In our scenario, we don't know if Dani bowled the games and the scores have not yet been recorded, if Dani bowled and the score was recorded incorrectly, if Dani has not yet bowled, if Dani was sick with bird flu and missed league bowling this week, or if Dani doesn't ever bowl.  In the first two cases, our data as recorded is incorrect.  So we could interpret the NULL value for &lt;em&gt;scores&lt;/em&gt; as &lt;strong&gt;Either Dani has no bowling scores or our data is incorrect&lt;/strong&gt; but that is similar to the interpretation of any value.  We could say that either this bowler's name is Dani or our data is incorrect.  So that is not particularly helpful.  The interpretation of this NULL value, then, is that &lt;em&gt;Dani has no bowling scores&lt;/em&gt;.  That's it.  If there is a NULL value for an attribute, it means there are no values for that attribute.
&lt;/p&gt;
&lt;p&gt;
Our resulting two-valued logic comparisons are quite simple, useful, and meaningful.  If we want to list all people with their series total, ordered lowest to highest, we would get those people who have NULL values listed first.
&lt;/p&gt;
&lt;p&gt;
LIST BOWLERS SCORE BY SCORE
&lt;/p&gt;
&lt;p&gt;
If we want only those with values, we could request only those &lt;em&gt;WITH SCORE&lt;/em&gt;:
&lt;/p&gt;
&lt;p&gt;
LIST BOWLERS WITH SCORE SCORE BY SCORE
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Instead of having no value, we have a value that is a list with no values.&lt;/blockquote&gt;
&lt;p&gt;
The first &lt;em&gt;SCORE&lt;/em&gt; is testing for any values for the variable &lt;em&gt;SCORE&lt;/em&gt;, the second shows the value (the ID for the bowler is listed automatically in the output), and the last &lt;em&gt;SCORE&lt;/em&gt; is for the ordering.  Using this approach, Dani and Shirl have the same value for their second game score.  I'm certain this rubs some &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; users wrong.  Instead of Dani and Shirl having &lt;em&gt;no score&lt;/em&gt;, which would mean the question is meaningless, Dani and Shirl have &lt;em&gt;no scores&lt;/em&gt; or an empty list of scores for the second and third games in the series.  
&lt;/p&gt; 
&lt;p&gt;
This logic can be applied to attributes that have list values as well as those considered to be single-valued since these can also be modeled as lists, even if short ones.  You can likely see how there could still be some misunderstandings in interpreting NULLs of this ilk, but it really is an amazingly simple and useful approach to what is modeled as the lack of values in &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;.  Instead of having &lt;em&gt;no value&lt;/em&gt;, we have a value that is a list with &lt;em&gt;no values&lt;/em&gt;.
&lt;/p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/04/consuming-frozen-data.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-114364558109360198?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/114364558109360198/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=114364558109360198&amp;isPopup=true' title='18 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114364558109360198'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114364558109360198'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/03/better-to-have-no-values.html' title='Better to Have No Values'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>18</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-114334076637998617</id><published>2006-03-25T20:38:00.000-06:00</published><updated>2006-11-12T17:58:01.813-06:00</updated><title type='text'>With Data Modeling, What's Your Bag?</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/bowler.gif" alt="Bowler"  title="Bowler" /&gt;
&lt;/div&gt;
&lt;p&gt;I've been avoiding some definitions for a while now, but when writing about NULLs and two-valued logic, I digressed into explanations of a few terms.  Since I'm &lt;dfn title="love that fat chalk"&gt;writing on the road&lt;/dfn&gt; for a few weeks, I thought I'd split out some terminology into this blog and follow with the NULLs blog next.  I'll toss in an ending on this one to tie in the title, but otherwise this mewsing is a glossary.
&lt;/p&gt;
&lt;div class="inlinepic"&gt;
&lt;img src="http://www.tincat-group.com/images/bowling-series.gif" alt="Bowling Series Scorecard"  title="Bowling Series Scorecard" /&gt;
&lt;/div&gt;
&lt;p&gt;
What I know about bowling could fit in a blog entry, but I just adore the scoring system, so I'll use that for the examples, both for these descriptions and for the NULL handling to follow. 
&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;list&lt;/dt&gt; 
&lt;dd&gt;
The scores in the above bowling series that correspond to the three games bowled by one person; such as Chris's 120, 183, 144; compose a list.  Each such list of game scores could be the value of a variable named &lt;em&gt;scores&lt;/em&gt;.  This conceptual list could be implemented with a computer language using an array like &lt;em&gt;score[3]&lt;/em&gt;, for example.  
&lt;/p&gt;
&lt;p&gt;
The length of a list is the number of elements in it.  Computer languages vary as to what list implementations permit variable lengths.  Other variations in the implementation of lists include whether list members can be accessed directly with an index or require a sequential read.  Providing different features for working with a list could prompt one to work with it as a queue or a stack, for example.  Lists are a wildly popular structure in any general purpose programming language.  They are missing from any implementation of the Relational Model (RM), however.   
&lt;/dd&gt;
&lt;dt&gt;set&lt;/dt&gt;
&lt;dd&gt;
If we remove the ordering from our lists, we get a &lt;em&gt;set&lt;/em&gt; of bowling scores for Pat and Chris that includes all three scores.  However, the set of scores for Beth is {150, 160} because each element of a set must be distinct from the others.  The set of Pat's scores could be written as {110, 85, 130} or as {110, 130, 85} because the set has no implied ordering.  In the case of a bowling series, it would be a bad idea to treat these scores as a set in this way, given that we would then lose the information about how many games had a particular score, as illustrated in the case of Beth.  A list can be transformed into a set, however, by adding an ordinal attribute to identify the first and nth value.  We could, therefore, talk about Chris's set of bowling scores in this series as {(1, 120), (2, 183), (3, 144)).
&lt;p /&gt;  
A more obvious set might be our bowlers = {Chris, Pat, Dani, Shirl, Beth}.  Of course, when specifying our set, we would include a unique identifier for the elements, often referred to as a key, so that our set might look like bowlers = {(11235, Chris), (628628, Pat), (11111, Dani), (223344, Shirl), (98765, Beth)}.  We saw above that we can turn a list into a set.  We can turn our set into a list by placing it in some order, such as an alphabethic order of Beth, Chris, Dani, Pat, Shirl.
&lt;/dd&gt;
&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/bowling-bag.gif" alt="Bowling Bag"  title="Bowling Bag" /&gt;
&lt;/div&gt;
&lt;dt&gt;bag&lt;/dt&gt;
&lt;dd&gt;
Also known as a &lt;em&gt;multiset&lt;/em&gt;, a bag is like a set in that it has no ordering and like a list in that it includes duplicate values.  The bowling bag for Beth is [160, 150, 150] which is equal to [150, 160, 150] and [150, 150, 160].  It is as if you tossed the values into a bag, pulling them out in any order.  We could turn this bag into a set by adding a quantity to each value such as Dani having a set {(150, 2), (160, 1)}.  We could turn it into a list by adding order to it, such as numerical order with Dani's list being 150, 150, 160.  
&lt;p /&gt;
We talk about &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; result &lt;em&gt;sets&lt;/em&gt;.  However, because a result set may have duplicate rows, these would more accurately be termed result &lt;em&gt;bags&lt;/em&gt; (or multisets, but that term is not as fun so I use it only like I use &lt;em&gt;table tennis&lt;/em&gt; instead of &lt;em&gt;ping pong&lt;/em&gt;).
&lt;/dd&gt;
&lt;dt&gt;relation&lt;/dt&gt;
&lt;dd&gt;
Given my approach to the &lt;em&gt;relational theory&lt;/em&gt; of data, I prefer to provide a clean, clear, crisp definition of a relation from mathematics, leaving it to others to embellish it as befits their application of this mathematical concept.
&lt;p /&gt;
A relation is a subset of the set of ordered tuples (A1, A2, ... Am) formed by the Cartesian cross-product of sets S1 x ... x Sm where each An is an element of Sn.
&lt;p /&gt;
Note that a relation is a set.  Bags are not sets.  Therefore, &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; &lt;em&gt;result sets&lt;/em&gt; are not relations.
&lt;p /&gt;
The prior example of a set with couples consisting of bowler id and bowler first name is a relation.
&lt;/dd&gt;
&lt;dt&gt;domain&lt;/dt&gt;
&lt;dd&gt;
Given a relation R, a domain is a set Sn such that for each tuple (A1, A2, ...An, ...Am) in R, An is an element of Sn.
&lt;p /&gt;
The example relation has a domain of bowler ids and another domain of bowler first names.  
&lt;/dd&gt;
&lt;dt&gt;function&lt;/dt&gt;
&lt;dd&gt;
A binary mathematical relation with at most one b for each a in (a,b).  Note that either or both a or b could be a relation, for example. 
&lt;/dd&gt;
&lt;dt&gt;type&lt;/dt&gt;
&lt;dd&gt;
A type is a domain, so it is a set, plus related functions.  Some folks define domain and type to be identical, typically tossing operations into the definition of each.  You might note that this is either similar to or the same as the term &lt;em&gt;class&lt;/em&gt;, depending on your definition of that term.
&lt;/dd&gt;
&lt;dt&gt;scalar&lt;/dt&gt;
&lt;dd&gt;
As with many of the terms above, this term may be applied to either a variable or a value.  A scalar variable can hold only one value at a time.  A scalar value is a single value.  It is the opposite of &lt;em&gt;composite&lt;/em&gt;.  List, set, bag, relation, and function are examples of composite types.  Common scalar types defined in computer languages are char, int, float, double, and boolean.
&lt;/dd&gt;
&lt;/dl&gt;
&lt;div class="inlinepic"&gt;
&lt;img src="http://www.tincat-group.com/images/scorecard.gif" alt="Bowling Game Scorecard"  title="Bowling Game Scorecard" /&gt;
&lt;/div&gt;
&lt;p&gt;
Take a look at this scorecard for a single game of bowling.  There are many ways to pour the scalar values into composite types when modeling these data for use in a software application.  If you were not tied to everything being a relation, what would be your first take on how to model frames in a game of bowling?  That strikes me as a list.  Like one of those &lt;em&gt;how many triangles do you see&lt;/em&gt; puzzles, how many bags can you find?  Sets?  Lists? 
&lt;/p&gt;
&lt;p&gt;
To set us up for the NULLs discussion, I'll provide a little information on the MultiValue (MV) model.  Conceptual sets are implemented in &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; as either sets or lists.  If modeling a strong entity, they are typically modeled as sets.  When modeling a property of a strong entity, they are often modeled as lists.  Bags and lists are always modeled as lists unless they are first changed into sets by adding attributes.  Sets are top level data structures, while lists are always child structures. 
&lt;/p&gt;
&lt;p&gt;
The translation from the conceptual data model to the MV logical data model (see &lt;a href="http://www.tincat-group.com/mewsings/2006/01/naked-model.html"&gt;this mewsing&lt;/a&gt; if confused by these terms) includes modeling a conceptual set as a list.  If an implemented list is a set conceptually, the developer must provide the logic to test if a value is already present before adding it.  If it is a semantic bag or set, the developer needs to ignore the ordering.  This is a bit more difficult because the query language only checks equality of lists, not bags or sets.  A test for equality of [150, 160, 150] to [150, 150, 160] would evaluate to false.  A developer might choose some order for this bag, turning it into a list, to avoid this complexity.
&lt;/p&gt;
&lt;p&gt;
While the full collection of information about whether a value is a conceptual list, set, or bag is not available for implemented lists in an &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; system, you can use the MV array (list) data type to implement all three concepts.  If you try your hand at modeling data this way, you might find that it is often conceptually simpler than modeling exclusively with sets.  I, for one, like modeling with data sets that have nested lists.  At the risk that you might be too young to be hip to this slang, I'll ask: when it comes to modeling data, &lt;em&gt;what's your bag?&lt;/em&gt;
&lt;/p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/03/better-to-have-no-values.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-114334076637998617?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/114334076637998617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=114334076637998617&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114334076637998617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114334076637998617'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/03/with-data-modeling-whats-your-bag.html' title='With Data Modeling, What&apos;s Your Bag?'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-114222306601427264</id><published>2006-03-12T22:10:00.000-06:00</published><updated>2006-11-12T17:58:01.533-06:00</updated><title type='text'>In Every Job That Must be Done There is a Data Element of Fun</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/carpet-bag.gif" alt="Data Carpet Bag"  title="Data Carpet Bag" /&gt;
&lt;/div&gt;
&lt;blockquote cite="http://www.moron.nl/lyrics.php?id=47539&amp;artist=Disney"&gt;In every job that must be done there is an element of fun.  You find the fun and snap, the job's a game.  -Mary Poppins
&lt;/blockquote&gt;
&lt;p&gt;
While &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; has many advantages over the languages stemming from &lt;a href="http://www.tincat-group.com/mewsings/testing/2006/02/list-of-girls.html"&gt;GIRLS&lt;/a&gt;,    the reverse is also true as the latter is quite charming.  I will &lt;a href="http://www.tincat-group.com/mewsings/2006/03/data-for-every-1.html"&gt;continue&lt;/a&gt; my exploration of some of the advantages of the &lt;acronym title="Generalized Information Retrieval Language &amp;amp; System"&gt;GIRLS&lt;/acronym&gt; family of query languages.  I am &lt;dfn title="What good would it do if I did?"&gt;not proposing&lt;/dfn&gt; that the entire industry adopt &lt;acronym title="Generalized Information Retrieval Language &amp;amp; System"&gt;GIRLS&lt;/acronym&gt;, but there are things we can learn from this language and related data model.  
&lt;/p&gt;
&lt;p&gt;
There are fewer files in an &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt;-based application than tables in a similar &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;-&lt;acronym title="Data Base Management System"&gt;DBMS&lt;/acronym&gt; solution.  This makes sense given that &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; solutions are not in &lt;acronym title="First Normal Form"&gt;1NF&lt;/acronym&gt;, so properties with multiple values need not be split out into separate tables.  With more nouns modeled as properties of entities, more files than tables are related to primary entities of people, places, things, or events.  There are fewer nouns modeled as &lt;a href="http://en.wikipedia.org/wiki/Weak_entity"&gt;weak entities&lt;/a&gt; in an &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; solution.  
&lt;/p&gt;
&lt;p&gt;
Users of the &lt;acronym title="Generalized Information Retrieval Language &amp;amp; System"&gt;GIRLS&lt;/acronym&gt;-related query language ask questions about the data by way of exactly one file at a time, somewhat analogous to using a single &lt;acronym title="Structure Query Language"&gt;SQL&lt;/acronym&gt; view in a query.  That doesn't mean they can only report against base (stored) data identified in that &lt;dfn title="analogous to a table schema"&gt;file dictionary&lt;/dfn&gt;, but that they are viewing the data in the context of a single entity at a time.  Just as a secretary in the 1950's picked a file cabinet in which to find information, the user of the &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; query language chooses a single file for each question they ask.
&lt;/p&gt;
&lt;p&gt;
Each &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; file, or entity, provides one lens through which a user looks at the world of data.  This might seem very restrictive, but just like Mary Poppins hauling a large coat rack out of her small carpet bag, data of all sorts can come from each such file &lt;em&gt;view&lt;/em&gt; of the data.  Through the eyes of a file dictionary, a user can query data stored under that name, or data derived from any of the data in the system, including aggregated data.  
&lt;/p&gt;
&lt;p&gt;
Each dictionary is like a Land's End catalog section related to one entity.  The user can shop from any one of the available catalog sections in a system.  Did you want to be in the STUDENTS section, FACULTY, CLASS_ROSTERS, or ACCOUNTS_RECEIVABLE section...?  Once you are there, you can pick the data you want from the list of data elements on-hand.  Some data are cross-listed in multiple catalog sections.
&lt;/p&gt;
&lt;div class="showoff"&gt;
A file dictionary is a logical view of data.  We could see not only a student ID, name, and class level, but a computed GPA, all majors, and the name of each of the students' advisors through a single dictionary.  Anything you can think to ask about a student, you could ask the STUDENTS entity.
&lt;p /&gt;
LIST STUDENTS NAME CLASS GPA MAJORS ADVISOR_NAME
&lt;/div&gt;
&lt;p&gt;
As mentioned in the &lt;a href="http://www.tincat-group.com/mewsings/2006/03/data-for-every-1.html"&gt;previous mewsing&lt;/a&gt;, this listing will show one result (not necessarily one line) per student.  The majors and advisor names will show on multiple lines when there are multiple entries for these.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Pickies have been doing OLAP directly against their OLTP data since the 70's.&lt;/blockquote&gt; 
&lt;p&gt;
It is likely worth a full exploration in a future mewsing, but I'll just make a quick note that because of this approach to the data, metadata, and the related query language, &lt;dfn title="Pick or MultiValue Developers"&gt;Pickies&lt;/dfn&gt; have been doing &lt;acronym title="On-Line Analytical Processing"&gt;OLAP&lt;/acronym&gt; directly against their &lt;acronym title="On-Line Transaction Processing"&gt;OLTP&lt;/acronym&gt; data since the '70's, without a need to reshape it first.  This is very cool.  Of course they also create data marts for some analysis when appropriate, such as when data need to be frozen.  A file with a compound key (called a multi-part key in &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt;) is automatically a virtual fact table from which you can retrieve or derive any data you would want from the dimensions, forming a virtual &lt;a href="http://en.wikipedia.org/wiki/Weak_entity"&gt;fact-dimension&lt;/a&gt; or cube perspective of the data.
&lt;/p&gt;
&lt;p&gt;
One downside for this approach is that the carpet bags are not pre-packed with everything.  You must add to the vocabularly for the queries (the schema) by defining virtual fields for anything that is not defined as a stored data element under this file name.  So the file dictionaries become very long catalogs of &lt;em&gt;base&lt;/em&gt; or stored data and definitions for derived data.  To help develop complex virtual elements, which I'll refer to as &lt;em&gt;taking your medicine&lt;/em&gt;, &lt;a href="http://www.moron.nl/lyrics.php?id=47539&amp;artist=Disney"&gt;the spoonful of sugar&lt;/a&gt; is that you may refer to subroutines written in &lt;a href="http://www.jes.com/pb/index.html"&gt;Pick/BASIC&lt;/a&gt;, a very data-savvy general purpose programming language.  So, the sky's the limit, but typically each virtual element dictionary item must be constructed either in the dictionary or inline in a query.  Unlike &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;, &lt;acronym title="Generalized Information Retrieval Language &amp;amp; System"&gt;GIRLS&lt;/acronym&gt; is not powerful enough to write every query strictly from the base schema for stored data.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Adding derived data elements is quite gratifying in a big-bang- for-the-buck way&lt;/blockquote&gt; 
&lt;p&gt;
Developing virtual data elements in Pick has the &lt;em&gt;feeling&lt;/em&gt; of enlarging the catalog of offerings in the logical view for the user querying the data.  Developing views in a &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;-&lt;acronym title="Data Base Management System"&gt;DBMS&lt;/acronym&gt; has the &lt;em&gt;feeling&lt;/em&gt; of shrinking the view of the data for users.  With the former, the dictionaries, or logical views, can get junked up with lots of unnecessary vocabulary.  Developers talk about the need to &lt;em&gt;clean the dictionaries&lt;/em&gt; as one might suggest cleaning an office.  In spite of that, there is a sense among both end-users and developers that this is &lt;em&gt;our junk&lt;/em&gt; and, therefore, within our control.  The environment is very empowering for users.  With &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;, your views can get brittle with difficulty adding attributes and retaining performance and existing reports on the view (for example, when you add a property with multiple values resulting in more rows in the changed view).  There are variations on virtual field definition in several other &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt;'s, with Microsoft SQL Server likely having the best such implementation among the &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;-&lt;acronym title="Data Base Management System"&gt;DBMS&lt;/acronym&gt;'s.
&lt;/p&gt;
&lt;p&gt;
I can't quite put my finger on it, but compared to coding &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; views, adding derived data elements is quite gratifying in a big-bang-for-the-buck way.  It is often a big point of collaboration between developers and end-users.  It's very plain to see how simple it is to think in terms of querying your enitities individually.  No matter what new requirements end-users have for querying the data, you can add virtual data elements into the carpet bag for an entity without performance penalties for adding elements nor the same performance penalities often incurred with an SQL JOIN.  This makes the possibilities for queries about that entity even more magic.  "In every job that must be done there is an element of fun."
&lt;/p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/03/with-data-modeling-whats-your-bag.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-114222306601427264?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/114222306601427264/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=114222306601427264&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114222306601427264'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114222306601427264'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/03/in-every-job-that-must-be-done-there.html' title='In Every Job That Must be Done There is a Data Element of Fun'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-114183013966534494</id><published>2006-03-08T09:01:00.000-06:00</published><updated>2007-01-08T04:15:45.286-06:00</updated><title type='text'>Data for Every 1</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;a href="http://www.threedognight.com/l_one.html"&gt;
&lt;img src="http://www.tincat-group.com/images/1.gif" alt="The number 1"  title="The number 1" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;
Queries and reports.  Let's talk about 'em.  My plan was to do an overview comparing the query language formerly known as &lt;a href="http://www.tincat-group.com/mewsings/testing/2006/02/list-of-girls.html"&gt;GIRLS&lt;/a&gt; with &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; in this one blog entry and then move on to features of reporting tools in general.  But I'm scratching that and blending the two.  Instead of listing a whole bunch of differences, I'm going to take it nice, and easy, &lt;dfn title="one"&gt;1&lt;/dfn&gt; thought at a time.  Please recognize that this is &lt;strong&gt;not&lt;/strong&gt; about trying to convince anyone to use my favorite query language; it is much more general than that, with concepts that might also apply to XQuery or any other query language.
&lt;/p&gt;  
&lt;p&gt;
Terms that crop up with &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; include the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt;, &lt;acronym title="First Normal Form"&gt;1NF&lt;/acronym&gt;, and &lt;acronym title="Three-valued logic"&gt;3VL&lt;/acronym&gt;.  Terms that relate to &lt;acronym title="Generalized Information Retrieval Language &amp;amp; System"&gt;GIRLS&lt;/acronym&gt; are &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt;, &lt;dfn title="Named for Dick Pick"&gt;Pick&lt;/dfn&gt;, &lt;acronym title="Non-First Normal Form"&gt;NF2&lt;/acronym&gt;, and &lt;acronym title="Two-valued Logic"&gt;2VL&lt;/acronym&gt;.  Through the eyes of these two query languages, I hope to illustrate some significant differences in data models.
&lt;/p&gt;
&lt;p&gt;
Let's dive into a contrived example of a school system booster club sale of frozen pizzas: multiple pizzas sold through multiple school booster clubs in a single school system.  If we had an information system, we might want to query it asking about the people who bought the pizzas, perhaps for a mailing list.  We might also want information about the pizzas, maybe for forecasting of supplies for future events.  We might ask which booster clubs sold which pizzas so we deliver the pizzas to the right places.  Those are all &lt;em&gt;1 entity instance&lt;/em&gt; questions.  We are asking a question about each 1 instance of people, pizzas, or clubs.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;It is typical to want 1 chunk of information for every 1 instance of an entity.&lt;/blockquote&gt; 
&lt;p&gt;
It is typical to want information about some entity or other, whether people, places, things, or events.  Additionally, it is typical to want 1 chunk of information for every 1 instance of such an entity.  Optionally, you might want to aggregate information into groupings bigger than 1 asking, for example, how many pizzas each booster club ordered.  We might otherwise split out the instances by requesting information about a multivalued property.  For example, we might ask how many pizzas have pepperoni as a topping.
&lt;/p&gt;
&lt;p&gt;
In any case, starting with 1 and then listing, grouping, or splitting from there is a very common way to think.  1 might even suggest it is &lt;em&gt;natural&lt;/em&gt;.  If you walked into a business in the 1970's, or even today, you would find filing cabinets dedicated to 1 single entity, such as Customers or Orders, often with a folder for each instance of that 1 entity.  Why?  It has to do with how we think, I would think.    1 might be the loneliest number, but it is conceptually very powerful.
&lt;/p&gt;
&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/pizza.gif" alt="Pizza"  title="Pizza" /&gt;
&lt;/div&gt;
&lt;p&gt;
A pizza chef team has been organized to assemble all of the uncooked pizzas and put them in boxes, with cooking instructions attached.  They would like each box to have a Pizza Description Label, including a unique Pizza ID, the type of crust, the list of cheeses used, and the list of toppings.  Sure, you might have mixed cheeses and toppings in a single attribute or split out meats and veggies into separate ones, but  just stick with me and don't let your mind wander in that way.  (I'll also hold off any discussion of list compared to set or bag for a rainy day.  Let's just say that this list of toppings is rather like a &lt;dfn title="Is a shopping list really a set or a shopping bag?"&gt;shopping list&lt;/dfn&gt;.) 
&lt;/p&gt;
&lt;div class="showoff"&gt;
Using &lt;acronym title="Generalized Information Retrieval Language &amp;amp; System"&gt;GIRLS&lt;/acronym&gt; (please excuse all upper case in these examples, it's my age showing):
&lt;p&gt;
&lt;strong&gt;LIST PIZZAS CRUST CHEESES TOPPINGS WITH PIZZA_ID = "12345"&lt;/strong&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p /&gt;
&lt;table class="datatable"&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Pizza Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tr class="clo cellleft"&gt;
&lt;td&gt;
&lt;xmp&gt;
PizzaID   Crust      Cheeses      Toppings

12345     Chicago    Mozzarella   Peppers
                     Provolone    Mushrooms
                                  Pepperoni
&lt;/xmp&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p /&gt;
&lt;div class="showoff"&gt;
Seemingly comparable &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;:
&lt;p&gt;
&lt;strong&gt;SELECT 
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;PIZZA_ID, CRUST, CHEESE, TOPPING 
&lt;br /&gt;
FROM 
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;PIZZA_VIEW 
&lt;br /&gt;
WHERE 
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;PIZZA_ID = "12345" 
&lt;br /&gt;
ORDER BY 
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;CHEESE&lt;/strong&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p /&gt;
&lt;p&gt;
But this isn't going to be quite the same, is it?  There are insignificant details such as whether column headings and unique id's come along for the ride, but then there is the matter of how to create the PIZZA_VIEW and what the output from the above SELECT would be like.  The view might be produced with something like this:
&lt;/p&gt;
&lt;p&gt;
CREATE VIEW PIZZA_VIEW AS (SELECT P1.PIZZA_ID, CRUST, CHEESE, TOPPING FROM (PIZZA AS P1 JOIN PIZZA_CHEESE AS P2 ON P1.PIZZA_ID = P2.PIZZA_ID) JOIN PIZZA_TOPPING AS P3 ON P2.PIZZA_ID = P3.PIZZA_ID)
&lt;/p&gt;
&lt;p&gt;
If we take the above view and the previous SELECT on it, we might get a label something like the one below.  
&lt;/p&gt;
&lt;table class="datatable"&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Pizza Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tr class="clo cellleft"&gt;
&lt;td&gt;
&lt;xmp&gt;
12345     Chicago    Mozzarella   Peppers
12345     Chicago    Mozzarella   Mushrooms
12345     Chicago    Mozzarella   Pepperoni
12345     Chicago    Provolone    Peppers
12345     Chicago    Provolone    Mushrooms
12345     Chicago    Provolone    Pepperoni
&lt;/xmp&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;
I'm a bit rusty on &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; and I'm not doing anything clever here to handle the multiple 1-M (one-to-many) relationships, but feel free to add comments on how the view or query on this data could be set up better.  I think the party line is that the data are all there and how it is displayed is a task for a reporting tool.  But look at how the query language shows up this key difference in the data models.  With &lt;acronym title="Generalized Information Retrieval Language &amp;amp; System"&gt;GIRLS&lt;/acronym&gt; we ask questions while thinking about 1 thing, listing, aggregating, or splitting them all the while with a sense of 1-ness.  
&lt;/p&gt;
&lt;p&gt;
A significant issue for professionals who are learning &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; after only knowing languages that derived from &lt;acronym title="Generalized Information Retrieval Language &amp;amp; System"&gt;GIRLS&lt;/acronym&gt; is the change to thinking about &lt;em&gt;going from&lt;/em&gt; the many to the 1 in their thinking instead of from the 1 to the many.  It didn't seem like they ever had to learn to start with 1, but they definitely do have to learn to start with many instead.
&lt;/p&gt;
&lt;p&gt;
Moving the story forward, we have just found out that the green peppers are delayed and we have decided not to wait any longer.  We are going to have the delivery team get all the pizzas without peppers to the right booster clubs now.  The delivery team has requested a list of the Pizza IDs that need to be loaded in the delivery van.  So, we need a listing of all Pizza IDs for pizzas without peppers as a topping.
&lt;/p&gt;
&lt;div class="showoff"&gt;
&lt;acronym title="Generalized Information Retrieval Language &amp;amp; System"&gt;GIRLS&lt;/acronym&gt;:
&lt;p&gt;
&lt;strong&gt;LIST PIZZAS WITH EVERY TOPPING &amp;lt;&amp;gt; "PEPPERS"&lt;/strong&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p /&gt;
&lt;div class="showoff"&gt;
&lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;:
&lt;p&gt;
&lt;strong&gt;SELECT P1.PIZZA_ID FROM PIZZA AS P1 WHERE P1.PIZZA_ID NOT IN (SELECT P2.PIZZA_ID FROM PIZZA_TOPPING AS P2 WHERE TOPPING = "PEPPERS")&lt;/strong&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;
It just doesn't matter to &lt;acronym title="Generalized Information Retrieval Language &amp;amp; System"&gt;GIRLS&lt;/acronym&gt; if there is an attribute, such as &lt;em&gt;topping&lt;/em&gt;, that has multiple values for a single pizza.  We can still look at data for every 1.
&lt;/p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/03/in-every-job-that-must-be-done-there.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-114183013966534494?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/114183013966534494/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=114183013966534494&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114183013966534494'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114183013966534494'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/03/data-for-every-1.html' title='Data for Every 1'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-114115253069121029</id><published>2006-02-28T12:48:00.000-06:00</published><updated>2006-11-12T17:58:00.944-06:00</updated><title type='text'>The LIST of GIRLS</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/pick-nelson-weave.gif" alt="Dick Pick and Don Nelson"  title="Dick Pick and Don Nelson" /&gt;
&lt;/div&gt;
&lt;p&gt;
The Relational Model (RM) is neither necessary (see &lt;a href=" http://www.tincat-group.com/mewsings/2006/01/naked-model.html"&gt;The Naked Model&lt;/a&gt;) nor sufficient (see &lt;a href="http://www.tincat-group.com/mewsings/2006/02/dont-suffer-impedance.html"&gt;Don't Suffer Impedance&lt;/a&gt;).  That said, it is useful.  There are pros and cons to employing it.  In order to be able to compare its usefulness to that of tools based on approaches other than the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt;, we need to know what else is out there.  This mewsing is not an opinion piece, but includes some things old, some things new, some things borrowed, and some things blue (Big Blue, that is).
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;GIRLS stands for Generalized Information Retrieval Language &amp;amp; System&lt;/blockquote&gt;  
&lt;p&gt;
While I plan to talk about a variety of possibilities in the future and am particularly curious about the future of XML-DBMS tools, the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; alternative with which I have the most direct experience is the MultiValue or PICK data model.  I'll start there with a couple of  blog entries.
&lt;/p&gt;
&lt;p&gt;
Codd's papers, including the &lt;a href="http://www.cs.duke.edu/~junyang/cps216/papers/codd-1970.pdf"&gt;pdf version of his 1970 paper&lt;/a&gt;, are readily available on the web.  That is &lt;a href="http://tinyurl.com/fd4vh"&gt;not the case&lt;/a&gt; with early papers related to the Nelson-Pick data model.  Although my source materials are not-always-easy-to-read copies and my scan, resize, and Adobe skills also leave room for improvement, I spent my time allocated for this mewsing to turn two historical papers into pdfs.  I think these papers are available only from this site, but if anyone knows of other sources, please inform me as I would be happy to point to better versions of them.  I also provide a link below to Don Nelson's resume, which indicates that he worked under F. George Steele, who invented and developed the Digital Differential Analyzer.  After studying under Steele, Nelson developed the GIRLS and GIM-1 specifications at TRW.  
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.tincat-group.com/mv/girls.html"&gt;GIRLS Papers&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="http://jwstephens.com/don-nelson/page_01.htm"&gt;Don Nelson's Resume&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote class="pullquote"&gt;
What is your preference, GIRLS or SQL?&lt;/blockquote&gt; 
&lt;p&gt;You might have noticed that GIRLS stands for Generalized Information Retrieval Language &amp;amp; System.  Many flavors of Pick have been developed over the years, as indicated in the &lt;a href="http://www.tincat-group.com/mv/familytree.html"&gt;MultiValue Family Tree poster&lt;/a&gt;.  Unlike &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;, which has a single name covering many different implementations, &lt;acronym title="Generalized Information Retrieval Language and System"&gt;GIRLS&lt;/acronym&gt; has had almost as many names as implementations.  &lt;acronym title="Generalized Information Retrieval Language and System"&gt;GIRLS&lt;/acronym&gt; has been named &lt;em&gt;UniQuery, ENGLISH, FRENCH, AQL, ACCESS, Info/Access, jQL, RetrieVe, Vision, RECALL, QMQuery, CMQL, queryON, R/LIST&lt;/em&gt;, and &lt;em&gt;INFORM&lt;/em&gt;. Current implementations of &lt;acronym title="Generalized Information Retrieval Language and System"&gt;GIRLS&lt;/acronym&gt; are available from many different vendors, most (all?) of which are listed here, ordered by a complex algorithm.
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www-306.ibm.com/software/data/u2/"&gt;IBM U2, UniData &amp; UniVerse&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.openqm.com"&gt;Ladybridge OpenQM (open source &amp; commercial versions)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.revelation.com/"&gt;Revelation OpenInsight&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.jbase.com"&gt;jBASE&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.intersystems.com/mv"&gt;Intersystems Cache' for MultiValue&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.northgate-is.com/products__software_services/reality.php"&gt;Northgate Reality&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.maverick-dbms.org"&gt;MaVerick (Java-based open source)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="http://rainingdata.com/products/dbms/index.html"&gt;Raining Data, successor to Pick Systems&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.ongroup.com"&gt;ONgroup&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.edp.co.uk/solutions/sol_univision.html"&gt;EDP UniVision&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
What is your preference, &lt;acronym title="Generalized Information Retrieval Language and System"&gt;GIRLS&lt;/acronym&gt; or &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;?  There are many differences between them, but I'll save that discussion for next time and just mention a few right now.  &lt;acronym title="Generalized Information Retrieval Language and System"&gt;GIRLS&lt;/acronym&gt; can perform queries with data that is in &lt;acronym title="Non-First Normal Form"&gt;NF2&lt;/acronym&gt; (non-&lt;acronym title="First Normal Form"&gt;1NF&lt;/acronym&gt;), it employs a two-valued logic (no &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; NULLS), and in place of the SELECT of &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; is the LIST of &lt;acronym title="Generalized Information Retrieval Language and System"&gt;GIRLS&lt;/acronym&gt;.
&lt;/p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/03/data-for-every-1.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-114115253069121029?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/114115253069121029/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=114115253069121029&amp;isPopup=true' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114115253069121029'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114115253069121029'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/02/list-of-girls.html' title='The LIST of GIRLS'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-114053655643921963</id><published>2006-02-21T09:42:00.000-06:00</published><updated>2007-01-08T04:02:26.739-06:00</updated><title type='text'>Don't Suffer Impedance</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/nick-hengeveld.gif" alt="Nick in a field of 0's and 1's"  title="Nick with Zeros and Ones" /&gt;
&lt;/div&gt;
&lt;p&gt;
A considerable amount of software development consists of mapping and converting data from one format to another, one schema to another.  When software is used to bridge the gap between the person and the machine, there is an obvious need for translation.  &lt;em&gt;Impedance mismatch&lt;/em&gt; refers to the difference between the output of one process and input of another, requiring a transformation to connect the processes.  There is a huge impedance mismatch between the thoughts of a person and the 0's and 1's of a computer, for example.  (Although there is perhaps &lt;a href="http://lkml.org/lkml/2005/11/24/215"&gt;less of a mismatch&lt;/a&gt; in the case of &lt;dfn title="One of my whiz kid employees more than a decade ago"&gt;Nick&lt;/dfn&gt; here than for others.)
&lt;/p&gt;
&lt;p&gt;
Let's narrow the scope.  As fascinating as it might be to discuss in a future mewsing, I don't want to start with a person's thoughts here, so let's move to the point where software components collect data from, or present data to, a person.  So we will start with the user interface at one end.  Without loss of generalization for these purposes, we can narrow further to text-based &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt; data.  We could return to &lt;a href="http://www.tincat-group.com/mewsings/2006/02/model-behind-interface.html#uml"&gt;this &lt;acronym title="Unified Modeling Language"&gt;UML&lt;/acronym&gt; class&lt;/a&gt; as a model for an example &lt;acronym title="eXtensible HyperText Markup Language"&gt;XHTML&lt;/acronym&gt; (and therefore &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt;) web page. [Tip: mouse-over acronyms to get the expanded form.]
&lt;/p&gt;
&lt;p&gt;
At the other end, the operating system works with the hardware to handle the translation of data to 0's and 1's.   Additionally, let's assume a database product that has at least &lt;acronym title="Create, Read, Update, Delete"&gt;CRUD&lt;/acronym&gt; services and communicates with the &lt;acronym title="operating system"&gt;OS&lt;/acronym&gt;. For example, this could be an &lt;acronym title="Structured Query Language-DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt;. In summary, we will look at text data at the point of the user interface to and from the interface with a database product.  As an example, we will start with an &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; page at one end and an &lt;acronym title="Structured Query Language DataBase Management System"&gt;SQL-DBMS&lt;/acronym&gt; at the other. 
&lt;/p&gt;
&lt;div class="inlinepic"&gt;
&lt;img src="http://www.tincat-group.com/images/ipo.gif" alt="Input-Processing-Output"  title="Input-Processing-Output" /&gt;
&lt;/div&gt;
&lt;p&gt;
While impedance can be measured in electrical engineering, in software development it is a much more loosely-defined term often used to sell products or claim superiority.  Most definitions of &lt;em&gt;impedance mismatch&lt;/em&gt; within software development, as used in the phrase &lt;em&gt;&lt;acronym title="Object-oriented-Relational Modeling"&gt;OO-RM&lt;/acronym&gt; impedance mismatch&lt;/em&gt;, provide information specific to &lt;acronym title="Object-Oriented"&gt;OO&lt;/acronym&gt; and &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt;, so I'll try my hand at a more generic description.  An &lt;em&gt;impedance mismatch&lt;/em&gt; occurs when there is enough difference in the data model used for the output of one process and the data model employed for the input of another to require a transformer.  This transformer would be analogous to an electrical transformer, with the definition left to the reader.
&lt;/p&gt;
&lt;p&gt;
The number of transformations of any kind relates at least to the size and scope of any given project, but the number of places where there is an impedance mismatch relates to the architecture and product choices for the solution.  If there might be such a mismatch wherever we switch data models, and data models are abstractions for programming languages or sublanguages (see &lt;a href="http://www.tincat-group.com/mewsings/2006/01/naked-model.html"&gt;The Naked Model&lt;/a&gt; for a description of a &lt;em&gt;data model&lt;/em&gt;), we can search for them by looking at places where we switch programming languages.
&lt;/p&gt;
&lt;p&gt;
In our example, we could use JavaScript to read and write &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt; values via the &lt;acronym title="Document Object Model"&gt;DOM&lt;/acronym&gt; of the &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; page.  We could pass these data using &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; to Java, PHP, Ruby, C, C++, Perl, Python, or even your favorite derivative of Dartmouth Basic, going from data entry on our &lt;acronym title="eXtensible HyperText Markup Language"&gt;XHTML&lt;/acronym&gt; page into some middle tier.  We could otherwise GET or POST into this middle tier with name=value pairs, but I only mention that so you don't point it out.
&lt;/p&gt;
&lt;p&gt;
If we take our data into an &lt;acronym title="Object-Oriented"&gt;OO&lt;/acronym&gt; structure in the middle tier, there is a change between the UI and the middle tier or within the middle tier that requires a transformation.  This &lt;acronym title="eXtensible Markup Language to Object-Oriented"&gt;XML-OO&lt;/acronym&gt; or Strings-&lt;acronym title="Object-Oriented"&gt;OO&lt;/acronym&gt; transformation is worth a closer look in a later discussion, but permits similar or identical data structures to be used.  Each language has the ability to work with &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt;, for example.   
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;What would it take to minimize the number of impedance mismatches in a particular application?&lt;/blockquote&gt; 
&lt;p&gt;
Then we have a transition between our middle tier and the database by way of &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;.  This is well-documented as a place where there is an impedance mismatch.  Of course there are many proprietary extensions to &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;, but for most implementations (e.g. &lt;acronym title="Structured Query Language 1992"&gt;SQL-92&lt;/acronym&gt;) three of the differences that will need to be addressed somewhere between the front-end and &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; are 1) &lt;acronym title="Non-First Normal Form"&gt;NF2&lt;/acronym&gt; vs. &lt;acronym title="First Normal Form"&gt;1NF&lt;/acronym&gt; 2) Lists vs. unordered data and 3) two-valued vs three-valued logic (or nulls as empty sets/strings vs. &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt;-style NULLS).  
&lt;/p&gt;
&lt;p&gt;
It might be worth noting that the &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; side does not feel the pain.  &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; is not a general purpose programming language, and the expectation is typically that the transformer required to address this impedance mismatch will be handled entirely by "the other guy."  Whether this has been a cause of resentment in companies that organize with a separate group responsible for development and maintenance of the database aspects of software development is anyone's guess.  
&lt;/p&gt;
&lt;p&gt;
There might be good reasons to put up with these mismatches, but what would it take to minimize the number of impedance mismatches in a particular application?  As indicated in &lt;a href="http://www.tincat-group.com/mewsings/drafts/2006/01/who-ordered-ripple-delete.html"&gt;the ripple delete&lt;/a&gt; example, we could use a data model similar to the &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt; on the back-end.  Could we similarly choose to implement the front-end using the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt;?  What would an &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; &lt;acronym title="user interface"&gt;UI&lt;/acronym&gt; be?  I don't mean that the data are stored using the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt;, but that the data model for the actual &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt; form would conform to the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt;.  If we were to apply the &lt;a href="http://www.tincat-group.com/mewsings/2006/01/naked-model.html#ip"&gt;Information Principle&lt;/a&gt; to the &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt;, we would need the entire information content to be represented only as attribute values within tuples within relations.  While that is feasible with a data store, that would require no lists or ordered multivalued attributes, for example, which is not a sacrifice that can be made in a user interface.  
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Unlike other data models, the RM is not sufficient for writing software.&lt;/blockquote&gt; 
&lt;p&gt;
An arbitrary &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt;, therefore, cannot use the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; for its data model.  Given that the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; was developed for the purpose of working with &lt;em&gt;large shared data banks&lt;/em&gt;, it is understandable that it might not also be useful as a &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt; data model.  But if we were to decide that life is too short for impedance, we would have to eliminate the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; from the solution.  Unlike other data models, the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; is not sufficient for writing software.
&lt;/p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/02/list-of-girls.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-114053655643921963?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/114053655643921963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=114053655643921963&amp;isPopup=true' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114053655643921963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/114053655643921963'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/02/dont-suffer-impedance.html' title='Don&apos;t Suffer Impedance'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-113935006456185952</id><published>2006-02-07T16:07:00.000-06:00</published><updated>2006-11-12T17:58:00.507-06:00</updated><title type='text'>The Model Behind the interFace</title><content type='html'>&lt;p&gt;
An interface is the face that computer software shows to a person, other software, or possibly hardware devices.  While data models are often discussed related to databases and storing data, this mewsing is about data models behind software interfaces in general and user interfaces in particular.
&lt;/p&gt;
&lt;div class="insetpic" id="uml"&gt;
&lt;img src="http://www.tincat-group.com/images/webpage-uml.gif" alt="Example UML class diagram for web page data"  title="UML class diagram for web page data" /&gt;
&lt;/div&gt;
&lt;p&gt;
Let's take an example of a browser-based UI page with three text fields, one of which requires an integer value; one single selection drop-down; two multi-selection drop-downs; one text area; one radio button; and one date entry via a free-form text field.  Using all the creativity I can muster right now, I'll name them as indicated in the UML class shown here.
&lt;/p&gt;
&lt;p&gt;
Developing software is a process of modeling data and behavior.  One set of data we can model is that which will be entered by the user.  This single page of data could be backed by a view/schema modeled with this single UML box.  We could use XML or JSON, for example, within the software to define and work with this view of data. 
&lt;/p&gt;
&lt;p&gt;
Similarly if not working with a UI but a data exchange interface, such as one using web services, we could use this same data model.  This could be the model for a single &lt;em&gt;record&lt;/em&gt; of data.  For this example I'll include some sample values.  I'll use an xml-ish format (because I wish XML had arrays like this) to model this view.  [Note: I'll start the array index at 1, but I'm noting that I'm doing that just to retain my credentials in the real-programmers-start-counting-at-zero world.]
&lt;/p&gt;
&lt;p&gt;
&lt;code&gt;
&amp;lt;MyExchange&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;text1&amp;gt;elephant&amp;lt;/text1&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;text2&amp;gt;ears&amp;lt;/text2&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;text3&amp;gt;2&amp;lt;/text3&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;singleSelect&amp;gt;mouse&amp;lt;/singleSelect&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;multiSelect1&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;multiSelect1[1]&amp;gt;grey&amp;lt;/multiSelect1[1]&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;multiSelect1[2]&amp;gt;pink&amp;lt;/multiSelect1[2]&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;multiSelect1[3]&amp;gt;ivory&amp;lt;/multiSelect1[3]&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/multiSelect1&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;multiSelect2 /&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;textArea&amp;gt;These are the times that try men's souls
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/textArea&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;radioButton&amp;gt;Africa&amp;lt;/radioButton&amp;gt;
&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;dateText&amp;gt;01JAN06&amp;lt;/dateText&amp;gt;
&lt;br /&gt;
&amp;lt;/MyExchange&amp;gt;
&lt;/code&gt;
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;An arbitrary web page cannot have an SQL view as a data model.&lt;/blockquote&gt; 
&lt;p&gt;
An arbitrary web page cannot have an SQL view as a data model.  While views need not be in 3rd or 5th normal form or &lt;acronym title="Boyce-Codd Normal Form"&gt;BCNF&lt;/acronym&gt;, you cannot define an SQL view that is not in 1NF.  Using my favorite definition of an SQL view being a &lt;em&gt;stored query&lt;/em&gt;, we see that while we can get a lot of different result sets in an SQL query, we cannot get a single web page of data if said data includes lists.  Lists or arrays are very common in user interfaces as well as throughout the rest of software development.  SQL-DBMS advocates have been known to say things like "You can use reporting tools to represent the view in whatever form you like--that is a representation issue".   You might recall from &lt;a href="http://www.tincat-group.com/mewsings/2006/01/naked-model.html"&gt;a previous blog&lt;/a&gt; that the RM is all about representation, however.  
&lt;/p&gt;
&lt;p&gt;
The inability to get a view that is not &lt;dfn title="See the Is Codd Dead? blog"&gt;normalized&lt;/dfn&gt; is a failure of SQL-DBMS tools, while the current state of the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; has made accommodations by redefining 1NF.  I suspect I'll bring that up in a future blog, but for now I'll just make the point that even with some new variations on the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; that permit relation-valued attributes, ordered lists are still not included in the model.
&lt;/p&gt;
&lt;p&gt;
Now that we have our UI or web services interface modeled, what might we want to do with data that are hosted by this model?  We might want to select, project, join… basically we might want to do anything we otherwise do with data.  These data need not come from a disk, they could come from a web page or pages, a web service or other interface, or a process that generates data and stores it in memory, for example.
&lt;/p&gt;
&lt;p&gt;
Are there any of these statements with which you disagree?
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
Data modeling is required for all interfaces and, therefore, throughout the process of software development.
&lt;/li&gt;
&lt;li&gt;
When data values are provided in data models related to a UI or any other interface, there might be a requirement to do any type of manipulation of or queries against this data.
&lt;/li&gt;
&lt;li&gt;
When working with a UI data model, it is not possible to work exclusively with normalized data.
&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote class="pullquote"&gt;Therefore, it is not just important, but necessary, to have models of data other than the RM.&lt;/blockquote&gt; 
&lt;p&gt;
Therefore, it is not just important, but necessary, to have models of data other than the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt;.  Whatever the other data model, it has the same requirements for manipulating and querying the data as data models that are specific to DBMS tools.  Data in these models must be projected, inspected, dejected, neglected, and selected (apologies to Arlo Guthrie and &lt;em&gt;Alice's Restaurant&lt;/em&gt;).  
&lt;/p&gt;
&lt;p&gt;
Even if we decide to make changes to whatever data model we use for the UI when we work with large shared data banks, we cannot make the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; the data model across the board in software development.  We must have ordered lists, for example.  Before we turn our attention to the face of the database, I want to be sure you are with me on this point.  The User simply requires a more full-featured model behind their Interface.
&lt;/p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/02/dont-suffer-impedance.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-113935006456185952?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/113935006456185952/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=113935006456185952&amp;isPopup=true' title='28 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/113935006456185952'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/113935006456185952'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/02/model-behind-interface.html' title='The Model Behind the interFace'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>28</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-113874006772676260</id><published>2006-01-31T14:40:00.000-06:00</published><updated>2006-11-12T17:58:00.206-06:00</updated><title type='text'>The Data Movement</title><content type='html'>&lt;p&gt;
Data.  Movement.  Toss in some musings about differences between the sexes and that will do for today.  
&lt;/p&gt;
&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/senior-picture.gif" alt="dawn's high school photo"  title="dawn's high school photo" /&gt;
&lt;/div&gt;
&lt;p&gt;
&amp;lt;&lt;acronym title="Cover Your Arse"&gt;CYA&lt;/acronym&gt;&amp;gt;
&lt;br /&gt;Gender can be a divisive subject, and I want to be clear that I have no expertise in the field of gender studies, nature vs. nurture, or how brains differ.  One of my readers from down under tells me that feminism is a sensitive subject there due to a lack of full-time jobs for men over forty right now.  Apparently there has been some reverse discrimination.  I do not think I have been discriminated against in my career at all.  Unlike some of you, I have never even been in an all-male meeting.  But I have taken note of statistics indicating a downward trend of women in computing and have seen a lack of female computer science majors in at least one college in the USA.  I decided to weave this topic into my mewsings today, giving possibly a new angle to the otherwise tired data vs. process topic.  
&lt;br /&gt;
&amp;lt;/CYA&amp;gt;
&lt;/p&gt;
&lt;p&gt;
It's time to put a definition of &lt;em&gt;data&lt;/em&gt; on the table, &lt;em&gt;data&lt;/em&gt; as in &lt;em&gt;data modeling&lt;/em&gt; and &lt;em&gt;database&lt;/em&gt;, and even the good old term &lt;em&gt;data processing&lt;/em&gt;.  
&lt;/p&gt;&lt;p&gt;
Data: encoded propositions, a combination of form and meaning; accurate data are facts.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Accurate data are facts. Ho Hum.&lt;/blockquote&gt; 
&lt;p&gt;
Ho hum.  Here's a fact: I'm in my forties.  If you capture that fact as data today and then present it thirty years from now, it will be as accurate as if I were to attach a high school picture to this blog today.  Alas, data changes.  It can also be disseminated.  Data with movement&amp;#8212;now that's much more interesting to me.
&lt;/p&gt;&lt;p&gt;
I started in data processing with a summer job during college in 1977.  I had pounded the pavement for a job.  If I had not gotten this job at the last minute, I would have started as a waitress the following Monday.  I had memorized the menu already, but was petrified, knowing I was not really waitress material.  But I also knew I didn't want to return to being a nurse's aide in a nursing home ('73-'75) or a maid in a Holiday Inn (Maid of the Month, July 1976).  
&lt;/p&gt;&lt;p&gt;
My qualifications for this job programming COBOL on a Pr1me 300 were that I had taken the equivalent of one semester course covering COBOL, BASIC, and Fortran (two half courses, to be precise).  The person hiring me said the other qualification I had was that I was majoring in mathematics and to him that meant that I was smart. 
&lt;/p&gt;&lt;p&gt;
While it is impossible to know what would have happened under different circumstances, I am as sure as I can be that I would not major in computer science were I to enter college today.  Often a computer science major is required now for those entering software development professions.  I can think of no reason why I would have chosen to major in a machine.  I have even avoided joining &lt;acronym title="Association for Computing Machinery"&gt;ACM&lt;/acronym&gt; until last year when I wanted to download more papers than would make sense on a pay-per basis.  I didn't like &lt;em&gt;Machinery&lt;/em&gt; in the name.  I have no interest in machinery, much less an association for such machinery.
&lt;/p&gt;&lt;p&gt;
What is the percentage of women in the car industry compared to the travel industry?  What is the percentage of women in computer science compared to those who were once in that now-called-something-else profession of data processing?  Some think the decline is a failure of the women's movement or, perhaps, of women.  But I see it as a success with the women's movement in that girls know they have options.  It is a  failure of our discipline to appeal to these girls as it once appealed to the girl I was when it captured me.
&lt;/p&gt;
&lt;blockquote cite="http://www.umbc.edu/cwit/computer_mania.html"&gt;
&lt;ul&gt;
&lt;li&gt;The percentage of women receiving bachelor-level degrees in computer or information sciences has declined from a peak of 35.8 percent in 1984 to 26 percent today. 
&lt;/li&gt;
&lt;li&gt;Among the science and engineering workforce, computer science is the only area where women's participation has declined since 1993. (umbc.edu/cwit/computer_mania.html)
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;blockquote class="pullquote"&gt;I'm not interested in bases of data.  I'm interested in uses of data.&lt;/blockquote&gt; 
&lt;p&gt;
I like movement and connections.  To me, data processing is like travel, like movement.  Databases are more like computers and cars.  I'm not interested in bases of data.  I'm interested in uses of data, in data movement.  I went on for a master's degree in pure mathematics, not applied mathematics or science.  So there is little appealing to me about a discipline called &lt;em&gt;Computer Science&lt;/em&gt;.  I'm drawn to it about as much as to the term &lt;em&gt;Database Management System&lt;/em&gt;.
&lt;/p&gt;
&lt;blockquote cite="John Fowles"&gt;
That is the great distinction between the sexes. Men see objects, women see the relationships between objects. (John Fowles)
&lt;/blockquote&gt;
&lt;p&gt;
Of course this is a generalization.  I do like language and data.  But I would say that I like the relationship of language and data within systems that include people.  I like connections, change, impact, and movement.  Am I really about to blame some part of  the decrease in women in computing on the change from a focus on &lt;em&gt;data processing&lt;/em&gt; and terms implying movement to a focus on &lt;em&gt;computer science&lt;/em&gt; and nouns of &lt;em&gt;data&lt;/em&gt;, &lt;em&gt;&lt;acronym title="Database Management System"&gt;DBMS&lt;/acronym&gt;&lt;/em&gt;, &lt;em&gt;tables&lt;/em&gt;, &lt;em&gt;domains&lt;/em&gt;, &lt;em&gt;constraints&lt;/em&gt;, and &lt;em&gt;objects&lt;/em&gt;?  Yup.  (Do you like how I also just tossed the &lt;acronym title="Object Oriented"&gt;OO&lt;/acronym&gt; folks into the same bucket as the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; folks?)
&lt;/p&gt;&lt;p&gt;
Thinking in terms of static data, without concurrently addressing changes to the shape, content, and distribution of that data, the use of and interactions with the data, the movement of data, is simply not compelling.  These &lt;dfn title="data"&gt;encoded propositions&lt;/dfn&gt; exist in fluid, changing systems, not as &lt;em&gt;data in a vacuum&lt;/em&gt;.  
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Put the processing back with the data, please.&lt;/blockquote&gt; 
&lt;p&gt;
If you take a girl today similar to the girl pictured here, with similar interests and aptitudes, my hypothesis is that she will not end up in computer science.  That's what happens when the women's movement meets a discipline defined in terms that suggest no movement.  Girls rarely choose to focus on objects or data.  Let's tap into the data &lt;em&gt;movement&lt;/em&gt; and put the processing back with the data, please.
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/02/model-behind-interface.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-113874006772676260?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/113874006772676260/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=113874006772676260&amp;isPopup=true' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/113874006772676260'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/113874006772676260'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/01/data-movement.html' title='The Data Movement'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-113815957132241678</id><published>2006-01-24T21:25:00.000-06:00</published><updated>2006-11-12T17:57:59.938-06:00</updated><title type='text'>The Naked Model</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/naked-model.gif" alt="naked-model.gif image not available"  title="No need for image with this title" /&gt;
&lt;/div&gt;
&lt;p&gt;
Strip the term &lt;em&gt;relational&lt;/em&gt; from &lt;em&gt;relational model&lt;/em&gt; and you have an unadorned &lt;em&gt;model&lt;/em&gt;.  So as not to confuse this with other possible meanings, we should be more precise.  This model is typically termed a &lt;em&gt;data model&lt;/em&gt;.  A data model is employed in the design, construction, and maintenance of computer software systems. 
&lt;/p&gt;
&lt;p&gt;
The goal of this article is to get us to a common understanding of the term &lt;em&gt;data model&lt;/em&gt; while also giving more indication of where these mewsings are headed.  Before zeroing in on the meaning of &lt;em&gt;data model&lt;/em&gt;, let's look at some similar terms used in software development that are &lt;strong&gt;NOT&lt;/strong&gt; the same.  For example, is this &lt;em&gt;data model&lt;/em&gt; minus the &lt;em&gt;relational&lt;/em&gt; adjective a...
&lt;/p&gt;
&lt;p&gt;
...Conceptual Data Model (CDM)?  &lt;strong&gt;Nope.&lt;/strong&gt;
&lt;br /&gt;
The CDM results from analyzing an area to be automated, capturing requirements, and communicating these between those who know the subject areas and those who will develop a software system.  While the CDM can be back-of-a-napkin informal, there are many techniques for adding rigor, including the use of Entity-Relationship or &lt;acronym title="Universal Modeling Language"&gt;UML&lt;/acronym&gt; Class Diagrams.
&lt;/p&gt;
&lt;p&gt;
...Logical Data Model (LDM)?  &lt;strong&gt;Nope.&lt;/strong&gt;
&lt;br /&gt;
This is the one that concerns me.  Please don't confuse the naked &lt;em&gt;data model&lt;/em&gt; with the logical data model, OK?  When talking about a particular system, an LDM might be called &lt;em&gt;the data model&lt;/em&gt; by some.  However, the LDM is different from the term &lt;em&gt;data model&lt;/em&gt; being discussed in this blog, so when I write &lt;em&gt;data model&lt;/em&gt; sans adjectives, I am not referring to an &lt;acronym title="Logical Data Model"&gt;LDM&lt;/acronym&gt;.  The LDM results from structuring a specific CDM and communicating that structure to the computer.  
&lt;/p&gt;
&lt;p&gt;
...Physical Data Model (PDM)?  &lt;strong&gt;Nope.&lt;/strong&gt;
&lt;br /&gt;
Only those writing the low-level database software need to know anything about the physical model, in theory (knowing grin goes here).  Pretty much the only time you will hear me talk about the physical data model is if I am saying that I am not talking about the physical data model.  
&lt;/p&gt;
&lt;p&gt;
Each of these three possible glossary entries is related to a particular problem space being modeled for incorporation in a computer system.  The data model we are talking about is more abstract.  Data models such as the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; have implications for all &lt;acronym title="Logical Data Model"&gt;LDM&lt;/acronym&gt;s.  
&lt;/p&gt;
&lt;p&gt;
Now that we know what our &lt;em&gt;data model&lt;/em&gt; is not, let's turn our attention to what it is.  The Relational Model (RM), introduced in an &lt;a href="http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html"&gt;earlier blog&lt;/a&gt;, is a sweet, tight, mathematical model based on set theory and predicate logic.  While you might have a hint that I'm putting the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; on trial over the course of these mewsings, I really do appreciate predicate logic and adore set theory.  I applaud the cleverness in modeling data with both set theory and predicate logic.  It can be quite helpful.  For example, if we organize data and prepare query languages aligned with first order predicate logic, we can prove that our queries will return accurate results with respect to the data, in a finite amount of time.  Also, if we choose a mathematically simplified data model, we can implement a mathematically simplified query language.
&lt;/p&gt;
&lt;p&gt;
In addition to appreciating mathematics, I also like religion.  But I hope to debunk some of the RM religion that has come along with the application of these mathematics to data.  The current use of the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; has been pervasive-enough in the industry that it will take me some time to lay out a case.  If all goes well, I plan to have closing arguments sometime before the end of 2006.  I will also admit that while I think I have a good case, I don't have it all formed into words in my head just waiting to hit paper. Writing in blog-sized units should help me refine and crystalize my thinking.  I hope that you, the jury, enjoy taking the journey through the evidence with me.
&lt;/p&gt;
&lt;p&gt;
I would like to enter into evidence the Information Principle as Exhibit A.  I will use a quotation from C. J. Date who is quoting E. F. (Ted) Codd.  Both of these men have been at the center of relational data modeling.
&lt;/p&gt;
&lt;div class="showoff" id="ip"&gt;
&lt;h6&gt;Exhibit A: The Information Principle&lt;/h6&gt;
&lt;p&gt;
"The Information Principle (which I heard Ted refer to on occasion as the fundamental principle underlying the relational model) [is]...
&lt;/p&gt;
&lt;blockquote cite="http://www.sigmod.org/codd-tribute.html"&gt;The entire information content of a relational database is represented in one and only one way: namely, as attribute values within tuples within relations." (Date, &lt;em&gt;Edgar F. Codd, A Tribute&lt;/em&gt;, www.sigmod.org/codd-tribute.html)&lt;/blockquote&gt;
&lt;/div&gt;
&lt;blockquote class="pullquote"&gt;A data model is related to the representation of data&lt;/blockquote&gt; 
&lt;p /&gt;
&lt;p&gt;
Tuck this point away: a data model is related to the representation of data. Now let's move on to a definition of a generic data model, using Date to rephrase Codd.
&lt;/p&gt;
&lt;blockquote cite="Date on Codd"&gt;
Codd defines a data model in a 1980 paper &lt;em&gt;Data models in database management&lt;/em&gt;.  By his definition a data model consists of a collection of data structure types, operators that can be applied to instances of these types and consistency rules that define valid states for the data.
&lt;/blockquote&gt;
&lt;p&gt;
Objects, operators, and, effectively, rules for assignment…Hmmm…  If we were to &lt;em&gt;implement&lt;/em&gt; a data model what would we have?  Let's take a look at a recent definition of &lt;em&gt;data model&lt;/em&gt; from Date.
&lt;/p&gt;
&lt;blockquote cite="Date, An Introduction to Database Systems, 8e"&gt;A &lt;strong&gt;data model&lt;/strong&gt; is an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the &lt;em&gt;abstract machine&lt;/em&gt; with which users interact.  The objects allow us to model the &lt;em&gt;structure&lt;/em&gt; of data.  The operators allow us to model its &lt;em&gt;behavior&lt;/em&gt;. (C. J. Date, &lt;em&gt;An Introduction to Database Systems&lt;/em&gt;, Addison Wesley, 8e, 2003, p 15-16)&lt;/blockquote&gt;
&lt;blockquote class="pullquote"&gt;The implementation of a data model is a programming language&lt;/blockquote&gt; 
&lt;p&gt;
I conclude from this that the implementation of a data model is a programming language, whether a general purpose programming language or not.  Also, each programming language provides an implementation of a data model or perhaps more than one.  Put another way, a data model is an abstraction of a programming language or programming sublanguage.
&lt;/p&gt;
&lt;p&gt;
Now that we have some clarification of the term &lt;em&gt;data model&lt;/em&gt;, I will make a claim that is likely agreeable to readers as I have never heard anyone argue otherwise.  The &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; is not necessary.  It is not necessary for developing software solutions, maintaining large shared databases, or any other purpose in the world of software development.  Any software solutions that can be developed while employing the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; could be written without it, using other data models.  I will follow this up in a future blog by showing that the RM is not sufficient for developing and maintaining data-based software.  Once we are all on the same page that the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; is neither necessary nor sufficient, we can look at what the purpose of the RM is and discuss its comparative usefulness.
&lt;/p&gt;
&lt;p&gt;
My beef with the RM is related both to normalization theory as taught in colleges and universities, discussed in the &lt;em&gt;&lt;a href="http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html"&gt;Is Codd Dead?&lt;/a&gt;&lt;/em&gt; blog and to the way the RM, or parts thereof, are used in the practice of software development and maintenance today.  It shapes the thinking of software developers in ways that are often not the most effective.  
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;The RM is not necessary&lt;/blockquote&gt; 
&lt;p&gt;
And, by the way, if you are thinking that the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; need not be obvious in a developer's programming language but could be hidden behind the scenes, then my work is done.  That would mean that no computer language would need to use the Information Principle, and neither you nor I would need to use the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; as a data model.  We can use any programming language that does not represent itself as an implementation of the RM to employ an alternative data model.  Did I mention that the RM is not necessary?
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/01/data-movement.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-113815957132241678?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/113815957132241678/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=113815957132241678&amp;isPopup=true' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/113815957132241678'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/113815957132241678'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/01/naked-model.html' title='The Naked Model'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-113747716105377472</id><published>2006-01-16T23:50:00.000-06:00</published><updated>2006-11-12T17:57:59.613-06:00</updated><title type='text'>Who Ordered the Ripple Delete?</title><content type='html'>&lt;div class="insetpic"&gt;
&lt;img src="http://www.tincat-group.com/images/rippledelete.gif" alt="Ripple Delete with Digital Video Editing"  title="Ripple Delete" /&gt;
&lt;/div&gt;
&lt;p&gt;
I have dabbled a bit in digital video editing, inserting and deleting frames, for example.  If I select frames and hit the keyboard Delete, the frames are removed, but a gap remains where they once were.  That is often not what I want.  &lt;strong&gt;Enter:&lt;/strong&gt; the ripple delete.  I'll admit to having a slight shiver of delight when I perform a ripple delete.  Behind the scenes it not only deletes the frames, but moves frames up to cover the gap.  Editors who have worked with physical film and a razor blade must be ecstatic.
&lt;/p&gt;
&lt;p&gt;The frames of a video are similar to any other ordered list of data.  This ripple delete feature can be added to any software application that shows users an ordered list.  Product features are not determined by a particular underlying database data model.  However, I am using the feature of an ordered list to set the stage for investigating the meaning and implications of chosing one or another data model, with a definition for "data model" coming in the next blog.  If the same features can be implemented in software whether using the Relational Model (RM) or not, why might a team choose not to employ the RM?  
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Rather than avoiding ordered lists, you start seeing how common they are when you free your mind of the RM.&lt;/blockquote&gt; 
&lt;p&gt;
Let's turn to an example of a simple ordered list.  If I were not such a novice with &lt;acronym title="Asynchronous JavaScript And XML"&gt;AJAX&lt;/acronym&gt;, I might have provided an example of a ripple delete on a list, but my working example should help with the illustration none-the-less.  Also, please forgive my burst of saccharin marketing spin, but because of using 
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tag-delimited strings
&lt;/li&gt;
&lt;li&gt;&lt;acronym title="Non-First Normal Form"&gt;NF2&lt;/acronym&gt; and
&lt;/li&gt;
&lt;li&gt;two-valued logic (2VL) 
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
throughout the entire development process, I'm naming this style of development &lt;strong&gt;End-to-End AJAX&lt;/strong&gt; or maybe &lt;strong&gt;N2N AJAX&lt;/strong&gt;.  You can gag now, but it isn't like naming JavaScript after Java, given that AJAX really is used at the front-end.
&lt;/p&gt;
&lt;p&gt;Using the example from the &lt;a href="http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html"&gt;last blog&lt;/a&gt;, I'll add a requirement for the e-mail addresses to be ordered.  Someone using the database would send either bulk or individual e-mails first to the first address and if that bounces, then to the second.  I can place an ordered list in my logical model and then in my implementation.  That way I can enjoy use of the ordered list without managing a separate ordering attribute myself, without having to remember to sort the output, nor writing my own ripple delete process.&lt;/p&gt;
&lt;p&gt;See &lt;a href="http://dawn.blackflute.com/mewsings/personEmail.html"&gt;this example&lt;/a&gt; as a hint at developing using End-to-end &lt;acronym title="Asynchronous JavaScript And XML"&gt;AJAX&lt;/acronym&gt;.  While a ripple delete is not a standard feature of an &lt;acronym title="Relational Data Base Management System"&gt;RDBMS&lt;/acronym&gt;, it is part of the charm of &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; databases.  So, the e-mail list is defined as an ordered list to our database.  &lt;acronym title="Asynchronous JavaScript And XML"&gt;AJAX&lt;/acronym&gt; is used on the front-end, including &lt;acronym title="eXtensible HyperText Markup Language"&gt;xhtml&lt;/acronym&gt;, &lt;acronym title="cascading style sheet"&gt;css&lt;/acronym&gt;, and JavaScript.  The output comes from a query of the database without procedures to reshape it, so you can see that the database includes &lt;acronym title="Non-First Normal Form"&gt;NF2&lt;/acronym&gt; data, as described in the &lt;a href="http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html"&gt;first blog&lt;/a&gt;.
&lt;/p&gt;
&lt;div class="inlinepic"&gt;
&lt;a href="http://dawn.blackflute.com/mewsings/personEmail.html"&gt;
&lt;img src="http://www.tincat-group.com/images/orderedlist.gif" alt="End-to-end AJAX Example" /&gt;
&lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;
In practice, data modelers are influenced in their choices of a logical data model by their target &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt;.  If the target database is based on the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt;, the data modeler is less likely to select a property list for an entity (i.e. a multi-valued attribute requiring a new table in a SQL-DBMS).  I have heard analysts convince users that a single-valued attribute would be best or at least appropriate "for this phase."   It makes sense that if you have to split an attribute into a separate table, add in an ordering attribute and roll your own insert and ripple delete functions, you are simply less likely to even consider it.  A technique sometimes used when lists are implemented in an RDBMS is to number using intervals that permit easy insertion in the midst of the list as long as you do not run out of numbers in the interval.  There is then no hint in any given entry in the list what its ordinal position might be.  If the first e-mail address were identified as address 10 and the second were numbered 20, another e-mail address could be inserted as 15. But, rather than avoiding ordered lists, you start seeing how common they are when you free your mind of the  &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt;.   
&lt;/p&gt;
&lt;p&gt;
Even once you go through the work of implementing an ordered property list for an entity, the end-user might still be affected if you take your &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; thinking to the &lt;acronym title="User Interface"&gt;UI&lt;/acronym&gt;.  Think what the digital video editing tool user interface might be if it were to think like the RM.  It is unlikely that this editing software holds these frames in a relational database but it shows the interface your users might want even if you did have a relational database backing an ordered property list.  Now don't forget, software developers are users too.  The &lt;acronym title="DataBase Management System"&gt;DBMS&lt;/acronym&gt; &lt;acronym title="Application Programming Interface"&gt;API&lt;/acronym&gt;s they use can make a significant difference.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Some...might think I'm about to confuse the data model with a representation.&lt;/blockquote&gt; 
&lt;p&gt;
Some readers around the world (I was thrilled to have readers from every continent except Antartica this past week with the &lt;a href="http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html"&gt;first blog&lt;/a&gt;) might think I'm about to confuse the data model and the representation.  I'm not.  I am laying the groundwork for examining the definition and use of a data model.  What is the relationship between a data model and the &lt;acronym title="Application Programming Interface"&gt;API&lt;/acronym&gt; that developers use in working with a &lt;acronym title="Data Base Management System"&gt;DBMS&lt;/acronym&gt;?  If we can have ordered lists and perform ripple deletes no matter what data model we are using, then what is my point?  It has to do with the title of this blog&amp;#8212;between the developer and the database, who did what work; who ordered a list of properties?  For flexibility or productivity, does it make a difference who ordered the ripple delete?
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/01/naked-model.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-113747716105377472?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/113747716105377472/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=113747716105377472&amp;isPopup=true' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/113747716105377472'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/113747716105377472'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/01/who-ordered-ripple-delete.html' title='Who Ordered the Ripple Delete?'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-113686652690785592</id><published>2006-01-09T22:14:00.000-06:00</published><updated>2006-11-12T17:57:59.252-06:00</updated><title type='text'>Is Codd Dead?</title><content type='html'>&lt;p&gt;
Time Magazine issued a cover in 1966 asking "Is God Dead?"  This is not the first time the name "Codd" has replaced "God" in a &lt;dfn title="The key, the whole key, and nothing but the key, so help me Codd"&gt;phrase&lt;/dfn&gt;, but in this case it is not for the purpose of comparison.  In this blog, I will dare to question some of Codd's legacy including some of the dogma passed along in college database textbooks today.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;
&lt;a href="http://www.tincat-group.com/images/is-codd-dead.gif"&gt;&lt;img src="http://www.tincat-group.com/images/is-codd-dead-small.gif" alt="Is Codd Dead? parody"  title="Is Codd Dead? parody" /&gt;&lt;/a&gt;
&lt;/blockquote&gt;
&lt;p&gt;
E.F. (Ted) Codd died in 2003 leaving a significant contribution.  Codd is often called the father of relational theory.  His 1970 ACM paper
&lt;cite&gt;
&lt;em&gt;A Relational Model of Data for Large Shared Data Banks&lt;/em&gt; (E. F. Codd, Communications of the ACM, v.13 n.6, p.377-387, June 1970)&lt;/cite&gt; is a significant industry milestone.
&lt;/p&gt;
&lt;p&gt;
In this paper, Codd discusses what he sees as the advantages in modeling &lt;dfn title="encoded propositions, a combination of sign and meaning; accurate data are facts"&gt;data&lt;/dfn&gt; by use of mathematical &lt;dfn title="A subset of the set of ordered tuples (A1, A2, ..., Am) formed by the cross-product of sets S1 x S2 X ... Sm where each An is an element of Sn."&gt;relations&lt;/dfn&gt; compared to mathematical graphs of trees or networks.
&lt;/p&gt;
&lt;p&gt;
Relations are often represented as tables of rows and columns.  Trees are often visualized as nested folders and documents.  The network graph, seen by Codd as overly complex and a cause of some of the problems he was addressing, can be visualized as a web.  While a web, or directed-graph, might be a more complex mathematical structure than a relation, I predict that this data model might just catch on anyway (wink).  
&lt;/p&gt;
&lt;p&gt;
There are many viable models for data.  Each has its advantages and disadvantages.  
This blog will not be about right and wrong as much as better and worse approaches.  I'm a practitioner dabbling in theory in order to help improve the practice and not the other way around.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;My advice is this: Stop normalizing your data.  Stop removing all repeating groups.&lt;/blockquote&gt;  
&lt;p&gt;
Codd also introduces the term "normalize" to refer to removing &lt;dfn title="domains with relations as elements; avoiding the def of a domain for now; good topic for a future blog"&gt;nonsimple domains&lt;/dfn&gt;, such as lists or tables of data often referred to as "repeating groups."  He is very clear in this paper that a relation could include repeating groups, but that normalizing it would make the data model simpler for some purposes.
&lt;/p&gt;&lt;blockquote cite="A Relational Model of Data for Large Shared Data Banks p 381 http://portal.acm.org"&gt;
The simplicity of the array representation which becomes feasible when all relations are cast in normal form is not only an advantage for storage purposes but also for communication of bulk data between systems which use widely different representations of data. (Codd, p. 381)&lt;/blockquote&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;
Anyone communicating bulk data by way of &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; or &lt;acronym title="JavaScript Object Notation"&gt;JSON&lt;/acronym&gt; will recognize that we have different issues to solve today than we had in 1970. The rise of &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; with its associated unnormalized data model is part of the impetus for what will likely be significant changes on the database landscape.
&lt;/p&gt;
&lt;p&gt;
My advice is this: Stop normalizing your data.  Stop removing all repeating groups. Note that I am using the original description of normalization from this paper.  This meaning of normlization was later termed, or at least rolled into the term, "First Normal Form" or 1NF.  The higher normal forms, such as &lt;acronym title="Boyce-Codd Normal Form"&gt;BCNF&lt;/acronym&gt;, include laudable work with functional dependencies, but all are defined to first require normalization.  There is definitely some good that can be salvaged from this normalizing debacle of the past few decades, but we must first ditch the requirement for data to be normalized, placed in &lt;acronym title="First Normal Form"&gt;1NF&lt;/acronym&gt;, stripped of repeating groups.  I will refer to relations that are not normalized, as others have, as NF2 for Non-First Normal Form.
&lt;/p&gt;
&lt;p /&gt;
&lt;div class="example"&gt;
&lt;strong&gt;Do this &amp;#8212;&gt;&lt;/strong&gt;
&lt;pre&gt;
   Id: 123456
First: Jayne
 Last: VanDoe
Email: jvdoe@abc123.com
       jov@xyz123.com
       jo3@aol.com
&lt;/pre&gt;
&lt;/div&gt;

&lt;p /&gt;

&lt;div class="example"&gt;
&lt;strong&gt;Not this &amp;#8212;&gt;&lt;/strong&gt;
&lt;pre&gt;
   Id: 123456
First: Jayne
 Last: VanDoe

   Id: 123456
Email: jvdoe@abc123.com

   Id: 123456
Email: jov@xyz123.com

   Id: 123456
Email: jo3@aol.com
&lt;/pre&gt;
&lt;/div&gt;
&lt;p /&gt;
&lt;p&gt;
In this way you will model entities, such as the person above, with their dependent properties, such as the list of e-mail addresses.  You only need to remove lists from your model, thereby going from the first example to the second above, if you are using tools that require it.  Given that &lt;acronym title="Structured Query Language '92 on which ODBC is based"&gt;SQL-92&lt;/acronym&gt; requires it, that is a big if.  There are other viable, time-tested &lt;acronym title="Non-First Normal Form"&gt;NF2&lt;/acronym&gt; options, however.
&lt;/p&gt;
&lt;blockquote class="pullquote"&gt;Don't be fooled&amp;#8212; there is no mathematical requirement to normalize data.&lt;/blockquote&gt;
&lt;p&gt;
But the Relational Model (RM) is based on mathematics, right?  Mathematics is precise.  What part of the argument for the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; is amiss?  Don't be fooled&amp;#8212;there is no mathematical requirement to normalize data.  Mathematics provides a means for modeling propositions to be handled in software, presented to end-users, passed as messages, or stored on secondary storage devices.  The &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; is a mathematical model.  It is a model.  Models are not the real thing.  Models are often anorexic versions of the real thing.  The mathematics of the relational model is sound, but the process of determining what this model should be used for is flawed.
&lt;/p&gt;
&lt;p&gt;
The &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; has been useful, but not as useful as some pre-relational models, in my opinion.  Post-relational models of data for messages, such as those mentioned above, look very much like pre-relational models.  I am hoping for a return to best practices for data models, whether or not the theory keeps up.  I would, of course, prefer that theory be better aligned with excellent practices.  Many pre- and post-relational tools use an &lt;acronym title="Non-First Normal Form"&gt;NF2&lt;/acronym&gt; model.
&lt;/p&gt;
&lt;p&gt;
You are likely familiar with &lt;acronym title="Relational Database Management System"&gt;RDBMS&lt;/acronym&gt; products, often referred to as relational databases.  Purists might prefer these be called SQL-DBMS products since &lt;acronym title="Structured Query Language"&gt;SQL&lt;/acronym&gt; does not promote a pure relational model.  I will use this column to dispel what I think to be myths that have helped SQL and the relational model rise to become king of the hill for a couple of decades.  While this introductory column is admittedly not meaty, I will delve into this further and provide working examples in the coming weeks.  
&lt;/p&gt;
&lt;p&gt;
While I have not yet experimented with any XML DBMS tools, I have been working with one &lt;acronym title="Non-First Normal Form"&gt;NF2&lt;/acronym&gt; model, often referred to as the MultiValue (MV) or Pick® data model, for over a decade.  This is not the only such model, but one with which I am comfortable, so I will introduce it here and use it in future illustrations and implementations.  
&lt;/p&gt;
&lt;p&gt;
Putting the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; and &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; side-by-side while wearing both a technical and business woman's hat is what prompted me into further exploration of why the &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; data model seems to yield higher productivity for developers, greater flexibility for changes over time, and lower risk of project failure.  This was particularly perplexing when I started researching the topic because the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; was developed to help improve database maintenance.  While the &lt;acronym title="Relational Model"&gt;RM&lt;/acronym&gt; addresses some maintainability issues better than &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt;, &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; seems more flexible in many respects.  There are different risks and benefits associated with each approach.  
&lt;/p&gt;
&lt;p&gt;
Products employing an &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; or &lt;acronym title="Non-First Normal Form"&gt;NF2&lt;/acronym&gt; data model include the IBM U2 products, Temenos jBASE, Revelation OpenInsight, Raining Data D3, Northgate Reality, EDP Plc UniVision, Ladybridge Systems OpenQM, and InterSystems Caché.  There are other viable functional data model implementations with which I am less familiar, such as Berkeley DB from Sleepycat Software and other products marketed as embedded databases.  This is definitely not a small niche market.
&lt;/p&gt;
&lt;p&gt;
OpenQM is an open source implementation, so I will use that for my examples in future blogs.  I will be the first to admit that &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; isn't new, and although various flavors have tools to make it prettier, it typically doesn't look new.  It is unlikely to wow you at first glance, but it often grows on developers quickly with its big bang for the buck results and maintainability.  The same principles can be applied to many environments, however, and will typically not be specific to &lt;acronym title="MultiValue"&gt;MV&lt;/acronym&gt; tools.&lt;/p&gt;
&lt;p&gt;
I would like to see the industry start with an &lt;acronym title="Non-First Normal Form"&gt;NF2&lt;/acronym&gt; model and move it forward rather than squeeze more out of SQL, as has been attempted with the more recent SQL standards.  SQL will be with us for many years, but it is time to make an abrupt cut away from it wherever feasible.
&lt;/p&gt;
&lt;p&gt;
Codd will long be remembered for some very innovative work in the area of database theory.  But, yes folks, Codd is dead.
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/01/who-ordered-ripple-delete.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-113686652690785592?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/113686652690785592/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=113686652690785592&amp;isPopup=true' title='35 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/113686652690785592'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/113686652690785592'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html' title='Is Codd Dead?'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>35</thr:total></entry><entry><id>tag:blogger.com,1999:blog-12028427.post-111300035085090193</id><published>2005-11-27T16:17:00.000-06:00</published><updated>2006-11-12T17:57:58.913-06:00</updated><title type='text'>A Modeling Profession</title><content type='html'>&lt;p&gt;Look for this column (blog?) to start up in early 2006.  This first installment is for setup and testing purposes, although it is hopefully still worth reading.
&lt;/p&gt;
&lt;p&gt;
I'm a bit old to begin a career in modeling, although I still have nice legs. Enough of that. I am interested in all aspects of software development, but have recently been studying modeling, particularly data modeling. While research is never-ending, I'm ready to start the writing side of this effort, beginning with this "tincat musings" column.
&lt;/p&gt;
&lt;p&gt;
There. I called it a column. I can't bring myself to call it a "blog" just yet as that would make me a blogger. I don't know that it is fair to call it a column, because no one has hired me as a columnist, I simply took it upon myself. If it looks like a blog and it reads like a blog...
&lt;/p&gt;
&lt;p&gt;
I will be writing text that will flow into rows in this column. The column will hold data of a type we could call "Document." Of course, we could declare that any character-based column holds data of type Document, where we define this type to include strings of unicode characters (or ascii, for those over 30). Other columns could then be of type "Mime" perhaps. That's it - that's all I need for my "database" - those two types.
&lt;/p&gt;
&lt;p&gt;
So I've moved from the vocabulary of working with character and binary files in the 70's to databases with Document and Mime types today. Some of you might be suggesting that a Document could include Dime/Mime types, but that's an implementation technique. Documents, as far as I'm concerned, are entities that could be implemented using paper and pen (as the implementation for those blessed with eyesight - it could be implemented otherwise for those not). With my definition, if it looks like a document and reads like a document...
&lt;/p&gt;
&lt;p&gt;
Now that I know that everything I want to model will be either of type Document (or a subtype) or of type Mime (ditto), I'm ready for the runway. After I get a few more of these columns under my belt (a fine silk sash from Paris), I'll be ready for takeoff. Speaking of mixed metaphors, what is a model anyway?
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://www.tincat-group.com/mewsings/2006/01/is-codd-dead.html"&gt;Continue to next blog &amp;rarr;&lt;/a&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/12028427-111300035085090193?l=www.tincat-group.com%2Fmewsings%2Findex.html'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/111300035085090193/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=12028427&amp;postID=111300035085090193&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/111300035085090193'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/12028427/posts/default/111300035085090193'/><link rel='alternate' type='text/html' href='http://www.tincat-group.com/mewsings/2005/11/modeling-profession.html' title='A Modeling Profession'/><author><name>--dawn</name><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='09089606608203203470'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry></feed>