Thread: Data Mapping and Moving Relationships

  1. #1
    (?<!re)tired Mario F.
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446

    Data Mapping and Moving Relationships

    Despite being something of an OOP fan, I cannot be completely oblivious to its limitations... or shortcomings... or perhaps some lack of foresight in its design. You choose.

    There are two issues that have always made me reevaluate OOP's usefulness, and I'm going to discuss them. You, of course, are free to ignore me completely, or to provide your insight into the discussion. I'm doing this because I have never discussed these issues with anyone before. Every tidbit of information I've read here and there came from zealots of either the OOP or the anti-OOP camp, and was invariably strongly biased.

    Relational Data Mapping
    OOP always was, and still is, a mystery to me. But of all the mysteries, the one that took me longest to understand was OOP's lack of capacity to establish relational links between data. I mean, everything in OOP screams relations, and yet the universe in which this happens is completely different from that of relational databases. It's a little like Einstein trying to take electromagnetism and add gravity to produce a unified theory. It won't happen. Relational databases and OOP have completely different rules and operate at completely different levels, even if the idea of joining them, or of finding common ground between them, may seem appealing.

    The very first problem I encountered was trying to achieve with OOP the same level of atomicity expressed by relational databases. I doubt I'm the only one who has tried to simply write a class that translates directly to a relational database table, names and everything, only to scrap the design as soon as I tried to express in code the relationships between the tables. What's funny is that it took me a long while to understand the sad truth: OO design doesn't map well to relational data. When I finally did, I felt like Einstein on a bad day.

    OOP atomicity is at the class level, not at that of the data member. The amount of effort needed to translate data rules to OOP is high. So high that it is clearly telling us to stop right there: that is not what OOP was designed for. And this is probably what some OOP detractors don't realize. These are two distinct universes. At first sight the Einsteins may think there is a relationship between them. But... there isn't.

    Wish: I wish OOP would one day evolve the necessary constructs to map directly to relational data.

    Moving (Dynamic) Relationships
    But if the above is not enough, and someone still insists it is possible to express those kinds of relationships (some OOP fans really do believe OOP is the solution to everything), he will be happy to show up with a neat (albeit rather complex) system of classes, interfaces, and whatnot to prove me wrong.

    I could perhaps point out the complexity of the code or the time spent on something that is expressed by SQL in 10 or so lines. But that would still lead to debate. Instead, all I have to do is change one of the database rules to completely destroy the entire design.

    OOP is pretty much static. Once the rules are laid out, nothing ever changes. One of the problems of OOP is that we need to code with this in mind. OOP detractors point out, perhaps with some reason, that if I want to program with change in mind I will end up with a completely different design from the one I would write if I expected no changes. On top of this, we cannot cope with every possible change that may happen in data rules. We can expect a new department to be added to the company. But can we expect Human Resources to move under the direct supervision of the Board for a period of 6 months?

    Furthermore, changes in data rules invariably lead to changes in code. There is no way around it. OOP code is static, and it must conform to the rules as they stand at a given time. Additions and removals are easier to predict. But changes in rules are much harder to predict, and almost (if not entirely) impossible to express.

    Wish: I wish OOP would one day add specific structures capable of expressing ad-hoc relationships.

    Conclusion
    The best argument against my wishes is the way I started them: "OOP is not designed for that". However, I wonder how much of it would be possible.

    Granted, I haven't done much reading on these issues. However, I don't think anyone plans to declare that OOP is finished, that we did all we had to do, and that it's the perfect tool for its job. I expect the paradigm to evolve (even if we haven't seen much of that lately) and conform to the demands of an industry that keeps moving forward.

    OOP lacks abilities it wasn't designed for. The question, though, is: can it add that functionality?
    Last edited by Mario F.; 12-09-2006 at 05:58 PM.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  2. #2
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    I'm somewhat confused by your post. But I'll try to answer as much as I can and ask for clarification where I'm completely lost. Bear in mind that my knowledge about relational theory is limited to a good working knowledge of relational databases.

    First, OOP wasn't designed. It evolved.

    Quote Originally Posted by Mario F.
    OOP's lack of capacity to establish relational links between data.
    What, exactly, is a "relational link"?
    In relational databases, some columns can be designated as referencing fields in other tables. Invariably, in my experience, this other field is some sort of identifier. In other words, since the identifier establishes identity, the field in the first table references the whole record of the second.
    In yet other words, this is exactly the same as an object holding a reference to another object, and this is how O/R mappers (like Hibernate for Java) map these things.

    Quote Originally Posted by Mario F.
    OO design doesn't map well to relational data
    This is true, but not quite in the way you think.
    It's actually pretty easy to write a set of classes that map directly to a set of tables. Tools like Hibernate can even mostly automate the mapping: they can generate classes from table definitions, or table definitions from classes.
    These are the domain objects - mostly just data.
    The problems start when you venture where objects are more than just data. Objects encapsulate behaviour. Behaviour has no place in relational theory: that's all about the data. But object-orientation is actually a step away from the data, even seen from the viewpoint of procedural languages, from which it evolved.
    So yes, OOP atomicity really seems to be at the object level (although I'm not sure what exactly you mean by atomicity). The idea is to hide the details of the data behind the interface of the object.
    But what are data rules, then? Restrictions on the form of data, on its relations, can be implemented in objects - should be implemented in objects. That's the point of encapsulation. What can relational data rules do that cannot be done in objects?

    Quote Originally Posted by Mario F.
    I could perhaps point out the complexity of the code or the time spent on something that is expressed by SQL in 10 or so lines.
    What kind of thing do you have in mind? Keep in mind that SQL is a highly specialized query language, while most object-oriented languages are general-purpose languages. They will always lose in terms of code size and complexity. On the other hand, you can't write a text editor in SQL.
    That's the thing about domain-specific languages (DSLs): they're efficient, but limited. With a Makefile, I can express in just 10 lines highly complex interdependencies between files, and Make will build them in the correct order, rebuild only what's needed, and run independent tasks in parallel. Doing that in a general-purpose language would take... well, about the number of lines in the Make source.

    Quote Originally Posted by Mario F.
    Once the rules are laid out, nothing ever changes.
    Wrong. Totally wrong. As James Coplien points out in Multi-Paradigm Design for C++, program design should always anticipate and accommodate changes. For this, it is necessary to understand where the likely points of change are. But if you identified them correctly, you will not have to redesign to incorporate the change. If you didn't identify them correctly, you will.
    But that is not something specific to object-oriented design. It applies to procedural design, functional design, and yes, to relational design. Case in point: I designed a program that managed "maps" full of "objects". These objects were of different types. From the beginning I knew that some types were significantly different from others. Being young and inexperienced, I ignored this, both in the relational design (the database structure) and the object-oriented design (the Java program).
    When the object types diverged further as requirements evolved, I was equally screwed in both areas. Sure, completely redesigning the database took less time than completely redesigning and rewriting the program, but then, the database doesn't contain behaviour.

    Quote Originally Posted by Mario F.
    But can we expect Human Resources to move under the direct supervision of the Board for a period of 6 months?
    Can you? If you do, you will accommodate that possibility in the design. If not - well, bad luck. Did you anticipate this change in the data rules, or did you just have to redesign your rules, too?

    Don't be misled by the amount of effort. If a ruleset takes 10 lines to express in the highly specialized SQL and 400 in a general-purpose language like C++, then changing two lines of the SQL is equivalent to changing 80 lines of C++: 20% of the program in both cases.

    Quote Originally Posted by Mario F.
    Furthermore, changes in data rules invariably lead to changes in code.
    Well, duh! You changed the rules - you expect the code expressing those rules to magically accommodate this?

    Quote Originally Posted by Mario F.
    I wish OOP would one day add specific structures capable of expressing ad-hoc relationships.
    I cannot even begin to fathom what such structures would look like, or what they would do to predictability and correctness.

    In conclusion, I'm now even more confused than before about what you want. It seems like perhaps you're so used to relational design that it doesn't appear like any effort to you. Like changing a business rule is nothing big, nothing to think about much.
    Well, that's not how it works. If the business rule changes, the business logic has to change.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    (?<!re)tired Mario F.
    Let me see if I can clarify...

    By no means am I trying to compare relational databases and OOP. I understand the goals behind each (or I think I do, in OOP's case). What I'm trying to suggest is bringing OOP and relational databases closer together, so that the mapping happens more easily.

    Quote Originally Posted by CornedBee
    What, exactly, is a "relational link"?
    [...]
    In yet other words, this is exactly the same as an object holding a reference to another object, and this is how O/R mappers (like Hibernate for Java) map these things.
    I wasn't aware of O/R mapping software. I will definitely take a look. However, I think references cannot really express the relationship between data fields when, for instance, the relation spans more than one field, or when its rule is more than a simple equal-to.

    I wish I had the necessary English to express myself at this point... but I feel that yes, it is of course possible to express database rules through OOP. But in order to do it, I am forced to move away from data abstraction into object abstraction (which is normal... after all, it's OO). The problem is that objects don't map well into data as it is understood by relational databases.

    Imagine a customers <-> invoice relationship based on the customer ID. Some query runs that extracts all invoices from a given customer. Can you see the difficulty in expressing this same problem through OOP?

    Of course the best solution (and in the case of C++, the most obvious) is to run the query as SQL and process the results with C++, possibly storing the results in a container of invoices. But I'm not trying to compare SQL to OOP. What I'm trying to understand is whether it is possible, or even desirable, to have OOP express these types of relationships in order to have an easier mapping to relational data. I'm assuming, of course, that this would come at the cost of new keywords, and possibly of something more than just one type of object, the Class.

    Quote Originally Posted by CornedBee
    Wrong. Totally wrong. As James Coplien points out in Multi-Paradigm Design for C++, program design should always anticipate and accommodate changes.
    Yes, but I'm in the domain of speculation. What if we are given a set of tools that allow us to accommodate those changes more easily? As it is, this is a lengthy, awkward and error-prone process. We spend a lot of time analysing the business rules, identifying those mutable areas, coding for them, and sometimes going to an effort that may end up not being justified, because the change never happens.

    I have to confess, CornedBee, I'm having difficulty expressing myself. I'm not comfortable enough with OOP to be able to use the right wording. But to summarise the above paragraph: it seems to me there should be an effort to make this process quicker to achieve. The fact is that if we do it right, we shouldn't worry. But not only does "right" take a long time to achieve, very few business rules allow it to be done right. Some domains simply don't have an answer to what can change and how.

    Quote Originally Posted by CornedBee
    I cannot even begin to fathom what such structures [structures capable of expressing ad-hoc relationships] would look like, or what they would do to predictability and correctness.
    Oh. I cannot either, especially in the current context of OOP. However, it would be nice to be able to quickly move objects between hierarchies for those domains that demand such functionality. It would also be nice to be able to do this at run time.

    Quote Originally Posted by CornedBee
    In conclusion, I'm now even more confused than before about what you want. It seems like perhaps you're so used to relational design that it doesn't appear like any effort to you. Like changing a business rule is nothing big, nothing to think about much.
    By no means. I am used to relational database design, yes. But changes to business rules can have a big impact there too, as in any OO design. It all comes down to exactly what changed. I do feel though that, everything being equal, changes to data in a data-centric design such as that of relational databases are more easily solved than changes to objects in an object-centric design such as that of OOP.

    I am not comfortable enough with OOP logic and theory to evaluate whether my thoughts have any merit. From your answer, I'm starting to guess they don't. Which is fine, really. Again, it's not my purpose to draw comparisons or to denounce OOP as any less of a paradigm. I'm asking whether there are any grounds for improvement.
    Last edited by Mario F.; 12-09-2006 at 08:13 PM.

  4. #4
    Registered User
    Join Date
    Mar 2006
    Posts
    725
    In my opinion, OOP isn't "good" or "bad" in any sense. Tools for the job, as they say. All the scenarios you mentioned are one-shot coding tasks that can be accomplished, OOP or not, in a lazy afternoon. Now consider a multi-million-dollar project that requires engineers to code one tiny bit at a time. The immense project will have to be broken up into lazy-afternoon-sized pieces, and the relationships between the code blocks will have to be clearly defined. The benefits here are obvious.

    Quote Originally Posted by Mario F.
    OOP's lack of capacity to establish relational links between data
    Huh. Boost has entire language-parser frameworks (Boost.Spirit) written as templated classes.

    Quote Originally Posted by CornedBee
    First, OOP wasn't designed. It evolved.
    True. Mario, you seem to be treating OOP like a nebulous thing out there that must be obeyed to the letter or done away with completely. Few people would try to write a nontrivial program using nothing but pure OOP. OOP comes from treating data as objects; it's that simple. You can treat a crude linked list as an object in its own right, and a poorly written class may exhibit no OO design at all.

    Quote Originally Posted by Mario F.
    It would also be nice to be able to do this at run time.
    You're venturing into the blue here. This is, to me, more in the domain of high-level scripting languages; Lisp comes to mind (defmacro?). OOP is a higher-level design methodology; your proposal would take OOP to an even higher level. I can see a possible implementation (if I get the gist of your post right) as a dynamic program implemented on top of a set of rules governing object behaviour.

    Or were you thinking along the lines of dynamic or genetic programming?
    Code:
    #include <stdio.h>
    
    void J(char*a){int f,i=0,c='1';for(;a[i]!='0';++i)if(i==81){
    puts(a);return;}for(;c<='9';++c){for(f=0;f<9;++f)if(a[i-i%27+i%9
    /3*3+f/3*9+f%3]==c||a[i%9+f*9]==c||a[i-i%9+f]==c)goto e;a[i]=c;J(a);a[i]
    ='0';e:;}}int main(int c,char**v){int t=0;if(c>1){for(;v[1][
    t];++t);if(t==81){J(v[1]);return 0;}}puts("sudoku [0-9]{81}");return 1;}

  5. #5
    (?<!re)tired Mario F.
    Yes. I may be looking at this from a completely wrong perspective. Thanks for the links. They'll provide some interesting reading.

  6. #6
    Cat without Hat CornedBee
    Quote Originally Posted by Mario F.
    However, I think references cannot really express the relationship between data fields when, for instance, the relation spans more than one field, or when its rule is more than a simple equal-to.
    Every instance of a relationship spread to more than one field I have seen was about there being no better identifying set of values. But of course that could be my lack of experience. (Whether you believe it or not, I actually am a 22-year-old student.) I also haven't seen any relations other than equal-to being expressed, although I'm aware of the possibility. (RDBMSs can do it through check expressions. Ironically, things like ER diagrams cannot express these constraints except in prose.)
    But I still don't see where the RDBMS is more expressive than a bit of checking code in a property setter (aside from the obvious: SQL being more expressive than GPLs in general).

    Quote Originally Posted by Mario F.
    The problem is that objects don't map well into data as it is understood by relational databases.
    True. Relational databases express data as sets of attributes called records (rows), stored in a table (a relation; I once heard yet another term), with relationships between the various fields expressed through common values and sometimes (if the DBMS supports it) through constraints that they actually must match. Reading data is about formulating selection conditions and retrieving the records that match them.
    OOD expresses data as sets of attributes grouped into objects (described by classes) that may be stored individually or in some data structure (array lists, linked lists, sets, ...). Relationships are expressed through direct referencing. Reading data is about walking the relationship graph until you arrive where you want.

    Quote Originally Posted by Mario F.
    Imagine a customers <-> invoice relationship based on the customer ID. Some query runs that extracts all invoices from a given customer. Can you see the difficulty in expressing this same problem through OOP?
    In a way. If you adhere strictly to the relational way, where you have one collection that you run queries on, then yes, that's very tricky to express in OOP. (But look at Boost.MultiIndex. Very interesting stuff in that department.) But it's not the OOP way. There wouldn't be a query in OOP: there would be a Customer object that already holds references to all invoices that belong to that customer. There would also probably be an object in the accounting module that holds references to ALL invoices.
    An object-oriented database is even capable of storing the data this way. RDBMSs are not the only way to store data, after all.

    Quote Originally Posted by Mario F.
    Of course the best solution (and in the case of C++, the most obvious) is to run the query as SQL and process the results with C++, possibly storing the results in a container of invoices.
    That assumes the data is in a relational database in the first place, an assumption that makes the entire example flawed from the start. How can you express in object-oriented terms a query on relational data?
    If the data was object-oriented to begin with, the container of invoices would already exist. There would be no query, just a following of pointers. As a Wikipedia article puts it, it would be a navigational instead of a declarative interface.

    Quote Originally Posted by Mario F.
    What I'm trying to understand is whether it is possible, or even desirable, to have OOP express these types of relationships in order to have an easier mapping to relational data.
    I'm still not sure what these types of relationships are.

    Something you should look at is the data model used by directory systems such as LDAP. It's somewhere in-between.

    Quote Originally Posted by Mario F.
    What if we are given a set of tools that allow us to accommodate those changes more easily?
    Isn't that what OOP was about in the first place? Organize programs better to make them more understandable, and have runtime binding to make them extensible through polymorphism.

    Quote Originally Posted by Mario F.
    As it is, this is a lengthy, awkward and error-prone process.
    Such is programming.

    Quote Originally Posted by Mario F.
    We spend a lot of time analysing the business rules, identifying those mutable areas, coding for them, and sometimes going to an effort that may end up not being justified, because the change never happens.
    I still don't understand why this is in any way particular to OOP. I don't believe for a moment that you don't have to analyse business rules for coming up with a database schema. Relational models are just as much subject to change as code.

    Let's take up your earlier example about the departments. In analysing this for the relational model, we know that departments are under the control of other departments, except for some top-level departments. We can express this like so:
    Code:
    DEPARTMENT:
        id         IDENTIFIER
        name       STRING
        controller IDENTIFIER REFERENCES DEPARTMENT NULLABLE
    If the Human Resources department goes under the direct control of the board of directors (which is not a department itself), what can you do?
    For that matter, what can you do if you know that might happen? An is_controlled_by_bod field is a hack. An unclean solution. The right way to go is to have controller reference either DEPARTMENT or any other CONTROLLING_ENTITY, whatever that might be: the board of directors, the special emergency council of the venture capitalist, ...

    In OOP, this would be expressed like this:
    Code:
    class ControllingEntity:
    
    class Department extends ControllingEntity:
        name String
        controller ControllingEntity
    
    class Council extends ControllingEntity:
        name String
    
    object BoardOfDirectors = new Council("board of directors")
    object VentureCapitalistCouncil = new Council("vc ec")
    Inheritance, however, is not well supported by the relational model, although ER diagrams have the concept. You do it by having multiple tables, one for each "class". They share primary keys. But whenever I did this, I could never get rid of that "hack" feeling. PostgreSQL has the INHERITS keyword, but it doesn't work properly with constraints. Views and instead-of triggers in Oracle form an apparently accepted way of combining tables into one virtual table (and to imagine I was so proud of coming up with this way of simulating INHERITS ...), but it's quite a bit of work you have to do for every table.

    So how does the relational model express change more easily than OOP?
    Understand that I'm not trying to push OOP on you here. I'm just trying to understand where you're coming from with your wishes.

    Quote Originally Posted by Mario F.
    I do feel though that, everything being equal, changes to data in a data-centric design such as that of relational databases are more easily solved than changes to objects in an object-centric design such as that of OOP.
    I don't feel that the difference in ease of change is any greater than what is accounted for by the simple fact that objects contain behaviour as well as data. If your database schema contains a lot of triggers and stored procedures, you'll have one hell of a time verifying that they're all still correct after changing a relation or constraint. Did any of them depend on the constraint, and could they now fail? Questions like these are what make correctness difficult. (Triggers and procedures aren't really part of the relational model, though: they can define behaviour. They're also typically founded on procedural, modular and structured design techniques.)

  7. #7
    (?<!re)tired Mario F.
    I finally got some time to follow up on this. You obviously took some time to answer, and not only did I not want you to feel it was wasted time, but the issue also obviously interests me.
    I've just been too busy with other stuff to devote to this thread the time it deserves.

    Quote Originally Posted by CornedBee
    Every instance of a relationship spread to more than one field I have seen was about there being no better identifying set of values.
    True, of course. That's the very definition of a primary key. However, it's quite common in anything but the simplest relational databases. The "rules of engagement" when designing an RDB are tightly bound to the data normalization principles, of which atomicity is an important part. If the only way to define a primary key involves two or more fields, so be it. I shouldn't "concatenate" those fields into one, breaking the atomicity of the data.

    Quote Originally Posted by CornedBee
    I also haven't seen any relations other than equal-to being expressed, although I'm aware of the possibility. (RDBMSs can do it through check expressions. Ironically, things like ER diagrams cannot express these constraints except in prose.)
    They aren't very common, no. And ERDs weren't designed with them in mind. Good thing comments are as much a reality in ERDs as they are in UML diagrams.

    Quote Originally Posted by CornedBee
    I still don't understand why this is in any way particular to OOP. I don't believe for a moment that you don't have to analyse business rules for coming up with a database schema. Relational models are just as much subject to change as code.
    I think this is where I haven't been expressing myself correctly.

    I do have to analyze the business rules in order to come up with a database schema. I would have been sacked numerous times otherwise. However, what I find is that the type of analysis needed maps very easily to business processes. I don't have to make a huge mental effort (with due exceptions) to come up with a schema.

    Moreover, and this is where I think my reasoning originates, I don't deviate much from the customer's own view of the business. In my mind, I don't have to make a lot of translations between his input and my understanding of a database to come up with a schema.

    With OOP, it seems to me the end result is more distant from the business reality. If I look at a UML class diagram, the relations between what I see and the business rules aren't so obvious.

    (However, truth be told, this may be because my exposure to OOP is still limited and I simply don't have the eye for it yet. If that is the case, surely all my reasoning is invalid.)

    Now... given that apparent distance between how business rules are pictured and how they are expressed with OOP, I ask whether it would be desirable to evolve OOP one step further to close the gap.

    One example of what I mean is the (I think?) canonical "a penguin is not a bird, if birds can fly". In order to make a Penguin a bird, I will probably have to create an abstract interface with a fly() method and derive all birds from it. Those that don't wish to implement fly() will make it a no-op.

    Regardless of the other possibilities there are, I had to move away from the abstract notion of a penguin being a bird in order to express it in OOP. There was a mental effort and a resulting Class Diagram that demands close scrutiny in order to interpret. Well, maybe really not that much with such a simple example, but many other relations exist out there that, when expressed with OOP, span across several classes.

    That's when I question whether there is still room for OOP to evolve and map these relations better.

    However, like I said before, I myself am not completely sure if this is just the result of my lack of knowledge.

  8. #8
    Crazy Fool Perspective
    Join Date
    Jan 2003
    Location
    Canada
    Posts
    2,640
    >>Imagine a customers <-> invoice relationship based on the customer ID. Some query runs that
    >>extracts all invoices from a given customer. Can you see the difficulty in expressing this same
    >>problem through OOP?


    You're talking about associative access to data stored in an OOP format. Relational algebra (from which SQL is derived) is powerful because it provides a mechanism for associative queries that can be quantified across relations (all customers who..., top ten values where...). There is no native support for this in any programming language I know of; I think the best mapping is to not map at all, and to use an in-memory DBMS.

    OOP is a tool, like a hammer. And when you have a hammer, it seems that everything becomes a nail.
