Thread: Normalizing "tables"

  1. #1
    Registered User
    Join Date
    Jul 2007
    Posts
    19

    Unhappy Normalizing "tables"

    Guys,

    without saying anything much, I have such data at hand (rows from a database table):

    Code:
    111 | xyz | a, b, c
    Following someone's wise words, I would love to have this transformed to :

    Code:
    	111 | xyz | a
    	111 | xyz | b
    	111 | xyz | c
    DO NOT assume that there will be ONLY a single cell/column with multiple values.

    The following might also arise :

    Code:
    	111 | xyz, def | a, b, c
    This has to be transformed to :

    Code:
    	111 | xyz | a
    	111 | xyz | b
    	111 | xyz | c
    	111 | def | a
    	111 | def | b
    	111 | def | c
    C++ code or suggestions would be great.. although I am interested in knowing more about ;

    1. Working implementations/snippets
    2. Alternative but automated or programmatic approaches to get equivalent output.
    3. Links to other resources where I might get more answers to this question.. for example any good database experts forums you guys might know/heard about.. visit etc...

    I am really looking forward to some help on this because I thought up a recursive solution to this problem but I got stuck when I found out that I would have to know and predefine the table schema and model an object/class/structure on it.. that would make my program not applicable for any table, but rather I would have to redefine and recompile the classes for the table schema for every table.

    I am sure there are easier ways out.

  2. #2
    Registered User
    Join Date
    Nov 2007
    Posts
    1

    Pseudocode solution

    Let me see if I understand you... Suppose you start with a table

    Code:
    table t1
    c1  | c2       | c3
    ------------------------
    111 | xyz, def | a, b, c
    You want to write some generic code that will create a new table and populate it with the normalized version of table t1, right?

    That seems like a straightforward solution.

    First create an empty clone of the table:

    Code:
    create table t2 as select * from t1 where 1 = 2;
    Then loop through a cursor of t1, and explode each column value into an array based on the comma delimiter (this is just pseudocode to see if I understand you; it won't work anywhere):

    Code:
    cursor SELECT * FROM t1;
    for each row in cursor
    {
      c2_array = split( c2 using ',' as delimiter);
      for each c2_item in c2_array
      {
        c3_array = split( c3 using ',' as delimiter);
        for each c3_item in c3_array
        {
          INSERT INTO t2 (c1,c2,c3) VALUES (c1, c2_item, c3_item);
        } 
      }
    }
    If you don't know how many columns are in t1 before you process it, you could consult the data dictionary of the RDBMS you are using (USER_COLUMNS view in Oracle, information_schema.columns in MySQL, etc.) and loop through the table columns based on that.

    But I feel like I may have missed the point of your question. Does this help?

  3. #3
    Crazy Fool Perspective's Avatar
    Join Date
    Jan 2003
    Location
    Canada
    Posts
    2,640
    What you're looking for is a Cross (or Cartesian) Product.

  4. #4
    Registered User
    Join Date
    Jul 2007
    Posts
    19

    Post

    Quote Originally Posted by Perspective View Post
    What you're looking for is a Cross (or Cartesian) Product.
    Exactly ! But the problem is I know how to do it between tables, but not between columns. And that's exactly I want to find out.

  5. #5
    Crazy Fool Perspective's Avatar
    Join Date
    Jan 2003
    Location
    Canada
    Posts
    2,640
    Quote Originally Posted by SubhobrotoSinha View Post
    Exactly ! But the problem is I know how to do it between tables, but not between columns. And that's exactly I want to find out.
    >111 | xyz, def | a, b, c

    Those are sets. If you take an n-way cross product of all the sets you'll have the equivalent of the cross product of the tables. So start with a set-based function that is closed under the cross operation (takes sets and produces sets as output).

    Code:
    cross(Set s1, Set s2)
    //nested loop cross product.
    //return the computed set
    Then, in your example you'd get your result with:

    Code:
    cross(cross({111}, {xyz, def}), {a,b,c});
    Lastely, you insert the result of the cross() call (each set item being a tuple).

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Normalizing mp3s
    By confuted in forum A Brief History of Cprogramming.com
    Replies: 2
    Last Post: 07-03-2003, 10:23 PM