# Normalizing "tables"

• 11-15-2007
SubhobrotoSinha
Normalizing "tables"
Guys,

without saying anything much, I have such data at hand (rows from a database table):

Code:

`111 | xyz | a, b, c`
Following someone's wise words, I would love to have this transformed to :

Code:

```        111 | xyz | a         111 | xyz | b         111 | xyz | c```
DO NOT assume that there will be ONLY a single cell/column with multiple values.

The following might also arise :

Code:

`        111 | xyz, def | a, b, c`
This has to be transformed to :

Code:

```        111 | xyz | a         111 | xyz | b         111 | xyz | c         111 | def | a         111 | def | b         111 | def | c```
C++ code or suggestions would be great.. although I am interested in knowing more about ;

1. Working implementations/snippets
2. Alternative but automated or programmatic approaches to get equivalent output.
3. Links to other resources where I might get more answers to this question.. for example any good database experts forums you guys might know/heard about.. visit etc...

I am really looking forward to some help on this because I thought up a recursive solution to this problem but I got stuck when I found out that I would have to know and predefine the table schema and model an object/class/structure on it.. that would make my program not applicable for any table, but rather I would have to redefine and recompile the classes for the table schema for every table.

I am sure there are easier ways out.
• 11-15-2007
smrti
Pseudocode solution
Let me see if I understand you... Suppose you start with a table

Code:

```table t1 c1  | c2      | c3 ------------------------ 111 | xyz, def | a, b, c```
You want to write some generic code that will create a new table and populate it with the normalized version of table t1, right?

That seems like a straightforward solution.

First create an empty clone of the table:

Code:

`create table t2 as select * from t1 where 1 = 2;`
Then loop through a cursor of t1, and explode each column value into an array based on the comma delimiter (this is just pseudocode to see if I understand you; it won't work anywhere):

Code:

```cursor SELECT * FROM t1; for each row in cursor {   c2_array = split( c2 using ',' as delimiter);   for each c2_item in c2_array   {     c3_array = split( c3 using ',' as delimiter);     for each c3_item in c3_array     {       INSERT INTO t2 (c1,c2,c3) VALUES (c1, c2_item, c3_item);     }   } }```
If you don't know how many columns are in t1 before you process it, you could consult the data dictionary of the RDBMS you are using (USER_COLUMNS view in Oracle, information_schema.columns in MySQL, etc.) and loop through the table columns based on that.

But I feel like I may have missed the point of your question. Does this help?
• 11-15-2007
Perspective
What you're looking for is a Cross (or Cartesian) Product.
• 11-15-2007
SubhobrotoSinha
Quote:

Originally Posted by Perspective
What you're looking for is a Cross (or Cartesian) Product.

Exactly ! But the problem is I know how to do it between tables, but not between columns. And that's exactly I want to find out.
• 11-16-2007
Perspective
Quote:

Originally Posted by SubhobrotoSinha
Exactly ! But the problem is I know how to do it between tables, but not between columns. And that's exactly I want to find out.

>111 | xyz, def | a, b, c

Those are sets. If you take an n-way cross product of all the sets you'll have the equivalent of the cross product of the tables. So start with a set-based function that is closed under the cross operation (takes sets and produces sets as output).

Code:

```cross(Set s1, Set s2) //nested loop cross product. //return the computed set```
`cross(cross({111}, {xyz, def}), {a,b,c});`