Thoughts on a large problem
I am faced now with a large and open-ended task to allow users to upload Excel, XML, and even CSV files to the server, parse those files, perform validations, and ultimately move the data into the database.
As a constraint, I can not assume anything about how the user has organized their data. I should perform as much "intelligent" matching on the user's columns with the columns I expect. But where perfect matches aren't found, there should be a web page that allows the user to explicitly define what columns are what. This user preference should be stored in a XML file for future reuse.
A couple problems I face:
1)Excel files that are exported from the web or another program are not actually Excel files, (they're really just HTML tables with a .xls extension). Because of this, the Microsoft JET driver can't read them. I've looked into possible solutions, one being Microsoft Indexing Service. Apparently it is capable of discerning information about binary files, such as what the actual content is. Does anyone have any experience with this? If yes, is there any guidance or code-snippets you can share?
2)I've played around with a few different designs. Modularizing this project is obviously key to my sanity. One thing I would like to do is allow other developers to "plug-in" to the upload process with custom validation rules, and rules that define the structure of the data that is needed by the application. Like I said, I have several ideas that I've toyed around with, but I would love to hear some ideas from you. Sometimes an outside perspective is invaluable.