A year ago I wrote a CSV-related rant on this forum, but then instead of posting it I just closed the window and went on with my life.
Each and every time someone says they'll send me a CSV, I know I'm going to waste a lot of time. Wikipedia has a small article about the format, you know, comma separated values, sometimes a semicolon is used, ... so cute. This little detail is a nightmare, one of the stupidest of all the stupid ideas in computer history.
In the US, Excel and similar applications expect a comma as the separator. If you have anything else, they put everything in a single column. In Europe, Excel and most other applications expect a semicolon, and if there's anything else they put everything in the first column.
The thing is, you cannot configure this option as you like; it is determined for you, in the stupid applications, by the country settings. If you get a file from another part of the world, then you're in trouble. Applications will import wrong data, and if you try to fix it with find/replace, then ... well, anything is possible.
Just the other day I got a file exported from NeoOffice, I think, and the values were separated by the period character, the point, the dot. My God, they should have named the format RSV, for Randomly Separated Values.
And today I got a file where someone did a find/replace on his own and, for some reason, on the import side every second row was missing. It took me a whole afternoon to realize that by mistake there were extra quotes in one of the columns. When he replaced a part that came right before ,"12.45", he turned it into ;""12,45"; and that extra double quote pretty much ruined my day. And, like I said, this isn't the first day ruined by CSV files. For some reason, I receive them more and more lately. Most of them are not even UTF-8, so I need to guess the encoding by trial and error, and it seems things are going from bad to worse.
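For what it's worth, the quoting rule that file violated is easy to demonstrate. Per RFC 4180, a literal double quote belongs inside a quoted field and has to be doubled; a stray unescaped quote silently corrupts the parse. A quick sketch using Python's csv module, just to illustrate the rule (the field values are made up to match the story above):

```python
import csv
import io

# Well-formed: the field containing the decimal comma is quoted.
good = 'id;"12,45";note\n'
rows_good = next(csv.reader(io.StringIO(good), delimiter=';'))
# → ['id', '12,45', 'note']

# The broken line from the story: a stray extra double quote.
bad = 'id;""12,45";note\n'
rows_bad = next(csv.reader(io.StringIO(bad), delimiter=';'))
# The parse no longer matches the intended fields: the stray quote
# leaks into the data instead of delimiting it.

# To embed a literal quote, RFC 4180 says double it inside a quoted field:
literal = 'id;"he said ""12,45""";note\n'
rows_lit = next(csv.reader(io.StringIO(literal), delimiter=';'))
# → ['id', 'he said "12,45"', 'note']
```

So a blind find/replace on the raw text, without re-quoting, is exactly how a file like that gets ruined.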
I know there are all kinds of people on this forum, so if anyone here knows the guy who invented CSV files, please go near his home and throw a big rock towards the biggest window. I would appreciate that very much. Apologies to everyone else.
Strange, because when I import a .csv file with OpenOffice, I get a nice dialog where I can choose from a range of standard separators or even specify my own.
Excel has always had a way to import "CSV" files that are separated by something other than comma (or semicolon in Europe). It's just that the CSV file format itself doesn't have a way to specify the delimiter from within the file, so you must know it "externally". I can't remember where to find this in pre-2007 versions (Google "excel <version> tab separated values"), but in 2007+, with the ribbon toolbar, you go to the "Data" tab, then "From Text" in the "Get External Data" box. It gives a nice wizard that allows for fields separated by tab, comma, semicolon, space, or any user-specified character, and allows for quoted fields and a few other things.
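Since the file can't declare its own delimiter, the best a program can do is guess from a sample, and Python's standard csv module actually ships a Sniffer that does exactly that heuristic. A minimal sketch (the sample data is invented):

```python
import csv

# A semicolon-delimited sample with quoted decimal commas.
sample = 'name;price;qty\nwidget;"12,45";3\ngadget;"7,99";1\n'

# Sniffer inspects the text and guesses the dialect, delimiter included.
dialect = csv.Sniffer().sniff(sample)
print(dialect.delimiter)   # ';' for this sample
```

Being a heuristic, it can guess wrong on short or pathological samples, which is really just the thread's complaint restated: the delimiter is external knowledge, not part of the format.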
In theory you are absolutely right. Excel can be used to import a file and then export it again into a "localized" CSV format, but Excel enjoys converting long numbers like EAN codes or personal IDs (like your SSN) into so-called scientific notation, and when it sees something like 2/4, it must be April 2nd or whatever. So every import into Excel takes a lot of time in that import dialog, where I need to say for every column whether it's text or a number or what.
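One way around that type guessing is to never let the values become typed at all: a plain CSV parser hands every field back as a string, so leading zeros and slashes survive untouched. A small Python illustration (the values are made up to mirror the EAN/date examples above):

```python
import csv
import io

# An EAN-like code with leading zeros, a "2/4" Excel would call a date,
# and a value Excel would read as scientific notation.
line = '0012345678901;2/4;1,5E3\n'
row = next(csv.reader(io.StringIO(line), delimiter=';'))

# No coercion: every field stays exactly the string it was in the file.
print(row)   # ['0012345678901', '2/4', '1,5E3']
```

Any conversion to numbers or dates then happens explicitly, per column, in your own code instead of behind your back.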
In the end, I lose a lot of time in application A or application B. Plus, I have a bigger problem with Excel in my country. The Mac version hates our diacritics (šđčćž) in text files. Even some Windows versions are strange, so each time I import/export from Excel there is a great chance that some, if not all, of these characters will get lost.
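The trial-and-error encoding guess can at least be automated. UTF-8 is strict enough that bytes from another encoding usually fail to decode, so trying it first and then falling back to likely legacy code pages (cp1250 / ISO-8859-2 for Central European diacritics like šđčćž; the fallback list is an assumption, adjust it for your files) works surprisingly often. A rough sketch:

```python
def decode_guessing(data: bytes,
                    encodings=('utf-8', 'cp1250', 'iso-8859-2')):
    """Try each encoding in turn; return (text, encoding) for the first
    one that decodes cleanly. UTF-8 goes first because it rejects most
    byte sequences produced by other encodings."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every byte, so this never fails
    # (but may produce mojibake).
    return data.decode('latin-1'), 'latin-1'

# cp1250-encoded Croatian diacritics are not valid UTF-8, so the
# fallback kicks in:
raw = 'šđčćž'.encode('cp1250')
text, enc = decode_guessing(raw)
print(enc, text)   # cp1250 šđčćž
```

It's a guess, not a proof: single-byte code pages can't be told apart reliably, so a wrong-but-decodable fallback is still possible.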
But the big question of my first post was: why is CSV designed that way? Why is the separator volatile? Why is the decimal separator unstable? What about dates? Wouldn't it be better if the separator were always a comma, the decimal point defined numbers, and the application took care of localized presentation of the data?
If you just need to convert from one basic delimiter to another, then look at PHP's fgetcsv and fputcsv functions. You could have a working script that takes a few command line arguments, and spits out an RFC 4180 compliant CSV file in a dozen lines or so.
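The same dozen-line converter, sketched in Python's stdlib csv module (the analogue of fgetcsv/fputcsv; the file names and delimiter arguments below are just examples):

```python
import csv
import sys

def convert(src, dst, in_delim=';', out_delim=','):
    """Read src with one delimiter and write dst with another,
    re-quoting fields RFC 4180-style. newline='' is required so the
    csv module controls the line endings itself."""
    with open(src, newline='', encoding='utf-8') as fin, \
         open(dst, 'w', newline='', encoding='utf-8') as fout:
        reader = csv.reader(fin, delimiter=in_delim)
        writer = csv.writer(fout, delimiter=out_delim,
                            quoting=csv.QUOTE_MINIMAL)
        writer.writerows(reader)

if __name__ == '__main__' and len(sys.argv) >= 3:
    # e.g.: python convert.py input.csv output.csv ';' ','
    convert(*sys.argv[1:])
```

Because the reader unescapes and the writer re-quotes, fields that contain the new delimiter (a decimal comma, say) come out correctly quoted instead of splitting into extra columns.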
Surely the problems and rants you describe are to do with the way separate applications import or export 'their version / understanding' of a CSV file? There is nothing wrong with the notion of a CSV file in itself.
Hmmm, maybe I've missed something because it's too early in the morning, but it seems like your problem would be easily alleviated by simply using "Text to Columns", which definitely lets you specify a delimiter as well as various options like dealing with consecutive delimiters and qualifying strings so that the delimiting character does not get captured if it falls inside of one. As for the number as scientific notation, or text with a slash in it being read as a date... that's also manageable with Excel, as you can specify how each column is to be treated when you're splitting them. Even if you don't specify on import, you can also manipulate it after the fact using the TEXT() function (or the VBA equivalent Format()). Until you export into a new plain text format, Excel is simply manipulating the string visually, anyway. So if you had "12E034231" which was to be treated as text and you end up getting something like "12E-03" in the cell, the actual number still exists in the data; you just have to force the correct format before you export back to a plain text file.
Really, while Excel can make things a pain-in-the-A to deal with, at the end of the day I haven't found anything it's not capable of doing if you know the idiotic way it wants you to do it. Much of my job consists of me having to write parsers for data that has been laid out for human eyes and is not at all meant to be digested by a parser. It's a big breath of fresh air when I actually get something at least somewhat standardized like CSV files are.