I have a comma-separated value file.
I need to look at it and find \n <--- carriage return,
then insert a value just before that \n,
then continue searching from that \n for the next \n.
I am creating a spreadsheet.
I was planning on using
char *strchr(const char *, int); to find the \n,
but I don't know how to insert a value into the string.
Or is this even possible?
Assuming you have a long enough buffer[1], you should be able to use fgets to read the string; once it's been read, the last char according to strlen should be '\n'. Verify that this is the case, remove it, and strcat the new value at the end of the string.
Remember that you will have to read from one file and write to a different one, as you can't (trivially) insert text into the middle of an existing file.
[1] If you don't know AT ALL how long your lines may be, this is perhaps not the best idea... But as long as we're discussing something that may be a spreadsheet, the lines probably won't be ginormous, so a statically allocated string of (say) 10000 chars should be sufficient.
--
Mats
\n is not a carriage return, \r is. \n is new-line.
Technically, '\r' is "carriage return" and '\n' is "linefeed" or "newline". The first returns the print-head (or cursor) to the beginning of the line; the second advances the paper (or cursor) one line.
Windows stores both in the file, whilst Unix/Linux traditionally stores only the newline and prepends a carriage return in the actual output "automagically" where needed. Macs previously did it the other way around, storing '\r' only and appending a '\n' where needed in the output process. Since more recent versions of MacOS are based on BSD, they probably use the same mechanism as Linux.
And if you are NOT reading/writing the file in binary, the difference is hidden on Windows and MacOS as long as you use the standard file I/O functions (as opposed to, for example, the native system calls) - there's only a '\n' at the end of the line.
--
Mats
OK, I have 20 x 128 000 000 data points in this spreadsheet, so this isn't going to work - it's way too slow.
I'm going to do something crazy and write this data to MySQL over ODBC. I'm scared, but here goes.
How do I get mysql.h?
I think I need it for my include.
Also, what's this? "MySQL databases may be used by programs written in the C programming language on Socrates and Plato and on the IS Solaris workstations"
Writing it to a SQL database is probably not the solution either - the SQL database engine can't write to the disk any faster than any other method of storing data to disk - and that is by far the biggest factor in writing such a large amount of data to disk. On top of that, you'll have the SQL database trying to keep indices and such of your data, which won't make it any better.
So, you want to write 20 x 128000000 16-bit data entries to disk, right? And how long a time do you have to do that?
--
Mats
Never mind, I got mysql.h.
The reason I want to use the database is that I can just write to a new field, as opposed to finding the newline and writing to a new file whenever I need to add data.
The data I get comes in arrays of 1 x 128 000 000, so writing to a spreadsheet (tsv, csv) is no good, due to all the string work I'd need to do.
I have a few hours I can do this in - let's say 3. I think the database will be very nice to work with once I figure out how to access it. I have experience with MySQL, but never with C/MySQL before.
Let's see if I got this right...
You get some 128M x 20 (or 20 x 128M - whichever way you want to put it), and you want to create a CSV-file of it, yes?
And you want this file to be ready in less than 3 hours or thereabouts?
Why not just create a binary file format first, and then convert that to a .csv? Or get a machine with a bit more than 20 x 128M x sizeof(datatype) of RAM, and just write the file once you've collected all the data - it's only just over 5GB if you can store the values as 16-bit integers. You'll need a 64-bit version of the OS (and a matching 64-bit compiler, of course), but that's available from all major OS vendors (Microsoft, Apple and the Linux distros).
I'll have a little play and see if I can come up with any other suggestions.
--
Mats
No no - I get 1 x 128 000 000 arrays at 20 different points, about 10 minutes apart.
Then I have to stick them together in a particular format and create a binary file with it; then it goes into someone else's program.
I just wrote 20 x 128M 16-bit numbers to a file on my not-very-new-but-not-ancient machine, and it took 430 seconds (writing 20000 values at a time). That's a binary file.
Of course, if you only get 128M numbers at a time, you'll probably end up doing this many times.
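The experiment above boils down to something like this chunked-fwrite sketch (the chunk size matches the post; the data itself is dummy values):

```c
#include <stdio.h>
#include <stdint.h>

#define CHUNK 20000    /* values written per fwrite, as in the experiment */

/* Write `count` 16-bit values to `f` in CHUNK-sized batches.
   Returns the number of values actually written. */
long long write_chunked(FILE *f, long long count)
{
    int16_t buf[CHUNK];
    long long written = 0;

    for (int i = 0; i < CHUNK; i++)
        buf[i] = (int16_t)i;            /* dummy data for the sketch */

    while (written < count) {
        size_t n = CHUNK;
        if (count - written < (long long)CHUNK)
            n = (size_t)(count - written);   /* final partial batch */
        size_t got = fwrite(buf, sizeof(int16_t), n, f);
        written += (long long)got;
        if (got != n)
            break;                           /* disk full / write error */
    }
    return written;
}
```

Batching the writes matters: one fwrite per value would make 2.5 billion library calls, while 20000 at a time keeps the call overhead negligible next to the disk time.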
One solution I can think of is this:
Use a fixed format CSV file - that is, pad the numbers with suitable number of spaces/zeros so that each entry takes up exactly the same amount of space, and have all 20 columns present in the file. Then all you need to do is to put the right data in the right place in the file, and you can calculate where the data belongs (saving the string manipulation at least to some extent, as well as saving the need to read one file and write another).
So your file would look a bit like this:
Code:
00.00000,01.00000,02.00000,03.00000,04.00000,05.00000,06.00000,07.00000
-1.00000,-2.00000,-3.00000,-4.00000,

Does that make some sense?
Just don't forget to figure out if '\n' is one or two characters in the implementation you have.
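Since every cell occupies exactly the same number of bytes, the file offset of any cell is simple arithmetic and you can fseek straight to it. A sketch, assuming 8-character numbers plus a one-byte separator, 20 columns, and a one-byte '\n' (adjust CELL/LINE if your newline is two bytes, per the caveat above):

```c
#include <stdio.h>

#define COLS 20
#define CELL 9                 /* 8 chars of number + comma (or '\n') */
#define LINE (COLS * CELL)     /* fixed byte length of one CSV row    */

/* Byte offset of cell (row, col) in the fixed-format file. */
long cell_offset(long row, int col)
{
    return row * LINE + (long)col * CELL;
}

/* Overwrite one cell in place; the trailing separator is untouched.
   Assumes the value fits in exactly 8 characters, e.g. 02.00000
   or -1.00000.  Returns the number of characters written, or -1. */
int write_cell(FILE *f, long row, int col, double value)
{
    if (fseek(f, cell_offset(row, col), SEEK_SET) != 0)
        return -1;
    return fprintf(f, "%08.5f", value);  /* always exactly 8 characters */
}
```

Open the file in update mode ("r+") and each incoming 128M array becomes one column of in-place writes, with no rewriting of the rest of the file.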
--
Mats
If you want one binary file with 20 x 128M numbers, where you get each 128M array at a 10-minute interval, why not just write each 128M array to disk in a separate file, then read from the 20 different files into one 2D array and write that back out to a single file? It shouldn't take much more than double the 430 seconds my experiment took. And it's dead simple.
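The merge step is essentially an interleave: value i of the output row comes from file i. A sketch, assuming 16-bit values and one binary file per column (stdio's own buffering does the batching here, so the value-at-a-time fread is less costly than it looks):

```c
#include <stdio.h>
#include <stdint.h>

#define NCOLS 20

/* Interleave NCOLS single-column binary files of 16-bit values into one
   row-major output: out = {col0[0], col1[0], ..., col19[0], col0[1], ...}.
   `count` is the number of values in each input file.
   Returns 0 on success, -1 on a short read or write. */
int merge_columns(FILE *in[NCOLS], FILE *out, long long count)
{
    int16_t row[NCOLS];

    for (long long i = 0; i < count; i++) {
        for (int c = 0; c < NCOLS; c++)
            if (fread(&row[c], sizeof(int16_t), 1, in[c]) != 1)
                return -1;
        if (fwrite(row, sizeof(int16_t), NCOLS, out) != NCOLS)
            return -1;
    }
    return 0;
}
```

If your consumer wants the columns back to back rather than interleaved, the job is even simpler - just append each file to the output in order.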
--
Mats
Yeah, that's what I'm doing, but it took a long time.