Evolution of Strings

**sh3rpa** · 09-27-2007

In my other two posts, I asked about bitwise operations in one and about signed/unsigned classes in the other (which, after some thought, I don't think is functionality that I need).

What I'm trying to do is write an encrypter (512-bit, preferably). I want to be able to "compress" any type of data, be it int, float, string, etc.

Since string is an object, I'm guessing bitwise operations might be out of the question.

I know that strings are basically built off of char's, but is it char[] or char*?

I know most compilers nowadays convert between the three (string, char[] and char*) right?

From my best guess, I would imagine char[] would be the easiest way to deal with this problem, since ALL data entering the layer of encryption is const and doesn't need the functionality of editing any of it. Ideally, I just want to read the data from a text file, enter it into the encrypter, and output it into another file.

Any ideas?

**CornedBee** · 09-27-2007

I know that strings are basically built off of char's

And that's pretty much all you can know, because strings are black boxes. As most classes are. You use the public interface - data(), the iterators, the [] operator, etc. - to access the string.

Modern strings can be quite complex. But if you want to know more, try implementing your own string class as an exercise.
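To illustrate that bitwise operations are not out of the question for a std::string, here is a minimal sketch that reads each char through the public interface (size() and the [] operator) and inspects its bits. The function name count_set_bits is made up for this example, and it assumes 8-bit chars.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the 1-bits in every char of a std::string.
// Assumes a char is 8 bits wide, which holds on common desktop platforms.
std::size_t count_set_bits(const std::string& text) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Convert to unsigned char first so shifting never sees a negative value.
        unsigned char byte = static_cast<unsigned char>(text[i]);
        for (int bit = 0; bit < 8; ++bit) {
            total += (byte >> bit) & 1u;
        }
    }
    return total;
}
```

The string stays a black box here; only its documented interface is used to get at the individual chars.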

**QuantumPete** · 09-27-2007

Originally Posted by sh3rpa

I know most compilers nowadays convert between the three (string, char[] and char*) right?

I'm not sure what you're trying to ask here. You can initialise a std::string from either a char * or a char[], and a char[] decays to a char * when you pass it around, but there is no implicit conversion from a std::string back to either of them; you have to call c_str() or data() for that.
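To make the relationship between the three types concrete, here is a small sketch of how they convert in practice. The helper names c_length, cpp_length and conversion_demo are invented for the illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper taking a C-style pointer.
std::size_t c_length(const char* text) { return std::strlen(text); }

// Hypothetical helper taking a std::string.
std::size_t cpp_length(const std::string& text) { return text.size(); }

std::size_t conversion_demo() {
    char buffer[] = "cube";          // char[5], including the terminating '\0'
    const char* pointer = buffer;    // the array decays to a pointer
    std::string str = pointer;       // a std::string is constructed from a char*

    std::size_t total = 0;
    total += c_length(buffer);       // char[] -> const char*
    total += cpp_length(pointer);    // const char* -> temporary std::string
    total += c_length(str.c_str());  // std::string -> const char* needs c_str()
    return total;                    // three times the length of "cube"
}
```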

Originally Posted by sh3rpa

From my best guess, I would imagine char[] would be the easiest way to deal with this problem, since ALL data entering the layer of encryption is const and doesn't need the functionality of editing any of it.

You will almost definitely need to malloc memory. As a rule of thumb, you use char[] when you know how many chars you will have, and char * when you don't know or when the number is so huge that you'd risk blowing the stack with char[].

Why don't you start with a skeleton project: Open a file, read char by char, write each back into a different file. Then, once this is done, you can add the encryption code between the read and the write.
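A minimal sketch of such a skeleton might look like the following. The function name copy_file is an assumption for this example, and passing a std::string to the stream constructors assumes C++11 or later.

```cpp
#include <fstream>
#include <string>

// Hypothetical skeleton: copies in_path to out_path one char at a time.
// Returns false if either file could not be opened or a write failed.
bool copy_file(const std::string& in_path, const std::string& out_path) {
    std::ifstream in(in_path, std::ios::binary);
    std::ofstream out(out_path, std::ios::binary);
    if (!in || !out) {
        return false;
    }

    char c;
    while (in.get(c)) {
        // The encryption step would transform c here, between the read and the write.
        out.put(c);
    }
    return static_cast<bool>(out);
}
```

Opening both streams in binary mode keeps the bytes unchanged on platforms that would otherwise translate line endings.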

QuantumPete

**sh3rpa** · 09-27-2007

Originally Posted by QuantumPete

I'm not sure what you're trying to ask here.

That was really just so I could understand the underlying mechanism of strings.

Really, I think this is simpler than I first thought. If I treat everything in the file as a char, then I can look at the first char, move it 504 bits to the left, then go on to the next one, move it 496 bits to the left, and so on.

Once I've filled up the 512-bit data type, I can apply my encryption scheme. To access the byte-sized elements that make up the 512 bits, I'm going to abstract all 512 bits like a cube (8x8x8) and use that as an addressing scheme, not to retrieve the data, but to apply the encryption.

I'm not programming anything extreme here, I just like to give myself assignments to keep myself ahead of the game at school.

Thank you everyone here in this forum for helping out a newb like me!

**QuantumPete** · 09-27-2007

Originally Posted by sh3rpa

move it 504 bits to the left

Ok, what are you moving to the left of what? There's really no need to move anything. Once you've read in a char you can do:

Code:

plain_text[i] = read_char;
i++;

inside your char-getting loop. Don't try to make things too complicated.

But I'm intrigued now as to which encryption algorithm you're using that involves a cube...!

QuantumPete

**sh3rpa** · 09-27-2007

Originally Posted by QuantumPete

Ok, what are you moving to the left of what? There's really no need to move anything.

Using bitwise ops to move a byte to the left: charRead << 504

As for the encryption and the cube, I want to use a cycle-shifting cipher.

"Cycles" meaning the cipher changes periodically and "Cipher" meaning, well, a bit-by-bit cipher.

The "Cube" is really just a way to abstract the 512 "bits" making it easier to manage the cipher changes (the cube is 8x8x8). So one "bit" could be referenced like this:

Code:

    0x4x7  (X-pos: 0, Y-pos: 4, and Z-pos: 7)

and a byte could be referenced like:

Code:

    3x5    (the collection of 8-bits beginning with X-pos: 3 and Y-pos: 5)
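One way this addressing scheme could be sketched is with 64 bytes laid out as an 8x8 grid, where Z selects a bit inside a byte. The Cube type and its member names are hypothetical, and treating Z-pos 0 as the least significant bit is an assumption the thread does not settle.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 512-bit block stored as 64 bytes and addressed as an 8x8x8 cube.
struct Cube {
    std::array<std::uint8_t, 64> bytes{};  // all bits start at zero

    // "3x5": the byte at X-pos x and Y-pos y.
    std::uint8_t& byte_at(std::size_t x, std::size_t y) {
        return bytes[x * 8 + y];
    }

    // "0x4x7": bit z of the byte at (x, y); z = 0 is the lowest bit (an assumption).
    bool bit_at(std::size_t x, std::size_t y, std::size_t z) const {
        return (bytes[x * 8 + y] >> z) & 1u;
    }

    // Sets or clears a single bit without touching its neighbours.
    void set_bit(std::size_t x, std::size_t y, std::size_t z, bool value) {
        std::uint8_t mask = static_cast<std::uint8_t>(1u << z);
        if (value) {
            bytes[x * 8 + y] |= mask;
        } else {
            bytes[x * 8 + y] &= static_cast<std::uint8_t>(~mask);
        }
    }
};
```

With this layout no shift ever exceeds 7 bits, so the 512-bit block never has to exist as a single integer.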

**CornedBee** · 09-27-2007

Using bitwise ops to move a byte to the left: charRead << 504

You can't do that. Well, you can if you have an arbitrary-precision integer class that lets you do it. But in general, you have to respect the size limits of your types. If int has 32 bits (it has on the x86 PC), shifting by anything greater than 31 bits makes no sense.

**sh3rpa** · 09-27-2007

Originally Posted by CornedBee

You can't do that. Well, you can if you have an arbitrary-precision integer class that lets you do it. But in general, you have to respect the size limits of your types. If int has 32 bits (it has on the x86 PC), shifting by anything greater than 31 bits makes no sense.

Right. So each "char" read will be "packed" as much as is permitted. Like I said, the cube is just an abstraction layer that can manage the collection of these "packed" values, so the abstraction really benefits the encryption scheme, not the compression. And I suppose it benefits me too in designing the bloody mess!

So to address what you said CornedBee, I guess charRead << 504 isn't valid, but what will really be happening is something like:

Code:

PSEUDO CODE:

tempChar = charFromFile;

/* move char over */
collection = tempChar << 31   // assuming *collection* is an *int*

    -Then on to the next one until *collection* is filled up

    -Once the collection is filled, assign it an address

collection = someObj(0, 0)   // where someObj is an addressing scheme for the whole container

    -then start a new collection

    -and repeat the process until all 512 positions are filled

Can't I overload "<<" to work with a number like 511?
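For what it's worth, the standard library's std::bitset already provides << for its full width, so a 512-bit shift does not need a hand-written overload. Below is a rough sketch; the helper place_char and the choice of putting slot 0 in the highest 8 bits are assumptions made to match the "504, then 496" idea above.

```cpp
#include <bitset>

// Hypothetical helper: places one 8-bit char into byte slot 0..63 of a 512-bit block.
// Slot 0 occupies the highest 8 bits, slot 63 the lowest. Slots above 63 are not handled.
std::bitset<512> place_char(unsigned char c, unsigned slot) {
    std::bitset<512> block(c);          // c starts out in bits 0..7
    return block << (504 - 8 * slot);   // slot 0 shifts by 504, slot 1 by 496, ...
}
```

A whole block could then be built by OR-ing the results together, for example place_char('A', 0) | place_char('B', 1).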

**QuantumPete** · 09-28-2007

What you're trying to do is store chars in an array, no? I get that from the "until collection is filled up" bit. There's no need to use bit shifting to store chars; you can work on chars just as easily as ints or longs.

QuantumPete

**sh3rpa** · 09-28-2007

Originally Posted by QuantumPete

What you're trying to do is store chars in an array, no? I get that from the "until collection is filled up" bit. There's no need to use bit shifting to store chars; you can work on chars just as easily as ints or longs.

An array of chars would certainly be easier, but the whole point of this *assignment* I've made for myself is more to teach me things about C++ that I'm not so familiar with, bitwise operations being one of them.

So what I'd like to do is:

1) Read chars one by one from file.

2) Compress the chars into some other storage type (int, I suppose), and call this a collection.

3) Have another container which is a collection of the smaller collections that make up this "cube" abstraction.

4) Implement the encryption.

I'm no expert by any means on compression or encryption, but maybe this exercise might get my feet a little wet.

**QuantumPete** · 09-28-2007

You can store 4 chars in an int, if you really want to, but it's not "compression", since a char takes up 1 byte and an int 4, so the net memory usage is the same. Before people get upset at the sizes I mention, this is true on most modern 32-bit systems.
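As a sketch of what storing 4 chars in an int looks like with shifts, the following uses std::uint32_t so the width is fixed at 32 bits regardless of platform. The names pack4 and unpack are invented for the example, as is the choice of putting the first char in the top byte.

```cpp
#include <cstdint>

// Hypothetical helper: packs four chars into one 32-bit value, first char in the top byte.
std::uint32_t pack4(unsigned char a, unsigned char b, unsigned char c, unsigned char d) {
    return (static_cast<std::uint32_t>(a) << 24) |
           (static_cast<std::uint32_t>(b) << 16) |
           (static_cast<std::uint32_t>(c) << 8) |
            static_cast<std::uint32_t>(d);
}

// Extracts char number index (0..3) back out of the packed value.
unsigned char unpack(std::uint32_t packed, unsigned index) {
    return static_cast<unsigned char>((packed >> (24 - 8 * index)) & 0xFFu);
}
```

Four 1-byte chars go in and one 4-byte value comes out, so nothing is saved; the exercise is purely about the shifting and masking.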

With encryption you'll definitely have plenty of opportunity to use bitwise operations, especially XOR. Why don't you post what you have so far with any questions you have about it?
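A toy example of XOR in this role is sketched below. The function xor_with_key and the single-byte key are assumptions for illustration only; a fixed one-byte XOR is good practice with bitwise operators but is not secure encryption.

```cpp
#include <cstddef>
#include <string>

// Toy XOR step for practising bitwise operators. Not secure encryption.
std::string xor_with_key(std::string data, unsigned char key) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        // XOR flips exactly the bits that are set in key.
        data[i] = static_cast<char>(static_cast<unsigned char>(data[i]) ^ key);
    }
    return data;
}
```

Because XOR with the same key undoes itself, calling xor_with_key twice with the same key returns the original data, so one function serves for both directions.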

QuantumPete

**King Mir** · 09-28-2007

Compression is all about identifying patterns in data. Compression of characters is limited in ability, because each character is already the minimum size a computer can easily work with, namely 8 bits. Giving each character a representation that uses fewer than 8 bits is inconvenient, because the computer still must read the characters in 8-bit pieces. However, such bitwise compression is possible. ASCII only uses 7 bits to encode a character (the most significant bit is always 0). That extra bit could be eliminated. Further, it would be possible to use 6 bits if you escape out less frequently used characters like '!' and '$'. By 'escape out' I mean use two characters to represent one.
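Dropping the unused top bit of ASCII text could be sketched as follows. The function pack7 is a made-up name, and the sketch assumes the input really is 7-bit ASCII (any eighth bit is simply masked off).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: packs 7-bit ASCII text by dropping the top bit of each char.
std::vector<std::uint8_t> pack7(const std::string& text) {
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;  // holds bits that have not been written yet
    int bits = 0;              // number of valid bits at the low end of buffer
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Append the low 7 bits of the next char to the buffer.
        buffer = (buffer << 7) | (static_cast<unsigned char>(text[i]) & 0x7Fu);
        bits += 7;
        // Emit whole bytes as soon as they are available.
        while (bits >= 8) {
            out.push_back(static_cast<std::uint8_t>((buffer >> (bits - 8)) & 0xFFu));
            bits -= 8;
        }
    }
    if (bits > 0) {
        // Pad the final partial byte with zero bits.
        out.push_back(static_cast<std::uint8_t>((buffer << (8 - bits)) & 0xFFu));
    }
    return out;
}
```

Eight characters fit into seven bytes this way, a saving of one byte in eight.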

As for encryption, it would help to know what you want the encryption to do.

**matsp** · 09-28-2007

The most common compression technique is to use a variable number of bits for a set of tokens (4, 8, 16 or 32 bit can be a token - or 6, 7, 19 or whatever seems suitable to the data you use).

As long as the frequency of some tokens is noticeably higher than others, you can compress it by assigning shorter bit patterns to the more frequent tokens, and longer bit patterns to the rarer ones.

One such example is Huffman coding - it is for example used in fax machines to avoid sending every single pixel across; instead the fax data is compressed so that, for example, 35 pixels of white is just a few bits long.
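To show how frequent tokens end up with shorter bit patterns, here is a compact sketch that computes only the Huffman code lengths for the chars in a string, not the codes themselves. The function name and the representation of a subtree as the string of chars it contains are choices made for brevity, not a standard implementation.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Sketch: returns the Huffman code length in bits for each distinct char in text.
std::map<char, int> huffman_code_lengths(const std::string& text) {
    std::map<char, std::size_t> freq;
    for (std::size_t i = 0; i < text.size(); ++i) {
        ++freq[text[i]];
    }

    // Each queue entry is (total frequency, chars contained in that subtree).
    using Node = std::pair<std::size_t, std::string>;
    std::priority_queue<Node, std::vector<Node>, std::greater<Node>> queue;
    std::map<char, int> lengths;
    for (const auto& entry : freq) {
        queue.push(Node(entry.second, std::string(1, entry.first)));
        lengths[entry.first] = 0;
    }
    if (queue.size() == 1) {
        lengths[queue.top().second[0]] = 1;  // a lone symbol still needs one bit
    }

    // Repeatedly merge the two rarest subtrees; every char in them moves one level deeper.
    while (queue.size() > 1) {
        Node a = queue.top(); queue.pop();
        Node b = queue.top(); queue.pop();
        for (char c : a.second + b.second) {
            ++lengths[c];
        }
        queue.push(Node(a.first + b.first, a.second + b.second));
    }
    return lengths;
}
```

For the input "aaaabbc", 'a' gets a 1-bit code and 'b' and 'c' get 2-bit codes, so the seven chars need 10 bits instead of 56.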

--
Mats

**sh3rpa** · 09-28-2007

Thanks, guys, for the info on compression. I was questioning whether it was really "saving space".

I'm really only in the pseudo code stage right now, as I'm quite busy with my classes, taking care of my daughter, and just getting over a cold.

Here though is what I envision for the encryption:

I have a list (linked-list) of the "cube" collections. Somehow I use a "stamping" technique (maybe a time-stamp) that will set the cipher.

If you can imagine a linked-list of these cubes, in 3-D, then you can imagine a trig-function passing through them that will change periodically (with respect to the x, y, and z planes).

The cross-section where this trig-function passes through the cubes will constitute an entire byte. This byte will dictate the encryption method for that individual cube.

The different encryption methods (there will be a set number of them) can be randomly chosen and assigned for each different cube, so no two cubes will have to contain the same encryption method (unless of course there are a large number of these cubes).

So unless you know the time-stamping method and what trig-function it selects, it would be extremely difficult to decrypt, since you have no idea what byte contains the key.

And brute-forcing any set of 512 bits would be next to useless, since 512 bits have about 1.3 x 10^154 (2^512) possible combinations!

**sh3rpa** · 09-29-2007

Originally Posted by sh3rpa

The cross-section where this trig-function passes through the cubes will constitute an entire byte. This byte will dictate the encryption method for that individual cube.

Let me clarify this part: where the trig-function passes through EACH cube makes up a BYTE for that cube, which contains the KEY for that cube.

Hopefully that somewhat clarifies that. Be sure that EACH cube contains a byte that is the key for that cube; that's how each cube can have a different encryption method.

So if the byte for a particular cube can be expressed as:

f3 (in hex)

"f" might represent a function that "&'s" each group of four bits with some number.
"3" might then represent a different function that performs some other change on the bits within the cube.

This is basically what I see happening.
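Splitting a key byte such as f3 into its two hex digits could be sketched like this. The function names are hypothetical, and and_each_nibble is just one guess at what "&'s each group of four bits with some number" might mean.

```cpp
#include <cstdint>

// Splits a key byte such as 0xf3 into its two hex digits (nibbles).
unsigned high_nibble(std::uint8_t key) { return key >> 4; }      // 0xf3 gives 0xf
unsigned low_nibble(std::uint8_t key) { return key & 0x0Fu; }    // 0xf3 gives 0x3

// Hypothetical operation: ANDs each 4-bit group of value with the low 4 bits of mask.
std::uint8_t and_each_nibble(std::uint8_t value, unsigned mask) {
    unsigned m = mask & 0x0Fu;
    return static_cast<std::uint8_t>(value & ((m << 4) | m));
}
```

One caveat: AND discards bits, so the original value cannot be recovered from the result. If the data has to be decrypted again, a reversible operation such as XOR or a rotation would be needed for each step.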
