Thread: Evolution of Strings

  1. #1
    Registered Abuser
    Join Date
    Sep 2007
    Location
    USA/NJ/TRENTON
    Posts
    127

    Evolution of Strings

    In my other two posts I asked about bitwise operations and about signed/unsigned classes (which, after some thought, I don't think is functionality that I need).

    What I'm trying to do is write an encrypter (512-bit, preferably). I want to be able to "compress" any type of data, be it int, float, string, etc.

    Since string is an object, I'm guessing bitwise operations might be out of the question.

    I know that strings are basically built off of chars, but is it char[] or char*?

    I know most compilers nowadays convert between the three (string, char[] and char*), right?

    From my best guess, I would imagine char[] would be the easiest way to deal with this problem, since ALL data entering the layer of encryption is const and doesn't need to be edited. Ideally, I just want to read the data from a text file, run it through the encrypter, and write the output to another file.

    Any ideas?

  2. #2
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    I know that strings are basically built off of chars
    And that's pretty much all you can know, because strings are black boxes. As most classes are. You use the public interface - data(), the iterators, the [] operator, etc. - to access the string.
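
    A minimal sketch of what "use the public interface" can look like when all you want is the bytes of a std::string. The helper names xor_fold and xor_fold_indexed are made up for illustration, and folding with XOR is just an arbitrary bitwise operation to show that nothing about std::string prevents one.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: combines every byte of a string with XOR,
// reaching the characters only through the public interface.
unsigned char xor_fold(const std::string& s) {
    unsigned char acc = 0;
    // The range-for loop walks the string through its iterators.
    for (char c : s) {
        acc ^= static_cast<unsigned char>(c);
    }
    return acc;
}

// Same result using size() and operator[] instead of iterators.
unsigned char xor_fold_indexed(const std::string& s) {
    unsigned char acc = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        acc ^= static_cast<unsigned char>(s[i]);
    }
    return acc;
}
```

    Either loop sees the same chars, so how the string stores them internally never has to matter to the caller.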

    Modern strings can be quite complex. But if you want to know more, try implementing your own string class as an exercise.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    Technical Lead QuantumPete
    Join Date
    Aug 2007
    Location
    London, UK
    Posts
    894
    Quote Originally Posted by sh3rpa
    I know most compilers nowadays convert between the three (string, char[] and char*) right?
    I'm not sure what you're trying to ask here. You can initialise a std::string from either a char * or a char[], and you can copy a char[] into memory pointed to by a char * and vice versa, but a std::string never converts back to either of them implicitly; you have to ask for it with c_str() or data().
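
    A small sketch of those conversions, in case it helps. The function names are made up for illustration; std::string's constructor from a C string and its c_str() member are standard.

```cpp
#include <string>

// Builds a std::string from a char array.
std::string from_array() {
    char buf[] = "abc";        // array of 4 chars, including the terminating '\0'
    return std::string(buf);
}

// Builds a std::string from a pointer to a C string.
std::string from_pointer() {
    const char* p = "abc";     // pointer to a string literal
    return std::string(p);
}

// Going the other way takes an explicit call: c_str() returns a const char*
// that stays valid only while the string is alive and unmodified.
const char* as_c_string(const std::string& s) {
    return s.c_str();
}
```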
    Quote Originally Posted by sh3rpa
    From my best guess, I would imagine char[] would be the easiest way to deal with this problem, since ALL data entering the layer of encryption is const and doesn't need to be edited.
    You will almost definitely need to malloc memory. As a rule of thumb, you use char[] when you know in advance how many chars you will have, and a char * pointing at allocated memory when you don't know, or when the number is so huge that you'd risk blowing the stack with char[].

    Why don't you start with a skeleton project: Open a file, read char by char, write each back into a different file. Then, once this is done, you can add the encryption code between the read and the write.
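
    The skeleton could look roughly like this. It is only a sketch: copy_file and its parameter names are assumptions, and the comment marks where an encryption step would later go.

```cpp
#include <fstream>
#include <string>

// Copies in_path to out_path one char at a time.
// Returns false if either file could not be opened or the output stream failed.
bool copy_file(const std::string& in_path, const std::string& out_path) {
    std::ifstream in(in_path, std::ios::binary);
    std::ofstream out(out_path, std::ios::binary);
    if (!in || !out) {
        return false;
    }
    char c;
    while (in.get(c)) {
        // The encryption step would transform c here.
        out.put(c);
    }
    out.flush();
    return out.good();
}
```

    Opening both streams in binary mode keeps the copy byte-for-byte, which matters once the output is no longer plain text.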

    QuantumPete
    "No-one else has reported this problem, you're either crazy or a liar" - Dogbert Technical Support
    "Have you tried turning it off and on again?" - The IT Crowd

  4. #4
    Registered Abuser
    Join Date
    Sep 2007
    Location
    USA/NJ/TRENTON
    Posts
    127
    Quote Originally Posted by QuantumPete
    I'm not sure what you're trying to ask here.
    That was really just so I could understand the underlying mechanism of strings.

    Really, I think this is simpler than I first thought. If I treat everything in the file as a char, then I can look at the first char, move it 504 bits to the left, then move on to the next one, move it 496 bits to the left, and so on.

    Once I've filled up the 512-bit data type, I can apply my encryption scheme. To access the byte-sized elements that make up the 512 bits, I'm going to abstract all 512 bits as a cube (8x8x8) and use that as an addressing scheme, not to retrieve the data, but to apply the encryption.

    I'm not programming anything extreme here, I just like to give myself assignments to keep myself ahead of the game at school.

    Thank you everyone here in this forum for helping out a newb like me!

  5. #5
    Technical Lead QuantumPete
    Join Date
    Aug 2007
    Location
    London, UK
    Posts
    894
    Quote Originally Posted by sh3rpa
    move it 504 bits to the left
    Ok, what are you moving to the left of what? There's really no need to move anything. Once you've read in a char you can do:
    Code:
    plain_text[i] = read_char;
    i++;
    inside your char-getting loop. Don't try to make things too complicated. But I'm intrigued now as to which encryption algorithm you're using that involves a cube...!


  6. #6
    Registered Abuser
    Join Date
    Sep 2007
    Location
    USA/NJ/TRENTON
    Posts
    127
    Quote Originally Posted by QuantumPete
    Ok, what are you moving to the left of what? There's really no need to move anything.
    Using bitwise ops to move a byte to the left: charRead << 504


    As for the encryption and the cube, I want to use a cycle-shifting cipher.

    "Cycles" meaning the cipher changes periodically and "Cipher" meaning, well, a bit-by-bit cipher.

    The "Cube" is really just a way to abstract the 512 "bits" making it easier to manage the cipher changes (the cube is 8x8x8). So one "bit" could be referenced like this:

    Code:
        0x4x7  (X-pos: 0, Y-pos: 4, and Z-pos: 7)
    and a byte could be referenced like:

    Code:
        3x5    (the collection of 8-bits beginning with X-pos: 3 and Y-pos: 5)
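
    One way that addressing could map onto memory, as a sketch only: 512 bits are 64 bytes, so (x, y) can pick a byte and z a bit inside it. The layout (x times 8 plus y, with z = 0 as the least significant bit) and the names Cube, byte_at, get_bit and set_bit are all assumptions, not part of the scheme described above.

```cpp
#include <array>
#include <cstdint>

// One 8x8x8 "cube": 512 bits stored as 64 bytes.
using Cube = std::array<std::uint8_t, 64>;

// Byte at (x, y); x and y must each be in 0..7.
std::uint8_t& byte_at(Cube& c, unsigned x, unsigned y) {
    return c[x * 8 + y];
}

// Bit at (x, y, z); z picks a bit inside the byte, 0 being the least significant.
bool get_bit(const Cube& c, unsigned x, unsigned y, unsigned z) {
    return ((c[x * 8 + y] >> z) & 1u) != 0;
}

// Sets or clears the bit at (x, y, z).
void set_bit(Cube& c, unsigned x, unsigned y, unsigned z, bool value) {
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << z);
    if (value) {
        byte_at(c, x, y) |= mask;
    } else {
        byte_at(c, x, y) &= static_cast<std::uint8_t>(~mask);
    }
}
```

    With this layout the bit written as 0x4x7 is the top bit of byte 4, and the byte written as 3x5 is element 29 of the array.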

  7. #7
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    Using bitwise ops to move a byte to the left: charRead << 504
    You can't do that. Well, you can if you have an arbitrary-precision integer class that lets you do it. But in general, you have to respect the size limits of your types. If int has 32 bits (as it does on the x86 PC), shifting by anything greater than 31 bits makes no sense.
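
    A minimal sketch of respecting that limit. In C++ a shift count equal to or larger than the width of the (promoted) type is undefined behaviour, so the count has to be checked first. The names kUIntBits and checked_shl are made up, and returning 0 for an out-of-range count is just one possible policy.

```cpp
#include <limits>

// Number of value bits in unsigned int on this platform (32 on a typical PC).
constexpr int kUIntBits = std::numeric_limits<unsigned int>::digits;

// Hypothetical helper: shifts left, but yields 0 instead of invoking
// undefined behaviour when the count does not fit the type.
constexpr unsigned int checked_shl(unsigned int value, int count) {
    return (count >= 0 && count < kUIntBits) ? (value << count) : 0u;
}
```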

  8. #8
    Registered Abuser
    Join Date
    Sep 2007
    Location
    USA/NJ/TRENTON
    Posts
    127
    Quote Originally Posted by CornedBee
    You can't do that. Well, you can if you have an arbitrary-precision integer class that lets you do it. But in general, you have to respect the size limits of your types. If int has 32 bits (as it does on the x86 PC), shifting by anything greater than 31 bits makes no sense.
    Right. So each "char" read will be "packed" as much as is permitted. Like I said, the cube is just an abstraction layer that can manage the collection of these "packed" values, so the abstraction really benefits the encryption scheme, not the compression. And I suppose it benefits me too in designing the bloody mess!

    So to address what you said CornedBee, I guess charRead << 504 isn't valid, but what will really be happening is something like:
    Code:
    PSEUDO CODE:

    collection = 0                   // assuming *collection* is a 32-bit unsigned int

    tempChar = charFromFile

    /* move char over */
    collection |= tempChar << 24     // the first char goes into the top byte

        -Then on to the next one (shift by 16, 8, 0) until *collection* is filled up

        -Once the collection is filled, assign it an address

    someObj(0, 0) = collection       // where someObj is an addressing scheme for the whole container

        -then start a new collection

        -and repeat the process until all 512 positions are filled

    Can't I overload "<<" to work with a number like 511?
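
    For what it's worth, std::bitset already provides a << that accepts a count like 504, so the overload does not have to be written by hand. A minimal sketch, assuming a 512-bit block; Block512 and place_byte are made-up names.

```cpp
#include <bitset>
#include <cstddef>

using Block512 = std::bitset<512>;

// Hypothetical helper: ORs an 8-bit value into the block, `shift` bits up
// from the low end. shift must be at most 504 so the whole byte fits.
void place_byte(Block512& block, unsigned char value, std::size_t shift) {
    Block512 wide(value);        // value occupies bits 0..7
    block |= (wide << shift);    // std::bitset defines << for any count
}
```

    Calling place_byte(block, 'A', 504) would put the first char in the top eight bits, 496 the next one, and so on.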

  9. #9
    Technical Lead QuantumPete
    Join Date
    Aug 2007
    Location
    London, UK
    Posts
    894
    What you're trying to do is store chars in an array, no? I get that from the "until collection is filled up" bit. There's no need to use bit shifting to store chars; you can work on chars just as easily as on ints or longs.


  10. #10
    Registered Abuser
    Join Date
    Sep 2007
    Location
    USA/NJ/TRENTON
    Posts
    127
    Quote Originally Posted by QuantumPete
    What you're trying to do is store chars in an array, no? I get that from the "until collection is filled up" bit. There's no need to use bit shifting to store chars; you can work on chars just as easily as on ints or longs.
    An array of chars would certainly be easier, but the whole point of this *assignment* I've made for myself is more to teach me things about C++ that I'm not so familiar with, bitwise operations being one of them.

    So what I'd like to do is:

    1) Read chars one by one from file.

    2) Compress the chars into some other storage type (int, I suppose), and call this a collection.

    3) Have another container which is a collection of the smaller collections that make up this "cube" abstraction.

    4) Implement the encryption.

    I'm no expert by any means on compression or encryption, but maybe this exercise might get my feet a little wet.

  11. #11
    Technical Lead QuantumPete
    Join Date
    Aug 2007
    Location
    London, UK
    Posts
    894
    You can store 4 chars in an int if you really want to, but it's not "compression": a char takes up 1 byte and an int takes up 4, so the net memory usage is the same. Before people get upset at the sizes I mention, this is true on most modern 32-bit systems.
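
    A sketch of that packing with shifts and ORs, using a fixed-width std::uint32_t so the "4 chars per word" assumption holds regardless of how wide int is. The names pack4 and unpack are made up.

```cpp
#include <cstdint>

// Packs four bytes into one 32-bit word, first byte in the most significant position.
std::uint32_t pack4(unsigned char a, unsigned char b, unsigned char c, unsigned char d) {
    return (static_cast<std::uint32_t>(a) << 24) |
           (static_cast<std::uint32_t>(b) << 16) |
           (static_cast<std::uint32_t>(c) << 8)  |
            static_cast<std::uint32_t>(d);
}

// Extracts byte `index` from a packed word (0 = most significant); index must be 0..3.
unsigned char unpack(std::uint32_t word, unsigned index) {
    return static_cast<unsigned char>((word >> (24 - 8 * index)) & 0xFFu);
}
```

    The casts to std::uint32_t before shifting matter: shifting a plain char would first promote it to int, and shifting a set bit into the sign bit is something to avoid.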

    With encryption you'll definitely have plenty of opportunity to use bitwise operations, especially XOR. Why don't you post what you have so far, with any questions you have about it?
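
    To show why XOR comes up so often, here is a toy repeating-key XOR. It is a sketch for practising the operator, not a secure cipher, and xor_with_key is a made-up name.

```cpp
#include <cstddef>
#include <string>

// Toy repeating-key XOR: not secure, it only demonstrates the bitwise operation.
// key must not be empty. Applying the function twice with the same key
// restores the original data, because (x ^ k) ^ k == x.
std::string xor_with_key(const std::string& data, const std::string& key) {
    std::string out = data;
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}
```

    The same function serves as both encrypt and decrypt, which is the property that makes XOR convenient as a building block.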


  12. #12
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Compression is all about identifying patterns in data. Compression of characters is limited in ability, because each character is already the minimum size a computer can easily work with, namely 8 bits. A character representation that uses fewer than 8 bits is inconvenient, because the computer still must read the characters in 8-bit pieces. However, such bitwise compression is possible. ASCII only uses 7 bits to encode a character (the most significant bit is always 0), so that extra bit could be eliminated. Further, it would be possible to use 6 bits if you escape out less frequently used characters like '!' and '$'. By 'escape out' I mean use two characters to represent one.
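
    A sketch of the 7-bit idea: eight ASCII characters fit in seven bytes once the unused top bit is dropped. The names pack7 and unpack7 are made up, and the most-significant-bit-first ordering is an arbitrary choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Packs 7-bit ASCII text into a bit stream, 7 bits per character,
// most significant bit first. Characters above 127 are not supported.
std::vector<std::uint8_t> pack7(const std::string& text) {
    std::vector<std::uint8_t> out((text.size() * 7 + 7) / 8, 0);
    std::size_t bit = 0;  // index of the next free bit in out
    for (char ch : text) {
        const unsigned v = static_cast<unsigned char>(ch) & 0x7Fu;
        for (int i = 6; i >= 0; --i, ++bit) {
            if ((v >> i) & 1u) {
                out[bit / 8] |= static_cast<std::uint8_t>(0x80u >> (bit % 8));
            }
        }
    }
    return out;
}

// Reverses pack7; count is the number of characters that were packed.
std::string unpack7(const std::vector<std::uint8_t>& packed, std::size_t count) {
    std::string text;
    std::size_t bit = 0;
    for (std::size_t n = 0; n < count; ++n) {
        unsigned v = 0;
        for (int i = 0; i < 7; ++i, ++bit) {
            v = (v << 1) | ((packed[bit / 8] >> (7 - bit % 8)) & 1u);
        }
        text.push_back(static_cast<char>(v));
    }
    return text;
}
```

    The saving is a fixed 12.5 percent, and the character count has to be stored somewhere, since the last byte may contain padding bits.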

    As for encryption, it would help to know what you want the encryption to do.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    The most common compression technique is to use a variable number of bits for a set of tokens (4, 8, 16 or 32 bit can be a token - or 6, 7, 19 or whatever seems suitable to the data you use).

    As long as the frequency of some tokens is noticeably higher than that of others, you can compress the data by assigning shorter bit patterns to the more frequent tokens, and longer bit patterns to the rarer ones.

    One such example is Huffman coding. It is used, for example, in fax machines to avoid sending every single pixel across; instead the fax data is compressed so that, for example, 35 pixels of white are just a few bits long.
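
    A sketch of how Huffman coding arrives at those shorter and longer patterns, using single chars as tokens. It only computes the code length for each char, not the bit patterns themselves, and huffman_lengths is a made-up name.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Returns the Huffman code length, in bits, for each distinct char in text.
std::map<char, int> huffman_lengths(const std::string& text) {
    std::map<char, std::size_t> freq;
    for (char c : text) ++freq[c];

    // Each tree node records its weight and its parent (-1 while it has none).
    struct Node { std::size_t weight; int parent; };
    std::vector<Node> nodes;
    using Entry = std::pair<std::size_t, int>;  // (weight, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : freq) {
        heap.push({kv.second, static_cast<int>(nodes.size())});
        nodes.push_back({kv.second, -1});
    }
    // Repeatedly merge the two lightest nodes under a new parent.
    while (heap.size() > 1) {
        const Entry a = heap.top(); heap.pop();
        const Entry b = heap.top(); heap.pop();
        const int parent = static_cast<int>(nodes.size());
        nodes.push_back({a.first + b.first, -1});
        nodes[a.second].parent = parent;
        nodes[b.second].parent = parent;
        heap.push({a.first + b.first, parent});
    }
    // A leaf's code length is its depth in the finished tree.
    std::map<char, int> lengths;
    int leaf = 0;  // leaves were added in the same order as freq is iterated
    for (const auto& kv : freq) {
        int depth = 0;
        for (int n = leaf; nodes[n].parent != -1; n = nodes[n].parent) ++depth;
        lengths[kv.first] = (depth == 0) ? 1 : depth;  // a lone symbol still needs 1 bit
        ++leaf;
    }
    return lengths;
}
```

    For the text "aaaabbc" this gives 'a' a 1-bit code and 'b' and 'c' 2-bit codes, so the seven chars need 10 bits instead of 56.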

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  14. #14
    Registered Abuser
    Join Date
    Sep 2007
    Location
    USA/NJ/TRENTON
    Posts
    127
    Thanks guys for the info on compression; I was questioning whether it was really "saving space".

    I'm really only in the pseudo code stage right now, as I'm quite busy with my classes, taking care of my daughter, and just getting over a cold.

    Here though is what I envision for the encryption:

    I have a list (linked-list) of the "cube" collections. Somehow I use a "stamping" technique (maybe a time-stamp) that will set the cipher.

    If you can imagine a linked-list of these cubes, in 3-D, then you can imagine a trig-function passing through them that will change periodically (with respect to the x, y, and z planes).

    The cross-section where this trig-function passes through the cubes will constitute an entire byte. This byte will dictate the encryption method for that individual cube.

    The different encryption methods (there will be a set number of them) can be randomly chosen and assigned for each different cube, so no two cubes will have to contain the same encryption method (unless of course there are a large number of these cubes).

    So unless you know the time-stamping method and what trig-function it selects, it would be extremely difficult to decrypt, since you have no idea what byte contains the key.

    And brute-forcing any set of 512 bits would be next to useless, since 512 bits have about 1.3 x 10^154 possible combinations!
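
    The figure itself is easy to check: 512 bits have 2^512 possible patterns, and 512 x log10(2) is about 154.13. A sketch of the arithmetic, with made-up function names; it says nothing about how strong any particular scheme is.

```cpp
#include <cmath>

// Number of distinct 512-bit patterns as a double.
// 2^512 is a power of two well inside double's range, so it is represented exactly.
double patterns_512() {
    return std::ldexp(1.0, 512);
}

// Decimal exponent of 2^bits, i.e. bits * log10(2).
double decimal_exponent(int bits) {
    return bits * std::log10(2.0);
}
```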

  15. #15
    Registered Abuser
    Join Date
    Sep 2007
    Location
    USA/NJ/TRENTON
    Posts
    127
    Quote Originally Posted by sh3rpa
    The cross-section where this trig-function passes through the cubes will constitute an entire byte. This byte will dictate the encryption method for that individual cube.
    Let me clarify this part, where the trig-function passes through EACH cube makes up a BYTE for that cube which contains the KEY for that cube.

    Hopefully that somewhat clarifies that. Be sure that EACH cube contains a byte that is the key for that cube, that's how each cube can have a different encryption method.

    So if the byte for a particular cube can be expressed as:

    f3 (in hex)

    "f" might represent a function that "&'s" each group of four bits with some number.
    "3" might then represent a different function that performs some other change on the bits within the cube.

    This is basically what I see happening.
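
    A sketch of how a key byte such as f3 could be split into its two hex digits, plus one possible reading of "ANDs each group of four bits with some number". All three function names are made up, and which digit selects which operation is an assumption.

```cpp
#include <cstdint>

// Splits a key byte into its two hex digits (nibbles).
unsigned high_nibble(std::uint8_t key) { return static_cast<unsigned>(key) >> 4; }
unsigned low_nibble(std::uint8_t key)  { return static_cast<unsigned>(key) & 0x0Fu; }

// Hypothetical stand-in for "ANDs each group of four bits with some number":
// applies the same 4-bit mask to both halves of a data byte.
std::uint8_t and_each_nibble(std::uint8_t data, unsigned mask4) {
    const unsigned m = mask4 & 0x0Fu;
    return static_cast<std::uint8_t>(data & ((m << 4) | m));
}
```

    One thing to keep in mind is that AND throws bits away, so a step like this cannot be undone when decrypting; XOR or a rotation would be reversible.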
