Thread: Mapping Characters to Ints

  1. #1
    Registered User
    Join Date
    Nov 2007
    Posts
    6

    Mapping Characters to Ints

    For my encryption assignment, our professor wants specific values for characters rather than the normal ASCII ones that TextPad or jEdit use (a=97, 2=50, /=47, etc.).

    The mapping from character to integers is shown below:

    0:'a' 1:'b' 2:'c' 3:'d' 4:'e' 5:'f' 6:'g' 7:'h'
    8:'i' 9:'j' 10:'k' 11:'l' 12:'m' 13:'n' 14:'o' 15:'p'
    16:'q' 17:'r' 18:'s' 19:'t' 20:'u' 21:'v' 22:'w' 23:'x'
    24:'y' 25:'z' 26:'A' 27:'B' 28:'C' 29:'D' 30:'E' 31:'F'
    32:'G' 33:'H' 34:'I' 35:'J' 36:'K' 37:'L' 38:'M' 39:'N'
    40:'O' 41:'P' 42:'Q' 43:'R' 44:'S' 45:'T' 46:'U' 47:'V'
    48:'W' 49:'X' 50:'Y' 51:'Z' 52:'0' 53:'1' 54:'2' 55:'3'
    56:'4' 57:'5' 58:'6' 59:'7' 60:'8' 61:'9' 62:' ' 63:','
    64:'.' 65:'!' 66:'?' 67:']' 68:'[' 69:'*'

    As you can see in the table above, the lower case letters are mapped to the integers 0 to 25, the upper
    case letters are mapped to the integers 26 to 51, the decimal digits 0 to 9 are mapped to the integers 52
    to 61 and the integers 62 to 69 represent some other printable characters.

    It is suggested you write two functions:

    // Given a character c, map it to its integer
    // position as per the assignment spec
    int mapCharToInt (char c )

    // Given an integer i, map it to its character
    // as per the assignment spec
    char mapIntToChar (int i)

    I'm not really sure how to begin, other than going through each character individually and assigning it an integer value, and vice versa for integers to characters.

    Thanks in advance.

  2. #2
    Jack of many languages Dino
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Well, there are a couple ways to do it. Have you considered using an array to map the characters? For instance, let's take the example of a real short alphabet with only the letters ABC. For our tiny encryption program, we'll assume B=0, C=1 and A=2.

    If we create a char array, the index of each character (our integer) will point to the corresponding character:

    char map[] = {'B', 'C', 'A'} ;

    If you are looking up the integer of a character, find the character in the array and get its index. If you have an integer and need the character, just use the integer as the index.
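    A minimal sketch of that idea, using the three-letter example alphabet above. The function names come from the assignment; returning -1 for a character that is not in the map is an assumption, not part of the spec.

```c
#include <assert.h>

/* Tiny alphabet from the post: index 0 is 'B', 1 is 'C', 2 is 'A'. */
static const char map[] = { 'B', 'C', 'A' };
static const int mapSize = (int)(sizeof map / sizeof map[0]);

/* Integer to character: the integer is simply the array index. */
char mapIntToChar(int i)
{
    assert(i >= 0 && i < mapSize);  /* caller must pass a valid code */
    return map[i];
}

/* Character to integer: scan the array for the character.
   Returns -1 (an assumed convention) if it is not in the map. */
int mapCharToInt(char c)
{
    for (int i = 0; i < mapSize; i++) {
        if (map[i] == c)
            return i;
    }
    return -1;
}
```

    With this table, mapCharToInt('A') gives 2 and mapIntToChar(0) gives 'B'. For the real assignment the array would hold all 70 characters in the order of the table.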

    Simple enough?

    Todd

  3. #3
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    I'd probably just create 2 char[256] arrays to use as lookup tables:
    Code:
    unsigned char EncodeASCII[256] = { ... 26, 27, 28... };
    char DecodeASCII[256] = { 'a', 'b', 'c' ... };
    Then you can create functions which simply use those arrays to map from one code to another.

  4. #4
    Registered User
    Join Date
    Oct 2001
    Posts
    2,934
    I'm not sure how you'd populate the EncodeASCII array without a bunch of assignment statements though. Although once you did that, it would be fastest.
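    One possible way around the long list of assignments is to write the decode table once as a string and fill the encode table from it in a loop. This is only a sketch: initEncodeTable, CODE_COUNT and INVALID_CODE are made-up names, the decode table is sized by its string rather than 256, and -1 / '\0' as "not mapped" results are assumptions.

```c
#include <string.h>

/* Decode table: the position in this string is the assignment's code (0..69). */
static const char DecodeASCII[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    " ,.!?][*";

#define CODE_COUNT ((int)(sizeof DecodeASCII - 1))  /* 70, excludes the '\0' */
#define INVALID_CODE 255  /* hypothetical marker for unmapped characters */

static unsigned char EncodeASCII[256];

/* Fill EncodeASCII once, before the first lookup. */
void initEncodeTable(void)
{
    memset(EncodeASCII, INVALID_CODE, sizeof EncodeASCII);
    for (int i = 0; i < CODE_COUNT; i++)
        EncodeASCII[(unsigned char)DecodeASCII[i]] = (unsigned char)i;
}

/* Character to code: one table lookup; -1 if the character is unmapped. */
int mapCharToInt(char c)
{
    unsigned char code = EncodeASCII[(unsigned char)c];
    return code == INVALID_CODE ? -1 : code;
}

/* Code to character: index into the decode string; '\0' if out of range. */
char mapIntToChar(int i)
{
    return (i >= 0 && i < CODE_COUNT) ? DecodeASCII[i] : '\0';
}
```

    initEncodeTable() has to be called once at startup; after that both directions are a single array access.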

  5. #5
    Officially An Architect brewbuck
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by swoopy View Post
    I'm not sure how you'd populate the EncodeASCII array without a bunch of assignment statements though. Although once you did that, it would be fastest.
    In this case, probably. But lookup tables aren't always faster than logic. Sure, you might execute a dozen instructions instead of a single memory lookup, but that memory lookup, especially if the table is large, can make the code run slower by kicking recent information out of cache. Compare the speed of a dozen instructions to the speed of a single cache miss -- the cache miss is far more expensive.

  6. #6
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote Originally Posted by brewbuck View Post
    In this case, probably. But lookup tables aren't always faster than logic. Sure, you might execute a dozen instructions instead of a single memory lookup, but that memory lookup, especially if the table is large, can make the code run slower by kicking recent information out of cache. Compare the speed of a dozen instructions to the speed of a single cache miss -- the cache miss is far more expensive.
    But wouldn't a smart system/compiler put the lookup table in the cache?

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by cpjust View Post
    But wouldn't a smart system/compiler put the lookup table in the cache?
    The compiler has little say on what's in the cache - the processor decides what is in the cache. If a table is really random access, and larger than the cache, two things happen:
    1. Accesses to the table generate real memory accesses - which is slow.
    2. Since the processor loads that table entry into cache, it also throws something else out of the cache.

    There are ways, if you don't mind some architecture-dependent assembler code, to avoid loading things into the cache (for example the x86 instructions "mov*nt" - nt stands for "non-temporal", which means "we are not re-using this").

    If we are using predictable patterns of access, we can also "preload" the cache - which will reduce the latency of the actual access.
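    As a rough illustration of preloading, here is a sketch that sums table entries chosen by a list of indices known in advance. It assumes GCC or Clang, whose __builtin_prefetch builtin issues a prefetch hint; sumIndexed and the distance of 8 are made-up for the example, and whether it helps at all depends on the machine and the access pattern.

```c
#include <stddef.h>

/* Sum table[idx[i]] for i in [0, n). The indices are known up front,
   so the entry needed a few iterations from now can be requested early. */
long sumIndexed(const int *table, const size_t *idx, size_t n)
{
    enum { AHEAD = 8 };  /* assumed prefetch distance; needs tuning */
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        /* Hint only: read access (0), low temporal locality (1). */
        if (i + AHEAD < n)
            __builtin_prefetch(&table[idx[i + AHEAD]], 0, 1);
#endif
        sum += table[idx[i]];
    }
    return sum;
}
```

    The result is the same with or without the hint; only the timing can differ, so this is something to measure rather than assume.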

    To make matters even worse, if you have really large tables, you may have to read the page-table entries for the memory access too. So what is a single "int x = p[i]" access turns into a sequence of 3-4 page-table reads [at completely different addresses, so all the worst cases for the memory access too], and then a write of 16 bytes from the cache [we throw out 16 bytes of "written" cache-line], then a read of 16 bytes of data into the cache. So one access now takes several hundred clock-cycles [and that's assuming we didn't page-fault and need to swap it in from disk, of course - in which case many thousands of cycles will fly past before the data is ready to be used].

    A really bad case of this is for example a "telephone exchange database" - each access is for a different telephone number - and there's absolutely no predictable pattern to that.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    230
    Why not just do some logical assignments that work on groups of characters?
    e.g.
    If the character is a small letter (>= 'a' && <= 'z') then subtract 'a' from the character.
    If the character is a capital letter then subtract 39 (difference between ASCII 'A' and encrypted 'A') from the character.
    If the character is a digit then add 4 to it (difference between ASCII '1' and encrypted '1').
    For the remaining characters I think you'll just have to use a map, since I can't really find any relation between the encrypted and ASCII character maps.
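    Roughly, that could look like the sketch below. It uses inclusive comparisons so that 'a' and 'z' themselves are covered, and it assumes an ASCII-style character set where letters and digits are contiguous. The small "others" string for the punctuation and the -1 / '\0' results for unmapped input are assumptions, not part of the assignment.

```c
#include <string.h>

/* Punctuation has no arithmetic pattern, so it gets a small map.
   Position in the string plus 62 is the assignment's code. */
static const char others[] = " ,.!?][*";

int mapCharToInt(char c)
{
    if (c >= 'a' && c <= 'z') return c - 'a';        /* 0..25 */
    if (c >= 'A' && c <= 'Z') return c - 'A' + 26;   /* same as c - 39 in ASCII */
    if (c >= '0' && c <= '9') return c - '0' + 52;   /* same as c + 4 in ASCII */
    if (c != '\0') {                                 /* strchr would match the terminator */
        const char *p = strchr(others, c);
        if (p != NULL)
            return 62 + (int)(p - others);           /* 62..69 */
    }
    return -1;                                       /* not in the table */
}

char mapIntToChar(int i)
{
    if (i >= 0 && i <= 25)  return (char)('a' + i);
    if (i >= 26 && i <= 51) return (char)('A' + (i - 26));
    if (i >= 52 && i <= 61) return (char)('0' + (i - 52));
    if (i >= 62 && i <= 69) return others[i - 62];
    return '\0';                                     /* out of range */
}
```

    For example, mapCharToInt('A') gives 26 and mapIntToChar(69) gives '*'.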
