# Thread: Mapping Characters to Ints

1. ## Mapping Characters to Ints

For my encryption assignment, our professor wants specific values for characters rather than the standard ASCII codes that editors like TextPad or jEdit show (a=97, 2=50, /=47, etc.).

The mapping from character to integers is shown below:

0:'a' 1:'b' 2:'c' 3:'d' 4:'e' 5:'f' 6:'g' 7:'h'
8:'i' 9:'j' 10:'k' 11:'l' 12:'m' 13:'n' 14:'o' 15:'p'
16:'q' 17:'r' 18:'s' 19:'t' 20:'u' 21:'v' 22:'w' 23:'x'
24:'y' 25:'z' 26:'A' 27:'B' 28:'C' 29:'D' 30:'E' 31:'F'
32:'G' 33:'H' 34:'I' 35:'J' 36:'K' 37:'L' 38:'M' 39:'N'
40:'O' 41:'P' 42:'Q' 43:'R' 44:'S' 45:'T' 46:'U' 47:'V'
48:'W' 49:'X' 50:'Y' 51:'Z' 52:'0' 53:'1' 54:'2' 55:'3'
56:'4' 57:'5' 58:'6' 59:'7' 60:'8' 61:'9' 62:' ' 63:','
64:'.' 65:'!' 66:'?' 67:']' 68:'[' 69:'*'

As you can see in the table above, the lowercase letters are mapped to the integers 0 to 25, the uppercase letters are mapped to the integers 26 to 51, the decimal digits 0 to 9 are mapped to the integers 52 to 61, and the integers 62 to 69 represent some other printable characters.

It is suggested that you write two functions:

// Given a character c, map it to its integer
// position as per the assignment spec
int mapCharToInt(char c);

// Given an integer i, map it to its character
// as per the assignment spec
char mapIntToChar(int i);

I'm not really sure how to even begin this, except by going through each character individually and assigning it an integer value, and vice versa for integers to characters.

2. Well, there are a couple of ways to do it. Have you considered using an array to map the characters? For instance, take a really short alphabet with only the letters ABC. For our tiny encryption program, we'll assume B=0, C=1 and A=2.

If we create a char array, the index of each character (our integer) will point to the corresponding character:

char map[] = {'B', 'C', 'A'};

If you are looking up the integer for a character, find the character in the array and take its index. If you have an integer and need the character, just use the integer as the index.

Simple enough?

Todd

3. I'd probably just create 2 char[256] arrays to use as lookup tables:
Code:
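unsigned char EncodeASCII[256] = { ... 26, 27, 28... };  // index: ASCII code, value: the assignment's integer
char DecodeASCII[256] = { 'a', 'b', 'c' ... };  // index: the assignment's integer, value: the character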
Then you can create functions which simply use those arrays to map from one code to another.

4. I'm not sure how you'd populate the EncodeASCII array without a bunch of assignment statements though. Although once you did that, it would be fastest.

5. Originally Posted by swoopy
I'm not sure how you'd populate the EncodeASCII array without a bunch of assignment statements though. Although once you did that, it would be fastest.
In this case, probably. But lookup tables aren't always faster than logic. Sure, you might execute a dozen instructions instead of a single memory lookup, but that memory lookup, especially if the table is large, can make the code run slower by kicking recent information out of cache. Compare the speed of a dozen instructions to the speed of a single cache miss -- the cache miss is far more expensive.

6. Originally Posted by brewbuck
In this case, probably. But lookup tables aren't always faster than logic. Sure, you might execute a dozen instructions instead of a single memory lookup, but that memory lookup, especially if the table is large, can make the code run slower by kicking recent information out of cache. Compare the speed of a dozen instructions to the speed of a single cache miss -- the cache miss is far more expensive.
But wouldn't a smart system/compiler put the lookup table in the cache?

7. Originally Posted by cpjust
But wouldn't a smart system/compiler put the lookup table in the cache?
The compiler has little say in what's in the cache; the processor decides that. If a table is accessed at random and is larger than the cache, two things happen:
1. Accesses to the table go out to main memory, which is slow.
2. Since the processor loads that table entry into the cache, it also has to evict something else from the cache.

If you don't mind some architecture-dependent assembler code, there are ways to avoid loading things into the cache (for example, the x86 "movnt*" instructions, where "nt" stands for "non-temporal", meaning "we are not going to reuse this").

If the access pattern is predictable, we can also prefetch ("preload") data into the cache, which hides some of the latency of the actual access.

To make matters even worse, with really large tables the access may also miss in the TLB, so the processor has to read the page-table entries for that address too. What looks like a single "int x = p[i]" access then turns into a sequence of 3-4 page-table reads (at completely different addresses, so they hit all the worst cases of memory access too), then a write-back of the evicted cache line if it was modified (a cache line is typically 64 bytes on current x86 CPUs), and then a read of the new cache line. So one access now takes several hundred clock cycles, and that's assuming we didn't page-fault and need to swap the page in from disk, in which case millions of cycles will pass before the data is ready to be used.

A really bad case is, for example, a telephone exchange database: each access is for a different telephone number, and there's no predictable access pattern at all.

--
Mats

8. Why not just use some logic that works on whole groups of characters?
e.g.
If the character is a lowercase letter (>= 'a' && <= 'z'), subtract 'a' from it.
If the character is an uppercase letter, subtract 39 from it (the difference between ASCII 'A', which is 65, and encrypted 'A', which is 26).
If the character is a digit, add 4 to it (the difference between ASCII '1', which is 49, and encrypted '1', which is 53).
For the remaining punctuation characters, I think you'll just have to use a map, since I can't find any relationship between their encrypted and ASCII values.