Write or modify program to leverage CRC collision samples

Hello,

Inpsired by @rcgldr amazing posts here https://cboard.cprogramming.com/c-pr...-crc-16-a.html

That I am still trying to make full sense out of

Goal:
Reverse engineer this checksum format

0100100A00000000004080000000FFFFFFFFFFD7BB
0100100A00000000000060000000FFFFFFFFFFD7BB
0100100A00000000002010000000FFFFFFFFFFD7BB

(many more collision samples will be available soon)

Either by fully reverse engineering it, or coming up with an alternate algorithm that produces collisions at the "same times" that the "Blackbox" algorithm does.

This would allow me to create a lookup table.

If I can come up with a more typical CRC algorithm / parameters that causes collisions (even if the collision values are different from the "Blackbox" algorithm collisions) whenever the "Blackbox" algorithm produces collisions on a given set of samples, then I can create a lookup table from "Normal CRC" to "Blackbox CRC"

The relationship in this lookup table between "Normal CRC" and "Blackbox CRC" does not have to be 1:1, nor would it need to be. I.e. the "Blackbox" algorithm may introduce more collisions on top of what the underlying CRC produces

The main requirement for this to work is that the "Normal CRC" algorithm does not produce collisions when the "Blackbox" algorithm does not

The final output of the "Blackbox" algorithm can produce more collisions than my "Normal CRC" algorithm, but not the other way around.

Assumptions:
1.) There is a "Blackbox" algorithm that obfuscates a CRC16 in a manner more complex than "messing" with initial values and/or XorOut values

2.) The "underlying" algorithm is a fairly typical CRC, with very good entropy and low collision rate, that protects against positional swapping of the same bytes, and can distinguish between 00 and FF values.

What I need Help With:

Conceptually, am I full of **** with this strategy? If not, then I would either:

1.) Modify an existing program such as SRP16 to test CRC parameters until it finds a configuration that causes a collision on all 3 input samples, without caring what the actual collision CRC value is. Sort of a wildcard CRC, but all 3 wildcard CRC outputs must match each other to be considered a candidate / result.

2.) Write an entirely new program to leverage the collisions, looking for a set of CRC parameters that results in collisions (of any value) for all supplied input samples.

Potential issues:
1.) Due to the possibility of additional "extra" collisions being introduced by the "Blackbox" algorithm, not all samples in a given collision set are "good" to analyze. I.e. in a set of 5 sample message / CRC collisions, 3 of them may "make sense" from a CRC poly multiples standpoint, while the other 2 may make "no sense at all" because they are "artificially introduced" by week/poorly designed math in the "Blackbox" algorithm.

I should add that I have spent some time analyzing additional collision samples in binary

Observations Thus Far:

1.) Within a changed 16-bit word between the collision samples with same checksum, there is the same number of 1 bits set
2.) The value of the 16-bit word(s) that changed between the collision samples seems to always be a multiple of 8

Therefore, we may be dealing with a Poly that is also a multiple of 8 such that the remainder is still 0 across the changed 16-bit words in the collision samples

Some additional samples:

0100100A00000000000001200000FFFFFFFFFFA883
0100100A00000000000080020000FFFFFFFFFFA883

0100100A00000000000000220000FFFFFFFFFFE6AD
0100100A00000000000081000000FFFFFFFFFFE6AD

0100100A00000000004010000000FFFFFFFFFF73B9
0100100A00000000002080000000FFFFFFFFFF73B9

0100100A00000000006000000000FFFFFFFFFF8E3F
0100100A00000000000090000000FFFFFFFFFF8E3F

0100100A00000000002020000000FFFFFFFFFF6512
0100100A00000000000050000000FFFFFFFFFF6512

0100100A00000000002040000000FFFFFFFFFF9894
0100100A00000000000030000000FFFFFFFFFF9894

0100100A00000000004020000000FFFFFFFFFFC110
0100100A000000000000C0000000FFFFFFFFFFC110

0100100A00000000004040000000FFFFFFFFFF3C96
0100100A000000000000A0000000FFFFFFFFFF3C96

0100100A00000000000001020000FFFFFFFFFF6413
0100100A00000000000080200000FFFFFFFFFF6413

XORing those strings together, I'm not so sure it's a CRC.

There is minimal chance that a 16-bit CRC would generate the same checksum for different bit patterns that are ~16 bits long.

[/code]
0100100A00000000000001200000FFFFFFFFFFA883
0100100A00000000000080020000FFFFFFFFFFA883
--------------------8122------------------

0100100A00000000000000220000FFFFFFFFFFE6AD
0100100A00000000000081000000FFFFFFFFFFE6AD
--------------------8122------------------

0100100A00000000004010000000FFFFFFFFFF73B9
0100100A00000000002080000000FFFFFFFFFF73B9
------------------6-9---------------------

0100100A00000000006000000000FFFFFFFFFF8E3F
0100100A00000000000090000000FFFFFFFFFF8E3F
------------------6-9---------------------

0100100A00000000002020000000FFFFFFFFFF6512
0100100A00000000000050000000FFFFFFFFFF6512
------------------2-7---------------------

0100100A00000000002040000000FFFFFFFFFF9894
0100100A00000000000030000000FFFFFFFFFF9894
------------------2-7---------------------

0100100A00000000004020000000FFFFFFFFFFC110
0100100A000000000000C0000000FFFFFFFFFFC110
------------------4-E-----------------0000

0100100A00000000004040000000FFFFFFFFFF3C96
0100100A000000000000A0000000FFFFFFFFFF3C96
------------------4-E-----------------0000

0100100A00000000000001020000FFFFFFFFFF6413
0100100A00000000000080200000FFFFFFFFFF6413
--------------------8122--------------0000
[/code]
My bet is that it's an add-em-up checksum.

Can't tease out single bits with the data you supplied, but can get a few pairs out:

Code:

0100100A00000000002040000000FFFFFFFFFF9894 0100100A00000000002020000000FFFFFFFFFF6512 --------------------6-----------------FD86 0100100A00000000000050000000FFFFFFFFFF6512 0100100A00000000000030000000FFFFFFFFFF9894 --------------------6-----------------FD86 0100100A00000000002040000000FFFFFFFFFF9894 0100100A00000000004040000000FFFFFFFFFF3C96 ------------------6-------------------A402 0100100A00000000000001200000FFFFFFFFFFA883 0100100A00000000000001020000FFFFFFFFFF6413 ----------------------22--------------CC70 0100100A00000000000080200000FFFFFFFFFF6413 0100100A00000000000080020000FFFFFFFFFFA883 ----------------------22--------------CC70

Hummm... maybe it is a CRC? Do you have any data with just 1 bit differences?

@hamster_nz thank you so much for taking a look!

I am still collecting samples. I am only up to ~12,000 samples, and I am seeing a HUGE amount of checksums with at least 1 collision, with a few with 5 or 6 collisions so far. however my test messages are not very diverse. My messages seem to also feature 16bit word values of decimal 1, 2, or a multiple of 8. not sure if this means anything

So either they chose the most stupid Poly in the world, that has extra trouble when trying to differentiate 16-bit words that are decimal 1, 2, or a multiple of 8, or its something weaker / worse than a CRC

HOWEVER, it is extremely tamper resistant. I have done a lot of bit level "tamper tests" and it seems pretty good at detecting changes even when I try to keep word values as multiples of 8, and keep the same number of non-zero bits in a word it still catches me red handed every time.

It can also detect between 00 and FF (unlike Fletcher), and positional swapping of the same data.

Its ability to detect changes pretty darn well is what makes me think its a CRC

Here are some 6-way collisions

0100100A00000000000010200000FFFFFFFFFFC011
0100100A00000000410008080000FFFFFFFFFFC011
0100100A00000000006001020000FFFFFFFFFFC011
0100100A00000000006080200000FFFFFFFFFFC011
0100100A00000000002060200000FFFFFFFFFFC011
0100100A00000000000091020000FFFFFFFFFFC011

0100100A00000000000010080000FFFFFFFFFFE8E8
0100100A00000000040400040000FFFFFFFFFFE8E8
0100100A00000000410008200000FFFFFFFFFFE8E8
0100100A00802000000200040000FFFFFFFFFFE8E8
0100100A00000000006080080000FFFFFFFFFFE8E8
0100100A00000000002060080000FFFFFFFFFFE8E8

I wrote a C# program to test these messages and determine what Poly would also result in collisions, I am awaiting those results to finish.

If I can find a Poly that produces collisions when and only when the mystery algo does, then I will attempt to use that and build a lookup table from there.

Code

Here is my code I'm using to test which Poly cause collisions on a set of test input messages.

The idea would be to come up with a CRC configuration that produces collisions when and only when the mystery algorithm does, then try to build a 65536 entry lookup table between the CRC output and the values the mystery would produce.

It occurs to me I may not even need to be testing Init values in this case

On the first test I ran, I came up with possible Poly of:

0xFD00
0x8000
0x5600

My operating theory is that they chose a terribly "weak" poly that is vulnerable to 16 bit words that are decimal 1, 2, or a multiple of 8, or that its something different than a CRC

Code:

using System.Collections; using System.Collections.Generic; using System.Linq; using System.Text; using static System.Runtime.InteropServices.JavaScript.JSType; namespace CRC_CollisionResearcher { internal class Program { static byte[] StringToByteArray(string hex) { return Enumerable.Range(0, hex.Length) .Where(x => x % 2 == 0) .Select(x => Convert.ToByte(hex.Substring(x, 2), 16)) .ToArray(); } static void Main() { string LogPath = "CollisionResearch_FullResults.txt"; string LogPath2 = "CollisionResearch_PolysOnly.txt"; if (File.Exists(LogPath)) { File.Delete(LogPath); } else { var LogFileStream = File.Create(LogPath); LogFileStream.Close(); } if (!File.Exists(LogPath2)) { var LogFileStream2 = File.Create(LogPath2); LogFileStream2.Close(); } Console.WriteLine("Welcome to CRC Collision Researcher!\n"); Console.WriteLine("You will need 3-7 sample messages with the 16bit checksums removed, that are known to / should result in collisions.\n"); Console.WriteLine("NOTICE: The input messages must be an even number of bytes\n"); Console.WriteLine("How many sample messages would you like to provide?"); int desiredInputs = int.Parse(Console.ReadLine()); List<byte[]> inputs = new List<byte[]>(); for (int k = 0; k < desiredInputs; k++) { Console.WriteLine("Enter message " + (k+1) + ":"); string tempstring = Console.ReadLine().Replace(" ", "").ToUpper(); if (tempstring.Length % 4 != 0) { Console.WriteLine("ERROR: The string you entered does not represent an EVEN number of bytes (ENTER to exit)\n"); while (Console.ReadKey(true).Key != ConsoleKey.Enter) ; Environment.Exit(0); } inputs.Add(StringToByteArray(tempstring)); } ushort poly = 0xFFFF; int initialValue = 0x0000; DateTime previousTime = DateTime.Now; Console.WriteLine("\nProcessing... Poly: " + poly.ToString("X4") + " (0% complete)"); while (poly > 0x0) { int NoteWorthyPoly = 0; if (poly != 65500 && poly % 655 == 0) { TimeSpan ts = DateTime.Now - previousTime; TimeSpan TimeRemaining = ts.Multiply(100 - ((0xFFFF - poly) / 655)); StreamWriter Logger3 = new StreamWriter(LogPath, true); Console.WriteLine("Processing... Poly: " + poly.ToString("X4") + " (" + ((0xFFFF - poly) / 655) + "% complete, Time Remaining: " + TimeRemaining.ToString(@"dd\.hh\:mm\:ss") + ")"); Logger3.WriteLine("Processing... Poly: " + poly.ToString("X4") + " (" + ((0xFFFF - poly) / 655) + "% complete, Time Remaining: " + TimeRemaining.ToString(@"dd\.hh\:mm\:ss") + ")"); Logger3.Close(); previousTime = DateTime.Now; } while (initialValue <= 0xFFFF) { List<ushort> CrcResultsCache = new List<ushort>(); for (int i = 0; i < inputs.Count; i++) { CrcResultsCache.Add(Crc16(inputs[i], poly, (ushort)initialValue)); } if (CrcResultsCache.Count != CrcResultsCache.Distinct().Count()) { StreamWriter Logger = new StreamWriter(LogPath, true); NoteWorthyPoly = 1; Console.WriteLine("\nCOLLISIONS FOUND!!!\n"); Logger.WriteLine("\nCOLLISIONS FOUND!!!\n"); Console.WriteLine("Current CRC Parameters:"); Logger.WriteLine("Current CRC Parameters:"); Console.WriteLine("Poly: " + poly.ToString("X4") + " Init: " + initialValue.ToString("X4")); Logger.WriteLine("Poly: " + poly.ToString("X4") + " Init: " + initialValue.ToString("X4")); Console.WriteLine("List of Collisions:"); Logger.WriteLine("List of Collisions:"); var q = from x in CrcResultsCache group x by x into g let count = g.Count() orderby count descending select new { Value = g.Key, Count = count }; List<int> Counts = new List<int>(); foreach (var x in q) { Console.WriteLine("Value: " + x.Value.ToString("X4") + " Count: " + x.Count); Logger.WriteLine("Value: " + x.Value.ToString("X4") + " Count: " + x.Count); Counts.Add(x.Count); } Console.WriteLine("Highest Collision Count: " + Counts.Max()); Logger.WriteLine("Highest Collision Count: " + Counts.Max()); Logger.Close(); } initialValue++; } if (NoteWorthyPoly == 1) { StreamWriter Logger11 = new StreamWriter(LogPath2, true); Logger11.WriteLine("Poly: " + poly.ToString("X4")); Logger11.Close(); } initialValue = 0x0000; //StreamWriter Logger2 = new StreamWriter(LogPath, true); poly--; //Logger2.WriteLine("Progress Status: Poly: " + poly.ToString("X4")); //Logger2.Close(); } } static ushort Crc16(byte[] bytes, ushort poly, ushort initialValue) { ushort[] table = new ushort[256]; ushort temp, a; ushort crc = initialValue; for (int i = 0; i < table.Length; ++i) { temp = 0; a = (ushort)(i << 8); for (int j = 0; j < 8; ++j) { if (((temp ^ a) & 0x8000) != 0) temp = (ushort)((temp << 1) ^ poly); else temp <<= 1; a <<= 1; } table[i] = temp; } for (int i = 0; i < bytes.Length; ++i) { crc = (ushort)((crc << 8) ^ table[((crc >> 8) ^ (0xff & bytes[i]))]); } return crc; } } }