![]() |
| | #1 |
| Rampaging 35 Stone Welsh Join Date: Apr 2007
Posts: 2,929
| Text Compression Contest Best compression within a reasonable time limit wins. Implement your compression as a function that accepts a pointer to a zero terminated string consisting of the entire text of War and Peace as input and a dereferenced pointer (pointer to a pointer) where it should place the output. The function must also return the number of bytes used in the output. Below is an example declaration - Code: unsigned long abachler_compress(char* Input , unsigned char** Ouput); Code: char* abachler_decompress(unsigned char* Input , unsigned long InputSize); Must be written in C++, C code is fine as long as it compiles as C++. No use of global variables. No use of the file system. All code must be yours, no use of 3rd party compression libraries or API calls. If your function crashes the application for any reason you lose. Inline assembly is fine as long as it does not jump outside the assembly block. Runtime is limited to 1 hour maximum on a Pentium 4 3.2 GHz with HT and 2GB ram. You may use threads. All submissions must be made by 23:59CST (-6 GMT) on 9/30/2009 The following is the main() function that will test the application I highly suggest you use this exact code for testing as no modifications will be made to accommodate anyones code. - Code: #include <malloc.h>
#include <stdio.h>
#define SIZE_OF_WAR_AND_PEACE 0 // will be actual size of the file
unsigned long abachler_compress(char* Input , unsigned char** Ouput);
char* abachler_decompress(unsigned char* Input , unsigned long InputSize);
int main(){
FILE* pFile = fopen("warandpeace.txt" , "r+b");
char* pRandomVariableName = (char*)malloc(SIZE_OF_WAR_AND_PEACE);
fread(pRandomVariableName , 1 , SIZE_OF_WAR_AND_PEACE , pFile);
fclose(pFile);
unsigned char* pCompressed;
unsigned long DataSize = abachler_compress(pRandomVariableName , &pCompressed);
// compressed data may be copied to a new array at this point
char* pDecompressed = abachler_decompress(pCompressed , DataSize);
return 0;
}
__________________ He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet Last edited by abachler; 09-23-2009 at 12:36 AM. |
| abachler is offline | |
| | #2 | ||
| C++ Witch Join Date: Oct 2003 Location: Singapore
Posts: 10,368
| Quote:
Quote:
__________________ C + C++ Compiler: MinGW port of GCC Build + Version Control System: SCons + Bazaar Look up a C/C++ Reference and learn How To Ask Questions The Smart Way | ||
| laserlight is offline | |
| | #3 |
| Registered User Join Date: Sep 2009
Posts: 63
| Very interesting. Even if I don't submit anything, I do think this will be a neat exercise. |
| Zach_the_Lizard is offline | |
| | #4 |
| Hat seller extraordinaire Join Date: Apr 2008
Posts: 159
| Well, with the leak intact... Code: unsigned int compress(char* text, char** compressed)
{
*compressed = text;
return sizeof(*compressed);
}
char* decompress(char* compressed, unsigned int /*len*/)
{
return compressed;
}
|
| adeyblue is offline | |
| | #5 | |
| Rampaging 35 Stone Welsh Join Date: Apr 2007
Posts: 2,929
| There's no leak, windows implicitly frees the memory when the process terminates. While your entry is technically valid, I'm not even going to bother running it since it will lose to any non-smartass entry. It also fails to decompress the data properly. The last time i checked War and Peace was longer than 4 bytes. Consider yourself disqualified. Quote:
I would suggest you focus on the contest and not trying to nitpick the rules.
__________________ He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet Last edited by abachler; 09-24-2009 at 12:00 AM. | |
| abachler is offline | |
| | #6 |
| Senior software engineer Join Date: Mar 2007 Location: Portland, OR
Posts: 5,381
| The rules have a gaping hole, in that they don't forbid the use of the actual text of War and Peace. I could embed a reference text in my program, and output the diff between that text and the input. If I got lucky and used exactly the same text you used (from Gutenberg, right?) I might have an output size of 0 bytes while remaining within the rules. It's actually amazing that you selected that text, because it's the same text I use as a mid-size corpus for language processing experiments. I have a LOT of experience with that text, and I know its statistics and idiosyncracies
__________________ "Congratulations on your purchase. To begin using your quantum computer, set the power switch to both off and on simultaneously." -- raftpeople@slashdot |
| brewbuck is offline | |
| | #7 | |
| Rampaging 35 Stone Welsh Join Date: Apr 2007
Posts: 2,929
| Quote:
so no gaping hole. I also didn't specify what language it is in.
__________________ He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet | |
| abachler is offline | |
| | #8 |
| Registered User Join Date: Apr 2006 Location: United States
Posts: 3,202
| Since it's an ASCII file I would hazard a guess it is English.
__________________ Os iusti meditabitur sapientiam Et lingua eius loquetur indicium "There is nothing either good or bad, but thinking makes it so." (Shakespeare, Hamlet, Act II scene ii) http://www.myspace.com/whiteflags99 |
| whiteflags is offline | |
| | #9 |
| Registered User Join Date: Dec 2006
Posts: 1,780
| I think it would be a good idea measure the size of BOTH the compressed text and the decompresser, like the calgary challenge. That would eliminate these kind of gaping holes. |
| cyberfish is offline | |
| | #10 |
| Rampaging 35 Stone Welsh Join Date: Apr 2007
Posts: 2,929
| Don't bet on it, it could be in Klingon for all you know.
__________________ He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet |
| abachler is offline | |
| | #11 | ||
| C++ Witch Join Date: Oct 2003 Location: Singapore
Posts: 10,368
| Quote:
Quote:
__________________ C + C++ Compiler: MinGW port of GCC Build + Version Control System: SCons + Bazaar Look up a C/C++ Reference and learn How To Ask Questions The Smart Way | ||
| laserlight is offline | |
| | #12 |
| Rampaging 35 Stone Welsh Join Date: Apr 2007
Posts: 2,929
| There's nothing wrong with the rules as they are. No professional compiler would be so anal as to NOT include malloc.h if only for compatibility with pre-standard code. If your implementation doesn't support malloc.h then too bad, it's not my problem. As I said, I highly SUGGEST, you are free to do whatever you want, if it fails to run it fails to run. But, as laserlight pointed out, it is unlikely that any implementation would run with stdlib and fail to run with malloc. Since malloc is used in the main function, not in your function, you are free to include whatever headers you desire, within the limits of the contest. As a sign of good faith, I will provide a very simple compressor. Obviously this doesn't meet the requirements of the contest, as I rolle dthe whole thing into main. it also suffers from poor compression because of a known weakness, that is each letter gets compressed by 2 bits, but every non letter, specifically spaces, eat an extra 6 bits. Overall it saved only 60KB out fo a 3MB file. Like I said its very crude compression. Code: int main(){
char* pBuffer = (char*)malloc(1048576);
FILE* pFile = fopen("e:\\warandpeace.txt" , "r+b");
long BytesRead = -1;
unsigned long ByteCount[256];
for(int x = 0;x<256;x++) ByteCount[x] = 0;
unsigned long TotalBytes = 0;
while(BytesRead){
BytesRead = fread(pBuffer , 1 , 1048576 , pFile);
if(BytesRead) for(int x=0;x<BytesRead;x++) ByteCount[pBuffer[x]]++;
TotalBytes += BytesRead;
}
fclose(pFile);
pFile = 0;
pFile = fopen("e:\\ByteCount.txt" , "w+b");
long Index = 0;
for(int x = 0;x<256;x++){
fprintf(pFile , "%2.2x - %d%c%c" , x , ByteCount[x] , 13 , 10);
if(ByteCount[x]) Index++;
}
fprintf(pFile , "%c%cTotal Symbols Used %d%c%c" , 13 , 10 , Index , 13 , 10);
fclose(pFile);
pBuffer = (char*)realloc(pBuffer , TotalBytes);
pFile = fopen("e:\\warandpeace.txt" , "r+b");
fread(pBuffer , 1 , TotalBytes , pFile);
fclose(pFile);
char* pOutput = (char*)malloc(TotalBytes);
unsigned long BitCache = 0;
long BitIndex = 0;
Index = 0;
pFile = fopen("e:\\warandpeace.paq" , "w+b");
for(int x = 0;x<TotalBytes;x++){
if((pBuffer[x] > 64) && (pBuffer[x] < 128)){
BitCache *= 64;
BitCache = BitCache & 0xFFFFFFC4;
BitCache = BitCache | (pBuffer[x] & 63);
BitIndex += 6;
} else {
BitCache *= 64;
BitCache = BitCache & 0xFFFFFFC4;
BitIndex += 6;
BitCache *= 256;
BitCache = BitCache & 0xFFFFFF00;
BitCache = BitCache | pBuffer[x];
BitIndex += 8;
}
while(BitIndex > 7){
pOutput[Index] = BitCache & 0xFF;
BitCache /= 256;
BitCache = BitCache & 0x00FFFFFF;
BitIndex -= 8;
Index++;
}
}
while(BitIndex > 0){
pOutput[Index] = BitCache & 0xFF;
BitCache /= 256;
BitCache = BitCache & 0x00FFFFFF;
BitIndex -= 8;
Index++;
}
fwrite(&TotalBytes , sizeof(unsigned long) , 1 , pFile);
fwrite(&Index , sizeof(unsigned long) , 1 , pFile);
fwrite(pOutput , 1 , Index , pFile);
fclose(pFile);
return 0;
}
__________________ He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet Last edited by abachler; 09-25-2009 at 11:41 AM. |
| abachler is offline | |
| | #13 |
| Registered User Join Date: Dec 2006
Posts: 1,780
| And of course, this is a "This is my contest. Take it or leave it. No comments or suggestions accepted nor welcomed" contest. |
| cyberfish is offline | |
| | #14 |
| +++ OK NO CARRIER Join Date: Oct 2001
Posts: 10,262
| I've always wanted to make one that used a RNG to compress. The file would basically be just a set of 265 number pairs. The seed, and the skip. The seed would be the seed you need to use for the RNG to produce the required set, and the skip would be how far down that seed you need to skip before you can start using it. And also, the final size of the output file in bytes. It would basically look like this: Code: type seedset[2][265];
for each of 256
read pairs into seedset
for each of 256
seed( seedset[0][x] )
skip( seedset[1][x] ) // basically a loop to eat rand calls
output x (where x is the current number 0 to 255 ) at rand()
// say we're on loop 'a' ... here we skip, then we start outputting 'a' at rand()
// position from where we last put an 'a' (or from byte 0 in the case of the first call)
It's still a fun idea. Quzah.
__________________ Hundreds of thousands of dipshits can't be wrong. Are you up for the suck? Last edited by quzah; 09-28-2009 at 03:38 PM. |
| quzah is offline | |
| | #15 | |
| Rampaging 35 Stone Welsh Join Date: Apr 2007
Posts: 2,929
| Quote:
__________________ He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet | |
| abachler is offline | |
![]() |
| Thread Tools | |
| Display Modes | |
|
Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Detecting Edit Control Text | yoonkwun | Windows Programming | 2 | 07-12-2009 08:48 AM |
| DirectX | Drawing text | gavra | Game Programming | 4 | 06-08-2009 12:23 AM |
| Small HTML question | Thantos | Tech Board | 4 | 12-29-2003 12:37 AM |
| WANTED: Contest Master | kermi3 | A Brief History of Cprogramming.com | 15 | 01-23-2003 10:15 PM |
| Ok, Structs, I need help I am not familiar with them | incognito | C++ Programming | 7 | 06-29-2002 09:45 PM |