C Board  

Go Back   C Board > Community Boards > Contests Board

Reply
 
LinkBack Thread Tools Display Modes
Old 09-23-2009, 12:30 AM   #1
Rampaging 35 Stone Welsh
 
abachler's Avatar
 
Join Date: Apr 2007
Posts: 2,929
Text Compression Contest

OK, this is similar to sebastiani's contest, except there are no rules on what type of compression you use. The target is a txt file in ascii format.

Best compression within a reasonable time limit wins.

Implement your compression as a function that accepts a pointer to a zero terminated string consisting of the entire text of War and Peace as input and a dereferenced pointer (pointer to a pointer) where it should place the output. The function must also return the number of bytes used in the output. Below is an example declaration -

Code:
unsigned long abachler_compress(char* Input , unsigned char** Ouput);
Implement your decompressor as a function that takes a pointer to an array containing the compressed data, the size of the array, and returns a pointer to the decompressed output, which must be zero terminated. An example is below -
Code:
char* abachler_decompress(unsigned char* Input , unsigned long InputSize);
Rules -
Must be written in C++, C code is fine as long as it compiles as C++.
No use of global variables.
No use of the file system.
All code must be yours, no use of 3rd party compression libraries or API calls.
If your function crashes the application for any reason you lose.
Inline assembly is fine as long as it does not jump outside the assembly block.
Runtime is limited to 1 hour maximum on a Pentium 4 3.2 GHz with HT and 2GB ram.
You may use threads.
All submissions must be made by 23:59CST (-6 GMT) on 9/30/2009

The following is the main() function that will test the application I highly suggest you use this exact code for testing as no modifications will be made to accommodate anyones code. -

Code:
#include <malloc.h>
#include <stdio.h>

#define SIZE_OF_WAR_AND_PEACE 0 // will be actual size of the file

unsigned long abachler_compress(char* Input , unsigned char** Ouput);
char* abachler_decompress(unsigned char* Input , unsigned long InputSize);

int main(){

	FILE* pFile = fopen("warandpeace.txt" , "r+b");

	char* pRandomVariableName = (char*)malloc(SIZE_OF_WAR_AND_PEACE);
	fread(pRandomVariableName , 1 , SIZE_OF_WAR_AND_PEACE , pFile);
	fclose(pFile);
	
	unsigned char* pCompressed;
	unsigned long DataSize = abachler_compress(pRandomVariableName , &pCompressed);

	// compressed data may be copied to a new array at this point

	char* pDecompressed = abachler_decompress(pCompressed , DataSize);
		
	return 0;
	}
__________________
He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet

Last edited by abachler; 09-23-2009 at 12:36 AM.
abachler is offline   Reply With Quote
Old 09-23-2009, 12:44 AM   #2
C++ Witch
 
laserlight's Avatar
 
Join Date: Oct 2003
Location: Singapore
Posts: 10,368
Quote:
Originally Posted by abachler
All code must be yours, no use of 3rd party compression libraries or API calls.
You might want to clarify if use of the standard library is allowed, and if use of libraries unrelated to compression (e.g., a library for threads) is allowed as long as the relevant library files are provided (and presumably the application is rejected if a reasonable attempt to compile it fails).

Quote:
Originally Posted by abachler
The following is the main() function that will test the application I highly suggest you use this exact code for testing as no modifications will be made to accommodate anyones code.
I suggest that you change <malloc.h> to <stdlib.h> since <malloc.h> is non-standard/pre-standard. The Comeau online compiler is an example of a rather standard conformant C++ compiler that does not accept your program due to a missing header file.
__________________
C + C++ Compiler: MinGW port of GCC
Build + Version Control System: SCons + Bazaar

Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
laserlight is offline   Reply With Quote
Old 09-23-2009, 06:11 AM   #3
Registered User
 
Join Date: Sep 2009
Posts: 63
Very interesting. Even if I don't submit anything, I do think this will be a neat exercise.
Zach_the_Lizard is offline   Reply With Quote
Old 09-23-2009, 10:25 AM   #4
Hat seller extraordinaire
 
Join Date: Apr 2008
Posts: 159
Quote:
Originally Posted by abachler View Post
no modifications will be made to accommodate anyones code.
Well, with the leak intact...

Code:
unsigned int compress(char* text, char** compressed)
{
    *compressed = text;
    return sizeof(*compressed);
}

char* decompress(char* compressed, unsigned int /*len*/)
{
    return compressed;
}
adeyblue is offline   Reply With Quote
Old 09-23-2009, 11:49 PM   #5
Rampaging 35 Stone Welsh
 
abachler's Avatar
 
Join Date: Apr 2007
Posts: 2,929
Quote:
Originally Posted by adeyblue View Post
Well, with the leak intact...
There's no leak, windows implicitly frees the memory when the process terminates.

While your entry is technically valid, I'm not even going to bother running it since it will lose to any non-smartass entry.
It also fails to decompress the data properly. The last time i checked War and Peace was longer than 4 bytes.
Consider yourself disqualified.

Quote:
Originally Posted by laserlight View Post
I suggest that you change <malloc.h> to <stdlib.h> since <malloc.h> is non-standard/pre-standard. The Comeau online compiler is an example of a rather standard conformant C++ compiler that does not accept your program due to a missing header file.
malloc.h is present on my system, and since you are not allowed to modify main() in any case it is irrelevant. If you require specific headers you need to include them in your submission. You can use any library you want with the caveat that I will not be linking OBJ LIB or DLL files to the project, so you either have to include something that comes standard with Visual Studio 2008 Express or figure out a way to include the source code for the libraries in your submission.

I would suggest you focus on the contest and not trying to nitpick the rules.
__________________
He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet

Last edited by abachler; 09-24-2009 at 12:00 AM.
abachler is offline   Reply With Quote
Old 09-24-2009, 12:06 AM   #6
Senior software engineer
 
brewbuck's Avatar
 
Join Date: Mar 2007
Location: Portland, OR
Posts: 5,381
The rules have a gaping hole, in that they don't forbid the use of the actual text of War and Peace. I could embed a reference text in my program, and output the diff between that text and the input. If I got lucky and used exactly the same text you used (from Gutenberg, right?) I might have an output size of 0 bytes while remaining within the rules.

It's actually amazing that you selected that text, because it's the same text I use as a mid-size corpus for language processing experiments. I have a LOT of experience with that text, and I know its statistics and idiosyncracies
__________________
"Congratulations on your purchase. To begin using your quantum computer, set the power switch to both off and on simultaneously." -- raftpeople@slashdot
brewbuck is offline   Reply With Quote
Old 09-24-2009, 12:34 AM   #7
Rampaging 35 Stone Welsh
 
abachler's Avatar
 
Join Date: Apr 2007
Posts: 2,929
Quote:
Originally Posted by brewbuck View Post
The rules have a gaping hole, in that they don't forbid the use of the actual text of War and Peace. I could embed a reference text in my program, and output the diff between that text and the input. If I got lucky and used exactly the same text you used (from Gutenberg, right?) I might have an output size of 0 bytes while remaining within the rules.

It's actually amazing that you selected that text, because it's the same text I use as a mid-size corpus for language processing experiments. I have a LOT of experience with that text, and I know its statistics and idiosyncracies
The gaping hole runs smack into cboards upload limit of course...
so no gaping hole. I also didn't specify what language it is in.
__________________
He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet
abachler is offline   Reply With Quote
Old 09-24-2009, 11:15 PM   #8
Registered User
 
Join Date: Apr 2006
Location: United States
Posts: 3,202
Since it's an ASCII file I would hazard a guess it is English.
__________________
Os iusti meditabitur sapientiam
Et lingua eius loquetur indicium

"There is nothing either good or bad, but thinking makes it so." (Shakespeare, Hamlet, Act II scene ii)

http://www.myspace.com/whiteflags99
whiteflags is offline   Reply With Quote
Old 09-24-2009, 11:33 PM   #9
Registered User
 
Join Date: Dec 2006
Posts: 1,780
Quote:
Originally Posted by abachler View Post
The gaping hole runs smack into cboards upload limit of course...
so no gaping hole. I also didn't specify what language it is in.
I think it would be a good idea measure the size of BOTH the compressed text and the decompresser, like the calgary challenge. That would eliminate these kind of gaping holes.
cyberfish is offline   Reply With Quote
Old 09-25-2009, 01:34 AM   #10
Rampaging 35 Stone Welsh
 
abachler's Avatar
 
Join Date: Apr 2007
Posts: 2,929
Quote:
Originally Posted by whiteflags View Post
Since it's an ASCII file I would hazard a guess it is English.
Don't bet on it, it could be in Klingon for all you know.
__________________
He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet
abachler is offline   Reply With Quote
Old 09-25-2009, 01:57 AM   #11
C++ Witch
 
laserlight's Avatar
 
Join Date: Oct 2003
Location: Singapore
Posts: 10,368
Quote:
Originally Posted by abachler
malloc.h is present on my system, and since you are not allowed to modify main() in any case it is irrelevant.
It is relevant because you are going to test with that code and you "highly suggest (submitters) use this exact code for testing as no modifications will be made to accommodate anyones code" (emphasis mine). Of course, it is only a theoretical problem since one should be able to safely change <malloc.h> to <stdlib.h> without fear that the code will work on their system but not on yours.

Quote:
Originally Posted by abachler
I would suggest you focus on the contest and not trying to nitpick the rules.
Hey, I am just trying to help you improve your contest rules
__________________
C + C++ Compiler: MinGW port of GCC
Build + Version Control System: SCons + Bazaar

Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
laserlight is offline   Reply With Quote
Old 09-25-2009, 02:36 AM   #12
Rampaging 35 Stone Welsh
 
abachler's Avatar
 
Join Date: Apr 2007
Posts: 2,929
Quote:
Originally Posted by laserlight View Post
Hey, I am just trying to help you improve your contest rules
There's nothing wrong with the rules as they are. No professional compiler would be so anal as to NOT include malloc.h if only for compatibility with pre-standard code. If your implementation doesn't support malloc.h then too bad, it's not my problem. As I said, I highly SUGGEST, you are free to do whatever you want, if it fails to run it fails to run. But, as laserlight pointed out, it is unlikely that any implementation would run with stdlib and fail to run with malloc. Since malloc is used in the main function, not in your function, you are free to include whatever headers you desire, within the limits of the contest.

As a sign of good faith, I will provide a very simple compressor. Obviously this doesn't meet the requirements of the contest, as I rolle dthe whole thing into main. it also suffers from poor compression because of a known weakness, that is each letter gets compressed by 2 bits, but every non letter, specifically spaces, eat an extra 6 bits. Overall it saved only 60KB out fo a 3MB file. Like I said its very crude compression.

Code:
int main(){
	char* pBuffer = (char*)malloc(1048576);

	FILE* pFile = fopen("e:\\warandpeace.txt" , "r+b");
	long BytesRead = -1;
	unsigned long ByteCount[256];
	for(int x = 0;x<256;x++) ByteCount[x] = 0;
	unsigned long TotalBytes = 0;
	
	while(BytesRead){
		BytesRead = fread(pBuffer , 1 , 1048576 , pFile);
		if(BytesRead) for(int x=0;x<BytesRead;x++) ByteCount[pBuffer[x]]++;
		TotalBytes += BytesRead;
		}

	fclose(pFile);

	
	pFile = 0;
	pFile = fopen("e:\\ByteCount.txt" , "w+b");
	long Index = 0;
	for(int x = 0;x<256;x++){
		fprintf(pFile , "%2.2x - %d%c%c" , x , ByteCount[x] , 13 , 10);
		if(ByteCount[x]) Index++;
		}
	fprintf(pFile , "%c%cTotal Symbols Used %d%c%c" , 13 , 10 , Index , 13 , 10);
	fclose(pFile);

	pBuffer = (char*)realloc(pBuffer , TotalBytes);

	pFile = fopen("e:\\warandpeace.txt" , "r+b");
	fread(pBuffer , 1 , TotalBytes , pFile);
	fclose(pFile);

	char* pOutput = (char*)malloc(TotalBytes);
	unsigned long BitCache = 0;
	long BitIndex = 0;
	Index = 0;

	pFile = fopen("e:\\warandpeace.paq" , "w+b");
	for(int x = 0;x<TotalBytes;x++){
		if((pBuffer[x] > 64) && (pBuffer[x] < 128)){
			BitCache *= 64;
			BitCache = BitCache & 0xFFFFFFC4;
			BitCache = BitCache | (pBuffer[x] & 63);
			BitIndex += 6;
			} else {
			BitCache *= 64;
			BitCache = BitCache & 0xFFFFFFC4;
			BitIndex += 6;

			BitCache *= 256;
			BitCache = BitCache & 0xFFFFFF00;
			BitCache = BitCache | pBuffer[x];
			BitIndex += 8;
			}

		while(BitIndex > 7){
			pOutput[Index] = BitCache & 0xFF;
			BitCache /= 256;
			BitCache = BitCache & 0x00FFFFFF;
			BitIndex -= 8;
			Index++;
			}
		}

	while(BitIndex > 0){
		pOutput[Index] = BitCache & 0xFF;
		BitCache /= 256;
		BitCache = BitCache & 0x00FFFFFF;
		BitIndex -= 8;
		Index++;
		}
	
	fwrite(&TotalBytes , sizeof(unsigned long) , 1 , pFile);
	fwrite(&Index , sizeof(unsigned long) , 1 , pFile);
	fwrite(pOutput , 1 , Index , pFile);

	fclose(pFile);

	return 0;
	}
__________________
He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet

Last edited by abachler; 09-25-2009 at 11:41 AM.
abachler is offline   Reply With Quote
Old 09-26-2009, 02:05 PM   #13
Registered User
 
Join Date: Dec 2006
Posts: 1,780
And of course, this is a "This is my contest. Take it or leave it. No comments or suggestions accepted nor welcomed" contest.
cyberfish is offline   Reply With Quote
Old 09-28-2009, 03:32 PM   #14
+++ OK NO CARRIER
 
quzah's Avatar
 
Join Date: Oct 2001
Posts: 10,262
I've always wanted to make one that used a RNG to compress. The file would basically be just a set of 265 number pairs. The seed, and the skip. The seed would be the seed you need to use for the RNG to produce the required set, and the skip would be how far down that seed you need to skip before you can start using it. And also, the final size of the output file in bytes.

It would basically look like this:
Code:
type seedset[2][265];
for each of 256
    read pairs into seedset

for each of 256
    seed( seedset[0][x] )
    skip( seedset[1][x] ) // basically a loop to eat rand calls
    output x (where x is the current number 0 to 255 ) at rand()
    // say we're on loop 'a' ... here we skip, then we start outputting 'a' at rand()
    // position from where we last put an 'a' (or from byte 0 in the case of the first call)
In theory (monkeys and typewriters) it should be killer. You'd just need a quantum computer to calculate the 256 pairs you'd need to generate the correct output file.

It's still a fun idea.


Quzah.
__________________
Hundreds of thousands of dipshits can't be wrong.


Are you up for the suck?

Last edited by quzah; 09-28-2009 at 03:38 PM.
quzah is offline   Reply With Quote
Old 09-29-2009, 09:45 PM   #15
Rampaging 35 Stone Welsh
 
abachler's Avatar
 
Join Date: Apr 2007
Posts: 2,929
Quote:
Originally Posted by quzah View Post
I've always wanted to make one that used a RNG to compress. The file would basically be just a set of 265 number pairs. The seed, and the skip. The seed would be the seed you need to use for the RNG to produce the required set, and the skip would be how far down that seed you need to skip before you can start using it. And also, the final size of the output file in bytes.

It's still a fun idea.


Quzah.
So basically a bible code compressor using a an RNG with a run length of at least the factorial of 256^ length of the book. Yeah I'm thinking the seed would probably exceed the size of the original file. But it's a good idea anyway.
__________________
He is free, you say. Ah! That is his misfortune… These men… [have] the most terrible, the most imperious of masters, that is, need. … They must therefore find someone to hire them, or die of hunger. Is that to be free? - Simon Linguet
abachler is offline   Reply With Quote
Reply

Thread Tools
Display Modes

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Detecting Edit Control Text yoonkwun Windows Programming 2 07-12-2009 08:48 AM
DirectX | Drawing text gavra Game Programming 4 06-08-2009 12:23 AM
Small HTML question Thantos Tech Board 4 12-29-2003 12:37 AM
WANTED: Contest Master kermi3 A Brief History of Cprogramming.com 15 01-23-2003 10:15 PM
Ok, Structs, I need help I am not familiar with them incognito C++ Programming 7 06-29-2002 09:45 PM


All times are GMT -6. The time now is 02:12 PM.


Powered by vBulletin® Version 3.8.1
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.3.0 RC2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22