Code:
D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>dir test.txt
10/10/2012 08:53 PM 10 test.txt
D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>type test.txt
Lo! Hello!
D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>dir test.txt/b>test.lst
D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>Leprechaun_BB002hex.exe test.lst test.wrd 3 y
Leprechaun_BBhex (Fast-In-Future Greedy Building-Block-Ripper), subrev. A, BB = 2.
Leprechaun_singleton (Fast-In-Future Greedy n-gram-Ripper), rev. 15FIXFIX, written by Svalqyatchx.
Purpose: Rips all distinct 1-grams (1-word phrases) with length 1..31 chars from incoming texts.
Feature1: All words within x-lets/n-grams are in range 1..31 chars inclusive.
Feature2: In this revision 128MB 1-way hash is used which results in 16,777,216 external B-Trees of order 3.
Feature3: In this revision 1 pass is to be made.
Feature4: If the external memory has latency 99+microseconds then !(look no further), IOPS(seek-time) rules.
Pass #1 of 1:
Size of input file with files for Leprechauning: 10
Allocating HASH memory 134,217,793 bytes ... OK
Allocating memory 1MB ... OK
Size of Input TEXTual file: 10
/; 00,000,009P/s; Phrase count: 9 of them 8 distinct; Done: 64/64
Bytes per second performance: 10B/s
Phrases per second performance: 9P/s
Time for putting phrases into trees: 1 second(s)
Flushing UNsorted phrases: 100%; Shaking trees performance: 00,000,016P/s
Time for shaking phrases from trees: 1 second(s)
Leprechaun: Current pass done.
Total memory needed for one pass: 1KB
Total distinct phrases: 8
Total time: 1 second(s)
Total performance: 9P/s i.e. phrases per second
Leprechaun: Done.
D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>type test.wrd
6F21
656C
2120
4865
2048
4C6F
6C6C
6C6F
D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>
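To make the dump above easier to follow, here is a minimal Python sketch of what "overlapping 2-byte building blocks" means for the 10-byte test.txt. It is only an illustration, not how Leprechaun itself works internally (the tool uses a hash plus B-trees); the function name `building_blocks` is made up for this example. The order of the printed blocks follows first occurrence in the text, so it will differ from the order in test.wrd.

```python
from collections import Counter

def building_blocks(data: bytes, size: int = 2) -> Counter:
    """Count every overlapping `size`-byte window, keyed by uppercase hex."""
    return Counter(
        data[i:i + size].hex().upper()
        for i in range(len(data) - size + 1)
    )

# Same 10 bytes as test.txt in the dump above.
counts = building_blocks(b"Lo! Hello!")

print(sum(counts.values()), "blocks,", len(counts), "distinct")
for block in counts:
    print(block)
```

A 10-byte input has 10 - 2 + 1 = 9 overlapping windows, and "o!" (6F21) occurs twice, which is why the tool reports 9 phrases of which 8 are distinct.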
Here is another dump showing what I am talking about:
Code:
E:\Leprechaun_BBhex_rev15fixfix_subrevA>RUNME_dump_all_BB_2chars_long_with_OVERLAPPING.bat
E:\Leprechaun_BBhex_rev15fixfix_subrevA>Leprechaun_BB002hex.exe OSHO.LST OSHO_BB002.txt 3000 Y
Leprechaun_BBhex (Fast-In-Future Greedy Building-Block-Ripper), subrev. A, BB = 2.
Leprechaun_singleton (Fast-In-Future Greedy n-gram-Ripper), rev. 15FIXFIX, written by Svalqyatchx.
Purpose: Rips all distinct 1-grams (1-word phrases) with length 1..31 chars from incoming texts.
Feature1: All words within x-lets/n-grams are in range 1..31 chars inclusive.
Feature2: In this revision 128MB 1-way hash is used which results in 16,777,216 external B-Trees of order 3.
Feature3: In this revision 1 pass is to be made.
Feature4: If the external memory has latency 99+microseconds then !(look no further), IOPS(seek-time) rules.
Pass #1 of 1:
Size of input file with files for Leprechauning: 10
Allocating HASH memory 134,217,793 bytes ... OK
Allocating memory 3MB ... OK
Size of Input TEXTual file: 206,908,949
-; 09,852,807P/s; Phrase count: 206,908,948 of them 4,424 distinct; Done: 64/64
Bytes per second performance: 9,852,807B/s
Phrases per second performance: 9,852,807P/s
Time for putting phrases into trees: 21 second(s)
Flushing UNsorted phrases: 100%; Shaking trees performance: 00,008,848P/s
Time for shaking phrases from trees: 1 second(s)
Leprechaun: Current pass done.
Total memory needed for one pass: 180KB
Total distinct phrases: 4,424
Total time: 22 second(s)
Total performance: 9,404,952P/s i.e. phrases per second
Leprechaun: Done.
E:\Leprechaun_BBhex_rev15fixfix_subrevA>sort OSHO_BB002.txt /R /O OSHO_BB002S.txt
E:\Leprechaun_BBhex_rev15fixfix_subrevA>type OSHO_BB002S.txt | more
9,999,999 2020
5,237,360 6520
4,089,535 2074
3,507,835 7468
3,279,359 6865
3,067,788 7320
2,965,735 2061
2,940,029 7420
2,459,508 0D0A
2,433,671 696E
2,415,492 2069
2,147,895 0A20
1,987,797 616E
1,980,114 6F75
1,947,282 6E20
1,925,315 6420
1,904,822 6572
1,889,210 6973
1,882,860 7265
-- More --
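The hex pairs in the sorted listing are easier to read once turned back into characters. The sketch below assumes each line is a comma-grouped count followed by whitespace and a hex block, as it appears in the listing above; the helper names `parse_line` and `decode_block` are hypothetical and are not part of Leprechaun.

```python
def parse_line(line: str) -> tuple[int, str]:
    """Split a 'count hexblock' line into (count, hexblock)."""
    count_text, hex_block = line.split()
    return int(count_text.replace(",", "")), hex_block

def decode_block(hex_block: str) -> str:
    """Show printable ASCII as-is and everything else as \\xNN."""
    raw = bytes.fromhex(hex_block)
    return "".join(
        chr(b) if 32 <= b < 127 else f"\\x{b:02X}"
        for b in raw
    )

# A few lines copied from the sorted listing above.
sample = [
    "5,237,360 6520",
    "3,507,835 7468",
    "2,459,508 0D0A",
]

for line in sample:
    count, block = parse_line(line)
    print(f"{count:>10,}  {block}  [{decode_block(block)}]")
```

Read this way, 6520 is "e" followed by a space, 7468 is "th", and 0D0A is the CR LF line ending.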
Enjoy!