Thread: Leprechaun_BBhex - Superfast Building-Block Ripper

  1. #1
    Registered User Sanmayce's Avatar
    Join Date
    Oct 2012
    Location
    Bulgaria, Sofia
    Posts
    3

    Leprechaun_BBhex - Superfast Building-Block Ripper

    Hi to all C programmers,
    the problem of counting/dumping all distinct words from incoming files comes up often;
    here I want to share my experience in ripping/extracting BBs (Building-Blocks), which is quite similar.

    Regardless of my dirty C style here comes Leprechaun_BBhex.c within the freely downloadable archive:

    This console tool works fine as both 32-bit and 64-bit code, under both Linux and Windows.

    Let's rip the following 10-byte filelet (he-he) into overlapping chunks of size 2:
    Lo! Hello!
    Code:
    Lo
    o!
    ! 
     H
    He
    el
    ll
    lo
    o!
    
    Since BBs are unique/distinct and we have 'o!' occurring twice the final dump is:
    Code:
    Lo
    o!
    ! 
     H
    He
    el
    ll
    lo
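    The overlapping rip shown above can be sketched in a few lines of C. This is only a minimal illustration for BB = 2, not the code from Leprechaun_BBhex.c: the names rip_bb2 and dump_bb2_hex are hypothetical, and a plain 65,536-entry lookup table stands in for the hash and B-trees that the tool's banner mentions. Blocks come out in order of first appearance, which need not match the order of the tool's unsorted dump.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from Leprechaun_BBhex.c): collects the distinct
   overlapping 2-byte blocks of buf in order of first appearance.
   out must have room for 65536 entries. Returns the number of distinct blocks. */
size_t rip_bb2(const unsigned char *buf, size_t len, unsigned short *out)
{
    static unsigned char seen[65536];   /* one flag per possible 2-byte block */
    size_t n = 0;

    memset(seen, 0, sizeof seen);
    for (size_t i = 0; i + 1 < len; i++) {
        /* Pack the two bytes so that printing with %04X gives first byte first. */
        unsigned key = ((unsigned)buf[i] << 8) | buf[i + 1];
        if (!seen[key]) {
            seen[key] = 1;
            out[n++] = (unsigned short)key;
        }
    }
    return n;
}

/* Prints each block as four uppercase hex digits, one block per line. */
void dump_bb2_hex(const unsigned short *bb, size_t n, FILE *fp)
{
    for (size_t i = 0; i < n; i++)
        fprintf(fp, "%04X\n", (unsigned)bb[i]);
}
```

    Calling rip_bb2 on the 10 bytes of "Lo! Hello!" visits 9 overlapping blocks and keeps 8 of them, because 'o!' is seen twice.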
    Or given in HEX format:
    Code:
    D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>dir test.txt
    10/10/2012  08:53 PM                10 test.txt
    D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>type test.txt
    Lo! Hello!
    D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>dir test.txt/b>test.lst
    D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>Leprechaun_BB002hex.exe test.lst test.wrd 3 y
    Leprechaun_BBhex (Fast-In-Future Greedy Building-Block-Ripper), subrev. A, BB = 2.
    Leprechaun_singleton (Fast-In-Future Greedy n-gram-Ripper), rev. 15FIXFIX, written by Svalqyatchx.
    Purpose: Rips all distinct 1-grams (1-word phrases) with length 1..31 chars from incoming texts.
    Feature1: All words within x-lets/n-grams are in range 1..31 chars inclusive.
    Feature2: In this revision 128MB 1-way hash is used which results in 16,777,216 external B-Trees of order 3.
    Feature3: In this revision 1 pass is to be made.
    Feature4: If the external memory has latency 99+microseconds then !(look no further), IOPS(seek-time) rules.
    Pass #1 of 1:
    Size of input file with files for Leprechauning: 10
    Allocating HASH memory 134,217,793 bytes ... OK
    Allocating memory 1MB ... OK
    Size of Input TEXTual file: 10
    /; 00,000,009P/s; Phrase count: 9 of them 8 distinct; Done: 64/64
    Bytes per second performance: 10B/s
    Phrases per second performance: 9P/s
    Time for putting phrases into trees: 1 second(s)
    Flushing UNsorted phrases: 100%; Shaking trees performance: 00,000,016P/s
    Time for shaking phrases from trees: 1 second(s)
    Leprechaun: Current pass done.
    Total memory needed for one pass: 1KB
    Total distinct phrases: 8
    Total time: 1 second(s)
    Total performance: 9P/s i.e. phrases per second
    Leprechaun: Done.
    D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>type test.wrd
    6F21
    656C
    2120
    4865
    2048
    4C6F
    6C6C
    6C6F
    D:\Leprechaun_BBhex_rev15fixfix_subrevA_OSHO>
    Another dump showing what I am talking about:
    Code:
    E:\Leprechaun_BBhex_rev15fixfix_subrevA>RUNME_dump_all_BB_2chars_long_with_OVERLAPPING.bat
    E:\Leprechaun_BBhex_rev15fixfix_subrevA>Leprechaun_BB002hex.exe OSHO.LST OSHO_BB002.txt 3000 Y
    Leprechaun_BBhex (Fast-In-Future Greedy Building-Block-Ripper), subrev. A, BB = 2.
    Leprechaun_singleton (Fast-In-Future Greedy n-gram-Ripper), rev. 15FIXFIX, written by Svalqyatchx.
    Purpose: Rips all distinct 1-grams (1-word phrases) with length 1..31 chars from incoming texts.
    Feature1: All words within x-lets/n-grams are in range 1..31 chars inclusive.
    Feature2: In this revision 128MB 1-way hash is used which results in 16,777,216 external B-Trees of order 3.
    Feature3: In this revision 1 pass is to be made.
    Feature4: If the external memory has latency 99+microseconds then !(look no further), IOPS(seek-time) rules.
    Pass #1 of 1:
    Size of input file with files for Leprechauning: 10
    Allocating HASH memory 134,217,793 bytes ... OK
    Allocating memory 3MB ... OK
    Size of Input TEXTual file: 206,908,949
    -; 09,852,807P/s; Phrase count: 206,908,948 of them 4,424 distinct; Done: 64/64
    Bytes per second performance: 9,852,807B/s
    Phrases per second performance: 9,852,807P/s
    Time for putting phrases into trees: 21 second(s)
    Flushing UNsorted phrases: 100%; Shaking trees performance: 00,008,848P/s
    Time for shaking phrases from trees: 1 second(s)
    Leprechaun: Current pass done.
    Total memory needed for one pass: 180KB
    Total distinct phrases: 4,424
    Total time: 22 second(s)
    Total performance: 9,404,952P/s i.e. phrases per second
    Leprechaun: Done.
    E:\Leprechaun_BBhex_rev15fixfix_subrevA>sort OSHO_BB002.txt /R /O OSHO_BB002S.txt
    E:\Leprechaun_BBhex_rev15fixfix_subrevA>type OSHO_BB002S.txt  | more
    9,999,999       2020
    5,237,360       6520
    4,089,535       2074
    3,507,835       7468
    3,279,359       6865
    3,067,788       7320
    2,965,735       2061
    2,940,029       7420
    2,459,508       0D0A
    2,433,671       696E
    2,415,492       2069
    2,147,895       0A20
    1,987,797       616E
    1,980,114       6F75
    1,947,282       6E20
    1,925,315       6420
    1,904,822       6572
    1,889,210       6973
    1,882,860       7265
    -- More  --
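    The sorted listing above pairs each 2-byte block with a number, most frequent first. A rough sketch of producing such a ranking in C follows. It is an assumption about what the listing represents, not the tool's actual method: count_bb2, bb2_count and by_count_desc are hypothetical names, and a direct 65,536-slot counter array replaces the tool's hash and trees.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: one distinct 2-byte block and how often it occurred. */
typedef struct {
    unsigned key;          /* the two bytes packed as (first << 8) | second */
    unsigned long count;   /* number of overlapping occurrences */
} bb2_count;

/* qsort comparator: higher counts first, ties broken by ascending key. */
static int by_count_desc(const void *a, const void *b)
{
    const bb2_count *x = a;
    const bb2_count *y = b;

    if (x->count != y->count)
        return x->count < y->count ? 1 : -1;
    return (x->key > y->key) - (x->key < y->key);
}

/* Counts every overlapping 2-byte block in buf and writes the distinct ones
   to out, sorted by descending count. out must have room for 65536 entries.
   Returns the number of distinct blocks. */
size_t count_bb2(const unsigned char *buf, size_t len, bb2_count *out)
{
    static unsigned long counts[65536];
    size_t n = 0;

    for (size_t k = 0; k < 65536; k++)
        counts[k] = 0;
    for (size_t i = 0; i + 1 < len; i++)
        counts[((unsigned)buf[i] << 8) | buf[i + 1]]++;
    for (size_t k = 0; k < 65536; k++) {
        if (counts[k] != 0) {
            out[n].key = (unsigned)k;
            out[n].count = counts[k];
            n++;
        }
    }
    qsort(out, n, sizeof *out, by_count_desc);
    return n;
}
```

    For the small "Lo! Hello!" example, the first entry would be 6F21 ('o!') with a count of 2, followed by the seven blocks that occur once.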
    Enjoy!
    Last edited by webmaster; 10-13-2012 at 05:25 PM. Reason: redacted link--Sanmayce, you are welcome to repost a link to your code without osho.txt

  2. #2
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Included in your linked 7z file (in fact, the vast majority of it) is a book copyrighted by OSHO International Foundation.

    From Osho International Foundation Copyright Information
    You may not copy, reproduce, sell, distribute, publish, display, perform, modify, create derivative works, transmit, or in any way exploit any material, works or intellectual property owned by the OSHO International Foundation without explicit permission.
    Do you have permission to distribute this work?
    The cost of software maintenance increases with the square of the programmer's creativity. - Robert D. Bliss

  3. #3
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    I'm not even sure what purpose this serves. What is this?

    It seems like the dictionary generation side of a strict compression algorithm, but then, without the actual compression I'm not sure it has value.

    [Edit]
    Copyleft Sanmayce, 2012-Oct-10
    O_o

    That expression has no legal significance in any jurisdiction of which I'm aware.

    In almost every jurisdiction I'm aware of, a work is copyrighted by virtue of its creation. In other words, the work you've posted is almost certainly copyrighted and bears no legal license, so anyone downloading it has no actual right to convey or transform the source or binary.

    You may as well have said "Great Googly Moogly - Cthulhu, 3154 XE" for all the good that line does you for a license.

    If you want to apply the "GPL" or another license to your code you need to follow the instructions for the relevant license; such instructions aren't there because the lawyers who draft such licenses enjoy playing games; such instructions are relevant and have legal significance.

    I'm not trying to be a jerk. And no, I'm not a lawyer. And no, this isn't legal advice. (Which is fun because several courts in the United States have ruled such a disclaimer invalid in certain circumstances.) I'm just saying, if you want people to be able to use this under a "copyleft" license you need to apply such a license in such a way that it is legal.
    [/Edit]

    Soma

  4. #4
    Registered User Sanmayce's Avatar
    Join Date
    Oct 2012
    Location
    Bulgaria, Sofia
    Posts
    3
    Ha, instead of thanks here come pretentious accusations.

    @administrators
    Long story short:
    Feel free to delete my "criminal" post, but before doing so please first read the following links, so that you cannot later say "I didn't know".

    @phantomotap
    Hi Soma,
    thanks for the license lecture, but who told you that I want a copyleft license, I am not a programmer just a program-mess-er and I don't care
    under what license my code goes as long as it is freely accessible to ALL with source.
    >That expression has no legal significance in any jurisdiction of which I'm aware.
    Let me ask you, who cares what you are aware of?

    >It seems like the dictionary generation side of a strict compression algorithm, but then, without the actual compression I'm not sure it has value.
    Wow, this is deep.

    @oogabooga
    >Do you have permission to distribute this work?
    I don't need one and by the nature of OSHO legacy YES I do.

    Everyone can prefix/postfix a work with some lines like Copyright/Permissions/..., but this is as if making the work one's property - I don't
    know how well you see this matter but COPYRIGHTING a life-long work of one spiritual teacher is not just a CRIME in sense of our worldly
    strife for money but it is a crime against freedom.

    Now let me ask you in return who are you to ask, as far as I see you are an anonymous guy who is a fighter for justice?!

    Three months ago I exchanged some thoughts with a devotee from the OSHO ashram in Pune:
    Hartmut Balke,
    Pune
    India
    wrote to me:
    [
    ...
    Here in Pune we are having a court case with the OshoAshram Management. Some two, three people think it is their source of income and are
    privately making millions of it.
    See www.oshowork.org for more information. - Copyright is my topic in this case, it will come up after this case is done.
    You and me, we both love freedom of expression and freedom of sharing. Truth will prevail.
    Wish you all the best and thanks again
    ]

    My letter to [email protected]:
    [
    I write this as a hope to strengthen SANNYASINS (www.oshowork.org in particular) positions in the court.

    "My interest is my people, who are with me ..."
    Osho

    After reading the FAQ I have been hit by this:
    Q: Why you are spreading negativity?
    A: ... Raising your voice for truth is not spreading negativity.

    Where is the negativity in wanting to enliven (again) a place of joy?!
    Personally I was accused countless times of such SLANDERS, the need for truth is called 'negativity' by guess who - the moneymakers,
    powerlovers and hypocrites.

    There is no higher religion than truth, right.

    Georgi 'Sanmayce'
    Sofia, Bulgaria
    ]

    Osho; an Open Wave:
    Save Osho Samadhi Place Pune Support

    Here the right position is given ABSOLUTELY in a clear way:
    Although we have many areas of difference, we have listed the six main fundamental objectives that need to be resolved. This resolution should
    be made between ourselves and the administrators, including Inner Circle members, trustees and persons responsible from the following
    organizations: Osho International Foundation, Neo Sannyas Foundation and the directors/members of Osho Multimedia & Resorts Private Limited
    and Osho Media International, etc.
    It is to be clearly understood by the said trustees, managers and administrators that they are not the ‘owners’, but rather the ‘custodians’ of Osho’s legacy. This legacy includes: immovable properties; movable properties; intellectual property rights, including their digital
    versions; and the entire infrastructure supporting the same. These were created for the well-being of the beneficiaries of Osho legacy.

    Also the court case updates:
    OshoWorks.org
    India must protect Osho’s Samadhi - Analysis - DNA
    The above article says:
    After Osho died in Pune on Jan 19, 1990, he was cremated, and as per his wishes, his ashes were interred in his bedroom at his residence ‘Lao
    Tzu’ on the Osho Commune premises.
    This marble floored and marble-walled bedroom has a majestic circular chandelier and is recognised as Osho’s samadhi where his followers were
    allowed to meditate silently.
    However, over the last two decades, the Osho International Foundation (OIF), which controls all of Osho’s properties, has been taking gradual
    steps to erase the significance of the samadhi for his followers.
    ...
    It is always the disciples of the master who betray him in the end. Is something similar happening in the case of Osho?
    ...
    The problem is that we Indians are too tolerant and too slow to act and react — whether it is in regard to corruption or something else. But
    there’s no easy way out. The British or the Americans won’t fight your battles; you have to fight your own...and win.

    Enjoy!
    Last edited by Sanmayce; 10-10-2012 at 03:42 PM.

  5. #5
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    instead of thanks here come pretentious accusations
    About what are we to offer our thanks?

    The facilities this tool offers, as I understand them, can be written with a single line of Perl.

    who told you that I want a copyleft license
    O_o

    Whoever wrote the code told me that.

    You see, the code told me that. The code that told me that which is in the archive you are distributing. The code in the archive which you claim to have written.

    So, if you wrote the code you are the one who told me that.

    Soma

  6. #6
    Registered User Sanmayce's Avatar
    Join Date
    Oct 2012
    Location
    Bulgaria, Sofia
    Posts
    3
    >About what are we to offer our thanks?
    Who 'we'? Are you speaking on behalf of others?

    >The facilities this tool offers, as I understand them, can be written with a single line of Perl.
    This is even deeper, please enlighten me (not "us") what line this is.

    >So, if you wrote the code you are the one who told me that.
    Yes, I agree but I didn't know that "copyleft" is copyrighted in such a manner.

  7. #7
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    You're the one acting like a pretentious idiot. I simply asked a question, and you answered it.

    Your answer, after a bunch of weasel words, is NO, you do not have permission to distribute this bunch of meaningless garbage. Just because you are some kind of pseudo-religious nut doesn't mean you're allowed to break the law.

    The administrators have no choice but to delete your "criminal" post.

    BTW, I like the picture of the angry leprechaun you use as an avatar. Hilarious!

  8. #8
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Yes, I agree but I didn't know that "copyleft" is copyrighted in such a manner.
    O_o

    Teaching you some small bit of the issues was the point of my edit.

    Perhaps if you were not so arrogant you would have understood that; I would have happily pointed you in the right direction so you could make sure that your work here would be "freely accessible to ALL with source" and have some ability to guarantee it legally.

    As it is, a lot of programmers wouldn't touch it even if something valuable could be learned from it because we live in a world where copyright is usually enforced.

    Soma

  9. #9
    Administrator webmaster's Avatar
    Join Date
    Aug 2001
    Posts
    1,012
    Guys, if you're going to continue this discussion, please do so without insulting each other, or I'll have to close the thread.

    Sanmayce, please re-upload the zip file without the osho.txt file in question so we can keep the link up. Copyright issues aside, your code sample does not need to come with a 200 MB txt file. It makes the download much larger than it has to be. The user can provide their own sample input.

  10. #10
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    I know you like to go with a gentle hand, but I really think you should have removed the link.

    If he does post a version without the text he could have added an alternative link.

    Soma

  11. #11
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    Any copyright violations are against rule 6 for the forum

    Any "Solicitation of any product without the consent of the Administration is forbidden" (Rule 7)
    Fact - Beethoven wrote his first symphony in C

    Last Post: 09-27-2001, 09:00 PM