Thread: The core of the core of the big data solutions -- Map

  1. #1
    Registered User
    Join Date
    Mar 2015
    Posts
    2

    The core of the core of the big data solutions -- Map

    Title: The core of the core of the big data solutions -- Map
    Author: pengwenwei
    Email: [email protected]
    Language: c++
    Platform: Windows, linux
    Technology: Perfect hash algorithm
    Level: Advanced
    Description: Map algorithm with high performance
    Section: MFC c++ map stl
    SubSection: c++ algorithm
    License: (GPLv3)

    Download demo project - 1070 Kb
    Download source - 1070 Kb

    Introduction:
    In C++ programs, map is used everywhere, and the bottleneck of program performance is often the performance of the map. This is especially true with large data sets, when the business logic is closely coupled and the data cannot be distributed and processed in parallel. So the performance of the map becomes the key technology.

    In my work in the telecommunications industry and the information security industry, I dealt with large amounts of low-level data, especially the very complex data of the information security industry, and none of it could be handled without a map.

    For example: IP tables, MAC tables, telephone number lists, domain name resolution tables, ID number lookups, and the Trojan and virus signatures used in cloud-based virus scanning.

    The map in the STL library is based on binary search (it is a balanced tree), and it has the worst performance. Google's hash map currently has the best performance and memory use, but it has a probability of repeated collisions. Big data applications today rarely use a map with a collision probability; anything relating to fees, especially, can't be wrong.

    Now I am putting my algorithms out here. There are three kinds of map, and after the build each one is a hash map. We can test the comparison: my algorithm has zero probability of collision, yet its performance is also better than the hash algorithm, and even its ordinary performance is not much different from Google's.
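    A comparison of this kind could be set up roughly as follows. This is only a minimal sketch using the two standard containers, std::map and std::unordered_map; it does not include pwwHashMap or Google's hash map, and the names make_keys, build_table, time_lookups and LookupResult are made up for illustration. The keys are synthetic 32-bit integers standing in for something like IP addresses.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

// Generate n distinct pseudo-random 32-bit keys with xorshift32.
// (Hypothetical helper; the keys stand in for e.g. IPv4 addresses.)
std::vector<std::uint32_t> make_keys(std::size_t n) {
    std::vector<std::uint32_t> keys;
    keys.reserve(n);
    std::uint32_t x = 2463534242u;  // arbitrary non-zero starting state
    for (std::size_t i = 0; i < n; ++i) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        keys.push_back(x);
    }
    return keys;
}

// Fill any map-like container with key -> position in the key list.
template <class Map>
Map build_table(const std::vector<std::uint32_t>& keys) {
    Map table;
    for (std::size_t i = 0; i < keys.size(); ++i) table[keys[i]] = i;
    assert(table.size() == keys.size());  // the keys are expected to be distinct
    return table;
}

struct LookupResult {
    std::size_t hits;        // how many keys were found
    long long microseconds;  // wall-clock time for all lookups
};

// Look up every key once and measure the total elapsed time.
template <class Map>
LookupResult time_lookups(const Map& table, const std::vector<std::uint32_t>& keys) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (std::uint32_t key : keys) hits += table.count(key);
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    return LookupResult{hits, static_cast<long long>(elapsed.count())};
}
```

    Building one table with build_table<std::map<std::uint32_t, std::size_t>>(keys) and another with build_table<std::unordered_map<std::uint32_t, std::size_t>>(keys), then passing each to time_lookups, gives two timings to compare. The numbers depend heavily on the compiler, the optimization flags, the key distribution and the table size, so any third container would need to be measured under the same conditions.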

    My algorithm is a perfect hash algorithm. Its key index and the principle of its compression algorithm are out of the ordinary; most importantly, it uses a completely different structure, so the key index compression is fundamentally different. The most direct benefit for a program is that a solution that needed ten servers with the original map now needs only one server.
    Declaration: the code may not be used for commercial purposes. For commercial applications, you can contact me on QQ 75293192.
    Download:
    https://sourceforge.net/projects/pwwhashmap/files

    Applications:
    First, modern warfare can't do without mass information queries. If a query for enemy target information slows down by a second, it could mean a missed opportunity in combat, leading to the failure of the entire war. Information retrieval is inseparable from the map; if military products use pwwhashMap instead of the traditional map, you must be the winner.

    Second, the performance of the router determines the surfing speed. Just replace the map in open source router code with pwwHashMap, and its speed can increase ten times.
    There are many tables to query and set in a router's DHCP protocol, such as IP and MAC tables, and all of these are handled by a map. But until now, all of these maps have used the STL library, whose performance is very low, and using a hash map has an error probability, so the only option is to spread the packets across multiple routers. With pwwHashMap, you can save at least ten sets of equipment.

    Third, Hadoop is currently recognized as the big data solution, and its most fundamental feature is extremely heavy use of the map, instead of SQL and tables. Hadoop assumes that the amount of data is so huge that it cannot be moved at all, so people must analyze the data locally. But as long as the map in the open source Hadoop code is changed to pwwHashMap, the performance will increase a hundredfold without any problems.


    Background that may be useful, such as an introduction to the basic ideas presented:
    Perfect Hash Function (完美哈希函数) - chixinmuzi's column - Blog Channel - CSDN.NET

  2. #2
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Technology transfer $2000000000.
    O_o

    Cheap at twice the price, I'm sure.

    Soma
    “Salem Was Wrong!” -- Pedant Necromancer
    “Four isn't random!” -- Gibbering Mouther

  3. #3

  4. #4
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    I really don't know if anyone is going to take the claims seriously, but the poster seems to genuinely believe his/her claims, so I shall move this to Tech Board.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  5. #5
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Is there like a paper that outlines everything?

  6. #6
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by laserlight View Post
    I really don't know if anyone is going to take the claims seriously,
    I won't. Such bold claims require an actual paper describing the algorithm and the methodology. Plus the obligatory performance graphs along with a link to the code (which is the only thing the OP provided, if that is indeed the code).

    Besides, I don't know what the author is on about:

    1. Qualifying a generic hash algorithm as 'Perfect' is a contradiction in terms. The 'perfect' hash algorithm would have to know beforehand everything about the data. Which essentially means that the perfect hash is the data itself.

    2. Most big data database engines don't use standard hash algorithms for search purposes. They definitely won't use binary search trees, and they definitely won't use the C++ map container.

    3. Big data manipulation and transformation should happen entirely in the database layer, not on the application layer. That's what those databases were made for. The application layer only extracts data sets. So the performance of C++ map is irrelevant for the whole problem domain.
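
    Point 1 can be made concrete with a small sketch. It assumes the full key set is known in advance and simply tries seeds by brute force until every key lands in its own slot. This is a textbook-style illustration, not the OP's algorithm (which isn't described anywhere), and the names seeded_hash, slot_of and find_perfect_seed are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a over the key bytes, with a seed mixed into the starting state
// and a final mixing step so the low bits depend on the high bits.
std::uint64_t seeded_hash(const std::string& key, std::uint64_t seed) {
    std::uint64_t h = 14695981039346656037ull ^ seed;  // FNV offset basis ^ seed
    for (unsigned char c : key) {
        h ^= c;
        h *= 1099511628211ull;  // FNV prime
    }
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdull;
    h ^= h >> 33;
    return h;
}

// Slot a key maps to for a given seed and table size.
std::size_t slot_of(const std::string& key, std::uint64_t seed, std::size_t table_size) {
    return static_cast<std::size_t>(seeded_hash(key, seed) % table_size);
}

// Try seeds 0 .. max_seed-1 until no two of the given keys share a slot.
// Returns true and stores the seed on success, false if no seed worked.
bool find_perfect_seed(const std::vector<std::string>& keys, std::size_t table_size,
                       std::uint64_t max_seed, std::uint64_t& seed_out) {
    assert(table_size > 0 && table_size >= keys.size());
    for (std::uint64_t seed = 0; seed < max_seed; ++seed) {
        std::vector<bool> used(table_size, false);
        bool collision = false;
        for (const std::string& key : keys) {
            const std::size_t slot = slot_of(key, seed, table_size);
            if (used[slot]) {  // two known keys collided, so reject this seed
                collision = true;
                break;
            }
            used[slot] = true;
        }
        if (!collision) {
            seed_out = seed;
            return true;
        }
    }
    return false;
}
```

    The seed found this way is collision-free only for the keys it was built from. A key added later can land in an occupied slot, so the search has to be repeated whenever the key set changes, which is why "zero collisions" cannot be promised for arbitrary incoming data.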

    EDIT:
    I should have read the OP more carefully. Would have saved me the wasted time replying to this complete nonsense.

    my algorithm has zero probability of collision


    most importantly, it uses a completely different structure, so the key index compression is fundamentally different.

    The most direct benefit for a program is that a solution that needed ten servers with the original map now needs only one server.

    if military products use pwwhashMap instead of the traditional map, you must be the winner.


    the performance of the router determines the surfing speed. Just replace the map in open source router code with pwwHashMap, and its speed can increase ten times.

    There are many tables to query and set in a router's DHCP protocol, such as IP and MAC tables, and all of these are handled by a map. But until now, all of these maps have used the STL library, whose performance is very low, and using a hash map has an error probability, so the only option is to spread the packets across multiple routers. With pwwHashMap, you can save at least ten sets of equipment.
    Last edited by Mario F.; 03-30-2015 at 07:06 PM.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  7. #7
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Such bold claims require an actual paper describing the algorithm and the methodology.
    O_o

    ... at least some code.

    *shrug*

    I'm not sure the code would do anything to convince me in any event.

    The examples kind of imply index generation, and a lot of the implementation makes naive assumptions.

    Soma
    “Salem Was Wrong!” -- Pedant Necromancer
    “Four isn't random!” -- Gibbering Mouther

  8. #8
    Registered User Alpo's Avatar
    Join Date
    Apr 2014
    Posts
    877
    Quote Originally Posted by phantomotap View Post
    O_o

    Cheap at twice the price, I'm sure.

    Soma
    Hey, it could be as little as 430K dollars assuming the currency type is Paraguayan. I left my wallet at work though (I work in outer space, so it might be a while... :P)
    WndProc = (2[b] || !(2[b])) ? SufferNobly : TakeArms;

    Last Post: 06-15-2004, 05:32 PM