Who else are they going to get the algorithm from?
PageRank is a completely separate product from a search engine. It can in fact be applied to other types of searches, not just web searching, since the "link" element at the center of the PageRank methodology is itself an abstraction of a more general concept that could be called a "voting mechanism". With this in mind, it is easy to apply PageRank to other search types, in most cases by simply renaming the abstraction.
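To make the abstraction concrete, here is a minimal sketch in Python of the standard power-iteration idea applied to a generic "votes" graph. The node names and the damping value are just the conventional textbook illustration, not anything specific to any real engine:

[code]
# Minimal PageRank sketch over a generic "voting" graph.
# Nodes could be web pages, papers citing papers, users endorsing users...
# The damping factor 0.85 is the conventional textbook value.

def pagerank(votes, damping=0.85, iterations=50):
    """votes maps each node to the list of nodes it 'votes' for."""
    nodes = set(votes) | {v for targets in votes.values() for v in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for voter, targets in votes.items():
            if targets:  # distribute this node's rank among its votes
                share = rank[voter] / len(targets)
                for target in targets:
                    new_rank[target] += damping * share
        rank = new_rank
    return rank

# The same code works whether the edges are hyperlinks, citations
# or endorsements -- only the naming of the abstraction changes:
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
[/code]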
Now, your earlier comment that PageRank is the essence of the web search algorithm is, of course, entirely incorrect. PageRank acts after the search engine. The essence of web search is the search engine itself, or if you prefer, the query algorithms in place: namely the text-matching algorithms applied (in the case of Google and many others, latent semantic indexing) along with the indexing and crawling algorithms that together try to guarantee a comprehensive and relevant list of results. PageRank acts at the end of this process, or at best at the end of every individual result iteration. PageRank may be the visible portion of the search engine, but as many other attempts have shown in the past, it is useless if the query algorithms are of low quality, because the only thing PageRank does is order the results list.
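To illustrate where PageRank sits, here is a hedged sketch of that pipeline; the function and variable names are hypothetical placeholders, not anyone's actual architecture:

[code]
# Hypothetical sketch: PageRank only orders what the query layer found.

def search(query, index, pagerank_scores):
    # Crawling and indexing happened long before query time and built
    # `index`, a mapping of document -> set of terms.
    # Text matching produces the candidate set; if this step is poor,
    # no amount of PageRank can rescue the results.
    candidates = [doc for doc, terms in index.items() if query in terms]
    # PageRank's only job: order the candidate list.
    return sorted(candidates,
                  key=lambda doc: pagerank_scores.get(doc, 0.0),
                  reverse=True)
[/code]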
Meanwhile, Google's PageRank math and methodology are well defined and known. They were published back in 1998 by Brin and Page, and you can learn all about it in their paper, "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (some of the references in that paper are essential reading). As Whiteflags hinted, you don't go anywhere else to develop your PageRank algorithm(s). Doing that would be a useless exercise in reinventing the wheel.
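For reference, the core formula from that paper, where d is the damping factor (they suggest 0.85), T_1..T_n are the pages linking to A, and C(T) is the number of links going out of T:

[code]
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
[/code]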
So, with all this in mind, let's continue -- this time in the hypothetical scenario where web searching was administered and controlled by an open standard through a non-profit international web-centric organization.
Here, the actual development of the PageRank algorithm(s) is not closed source. There is absolutely no requirement for it to be. What must definitely be kept secret are the values used to feed the weighting variables, possibly along with some formulas. That type of information has nothing to do with source code and can easily be abstracted away from the developers. In fact, I happen to believe that a non-profit international web-centric organization offers better guarantees of secrecy. I say this because these organizations are, or can be, statutorily open to inspection by all manner of government, peer, and even user scrutiny, whereas privately owned commercial companies are not. So the act of willingly spilling information to interested parties for their unfair benefit is a lot more difficult.
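As a hedged sketch of what "abstracting the values away from the developers" could look like: the ranking code is public, while the weighting coefficients live in a sealed configuration that only the governing body controls. Every name and path below is invented for illustration:

[code]
import json

# Illustrative only: the scoring formula is open source, but the
# coefficient values are loaded from a sealed, access-controlled file.

def load_weights(path="/etc/opensearch/weights.json"):
    # Hypothetical path; in practice this sits behind strict access control.
    with open(path) as f:
        return json.load(f)  # e.g. {"link_weight": 0.6, "anchor_weight": 0.2}

def combined_score(signals, weights):
    # The formula is public; the coefficients feeding it are not.
    return sum(weights.get(name, 0.0) * value
               for name, value in signals.items())
[/code]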
But there are other areas where a non-profit organization operating on the basis of community-driven projects could do better than Google. Let's see:
- Without the need for any commercial interest to be involved, we would already have had access to user-based PageRank definitions, along with tools and methods to define our own PageRank preferences (see the sketch after this list). This would be instrumental for large businesses and power users wanting to re-order their web search results based on their own needs and requirements.
- Without any corporate-image or product-placement concerns, we would already have had access to alternative presentations of search results. Table listings, categorized listings, raw listings: all would have been made accessible to the end user.
- Without any manpower and know-how limitations, we would already have had access to alternative PageRank algorithms and methodologies. At the very least, we would probably have already defined other possible roadmaps for the development of different indexing and query strategies.
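As a hedged sketch of the user-defined ranking preferences from the first point, assuming each result carries per-signal scores (the signal names here are invented examples):

[code]
# Illustrative sketch of user-defined re-ranking preferences.
# The signal names ("freshness", "authority") are invented examples.

def rerank(results, preferences):
    """results: list of dicts with per-signal scores, e.g.
       {"url": "...", "authority": 0.9, "freshness": 0.2}"""
    def score(result):
        return sum(weight * result.get(signal, 0.0)
                   for signal, weight in preferences.items())
    return sorted(results, key=score, reverse=True)

# A power user who values recency over raw link authority:
my_prefs = {"freshness": 0.8, "authority": 0.2}
[/code]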
Essentially, PageRank is a flawed process, sometimes even a useless one (* see links at the end). But it is the best we have. The question is, would it still be (flawed, or the best) if the decisions behind it were not ultimately made by a small panel of stockholders? The web-searching panorama today is entirely defined by privately owned commercial interests and an extremely competitive market. Almost all research in this area has moved into very high-budget programs controlled by the private corporations involved, under very high levels of trade secrecy. This scenario has completely deprived universities of the capacity to join in the research and all but entirely displaced any interest by universities or individual researchers in this area.
Users, meanwhile, lose. The fact that Google presents passable, or even good, search results does not make me happy. Conformism is not something I'm an adept of. Never was, never will be. And I'm proud of it.
And when there are other issues involved, like privacy and the constant intrusion of new features I don't want and can't turn off, that only adds to my criticism. Telling me I should go elsewhere solves nothing in terms of web searching, because in the current scenario there is nowhere else to go. That type of passive-aggressive response can only come from conformists who get bitter every time someone questions their beloved status quo and derive their satisfaction from the knowledge that there's nothing better out there at the moment. It is essentially advising someone to choose worse.
(*) ACSys TREC-8 Experiments; Measuring Search Engine Quality; Results and Challenges in Web Search Evaluation.