Thread: Facebook invents radical new technology... a compiler

  1. #16
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Quote Originally Posted by Codeplug View Post
    >> Fun details like when to garbage collect are left to the reader.
    I take that back. I didn't read the claims.

    Here is a typical atomic operation that may be used: https://en.wikipedia.org/wiki/Compare-and-swap

    gg
    Funny you mention that.

    I'm looking at this : https://www.cs.cmu.edu/~410-s05/lect...1_LockFree.pdf

    and they also use an atomicCAS() operation but it's in a do-while loop. How is a do-while loop more efficient than a locking mechanism? Sure, there's a larger storage overhead using a mutex per node but RAM is dirt cheap so it's almost a non-issue.

  2. #17
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    A full locking mechanism implies a system of wait queues to allow "blocking" of threads and "waking" of threads ... this is done in your operating system. So there is a context switch into the OS, and your thread being suspended by the OS while "blocking".

    Compare that to these CPU instructions:
    Code:
    Do 
    {
        cas(...)
    } Until (<my cas() was successful>)
    gg

  3. #18
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Except that the linked-lists are concurrent capable.
    O_o

    The bloated language describing the sweep is exactly where I have a problem.

    Sure, there's a larger storage overhead using a mutex per node but RAM is dirt cheap so it's almost a non-issue.
    Do you think memory is the only resource?

    Soma
    “Salem Was Wrong!” -- Pedant Necromancer
    “Four isn't random!” -- Gibbering Mouther

  4. #19
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    I'm going to implement a generic statistical hypothesis test. And I have decided to implement one mutex per sample data point. With 8GB, 5 of which free, it will work great on a 1 million data point sample space. Whoosh! That fast.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  5. #20
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Quote Originally Posted by Codeplug View Post
    A full locking mechanism implies a system of wait queues to allow "blocking" of threads and "waking" of threads ... this is done in your operating system. So there is a context switch into the OS, and your thread being suspended by the OS while "blocking".

    Compare that to these CPU instructions:
    Code:
    Do 
    {
        cas(...)
    } Until (<my cas() was successful>)
    gg
    This is a good point. I like it.

    Quote Originally Posted by phantomotap View Post
    O_o

    Do you think memory is the only resource?

    Soma
    No, but the most apparent overhead to me was memory. What are some others that you were thinking of? Was it what Codeplug already mentioned? I'm genuinely asking here.

    Quote Originally Posted by Mario F. View Post
    I'm going to implement a generic statistical hypothesis test. And I have decided to implement one mutex per sample data point. With 8GB, 5 of which free, it will work great on a 1 million data point sample space. Whoosh! That fast.
    I feel like there's a lesson for me somewhere in here but I'm not sure... Can you expand?

  6. #21
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by MutantJohn View Post
    I feel like there's a lesson for me somewhere in here but I'm not sure... Can you expand?
    The only lesson is don't ever think of trade-offs as simple arithmetic. Also, in multithreading there is rarely an universal truth (well, perhaps except that threads are evil).

    Instead, what I was trying to say (not as a lesson) is that if I implement such a system I would bog it down to a crawl, almost certainly even crash it.

    It all depends on the implementation. But a mutex, to be effective, needs to check conditions, common implementations even raise exceptions. There's a considerable processing overhead to a mutex. There used to be an interesting chart listing some common mutex implementations with some comparative performance notes. Can't seem to find it anymore.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  7. #22
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Interesting... Now I must begin my search for a lock-free doubly linked list...

  8. #23
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by MutantJohn View Post
    Interesting... Now I must begin my search for a lock-free doubly linked list...
    Why would you? One of the arts of concurrency is to decide how you should partition your data and design your processes as to take the most advantage of what the OS and the PL offer you. And you are given a modest amount of information that allows you to make sensible choices. It is not about abandoning it because you want make 20 mutexes where 2 or 3 would suffice.

    Whereas in comparison, implementing a lock-free doubly-linked list will have you spend the next two months on guess work and having little to no idea if whatever you chose in the end is actually the best you could do.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  9. #24
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Quote Originally Posted by Mario F. View Post
    Why would you? One of the arts of concurrency is to decide how you should partition your data and design your processes as to take the most advantage of what the OS and the PL offer you. And you are given a modest amount of information that allows you to make sensible choices. It is not about abandoning it because you want make 20 mutexes where 2 or 3 would suffice.

    Whereas in comparison, implementing a lock-free doubly-linked list will have you spend the next two months on guess work and having little to no idea if whatever you chose in the end is actually the best you could do.
    This is a good point.

    Would it instead be better to write the list in pages of memory? Something like,
    Code:
    std::vector< node_buffer > list_pages
    ? This way, we'd only need a number of mutexes equal to the number of pages or in this case,
    Code:
    list_pages.size()
    .

    Is this what Codeplug was alluding to in his post about bucket hashing?

  10. #25
    Registered User Alpo's Avatar
    Join Date
    Apr 2014
    Posts
    877
    Facebook is horrible... Also in breaking news, "Is water wet?", find out what top scientists have to say on News At 9, here on KBSPL!
    WndProc = (2[b] || !(2[b])) ? SufferNobly : TakeArms;

  11. #26
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by MutantJohn View Post
    Would it instead be better to write the list in pages of memory? Something like,
    Code:
    std::vector< node_buffer > list_pages
    ? This way, we'd only need a number of mutexes equal to the number of pages or in this case,
    Code:
    list_pages.size()
    .
    I wouldn't know. That is not the point of multithreading. Multithreading is a code optimization technique. Nothing more.

    Because of this, you don't want to generalize solutions. Any multithreading solution should be evaluated against your specific set of conditions. What is the expected density of additions to your list? What are the side effects? What is the nature of the data managed by your linked list? Under what conditions should your code increase or decrease the threads running on that linked list? This and many other questions you answer only on the field, with live code and experimentation.

    You can certainly have an intuition of a best scenario. And try it out. It often ends up playing well. But if instead you try to come up with a generalization, you must know it will always fail on those edge and corner cases. And these are more common then you might think. A good rule of thumb is that any multithreaded code has usually a very low range of acceptable parameters.

    And also because multithreading is just another optimization technique, you need to evaluate if the effort pays. That is, if you really stand anything to gain from an increase in performance or whether the measured increase in performance offers you an advantage to justify the a) work involved, b) increase in the code complexity and c) increase of the expected value of bugs in your code because of b.

    One of the most damaging things you can do to your code is not evaluating it against your actual needs. I've seen (literally, not figuratively) people swoon in ecstasy over a 0.05 second gain per iteration. Only, their code is expected to run at most 10 iterations when the user clicks a button. That means, they worked for one or two full days on a multithreading problem, increased their code maintenance complexity, possibly introduced some new bugs, so that the end user gains 0.5 seconds after clicking a Save Document button.

    On a blog I used to maintain a few years back, I once called them the "false prophets of performance". And these people are everywhere. They make a large chunk of the programming world and are responsible for the increased complexity of code. They respect the notion of too much optimization is evil, but forget threading is an optimization technique and only an optimization technique, easily falling prey to over-optimization or unnecessary optimization.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  12. #27
    Unregistered User Yarin's Avatar
    Join Date
    Jul 2007
    Posts
    2,158
    Quote Originally Posted by Alpo View Post
    Facebook is horrible... Also in breaking news, "Is water wet?", find out what top scientists have to say on News At 9, here on KBSPL!
    Breaking news!

  13. #28
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Actually Mario, that's a good point. A multithreaded linked list is dumb. Linked lists typically work best when you're removing/inserting nodes constantly and they work even better when you use a pool allocator or rather something that will constantly reuse the same section of memory (so you're using placement new(), for example). It sounds like they're be a lot of contention between threads, even if you did use a lockless implementation. They also need to be small because doing a look-up is O( n ) without some form of hashing so for a large list, probing for an element is expensive.

    Multithreading is fun but you're right, it's not always the solution. Hmm... Good perspective. Sometimes I get caught up in doing something just for the sake of doing it because I like parallelizing code.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. borland c++ facebook api
    By hbfriedman in forum C++ Programming
    Replies: 3
    Last Post: 01-08-2014, 05:09 PM
  2. facebook
    By BobMcGee123 in forum A Brief History of Cprogramming.com
    Replies: 0
    Last Post: 09-20-2006, 04:49 AM
  3. Radical Sign
    By Iceboy152 in forum C++ Programming
    Replies: 2
    Last Post: 03-02-2006, 10:08 AM
  4. rochester institude of technology and wentworth institute of technology
    By Shadow12345 in forum A Brief History of Cprogramming.com
    Replies: 24
    Last Post: 02-20-2003, 11:41 AM
  5. radical education reform
    By Aran in forum A Brief History of Cprogramming.com
    Replies: 13
    Last Post: 05-20-2002, 09:20 PM