Thread: new license-free lock-free data structure library published

  1. #16
    Registered User
    Join Date
    Dec 2009
    Posts
    83
    Quote Originally Posted by Codeplug View Post
    One thing I see in freelist_push() is that you're limiting the scalability by using a static counter.
    Note this mistake is consistent through the code. I am correcting all of them.

  2. #17
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> You are correct and this in fact does happen - but not in an obvious way.
    I see it now. I had "InterlockedCompareExchange" interface in mind, where the comparand is input only.

    What really confused me was that _InterlockedCompareExchange128() actually updates the comparand. Looking at the MSVC-32bit DCAS version, the comparand update is plainly visible

    gg

  3. #18
    Registered User
    Join Date
    Dec 2009
    Posts
    83
    Quote Originally Posted by Codeplug View Post
    One thing I see in freelist_push() is that you're limiting the scalability by using a static counter.
    So, I've fixed this error, both in freelist and in the other data structures. I've also removed all uses of volatile, except for those variables which are the target of a CAS.

    I've run the resulting code through the test suite on my x64 machine (which doesn't prove much - x64 is extremely tolerant) and on an ARM machine (which has worth; the ARM machine easily showed up problems which passed perfectly on x64).

    I then ran the benchmarks.

    The results are fascenating.

    My system is a quad-core Q8200, the Yorkfield architecture. It's an ugly hybrid of a pair of two-core dies bonded together. Whenever you start doing work which crosses the die boundaries, performance just *dies*.

    What I've found now with these changes is that performance on one and two cores (e.g. inside a single die) has significantly improved; the single core is the same (within normal per-test variation), but the pair of cores definitely scales better.

    However, once I go to three and four cores, performance now is *worse* than it was before! a most interesting quirk of the Yorkfield architecture.

    These are the release 5 benchmarks;

    Release 5 Freelist Benchmark #1
    CPUs,total ops,mean ops/sec per CPU,standard deviation,scalability
    1,216393136,21639314,0,1.00
    2,181827358,9091368,40879,0.42
    3,60543702,2018123,463660,0.09
    4,39903510,997588,314877,0.05

    Release 5 Queue Benchmark #1
    CPUs,total ops,mean ops/sec per CPU,standard deviation,scalability
    1,84041026,8404103,0,1.00
    2,100810314,5040516,17001,0.60
    3,53060556,1768685,906705,0.21
    4,42842374,1071059,266028,0.13

    Release 5 Ringbuffer Benchmark #1
    CPUs,total ops,mean ops/sec per CPU,standard deviation,scalability
    1,63960672,6396067,0,1.00
    2,69074618,3453731,12567,0.54
    3,36511662,1217055,341319,0.19
    4,35115418,877885,169081,0.14

    Release 5 Stack Benchmark #1
    CPUs,total ops,mean ops/sec per CPU,standard deviation,scalability
    1,116709416,11670942,0,1.00
    2,129606270,6480314,23278,0.56
    3,55214018,1840467,801913,0.16
    4,43709956,1092749,252054,0.09

    These are the new benchmarks;

    Release 5 Freelist Benchmark #1
    CPUs,total ops,mean ops/sec per CPU,standard deviation,scalability
    1,190338908,19033891,0,1.00
    2,163459288,8172964,5322,0.43
    3,48031814,1601060,303228,0.08
    4,34983074,874577,82303,0.05

    Release 5 Queue Benchmark #1
    CPUs,total ops,mean ops/sec per CPU,standard deviation,scalability
    1,87205572,8720557,0,1.00
    2,112495368,5624768,1905,0.65
    3,41929686,1397656,35002,0.16
    4,36285572,907139,58588,0.10

    Release 5 Ringbuffer Benchmark #1
    CPUs,total ops,mean ops/sec per CPU,standard deviation,scalability
    1,62880558,6288056,0,1.00
    2,73604908,3680245,2080,0.59
    3,34698028,1156601,65504,0.18
    4,33581892,839547,699,0.13

    Release 5 Stack Benchmark #1
    CPUs,total ops,mean ops/sec per CPU,standard deviation,scalability
    1,114887260,11488726,0,1.00
    2,137956766,6897838,2134,0.60
    3,46388738,1546291,556806,0.13
    4,39502716,987568,409,0.09

    My next job is to run the benchmark on the ARM quad-core machine I have access to and ask for permission to publish those benchmarks here.

    Codeplug, your peer review has been of extraordinary value. I've created a Contributors page on the wiki - ARM are in there, for providing an account on one of their machines; I'd like to add you. Would that be okay? and if so, what name would you care to go by there?
    Last edited by Toby Douglass; 12-21-2009 at 12:59 PM.

  4. #19
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    "Codeplug" is fine

    One thing that comes to mind (in terms of performance) is that MSVC added additional semantics to volatile in version 14 (VS2005). In that version, a volatile-read generates an acquire membar, and a volatile-write generates a release membar.

    I've always assumed that this meant some kind of membar is generated for x64, but when I build for x64 and "generate assembly mixed with source" - I don't see any membar-like instructions for volatile-read/writes (I only looked at freelist_pop though). (They are definitely there for IA64.)

    I've pretty much re-done the solution and projects so there are just "Release" and "Debug" targets. Along the way, I had to reset things up - like adding "_DEBUG" for Debug builds, and "NDEBUG" for Release builds. Without the "NDEBUG", all those asserts() were still in place.

    Generating assembly with source can really help (how I found assert() was being called in my release build). For GCC, I use "gcc -Wa,-alh,-L" to generate assembly with source.

    gg

  5. #20
    Registered User
    Join Date
    Dec 2009
    Posts
    83
    Quote Originally Posted by Codeplug View Post
    "Codeplug" is fine
    'tis done :-)

    One thing that comes to mind (in terms of performance) is that MSVC added additional semantics to volatile in version 14 (VS2005). In that version, a volatile-read generates an acquire membar, and a volatile-write generates a release membar.
    Yup, I read about that.

    I've always assumed that this meant some kind of membar is generated for x64, but when I build for x64 and "generate assembly mixed with source" - I don't see any membar-like instructions for volatile-read/writes (I only looked at freelist_pop though). (They are definitely there for IA64.)
    It's not necessary on that CPU. The thing is, x86/x64 really are doing VERY little in terms of re-ordering. The natural limits on what they will do actually means that acquire/release behaviour happens *normally* in a wide range of cases - which means the compiler doesn't actually have to generate any extra code at all. It knows, a priori, given the CPU, that the users requirement is being honoured.

    I've pretty much re-done the solution and projects so there are just "Release" and "Debug" targets.
    How do you mean? no lib and dll any more? just lib?

    Along the way, I had to reset things up - like adding "_DEBUG" for Debug builds, and "NDEBUG" for Release builds. Without the "NDEBUG", all those asserts() were still in place.
    Yes...I'm not an MSVC user. I use gnumake and directly access the MS C compiler command line. When I came to produce the solution files, well, the last MSVC I used was version 6 - which knew what DLLs and libs were and offered you them as project target choices. No such choice was offered to me here - so in the end, I opted for clean projects and configured them fully.

    Would you mind sending me copies of your solution files? the email address is on the liblfds web-site, 'contact' page (rightmost tab on the root page).

    Generating assembly with source can really help (how I found assert() was being called in my release build). For GCC, I use "gcc -Wa,-alh,-L" to generate assembly with source.
    Unfortunately, I don't know assembly :-/ I mean, twenty years of C, I know enough to implement CAS on a new CPU in inline assmebly, but it's not like I read and write the stuff on a regular basis.
    Last edited by Toby Douglass; 12-22-2009 at 04:01 AM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Tab Controls - API
    By -KEN- in forum Windows Programming
    Replies: 7
    Last Post: 06-02-2002, 09:44 AM
  2. gcc problem
    By bjdea1 in forum Linux Programming
    Replies: 13
    Last Post: 04-29-2002, 06:51 PM
  3. File Database & Data Structure :: C++
    By kuphryn in forum C++ Programming
    Replies: 0
    Last Post: 02-24-2002, 11:47 AM
  4. Serial Communications in C
    By ExDigit in forum Windows Programming
    Replies: 7
    Last Post: 01-09-2002, 10:52 AM
  5. vector static abstract data structure in C?
    By Unregistered in forum C Programming
    Replies: 2
    Last Post: 11-05-2001, 05:02 PM