Thread: Cross-platforming Windows Code

Hybrid View

Previous Post Previous Post   Next Post Next Post
  1. #1
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    You can implement efficient MT-queue's without the complexity and risk that comes with lock-free code. There is a windows example here: ReadDirectoryChangesW question
    You can replace the std::deque with whatever suits your needs. Here is a boost::circular_buffer example: Boost thread request queue design question


    I made some improvements to liblfds and submitted them to the author. There hasn't yet been a release since then so I figured I'd share my changes here. I can only speak for the freelist code - I didn't even look at the code for the other containers.
    Quote Originally Posted by gg_modifications.txt
    The diffs aren’t pretty. A lot of the code went through a “beautifier” (macro run out of habit).
    I didn’t hack any further than the freelist code, and just the freelist benchmark on the test side. Here’s the important stuff:

    ReadMe.txt
    - LFDS doesn’t actually support IA64. _InterlockedCompareExchange128 isn’t available for IA64 (I don’t think is has a true dcas instruction). There are references to IA64 on your Wikipedia too.

    Inc\liblfds.h
    - I wanted to consolidate all the “preprocessor know-how” into a single location – making it easier to recognize and implement various ports. I came up with the following standard defines:
    Compiler (currently supported):
    LIBLFDS_MSVC
    LIBLFDS_GCC_UNIX - GCC on *nix (not Cygwin, not MinGW)

    Environment (available headers, functions)
    LIBLFDS_WINDOWS - PSDK or DDK environment
    LIBLFDS_POSIX - Posix environment

    Pointer Width:
    LIBLFDS_32BIT
    LIBLFDS_64BIT

    Architecture (optional, add as needed):
    LIBLFDS_X86
    LIBLFDS_ARM
    - I added inline support for abstraction_[dcac|cas|increment] under gcc. Part of that included marking then extern here. (more on inline’ing under gcc below).
    - The FREELIST_COUNT_DCAS stuff was something I did for the freelist benchmark that allowed each thread to get a count of the number of dcas retries.

    src\liblfds_internal.h
    - Added include guards to headers that didn’t have em.
    - I moved some things from liblfds.h to here that seemed like they didn’t need to be “publicly exposed” – ALIGN, INLINE, some headers.
    - INLINE – This is used by abstraction_ [dcac|cas|increment]. The trick here is to get a publicly exposed symbol that external modules can link to, while having the compiler inline any internal calls to those functions (within the lib/dll/so). Under MSVC this is easy via “extern __forceinline”. Under gcc the easiest thing to do is to compile the inline functions as non-inline in their own translation unit, to be linked in with the lib/dll/so. But when the rest of the lib is calling those functions, they’re defined inline. To achieve this, abstraction_cas.c, abstraction_dcas.c, and abstraction_increment.c are all #include’d at the end of liblfds_internal.h. There is a new source file, abstraction_inlines.c, which just includes liblfds_internal.h with INLINE disabled. This is only compiled/linked under gcc.
    Here’s the reference material I used for this:
    http://gcc.gnu.org/gcc-4.3/porting_to.html
    http://www.greenend.org.uk/rjk/2003/03/inline.html
    - dcas_ptr – What I really wanted was a type that I could throw anywhere and know that it had correct alignment, without having to remember to use the ALIGN macro. This is what I came up with. All the dcas_ptr_x() macros are an ‘attempt’ to make typical usage a little more readable. Another advantage is that you can omit any packing on structures. The compiler just does the right thing since it knows the alignment requirements of ‘dcas_ptr’.

    src\abstraction\abstraction_ [dcac|cas|increment].c
    - Uses LIBLFDS_* directives.
    - Added #error if none of the ports within the file are selected for compilation. This seemed easier than trying to make liblfds.h a catch-all for platforms which may not have everything ported to it.

    src\freelist\freelist_internal.h
    - USE_ORIG_FREELIST – I had that in there just make it easier to test the difference between the dcas_ptr freelist, the version 6 freelist.
    - volatile isn’t needed on any of the freelist structure members. All “synchronization” is handled by dcas.
    - I originally confused “element_count” as a dynamic count. So I renamed it to “capacity”.

    src\freelist\freelist_pop_push.c
    - dcas_ptr versions of push/pop
    - atomic increment doesn’t scale well at all on x86, so I eliminated it. The trade-off is that both pop and push now perform a local, non-atomic increment within the DCAS loop. (In the end, I think there was only a small improvement in scalability on my Core i7 920. But may benefit other platforms even more.)

    test\src\abstraction.h
    - Added some new abstractions:
    - thread_condvar_t – This behaves just like a Win32 event. It can be signaled or non-signaled and threads can wait, signal, and reset. The benchmark code uses this as a sort of barrier to ensure that all created threads are at the same point before beginning. The signaling of the condvar starts the benchmark.
    - abstraction_thread_yield(ms) – Main benchmark thread uses this after creating threads to allow them to all to reach to the “starting point”.
    - thread_timer_t – I added this because I wanted more resolution than clock(). It uses QueryPerformanceCounter() on Windows and clock_gettime(CLOCK_REALTIME) under Posix.
    - (These need to implemented for DDK platform)

    test\src\abtraction_thread_start.c
    - Added thread priority to Windows, added thread affinity and priority to Posix (had to run via sudo under Linux for priority elevation)

    test\src\benchmark_freelist.c
    - My attempt at gathering and calculating metrics. Let me know if you see any bugs, or if I can clarify on what I thought I was doing.

    makefile.linux
    - Removed abstraction_cas.c, abstraction_dcas.c, abstraction_increment.c
    - Added abstraction_inlines.c
    - Added to base: “-pedantic -D_XOPEN_SOURCE=600 -D_GNU_SOURCE" (gnu source for sched_setaffinity)
    - Added to release: “-O3 -Winline –DNDEBUG”
    - Removed –s so I could get disassembly listings from gdb (to make sure stuff was getting inlined for real).

    test\src\makefile.linux
    - Added abstraction_timer.c, abstraction_thread_condvar.c
    - Added “-pthread –lrt” (Prefer –pthread over –lpthread. “rt” needed for timer stuff)


    The VS project/solution currently only does libs. But all the project settings should be good. You can tweak makefile.windows by examining the command line that the solution/project generates when compiling (particularly the optimization settings).

    Here are some x64 benchmark results from my core i7 920 box running Win7:

    Release 0 Freelist Benchmark #1
    1 Threads, 30000000 Operations, Times:
    CPU 0 = 536.06 ms, 55964.07 ops/ms
    Total Time = 536.09 ms
    Avg Thread Time = 536.06 ms
    Total Ops/ms = 55964.07 ops/ms
    Scalability = 100.00%
    Throughput = [Baseline]

    2 Threads, 30000000 Operations, Times:
    CPU 0 = 3391.45 ms, 8845.77 ops/ms
    CPU 2 = 3353.54 ms, 8945.78 ops/ms
    Total Time = 3391.47 ms
    Avg Thread Time = 3372.49 ms
    Total Ops/ms = 8895.50 ops/ms
    Scalability = 7.95%
    Throughput Drop = 15.90%

    3 Threads, 30000000 Operations, Times:
    CPU 0 = 6977.94 ms, 4299.26 ops/ms
    CPU 2 = 6869.21 ms, 4367.31 ops/ms
    CPU 4 = 6955.57 ms, 4313.09 ops/ms
    Total Time = 6977.97 ms
    Avg Thread Time = 6934.24 ms
    Total Ops/ms = 4326.36 ops/ms
    Scalability = 2.58%
    Throughput Drop = 48.64%

    4 Threads, 30000000 Operations, Times:
    CPU 0 = 10240.62 ms, 2929.51 ops/ms
    CPU 2 = 10237.61 ms, 2930.37 ops/ms
    CPU 4 = 10229.84 ms, 2932.60 ops/ms
    CPU 6 = 10200.86 ms, 2940.93 ops/ms
    Total Time = 10240.64 ms
    Avg Thread Time = 10227.23 ms
    Total Ops/ms = 2933.34 ops/ms
    Scalability = 1.31%
    Throughput Drop = 67.80%

    5 Threads, 30000000 Operations, Times:
    CPU 0 = 10320.27 ms, 2906.90 ops/ms
    CPU 1 = 9942.31 ms, 3017.41 ops/ms
    CPU 2 = 9283.74 ms, 3231.46 ops/ms
    CPU 4 = 9302.25 ms, 3225.03 ops/ms
    CPU 6 = 9313.67 ms, 3221.07 ops/ms
    Total Time = 10320.30 ms
    Avg Thread Time = 9632.45 ms
    Total Ops/ms = 3114.47 ops/ms
    Scalability = 1.11%
    Throughput Increase = 106.17% [6.17%]

    6 Threads, 30000000 Operations, Times:
    CPU 0 = 15157.39 ms, 1979.23 ops/ms
    CPU 1 = 15151.46 ms, 1980.01 ops/ms
    CPU 2 = 15161.49 ms, 1978.70 ops/ms
    CPU 3 = 15167.51 ms, 1977.91 ops/ms
    CPU 4 = 8447.68 ms, 3551.27 ops/ms
    CPU 6 = 8491.80 ms, 3532.82 ops/ms
    Total Time = 15167.53 ms
    Avg Thread Time = 12929.55 ms
    Total Ops/ms = 2320.27 ops/ms
    Scalability = 0.69%
    Throughput Drop = 74.50%

    7 Threads, 30000000 Operations, Times:
    CPU 0 = 19867.78 ms, 1509.98 ops/ms
    CPU 1 = 19700.22 ms, 1522.83 ops/ms
    CPU 2 = 20145.11 ms, 1489.20 ops/ms
    CPU 3 = 20149.48 ms, 1488.87 ops/ms
    CPU 4 = 20174.15 ms, 1487.05 ops/ms
    CPU 5 = 20185.63 ms, 1486.21 ops/ms
    CPU 6 = 7891.37 ms, 3801.62 ops/ms
    Total Time = 20185.65 ms
    Avg Thread Time = 18301.96 ms
    Total Ops/ms = 1639.17 ops/ms
    Scalability = 0.42%
    Throughput Drop = 70.65%

    8 Threads, 30000000 Operations, Times:
    CPU 0 = 23025.25 ms, 1302.92 ops/ms
    CPU 1 = 23645.38 ms, 1268.75 ops/ms
    CPU 2 = 23955.23 ms, 1252.34 ops/ms
    CPU 3 = 24038.93 ms, 1247.98 ops/ms
    CPU 4 = 24069.25 ms, 1246.40 ops/ms
    CPU 5 = 24072.82 ms, 1246.22 ops/ms
    CPU 6 = 23702.26 ms, 1265.70 ops/ms
    CPU 7 = 22637.75 ms, 1325.22 ops/ms
    Total Time = 24072.83 ms
    Avg Thread Time = 23643.36 ms
    Total Ops/ms = 1268.86 ops/ms
    Scalability = 0.28%
    Throughput Drop = 77.41%
    gg
    Attached Files Attached Files

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Windows and Mac C Programming Cross-Compatibility
    By Leo2010 in forum C Programming
    Replies: 11
    Last Post: 02-23-2010, 07:31 AM
  2. Cross-compiling a Windows command-line application
    By papagaio in forum C Programming
    Replies: 6
    Last Post: 08-29-2008, 02:38 AM
  3. Cross (compiler | OS) && reuseable code
    By audinue in forum C Programming
    Replies: 6
    Last Post: 07-22-2008, 12:46 PM
  4. Cross-platform code...
    By Sul in forum C++ Programming
    Replies: 10
    Last Post: 06-18-2004, 04:44 PM
  5. The Windows 95 source code
    By DavidP in forum A Brief History of Cprogramming.com
    Replies: 1
    Last Post: 09-23-2001, 07:42 PM