Thread: Fast(est) way to manipulate memory

  1. #16
    Registered User
    Join Date
    Oct 2007
    Posts
    166
    Quote Originally Posted by Elysia View Post
    Yes, you are. A class + malloc = undefined behavior.
    And I have already answered your question: use a profiler.
    I have said I have timed different versions of the code; you are just going to have to take my word for it. Go with the example in the initial post if you find the if statements in the other example too horrific.

  2. #17
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    It doesn't matter if you time it. All you know is that it's slower or faster. But you don't know WHAT is faster or slower.
    You need to find the hotspots in your code before optimizing. As we have said.
    How do we do that? We use a profiler.

    After we make a change, THEN we time.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  3. #18
    Registered User
    Join Date
    Oct 2007
    Posts
    166
    Quote Originally Posted by Elysia View Post
    It doesn't matter if you time it. All you know is that it's slower or faster. But you don't know WHAT is faster or slower.
    You need to find the hotspots in your code before optimizing. As we have said.
    How do we do that? We use a profiler.

    After we make a change, THEN we time.
    Well, if I'm only doing one thing in the loop, then that is what is slow. Like I said, I have run the basic loop separately and found it takes too much time.

  4. #19
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    If you're not going to listen, then good luck on your own.

  5. #20
    Registered User
    Join Date
    Oct 2007
    Posts
    166
    Quote Originally Posted by Elysia View Post
    If you're not going to listen, then good luck on your own.
    You are asking me to profile my whole program? That is not what I'm asking for. How is timing how long something takes not profiling? I have profiled and come to the conclusion that it is the part in the initial post that would benefit the most from being faster in this function.

  6. #21
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Why do you think your manual timing is better than a profiler?

  7. #22
    Registered User
    Join Date
    Oct 2007
    Posts
    166
    Quote Originally Posted by Elysia View Post
    Why do you think your manual timing is better than a profiler?
    It is sufficient. If the whole loop takes, say, 16 milliseconds, and running only the memory-assignment part takes 15 milliseconds, then I'm smart enough to figure out what is taking the most time.
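A minimal sketch of the kind of manual timing described here, assuming a plain 4-byte pixel struct. The names `Pixel`, `fill_image` and `time_ms` are hypothetical and are not taken from the code in the initial post.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical 4-byte pixel; the real struct from the thread is not shown here.
struct Pixel {
    std::uint8_t r, g, b, a;
};

// Assign one colour to every pixel: the "assigning memory" part on its own.
void fill_image(std::vector<Pixel>& image, Pixel colour) {
    for (Pixel& p : image) {
        p = colour;
    }
}

// Run `work` once and return the elapsed wall-clock time in milliseconds.
template <typename Work>
double time_ms(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `time_ms([&] { fill_image(image, Pixel{255, 0, 0, 255}); })` on a 2048x2048 buffer gives one number for the whole loop. That says how long the loop takes, not which instructions or cache misses inside it cost the time, which is the distinction being argued in this thread. Timing several runs and keeping the fastest one reduces noise.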

    The other thing is that my program is a plugin, and I do not have a debug version of the main program, so I only do release builds. I'm thinking profilers need debug builds.
    Last edited by DrSnuggles; 03-14-2011 at 05:33 AM.

  8. #23
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Profilers do not need debug builds, though it certainly makes it easier.
    Also, do you know what part of the loop takes the most time?

  9. #24
    Registered User
    Join Date
    Oct 2007
    Posts
    166
    Quote Originally Posted by Elysia View Post
    Profilers do not need debug builds, though it certainly makes it easier.
    Also, do you know what part of the loop takes the most time?
    Yes, in my example I'm only doing one thing: assigning memory. Obviously the loop cases where I'm blending colors will take longer than just assigning the colors, but that is a separate issue. In this particular thread I'm wondering if something can be done to improve the speed of memory assignment/access.

  10. #25
    Salem (and the hat of int overfl)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    > Right now it takes about 15 milliseconds to update a 2048x2048 image including a few calculations in my loop.
    Which equates to 800MB/Sec, just for assigning RGB values.
    How close is this to the sustained memory throughput of your system? Unless it's like less than 10%, then writing it in asm isn't going to get past this underlying physical reality.

    You mentioned later that you only update parts of the image. The best optimisation you can make is to not do something at all.
    Are your sub-rectangles as small as possible?
    Do they all change at the same rate? Is there any possibility to cache partial blends of the less frequently changing stuff?
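One way to picture the "not do something at all" advice is to touch only the rows and columns inside a changed rectangle. This is a sketch under assumptions: the image is taken to be a row-major buffer of packed 32-bit pixels, and `Rect` and `fill_rect` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical "dirty" rectangle in pixel coordinates.
struct Rect {
    std::size_t x, y, width, height;
};

// Fill only the pixels inside `dirty`, clamped to the image bounds.
// Returns the number of pixels actually written.
std::size_t fill_rect(std::vector<std::uint32_t>& image, std::size_t image_width,
                      std::size_t image_height, Rect dirty, std::uint32_t colour) {
    const std::size_t x0 = std::min(dirty.x, image_width);
    const std::size_t y0 = std::min(dirty.y, image_height);
    const std::size_t x1 = std::min(dirty.x + dirty.width, image_width);
    const std::size_t y1 = std::min(dirty.y + dirty.height, image_height);
    for (std::size_t y = y0; y < y1; ++y) {
        std::uint32_t* row = image.data() + y * image_width;
        std::fill(row + x0, row + x1, colour);  // one contiguous run per row
    }
    return (x1 - x0) * (y1 - y0);
}
```

The returned pixel count makes it easy to see how much work a given rectangle costs compared with the full 2048x2048 image.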

    It's like trying to optimise bubble sort by focussing on strcpy. Unless you understand the full context of the code you're trying to improve, focussing on the bit in the middle won't do you a lot of good.

    Since your struct is basically 4 bytes, consider using bitwise operators to unpack and pack RGB values into a 32-bit number. It's more code fiddle, but the relatively slow memory accesses (compared to register access times) would be a single long-word read/write rather than 6 bytes.
    Whether this has any effect depends on how well your L1/L2 cache manages byte accesses.
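A sketch of the bitwise packing suggested above. The channel order (R in the low byte, A in the high byte) is an assumption, and the function names are hypothetical; the actual struct layout is whatever the initial post defines.

```cpp
#include <cstdint>

// Pack four 8-bit channels into one 32-bit value.
// Assumed layout: R in bits 0-7, G in 8-15, B in 16-23, A in 24-31.
constexpr std::uint32_t pack_rgba(std::uint8_t r, std::uint8_t g,
                                  std::uint8_t b, std::uint8_t a) {
    return static_cast<std::uint32_t>(r)
         | (static_cast<std::uint32_t>(g) << 8)
         | (static_cast<std::uint32_t>(b) << 16)
         | (static_cast<std::uint32_t>(a) << 24);
}

// Unpack single channels again with shifts and masks.
constexpr std::uint8_t red(std::uint32_t p)   { return static_cast<std::uint8_t>(p & 0xFFu); }
constexpr std::uint8_t green(std::uint32_t p) { return static_cast<std::uint8_t>((p >> 8) & 0xFFu); }
constexpr std::uint8_t blue(std::uint32_t p)  { return static_cast<std::uint8_t>((p >> 16) & 0xFFu); }
constexpr std::uint8_t alpha(std::uint32_t p) { return static_cast<std::uint8_t>((p >> 24) & 0xFFu); }
```

With this, a pixel is read or written as one 32-bit value, and the per-channel work happens in registers. Whether it beats byte-wise member access is something to measure, since an optimising compiler may already combine the byte stores.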
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  11. #26
    Registered User
    Join Date
    Oct 2007
    Posts
    166
    Quote Originally Posted by Salem View Post
    > Right now it takes about 15 milliseconds to update a 2048x2048 image including a few calculations in my loop.
    Which equates to 800MB/Sec, just for assigning RGB values.
    How close is this to the sustained memory throughput of your system? Unless it's like less than 10%, then writing it in asm isn't going to get past this underlying physical reality.
    That could very well be the case. Not sure about what memory bandwidth I have. It would be interesting to try asm though but perhaps it is not trivial. Perhaps I can go bug some people in an assembly forum.
    You mentioned later that you only update parts of the image. The best optimisation you can make is to not do something at all.
    Are your sub-rectangles as small as possible?
    Do they all change at the same rate? Is there any possibility to cache partial blends of the less frequently changing stuff?
    Yes, for the most part it is fairly optimized. I have to keep the memory footprint as low as possible, and storing values per pixel quickly eats up memory, but you gave me an idea about possibly precalculating some blending values per layer.
    It's like trying to optimise bubble sort by focussing on strcpy. Unless you understand the full context of the code you're trying to improve, focussing on the bit in the middle won't do you a lot of good.
    It is all my code, which I have profiled a lot, so I know where the performance bottlenecks are. This particular function is one of the most costly, and the assigning of memory for large images is costly.
    Since your struct is basically 4 bytes, consider using bitwise operators to unpack and pack RGB values into a 32-bit number. It's more code fiddle, but the relatively slow memory accesses (compared to register access times) would be a single long-word read/write rather than 6 bytes.
    Whether this has any effect depends on how well your L1/L2 cache manages byte accesses.
    What do you mean 6 byte read/write? I'm only using 4.

    edit: I see what you mean, but in the cases where I don't need to assign rgba values separately I just transfer the 4 bytes in one go, as you see in the second example. In the cases where I do set rgba values separately it might be better to fill in an intermediate variable and assign to memory once, but I doubt it. I'll try that.
    Last edited by DrSnuggles; 03-14-2011 at 07:20 AM.
