memory reading vs. writing

This is a discussion on memory reading vs. writing within the C++ Programming forums, part of the General Programming Boards category; I'm making a lookup table (1 x 256 array of unsigned chars) to record all unique pixel values found in ...

  1. #1
    Registered User
    Join Date
    Feb 2003
    Posts
    595

    memory reading vs. writing

    I'm making a lookup table (1 x 256 array of unsigned chars) to record all unique pixel values found in an image. I initialize the array to all 0s; then I want to write 1 to each element where the array index equals a pixel value in the image.

    I can simply scan the image and write 1 to the array every time, but that results in many unnecessary writes: if, for example, the value 155 appears 15,000 times in the image, I write 1 to table[155] 15,000 times.

    Or, I can put the assignment in an if statement and write a 1 only if that table entry is 0. Then when 155 is read from the image, I test table[155] and write 1 to it only once, but I subsequently test it in the if statement 14,999 additional times.

    Which takes less time?
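    For reference, the two candidate loops look something like this (a minimal sketch of the question as posed; the function names are mine):

```cpp
#include <cstddef>

// Strategy A: unconditional write -- store 1 every time a value is seen.
void mark_always_write(const unsigned char* img, std::size_t n,
                       unsigned char table[256]) {
    for (std::size_t i = 0; i < n; ++i)
        table[img[i]] = 1;           // redundant writes for repeated values
}

// Strategy B: test first, write only while the entry is still 0.
void mark_test_first(const unsigned char* img, std::size_t n,
                     unsigned char table[256]) {
    for (std::size_t i = 0; i < n; ++i)
        if (table[img[i]] == 0)      // redundant reads for repeated values
            table[img[i]] = 1;
}
```

    Both versions produce the same table; the question is only which redundancy is cheaper.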

  2. #2
    Registered User Kernel Sanders's Avatar
    Join Date
    Aug 2008
    Posts
    61
    It would depend on the hardware, but a 256-byte array easily fits in cache, so I'd have to guess that reading would be faster. I'm not sure what effect the addition of a branch would have, though.

  3. #3
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,335
    Profile it and find out.

    Is your "memory" flash memory with disproportionately large write times compared to read times (for example), or is it your regular DRAM?

    Or how good is your processor at branch prediction, which your alternate code will have an awful lot of?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  4. #4
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    839
    the if statement still needs to index that element in the array, so it seems to me what you're really asking is whether writing (or incrementing) an unsigned char is faster than testing one in an if.

    should be easy enough to test.

    set up two loops that do each of the two operations a few thousand times and compare the times required to complete each of the loops.

    personally i'd just go for the former method you mentioned, as that is a straightforward lead in to making a histogram, which will likely be useful in the future.

  5. #5
    Registered User
    Join Date
    Feb 2003
    Posts
    595
    Actually I have several platforms in mind -- all pretty low-end when it comes to power...

    a robot with a mips7000 processor and sdram
    a robot with a hitachi h8s processor and flash memory
    various pc's/laptops with celeron and several-years-old pentium or athlon processors (none 64 bit).

    Sounds like maybe I'll have to test each one. I was just wondering to what extent I might generalize. For example, Salem's comment about flash memory suggests that the "if" approach is likely to be better for the hitachi.

    I don't know anything about the branch prediction capabilities of these processors.

  6. #6
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,335
    Certainly with such a range, I doubt a single answer would be the best in all cases.

    But before you catch a really severe case of "premature optimisation disease", how have you determined that this is a real bottleneck?

  7. #7
    Registered User
    Join Date
    Feb 2003
    Posts
    595
    Quote Originally Posted by Salem View Post
    But before you catch a really severe case of "premature optimisation disease", how have you determined that this is a real bottleneck?
    The short answer is that I haven't. But I'm writing image-processing code for computer vision applications that will run on "underpowered" platforms, so as I proceed I'm trying to keep optimization in mind. This particular instance involves a table that in some apps may have to be rebuilt (with new values) 15 - 30 times per second. I haven't even determined that it's possible on this hardware, but I want to be sure that I don't sabotage myself with sloppy code from the outset.

    In addition, this question is general enough that it can arise in any number of applications so it seemed worth thinking about even if it turns out not to be an issue immediately.
    Last edited by R.Stiltskin; 08-19-2008 at 01:19 PM. Reason: to expand reply

  8. #8
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    839
    sounds more and more like a histogram is going to be of use at some point.
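    For what it's worth, the unconditional-write loop generalizes straight into a histogram; a sketch (with wider counters, so 15,000 occurrences of one value can't overflow an unsigned char):

```cpp
#include <cstddef>
#include <cstdint>

// Count occurrences of each pixel value.
// The presence table falls out for free as hist[v] > 0.
void build_histogram(const unsigned char* img, std::size_t n,
                     std::uint32_t hist[256]) {
    for (std::size_t i = 0; i < 256; ++i)
        hist[i] = 0;
    for (std::size_t i = 0; i < n; ++i)
        ++hist[img[i]];
}
```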

  9. #9
    Registered User
    Join Date
    Feb 2003
    Posts
    595
    Quote Originally Posted by m37h0d View Post
    sounds more and more like a histogram is going to be of use at some point.
    Not necessarily, and the original question is also valid in apps that have nothing to do with histograms so it seems worth considering by itself.

  10. #10
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,335
    > This particular instance involves a table that in some apps may have to be rebuilt (with new values) 15 - 30 times per second
    Example:
    -----
    Let's say QVGA (320x240 = 76,800 pixels) x 15 frames = 1,152,000 bytes of pixel data per second.
    On a 50 MHz machine, that's only about 43 clock ticks per pixel, and that leaves no capacity for anything else. So I'm guessing no more than 20 would be good.
    -----


    Whatever the results are for your situation, it might give you some idea as to whether
    - no problem
    - could be a problem that needs special attention
    - it's hopeless, we need hand crafted assembler to come even close to this!

    Oh, I've been digging in the H8 manual.
    > I test table[155], write 1 to table[155] only 1 time, but subsequently test it in the if statement 14,999 additional times.
    If you don't care which bit is set, then the "TAS" instruction seems to be the fastest way of achieving this. Now all you need do is figure out which code causes the compiler to generate that code.

  11. #11
    Registered User
    Join Date
    Feb 2003
    Posts
    595
    Never say hopeless. Worst case, I have to process fewer images & discard the rest.


    Quote Originally Posted by Salem View Post
    >Oh, I've been digging in the H8 manual....Now all you need do is figure out which code causes the compiler to generate that code.
    Here is a clear example of "premature optimisation disease".

    I hadn't planned to get to that level of detail at this stage. But it's a useful indication of what may be involved in answering the original question. Your interest is appreciated.

  12. #12
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    839
    btw, not to keep plugging this forum, but the DSP guys at kvr.org focus a lot on optimization. they might be useful also.

  13. #13
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,261
    It sounds like you're doing some palette reduction, but since you've only told us about one little loop, we can't see the bigger picture, and as such can't point out an algorithm that would give a speed boost of some order of magnitude. We're instead forced to micro-analyse one little loop in the hope of boosting the performance of just that loop by perhaps 10%.
    Please tell us the BIG picture!
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  14. #14
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    To answer the original question: in most processors, read and write operations take the same amount of time, so reading first and then perhaps updating will be slower.

    Writing to flash memory is not just a write: you'd need a small function that writes several commands to the flash, then writes the actual content, and finally polls to check that the data has been written correctly. That is not something you should do inside a tight picture-analyzing loop. Moreover, erasing and rewriting flash memory is not something to do frequently, as all flash memory has a limit on how many times it can be erased, roughly a million cycles. And flash write timing is in the microseconds (vs. tens of nanoseconds for RAM), so it would certainly be completely unsuitable to build this table in flash memory.

    I'd say that all systems that use flash memory also have RAM - it may be fairly small, but there will be some. If nothing else, the call-stack needs to be in RAM, as you can't possibly implement a stack in flash.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  15. #15
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,892
    Quote Originally Posted by Salem View Post
    But before you catch a really severe case of "premature optimisation disease", how have you determined that this is a real bottleneck?
    He has to choose one of the two implementation strategies anyway.

    I'd say go with write-only, since it's simpler code.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law


