## Bitset questions

1. Hi all,

I am very new to programming, so I apologize if my questions seem too trivial. Here is what I have:
Code:
```cpp
#include <bitset>
#include <iostream>

int main()
{
    static const unsigned n = 5;
    static float p[n] = {8.34f, 10.59f, 7.6f, 0.8f, 10.32f};

    for (unsigned int i = 32; i--; )
    {
        std::bitset<n> v = i;
        // ...
    }

    return 0;
}
```
First, how do I efficiently compute the "scalar product" of the bitset (an array of booleans?) "v" with the array of floats "p"?

Second, how to make an exact copy of the bitset "v"?

Thank you very much!
SK

2. This question reads a bit like "I have a dog and a stack of hay - how do I milk a cow?"

What is the meaning of a "scalar product of a bitset and an array of float"?

--
Mats

3. By "scalar product" I mean element-by-element multiplication followed by summation of the products.

Since "v" is an array of booleans (if I understand correctly), I need to sum those elements of the "p" array whose indices correspond to the positions of 1's in "v" ("v" and "p" are defined in the code above).

For example, in the above loop if "i" = 19, "v" is 10011, and the result should be 1*8.34 + 0*10.59 + 0*7.6 + 1*0.8 + 1*10.32 = 19.46.

SK

4. So you want to test bit number (n - j - 1) in a nested loop, where n is the size of the float array and of the bitset, and add p[j] to the sum if that bit is set.

5. Originally Posted by anon
So you want to test bit number (n - j - 1) in a nested loop, where n is the size of the float array and of the bitset, and add p[j] to the sum if that bit is set.
err...yes, but it would be nice to somehow skip the multiplications completely, and just add the relevant elements of the float array, i.e. do

8.34 + 0.8 + 10.32 = 19.46.

1*8.34 + 0*10.59 + 0*7.6 + 1*0.8 + 1*10.32 = 19.46.

(The actual arrays in my application are much larger)

SK

6. That's what I meant. In the inner loop only add the float if the test member function returns true for this bit.

7. Originally Posted by serge
err...yes, but it would be nice to somehow skip the multiplications completely, and just add the relevant elements of the float array, i.e. do

8.34 + 0.8 + 10.32 = 19.46.

1*8.34 + 0*10.59 + 0*7.6 + 1*0.8 + 1*10.32 = 19.46.

(The actual arrays in my application are much larger)

SK
If it's an embedded system, then use
Code:
`if (v[n - j - 1]) sum += p[j];`
On a modern PC it's most likely going to result in equally slow code whether you multiply by zero or take an unpredictable jump - both tend to take about 5-10 clock cycles, so unless you have long runs of all ones or all zeros, the benefit of skipping the multiply is not great.

--
Mats

8. Floating point multiplication ought to be much faster than 5-10 cycles, but the conversion of the bit might still kill your performance.

9. According to:
http://www.amd.com/us-en/assets/cont...docs/40546.pdf
page 277: the fmul itself takes 4-6 cycles (latency time - there can be more than one FMUL in progress, so the processor won't stall until you actually MUST have the result).

But you also need to load the two values and add the result into the sum, so I guess that averages out to about 5-10 cycles.

--
Mats

10. You could try to SIMD the thing.

Only if testing reveals it to be a bottleneck, of course. Until then, go with the if. It's the clearest way to code it.

11. Originally Posted by CornedBee
You could try to SIMD the thing.

Only if testing reveals it to be a bottleneck, of course. Until then, go with the if. It's the clearest way to code it.
Or just multiply by zero.

Doing SIMD efficiently (rather than just using some instructions that don't work on all processors) most likely means writing the code in inline assembler, and getting the data out of the bitset in inline assembler will not be easy.

But profiling will also help identify the bottlenecks.

--
Mats

12. Nah, nowadays, SIMD is written using compiler intrinsics, not inline assembler. Plays much nicer with the optimizer, and not as much trouble with types.

13. Originally Posted by CornedBee
Nah, nowadays, SIMD is written using compiler intrinsics, not inline assembler. Plays much nicer with the optimizer, and not as much trouble with types.
In gcc this may actually work - but at least the last time I looked, Visual Studio's code generation for intrinsics left quite a bit to be desired, particularly with regard to register allocation and keeping data in registers - although I haven't tried VS 2008 (due to lack of XP on my home machine, and no need for Visual Studio at work).

--
Mats

14. But the situation is even more dire in VS, actually - the 64-bit compilers don't even support inline assembly.

15. Originally Posted by CornedBee
But the situation is even more dire in VS, actually - the 64-bit compilers don't even support inline assembly.
Yes, that too!

--
Mats