C Board  

Old 08-10-2009, 01:08 PM   #1
3735928559
 
Join Date: Mar 2008
Posts: 662
fast checking for alignment

i need to pre-increment a pointer to be 16B aligned for use with sse2 functions, but i am somewhat concerned about the repeated use of the modulus. i have heard it is possible to check for alignment with the & operator, but i can't seem to get that to work.

Code:
//...
unsigned char* index1 = memberDataBlock;
unsigned char* index2 = someOtherDataBlock;

while(index1%16||index2%16)
{
    *index1++ |= *index2++;
}

//...
is there a faster way to achieve the condition above other than using the modulus? i don't need the actual remainder, i just need to know if there is one or not.


please feel free to move this to the C forum if it is deemed more appropriate
m37h0d is offline   Reply With Quote
Old 08-10-2009, 01:16 PM   #2
and the Hat of Guessing
 
tabstop's Avatar
 
Join Date: Nov 2007
Posts: 8,740
"while (index1&0xF)" perhaps?
tabstop is offline   Reply With Quote
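A note for anyone copying the one-liner above: standard C++ does not accept `&` or `%` applied directly to a pointer, so the address has to be converted to an integer first. Below is a minimal sketch, assuming `std::uintptr_t` is available (it is technically optional, but mainstream platforms provide it). The helper name `is_aligned16` is made up for illustration and does not come from the thread.

```cpp
#include <cstdint>

// Hypothetical helper (not from the thread): true when p sits on a
// 16-byte boundary. A 16-byte-aligned address has its low four bits clear.
inline bool is_aligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 0xF) == 0;
}
```

With this, the loop condition could be written as `while (!is_aligned16(index1))`.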
Old 08-10-2009, 01:20 PM   #3
3735928559
 
Join Date: Mar 2008
Posts: 662
aaaah yes. your use of hex has made my careless error immediately obvious.

index&0xF , not index&16

edit: thanks!

Last edited by m37h0d; 08-10-2009 at 01:24 PM.
m37h0d is offline   Reply With Quote
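Why `& 16` fails while `& 0xF` works: for a power of two n, `x % n` equals `x & (n - 1)`, because n - 1 has exactly the low bits set. Masking with 16 itself tests only bit 4. The sketch below checks this at compile time; `mod_pow2` is a hypothetical name, and it assumes n really is a power of two.

```cpp
#include <cstdint>

// Remainder modulo a power of two, computed with a mask.
// Assumption: n is a power of two (16 in this thread).
constexpr std::uintptr_t mod_pow2(std::uintptr_t x, std::uintptr_t n)
{
    return x & (n - 1);
}

// The mask form agrees with % for n = 16 ...
static_assert(mod_pow2(35u, 16u) == 35u % 16u, "x & 0xF matches x % 16");
// ... whereas masking with 16 itself gives the wrong answer both ways:
static_assert((16u & 16u) != 0, "an aligned address (16) looks misaligned");
static_assert((1u & 16u) == 0, "a misaligned address (1) looks aligned");
```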
Old 08-10-2009, 01:34 PM   #4
Senior software engineer
 
brewbuck's Avatar
 
Join Date: Mar 2007
Location: Portland, OR
Posts: 5,379
Quote:
Originally Posted by m37h0d View Post
i need to pre-increment a pointer to be 16B aligned for use with sse2 functions, but i am somewhat concerned about the repeated use of the modulus. i have heard it is possible to check for alignment with the & operator, but i can't seem to get that to work.

Code:
//...
unsigned char* index1 = memberDataBlock;
unsigned char* index2 = someOtherDataBlock;

while(index1%16||index2%16)
{
    *index1++ |= *index2++;
}
Nice infinite loop you have there. (The only way it isn't infinite is if (index1 & 0xF) == (index2 & 0xF).)
__________________
"Congratulations on your purchase. To begin using your quantum computer, set the power switch to both off and on simultaneously." -- raftpeople@slashdot
brewbuck is online now   Reply With Quote
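The reasoning behind that remark: both pointers advance by one on every pass, so the distance between them never changes. They can only reach a 16-byte boundary on the same step if they start at the same offset within a 16-byte block. Otherwise the loop keeps going past the end of the buffers. A rough sketch of a check that could be run before entering the loop follows; `co_aligned16` is a hypothetical name, not something from the thread.

```cpp
#include <cstdint>

// Hypothetical helper: true when a and b share the same offset within a
// 16-byte block, i.e. a lockstep byte-at-a-time walk can bring both to a
// 16-byte boundary on the same iteration.
inline bool co_aligned16(const void* a, const void* b)
{
    const std::uintptr_t ua = reinterpret_cast<std::uintptr_t>(a);
    const std::uintptr_t ub = reinterpret_cast<std::uintptr_t>(b);
    return ((ua ^ ub) & 0xF) == 0;  // low four bits are identical
}
```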
Old 08-10-2009, 01:38 PM   #5
3735928559
 
Join Date: Mar 2008
Posts: 662
you sure?
Code:
while(index1&0xF||index2&0xF)
"seems to work" /famous last words


Quote:
The only way it isn't infinite is if (index1 & 0xF) == (index2 & 0xF)
and yes, if i want them both to be 16 byte aligned, isn't that exactly the condition i'm looking for? O_o

Last edited by m37h0d; 08-10-2009 at 01:41 PM.
m37h0d is offline   Reply With Quote
Old 08-10-2009, 01:43 PM   #6
and the Hat of Guessing
 
tabstop's Avatar
 
Join Date: Nov 2007
Posts: 8,740
Quote:
Originally Posted by m37h0d View Post
you sure?
Code:
while(index1&0xF||index2&0xF)
"seems to work"
I'm with you. (This assumes that you only care about one of them being aligned. If you need them both to be aligned, then things could get rough.)
tabstop is offline   Reply With Quote
Old 08-10-2009, 01:55 PM   #7
3735928559
 
Join Date: Mar 2008
Posts: 662
yes, i need them both aligned. and brewbuck's right. the addresses in my test just happened to be 16B aligned already.
m37h0d is offline   Reply With Quote
Old 08-10-2009, 02:06 PM   #8
Senior software engineer
 
brewbuck's Avatar
 
Join Date: Mar 2007
Location: Portland, OR
Posts: 5,379
Quote:
Originally Posted by m37h0d View Post
yes, i need them both aligned. and brewbuck's right. the addresses in my test just happened to be 16B aligned already.
If you need them both aligned, and they are not already co-aligned to 16 bytes, then you cannot do it: advancing both pointers in step never changes the offset between them. You will either need to copy one data set to a location where it is 16-byte aligned, or use SSE2 shuffle instructions combined with an extremely hard-to-understand inner loop to handle the misalignment. Lastly, you could use the unaligned load/store which, while slower, might still be faster than the other two alternatives.

It's extremely difficult to write efficient SIMD algorithms when you aren't in control of the original alignment.
brewbuck is online now   Reply With Quote
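One way the unaligned-load option might look in practice: walk byte by byte until the destination is aligned, then use an aligned load/store for the destination and an unaligned load for the source. This is only a sketch under stated assumptions: an x86 target with SSE2 enabled, and a function name and signature (`or_bytes_sse2`) invented for illustration. It is not the original poster's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Sketch: dst[i] |= src[i] for n bytes, whatever the alignment of dst and src.
inline void or_bytes_sse2(unsigned char* dst, const unsigned char* src, std::size_t n)
{
    // Prologue: byte-wise until dst reaches a 16-byte boundary (at most 15 steps).
    while (n > 0 && (reinterpret_cast<std::uintptr_t>(dst) & 0xF) != 0) {
        *dst++ |= *src++;
        --n;
    }
    // Main loop: dst is aligned here; src may not be, so it gets the unaligned load.
    for (; n >= 16; n -= 16, dst += 16, src += 16) {
        const __m128i d = _mm_load_si128(reinterpret_cast<const __m128i*>(dst));
        const __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
        _mm_store_si128(reinterpret_cast<__m128i*>(dst), _mm_or_si128(d, s));
    }
    // Epilogue: remaining tail bytes.
    while (n > 0) {
        *dst++ |= *src++;
        --n;
    }
}
```

Whether this beats copying one buffer to an aligned location depends on the hardware and the buffer sizes, so it would need measuring.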
Old 08-10-2009, 02:21 PM   #9
3735928559
 
Join Date: Mar 2008
Posts: 662
in this case, i can be, but it may not be the case forever. the buffers in this case are data buffers for a piece of hardware. the driver for the particular piece of equipment has an option to use a user-defined buffer. future HW selections may not; thus necessitating nasty superfluous memcpys. i doubt that's really much of an issue though, because this is honestly the first piece of hw i've seen whose driver attempts to manage its own memory on the client side.
m37h0d is offline   Reply With Quote
Old 08-10-2009, 02:24 PM   #10
Senior software engineer
 
brewbuck's Avatar
 
Join Date: Mar 2007
Location: Portland, OR
Posts: 5,379
Quote:
Originally Posted by m37h0d View Post
in this case, i can be, but it may not be the case forever. the buffers in this case are data buffers for a piece of hardware. the driver for the particular piece of equipment has an option to use a user-defined buffer. future HW selections may not; thus necessitating nasty superfluous memcpys. i doubt that's really much of an issue though, because this is honestly the first piece of hw i've seen whose driver attempts to manage its own memory on the client side.
Most hardware buffers I've ever seen are at least 16-byte aligned and possibly quite a bit more. In particular, hardware buffers intended for direct memory mapping are usually page-aligned. Hardware designers like alignment just as much as you do.
brewbuck is online now   Reply With Quote
Old 08-10-2009, 02:51 PM   #11
3735928559
 
Join Date: Mar 2008
Posts: 662
yes, curious that it invariably was 16+B aligned, but the documentation doesn't guarantee it.

sadly the user-buffer option is not working according to spec :grumble:
m37h0d is offline   Reply With Quote
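Since the documentation does not guarantee the alignment, one defensive option is to check at run time and fall back to the copy mentioned earlier in the thread only when the check fails. A minimal sketch, assuming the caller supplies a 16-byte-aligned staging buffer of at least n bytes; `aligned_view16` and its parameters are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: return a 16-byte-aligned view of src.
// If src is already aligned it is returned untouched; otherwise its bytes are
// copied into staging, which the caller must provide aligned and >= n bytes.
inline const unsigned char* aligned_view16(const unsigned char* src,
                                           unsigned char* staging,
                                           std::size_t n)
{
    if ((reinterpret_cast<std::uintptr_t>(src) & 0xF) == 0) {
        return src;                // the usual case observed in the thread
    }
    std::memcpy(staging, src, n);  // the "superfluous memcpy" fallback
    return staging;
}
```

The copy only costs anything when the driver hands back a misaligned buffer, which the posts above suggest is rare.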