When you get my source, use the DXEngine.CPP and DXEngine.h files. Create a new project and change the screen mode and resolution to 320x200x256. That code already retrieves a pointer to the buffer or surface and a pointer to the backbuffer. Do all of your drawing through the backbuffer pointer, then flip (copy) the backbuffer to the surface that represents the screen. Here is the copy in assembly. I forgot the actual names of my buffers, so just substitute them in where appropriate.

Copies the BackBuffer to the Surface 32 bits (one DWORD) at a time, using 32-bit pointers.
Code:
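asm {
   push  edi
   push  esi
   mov   esi,[BackBuffer]   //source: 32-bit flat pointer (lds/les expect a far pointer and would clobber ds/es)
   mov   edi,[Surface]      //destination: 32-bit flat pointer
   mov   ecx,03E80h         //16000 DWORDs in 64000 bytes (320x200, 1 byte per pixel)
   cld                      //make sure movsd copies forward
   rep   movsd
   pop   esi
   pop   edi
}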
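
If you would rather not use inline assembly, here is a rough C++ sketch of the same copy. The names (CopyBackBufferToSurface, kWidth, kHeight, surfacePitch) are made up for illustration and are not from DXEngine. It assumes the backbuffer is a tightly packed 320x200 block of 8-bit pixels, and it also allows for a surface whose rows are wider than 320 bytes, which the assembly above does not handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical constants for the 320x200x256 mode (1 byte per pixel).
constexpr std::size_t kWidth  = 320;
constexpr std::size_t kHeight = 200;

// Copies a tightly packed 8-bit backbuffer to a surface.
// surfacePitch is the number of bytes from the start of one surface row
// to the start of the next; it is an assumption that your surface reports one.
void CopyBackBufferToSurface(const std::uint8_t* backBuffer,
                             std::uint8_t* surface,
                             std::size_t surfacePitch)
{
    assert(surfacePitch >= kWidth);

    if (surfacePitch == kWidth) {
        // No row padding: one 64000-byte copy, same result as rep movsd.
        std::memcpy(surface, backBuffer, kWidth * kHeight);
        return;
    }

    // Rows are padded: copy one 320-byte row at a time and skip the padding.
    for (std::size_t y = 0; y < kHeight; ++y) {
        std::memcpy(surface + y * surfacePitch,
                    backBuffer + y * kWidth,
                    kWidth);
    }
}
```

Call it once per frame after you finish drawing, for example CopyBackBufferToSurface(BackBuffer, Surface, 320) when the surface has no row padding. A compiler's memcpy is usually about as fast as rep movsd, so this is mostly a matter of taste.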