If you want to know which is faster, profile it; I cannot guarantee that the assembly version is faster.
void PlotPixel32(DWORD *Buffer,int x,int y,DWORD color,int bufwidth)
void PlotPixel32Ex(DWORD *Buffer,int x,int y,DWORD color,int bufwidth)
This only works if the color is defined as ARGB or XRGB.
Since y*bufwidth+x will never exceed 32 bits, you don't need to worry about what is in EDX, because it will be all zeros. MUL multiplies EAX by the r/m32 operand and leaves the result in EDX:EAX, where EDX holds the high-order 32 bits and EAX the low-order 32 bits. The CF and OF flags are cleared if EDX is zero and set otherwise (in this case EDX will be zero, so there is no need to check).
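The offset arithmetic described above can be sketched in plain C. This is only an illustration, not the original routine: the names mul_high32, pixel_offset and plot_pixel32_sketch are hypothetical, DWORD is assumed to be a 32-bit unsigned type, and bufwidth is assumed to be measured in pixels.

```c
#include <stdint.h>

typedef uint32_t DWORD;  /* assumption: stands in for the Windows DWORD type */

/* High 32 bits of y * bufwidth, i.e. what MUL would leave in EDX. */
static uint32_t mul_high32(int y, int bufwidth)
{
    uint64_t product = (uint64_t)(uint32_t)y * (uint32_t)bufwidth;
    return (uint32_t)(product >> 32);
}

/* Offset, in pixels, of (x, y) in a buffer that is bufwidth pixels wide.
   Only the low 32 bits (the EAX half) of the product are used. */
static uint32_t pixel_offset(int x, int y, int bufwidth)
{
    uint64_t product = (uint64_t)(uint32_t)y * (uint32_t)bufwidth;
    return (uint32_t)product + (uint32_t)x;
}

/* Hypothetical C equivalent of the plot: a single 32-bit store.
   Indexing a DWORD pointer scales the offset by 4 bytes, which is
   the job a shift left by 2 does in the assembly version. */
static void plot_pixel32_sketch(DWORD *buffer, int x, int y,
                                DWORD color, int bufwidth)
{
    buffer[pixel_offset(x, y, bufwidth)] = color;
}
```

For a 1920x1080 buffer the largest product is well under 2^32, so mul_high32 returns 0 for every valid row, which is why EDX can be ignored.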
#define RGBTODWORD(a,r,g,b) ((DWORD)(((DWORD)(a)<<24)+((DWORD)(r)<<16)+((DWORD)(g)<<8)+(DWORD)(b)))
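As a cross-check on the packing, here is a minimal function version of the same idea. It is a sketch under assumptions: pack_argb is a hypothetical name, DWORD is assumed to be a 32-bit unsigned type, and each channel is masked to 8 bits so that an out-of-range value cannot spill into its neighbour.

```c
#include <stdint.h>

typedef uint32_t DWORD;  /* assumption: stands in for the Windows DWORD type */

/* Pack four 8-bit channels into one 0xAARRGGBB value. */
static DWORD pack_argb(unsigned a, unsigned r, unsigned g, unsigned b)
{
    return ((DWORD)(a & 0xFFu) << 24) |
           ((DWORD)(r & 0xFFu) << 16) |
           ((DWORD)(g & 0xFFu) << 8)  |
            (DWORD)(b & 0xFFu);
}
```

For example, pack_argb(255, 255, 128, 0) gives 0xFFFF8000, an opaque orange. For XRGB the top byte is simply ignored by the hardware, so any alpha value will do.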
Since most cards support ARGB and XRGB natively, you are fairly safe in assuming this snippet will work on all current video hardware.
For DirectX you would simply use SURFACEDESC.MemPitch for the width value instead of the actual width of the screen. Because the pitch is already a byte count, y*pitch does not need to be shifted left by 2; only x still has to be scaled by the 4 bytes per pixel.
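Pitch-based addressing might look roughly like this in C. It is a sketch, not DirectX code: plot_pixel32_pitch and pitch_bytes are hypothetical names, the surface is just a block of memory, and the pitch is assumed to be a byte count that is a multiple of 4 and may be larger than width*4 because of row padding.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DWORD;  /* assumption: stands in for the Windows DWORD type */

/* Plot into a surface whose rows are pitch_bytes apart.
   The row start is found in bytes (no shift needed on y * pitch);
   x is then scaled by 4 through the DWORD index. */
static void plot_pixel32_pitch(void *surface, int x, int y,
                               DWORD color, size_t pitch_bytes)
{
    uint8_t *row = (uint8_t *)surface + (size_t)y * pitch_bytes;
    ((DWORD *)row)[x] = color;
}
```

With a surface 5 pixels wide but a pitch of 32 bytes, pixel (3, 2) lands at DWORD index 2*8+3 = 19, not 2*5+3 = 13, which is exactly the mistake that using the screen width instead of the pitch would make.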
In Windows you simply need to get the HDC for the window, then its client area, and write a 32-bit value to the client area, either through the API or by getting a pointer directly to the buffer.