This example matches the first example from the wiki article.
In the example you linked to, after the first pass the first pair of the second half is descending, so the right half does not contain bitonic sub-sequences. In both the wiki and overview algorithms, after the first pass there are 4 bitonic sequences of size 4. After the second pass, there are 2 bitonic sequences of size 8. After the third pass, there is 1 bitonic sequence of size 16. After the fourth pass, the data is sorted.
Code:
12 04 09 07 05 13 08 01 10 14 06 11 15 02 00 03
04 12 | 09 07 | 05 13 | 08 01 | 10 14 | 11 06 | 02 15 | 03 00
04 07 09 12 | 08 13 05 01 | 10 06 11 14 | 03 15 02 00
04 07 | 09 12 | 13 08 | 05 01 | 06 10 | 11 14 | 15 03 | 02 00
04 07 05 01 13 08 09 12 | 15 10 11 14 06 03 02 00
04 01 05 07 | 09 08 13 12 | 15 14 11 10 | 06 03 02 00
01 04 | 05 07 | 08 09 | 12 13 | 15 14 | 11 10 | 06 03 | 02 00
01 04 05 07 06 03 02 00 15 14 11 10 08 09 12 13
01 03 02 00 06 04 05 07 | 08 09 11 10 15 14 12 13
01 00 02 03 | 05 04 06 07 | 08 09 11 10 | 12 13 15 14
00 01 | 02 03 | 04 05 | 06 07 | 08 09 | 10 11 | 12 13 | 14 15
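In case it helps to reproduce the trace above, here is a minimal Python sketch of the same compare-exchange schedule. The function name `bitonic_trace` and the index-XOR formulation are my own choices for illustration, not something taken from the wiki article, and it assumes the input length is a power of two. It records the array after every phase, so for the 16 values above it should produce the same 10 lines.

```python
def bitonic_trace(data):
    """Return the array after every compare-exchange phase.

    Hypothetical helper for illustration; assumes len(data) is a power of two.
    """
    a = list(data)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    steps = []
    k = 2                          # size of the sequences produced by this pass
    while k <= n:
        j = k // 2                 # distance between compared elements in this phase
        while j >= 1:
            for i in range(n):
                partner = i ^ j    # element at distance j within the same block
                if partner > i:    # handle each pair once
                    ascending = (i & k) == 0   # direction alternates per block of size k
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            steps.append(list(a))
            j //= 2
        k *= 2
    return steps


start = [12, 4, 9, 7, 5, 13, 8, 1, 10, 14, 6, 11, 15, 2, 0, 3]
for row in bitonic_trace(start):
    print(" ".join(f"{v:02d}" for v in row))
```

The outer loop corresponds to the four passes and the inner loop to the phases within a pass, so the number of recorded rows is 1 + 2 + 3 + 4 = 10.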
Update - The purpose of using bitonic sequences for a hardware-based sort is that within each phase of each pass of the sort, the distance between compared values is fixed, which would reduce variation in propagation delays. I'm not sure if there's any advantage for a software-based sort.
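To make the fixed-distance point concrete, here is a small Python sketch that lists the compared index pairs for each phase of a 16-input network and checks that every pair in a phase is the same distance apart. The name `phase_pairs` is hypothetical, and this only models the index pattern, not any actual hardware timing.

```python
def phase_pairs(n):
    """Yield (k, j, pairs) for each phase of a bitonic network on n inputs.

    Hypothetical helper; assumes n is a power of two. k is the sequence size
    being built in the pass, j is the compare distance for the phase.
    """
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            # i ^ j > i exactly when bit j of i is clear, so the partner is i + j
            pairs = [(i, i ^ j) for i in range(n) if i ^ j > i]
            yield k, j, pairs
            j //= 2
        k *= 2


for k, j, pairs in phase_pairs(16):
    distances = {hi - lo for lo, hi in pairs}
    print(f"size {k:2d}, distance {j}: {len(pairs)} comparisons, distances {distances}")
```

Each phase has n/2 independent comparisons, all at a single distance, which is the property the paragraph above refers to.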