I'm on a Mac myself, and I think the function that actually does the strlen is "__dyld_strlen". Here is its disassembly:
Code:
0x00007fff5fc22e90 <__dyld_strlen+0>: pxor %xmm0,%xmm0
0x00007fff5fc22e94 <__dyld_strlen+4>: mov %edi,%ecx
0x00007fff5fc22e96 <__dyld_strlen+6>: mov %rdi,%rdx
0x00007fff5fc22e99 <__dyld_strlen+9>: and $0xfffffffffffffff0,%rdi
0x00007fff5fc22e9d <__dyld_strlen+13>: or $0xffffffffffffffff,%eax
0x00007fff5fc22ea0 <__dyld_strlen+16>: pcmpeqb (%rdi),%xmm0
0x00007fff5fc22ea4 <__dyld_strlen+20>: and $0xf,%ecx
0x00007fff5fc22ea7 <__dyld_strlen+23>: shl %cl,%eax
0x00007fff5fc22ea9 <__dyld_strlen+25>: pmovmskb %xmm0,%ecx
0x00007fff5fc22ead <__dyld_strlen+29>: and %eax,%ecx
0x00007fff5fc22eaf <__dyld_strlen+31>: je 0x7fff5fc22ebb <__dyld_strlen+43>
0x00007fff5fc22eb1 <__dyld_strlen+33>: bsf %ecx,%eax
0x00007fff5fc22eb4 <__dyld_strlen+36>: sub %rdx,%rdi
0x00007fff5fc22eb7 <__dyld_strlen+39>: add %rdi,%rax
0x00007fff5fc22eba <__dyld_strlen+42>: retq
0x00007fff5fc22ebb <__dyld_strlen+43>: pxor %xmm0,%xmm0
0x00007fff5fc22ebf <__dyld_strlen+47>: add $0x10,%rdi
0x00007fff5fc22ec3 <__dyld_strlen+51>: movdqa (%rdi),%xmm1
0x00007fff5fc22ec7 <__dyld_strlen+55>: add $0x10,%rdi
0x00007fff5fc22ecb <__dyld_strlen+59>: pcmpeqb %xmm0,%xmm1
0x00007fff5fc22ecf <__dyld_strlen+63>: pmovmskb %xmm1,%ecx
0x00007fff5fc22ed3 <__dyld_strlen+67>: test %ecx,%ecx
0x00007fff5fc22ed5 <__dyld_strlen+69>: je 0x7fff5fc22ec3 <__dyld_strlen+51>
0x00007fff5fc22ed7 <__dyld_strlen+71>: sub $0x10,%rdi
0x00007fff5fc22edb <__dyld_strlen+75>: jmp 0x7fff5fc22eb1 <__dyld_strlen+33>
I have to admit I didn't read all of it, but it clearly uses SSE2 (pxor, pcmpeqb, pmovmskb and movdqa on the %xmm registers), although I've never used SSE2 myself.
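For anyone who finds the dump hard to follow, here is a rough C sketch of what I believe it is doing, written with SSE2 intrinsics. The name sse2_strlen is made up for illustration, and I'm assuming an x86 compiler with SSE2 enabled plus GCC or Clang for __builtin_ctz. It is my reading of the technique, not Apple's actual source.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a sketch of the technique in the dump, not Apple's code. */
size_t sse2_strlen(const char *s)
{
    const __m128i zero = _mm_setzero_si128();           /* pxor %xmm0,%xmm0 */
    uintptr_t start = (uintptr_t)s;
    uintptr_t block = start & ~(uintptr_t)15;           /* round down to 16 bytes */
    unsigned offset = (unsigned)(start & 15);           /* position of s in the block */

    /* Compare all 16 bytes with zero at once (pcmpeqb), then collapse the
     * result into a 16-bit mask, one bit per byte (pmovmskb). The aligned
     * load may read bytes before s, but it never crosses a page boundary,
     * which is why the assembly can get away with it. Strictly speaking it
     * is outside what the C standard allows, and tools such as
     * AddressSanitizer may complain about it. */
    unsigned mask = (unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)block), zero));

    /* Drop matches that lie before the start of the string (shl + and). */
    mask &= ~0u << offset;

    /* Main loop: one aligned 16-byte block per iteration (movdqa). */
    while (mask == 0) {
        block += 16;
        mask = (unsigned)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)block), zero));
    }

    /* The lowest set bit is the first zero byte in the block (bsf). */
    return (size_t)(block - start + (uintptr_t)__builtin_ctz(mask));
}
```

Calling sse2_strlen("hello") should give the same answer as the normal strlen; the point is only that it examines 16 bytes per iteration instead of one.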
As for it being unportable: whether the target processor supports these instructions is probably determined at compile time. If it doesn't, the "slower but more portable" version is used; if it does, the SSE2 version is used. (This particular dump is x86-64 code, where SSE2 is always available.) That's how most code that needs maximum speed works. Big-integer libraries, for instance, usually ship dozens of implementations, one for each supported architecture.
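To make the compile-time selection concrete, here is a minimal sketch of how a library might do it. The name my_strlen is hypothetical, and I'm assuming GCC or Clang, which define __SSE2__ when SSE2 is enabled and provide __builtin_ctz. Real libraries usually keep each variant in its own per-architecture source file, but the idea is the same.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Fast variant, compiled only when the target is known to have SSE2. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    /* Walk byte by byte until p is 16-byte aligned. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Then test 16 bytes per iteration with aligned loads. */
    const __m128i zero = _mm_setzero_si128();
    for (;;) {
        int mask = _mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)p), zero));
        if (mask != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz((unsigned)mask);
        p += 16;
    }
}

#else

/* "Slower but more portable" variant for every other target. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

#endif
```

Callers just use my_strlen and never see which version was built. The decision is made once by the preprocessor, so there is no runtime cost for the check.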