"Block index" is normally called "block offset". It is the part of the address that is passed through unchanged to select which byte within a cache row is used. The index is the portion of the physical address used to index into the cache. If the cache is multi-way associative, multiple cache rows share the same index, and a match on the tag bits determines which one to use. If the cache is fully associative, there are no index bits. As you mentioned, the number of tag bits is the number of physical address bits - (number of index bits + number of offset bits). In your case you only need 32 address bits.
Wiki article:
CPU cache - Wikipedia, the free encyclopedia
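As a rough sketch of the tag-bit formula above, the field widths can be derived from the cache geometry. The names here (field_widths, log2_pow2, cache_field_widths) are made up for illustration, and the code assumes the row size and the number of sets are powers of two:

```c
#include <stdint.h>

/* Widths of the three address fields, in bits. */
struct field_widths {
    unsigned offset_bits;
    unsigned index_bits;
    unsigned tag_bits;
};

/* log2 of n, assuming n is a power of two (n >= 1). */
static unsigned log2_pow2(uint32_t n)
{
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/* Hypothetical helper: derive field widths from the cache geometry. */
static struct field_widths cache_field_widths(unsigned address_bits,
                                              uint32_t cache_bytes,
                                              uint32_t row_bytes,
                                              uint32_t ways)
{
    struct field_widths w;
    /* Rows that share an index form one set, so sets = rows / ways. */
    uint32_t sets = cache_bytes / row_bytes / ways;

    w.offset_bits = log2_pow2(row_bytes);
    w.index_bits = log2_pow2(sets);
    /* tag bits = address bits - (index bits + offset bits) */
    w.tag_bits = address_bits - (w.index_bits + w.offset_bits);
    return w;
}
```

For example, cache_field_widths(32, 8192, 64, 4), i.e. an 8 KiB 4-way cache with 64-byte rows and 32-bit addresses, gives 6 offset bits, 5 index bits and 21 tag bits. Setting ways equal to the total number of rows (fully associative) leaves a single set and therefore 0 index bits.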
Using the first example in the wiki article, you have 21 tag bits, 5 index bits, and 6 offset bits. To split up a 32-bit unsigned address:
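offset = (address >> 0) & ((1u << 6) - 1);    /* bits 0-5: byte within the row */
index = (address >> 6) & ((1u << 5) - 1);     /* bits 6-10: which set of rows */
tag = (address >> 11) & ((1u << 21) - 1);     /* bits 11-31: compared against stored tags */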
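Wrapped up as a self-contained sketch, the same split might look like the following. The struct and function names (cache_addr, split_address, join_address) are invented for illustration, and the 6/5/21 widths are hard-coded from the example above:

```c
#include <stdint.h>

#define OFFSET_BITS 6   /* byte within a cache row */
#define INDEX_BITS  5   /* selects the set of rows */
#define TAG_BITS    21  /* remaining high-order bits */

struct cache_addr {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
};

/* Split a 32-bit address into its tag, index and offset fields. */
static struct cache_addr split_address(uint32_t address)
{
    struct cache_addr a;
    a.offset = address & ((UINT32_C(1) << OFFSET_BITS) - 1);
    a.index = (address >> OFFSET_BITS) & ((UINT32_C(1) << INDEX_BITS) - 1);
    a.tag = (address >> (OFFSET_BITS + INDEX_BITS)) & ((UINT32_C(1) << TAG_BITS) - 1);
    return a;
}

/* Inverse of split_address: rebuild the address from its fields. */
static uint32_t join_address(struct cache_addr a)
{
    return (a.tag << (OFFSET_BITS + INDEX_BITS)) | (a.index << OFFSET_BITS) | a.offset;
}
```

For the address 0x12345678 this yields offset 0x38, index 0x19 and tag 0x2468A, and join_address reassembles the original value.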