1. Allow the smallest scale factor to be 1/16 instead of 1/15
2. Fix potential heap corruption (out-of-bounds access) when scaling down to a smaller size
3. Fix mismatched writeback and invalidate data sizes when in_bg/fg_buffer and out_buffer
are the same buffer and the L2 cache line size is larger than the L1 cache line size
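
A minimal sketch of the idea behind item 3, not the actual driver code: when the same buffer is both written back and invalidated, both operations should cover the same length, rounded up to the larger of the two cache line sizes. The helper names `align_up` and `cache_sync_len` are hypothetical, and the sketch assumes the buffer start address is already cache-line aligned and that line sizes are powers of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round size up to a multiple of align.
 * Assumes align is a non-zero power of two. */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: length to pass to BOTH the writeback and the
 * invalidate call when input and output share one buffer.
 * Using the larger of the L1 and L2 line sizes keeps the two
 * operations covering the same range of bytes. */
static size_t cache_sync_len(size_t buf_len, size_t l1_line, size_t l2_line)
{
    size_t line = (l1_line > l2_line) ? l1_line : l2_line;
    return align_up(buf_len, line);
}
```

For example, with a 64-byte L1 line and a 128-byte L2 line, a 130-byte buffer would be synced over 256 bytes for both operations, rather than 192 bytes for one and 256 bytes for the other.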