Current algorithm has complexity O( |f|^2 * max(b[i]) ) , where f - size of file in blocks and b[i] - number of blocks from index with same hash as for block i from file. So in a worst case it might lead to O( f^3 ), e.g. if file contains a lot of identical blocks. In such case Sonar CPD might take a lot of time, whereas PMD CPD not.
Suffix-tree will have complexity O( k ) , where k - total number of blocks (including from index). Also note that this is space-time tradeoff, i.e. suffix-tree will consume a bit more memory in comparison with previous algorithm, but still far less than PMD.