Counting inversion

Finding "similarity" between two rankings. Given a sequence of n numbers 1..n (assume all numbers are distinct). Define a measure that tells us how far this list is from being in ascending order. The value should be 0 if a_1 < a_2 < ... < a_n and
should be higher as the list is more "out of order".

Define the number of inversion

i, j form an inversion if a_i > a_j, that is, if the two elements a_i and a_j are "out of order".

Comparing two rankings is counting the number of inversion in the sequence a_1.. a_n.

Example

2 4 1 3 5
1 2 3 4 5

The sequence 2, 4, 1, 3, 5 has three inversions (2,1), (4,1), (4,3).

Algorithm to count inversion

Use divide and conquer

divide: size of sequence n to two lists of size n/2
conquer: count recursively two lists
combine: this is a trick part (to do it in linear time)

combine use merge-and-count. Suppose the two lists are A, B. They are already sorted. Produce an output list L from A, B while also counting the number of inversions, (a,b) where a is-in A, b is-in B and a > b.

The idea is similar to "merge" in merge-sort. Merge two sorted lists into one output list, but we also count the inversion.

Everytime a_i is appended to the output, no new inversions are encountered, since a_i is smaller than everything left in list B. If b_j is appended to the output, then it is smaller than all the remaining items in A, we increase the number of count of inversions by the number of elements remaining in A.

merge-and-count(A,B)
; A,B two input lists (sorted)
; C output list
; i,j current pointers to each list, start at beginning
; a_i, b_j elements pointed by i, j
; count number of inversion, initially 0

while A,B != empty
    append min(a_i,b_j) to C
    if b_j < a_i
       count += number of element remaining in A
       j++
    else
       i++
; now one list is empty
append the remainder of the list to C
return count, C

With merge-and-count, we can design the count inversion algorithm as follows:

sort-and-count(L)
if L has one element return 0
else
     divide L into A, B
     (rA, A) = sort-and-count(A)
     (rB, B) = sort-and-count(B)
     (r, L) = merge-and-count(A,B)
return r = rA+rB+r, L

T(n) = O(n lg n)

End