Tuesday, October 8, 2013

Median of Two Sorted Arrays

Q:
There are two sorted arrays A and B of size m and n respectively.
Find the median of the two sorted arrays.
The overall run time complexity should be O(log (m+n)).
A:
方法的核心是将原问题转变成一个寻找第k小数的问题(假设两个原序列升序排列),这样中位数实际上是第(m+n)/2小的数。所以只要解决了第k小数的问题,原问题也得以解决。
首先假设数组A和B的元素个数都大于k/2,我们比较A[k/2-1]和B[k/2-1]两个元素,这两个元素分别表示A的第k/2小的元素和B的 第k/2小的元素。这两个元素比较共有三种情况:>、<和=。如果A[k/2-1]<B[k/2-1],这表示A[0]到A[k /2-1]的元素都在A和B合并之后的前k小的元素中。换句话说,A[k/2-1]不可能大于两数组合并之后的第k小值,所以我们可以将其抛弃。
-------简单的说,就是或者丢弃较大中位数的右区间,或者丢弃较小中位数的左区间。
-----------呵呵,自己想出来的, 可是,还是不自信, 没有code完,就去看别人的提示了。

---------------哎~~~~~~~ 踩到地雷了,以为很简单,没想到是难度为5 的题目------------

public class Solution {
       public double findMedianSortedArrays(int A[], int B[]) {
        int m = A.length, n = B.length;
        assert (!(n == m && n == 0)); // if both array are empty, WRONG
        int[] C = null;// C point to the non-empty array, while the other is
                        // empty
        if (m == 0) {
            C = B;
        } else if (n == 0) {
            C = A;
        }
        if (C != null) { // if A is empty
            return (C.length % 2 == 1) ? C[C.length / 2]
                    : (C[C.length / 2] + C[C.length / 2 - 1]) / 2.0;
        }
        // NOW neither A or B is empty
        if ((m + n) % 2 == 1) {
            return findKth(A, 0, m - 1, B, 0, n - 1, (m + n) / 2 + 1);
        } else {
            int leftMedian = findKth(A, 0, m - 1, B, 0, n - 1, (m + n) / 2);
            int rightMedian = findKth(A, 0, m - 1, B, 0, n - 1, (m + n) / 2 + 1);
            return (leftMedian + rightMedian) / 2.0;
        }
    }

    private int findKth(int[] a, int lowA, int highA, int[] b, int lowB,
            int highB, int k) {
        // GOAL: each time, we either cut the half left of the smaller median
        // or the right part of the bigger median array
        if (k == 1) {
            return Math.min(a[lowA], b[lowB]);
        } else {
            // if one array is empty
            if (lowA > highA) { // A is empty, search in B
                return b[lowB + k - 1];
            }
            if (lowB > highB) {// B is empty, search in A
                return a[lowA + k - 1];
            }

            int midA = (lowA + highA) / 2;
            int midB = (lowB + highB) / 2;

            // two pointers pointing , according to the value of the median
            int[] medianSmall = a;
            int smallLow = lowA;
            int smallMid = midA;
            int smallHigh = highA;

            int[] medianBig = b;
            int bigLow = lowB;
            int bigMid = midB;
            int bigHigh = highB;

            // make sure that small point to small
            if (medianSmall[smallMid] > medianBig[bigMid]) {
                medianSmall = b;
                medianBig = a;

                smallLow = lowB;
                smallMid = midB;
                smallHigh = highB;

                bigLow = lowA;
                bigMid = midA;
                bigHigh = highA;
            }
            // check value of k with the size of their half
            int smallLeftLength = smallMid - smallLow + 1;
            // smallMid is counted, and != 0
            int bigLeftLength = bigMid - bigLow + 1; // bigMid is counted;
            if (k <= smallLeftLength + bigLeftLength) {
                // discard the right part of the medianBig array
                // NOTE that the length of the right part of the medianBig array
                // could be ZERO.
                if (bigMid < bigHigh) { // right part is not of length 0
                    return findKth(medianSmall, smallLow, smallHigh, medianBig,
                            bigLow, bigMid, k);
                } else { // right part is of length 0, i.e. bigLow == bigHigh
                            // now,we can (imaginely) insert this value into the
                            // range of medianSmall array
                    int biggerMedian = medianBig[bigMid];// first candicates
                    if (k == smallLeftLength + 1) {
                        return (smallMid == smallHigh) ? biggerMedian : Math
                                .min(biggerMedian, medianSmall[smallMid + 1]);
                    } else {
                        return medianSmall[smallLow + k - 1];
                    }
                }
            } else { // discard the left part of the medianSmall array (
                        // length>0)
                return findKth(medianSmall, smallMid + 1, smallHigh, medianBig,
                        bigLow, bigHigh, k - smallLeftLength);
            }
        }
    }
}


Mistakes:
TMD: median 的定义是: If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values

1: 因为我们后来在处理一个数组结束之后,只寻找另一个数组的情况的时候。 用了一个唯一的值等于Interger.Max_VALUE。   所以, 当k == 1 时,我们要考虑Interger.MAX_VALUE 对我们的影响。   因此,不能简单得返回 Math.max(a[lowA],b[lowB])  , 要改成代码中所示。

2: 当我面在递归函数里, 处理 k == 1的时候,忘记了先检验,是否数组不为空。

3:  当递归的时候, 需要处理一个情况, 就是, pointerSmall 的左边  和pointerBig的右边,如果为空的话,就会导致out of stack exception.
因此,需要先处理那些情况。

4: 要记得检测, 总数是奇数还是偶数--------导致的计算方法是不一样的


5: 哎呀, TMD,疯了,  就因为,当时, 也不知道怎么想的。可能是半夜,脑子不清醒。
当k==1的时候,我们应该取两个数组里面,各自第一个里的值的较小的呀。
应该是Math.min    ------可是自己当时,写的是Math.max    花费了一次半值班的时间, 大约是4,5个小时的宝贵时间。    得不偿失啊。  Tiger

6: when  discarding the right part of the medianBig array, we should check first the length of the to-be-discarded part.
因此这种情况下, medianBig 数组的bigHigh == bigSmall,我们就把它的值单独列出来,并对比medianSmall即可。 (否则会出现死循环调用,导致out of stack memory ,)


Learned:
1:  当 一个int 值, 例如,
int a = 3;
a/2 的值,默认还是int类型。 因此,当我们需要返回一个带小数的double类型的话,需要除以2.0

---------------------------第二遍 ---------------------------
思路同上, 只是没有用medianBig, medianSmall来区分罢了。
public class Solution {
    public double findMedianSortedArrays(int A[], int B[]) {
        int m = A.length, n = B.length;
        return  (findKth(A,0,m-1,B,0,n-1,(m+n)/2)+findKth(A,0,m-1,B,0,n-1,(m+n)/2+1))/2.0;
    }
    public int findKth(int[] A, int lowA,int highA, int[] B, int lowB,int highB,int kth){
        if(lowA>highA)
            return B[lowB+kth-1];
        if(lowB>highB)
            return A[lowA+kth-1];
        int leftACount = (lowA+highA)/2 - lowA;
        int leftBCount = (lowB+highB)/2 - lowB;
        if( A[(lowA+highA)/2] >= B[(lowB+highB)/2] ){ // median of A >= median of B
            if(leftACount+leftBCount+1 >= kth)// discard right part of A, including median A
                return findKth(A,lowA,  (lowA+highA)/2 -1,B,lowB,highB,kth);
            else // discard left part of B
                return findKth(A,lowA, highA, B, (lowB+highB)/2 +1, highB, kth - leftBCount -1 );
        }else{ // median of A  < median of B
            if(leftACount+leftBCount+1 >= kth)// discard right part of B, including median B
                return findKth(A,lowA,  highA ,B,lowB,(lowB+highB)/2-1,kth);
            else // discard left part of A
                return findKth(A,(lowA+highA)/2+1, highA, B, lowB, highB, kth - leftACount -1 );
        }
    }
}
Mistakes:
1: kth这个参数, 第一遍是按照0 based考虑的。 那么, 在对比leftCount+rightCount 和 kth的大小的时候, 就不需要加1了。  ----------(这个问题很快发现, 然后就调整了)
2:由于在 if 语句中(20行左右),写 (lowB+highB)/2] 的时候,拷贝了上面,  然后发现多了个 ‘】’, 就随手把if (这里的】 也去掉了,  ) complier error,  就又随手加上了 ]  ,。  结果加到/2 后面去了 。  哎, 结果  傻看了40多分钟,才弄明白。    哎, 人生阿~~~~~~~
状态不好的时候, 就不要去coding, 犯的SB错误,都会搞死我的时间

############ 3 rd pass##################
public class Solution {


    public double findMedianSortedArrays(int[] nums1, int[] nums2) {
        int m = nums1.length, n = nums2.length;
        return (  helper(nums1,0,m-1,nums2,0,n-1,(m+n)/2 ) + helper(nums1,0,m-1,nums2,0,n-1,(m+n-1)/2) )/2.0;
    }
    public int helper(int[] A, int startA, int endA, int B[], int startB, int endB, int k ){
        // find the kth in A and B ( k start from ==0 )
        if(startA > endA)
            return B[startB+k];
        if(startB > endB)
            return A[startA + k];
        int midA = (startA+endA)/2, midB = (startB+endB)/2;
        int nLeft = midA-startA + midB - startB +1; // inclusive the smaller middle value
        if( A[midA] <= B[midB]){
            if( nLeft >= k+1  ){// exclude the right part of B, include midB
                return helper(A,startA,endA, B, startB,midB-1,k);
            }else{//delete the left part of A, include midA
                return helper(A,midA+1, endA, B, startB, endB, k-(midA-startA+1));
            }
        }else{ // switch A and B 
            return helper(B,startB,endB, A, startA, endA, k);
        }
    }
}

改进是, 倒数第二个if, 交换了A,B 两个数组,这样,少写一遍最容易出错的代码

Mistakes:
这里最容易犯的错误,  第一个if里,开始写成了 nLeft >= index

No comments:

Post a Comment