Tiger's leetcode solution: Wildcard Matching ~~~~~~~~~~~~~~~~~~~~~~~

Implement wildcard pattern matching with support for '?' and '*'.

'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false

A:
1: 先用递归，估计会嫌太慢。先做出来再说吧。

嗯，确实很慢。问题主要出在* 上。下面是递归的解法。

public class Solution {
    public boolean isMatch(String s, String p) {
        if(s.length() ==0){
            if(p.length()==0)
                return true;
            else
                return p.charAt(0) =='*'? isMatch(s, p.substring(1)) : false;
        }
        if(p.length() ==0)
            return false;

        if(s.charAt(0) == p.charAt(0) || p.charAt(0) == '?')
            return isMatch(s.substring(1), p.substring(1));
        else if( p.charAt(0) == '*' )
            return isMatch(s,p.substring(1)) || isMatch(s.substring(1),p);
        else
            return false;
    }
}

上面这个解法，对于输入：

"babbbaabbaaaaabbababaaaabbbbbbbbbbabbaaaabbababbabaa", "**a****a**b***ab***a*bab"

就会严重超时。

问题就出在倒数第二个return 语句上，|| 后面那个helper（）。当s只match了一个。这样效率很低。

Solution 2
贪心算法，只需要依据连续的’*'，将p分割成一些不带’*'的子串。然后在s中依次匹配就好了，只不过需要特殊照顾一下首尾两个子串：
1.对于开头不是’*'的p，第一个子串必须从s[0]开始匹配
2.对于结尾不是’*'的p，最后一个子串必须在s的尾巴上匹配

--------------------------确实快了很多，可是，当*后面的第一个字符串(非通配符）在s中match的次数过多的时候，速度会显著降低。-------------------这时候，我们就不需要匹配所有的。只需要找到匹配的即可。嗯，如果是最后的字串，需要匹配最后的位置。
还需要继续优化。

-------------------solution 3 -----------------------
我们提出solution 3，也就是把? 和其余的字符串放一起。
因此， p = "a???*a??b"被我们分成 [a, ???, * , a??b] 四个子pattern。这样出了第一个前面没有*外，别的子pattern前面都有*，可以随便移动位置。（当然，需要自己写字符匹配的函数。这个很简单）
----------------这里深入分析了8种解法----------- （还有代码）
--------------------- 这是第二遍的实现--------------思路跟solution3 一样-------只是更简洁了----

public class Solution {
    public boolean isMatch(String s, String p) {
        if (s == null || p == null) // should not occur
            return false;
        if (s.length() == 0)
            return p.length() == 0 || (p.length() == 1 && p.charAt(0) == '*');
        if (p.length() == 0)// if run this line, s.length() != 0
            return false;
        // check the first is *
        if (p.charAt(0) != '*') {
            int firstStartIndex = p.indexOf("*");
            if (firstStartIndex < 0) { // if p does not contains *
                return s.length() == p.length() && findFirstMatch(s, p) == 0;
            } else { // / p contains *
                String partP = p.substring(0, firstStartIndex);
                int matchPos = findFirstMatch(s, partP);
                if (matchPos != 0) { // if not fint at the first place
                    return false;
                } else { // we have find match at first place
                    s = s.substring(partP.length());
                    p = p.substring(partP.length());
                }
            }
        }
        String[] patternP = p.split("\\*");
        // all patternP[i] have a * ahead
        for (int i = 0; i < patternP.length; i++) {
            String nonStarP = patternP[i];
            int matchPos = findFirstMatch(s, nonStarP);
            if (matchPos < 0) {
                return false;
            } else {
                s = s.substring(matchPos + nonStarP.length());
            }
        }
        // if last have *,  or s is all matched
        return p.charAt(p.length() - 1) == '*' || s.length() == 0;
    }

    private int findFirstMatch(String s, String p) {
        // p does not contains *
        for (int i = 0; i <= s.length() - p.length(); i++) {
            // test substirng and p
            boolean isMatch = true;
            for (int j = 0; j < p.length(); j++) {
                char chS = s.charAt(i + j);
                char chP = p.charAt(j);
                if (chP != '?' && chS != chP) {
                    isMatch = false;
                    break;
                }
            }
            if (isMatch) { // if found match
                return i;
            }
        }
        return -1;
    }
}

Mistakes:
1: 当方法2的时候，没有考虑这种情况

Input：	"", ""
Output:	false
Expected:	true

2：方法2的时候，由于没有考虑到 p = "*"的情况。
而我们初始条件的startIndex = 0 是第一个valid情况。而每次startIndex跳到下一个情况（或者越出右边界一位）
这样，就产生了错误。

3：在用s的startIndex ==0 的初始情况的时候，没有考虑，当s为空的情况。导致error.

4: when the substring is "?", at first ,we simply skip to next, but it is WRONG. we need also check whether s has the character with it.

5: 当遇到?的时候，我们仅仅startIndex跳到下一个了，可是，没有注意
s="aa"
p ="*?"的情况时，我们需要多跳几个。因此，还是需要把*?的顺序，调整成？*

Learned:
1：好像在leetcode里，不用考虑，输入是null 的情况。
2:
s = "aa";
System.out.println( s.lastIndexOf("a", 0));
这里，很诡异的，输出竟然是0 而不是1. ----------------原因如下。

public int lastIndexOf(String str, int fromIndex)

Returns the index within this string of the last occurrence of the specified substring, searching backward starting at the specified index.

-----------------第二遍-----------
1：用 * 来split ，命令要写成： str.split("\\*"); -------- 要用双反斜线
2：注意， split函数，是可以得到数组的某个元素为 “”的。

--------------------------------下面是别人的代码------------ C++------------

Analysis:

For each element in s
If *s==*p or *p == ? which means this is a match, then goes to next element s++ p++.
If p=='*', this is also a match, but one or many chars may be available, so let us save this *'s position and the matched s position.
If not match, then we check if there is a * previously showed up,
if there is no *, return false;
if there is an *, we set current p to the next element of *, and set current s to the next saved s position.

e.g.

abed
?b*d**

a=?, go on, b=b, go on,
e=*, save * position star=3, save s position ss = 3, p++
e!=d, check if there was a *, yes, ss++, s=ss; p=star+1
d=d, go on, meet the end.
check the rest element in p, if all are *, true, else false;

Note that in char array, the last is NOT NULL, to check the end, use "*p" or "*p=='\0'".

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        const char* star=NULL;
        const char* ss=s;
        while (*s){
            if ((*p=='?')||(*p==*s)){s++;p++;continue;}
            if (*p=='*'){star=p++; ss=s;continue;}
            if (star){ p = star+1; s=++ss;continue;}
            return false;
        }
        while (*p=='*'){p++;}
        return !*p;
    }
};

-------------------------------------------------第三遍----------------想利用上面的想法，也用递归来做的。结果还是超时。----------------------------然后就又是按照* 来split，然后线性查找即可-------- 因此，这道题目，复杂度应该是在O(n)的--------------

对于前后的non- start string 的问题，我们事先匹配后，消去即可。

public class Solution { 
    public boolean isMatch(String s, String p) {
        if(p.length()==0)
            return s.length()==0;
        if(s.length()==0)
            return p.charAt(0)=='*' && isMatch(s,p.substring(1));
        if(p.charAt(0) !='*'){// delete all before first *
            return (p.charAt(0)=='?' || p.charAt(0)==s.charAt(0)) && isMatch(s.substring(1),p.substring(1));
        }
        if(p.charAt(p.length()-1) !='*'){ // delete all after last *
            char ch = p.charAt(p.length()-1);
            return (ch=='?' || ch ==s.charAt(s.length()-1)) && isMatch(s.substring(0,s.length()-1),p.substring(0,p.length()-1));
        }
        // now at the two end of p,  are *
        String[] P = p.split("\\*");
        int firstMatchPos = 0;
        for(int i =0;i< P.length; i++){
            if( P[i].length()==0)
                continue;
            firstMatchPos = findMatch(s, firstMatchPos,P[i]);
            if(firstMatchPos<0)
                return false;
            firstMatchPos += P[i].length();
        }
        return true;
    }
    private int findMatch(String s, int start, String p){// this p does not contain *
        for(int i =start;i <= s.length()-p.length();i++){
            //check the whole p
            boolean found = true;
            for(int j=0;j<p.length();j++){
                if( p.charAt(j) !='?' && p.charAt(j) != s.charAt(j+i)){
                    found = false;
                }
            }
            if(found)
                return i;
        }
        return -1;
    }
}

-------------------------------又一种思路---------------------
贪心的策略，能匹配就一直往后遍历，匹配不上了就看看前面有没有'*'来救救场，再从'*'后面接着试。
这个解法，其实就是我们最上面的哪种，但是，用两个index而不用递归，大大节省了时间。

public class Solution {
    public boolean isMatch(String s, String p) {
        int i = 0, j = 0;
        int star = -1, mark = -1;
        while (i < s.length()) {
            if (j < p.length()
                    && (p.charAt(j) == '?' || p.charAt(j) == s.charAt(i))) {
                ++i;
                ++j;
            } else if (j < p.length() && p.charAt(j) == '*') {
                star = j++;
                mark = i;
            } else if (star != -1) {
                j = star + 1;
                i = ++mark;
            } else {
                return false;
            }
        }
        while (j < p.length() && p.charAt(j) == '*') {
            ++j;
        }
        return j == p.length();
    }
}

Tiger's leetcode solution

Friday, November 8, 2013

Wildcard Matching ~~~~~~~~~~~~~~~~~~~~~~~

No comments:

Post a Comment

About Me

Blog Archive