Monday, December 15, 2014

Data scientist / Machine Learning Engineer 相关面试题(转)


去年我找工作的时候发现板上针对data scientist,machine learning engineer面试
题总结很少,所以尽量申请了很多公司面试相关职位,想看看行业里这个方向都在问什
么。有幸去过不少地方面试,现在把那些题目整理整理(全部来自Amazon, Microsoft,
Yelp, Pinterest,
Square, Google, Glassdoor, Groupon的电面和onsite),希望能帮助在找相关工作的
同学们。

题目写的简略,请大家见谅

====================


1. Given a coin you don’t know it’s fair or unfair. Throw it 6 times and
get 1 tail and 5 head. Determine whether it’s fair or not. What’s your
confidence value?

2. Given Amazon data, how to predict which users are going to be top
shoppers in this holiday season.

3. Which regression methods are you familiar? How to evaluate regression
result?

4. Write down the formula for logistic regression. How to determine the
coefficients given the data?

5. How do you evaluate regression?
For example, in this particular case:
item click-through-rate  predicted rate
1       0.04        0.06
2       0.68        0.78
3       0.27        0.19
4       0.52        0.57


6. What’s the formula for SVM? What is decision boundary?

7. A field with unknown number of rabbits. Catch 100 rabbits and put a label
on each of them. A few days later, catch 300 rabbits and found 60 with
labels. Estimate how many rabbits are there? 

8. Given 10 coins with 1 unfair coin and 9 fair coins. The unfair coin has &
#8532; prob. to be head. Now random select 1 coin and throw it 3 times. You
observe head, head, tail. What’s the probability that the selected coin is
the unfair one?

9. What’s the formula for Naive Bayesian classifier? What’s the assumption
in the formula? What kind of data is Naive Bayesian good at? What is not?

10. What is the real distribution of click-through rate of items? If you
want to build a predictor/classifier for this data, how do you do it? How do
you divide the data?

11. You have a stream of data coming in, in the format as the following:
item_id, views, clicks, time
1            100     10         2013-11-28
1            1000   350       2013-11-29
1            200     14         2013-11-30
2            127     13         2013-12-1


The same id are consecutive.

Click through rate = clicks / views.
On every day, I want to output the item id when its click through rate is
larger than a given threshold.
For example, at day 1, item 1’s rate is 10/100=10%, day2, its (10+350)/(100
+1000)=0.32. day3 it is (10+350+14)/(100+1000+200)=0.28.
If my threshold is 0.3, then at day 1, I don’t output. On day2 I output. On
day3, I don’t output.

11. Given a dictionary and a string. Write a function, if every word is in
the dictionary return true, otherwise return false.

12. Generate all the permutation of a string.
For example, abc, acb, cba, …

13. We want to add a new feature to our product. How to determine if people
like it?
A/B testing. How to do A/B testing? How many ways? pros and cons?

14. 44.3% vs 47.2% is it significant? 

15. Design a function to calculate people’s interest to a place against the
distance to the place.

16. How to encourage people to write more reviews on Yelp? How to determine
who are likely to write reviews? How to increase the registration rate of
Yelp? What features to add for a better Yelp app? We are expanding to other
countries. Which country we should enter first?

17. What’s the difference between classification and regression?

18. Can you explain how decision tree works? How to build a decision tree
from data?

19. What is regularization in regression? Why do regularization? How to do
regularization?

20. What is gradient descent? stochastic gradient descent?

21. We have a database of <product_id, name, description, price>. When user
inputs a product name, how to return results fast?

22. If user gives a budget value, how to find the most expensive product
under budget? Assume the data fits in memory. What data structure, or
algorithm you use to find the product quickly? Write the program for it.

23. Given yelp data, how to find top 10 restaurants in America?

24. Given a large file that we don’t know how many lines are there. It
doesn’t fit into memory. We want to sample K lines from the file uniformly.
Write a program for it.

25. How to determine if one advertisement is performing better than the
other?

26. How to evaluate classification result? What if the results are in
probability mode?
If I want to build a classifier, but the data is very unbalanced. I have a
few positive samples but a lot of negative samples. What should I do?

27. Given a lot of data, I want to random sample 1% of them. How to do it
efficiently?

28. When a new user signs up Pinterest, we want to know its interests. We
decide to show the user a few pins, 2 pins at a time. Let the user choose
which pin s/he likes. After the user clicks on one of the 2, we select
another 2 pins.
Question: how to design the system and select the pins so that we can
achieve our goal?

29. Write a function to compute sqrt(X). Write a function to compute pow(x,
n) [square root and power)


30. Given a matrix
a b c  d
e f  g  h
i  j  k   l
Print it in this order:
a  f  k
b g l
c h
d
e j
i

31. Given a matrix and an array of words, find if the words are in the
matrix. You can search the

matrix in all directions:  from left to right, right to left, up to down,
down to up, or diagonally.
For example
w o r x b
h  e l  o v
i   n d e m

then the word “world” is in the matrix.


32. Given a coordinates, and two points A and B. How many ways to go from A
to B? You can only move up or right.
For example, from (1, 1) to (5, 7), one possible way is 1,1 -> 2, 1… 5, 1 -
> 5,2 -> ..5, 7


33. In a city where there are only vertical and horizontal streets. There
are people on the cross point. These people want to meet. Please find a
cross point to minimize the cost for all the people to move.

34. Design a job search ranking algorithm on glassdoor

35. How to identify review spam?

36. Glassdoor has this kind of data about a job : (position, company,
location, salary). For example (Software Engineer, Microsoft, Seattle, $125K
). For some records, all four entires are available. But for others, the
salary is missing. Design a way to estimate salary for those records.

37. When to send emails to users in a day can get maximum click through rate?

38. Youtube has video play log like this:
Video ID, time
vid1        t1
vid2        t2
...           ...
The log is super large.
Find out the top 10 played videos on youtube in a given week.

39. Write a program to copy a graph

40. A bank has this access log:
IP address, time
ip1      t1
ip2      t2
...        ...

If one ip accessed K times within m seconds, it may be an attack.
Given the log, identify all IPs that may cause attack.

Tuesday, December 2, 2014

Singleton Design Pattern – An Introspection w/ Best Practices

Singleton is a part of Gang of Four design pattern and it is categorized under creational design patterns.

In this article we are going to take a deeper look into the usage of the Singleton pattern. It is one of the most simple design pattern in terms of the modelling but on the other hand this is one of the most controversial pattern in terms of complexity of usage.

In Java the Singleton pattern will ensure that there is only one instance of a class is created in the Java Virtual Machine. It is used to provide global point of access to the object. In terms of practical use Singleton patterns are used in logging, caches, thread pools, configuration settings, device driver objects. Design pattern is often used in conjunction with Factory design pattern. This pattern is also used in Service Locator JEE pattern.

Structure:
Singleton Class Diagram Singleton Class Diagram
  • Static member : This contains the instance of the singleton class.
  • Private constructor : This will prevent anybody else to instantiate the Singleton class.
  • Static public method : This provides the global point of access to the Singleton object and returns the instance to the client calling class.

Implementation Example: Lazy initialization

Let us look into a singleton implementation example in Java. The below code uses Lazy initialization process.

public class SingletonExample {
 
    // Static member holds only one instance of the
    // SingletonExample class
    private static SingletonExample singletonInstance;
 
    // SingletonExample prevents any other class from instantiating
    private SingletonExample() {
    }
 
    // Providing Global point of access
    public static SingletonExample getSingletonInstance() {
        if (null == singletonInstance) {
            singletonInstance = new SingletonExample();
        }
        return singletonInstance;
    }
 
    public void printSingleton(){
        System.out.println("Inside print Singleton");
    }
}
Singleton Pattern Code Explanation Singleton Pattern Code Explanation
When this class will be called from the client side using SingletonExample.getSingletonInstance().printSingleton(); then at the first time only an instance will be created. During second time onwards for all subsequent calls we will be referring to the same object and the getSingletonInstance() method returns the same instance of the SingletonExample class which was created during the first time. You can test this by adding a print statement the following code as:
public static SingletonExample getSingletonInstance() {
        if (null == singletonInstance) {
            singletonInstance = new SingletonExample();
            System.out.println("Creating new instance");
        }
        return singletonInstance;
    }
If we now call Singleton class from the client class as:
SingletonExample.getSingletonInstance().printSingleton();
SingletonExample.getSingletonInstance().printSingleton();
SingletonExample.getSingletonInstance().printSingleton();
SingletonExample.getSingletonInstance().printSingleton();
The output of the above call is:
Creating new instance
Inside print Singleton
Inside print Singleton
Inside print Singleton
Inside print Singleton

Implementation Example: Double check locking

The above code works absolutely fine in a single threaded environment and processes the result faster because of lazy initialization. However the above code might create some abrupt behaviour in the results in a multithreaded environment as in this situation multiple threads can possibly create multiple instance of the same SingletonExample class if they try to access the getSingletonInstance() method at the same time. Imagine a practical scenario where we have to create a log file and update it or while using a shared resource like Printer. To prevent this we must use some locking mechanism so that the second thread cannot use this getInstance() method until the first thread has completed the process.
Singleton in heap Singleton in heap
In Figure 3 we show how multiple threads access the singleton instance. Since the singleton instance is a static class variable in the stored in the PermGen space of the heap. This applies to getSingletonInstance() instance method as well since it is static too. In the multithreading environment to prevent each thread to create another instance of singleton object and thus creating concurrency issue we will need to use locking mechanism. This can be achieved by synchronized keyword. By using this synchronized keyword we prevent Thread2 or Thread3 to access the singleton instance while Thread1 inside the method getSingletonInstance().
From code perspective it means:

public static synchronized SingletonExample getSingletonInstance() {
        if (null == singletonInstance) {
            singletonInstance = new SingletonExample();
        }
        return singletonInstance;
    }
So this means that every time the getSingletonInstance() is called it gives us an additional overhead . To prevent this expensive operation we will use double checked locking so that the synchronization happens only during the first call and we limit this expensive operation to happen only once. It is only required for:
singletonInstance = new SingletonExample();
Code example:

public static volatile SingletonExample getSingletonInstance() {
        if (null == singletonInstance) {
            synchronized (SingletonExample.class){
                    if (null == singletonInstance) {
            singletonInstance = new SingletonExample();
            }
        }
        }
        return singletonInstance;
    }
In the above code snippet imagine that multiple threads comes concurrently and tries to create the new instance. In such situation the may be three or more threads are waiting on the synchronized block to get access. Since we have used synchronized only one thread will be given access. All the remaining threads which were waiting on the synchronized block will be given access when first thread exits this block. However when the remaining concurrent thread enters the synchronized block they are prevented to enter further due to the double check : null check. Since the first thread has already created an instance no other thread will enter this loop.
All the remaining threads that were not lucky to enter the synchronized block along with the first thread will be blocked at the first null check. This mechanism is called double checked locking and it provides significant performance benefit and also it is cost effective solution.

Implementation Example: Volatile Keyword

We can also use the volatile keyword to the instance variable declaration.
private volatile static SingletonExample singletonInstance;
The volatile keyword helps as concurrency control tool in a multithreaded environment and provides the latest update in a most accurate manner.However please note that double check locking might not work before Java 5. In such situation we can use early loading mechanism. If we remember about the original sample code we had used lazy loading. In case of early loading we will instantiate the SingletonExample class at the start and this will be referred to the private static instance field.

public class SingletonExample {
    private static final SingletonExample singletonInstance = new SingletonExample;
 
    // SingletonExample prevents any other class from instantiating
    private SingletonExample() {
    }
 
    // Providing Global point of access
    public static SingletonExample getSingletonInstance() {
 
        return singletonInstance;
    }
 
    public void printSingleton(){
        System.out.println("Inside print Singleton");
    }
}
In this approach the singleton object is created before it is needed. The JVM takes care of the static variable initialization and ensures that the process is thread safe and that the instance is created before the threads tries to access it. In case of Lazy loading the singletonInstance is created when the client class calls the getSingleInstance() whereas in case of the early loading the singletonInstance is create when the class is loaded in the memory.

Implementation Example: Using enum

Implementing Singleton in Java 5 or above version using Enum:
Enum is thread safe and implementation of Singleton through Enum ensures that your singleton will have only one instance even in a multithreaded environment. Let us see a simple implementation:

public enum SingletonEnum {
 INSTANCE;
 public void doStuff(){
     System.out.println("Singleton using Enum");
 }
}
And this can be called from clients :
public static void main(String[] args) {
        SingletonEnum.INSTANCE.doStuff();
 
    }

FAQs:

Question: Why can’t we use a static class instead of singleton?
Answer:
  • One of the key advantages of singleton over static class is that it can implement interfaces and extend classes while the static class cannot (it can extend classes, but it does not inherit their instance members). If we consider a static class it can only be a nested static class as top level class cannot be a static class. Static means that it belongs to a class it is in and not to any instance. So it cannot be a top level class.
  • Another difference is that static class will have all its member as static only unlike Singleton.
  • Another advantage of Singleton is that it can be lazily loaded whereas static will be initialized whenever it is first loaded.
  • Singleton object stores in Heap but, static object stores in stack.
  • We can clone the object of Singleton but, we can not clone the static class object.
  • Singleton can use the Object Oriented feature of polymorphism but static class cannot.
Question: Can the singleton class be subclassed?
Answer: Frankly speaking singleton is just a design pattern and it can be subclassed. However it is worth to understand the logic or requirement behind subclassing a singleton class as the child class might not inherit the singleton pattern objective by extending the Singleton class. However the subclassing can be prevented by using the final keyword in the class declaration.
Question: Can there be multiple instance of singleton using cloning?
Answer: That was a good catch! What do we do now? To prevent the another instance to be created of the singleton instance we can throw exception from inside the clone() method.
Question: What is the impact if we are creating another instance of singleton using serialization and deserialization?
Answer: When we serialize a class and deserialize it then it creates another instance of the singleton class. Basically as many times as you deserialize the singleton instance it will create multiple instance. Well in this case the best way is to make the singleton as enum. In that way the underlying Java implementation takes care of all the details. If this is not possible then we will need to override the readobject() method to return the same singleton instance.
Question: Which other pattern works with Singleton?
Answer:There are several other pattern like Factory method, builder and prototype pattern which uses Singleton pattern during the implementation.
Question: Which classes in JDK uses singleton pattern?
Answer: java.lang.Runtime : In every Java application there is only one Runtime instance that allows the application to interface with the environment it is running. The getRuntime is equivalent to the getInstance() method of the singleton class.

Uses of Singleton design pattern:

Various usages of Singleton Patterns:
  • Hardware interface access: The use of singleton depends on the requirements. However practically singleton can be used in case external hardware resource usage limitation required e.g. Hardware printers where the print spooler can be made a singleton to avoid multiple concurrent accesses and creating deadlock.
  • Logger : Similarly singleton is a good potential candidate for using in the log files generation. Imagine an application where the logging utility has to produce one log file based on the messages received from the users. If there is multiple client application using this logging utility class they might create multiple instances of this class and it can potentially cause issues during concurrent access to the same logger file. We can use the logger utility class as a singleton and provide a global point of reference.
  • Configuration File: This is another potential candidate for Singleton pattern because this has a performance benefit as it prevents multiple users to repeatedly access and read the configuration file or properties file. It creates a single instance of the configuration file which can be accessed by multiple calls concurrently as it will provide static config data loaded into in-memory objects. The application only reads from the configuration file at the first time and there after from second call onwards the client applications read the data from in-memory objects.
  • Cache: We can use the cache as a singleton object as it can have a global point of reference and for all future calls to the cache object the client application will use the in-memory object.

Hands-on:

Let us take a small practical example to understand the Singleton design pattern in more details.

Problem Statement:
Design a small ATM printing application which can generate multiple types of statements of the transaction including Mini Statement, Detailed statement etc. However the customer should be aware of the creation of these statements. Ensure that the memory consumption is minimized.

Design Solution:
The above requirement can be addressed using two core Gang of four design pattern – Factory design pattern and Singleton design pattern. In order to generate multiple types of statements for the ATM transactions in the ATM machine we can create a Statement Factory object which will have a factory method of createStatements(). The createStatements will create DetailedStatement or MiniStatement objects.

The client object will be completely unware of the object creation since it will interact with the Factory interface only. We will also create an interface called StatementType. This will allow further statement type objects e.g. Credit card statement etc to be added. So the solution is scalable and extensible following the object oriented Open/Closed design principle.

The second requirement of reducing the memory consumption can be achieved by using Singleton design pattern. The Statement Factory class need not be initiated multiple times and a single factory can create multiple statement objects. Singleton pattern will create a single instance of the StatementFactory class thus saving memory.
ATM example ATM example


  • Factory: Factory is an abstract class which is a single point of contact for the client. All the concrete factory classes needs to implement the abstract factory method.

  • StatementFactory: This is the Factory implementation class which consist of the creator method. This class extends from the Factory abstract class.This is the main creator class for all the products e.g. Statements.

  • StatementType: This is a product interface which provides abstraction to the various types of products which needs to be created by the Factory class.

  • DetailedStatement: This is a concrete implementation of the StatementType interface which will print the detailed statements.

  • MiniStatement: This is a concrete implementation of the StatementType interface which will print the mini statements.

  • Client: This is the client class which will call the StatementFactory and StatementType and request for object creation.

Assumption:
The design solution caters to only single ATM machine.

Sample Code:

Factory.java
public abstract class Factory {
 protected abstract StatementType createStatements(String selection);
 }
StatementFactory.java
public class StatementFactory extends Factory {
    private static StatementFactory uniqueInstance;
 
    private StatementFactory() {
    }
 
    public static StatementFactory getUniqueInstance() {
        if (uniqueInstance == null) {
            uniqueInstance = new StatementFactory();
            System.out.println("Creating a new StatementFactory instance");
        }
        return uniqueInstance;
 
    }
 
    public StatementType createStatements(String selection) {
        if (selection.equalsIgnoreCase("detailedStmt")) {
            return new DetailedStatement();
        } else if (selection.equalsIgnoreCase("miniStmt")) {
            return new MiniStatement();
        }
        throw new IllegalArgumentException("Selection doesnot exist");
    }
}
StatementType.java

public interface StatementType {
    String print();
}
DetailedStatement.java
public class DetailedStatement implements StatementType {
    @Override
    public String print() {
        System.out.println("Detailed Statement Created");
        return "detailedStmt";
    }
}
MiniStatement.java
public class MiniStatement implements StatementType {
    @Override
    public String print() {
        System.out.println("Mini Statement Created");
        return "miniStmt";
    }
}
Client.java
public class Client {
 
    public static void main(String[] args) {
        System.out.println("Enter your selection:");
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String selection = null;
        try {
            selection = br.readLine();
        } catch (IOException e) {
            e.printStackTrace();
        }
        Factory factory = StatementFactory.getUniqueInstance();
        StatementType objStmtType = factory.createStatements(selection);
        System.out.println(objStmtType.print());
    }
 
}
Conclusion:
In the above articles we have gone into the details of the Singleton pattern, how to implement singleton pattern along with a practical application. Though singleton pattern looks a simple implementation we should resist ourselves from using it until and unless there is a strong requirement for it. You can blame the unpredictable nature of the results in the multi-threading environment. Though we can use enum in Java 5 and above it sometimes get difficult to implement your logic always in enum or there might be legacy code before Java 5. Hope our readers have liked this article.