One of the most common tasks in machine learning is making prediction about data points you haven’t observed by finding patterns in your data. A popular example of this was the Netflix Prize , where the goal was to predict users ratings based on a large corpus of known ratings for a number of individuals. One possible way to approach this problem is to frame it as a low-rank matrix completion problem – that is, you build a matrix where each row is a user and each column is a movie and then try to fill in the missing entries in such a way that minimizes the rank of the matrix. Here, I’ll first outline the motivation for such an approach and then describe a non-standard algorithm for solving these sorts of problems that has some interesting properties.
Low-Rank Matrix Completion
One of the first ideas one might have when looking at a matrix like this is to assume that if people share tastes in many movies they’d likely share tastes in other movies. Mathematically, that is like saying that the vectors that comprise the rows of those users point in similar directions. If they point in the exact same direction, we can say those rows are linearly dependent, which implies that the matrix is not full rank . This observation suggests that we should fill in entries so that the rows and columns are as linearly dependent as possible, or, more precisely, we should minimize the rank of the matrix:
Here I’m saying that we want to minimize the rank of A subject to the constraint that the observed entries of A (given by C) have the values y. If we had an algorithm for solving this problem we could run it and it would give us something like this:
So far so good! Except there are two problems. First, there are entries in our matrix that we still can’t fill in – we don’t have any "coupling" ratings between the two sets of movies to model what ratings will be on the other pair. This doesn’t occur so much on the Netflix dataset, but some smaller datasets can have issues like this. It would be useful to have direct clarity on what entries you can trust and which you cannot. In practice those unknown entries might be filled in by zeros or noise depending on what algorithm you use, and it’s not plainly obvious which is which. Secondly, the minimization problem I described above is NP-HARD. This tends to be a big issue when you’re talking about large datasets.
Back in 2009 one of the big results in this area was due to Emmanuel Candés and Terence Tao . They were able to show that, given a sufficient number of samples, one could exactly solve the problem using a convex relaxation of the above rank minimization problems. Convex optimization problems are generally fast and have a lot of very nice and powerful theory around convergence rates and algorithm development. This, and many similar iterative techniques that have since been developed, are currently the standard approach to solving these sorts of problems. For the rest of this post, I’m going to detail a totally different approach that has some interesting properties and advantages in certain use cases.
A Different Approach
First, let’s consider the rank-1 problem. That is, we know that our matrix can be written as the outer product of two vectors. The simplest example we can construct is a 2×2 matrix:
If we consider the entries given by the latter expression, there’s a constraint on the values they can take – consider taking the determinant of the matrix:
So there’s 1 constraint on the values that the entries of the matrix can take if it’s rank-1. If we know 3 values in the matrix this enables us to directly solve for the 4th value as a function of the other three. For example:
So we have a non-iterative means for completing a rank-1 matrix of size 2×2. We can extend this to the case of an arbitrary size matrix where you have any "triangle" of values known. For example:
By identifying entries that are defined by this constraint we can propagate values through the matrix as they become fully determined. So the next step in constructing an algorithm that uses this is to find a fast algorithm for identifying these 2×2 submatrices with 3 known components. One way to structure this is to consider a bipartite graph where each node is an entry in either
and each edge is an entry in
between the row and column nodes it corresponds to. For example:
Then we can fill in the matrix by identifying 1-near cycles of length 4 – that is, by adding one edge we could complete the cycle of length 4. Fortunately for the rank-1 case this is simply equivalent to finding paths of length 3. One way to do this is to construct the adjacency matrix for the graph and raise it to the third power. I’ll use M to denote that adjacency matrix:
Here we observe the block anti-diagonal structure of the adjacency matrix induced by the bipartite nature of the graph and denote the mask of observed values
. Any non-zero value in corresponds to a entry that can be solved for. In order to additionally identify entries that have not already been solved for, we additionally require that
at that index. So we can construct a logical array of entries
that can be completed with the entrywise logic:
Then, after completing the matrix entries indicated by
, we can iterate and check if any new entries are determined using the above expression, terminating when Q is all false. Then all we have to do is identify the path of length 3 between the two nodes using a search algorithm and solve for the new entry. Below are some example visualizations of the algorithm running on different matrices demonstrating this propagation of information. The top plot is an image of the matrix values and the bottom plot is the mask of known values, and each is plotted at each iteration of identifying Q.
The code implementing this algorithm and generating these plots is available on my Github page .
A simple worst case upper bound to the run time of this approach on an n-by-m matrix with h holes would be
, which assumes that only one "hole" is filled on each iteration of the algorithm and uses dense matrix multiplication for finding Q. This is certainly an over-estimate, however – one would expect to fill in many more than one hole on each iteration, and only highly pathological cases would cause a large number of iterations to be required. Furthermore, by using sparse boolean matrix multiplication in Q and smarter evaluation one should be able to reduce the complexity further.
In summation I’ve outlined and implemented an algorithm for solving noiseless rank-1 matrix completion in polynomial time using a non-optimization based approach, though there’s still a lot of work to be done considering its performance and properties. The next posts in this series will generalize this approach to higher ranks as well as tighten the theoretical performance bounds.
Please email me at email@example.com if you have any suggestions or comments! Also stay tuned for the higher rank generalizations – that post will be a doozy! :grinning:
转载本站任何文章请注明：转载至神刀安全网，谢谢神刀安全网 » A Different Approach to Low-Rank Matrix Completion