August 27, 2009
First Patent for Palamida

A couple of observations about our first patent award. First, it took almost five years; and second, it was worth the wait. We have always known that CodeRank was important, and it’s terrific to get the confirmation that we were right. For those of you not familiar with that feature, it’s a lot like PageRank from Google. The whole idea is that any type of search engine will probably generate more results than a human cares to plow through, and that it would be a very good thing if the results were ranked according to their usefulness before you looked at them. That’s what CodeRank does, but for source code rather than websites. It’s the way the ranking is determined that is important, and that is what justified the patent.

The authors are Ray Walden and Jing Zhang. Ray is one of the Palamida founders, and Jing is a software engineer who worked with Ray. We’re very proud of their work, obviously, and as far as we can tell this is the first patent that is a direct result of development in our market.

So back to CodeRank. It’s all about source code, and determining which code that we know about (our library of known open source code) is the best match to the code you scanned. Once we know that, we can tell you what your code is, as well as things like its license and known vulnerabilities. How do we determine the best match? Count, Coverage, Clustering and Uniqueness. By computing a value for each of these and combining them into a single number, we can accurately rank the results and show you the most useful ones first.

Quick summary: count is simply the number of matches we find between a source document (your code) and a pattern file (our library). Coverage is how much of a particular pattern file shows up in your source document. The more we find, the higher the likelihood that it is the same code. Clustering is the overall proximity of the matches. If they are close together, they are more likely to represent a cut-and-paste operation. If they are scattered, they are more likely to be random code patterns that happen to match. Finally, uniqueness. Uniqueness is a measure of how common a pattern is. If a pattern is common to most pattern files and we find it in your source code, it doesn’t mean very much. On the other hand, if we find something that is not common, in other words unique, it is likely to be much more valuable in determining the origin of your code.

Taken all together, we now have a way to tell you which scan results are most likely to lead you to the right conclusion about the origin of the code in your codebase.
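To make the four factors a little more concrete, here is a rough sketch in Python of how values like these could be folded into one score. To be clear, this is not the patented method: the function names, the normalizations, and the equal weighting are all assumptions made up for illustration, and the post does not describe the actual formulas.

```python
import math

def code_rank_score(matches, patterns_in_file, doc_freq, total_files):
    """Score one library pattern file against a scanned source document.

    All inputs are hypothetical shapes chosen for this sketch:
      matches          -- list of (pattern_id, source_line) pairs
      patterns_in_file -- number of patterns in the library pattern file
      doc_freq         -- pattern_id -> number of pattern files containing it
      total_files      -- number of pattern files in the library (at least 2)
    """
    if not matches:
        return 0.0

    count = len(matches)
    distinct = {pattern_id for pattern_id, _ in matches}

    # Count: more matches is better, squashed into the range [0, 1).
    count_score = count / (count + 1.0)

    # Coverage: fraction of the pattern file that shows up in the source.
    coverage = len(distinct) / patterns_in_file

    # Clustering: matches that sit close together score higher.
    lines = sorted(line for _, line in matches)
    if count < 2:
        clustering = 0.0  # a single match gives no proximity evidence
    else:
        mean_gap = (lines[-1] - lines[0]) / (count - 1)
        clustering = 1.0 / max(mean_gap, 1.0)

    # Uniqueness: patterns found in few pattern files carry more weight
    # (an inverse-document-frequency style measure, assumed here).
    uniqueness = sum(
        math.log(total_files / doc_freq[pattern_id]) / math.log(total_files)
        for pattern_id in distinct
    ) / len(distinct)

    # Equal weights are an arbitrary choice for this sketch.
    return (count_score + coverage + clustering + uniqueness) / 4.0


def rank(candidates, doc_freq, total_files):
    """Return (name, score) pairs, best match first.

    candidates maps a library name to (matches, patterns_in_file).
    """
    scored = [
        (name, code_rank_score(found, size, doc_freq, total_files))
        for name, (found, size) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up example: "libfoo" matches are adjacent and rare,
# "libbar" matches are scattered and appear in every pattern file.
doc_freq = {"a": 1, "b": 2, "c": 100, "d": 100}
candidates = {
    "libfoo": ([("a", 10), ("b", 11), ("a", 12)], 4),
    "libbar": ([("c", 5), ("d", 400), ("c", 900)], 4),
}
for name, score in rank(candidates, doc_freq, total_files=100):
    print(f"{name}: {score:.3f}")
```

Both candidates have the same count and coverage here, so the ordering comes entirely from clustering and uniqueness, which is the point of looking at more than the raw number of matches.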

Our application category, Composition Analysis, is still new. But with inventions like CodeRank, and others that are in the approval pipeline, we’re creating a solid base of technology that will enable a range of new and very interesting features to help software teams build better and more secure software at lower cost.