Discussions

News: Are you ready to rank your Open Source project?

  1. Antelink launches a new metric for the Open Source Community, following its first metric, which measured how many times a library is reused across the Open Source Community.

    Thanks to user feedback, Antelink proposes the “project reused ranking”, based on how many Open Source projects reuse at least one artifact of the project being ranked.

    This global metric aims to ease comparison between projects. It is much more robust than per-release metrics, since it is independent of release policy and of how long the Community takes to adopt a release. The metric is still in beta, so your feedback is welcome, as are requests for access to your own project's ranking or to the ranking of all projects in the Antelink database.
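
    To make the definition concrete, here is a minimal sketch in Python, using an invented dependency map, of how such a reuse count could be computed; the real metric is of course computed over the whole Antelink database rather than a hand-written map like this:

        # Count, for each project, how many *other* projects reuse at least
        # one of its artifacts. The dependency data below is invented purely
        # for illustration.
        uses = {
            "my-webapp":  {"junit", "commons-logging", "ehcache"},
            "my-batch":   {"junit", "commons-logging"},
            "my-library": {"junit"},
        }

        reused_by = {}
        for project, upstream_projects in uses.items():
            for upstream in upstream_projects:
                reused_by.setdefault(upstream, set()).add(project)

        # "project reuse count" = number of distinct projects reusing it
        for upstream, users in sorted(reused_by.items(), key=lambda kv: -len(kv[1])):
            print(upstream, len(users))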

    So, are you ready to rank your Open Source project?

  2. By this metric, commons logging could be the best OSS project ever. This metric is bound to rate low-level libraries higher than high-level aggregate projects or servers. Tomcat is less reused than commons logging, but that doesn't mean commons logging is a better project; it just means that it's a library, not a server.

     

  3. Many classifications rank open source projects according to the number of developers, forum activity, or the number of downloads. One metric alone is always questionable and subject to interpretation. Combining several, however, provides a ‘big picture’. New metrics are useful in that they make it possible to characterize new uses. We do measure reuse, and naturally focus on libraries and forked projects. Open source projects dedicated to end-user type applications will not be highly rated.

    So you're right, this metric helps identify de facto standards for low-level libraries, but it also gives other insights if you compare two projects such as JUnit (ranked 7) and one of its promising alternatives, TestNG (ranked 5), or HSQLDB and the Apache project Derby, which are both ranked 6. A few other examples are given in the blog entries we published in a series dedicated to "Most Reused Open Source Project of the Week".

    By the way, Tomcat's reuse ranking is 6, which is not so low!

  4. I'm a bit confused:

     1. 10 = highest ranking? What is that, the Linux kernel? Java's runtime library (it's open-source now)? The highest theoretical ranking?

     2. The scaling aspect seems weird. I assume that you are analysing a graph of dependencies between OS systems. The degrees of these graphs follow a power-law distribution. Thus, I would expect a project with a higher score to have an order of magnitude more projects depending on it. I can assume that JUnit is/was being used by the majority of healthy Java open-source programs. I do not see how it is only one order of magnitude more used than Ehcache, a niche open-source library.

     3. That brings up the issue of sampling. If you consider Ehcache very popular, then you are obviously biased towards JEE open-source. I doubt many end-user apps (e.g. JavaME programs) use this.

     4. You are only analysing Java open-source ("upload your jar...").

     5. Instead of focusing on reuse only, I would also try using graph analysis algorithms like HITS to find the "meta-packages".

     Good luck on your project,
     Stephane
  5. 1. 10 = highest ranking? What is that, the Linux kernel? Java's runtime library (it's open-source now)? The highest theoretical ranking?

    The highest ranking is 10 (on a logarithmic scale from 1 to 10). A 10 means that 100% of the open source projects in our database (which currently includes more than 162,000 projects from SourceForge and GoogleCode) reuse at least one artifact from the project being ranked.
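
    For illustration only, a minimal sketch of one possible mapping from reuse counts to this 1-10 scale (the exact formula is not given here, so the base-10 form and the clamping below are assumptions):

        import math

        TOTAL_PROJECTS = 162_000  # size of the database mentioned above

        def reuse_rank(reusing_projects: int) -> int:
            """Assumed mapping: rank 10 = reused by 100% of the projects,
            each rank below corresponds to roughly ten times fewer reusers."""
            fraction = reusing_projects / TOTAL_PROJECTS
            rank = 10 + math.log10(fraction)   # 100% -> 10, 10% -> 9, 1% -> 8, ...
            return max(1, min(10, round(rank)))

        print(reuse_rank(162_000))  # 10
        print(reuse_rank(16_200))   # 9
        print(reuse_rank(1_620))    # 8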


    2. The scaling aspect seems weird. I assume that you are analysing a graph of dependencies between OS systems.

    Right: something like reuse of open source components by the open source community.

    The degrees of these graphs follow a power-law distribution. Thus, I would expect a project with a higher score to have an order of magnitude more projects depending on it. I can assume that JUnit is/was being used by the majority of healthy Java open-source programs. I do not see how it is only one order of magnitude more used than Ehcache, a niche open-source library.

    I agree, we've been surprised too. By the way, we compare the Ehcache ranking to JBoss-cache and memcache. I would just say that "the last steps are always the most difficult to climb".


    3. That brings up the issue of sampling. If you consider Ehcache very popular, then you are obviously biased towards JEE open-source. I doubt many end-user apps (e.g. JavaME programs) use this.

    Our sample is a random selection of 162,000 projects from GoogleCode and SourceForge. About one third are Java based.

    4. You are only analysing Java open-source ("upload your jar...")

    No, try it with a DLL, a source file, or a GIF: it works too. Java open source projects are just very popular, and therefore give lots of relevant examples.


    5. Instead of focusing on reuse only, I would also try using graph analysis algorithms like HITS to find the "meta-packages"

    Very interesting idea. I just checked and found some references, which I will read. I talked with Roberto Di Cosmo, who is working within the EU-funded Mancoosi project; they have done some great work on dependency graphs within package distributions.
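
    For reference, a minimal sketch of the HITS idea applied to a project dependency graph (toy, invented data; plain power iteration with a fixed number of steps and no convergence check):

        # An edge (A, B) means "project A reuses an artifact of project B".
        # HITS then gives two scores: "hubs" are projects that reuse many
        # authoritative libraries, "authorities" are libraries reused by
        # many good hubs - closer to "meta-packages" than a raw reuse count.
        edges = [
            ("my-webapp", "junit"), ("my-webapp", "commons-logging"),
            ("my-batch", "junit"),  ("my-batch", "commons-logging"),
            ("my-library", "junit"),
        ]

        nodes = {n for edge in edges for n in edge}
        hub = {n: 1.0 for n in nodes}
        auth = {n: 1.0 for n in nodes}

        for _ in range(20):
            auth = {n: sum(hub[a] for a, b in edges if b == n) for n in nodes}
            norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
            auth = {n: v / norm for n, v in auth.items()}

            hub = {n: sum(auth[b] for a, b in edges if a == n) for n in nodes}
            norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
            hub = {n: v / norm for n, v in hub.items()}

        # libraries with the highest authority score come first
        print(sorted(auth.items(), key=lambda kv: -kv[1]))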

     

    Many thanks for this feedback.

     


