In a Slack organization that I am a part of, someone asked:
Has anyone ever heard of anything that takes Github repos and assigns them some sort of “reliability score”? Something that takes stars, commit frequency, PRs, open/closed issue ratio, issue closing time, etc. into account and gives them a score? It would be really nice for choosing between two or three repos that do similar things.
I had some thoughts on this, since it is something I have done in the past, and I extended them for this post.
The Ruby Toolbox tries to give a rough score for Ruby gems based on the features above, and it gets most of the way there. It provides a good overview of categories of gems and a sense of how well maintained they are. When I am trying to find a gem that does something, I usually start there. Over time you develop a sense of a language’s ecosystem and this becomes less valuable, but it is still useful for finding newer gems that might do a given thing better.
I usually look at some combination of the following to determine whether a package or library is worth using:
- Does it purport to do what I need done?
- Does it have documentation?
- How many people have starred / forked the repo?
- How many issues are outstanding?
- Are there many open / ignored pull requests?
- Does it have tests?
- Who is the maintainer of the library?
It doesn’t need to score highly on every metric, but more positive signals generally make me more confident in the decision.
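To make the idea concrete, here is a minimal sketch of the kind of “reliability score” the original question asked about. The metric names, weights, and normalization are all invented for illustration; a real version would pull these numbers from the GitHub API and tune the weights to taste.

```ruby
# Hypothetical weights for a handful of repo signals.
WEIGHTS = {
  stars: 1.0,              # community interest
  recent_commits: 2.0,     # maintenance activity, e.g. last 90 days
  closed_issue_ratio: 3.0, # closed issues / total issues (0.0..1.0)
  has_tests: 2.0,          # 1 or 0
  has_docs: 1.0            # 1 or 0
}.freeze

# Log-scale the raw counts so enormous repos don't dominate the score.
def normalize(metric, value)
  case metric
  when :stars, :recent_commits then Math.log10(value + 1)
  else value.to_f
  end
end

def reliability_score(metrics)
  WEIGHTS.sum { |metric, weight| weight * normalize(metric, metrics.fetch(metric, 0)) }
end

repo_a = { stars: 950, recent_commits: 12, closed_issue_ratio: 0.8, has_tests: 1, has_docs: 1 }
repo_b = { stars: 15,  recent_commits: 0,  closed_issue_ratio: 0.3, has_tests: 0, has_docs: 1 }

puts reliability_score(repo_a) > reliability_score(repo_b) # => true
```

Even a toy scorer like this makes the trade-offs explicit: you have to decide how much a star is worth relative to a closed issue, which is exactly the judgment call the checklist above encodes informally.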
More general considerations
To me, choosing an open source library comes down to the cost of change. If it would take only a few hours to switch to the other library should one of them prove to be unmaintained or unsuited to the task at hand, then I would just pick one and proceed with implementing it. If switching would be very expensive (in terms of time or money), then the decision deserves closer consideration.
For example, the other day we tried a library on GitHub that didn’t have any stars. It seemed like it might get the job done, and it was for a minor piece of functionality. Something more critical would demand a closer look.
If I am worried about committing to one library over another, I sometimes build an adapter layer so that we can change which library we use seamlessly. Essentially, I spend a bit of time now to increase optionality in the future, with a bit of last-responsible-moment thinking thrown in as well. If you defer the decision on which database to use and build an adapter layer, you can switch databases much more easily later.
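The adapter-layer idea can be sketched in a few lines. All of the class names here are hypothetical; the point is that application code talks only to the wrapper, and only the adapters know which underlying library is in use, so swapping libraries becomes a one-line change at construction time.

```ruby
# The application-facing interface; it delegates to whichever adapter
# it was constructed with and never touches the library directly.
class KeyValueStore
  def initialize(adapter)
    @adapter = adapter
  end

  def put(key, value)
    @adapter.put(key, value)
  end

  def get(key)
    @adapter.get(key)
  end
end

# One adapter per candidate library. This stand-in just uses a Hash;
# a real adapter would wrap Redis, PStore, or whatever is being evaluated.
class HashAdapter
  def initialize
    @data = {}
  end

  def put(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end
end

store = KeyValueStore.new(HashAdapter.new)
store.put("name", "toolbox")
puts store.get("name") # => toolbox
```

The cost of this indirection is small, and it keeps the library decision reversible: replacing `HashAdapter.new` with a different adapter is the entire migration, as far as the calling code is concerned.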
Another approach is to try a spike of both, and then use the knowledge you gain to make a decision. If you learn that one is easier to work with, fits your domain better, or is faster, you can act on that. You also buy yourself hours or days of additional information in which to watch the activity on each repo. At worst you end up with one working implementation (if you spiked them serially and were happy after the first) or two (if you were unhappy, or spiked them in parallel).
Similarly, Martin Fowler writes:
…one of the main sources of complexity is the irreversibility of decisions. If you can easily change your decisions, this means it’s less important to get them right - which makes your life much simpler. The consequence for evolutionary design is that designers need to think about how they can avoid irreversibility in their decisions. Rather than trying to get the right decision now, look for a way to either put off the decision until later (when you’ll have more information) or make the decision in such a way that you’ll be able to reverse it later on without too much difficulty.
What this implies for my projects
Of course, the implication for any open source projects that I maintain is that, to provide the most value to others, they should clearly state what they do. They should be documented, their issues and pull requests should be reasonably gardened, and so forth.