Plot Python lines of code and PEP-8 warnings

Posted: September 26, 2019
Category: python

I wrote a little script that iterates over a git repository and counts, for each commit, the Python source lines of code (with sloccount) and the PEP-8 warnings (with flake8). It is quite interesting to look at how both evolve over time.
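The per-commit loop could look roughly like the sketch below. It is not the original script: the function name `count_per_commit` is made up for illustration, it assumes a clean working tree on a named branch, and it assumes that sloccount's summary contains a line starting with `python:` followed by the line count.

```bash
# Sketch only: print "<commit> <sloc> <warnings>" (tab-separated), oldest commit first.
# Assumes a clean working tree and that we start on a named branch.
count_per_commit() {
    start=$(git rev-parse --abbrev-ref HEAD)   # remember where we started
    git rev-list --reverse HEAD | while read -r commit; do
        git checkout --quiet "$commit"
        # Assumption: sloccount's summary has a line like "python:  1234 (100.00%)"
        sloc=$(sloccount . | awk '/^python:/ {print $2}')
        # flake8 prints one line per warning by default, so count the lines
        warnings=$(flake8 . | wc -l | tr -d ' ')
        printf '%s\t%s\t%s\n' "$commit" "${sloc:-0}" "$warnings"
    done
    git checkout --quiet "$start"              # go back to the starting branch
}
```

Calling `count_per_commit > metrics.tsv` from the root of the repository would then produce one line per commit, ready for plotting.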

Plot

[Figure: Python source lines of code and PEP-8 warnings per commit]

Caveats

  • It is reasonable for some files to have PEP-8 warnings, for example data files with very long lines of content (stored in a dictionary or list). These files can be excluded from the flake8 report.

  • Right, right, my code has tons of warnings. I added flake8 to my editor sometime in 2018 (better late than never), and I only fix code that I change anyway. One reason is to keep git blame useful: I don't want it to point at one big commit that fixed all the PEP-8 warnings instead of the earlier commit that is likely to contain more useful debugging information. This is a personal repository where I don't have to synchronize code style with anyone, of course.
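For the first caveat, flake8's `--exclude` option takes a comma-separated list of paths or patterns to skip. A minimal sketch, where the function name and the `data` directory in the usage line are hypothetical:

```bash
# Sketch only: count flake8 warnings, skipping the given comma-separated paths.
# Usage: count_warnings_excluding data
count_warnings_excluding() {
    # One output line per warning, so the line count is the warning count
    flake8 --exclude="$1" . | wc -l | tr -d ' '
}
```

Note that `--exclude` replaces flake8's default exclusion list rather than extending it, so directories such as `.git` may need to be listed again.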

More ideas

Such a plot is interesting, but it is not nearly enough to give clear indicators of a code base's sanity. It only represents adherence to code style.

However, it is easy to add other metrics, as long as you can run them from the bash script. Here are a few ideas:

  • code coverage,
  • running time of the test suite,
  • compiling time,
  • number of different contributors,
  • error/warning count of any static analysis tool.
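As an illustration of the contributor metric, the number of distinct authors up to a given commit can be derived from `git log`. This is only a sketch: `count_contributors` is a hypothetical name, and counting distinct author email addresses is just one possible definition of a contributor.

```bash
# Sketch only: count distinct author emails in the history of a commit (default: HEAD).
count_contributors() {
    git log --format='%ae' "${1:-HEAD}" | sort -u | wc -l | tr -d ' '
}
```

The result can be recorded for each commit like any other metric.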

Code

It all fits within a sort of polyglot bash script, with an embedded plot script. You will need gnuplot, sloccount, and flake8, but you can easily replace the last two with your tools of choice.
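One way to embed a plot script in a bash script is a here-document piped to gnuplot. The sketch below is an assumption about the general shape, not the original code: `plot_metrics` is a hypothetical name, the input is assumed to be a tab-separated file with one line per commit holding the commit id, the line count, and the warning count, and the gnuplot build is assumed to provide the `png` terminal.

```bash
# Sketch only: plot column 2 (SLOC) and column 3 (warnings) against the row index.
# Usage: plot_metrics metrics.tsv metrics.png
plot_metrics() {
    # Unquoted here-document, so $1 and $2 are expanded by the shell
    gnuplot <<EOF
set terminal png size 1000,600
set output '$2'
set datafile separator "\t"
set xlabel 'commit'
set ylabel 'source lines of code'
set y2label 'PEP-8 warnings'
set ytics nomirror
set y2tics
plot '$1' using 0:2 axes x1y1 title 'SLOC' with lines, '$1' using 0:3 axes x1y2 title 'warnings' with lines
EOF
}
```

The two series use separate y axes because line counts and warning counts can differ by orders of magnitude.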