Generating activity graphs with Bokeh and SQLAlchemy
Posted: February 09, 2019
Categories: python, sqlalchemy, bokeh
If you keep some data in an SQL database, mapped to SQLAlchemy classes, and representing things your users do, you might be interested in generating graphs of users activity.
To keep everything written in Python, we'll use Bokeh, a plotting library that enables, among other features, the generation of dashboards in HTML.
Let's say we have two tables in our database:
people with their characteristics, and
actions, what the users
do. Actions could be anything representing user interactions in your
software: client orders in the context of a store, player movements
for a video game, etc.
In SQLAlchemy, our database looks like this:
Base = declarative_base() class User(Base): __tablename__ = 'users' id = Column(Integer, primary_key=True) email = Column(String, unique=True) username = Column(String, unique=True) class Action(Base): __tablename__ = 'actions' id = Column(Integer, primary_key=True) name = Column(String) date_created = Column(DateTimeUTC, default=datetime.datetime.utcnow) user_id = Column(Integer, ForeignKey('users.id'))
Now, after our software has been in production for a while, we'd like to know what is going on: for example track the evolution of the amount of daily/weekly/monthly actions, and the evolution of active users.
Bar plots to the rescue!
Amount of actions
We will count things over time ranges, so we'll generate vertical bar
x are dates, and
y are whatever we'll count.
We'll count things in our database, so to avoid duplicating SQLAlchemy queries, let's create this function:
from bokeh.plotting import figure from sqlalchemy import func def figure_counter( session, field_identifier, field_date, groups, to_x, title): """Makes a bar figure with time as x, and something counted as y. `field_identifier` is what is counted (e.g. Action.id) `field_date` is how it is counted (eg. Action.date_created) `groups` is how to group the dates (e.g. ['year', 'month']) `to_x` is a function that will create a datetime from a row (e.g. lambda x: datetime.date(int(x.year), (x.month), 1)) `title` is the title of the plot """ groups = [func.extract(x, field_date).label(x) for x in groups] q = (session.query(func.count(field_identifier).label('count'), *groups) .group_by(*groups) .order_by(*groups) .all()) x = [to_x(el) for el in q] y = [el.count for el in q] width = (x[-1] - x).total_seconds() / len(x) * 900 p = figure(plot_height=300, plot_width=800, title=title, x_axis_type='datetime') p.vbar(x=x, width=width, bottom=0, top=y) return p
The main magic of this function comes from the
groups list: since we
want to count things over a time range, we have to extract
information from the date field in our table, and be sure to generate
unique categories. For example, if we want to count the amount of
actions per day, our unique categories will be identified by
['year', 'month', 'day']. If we extract these three numbers from the
date field, and group by them, we will get correct and unique counts.
func.extract() is for PostgreSQL, you might want to change
this if you use another type of database.
The other non-trivial part is the
to_x parameter: we need functions
to reconstruct complete dates from the parts of the date we extract,
in order to feed correct
x values to Bokeh.
Let's see how to use this function to generate our plots.
Amount of actions per day
Let's begin by generating a bar plot showing the number of actions in DB, grouped by day:
figure_counter( session, Action.id, Action.date_created, ['year', 'month', 'day'], lambda x: datetime.date(int(x.year), int(x.month), int(x.day)), 'Actions created per day')
Amount of actions per week
If you want to get the same bar plot with weekly actions, group your
figure_counter( session, Action.id, Action.date_created, ['isoyear', 'week'], lambda x: (datetime.date(int(x.isoyear), 1, 1) + datetime.timedelta(days=7*x.week)), 'Actions created per week'),
Amount of actions per month
And finally, to get the monthly actions:
figure_counter( session, Action.id, Action.date_created, ['year', 'month'], lambda x: datetime.date(int(x.year), int(x.month), 1), 'Actions created per month'),
We have our magic function, why not count different things? Another
interesting metric is the number of daily/weekly/monthly active users,
i.e. the number of users who did at least one action during the chosen
time range. To count this, we won't count the
Action.id, but rather
distinct(Action.user_id). For the daily active users:
from sqlalchemy import distinct figure_counter( session, distinct(Action.user_id), Action.date_created, ['year', 'month', 'day'], lambda x: datetime.date(int(x.year), int(x.month), int(x.day)), 'Daily amount of active users')
And, as you can guess, you can do the same with weekly and monthly active users.
Now that's enough code for one post, but you can use this to count
things differently, for example the number of actions grouped by the
day of the week (to see if your users are more active on week-ends
for example). You'll have to adapt your
x line, but you can get the
right numbers by using
['isodow'] as your
One last thing: once you've generated all your Bokeh figures, you can
arrange them nicely on an HTML dashboard using
bokeh.embed.components. It is well documented on Bokeh's website.
There used to be a comments section here. The current best way to react to things I write is to ping me on Twitter.