Generating activity graphs with Bokeh and SQLAlchemy

Posted: February 09, 2019
Categories: python, sqlalchemy, bokeh

If you keep some data in an SQL database, mapped to SQLAlchemy classes, and representing things your users do, you might be interested in generating graphs of users activity.

To keep everything written in Python, we'll use Bokeh, a plotting library that enables, among other features, the generation of dashboards in HTML.

Data

Let's say we have two tables in our database: users, representing people with their characteristics, and actions, what the users do. Actions could be anything representing user interactions in your software: client orders in the context of a store, player movements for a video game, etc.

In SQLAlchemy, our database looks like this:

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True)
    email = Column(String, unique=True)
    username = Column(String, unique=True)

class Action(Base):
    __tablename__ = 'actions'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    date_created = Column(DateTimeUTC, default=datetime.datetime.utcnow)
    user_id = Column(Integer, ForeignKey('users.id'))

Now, after our software has been in production for a while, we'd like to know what is going on: for example track the evolution of the amount of daily/weekly/monthly actions, and the evolution of active users.

Bar plots to the rescue!

Amount of actions

Now, Bokeh is a really cool library: feed it your numbers (coming from anywhere), choose your plot type, adjust some options, and choose how to save the results. Boom, you got your HTML dashboard with some fancy Javascript to play with the embedded plots.

We will count things over time ranges, so we'll generate vertical bar plots, where x are dates, and y are whatever we'll count.

We'll count things in our database, so to avoid duplicating SQLAlchemy queries, let's create this function:

from bokeh.plotting import figure
from sqlalchemy import func

def figure_counter(
        session,
        field_identifier,
        field_date,
        groups,
        to_x,
        title):
    """Makes a bar figure with time as x, and something counted as y.

    `field_identifier` is what is counted (e.g. Action.id)
    `field_date` is how it is counted (eg. Action.date_created)
    `groups` is how to group the dates (e.g. ['year', 'month'])
    `to_x` is a function that will create a datetime from a row
      (e.g. lambda x: datetime.date(int(x.year), (x.month), 1))
    `title` is the title of the plot
    """
    groups = [func.extract(x, field_date).label(x) for x in groups]
    q = (session.query(func.count(field_identifier).label('count'),
                       *groups)
                .group_by(*groups)
                .order_by(*groups)
                .all())
    x = [to_x(el) for el in q]
    y = [el.count for el in q]

    width = (x[-1] - x[0]).total_seconds() / len(x) * 900

    p = figure(plot_height=300,
               plot_width=800,
               title=title,
               x_axis_type='datetime')
    p.vbar(x=x, width=width, bottom=0, top=y)

    return p

The main magic of this function comes from the groups list: since we want to count things over a time range, we have to extract information from the date field in our table, and be sure to generate unique categories. For example, if we want to count the amount of actions per day, our unique categories will be identified by ['year', 'month', 'day']. If we extract these three numbers from the date field, and group by them, we will get correct and unique counts.

Note that func.extract() is for PostgreSQL, you might want to change this if you use another type of database.

The other non-trivial part is the to_x parameter: we need functions to reconstruct complete dates from the parts of the date we extract, in order to feed correct x values to Bokeh.

Let's see how to use this function to generate our plots.

Amount of actions per day

Let's begin by generating a bar plot showing the number of actions in DB, grouped by day:

figure_counter(
    session,
    Action.id,
    Action.date_created,
    ['year', 'month', 'day'],
    lambda x: datetime.date(int(x.year), int(x.month), int(x.day)),
    'Actions created per day')

Amount of actions per week

If you want to get the same bar plot with weekly actions, group your dates by ['isoyear', 'week']:

figure_counter(
    session,
    Action.id,
    Action.date_created,
    ['isoyear', 'week'],
    lambda x: (datetime.date(int(x.isoyear), 1, 1)
               + datetime.timedelta(days=7*x.week)),
    'Actions created per week'),

Amount of actions per month

And finally, to get the monthly actions:

figure_counter(
    session,
    Action.id,
    Action.date_created,
    ['year', 'month'],
    lambda x: datetime.date(int(x.year), int(x.month), 1),
    'Actions created per month'),

Active users

We have our magic function, why not count different things? Another interesting metric is the number of daily/weekly/monthly active users, i.e. the number of users who did at least one action during the chosen time range. To count this, we won't count the Action.id, but rather distinct(Action.user_id). For the daily active users:

from sqlalchemy import distinct

figure_counter(
    session,
    distinct(Action.user_id),
    Action.date_created,
    ['year', 'month', 'day'],
    lambda x: datetime.date(int(x.year), int(x.month), int(x.day)),
    'Daily amount of active users')

And, as you can guess, you can do the same with weekly and monthly active users.

Now that's enough code for one post, but you can use this to count things differently, for example the number of actions grouped by the day of the week (to see if your users are more active on week-ends for example). You'll have to adapt your x line, but you can get the right numbers by using ['isodow'] as your groups.

One last thing: once you've generated all your Bokeh figures, you can arrange them nicely on an HTML dashboard using bokeh.embed.components. It is well documented on Bokeh's website.

Enjoy!