Generating activity graphs with Bokeh and SQLAlchemy
Posted: February 09, 2019
Categories:
python, sqlalchemy, bokeh
If you keep some data in an SQL database, mapped to SQLAlchemy classes, and representing things your users do, you might be interested in generating graphs of users activity.
To keep everything written in Python, we'll use Bokeh, a plotting library that enables, among other features, the generation of dashboards in HTML.
Data
Let's say we have two tables in our database: users
, representing
people with their characteristics, and actions
, what the users
do. Actions could be anything representing user interactions in your
software: client orders in the context of a store, player movements
for a video game, etc.
In SQLAlchemy, our database looks like this:
Base = declarative_base()
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
email = Column(String, unique=True)
username = Column(String, unique=True)
class Action(Base):
__tablename__ = 'actions'
id = Column(Integer, primary_key=True)
name = Column(String)
date_created = Column(DateTimeUTC, default=datetime.datetime.utcnow)
user_id = Column(Integer, ForeignKey('users.id'))
Now, after our software has been in production for a while, we'd like to know what is going on: for example track the evolution of the amount of daily/weekly/monthly actions, and the evolution of active users.
Bar plots to the rescue!
Amount of actions
Now, Bokeh is a really cool library: feed it your numbers (coming from anywhere), choose your plot type, adjust some options, and choose how to save the results. Boom, you got your HTML dashboard with some fancy Javascript to play with the embedded plots.
We will count things over time ranges, so we'll generate vertical bar
plots, where x
are dates, and y
are whatever we'll count.
We'll count things in our database, so to avoid duplicating SQLAlchemy queries, let's create this function:
from bokeh.plotting import figure
from sqlalchemy import func
def figure_counter(
session,
field_identifier,
field_date,
groups,
to_x,
title):
"""Makes a bar figure with time as x, and something counted as y.
`field_identifier` is what is counted (e.g. Action.id)
`field_date` is how it is counted (eg. Action.date_created)
`groups` is how to group the dates (e.g. ['year', 'month'])
`to_x` is a function that will create a datetime from a row
(e.g. lambda x: datetime.date(int(x.year), (x.month), 1))
`title` is the title of the plot
"""
groups = [func.extract(x, field_date).label(x) for x in groups]
q = (session.query(func.count(field_identifier).label('count'),
*groups)
.group_by(*groups)
.order_by(*groups)
.all())
x = [to_x(el) for el in q]
y = [el.count for el in q]
width = (x[-1] - x[0]).total_seconds() / len(x) * 900
p = figure(plot_height=300,
plot_width=800,
title=title,
x_axis_type='datetime')
p.vbar(x=x, width=width, bottom=0, top=y)
return p
The main magic of this function comes from the groups
list: since we
want to count things over a time range, we have to extract
information from the date field in our table, and be sure to generate
unique categories. For example, if we want to count the amount of
actions per day, our unique categories will be identified by
['year', 'month', 'day']
. If we extract these three numbers from the
date field, and group by them, we will get correct and unique counts.
Note that func.extract()
is for PostgreSQL, you might want to change
this if you use another type of database.
The other non-trivial part is the to_x
parameter: we need functions
to reconstruct complete dates from the parts of the date we extract,
in order to feed correct x
values to Bokeh.
Let's see how to use this function to generate our plots.
Amount of actions per day
Let's begin by generating a bar plot showing the number of actions in DB, grouped by day:
figure_counter(
session,
Action.id,
Action.date_created,
['year', 'month', 'day'],
lambda x: datetime.date(int(x.year), int(x.month), int(x.day)),
'Actions created per day')
Amount of actions per week
If you want to get the same bar plot with weekly actions, group your
dates by ['isoyear', 'week']
:
figure_counter(
session,
Action.id,
Action.date_created,
['isoyear', 'week'],
lambda x: (datetime.date(int(x.isoyear), 1, 1)
+ datetime.timedelta(days=7*x.week)),
'Actions created per week'),
Amount of actions per month
And finally, to get the monthly actions:
figure_counter(
session,
Action.id,
Action.date_created,
['year', 'month'],
lambda x: datetime.date(int(x.year), int(x.month), 1),
'Actions created per month'),
Active users
We have our magic function, why not count different things? Another
interesting metric is the number of daily/weekly/monthly active users,
i.e. the number of users who did at least one action during the chosen
time range. To count this, we won't count the Action.id
, but rather
distinct(Action.user_id)
. For the daily active users:
from sqlalchemy import distinct
figure_counter(
session,
distinct(Action.user_id),
Action.date_created,
['year', 'month', 'day'],
lambda x: datetime.date(int(x.year), int(x.month), int(x.day)),
'Daily amount of active users')
And, as you can guess, you can do the same with weekly and monthly active users.
Now that's enough code for one post, but you can use this to count
things differently, for example the number of actions grouped by the
day of the week (to see if your users are more active on week-ends
for example). You'll have to adapt your x
line, but you can get the
right numbers by using ['isodow']
as your groups
.
One last thing: once you've generated all your Bokeh figures, you can
arrange them nicely on an HTML dashboard using
bokeh.embed.components
. It is well documented on Bokeh's website.
Enjoy!