Before starting this guide, ensure the following are installed/configured properly:

  • Git
  • MySQL 5.7
  • Python 3.x

Setting things up

First things first, clone the Iris repo:

git clone https://github.com/linkedin/iris.git

Let’s take a quick look through the source code:
  • ops - examples of deployment options with Docker, Kubernetes, etc.
  • db - schema and dummy data for testing
  • test - end-to-end tests and unit tests
  • src/iris - the main source code directory
    • bin - various daemons. The biggest one to note here is sender.py, which handles Iris’s message sending
    • api.py - Main REST API code
    • ui - Frontend JS, CSS, and static assets. Also contains authentication modules for the FE, and bundling setup.
    • vendors - Pluggable modules handling message sending for various contact modes. Leveraged by the Iris sender.

Create a virtualenv, install the system dependencies, and install Iris in development mode:

virtualenv venv
source venv/bin/activate
sudo apt-get install libssl-dev libxml2-dev libxslt1-dev libsasl2-dev python3-dev libldap2-dev
pip install -e '.[dev,kazoo]'

Configuring the database

Out of the box, Iris expects the database to be set up with username root and no password. This isn’t very secure, so let’s create an iris user. Log in to MySQL as root and add the new user with the statements below. Separately, we’ll also need to turn off the ONLY_FULL_GROUP_BY SQL mode.

CREATE USER 'iris'@'localhost' IDENTIFIED BY 'iris';
GRANT ALL ON iris.* TO iris@localhost;
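
The statements above create the user but do not show how to turn off ONLY_FULL_GROUP_BY. One possible approach, offered here as an assumption rather than the project’s documented method, is to read the current mode list with SELECT @@GLOBAL.sql_mode, drop that one entry, and write the rest back with SET GLOBAL sql_mode while logged in as an administrative user. The helper below only builds the statement text; its name is hypothetical.

```python
def build_sql_mode_statement(current_modes, unwanted="ONLY_FULL_GROUP_BY"):
    """Return a SET GLOBAL statement that keeps every mode except `unwanted`.

    `current_modes` is the comma-separated string returned by
    SELECT @@GLOBAL.sql_mode. Illustrative helper, not part of Iris.
    """
    modes = [m.strip() for m in current_modes.split(",")]
    kept = [m for m in modes if m and m.upper() != unwanted]
    return "SET GLOBAL sql_mode = '{}';".format(",".join(kept))


if __name__ == "__main__":
    # Example input only; the real value depends on your MySQL installation.
    print(build_sql_mode_statement("ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_DATE"))
```

Note that SET GLOBAL only affects connections opened afterwards and does not survive a server restart, so a permanent change belongs in the MySQL server configuration.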

Now let’s load the schema and some dummy data. Navigate to project root, then run:

mysql -u iris -p < ./db/schema_0.sql
mysql -u iris -p -o iris < ./db/dummy_data.sql

dummy_data.sql contains the following entities:
  • user demo with password demo
  • team demo_team
  • application Autoalerts with key: a7a9d7657ac8837cd7dfed0b93f4b8b864007724d7fa21422c24f4ff0adb2e49

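Confirm that the load succeeded by listing the tables in the iris database (for example, with SHOW TABLES;) and querying the target table: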

| Tables_in_iris              |
| application                 |
| application_mode            |
| application_owner           |
| application_quota           |
| application_stats           |
| default_application_mode    |
| device                      |
| dynamic_plan_map            |
| generic_message_sent_status |
| global_stats                |
| incident                    |
| incident_emails             |
| mailing_list                |
| mailing_list_membership     |
| message                     |
| message_changelog           |
| mode                        |
| plan                        |
| plan_active                 |
| plan_notification           |
| priority                    |
| response                    |
| target                      |
| target_application_mode     |
| target_contact              |
| target_mode                 |
| target_reprioritization     |
| target_role                 |
| target_type                 |
| team                        |
| template                    |
| template_active             |
| template_content            |
| template_variable           |
| twilio_delivery_status      |
| twilio_retry                |
| user                        |
| user_setting                |
| user_team                   |
39 rows in set (0.00 sec)

mysql> SELECT * FROM target;
| id | name      | type_id | active |
|  1 | demo      |       1 |      1 |
|  2 | abc       |       1 |      1 |
|  3 | foo       |       1 |      1 |
|  4 | demo_team |       2 |      1 |
|  5 | foo_team  |       2 |      1 |
|  6 | abc       |       3 |      1 |
|  7 | demo      |       3 |      1 |
7 rows in set (0.00 sec)

We have an Iris database now! Update the config so Iris can connect to it. The settings live in configs/config.dev.yaml. Change this:

db: &db
      scheme: mysql+pymysql
      user: root
      password: ""
      database: iris
      charset: utf8
    str: "%(scheme)s://%(user)s:%(password)s@%(host)s/%(database)s?charset=%(charset)s"

To this, updating user/password accordingly:

db: &db
      scheme: mysql+pymysql
      user: iris
      password: iris
      database: iris
      charset: utf8
    str: "%(scheme)s://%(user)s:%(password)s@%(host)s/%(database)s?charset=%(charset)s"
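
The str value is a Python %-style format string whose %(name)s placeholders are filled from the settings next to it. The sketch below shows how such a template expands; it is not a copy of Iris’s own loading code, and the host value is an assumption because the snippet above does not show one.

```python
# Values mirror the edited config above. `host` is assumed for illustration.
db_kwargs = {
    "scheme": "mysql+pymysql",
    "user": "iris",
    "password": "iris",
    "host": "127.0.0.1",  # assumption: not shown in the config snippet
    "database": "iris",
    "charset": "utf8",
}
conn_template = "%(scheme)s://%(user)s:%(password)s@%(host)s/%(database)s?charset=%(charset)s"

# %-formatting with a dict fills each %(name)s placeholder by key.
connection_url = conn_template % db_kwargs
print(connection_url)
```

If a key referenced by the template is missing from the settings, the formatting step raises a KeyError, which is a quick way to spot a typo in the config.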

Almost ready! Run make in the project root directory to start the Iris API, then verify that it responds using curl:

$ curl localhost:16649/v0/targets
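
The same check can be scripted. This is a minimal sketch using only the standard library; the function name and the injectable opener are our own additions for illustration, and the shape of the JSON that the endpoint returns is not assumed here.

```python
import json
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen):
    """GET `url` and decode the response body as JSON.

    `opener` defaults to urllib's urlopen and can be swapped out in tests.
    """
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the Iris API from `make` listening locally:
#   fetch_json("http://localhost:16649/v0/targets")
```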

Creating an escalation plan

Navigate to the “Plans” page using the top bar. Click “Create New Plan” on the top right to make a new escalation plan.

[Screenshot: create-plan]

Let’s focus on the main content here. There are additional settings for aggregation, incident tracking, and so on, but we can take a look at those later. Our plan has a name, a description, and a list of steps. Each step has a number of notifications. Iris will send messages according to this escalation plan until someone acknowledges (claims) the incident. After claiming, escalation will stop, and no more messages will be sent.

Defining terms:

“Role” and “Target” determine the recipient of the message. Here, our role is “User”, and our target is “Demo”, meaning that we’re sending a message to the demo user. Other roles allow for dynamic definitions in escalation plans; for example, the “Oncall-Primary” role allows Iris to look up the on-call engineer for a target team when escalating. For now, we’ll just focus on user, which is the simplest role.

“Priority” determines the importance of the message. Users can map different priorities to different contact modes. For example, user A might decide to map “medium” messages to SMS, while user B might map medium to emails instead. By default, this mapping is:

  • Low: email
  • Medium: email
  • High: sms
  • Urgent: call
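
To make the idea of per-user priority mappings concrete, here is a small sketch. The default table comes from the text above; the function and the override dictionary are hypothetical and do not reflect how Iris stores these settings.

```python
# Default priority-to-mode mapping, as listed above.
DEFAULT_MODES = {"low": "email", "medium": "email", "high": "sms", "urgent": "call"}


def resolve_mode(priority, user_overrides=None):
    """Pick the contact mode for a priority, preferring the user's own mapping."""
    overrides = user_overrides or {}
    return overrides.get(priority, DEFAULT_MODES[priority])


user_a = {"medium": "sms"}             # user A prefers SMS for medium messages
print(resolve_mode("medium", user_a))  # sms
print(resolve_mode("medium"))          # email (default)
```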

“Wait” determines how many minutes to wait after sending a message before proceeding to the next one. “Count” determines how many times the message will be repeated before moving on.

“Template” defines the template used to generate message content. Iris allows you to customize message content using this template definition. To generate the message, Iris will render the template with the incident context and pass along the resulting content.

Finally, “Optional” allows messages to fail without raising an alert. Otherwise, if Optional is false, a message that cannot be delivered will result in an alerting message sent to the plan creator. Setting this to true can be useful for (role, target) combinations that may not always be present (e.g. oncall-secondary).

Back to the plan

In step 1, we send a medium-priority message to the demo user two times, waiting five minutes after sending each message before sending the next. After that, we send two urgent messages, waiting 10 minutes between them. Let’s publish the plan with the button on the top right. Back in the Iris UI, test your new plan by clicking the button on the top right, and choose the Autoalerts application.

[Screenshot: test-incident]
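
The timing of the two steps described above can be sketched as a simple schedule. This is a simplified model under stated assumptions: one notification per step, the wait applies after every send (including the last one in a step), and nobody claims the incident. In practice the exact send times also depend on when the sender picks the incident up.

```python
from typing import List, NamedTuple, Tuple


class Notification(NamedTuple):
    priority: str
    wait_minutes: int  # the "Wait" field
    count: int         # the "Count" field


def send_times(steps: List[Notification]) -> List[Tuple[int, str]]:
    """Return (minute, priority) pairs for every message in an unclaimed plan."""
    schedule = []
    minute = 0
    for step in steps:
        for _ in range(step.count):
            schedule.append((minute, step.priority))
            minute += step.wait_minutes
    return schedule


plan = [Notification("medium", 5, 2), Notification("urgent", 10, 2)]
print(send_times(plan))
```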

We’ve made an incident! But if we navigate to the “Incidents” page, we’ll see that there are no currently active incidents. This is because our incident hasn’t yet sent a message. For messages to be sent, we’ll need to get the Iris sender daemon running.

Iris sender

In the root directory of the project, activate the virtualenv and run make sender (keep the Iris API running as well). Iris sender is responsible for message sending. It runs on a 60-second loop, checking the DB for new incidents and creating messages for those incidents if necessary. A lot of Iris’s complexity lives here, so we’ll be coming back to the sender shortly. For now, let’s set it running and test out our new escalation plan. The logs should contain a snippet similar to the one below:

2018-11-06 01:07:17,963 INFO root [*] 1 new incidents
2018-11-06 01:07:17,975 INFO root [*] 1 new messages
2018-11-06 01:07:17,975 INFO root [*] escalate task finished
2018-11-06 01:07:17,975 INFO root [-] start deactivate task...
2018-11-06 01:07:17,979 INFO root [*] deactivate task finished
2018-11-06 01:07:17,979 INFO root [-] start send task...
2018-11-06 01:07:17,981 INFO root 1 new messages waiting in database - queued: 0
2018-11-06 01:07:17,981 INFO root [*] send task finished
2018-11-06 01:07:17,982 INFO root [-] start aggregate task - queued: 0
2018-11-06 01:07:17,985 INFO root Sending message (ID 4) locally
2018-11-06 01:07:17,986 INFO iris.vendors.iris_dummy SEND: call Autoalerts +1 223-456-7890

Here, we see that Iris sender has sent a dummy call to +1 223-456-7890 (the phone number configured for the demo user). Now, if we navigate to the incidents list, we’ll see that our incident has shown up. We can now claim the incident from the UI to stop further escalation.
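
The periodic cycle the sender runs can be pictured with a loop like the one below. This is a rough sketch only: the real sender is considerably more involved, and the function name, parameters, and the idea of passing tasks as plain callables are our own assumptions. The task names in the usage comment are borrowed from the log output above.

```python
import time


def run_sender(tasks, interval_seconds=60, cycles=None, sleep=time.sleep):
    """Run every task in order, sleep, and repeat.

    `cycles` limits the number of iterations (None means run forever);
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    completed = 0
    while cycles is None or completed < cycles:
        for task in tasks:
            task()
        completed += 1
        sleep(interval_seconds)
    return completed


# Usage idea (hypothetical callables named after the log lines):
#   run_sender([escalate, deactivate, send, aggregate])
```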

Sending a real message

If we want to send a real message, we’ll first need to configure Iris sender to use a different message vendor. Currently, the config is set up to use the iris_dummy vendor, which logs messages without actually sending anything. Instead of that, let’s set up something using Twilio, which provides call and SMS services. You can sign up for a free Twilio account on the Twilio website. After that, set up a new project with programmable SMS and voice. We’ll need a few things from Twilio to get started: a Twilio phone number, an Account SID, and an Auth token. In addition, we’ll need to set up a TwiML Bin to generate the content of our messages without needing to spin up a new webserver. Twilio’s documentation describes how to create one. We’ll want the content of our bin to be:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say language="en-US" voice="alice">{{content}}</Say>
</Response>

Head back to configs/config.dev.yaml. On line 139, we see a skeleton configuration for the Twilio message vendor:

#- type: iris_twilio
#  name: twilio_1
#  account_sid: ''
#  auth_token: ''
#  twilio_number: ''
#  relay_base_url: ''

Change this to match below, filling in the account_sid, auth_token, and twilio_number fields appropriately. We’ll also need to fill in the relay_base_url with the url of our TwiML bin, and configure the say/gather endpoints to the empty string. This way, Twilio will hit the TwiML bin URL without any trailing endpoint, which is needed for this to work properly. When you’re done, the config should look like this:

- type: iris_twilio
  name: twilio_1
  account_sid: 'AC123…'
  auth_token: 'abc…'
  twilio_number: '+11234567890'
  relay_base_url: 'https://handler.twilio.com/abc'
  say_endpoint: ''
  gather_endpoint: ''
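
To see why the endpoints are set to the empty string, consider how a relay URL is put together from a base URL, an endpoint path, and a query string. The sketch below is an illustration under assumptions: the function name, the content query parameter, and the "/say" path are hypothetical stand-ins, not taken from the Iris source.

```python
from urllib.parse import urlencode


def build_call_url(relay_base_url, say_endpoint, params):
    """Join the base URL, endpoint path, and URL-encoded query string."""
    return relay_base_url + say_endpoint + "?" + urlencode(params)


bin_url = "https://handler.twilio.com/abc"

# Empty endpoint: the request goes to the TwiML bin URL itself.
print(build_call_url(bin_url, "", {"content": "test_call"}))

# Non-empty endpoint (hypothetical path): the URL no longer matches the bin.
print(build_call_url(bin_url, "/say", {"content": "test_call"}))
```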

We’ll also need to remove the [] from the vendors: [] line (line 118), and set debug: False in the sender settings (line 59). Restart both make and make sender to pick up the new configuration. Finally, change the contact info for the demo user to match the number you verified with Twilio:

UPDATE target_contact
SET destination = '+1 123-456-7890'
WHERE target_id = (SELECT id FROM target WHERE name = 'demo' AND type_id = 1)
AND mode_id IN (SELECT id FROM mode WHERE name='sms' OR name='call');

Now, we can use the iris_twilio vendor for sending messages. Let’s trigger another incident, using the “Test Plan” button for “Example plan” in the same way we did before. Make sure Iris sender is running, and wait for Iris to call. After a minute or so, you should receive a phone call from your Twilio number saying “test_call”.

Customizing call content

Now that we’ve set up phone calls via Twilio, let’s customize the message content to say something a little more useful than “test_call”. To do that, we’ll need to modify the template that our escalation plan is using. If we examine the notifications in “Example plan”, we see that they all use the “test_template” message template. Let’s make some changes to that. Navigate to the Templates page, and click on test_template in the list on that page.

[Screenshot: template]

Here, we see that test_template defines message content for a number of different contact modes. For now, let’s focus on call. Note that this template defines content for a specific application, Autoalerts. Autoalerts defines a number of variables, which are shown on the page as well. Our template can inject the value of any of these variables into the message content using Jinja template syntax. As an example of this, let’s change the content of the call message to:

Hello world! {{ notes }}

Clone the template, make the change, then publish the new template. This will deactivate the previous version of the template. The “Example plan” escalation plan will use the new version of “test_template” when it creates a message from now on, ignoring the previous version. Let’s check it out by testing “Example plan” again. We should get a call now saying “Hello world! This is a note.”
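
To illustrate what rendering a template against an incident context means, here is a tiny stand-in written with the standard library. It is deliberately not Jinja: it handles only plain {{ name }} placeholders, and the function name is our own.

```python
import re


def render(template, context):
    """Replace each {{ name }} placeholder with context[name].

    A minimal stand-in for a real template engine, for illustration only.
    """
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(context[match.group(1)]),
        template,
    )


print(render("Hello world! {{ notes }}", {"notes": "This is a note"}))
```

A variable that is missing from the context raises a KeyError in this sketch; a real template engine may behave differently.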

A deeper look at applications

In the previous section, we saw that test_template defines message content on a per-application basis. Let’s dive a little deeper into the concept of an Iris application. Each Iris app represents an integration with Iris. To explain this more clearly, let’s look at Autoalerts as an example.

Head to the “Applications” page, and click on Autoalerts in the list that shows up. We should get something that looks like this:

[Screenshot: application]

At the top, we see that the application defines a number of variables. When this application creates an Iris incident, it provides a context JSON object that defines these variables. As we saw earlier, this incident context is rendered via the escalation plan’s template to create the message content. When we trigger an incident with the “Test Plan” button, we use the sample context defined on this page as the incident’s context. For example, in the previous section, we sent a phone call using the {{notes}} variable, which is defined as “This is a note” in the Autoalerts sample context.

Applications also define a context template, which defines the content of the incident details page in the Iris UI. We can edit this template here and see the result when we navigate back to the incident details page. For example, try adding <h2> Hello World! </h2> below the {{#context}} line. When we take a look at any of our previous incidents, we get:

[Screenshot: hello-context]

Finally, the summary template defines the content on the incident list page. Since each incident has a different context, it’s up to the application to define how an incident should be summarized in the incident list. The resulting content of the Autoalerts summary template is shown below.

[Screenshot: summary-list]

Triggering incidents programmatically

Iris applications provide an API key that can be used to trigger incidents programmatically via the Iris REST API. Let’s take a look at how this works by writing a quick script to trigger incidents. Using the Iris python client, this is pretty easy:

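import irisclient

# Arguments: application name, application API key, Iris API base URL.
# The local development API that we checked with curl earlier serves plain HTTP.
iris = irisclient.IrisClient('Autoalerts', 'a7a9d7657ac8837cd7dfed0b93f4b8b864007724d7fa21422c24f4ff0adb2e49', 'http://localhost:16649')
# Create an incident on "Example plan"; the dict becomes the incident context.
iris.incident('Example plan', {'notes': 'This is a programmatic incident!'})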

Remember to make sure the sender is running, and wait for the new phone call. This one should now include “This is a programmatic incident!”, because the notes variable in the template is filled from the context we passed in. If we define our message templates properly, our outbound messages can now include useful information about each incident.

Adding real users

Previously, we routed all of the messages going to the demo user to our own phone number by running an UPDATE query directly against the database. This is far from ideal. There are a number of easier ways to import or create users. The easiest involves another open-source project: iris-admin.

git clone https://github.com/jrgp/iris-admin.git
cd iris-admin
virtualenv env
. env/bin/activate
python setup.py develop

Then, modify the DB config in configs/config.dev.yaml to use the iris user and password we created earlier. We can then access the Iris admin UI at http://localhost:16651. This gives us a nifty UI that we can use to create, delete, and modify users.

Adding authentication

At LinkedIn, we use LDAP for authentication. We’ve provided an authentication module that leverages this, with example configuration values in configs/config.dev.yaml. In addition, in src/iris/ui/auth, we define several sample Authenticator classes, which define a constructor and the authenticate(user, password) method. Using this interface, custom authentication can be implemented with a variety of different providers.
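
As an illustration of that interface, here is a minimal sketch of a custom authenticator. Only the authenticate(user, password) method name comes from the text above; the class name, the constructor argument, and the shape of the config dictionary are assumptions, and keeping plaintext passwords in config is for demonstration only.

```python
import hmac


class StaticAuthenticator:
    """Hypothetical authenticator that checks credentials against a config dict."""

    def __init__(self, config):
        # Assumed config shape: {"users": {"demo": "demo"}}
        self.users = dict(config.get("users", {}))

    def authenticate(self, user, password):
        """Return True only when the user exists and the password matches."""
        expected = self.users.get(user)
        if expected is None:
            return False
        # compare_digest avoids leaking how many characters matched via timing.
        return hmac.compare_digest(expected.encode("utf-8"), password.encode("utf-8"))


auth = StaticAuthenticator({"users": {"demo": "demo"}})
print(auth.authenticate("demo", "demo"))   # True
print(auth.authenticate("demo", "wrong"))  # False
```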

Wrapping up

After all these steps, we now have an Iris setup that is ready for development. The examples in the ops directory should be a good starting point for building more production-ready setups, using nginx and uwsgi rather than gunicorn for better performance and easier deployment. More questions? Reach out to iris-oncall@linkedin.com or open an issue on GitHub.