Monday, October 7, 2013

Tell, don't Ask.

tl;dr

If you rely on asking, you're asking for trouble. Instead:

  1. Tell functions what to do, don't make them ask.
  2. Tell processes what to do, don't make them ask.
  3. Keep all your environment variable queries in one place, apart from the rest of the code.

You probably think this is obvious, but it isn't

The principle of asking versus telling has many faces in programming. Some of the faces are obvious—others are more subtle. This articles moves from the more obvious to the more subtle.

So if you find yourself saying, "Well, duh!" Keep reading.

Telling v. Asking

This Python code illustrates telling:

# teller.py
import sqlite3

def connectToDatabase(filename):
return sqlite3.connect(filename)

The connectToDatabase function accepts an argument for the database connection details. Other code that calls connectionToDatabase tells the function what it wants to do.

This Python code illustrates asking:

# asker.py
import sqlite3

DATABASE_FILENAME='/tmp/database.sqlite'

def connectToDatabase():
return sqlite3.connect(DATABASE_FILENAME)

The connectToDatabase function in the above snippet is not told database connection details. Instead, the function asks for the connection details—in this case, it asks from the global scope (which is a bad place to be in).

Telling > Asking

Telling, as described above, is better than Asking for the following reasons:

  1. The code is more flexible for reuse.

    I can more easily connect to different databases using the teller.py.

  2. The code is easier to test.

    Because teller.py is more flexible for reuse, I can use the code in tests very easily.

  3. It's easier to know how to use the code and harder to use incorrectly.

    It's obvious in teller.py that I must provide a filename to connect to (because of the argument spec of the function). I can't accidentally connect to /tmp/database.sqlite.

    To use asker.py I must know that the function looks at DATABASE_FILENAME either from reading the source code or the docstring of the function (which is absent in this case). This would be much more difficult to do if connectToDatabase called other functions in other files which accessed a global variable.

Abusing Environment Variables

In asker.py the function asks for information from the global scope. The global scope is just one place to get information from. The environment is another. Take a look at this:

# asker-env.py
import sqlite3
import os

def connectToDatabase():
return sqlite3.connect(os.environ['DATABASE_FILENAME'])

Instead of asking for the database filename from the global scope, asker-env.py asks the environment of the process for the database filename. This is more reusable that asker.py but is still not as good as teller.py because:

  • it still suffers from problem #3: that you must rely on the docstring or reading the source code to understand how to use it, and
  • you can only have one connection per process.

A better approach would be to tell the process which database filename to use as in this example:

# teller-cli.py
import sqlite3
import argparse

def connectToDatabase(filename):
return sqlite3.connect(filename)

if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('filename', help="Filename of the SQLite database")
args = parser.parse_args()
connectToDatabase(args.filename)

teller-cli.py is better than asker-env.py because you can ask it (using teller-cli.py --help) what you need to tell it instead of having to read docstrings or source code. Having a --help option which fully describes configuration options and is enforced when running the process is similar to teller.py having an argument spec that is enforced by Python.

But I thought environment variables were good...

(this is a picture of an environment -- much better than the one on your computer)

If you subscribe to The Twelve-Factor App's ideas, you will store all your configuration in environment variables. Or if you use Travis-CI or Heroku you will also have used environment variables to great effect. Environment variables seem like a great way to do configuration.

Environment variables are cross-language and easy to change. They have huge benefits. Environment variables are a great way to do configuration! It would be nice to leverage the great qualities of environment variables along with the great qualities of code that is told.

You can!

Convert Asking to Telling

To write Telling code that also uses environment variables, restrict the environment variable querying to a single, documented place. Consider this:

# env-runner.py
import argparse
import os

# Define ALL the environment variables this process might use.
env_vars = [
('DATABASE_FILENAME', 'Filename of the SQLite database.'),
]

# Read the environment. This function must only be called from within this
# module if you want to prevent writing asking code.
def getArgs(environ, config):
ret = {}
for (env_name, description) in config:
try:
ret[env_name] = environ[env_name]
except KeyError:
print 'Missing env var: %s %s' % (env_name, description)
raise
return ret


def main():
from teller import connectToDatabase
args = getArgs(os.environ, env_vars)
db = connectToDatabase(args['DATABASE_FILENAME'])
# ...

The above snippet has all the benefits of telling code and the flexibility of asking code:

  1. The code is flexible for reuse.
  2. The code is easy to test.
  3. It's easy to know how to use this code and hard to use incorrectly.

Real-world examples

As proof that the concept of Telling instead of Asking is not obvious, here are some real-world examples (both good and bad):

Klein's improvement on Flask

Flask, a micro web framework for Python, loves the global scope. This is the Hello, World! from the front page:

from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
return "Hello World!"

if __name__ == "__main__":
app.run()

And this is the recommended way for accessing a database:

from flask import g

app = Flask(__name__)

@app.before_request
def before_request():
g.db = connect_db()

@app.route("/")
def hello():
db = g.db
# ...

You use a package-global named g, which is "magical" and comes with appropriate warnings:

We store our current database connection on the special g object that Flask provides for us. This object stores information for one request only and is available from within each function. Never store such things on other objects because this would not work with threaded environments. That special g object does some magic behind the scenes to ensure it does the right thing. http://flask.pocoo.org/docs/tutorial/dbcon/

Consider the improvement offered by Klein, a similar micro web framework. With Klein you can easily make apps with Non-global state:

from klein import Klein

class MyApp(object):

app = Klein()

def __init__(self, db_connection):
self.db_connection = db_connection

@app.route('/')
def hello(self, request):
db = self.db_connection
# ...

There is no magical, global g here. You can instantiate MyApp with a database connection, or even have three different instances of MyApp with three different database connections all running in the same app.

Klein lets you tell instead of ask.

Ansible

Ansible is a (really good) configuration management tool. It's mostly straightforward, but it's easy to use it in an asking way—which becomes unmaintainable.

For instance, if in one of our tasks we want to download a resource from http://dev.example.com if we're in the development environment or from https://production.example.com if we're in the production environment, Ansible easily lets us do this:

# main.yml
- name: Get the files
command: wget {{ source_server }}/thefile.tgz /tmp/thefile.tgz
creates=/tmp/thefile.tgz

The command asks for source_server. This task likely lives in a role's task file, which could be in roles/mymachine/tasks/main.yml, deep within the directory structure of my configuration. The problem is that I have no way of knowing (short of manually parsing the task file) when writing my inventory file or anything else that uses/includes main.yml, that source_server is a needed variable.

Ansible lets you ask yourself into an unmaintainable hole. To be more maintainable, Ansible should provide a mechanism for specifying the parameters needed by tasks. Perhaps something like:

# main-with-vars.yml
- variables:
- name: source_server
description: URL of the server to download source files from. For
example: http://foo.com
default: http://example.com

- name: Get the files
command: wget {{ source_server }}/thefile.tgz /tmp/thefile.tgz
creates=/tmp/thefile.tgz

Such a file would allow you to produce a list of all the configurable variables for a role/task and then be able to tell instead of ask.

AngularJS

AngularJS does a lot to help you avoid asking through dependency injection.

Twisted's Reactor

Twisted is currently working toward making the reactor not global in an effort to make testing easier and perhaps allow for new features.

Conclusion

In conclusion, read the tl;dr at the top :) Also, do you have some example of telling v. asking? Or counter-arguments? Post a comment.

1 comment:

  1. This seems like a good, specific example of how to keep your code and configuration loosely coupled. By definition, your "ask" examples require you to understand the code much more intricately in order to write your config (making them tightly coupled). Nice writeup.

    ReplyDelete