Monday, December 16, 2013

YNO-phasic Sleep

Recently, we've seen an increased interest in polyphasic sleep. Proponents claim to be able to extract extra hours out of the day, as well as other benefits. However, not as much interest has been paid to a more ancient and natural sleep pattern: Young Needy Offspring (YNO) phasic sleep.

Tools


Unlike polyphasic sleep, which may require additional equipment to be effective, YNO-phasic sleep simply requires one or more offspring. Typically, more (and younger) offspring produce more dramatic results.

24-hour cycle

The charts below show sample 24-hour cycles for monophasic, polyphasic and YNO-phasic sleep:

Monophasic1
8 hours of sleep followed by 16 hours of wakefulness
Polyphasic (Biphasic TED)
4 hours of sleep, 2 hours of wakefulness, 4 hours of sleep, 14 hours of wakefulness
YNO-phasic
~3 hours of sleep followed by random intervals of wakefulness and sleep-like wakefulness. At 6:15am, the offspring will awake for the day, fully rested and ready to play.

Sleep transitions

Monophasic

A preferred sleep cycle involves gentle transitions from sleep to wakefulness. The following chart shows a desirable monophasic sleep cycle.

Note the gentle transitions from wakefulness down to sleeping and back up to wakefulness.

YNO-phasic

YNO-phasic sleep cycles are characterized by abrupt changes from sleeping to wakefulness. The return transition from wakefulness to sleeping does not happen again until the next evening.

Abrupt changes to wakefulness are typically caused by offspring producing highly audible noises, foul odors and surprisingly strong slaps to the face 2. Not infrequently, changes to wakefulness also happen due to the discomfort caused by a spouse's hands lightly strangling the sleeper while moaning, "I can't take it anymore!" through clenched teeth.

Direct benefits

Critics of YNO-phasic sleep3 claim that such sleep has no benefits, but it is obvious from research that this is unfounded. YNO-phasic sleep provides several benefits:

  • Longer days

    As shown above, a person following YNO-phasic sleep will have more wakeful hours during the day, sometimes eliminating sleep altogether for several days (especially when the initial transition is made from monophasic to YNO-phasic sleep). The implications for productivity are obvious.

  • Reading time

    Many people wish they had more time to read. YNO-phasic sleep offers ample opportunity for reading. Just last night, at 3am, I was able to spend 30 minutes repeatedly reading Goodnight Moon. The kittens end up on the chair, and the mouse ends up looking out the window.

  • Improved Resistance

    Those following YNO-phasic sleep build up resistance to some forms of torture, and may earn credits toward bypassing portions of Navy SEAL training or Army Ranger training.

Indirect benefits

In addition to direct benefits of YNO-phasic sleep, people report many indirect benefits, such as:

  • Increased chance of progeny

    It has been proven that those who have offspring have a greater chance of having descendants than those who don't.

  • Laughter

    The first time the offspring emits laughter4 is very nice. Subsequent times are also nice.

  • Amazement

    After enduring YNO-phasic sleep for some time, adherents frequently report astonishment and surprise at what their offspring can do (e.g. walking, talking, teasing, telling jokes, performing, etc.).

Notes

This article has been peer reviewed (I had a peer review it).

1 Studies have shown that adherents to YNO-phasic sleep who are shown monophasic sleep diagrams exhibit increased levels of sarcasm and violence.

2 Because of the abrupt and frequent nature of changes to wakefulness, adherents to YNO-phasic sleep have been known to vocalize the name during the night as "Why?! Nooooo!"

3 Interestingly, the most ardent critics of YNO-phasic sleep are often the strictest adherents. The reason for this overlap has yet to be researched.

4 Also giggles, belly laughs, smiles and funny faces.

Monday, October 7, 2013

Tell, don't Ask.

tl;dr

If you rely on asking, you're asking for trouble. Instead:

  1. Tell functions what to do, don't make them ask.
  2. Tell processes what to do, don't make them ask.
  3. Keep all your environment variable queries in one place, apart from the rest of the code.

You probably think this is obvious, but it isn't

The principle of asking versus telling has many faces in programming. Some of the faces are obvious; others are more subtle. This article moves from the more obvious to the more subtle.

So if you find yourself saying, "Well, duh!", keep reading.

Telling v. Asking

This Python code illustrates telling:

# teller.py
import sqlite3

def connectToDatabase(filename):
    return sqlite3.connect(filename)

The connectToDatabase function accepts an argument for the database connection details. Code that calls connectToDatabase tells the function which database to connect to.

This Python code illustrates asking:

# asker.py
import sqlite3

DATABASE_FILENAME = '/tmp/database.sqlite'

def connectToDatabase():
    return sqlite3.connect(DATABASE_FILENAME)

The connectToDatabase function in the above snippet is not told database connection details. Instead, the function asks for the connection details—in this case, it asks from the global scope (which is a bad place to be in).

Telling > Asking

Telling, as described above, is better than Asking for the following reasons:

  1. The code is more flexible for reuse.

    I can more easily connect to different databases using teller.py.

  2. The code is easier to test.

    Because teller.py is more flexible for reuse, I can use the code in tests very easily.

  3. It's easier to know how to use the code and harder to use incorrectly.

    It's obvious in teller.py that I must provide a filename to connect to (because of the argument spec of the function). I can't accidentally connect to /tmp/database.sqlite.

    To use asker.py, I must know that the function looks at DATABASE_FILENAME, either by reading the source code or the function's docstring (which is absent in this case). This would be much harder if connectToDatabase called other functions in other files that accessed a global variable.
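To make point 2 concrete, here is a minimal sketch of a test for the telling version. It repeats teller.py's connectToDatabase and relies on SQLite's special ':memory:' filename, which gives each connection a private, throwaway database. That is only possible because the filename is told to the function rather than looked up from a global; the test function's name is just an illustration.

```python
import sqlite3


def connectToDatabase(filename):
    # Same telling function as teller.py: the caller decides which database.
    return sqlite3.connect(filename)


def test_connectToDatabase_uses_the_filename_it_is_told():
    # ':memory:' is SQLite's name for a private, in-memory database,
    # so this test never touches /tmp/database.sqlite.
    db = connectToDatabase(':memory:')
    try:
        db.execute('CREATE TABLE t (x INTEGER)')
        db.execute('INSERT INTO t VALUES (42)')
        assert db.execute('SELECT x FROM t').fetchone() == (42,)
    finally:
        db.close()
```

The asking version could only be tested this way by reassigning the module-level DATABASE_FILENAME before each test.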

Abusing Environment Variables

In asker.py the function asks for information from the global scope. The global scope is just one place to get information from. The environment is another. Take a look at this:

# asker-env.py
import sqlite3
import os

def connectToDatabase():
    return sqlite3.connect(os.environ['DATABASE_FILENAME'])

Instead of asking for the database filename from the global scope, asker-env.py asks the process's environment for the database filename. This is more reusable than asker.py but is still not as good as teller.py because:

  • it still suffers from problem #3: that you must rely on the docstring or reading the source code to understand how to use it, and
  • you can effectively only use one database per process, since every call reads the same DATABASE_FILENAME.

A better approach would be to tell the process which database filename to use as in this example:

# teller-cli.py
import sqlite3
import argparse

def connectToDatabase(filename):
    return sqlite3.connect(filename)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('filename', help="Filename of the SQLite database")
    args = parser.parse_args()
    connectToDatabase(args.filename)

teller-cli.py is better than asker-env.py because you can ask it (using teller-cli.py --help) what you need to tell it instead of having to read docstrings or source code. Having a --help option which fully describes configuration options and is enforced when running the process is similar to teller.py having an argument spec that is enforced by Python.
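The --help text and the enforcement both come from the same argument spec. A small sketch, assuming the parser from teller-cli.py is moved into a hypothetical buildParser() function so it can be exercised without running the script:

```python
import argparse


def buildParser():
    # Hypothetical helper: the same argument spec as teller-cli.py,
    # wrapped in a function so it can be inspected and tested directly.
    parser = argparse.ArgumentParser(prog='teller-cli.py')
    parser.add_argument('filename', help="Filename of the SQLite database")
    return parser


parser = buildParser()

# The text `teller-cli.py --help` would print, captured as a string.
help_text = parser.format_help()

# Telling the process which database to use.
args = parser.parse_args(['/tmp/other.sqlite'])
```

Calling parse_args([]) instead makes argparse print a usage error and exit with status 2, the process-level equivalent of Python rejecting a call with a missing argument.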

But I thought environment variables were good...

(this is a picture of an environment -- much better than the one on your computer)

If you subscribe to The Twelve-Factor App's ideas, you will store all your configuration in environment variables. Or if you use Travis-CI or Heroku you will also have used environment variables to great effect. Environment variables seem like a great way to do configuration.

Environment variables are cross-language and easy to change. They have huge benefits. Environment variables are a great way to do configuration! It would be nice to leverage the great qualities of environment variables along with the great qualities of code that is told.

You can!

Convert Asking to Telling

To write Telling code that also uses environment variables, restrict the environment variable querying to a single, documented place. Consider this:

# env-runner.py
import os

# Define ALL the environment variables this process might use.
env_vars = [
    ('DATABASE_FILENAME', 'Filename of the SQLite database.'),
]

# Read the environment. This function must only be called from within this
# module if you want to prevent writing asking code.
def getArgs(environ, config):
    ret = {}
    for (env_name, description) in config:
        try:
            ret[env_name] = environ[env_name]
        except KeyError:
            print('Missing env var: %s %s' % (env_name, description))
            raise
    return ret


def main():
    from teller import connectToDatabase
    args = getArgs(os.environ, env_vars)
    db = connectToDatabase(args['DATABASE_FILENAME'])
    # ...


if __name__ == '__main__':
    main()

The above snippet has all the benefits of telling code and the flexibility of asking code:

  1. The code is flexible for reuse.
  2. The code is easy to test.
  3. It's easy to know how to use this code and hard to use incorrectly.
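Because getArgs is itself told which environment to read, it can be tested with a plain dictionary instead of by mutating os.environ. A sketch that reproduces getArgs from env-runner.py (its print call works on both Python 2 and 3); the fake environment values are made up for illustration:

```python
def getArgs(environ, config):
    # Copied from env-runner.py: read only the variables listed in config.
    ret = {}
    for (env_name, description) in config:
        try:
            ret[env_name] = environ[env_name]
        except KeyError:
            print('Missing env var: %s %s' % (env_name, description))
            raise
    return ret


env_vars = [
    ('DATABASE_FILENAME', 'Filename of the SQLite database.'),
]

# A fake environment: just a dict, so os.environ is never touched.
fake_environ = {'DATABASE_FILENAME': ':memory:', 'UNRELATED': 'ignored'}
args = getArgs(fake_environ, env_vars)
```

Unlisted variables such as UNRELATED are ignored, so env_vars stays the complete, documented list of what the process reads.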

Real-world examples

As proof that the concept of Telling instead of Asking is not obvious, here are some real-world examples (both good and bad):

Klein's improvement on Flask

Flask, a micro web framework for Python, loves the global scope. This is the Hello, World! from the front page:

from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()

And this is the recommended way for accessing a database:

from flask import Flask, g

app = Flask(__name__)

@app.before_request
def before_request():
    g.db = connect_db()  # connect_db() is your own helper that opens the database

@app.route("/")
def hello():
    db = g.db
    # ...

You use a package-global named g, which is "magical" and comes with appropriate warnings:

We store our current database connection on the special g object that Flask provides for us. This object stores information for one request only and is available from within each function. Never store such things on other objects because this would not work with threaded environments. That special g object does some magic behind the scenes to ensure it does the right thing. http://flask.pocoo.org/docs/tutorial/dbcon/

Consider the improvement offered by Klein, a similar micro web framework. With Klein you can easily make apps with non-global state:

from klein import Klein

class MyApp(object):

    app = Klein()

    def __init__(self, db_connection):
        self.db_connection = db_connection

    @app.route('/')
    def hello(self, request):
        db = self.db_connection
        # ...

There is no magical, global g here. You can instantiate MyApp with a database connection, or even have three different instances of MyApp with three different database connections all running in the same app.

Klein lets you tell instead of ask.
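The same idea works without any web framework at all. The following is a minimal sketch, not Klein's API: a hypothetical HelloApp class is told its database connection, so three instances with three different connections can coexist in one process. The hypothetical makeConnection helper builds each one as an independent in-memory SQLite database.

```python
import sqlite3


class HelloApp(object):
    """Hypothetical stand-in for MyApp: state comes from __init__, not a global."""

    def __init__(self, db_connection):
        self.db_connection = db_connection

    def hello(self, request=None):
        # The handler uses only the connection it was told about.
        row = self.db_connection.execute('SELECT name FROM greeting').fetchone()
        return 'Hello %s!' % row[0]


def makeConnection(name):
    # Each ':memory:' connection is a separate database.
    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE greeting (name TEXT)')
    db.execute('INSERT INTO greeting VALUES (?)', (name,))
    return db


apps = [HelloApp(makeConnection(name)) for name in ('World', 'Alice', 'Bob')]
greetings = [app.hello() for app in apps]
```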

Ansible

Ansible is a (really good) configuration management tool. It's mostly straightforward, but it's easy to use it in an asking way—which becomes unmaintainable.

For instance, if in one of our tasks we want to download a resource from http://dev.example.com if we're in the development environment or from https://production.example.com if we're in the production environment, Ansible easily lets us do this:

# main.yml
- name: Get the files
  command: wget -O /tmp/thefile.tgz {{ source_server }}/thefile.tgz
           creates=/tmp/thefile.tgz

The command asks for source_server. This task likely lives in a role's task file, which could be in roles/mymachine/tasks/main.yml, deep within the directory structure of my configuration. The problem is that I have no way of knowing (short of manually parsing the task file) when writing my inventory file or anything else that uses/includes main.yml, that source_server is a needed variable.

Ansible lets you ask yourself into an unmaintainable hole. To be more maintainable, Ansible should provide a mechanism for specifying the parameters needed by tasks. Perhaps something like:

# main-with-vars.yml
- variables:
    - name: source_server
      description: "URL of the server to download source files from.
        For example: http://foo.com"
      default: http://example.com

- name: Get the files
  command: wget -O /tmp/thefile.tgz {{ source_server }}/thefile.tgz
           creates=/tmp/thefile.tgz

Such a file would allow you to produce a list of all the configurable variables for a role/task and then be able to tell instead of ask.
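To sketch what such a declaration would buy you: given the proposed variables list, already parsed from YAML into Python data, a tool could print a role's parameters and resolve them against the values you pass in, failing loudly on anything missing. The variables format is hypothetical (not something Ansible supports), as are install_dir and both helper functions.

```python
# Hypothetical, already-parsed form of the proposed `variables` section.
role_variables = [
    {'name': 'source_server',
     'description': 'URL of the server to download source files from. '
                    'For example: http://foo.com',
     'default': 'http://example.com'},
    {'name': 'install_dir',  # hypothetical variable with no default
     'description': 'Directory to unpack thefile.tgz into.'},
]


def describeVariables(variables):
    # The role's "--help": one line per configurable variable.
    return ['%s: %s' % (v['name'], v['description']) for v in variables]


def resolveVariables(variables, provided):
    # Tell the role its values: explicit values win, then defaults.
    # Anything still missing is reported up front instead of failing mid-run.
    resolved, missing = {}, []
    for v in variables:
        if v['name'] in provided:
            resolved[v['name']] = provided[v['name']]
        elif 'default' in v:
            resolved[v['name']] = v['default']
        else:
            missing.append(v['name'])
    if missing:
        raise ValueError('Missing role variables: %s' % ', '.join(missing))
    return resolved
```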

AngularJS

AngularJS does a lot to help you avoid asking through dependency injection.

Twisted's Reactor

Twisted is currently working toward making the reactor not global in an effort to make testing easier and perhaps allow for new features.

Conclusion

In conclusion, read the tl;dr at the top :) Also, do you have any examples of telling v. asking? Or counter-arguments? Post a comment.

Tuesday, September 17, 2013

When GitHub is down, BitTorrent Sync saves the day!

GitHub was down momentarily, which wouldn't normally be a problem (I'd wait a few minutes until it came back up).  But it happened to be down right when I was leaving for work, and I wanted what I had been working on at home to be available at work.


BitTorrent Sync to the rescue!  I have a directory full of bare repositories that I share with BitTorrent Sync.  I just push there, then, by the time I'm back at work, the code has been copied to my computer here.

Monday, September 16, 2013

To the FairTax Proponents

The idea of FairTax excites me.  I value simplicity and the FairTax seems simple.

Let me say again (because the rest of this post will probably make you think I don't support this legislation): I support the principles that FairTax seems to be built on.  The principles seem to be comprehensibility, simplicity, fairness and a lack of loopholes.

But you, legislators trying to pass this bill, do not currently have my support.  Mostly, because of problems with FairTax.org -- problems that have little to do with the content of the bill and everything to do with bad delivery and marketing.

1. Where are the Cons?

FairTax.org leads me to conclude that the FairTax is practically perfect in every way.  But, from experience, I know that there's no way FairTax can be perfect.  There are downsides to this (and every other) legislation, and it is dishonest to pretend that there are not.

I would like to see a set of pros and cons, if for no other reason than to show that the legislators are actually thinking this all the way through.  Perhaps there should be more than one set of pros and cons.  It would be informative to see pros and cons for the wealthy, the poor, corporations, religious institutions, government agencies, brick and mortar stores, online stores, insurance companies, hospitals, etc...

If you can't acknowledge faults, then I can't accept that you've thought this all the way through and I will not support you.

2. The FAQ is not helpful

I tend to agree with this article's estimation of the FAQ as a format.  And after using the FairTax.org FAQ, I agree even more for two reasons:

  1. That I have to click every single question before the answer shows is ridiculous.  And I can't have more than one answer open at a time.  Just give me the info!  I can scroll past the information that isn't relevant.
  2. I read about how other governments (e.g. State of Florida) have implemented similar tax structures.  About an hour later, I wanted to show my wife and I couldn't find the same question again.  I found myself iterating through possible ways the question could be phrased.  If all the text were on the page, I would have just searched the page for "Florida."

Please take some time to reorganize the FAQ into a navigable document and stop hiding so much.  It feels like you're trying to hide stuff.


3. Link to the text

All of the text and videos on FairTax.org are nothing compared to the actual text of the legislation.  You can say all you want, however you want, but it's not FairTax.org that's going to be put into law, it's H.R. 25.  Please provide a link to the text of the bill.  Then, rather than telling me what you think the bill says, quote the bill directly.  A side benefit to quoting the actual text is that it might motivate the authors of the text to make it more intelligible to those who aren't politicians.

4. Lead with 30%

The most prominent number on FairTax.org is 23%.  And the FairTax is a sales tax, not an income tax.  So, in the FAQ when people see the question "What will be the rate of the sales tax be at the retail counter?" I will bet that 90% of them will react the same way I did.  "It's 23%, right?"

*click*

"30 percent?!  They lied!"

The text goes on to accurately explain how 23% refers to sales tax based on a tax-inclusive income so that we're comparing apples with blah blah blah.  The math is accurate.  I don't dispute that.  But the feeling of distrust I experienced was very real.  It is misleading to tout the FairTax as a 23% tax that's a sales tax, when it's actually a 30% sales tax.
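The two numbers describe the same tax measured against different bases. A worked example with a hypothetical $100 pre-tax price (illustrative numbers, not taken from FairTax.org):

```latex
\begin{aligned}
\text{tax-exclusive rate} &= \frac{\$30}{\$100} = 30\% \\
\text{tax-inclusive rate} &= \frac{\$30}{\$100 + \$30} = \frac{30}{130} \approx 23\% \\
r_{\text{excl}} &= \frac{r_{\text{incl}}}{1 - r_{\text{incl}}} = \frac{0.23}{1 - 0.23} \approx 0.30
\end{aligned}
```

So a 23% rate quoted against the tax-inclusive total is roughly a 30% rate added on top of the price at the register.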

The feeling you get when you've been led along with 23%, only to find 30% intentionally hidden in the FAQ, is that you've been had and that the people authoring this bill are deceptive.

Instead of leading with 23% and trying to explain away 30%, lead with the more obvious 30% and explain how it's actually 23%.  If you can't lead with 30%, this bill won't pass.  Be more forthright.

5. Corporations are fictions?

Under the question "Will corporations get a windfall with the abolition of the corporate tax?" (which you would think is a yes or no question [until you read the answer]) it begins "Corporations are legal fictions..."

In this post I'm offering a critique of the delivery of the message (not the content of the bill).  I have my reservations about the exceptional status offered to corporations by this bill, but what I take issue with here is the silly notion of calling corporations fictions.

I can see what they're trying to say: that people bear the burdens of taxes on business.  But corporations are not fictions.  I work for a corporation.  It's no fiction.  The corporation buys real goods, feeds us real food sometimes, spends real money in the community, donates real money to campaigns.  "Fiction" is too strong a word.  Please rephrase the answer to this question (including perhaps a Yes or No at the beginning).

Thursday, August 22, 2013

Angular AJAX Upload

Though the Internet would have you believe otherwise, uploading a file asynchronously from AngularJS isn't that hard. I don't want fancy colors or previews or progress bars or any of that. I want to upload a file from my AngularJS-backed webapp without reloading the page. Also, I don't care about old browsers. If you do, then this might not work for you.

After struggling with blueimp's library for way too long, I decided to just implement the part I needed.

Uploading a file using AJAX + AngularJS requires three things:

  1. AJAX
  2. AngularJS
  3. AJAX + AngularJS

1. AJAX

function upload(url, file) {
    var formdata = new FormData(),
        xhr = new XMLHttpRequest();

    formdata.append('myfile', file);

    xhr.onreadystatechange = function() {
        if (4 === this.readyState) {
            if (xhr.status == 200) {
                // success
            } else {
                // failure
            }
        }
    };
    xhr.open("POST", url, true);
    xhr.send(formdata);
}

The file will be posted to the server as the parameter named myfile.

2. AngularJS

app.directive('fileChange', function() {
    return {
        restrict: 'A',
        link: function(scope, element, attrs) {
            element.bind('change', function() {
                scope.$apply(function() {
                    scope[attrs['fileChange']](element[0].files);
                });
            });
        }
    };
});

If you use the above directive like this:

<input type="file" file-change="runSomething">

when the user chooses a file to upload, runSomething will be called with a FileList. You can pass the first element in that list as the second arg to the upload function above.

3. AJAX + AngularJS

I can't provide a complete demo (because this blog isn't backed by a server I control). But this will probably get you really close:

<!DOCTYPE html>
<html lang="en">
<body ng-app="myapp" ng-controller="UploadCtrl">
  <input type="file" file-change="upload">

  <script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.0.7/angular.min.js"></script>
  <script>
    // the javascript
    var app = angular.module('myapp', []);

    //
    // Reusable Uploader service.
    //
    app.factory('Uploader', function($q, $rootScope) {
      var service = {};

      service.upload = function(url, file) {
        var deferred = $q.defer(),
            formdata = new FormData(),
            xhr = new XMLHttpRequest();

        formdata.append('file', file);

        xhr.onreadystatechange = function() {
          if (4 === this.readyState) {
            // Settle the promise inside $apply so AngularJS notices the change.
            $rootScope.$apply(function() {
              if (xhr.status == 200) {
                deferred.resolve(xhr);
              } else {
                deferred.reject(xhr);
              }
            });
          }
        };
        xhr.open("POST", url, true);
        xhr.send(formdata);
        return deferred.promise;
      };

      return service;
    });


    //
    // fileChange directive because ng-change doesn't work for file inputs.
    //
    app.directive('fileChange', function() {
      return {
        restrict: 'A',
        link: function(scope, element, attrs) {
          element.bind('change', function() {
            scope.$apply(function() {
              scope[attrs['fileChange']](element[0].files);
            });
          });
        }
      };
    });

    //
    // Example controller
    //
    app.controller('UploadCtrl', function($scope, $http, Uploader) {
      $scope.upload = function(files) {
        var r = Uploader.upload('/uploads', files[0]);
        r.then(
          function(r) {
            // success
          },
          function(r) {
            // failure
          });
      };
    });
  </script>
</body>
</html>

More

You can do more things like handle multiple files, monitor progress, preview images, etc... But if you don't need all that, and you are using modern browsers, this should do just fine.

Thursday, August 15, 2013

Practical event-driven programming with Python and Twisted

Introduction

An article from 2008 entitled Practical threaded programming with Python was posted to HN today, and I thought, "How would those examples look with Twisted?"

For a great explanation about how Twisted does concurrency, see krondo's Twisted Introduction. On to the code:

Hello World

The first example in the article demonstrates that threads have IDs. Since we're not using threads, the closest equivalent with Twisted is to not use Twisted at all:

import datetime


def run(what):
    now = datetime.datetime.now()
    print '%s says Hello World at time: %s' % (what, now)


for i in range(2):
    run(i)

Output:

0 says Hello World at time: 2013-08-15 13:45:17.164933
1 says Hello World at time: 2013-08-15 13:45:17.165442

Using queues

The next example shows first a serial approach and then a threaded approach to "grab a URL of a website, and print out the first 1024 bytes of the page." Here are the synchronous/serial and threaded versions.

I should note that I've modified them to get all the page (instead of the first 1024 bytes) and to print a hash of the content (so as not to clutter up this post). It's interesting that only apple.com and ibm.com return the same hash every time.

Synchronous version

import urllib2
import time
import hashlib

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

start = time.time()
#grabs urls of hosts and prints a hash of each page
for host in hosts:
    url = urllib2.urlopen(host)
    print hashlib.sha1(url.read()).hexdigest(), host

print "Elapsed Time: %s" % (time.time() - start)

Output:

2430771cc3723e965b64eda2d69dd22b697dd4a0 http://yahoo.com
790ace256c1b683a585226d286859f9f2910d9b0 http://google.com
63fbbe761817ebef066f9562e96209ca25a6f0b3 http://amazon.com
dd2f34c7c4f47b49272d7922e4f17f7c1cafd3aa http://ibm.com
562ffc06504dc0557386524b382372448d6e953a http://apple.com
Elapsed Time: 3.34798121452

Threaded version

#!/usr/bin/env python
import Queue
import threading
import urllib2
import time
import hashlib

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and prints a hash of each page
            url = urllib2.urlopen(host)
            print hashlib.sha1(url.read()).hexdigest(), host

            #signals to queue job is done
            self.queue.task_done()


start = time.time()
def main():
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

    #populate queue with data
    for host in hosts:
        queue.put(host)

    #wait on the queue until everything has been processed
    queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)

Output:

562ffc06504dc0557386524b382372448d6e953a http://apple.com
fb6fe32cb270f7929157bec5f29ee44f729949fd http://google.com
dd2f34c7c4f47b49272d7922e4f17f7c1cafd3aa http://ibm.com
3643a39f4dd641a3c08f8e5c409d0f5bc6407aed http://amazon.com
3072477b1680fc2650d9cb0674e5ef7972873bf6 http://yahoo.com
Elapsed Time: 1.23798894882

Twisted version

Here's one way to do the same thing with Twisted:

from twisted.internet import defer, task
from twisted.web.client import getPage
import time
import hashlib

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

start = time.time()

def printHash(content, host):
    print hashlib.sha1(content).hexdigest(), host


def main(reactor, hosts):
    dlist = []
    for host in hosts:
        d = getPage(host)
        # when we have the content, call printHash with it
        d.addCallback(printHash, host)
        dlist.append(d)

    # finish the process when the "queue" is done
    return defer.gatherResults(dlist).addCallback(printElapsedTime)


def printElapsedTime(ignore):
    print "Elapsed Time: %s" % (time.time() - start)


task.react(main, [hosts])

Output:

188eecd4da73515a9d1b3fde88d81ccc3a1e6028 http://google.com
562ffc06504dc0557386524b382372448d6e953a http://apple.com
dd2f34c7c4f47b49272d7922e4f17f7c1cafd3aa http://ibm.com
968fc83c1c7717575af03d43b236baf508134d0f http://yahoo.com
90c51ab729261bb72db922fb5ad22c0ae33c09da http://amazon.com
Elapsed Time: 1.36157393456

The run times of the threaded version and the Twisted version are comparable. Running them each multiple times, sometimes the threaded version is faster and sometimes the Twisted version is faster. They are both consistently faster than the synchronous version. Either way, this isn't a great benchmark and doesn't say much about how asynchronous v. threaded will work in your particular case.
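Why both beat the synchronous version: the serial loop waits for each response in turn, so its elapsed time is roughly the sum of the individual request times, while the concurrent versions wait on all requests at once, so elapsed time is closer to the slowest single request. A minimal offline sketch of that effect follows. It swaps in Python 3's asyncio (a different event loop from Twisted) so it needs no network or third-party packages; fakeGetPage is a hypothetical stand-in for getPage, and the latencies are made up.

```python
import asyncio
import time

# Hypothetical per-host latencies in seconds, standing in for real downloads.
latencies = {'yahoo.com': 0.2, 'google.com': 0.1, 'amazon.com': 0.3,
             'ibm.com': 0.2, 'apple.com': 0.1}


def runSerial():
    # Like the urllib2 loop: each blocking call finishes before the next starts.
    start = time.time()
    for host in latencies:
        time.sleep(latencies[host])
    return time.time() - start


async def fakeGetPage(host):
    # Stand-in for getPage: waits without blocking the event loop.
    await asyncio.sleep(latencies[host])
    return host


async def gatherAll():
    # Like defer.gatherResults: start every request, then wait for all of them.
    return await asyncio.gather(*(fakeGetPage(host) for host in latencies))


def runConcurrent():
    start = time.time()
    results = asyncio.run(gatherAll())
    return time.time() - start, results


serial_elapsed = runSerial()                   # roughly the sum, about 0.9s
concurrent_elapsed, results = runConcurrent()  # roughly the max, about 0.3s
```

With real network requests the gap depends on latency and server behavior, which is why the measured numbers above bounce around.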

Working with multiple queues

The article's third bit of code shows how to use multiple queues to get the URL's body in one thread, then process it in another thread.

Threaded version

import Queue
import threading
import urllib2
import time
from BeautifulSoup import BeautifulSoup

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()
out_queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        while True:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and then grabs chunk of webpage
            url = urllib2.urlopen(host)
            chunk = url.read()

            #place chunk into out queue
            self.out_queue.put(chunk)

            #signals to queue job is done
            self.queue.task_done()

class DatamineThread(threading.Thread):
    """Threaded title extraction"""
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        while True:
            #grabs chunk from out queue
            chunk = self.out_queue.get()

            #parse the chunk
            soup = BeautifulSoup(chunk)
            print soup.findAll(['title'])

            #signals to queue job is done
            self.out_queue.task_done()

start = time.time()
def main():

    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue, out_queue)
        t.setDaemon(True)
        t.start()

    #populate queue with data
    for host in hosts:
        queue.put(host)

    for i in range(5):
        dt = DatamineThread(out_queue)
        dt.setDaemon(True)
        dt.start()


    #wait on the queue until everything has been processed
    queue.join()
    out_queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)

Output:

[<title>Apple</title>]
[<title>Google</title>]
[<title>IBM - United States</title>]
[<title>Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs &amp; more</title>]
[<title>Yahoo!</title>]
Elapsed Time: 1.65801095963

Twisted version

For this simple example, it makes sense to just do the processing right after receiving the body. That would look like this:

from twisted.internet import defer, task
from twisted.web.client import getPage
import time
from BeautifulSoup import BeautifulSoup

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

start = time.time()

def printTitle(content, host):
    soup = BeautifulSoup(content)
    print soup.findAll(['title'])


def main(reactor, hosts):
    dlist = []
    for host in hosts:
        d = getPage(host)
        # when we have the content, call printTitle with it
        d.addCallback(printTitle, host)
        dlist.append(d)

    # finish the process when the "queue" is done
    return defer.gatherResults(dlist).addCallback(printElapsedTime)


def printElapsedTime(ignore):
    print "Elapsed Time: %s" % (time.time() - start)


task.react(main, [hosts])

Output:

[<title>Google</title>]
[<title>Apple</title>]
[<title>IBM - United States</title>]
[<title>Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs &amp; more</title>]
[<title>Yahoo!</title>]
Elapsed Time: 1.80365180969

(As with the previous examples, neither the threaded nor the Twisted version is much faster than the other.)

Hey!

"Hey! Those aren't the same!" I hear you say. You are right. They are not. The threaded version could extract the title in ThreadUrl.run instead of putting the content in queue for a DatamineThread.

I think the author was trying to show how you can make two threads work together on something... big? I haven't come up with a problem where it makes sense to write something in the Twisted version other than d.addCallback(printTitle, ...). If you have an idea post a comment, and I'll happily update this post (or make another post).
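For comparison, here is a sketch of the single-stage threaded version, in which ThreadUrl.run extracts the title itself, so there is no out_queue and no DatamineThread. To keep it runnable offline and free of BeautifulSoup, it assumes a hypothetical fetch() stub with canned pages in place of urllib2.urlopen, and swaps in the standard library's HTMLParser for title extraction. It is written for Python 3 (queue, html.parser) rather than the Python 2 used above.

```python
import queue
import threading
from html.parser import HTMLParser

# Hypothetical canned pages standing in for real HTTP responses.
PAGES = {
    'http://apple.com': '<html><head><title>Apple</title></head></html>',
    'http://ibm.com': '<html><head><title>IBM - United States</title></head></html>',
}


def fetch(host):
    # Stand-in for urllib2.urlopen(host).read().
    return PAGES[host]


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == 'title' and self.title is None:
            self.in_title = True
            self.title = ''

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extractTitle(content):
    parser = TitleParser()
    parser.feed(content)
    return parser.title


class ThreadUrl(threading.Thread):
    """Fetches a page and extracts its title in the same worker thread."""

    def __init__(self, in_queue, results):
        threading.Thread.__init__(self)
        self.daemon = True
        self.in_queue = in_queue
        self.results = results  # dict shared by workers, one key per host

    def run(self):
        while True:
            host = self.in_queue.get()
            try:
                self.results[host] = extractTitle(fetch(host))
            finally:
                self.in_queue.task_done()


def main(hosts, workers=5):
    in_queue, results = queue.Queue(), {}
    for _ in range(workers):
        ThreadUrl(in_queue, results).start()
    for host in hosts:
        in_queue.put(host)
    in_queue.join()  # wait until every host has been processed
    return results


titles = main(list(PAGES))
```

This one-stage shape is the threaded counterpart of d.addCallback(printTitle, host): the processing simply follows the download in the same unit of work.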

Conclusion

You can do things with threading. You can do things with Twisted. You should investigate Twisted (mostly for reasons not mentioned in this post). As noted above, krondo's Twisted Introduction is good, or there's some stuff I've written.

Also, if anyone can think of a better scenario for the two-kinds-of-thread-workers model, I'll update (or post again) with what a Twisted version might look like.

Wednesday, July 10, 2013

Angular injection

tl;dr is marked throughout by ∴

I don't like magical code. AngularJS is magical. I must fix this.

Dependency injection was one of AngularJS's first evil magicks I encountered. The idea that calling this function

function myFunction($scope, $http) {
...
}
will magically reach out to the universe and grab the correct values for $scope and $http runs contrary to all the JavaScript I've ever used. You can't do that!

So I dug in to discover the magicks. And now it's not magic! It's great! It's roughly equivalent to import in Python or require in Ruby. Here's how it works:

Modules

AngularJS groups injectable things together into modules. The following code will:

  1. make a module named woods
  2. add a provider to the woods module named Eeyore, which has a constant value

var woods = angular.module('woods', []);
woods.value('Eeyore', 'sad');

Here's some of the source for the module function plus context (see the full source here — the comments are helpful):

// from setupModuleLoader()
function ensure(obj, name, factory) {
  return obj[name] || (obj[name] = factory());
}

// ...

var modules = {};
return function module(name, requires, configFn) {
  // ...
  return ensure(modules, name, function() {
    // ...
    var moduleInstance = {
      // ...
      requires: requires,
      name: name,
      provider: invokeLater('$provide', 'provider'),
      factory: invokeLater('$provide', 'factory'),
      service: invokeLater('$provide', 'service'),
      value: invokeLater('$provide', 'value'),
      constant: invokeLater('$provide', 'constant', 'unshift'),
      filter: invokeLater('$filterProvider', 'register'),
      controller: invokeLater('$controllerProvider', 'register'),
      directive: invokeLater('$compileProvider', 'directive'),
      // ...
    };
    // ...
    return moduleInstance;
    // ...
  });
};

  1. The ensure(obj, name, factory) function makes sure that obj has an attribute named name, creating it by calling factory if it doesn't.
  2. The module(name, requires, configFn) function adds a moduleInstance named name to the global-ish modules object (by using ensure).

angular.module(...) adds a module to some global-ish module registry.
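To check my reading of ensure() and module(), here's a toy re-creation you can run in Node without AngularJS. It's a sketch, not the real loader: createModuleLoader, registerModule and invokeQueue are names I made up, only value() is implemented, and I'm assuming invokeLater's job is to record registrations for an injector to replay later.

```javascript
// Toy re-creation of ensure()/module() above; NOT AngularJS's actual source.
// createModuleLoader, registerModule and invokeQueue are hypothetical names.
function ensure(obj, name, factory) {
  // Return obj[name] if it exists; otherwise build it once with factory() and store it.
  return obj[name] || (obj[name] = factory());
}

function createModuleLoader() {
  const modules = {}; // the "global-ish" registry, hidden inside this closure
  return function module(name, requires) {
    return ensure(modules, name, function () {
      // Assumption: registrations are recorded now and replayed later by an injector.
      const invokeQueue = [];
      const moduleInstance = {
        name: name,
        requires: requires || [],
        invokeQueue: invokeQueue,
        value: function (key, val) {
          invokeQueue.push(['value', key, val]);
          return moduleInstance; // chainable, like woods.value(...).value(...)
        },
      };
      return moduleInstance;
    });
  };
}

const registerModule = createModuleLoader();
const woods = registerModule('woods', []);
woods.value('Eeyore', 'sad');

// A second call with the same name returns the entry ensure() already stored.
console.log(registerModule('woods') === woods); // true
```

The only trick is that ensure() both looks things up and creates them, so asking for a module by name twice gives you the same object back.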

Injectors

An injector finds providers among the modules it knows about. By default, AngularJS creates an injector during the bootstrapping process. We can also make an injector with angular.injector() and use it to access providers within modules:

// Run this in a JavaScript console (on a page that has AngularJS)

// Make a woods module with an Eeyore provider
var woods = angular.module('woods', []);
woods.value('Eeyore', 'sad');

// Make an injector that knows about the 'woods' module.
var injector = angular.injector(['woods']);

// Get poor Eeyore out of the module
injector.get('Eeyore');
// -> "sad"

The creation of injectors, and how they know where things are, is somewhat recursive (and the code is a little hard to read). I'll unravel that magic in another post, since it was making this post too long. For now, just know that

Injectors can find the providers you add to modules (e.g. through .value(...) or .factory(...)) and can find modules that were previously added to the global-ish module registry.
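Since the real injector code is saved for another post, here's a toy injector in plain JavaScript (no AngularJS) that shows the recursive part as I understand it: loading a module loads the modules it requires first, and every provider ends up in one lookup table. defineModule and createInjector are hypothetical names, and this is a simplified model, not how AngularJS actually builds its injector.

```javascript
// Toy injector: NOT AngularJS's implementation. defineModule and createInjector
// are hypothetical; each module is just { requires, providers }.
const registry = {};

function defineModule(name, requires, providers) {
  registry[name] = { requires: requires || [], providers: providers || {} };
  return registry[name];
}

function createInjector(moduleNames) {
  const cache = {};         // every provider this injector can see, keyed by name
  const loaded = new Set(); // skip modules we've already loaded (also stops cycles)

  function load(name) {
    if (loaded.has(name)) return;
    loaded.add(name);
    const mod = registry[name];
    if (!mod) throw new Error('Module not found: ' + name);
    mod.requires.forEach(load);          // recurse into required modules first
    Object.assign(cache, mod.providers); // then copy this module's providers in
  }

  moduleNames.forEach(load);
  return {
    get(key) {
      if (!Object.prototype.hasOwnProperty.call(cache, key)) {
        throw new Error('Unknown provider: ' + key);
      }
      return cache[key];
    },
  };
}

defineModule('woods', [], { Eeyore: 'sad' });
defineModule('hundredAcre', ['woods'], { Pooh: 'hungry' });

// Only hundredAcre is listed, but Eeyore is still reachable through its requires.
const injector = createInjector(['hundredAcre']);
console.log(injector.get('Eeyore')); // sad
console.log(injector.get('Pooh'));   // hungry
```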

Invoke

Using an injector, we can invoke functions with dependency injection:

// Run this in a JavaScript console (on a page that has AngularJS)

// Make a woods module with an Eeyore provider
var woods = angular.module('woods', []);
woods.value('Eeyore', 'sad');

// Make an injector that knows about the 'woods' module.
var injector = angular.injector(['woods']);

// Imbue a function with sadness
function eatEmotion(Eeyore) {
  return 'I am ' + Eeyore;
}
injector.invoke(eatEmotion);
// -> "I am sad"
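Once the injector knows which names a function wants, invoke is not much more than a lookup followed by fn.apply. The sketch below passes the names in by hand (AngularJS works them out itself, which is what the next section is about); get and invoke here are my own simplified stand-ins, not AngularJS's real signatures.

```javascript
// Hypothetical stand-ins for an injector's get() and invoke(); not AngularJS's API.
const providers = { Eeyore: 'sad' };

function get(name) {
  if (!Object.prototype.hasOwnProperty.call(providers, name)) {
    throw new Error('Unknown provider: ' + name);
  }
  return providers[name];
}

function invoke(fn, argNames) {
  // Look up each dependency by name, then call fn with the results in order.
  return fn.apply(null, argNames.map(get));
}

function eatEmotion(Eeyore) {
  return 'I am ' + Eeyore;
}

console.log(invoke(eatEmotion, ['Eeyore'])); // I am sad
```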

But how does it KNOOooooowwwWWW??

How does AngularJS know the names of the arguments a function is expecting? How does it know that my weather function's argument is named sunny?

function weather(sunny) {
...
}

That's an internal detail of weather, inaccessible from the outside, no? I've done introspection with Python, but this is JavaScript.

How AngularJS gets the argument names made me laugh out loud when I found it. It's a dirty (but effective) trick found in the annotate function (full source):

var FN_ARGS = /^function\s*[^\(]*\(\s*([^\)]*)\)/m;
var FN_ARG_SPLIT = /,/;
var FN_ARG = /^\s*(_?)(\S+?)\1\s*$/;
var STRIP_COMMENTS = /((\/\/.*$)|(\/\*[\s\S]*?\*\/))/mg;
function annotate(fn) {
  var $inject,
      fnText,
      argDecl,
      last;

  if (typeof fn == 'function') {
    if (!($inject = fn.$inject)) {
      $inject = [];
      fnText = fn.toString().replace(STRIP_COMMENTS, '');
      argDecl = fnText.match(FN_ARGS);
      forEach(argDecl[1].split(FN_ARG_SPLIT), function(arg){
        arg.replace(FN_ARG, function(all, underscore, name){
          $inject.push(name);
        });
      });
      fn.$inject = $inject;
    }
  } else if (isArray(fn)) {
    last = fn.length - 1;
    assertArgFn(fn[last], 'fn');
    $inject = fn.slice(0, last);
  } else {
    assertArgFn(fn, 'fn', true);
  }
  return $inject;
}

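If you pass a function to annotate, it converts that function to a string (with comments stripped) and uses regular expressions to pull out the argument names. It then stores the result on fn.$inject, so the parsing only happens once per function.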
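Here's the trick pulled out on its own, using the same four regular expressions from annotate; argNames is a hypothetical helper name. It only handles functions written with the function keyword (that's what FN_ARGS looks for), and it skips the caching and array handling the real annotate does. Note how FN_ARG strips a matching pair of surrounding underscores, so _$http_ comes out as $http.

```javascript
// Same regular expressions as the annotate() excerpt above.
const FN_ARGS = /^function\s*[^\(]*\(\s*([^\)]*)\)/m;
const FN_ARG_SPLIT = /,/;
const FN_ARG = /^\s*(_?)(\S+?)\1\s*$/;
const STRIP_COMMENTS = /((\/\/.*$)|(\/\*[\s\S]*?\*\/))/mg;

// Hypothetical helper: the toString + regex part of annotate(), nothing else.
function argNames(fn) {
  const fnText = fn.toString().replace(STRIP_COMMENTS, ''); // drop comments first
  const argDecl = fnText.match(FN_ARGS);
  if (!argDecl) throw new Error('Expected a function written with the function keyword');
  const names = [];
  argDecl[1].split(FN_ARG_SPLIT).forEach(function (arg) {
    const m = arg.match(FN_ARG);
    if (m) names.push(m[2]); // m[2] is the name without any _wrapping_ underscores
  });
  return names;
}

function weather(sunny, /* rainy, */ _$http_) {
  return sunny;
}

console.log(argNames(weather)); // [ 'sunny', '$http' ]
```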

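I should note, however, that depending on argument names for injection is discouraged, because minifiers rename function arguments. The annotate code above already shows the alternatives: if fn.$inject is set, or if you pass an array whose last element is the function (e.g. ['$scope', '$http', function(a, b) {...}]), the names come from those strings instead, and strings survive minification. Relying on argument names does make the code look cleaner, though. Maybe we should work on changing minification to handle this introspective kind of injection.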
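To see why minification matters, and why the other two branches of annotate help, here's a simplified annotate/invoke pair. The minified function is hand-written to imitate what a minifier might produce, and the comment stripping from the real annotate is left out; none of this is AngularJS's actual code.

```javascript
// Simplified sketch of annotate()'s three branches; not AngularJS's code.
const FN_ARGS = /^function\s*[^\(]*\(\s*([^\)]*)\)/m;

function annotate(fn) {
  if (Array.isArray(fn)) return fn.slice(0, fn.length - 1); // ['Eeyore', function (a) {...}]
  if (fn.$inject) return fn.$inject;                         // fn.$inject = ['Eeyore']
  const argDecl = fn.toString().match(FN_ARGS);              // fall back to parsing the source
  return argDecl[1].split(',').map((s) => s.trim()).filter(Boolean);
}

const providers = { Eeyore: 'sad' }; // hypothetical provider table

function invoke(fnOrArray) {
  const names = annotate(fnOrArray);
  const fn = Array.isArray(fnOrArray) ? fnOrArray[fnOrArray.length - 1] : fnOrArray;
  const args = names.map((name) => {
    if (!Object.prototype.hasOwnProperty.call(providers, name)) {
      throw new Error('Unknown provider: ' + name);
    }
    return providers[name];
  });
  return fn.apply(null, args);
}

// Roughly what a minifier might make of `function (Eeyore) { ... }`:
const minified = function (a) { return 'I am ' + a; };

try {
  invoke(minified);
} catch (e) {
  console.log(e.message); // Unknown provider: a
}

// Array annotation: 'Eeyore' is a string, so a minifier leaves it alone.
console.log(invoke(['Eeyore', function (a) { return 'I am ' + a; }])); // I am sad

// An $inject annotation rescues the minified function the same way.
minified.$inject = ['Eeyore'];
console.log(invoke(minified)); // I am sad
```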

Which functions have it? Which don't?

When you're just starting with AngularJS, it's a little frustrating that some functions are magic (i.e. are called with injection) and some are seemingly inert. For instance, when writing a directive, link is not called with dependency injection, but controller is.

Functions passed to the provider methods (factory, service, etc.) are called with injection, and so are directive controllers. From the official docs:

DI is pervasive throughout Angular. It is typically used in controllers and factory methods.

Sadly, the only way to know if a function is called with dependency injection is to... know. Read the docs or the source, and build up an ample supply of experience doing it wrong :)

Namespacing

Modules provided to an injector will stomp on each other's providers. In the example below, the module listed last wins:

// Run this in a JavaScript console (on a page that has AngularJS)

function mineFor(Thing) {
  return "I found " + Thing + "!";
}

// Make two modules that each define a Thing provider
var good_module = angular.module('good', []);
good_module.value('Thing', 'gold');

var bad_module = angular.module('bad', []);
bad_module.value('Thing', 'sour milk');

// Make an injector
var injector = angular.injector(['good', 'bad']);

injector.invoke(mineFor);
// -> "I found sour milk!"

I don't know if this is by design or if there are plans to address it. Be aware of it.
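One model that reproduces the sour milk result is an injector that merges every module's providers into a single flat map, so a repeated name keeps whichever value was copied in last. I haven't confirmed that AngularJS stores providers exactly like this, so treat it as an assumption. mergeProviders and findCollisions are hypothetical helpers; the second is one way to spot clashes among your own modules before they bite.

```javascript
// Toy model (an assumption, not AngularJS internals): all providers share one flat map.
// mergeProviders and findCollisions are hypothetical helpers, not AngularJS APIs.
function mergeProviders(modules) {
  const merged = {};
  for (const mod of modules) Object.assign(merged, mod.providers); // later modules overwrite
  return merged;
}

// Return [name, [moduleNames...]] pairs for provider names defined by more than one module.
function findCollisions(modules) {
  const owners = {};
  for (const mod of modules) {
    for (const key of Object.keys(mod.providers)) {
      (owners[key] = owners[key] || []).push(mod.name);
    }
  }
  return Object.entries(owners).filter(([, names]) => names.length > 1);
}

const good = { name: 'good', providers: { Thing: 'gold' } };
const bad = { name: 'bad', providers: { Thing: 'sour milk', Spoon: 'bent' } };

console.log(mergeProviders([good, bad]).Thing); // sour milk
console.log(mergeProviders([bad, good]).Thing); // gold
console.log(findCollisions([good, bad]));       // one entry: Thing, defined by good and bad
```

In this model, order is the whole story: swapping the module list swaps the winner, which matches what the angular.injector(['good', 'bad']) example shows.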

In summary

Dependency injection in AngularJS is roughly equivalent to other languages' including and importing, but scoped to functions. Some of the magic is accomplished by exploiting function.toString() and regular expressions.

Read the official doc about Dependency Injection for some of the motivation behind its use.