If you want, you can have Dask set up a Jupyter notebook server for you, co-located with the Dask scheduler. There are many ways to do this, but this blog post lists two.

First, why would you do this?

Sometimes people inside of large institutions have complex deployment pains. It takes them a while to stand up a process running on a machine in their cluster, with all of the appropriate networking ports open and such. In that situation, it can sometimes be nice to do this just once, say for Dask, rather than twice, say for Dask and for Jupyter.

Probably in these cases people should invest in a long term solution like JupyterHub, or one of its enterprise variants, but this blogpost gives a couple of hacks in the meantime.

Hack 1: Create a Jupyter server from a Python function call

If your Dask scheduler is already running, connect to it with a Client and run a Python function that starts up a Jupyter server.

from dask.distributed import Client

client = Client("scheduler-address:8786")

def start_jupyter_server():
    from notebook.notebookapp import NotebookApp
    app = NotebookApp()
    app.initialize([])  # add command line args here if you want

client.run_on_scheduler(start_jupyter_server)
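
If you also want the login token back, the function can return it. Here is a minimal sketch, assuming the classic notebook server's NotebookApp.token attribute; run_on_scheduler hands back whatever the function returns:

def start_jupyter_server_with_token():
    # same as above, but also hand back the auth token so you can log in
    from notebook.notebookapp import NotebookApp
    app = NotebookApp()
    app.initialize([])
    return app.token

token = client.run_on_scheduler(start_jupyter_server_with_token)
print(token)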

If you have a complex networking setup (maybe you’re on the cloud or HPC and had to open up a port explicitly) then you might want to install jupyter-server-proxy (which Dask also uses by default if installed), and then go to http://scheduler-address:8787/proxy/8888 . The Dask dashboard can route your connection to Jupyter (Jupyter is also kind enough to do the same for Dask if it is the main service).
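
If you go the jupyter-server-proxy route, you can build the proxied URL from the client. A small sketch, assuming Jupyter is listening on its default port 8888 on the scheduler machine:

# client.dashboard_link points at the Dask dashboard, e.g. http://scheduler-address:8787/status
# Swapping the /status page for /proxy/8888 routes through to the Jupyter server
print(client.dashboard_link.replace("/status", "/proxy/8888"))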

Hack 2: Preload script

This is also a great opportunity to learn about the various ways to add custom startup and teardown logic to the scheduler. One such way is a preload script like the following:

# jupyter-preload.py
from notebook.notebookapp import NotebookApp

def dask_setup(scheduler):
    app = NotebookApp()
    app.initialize([])

Then start the scheduler with that script:

dask-scheduler --preload jupyter-preload.py
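
Preload scripts can also define a teardown hook. Here is a rough sketch that stops the notebook server when the scheduler shuts down; whether NotebookApp.stop() cooperates nicely with the scheduler's shared event loop at shutdown is an assumption, so treat this as illustrative:

# jupyter-preload.py (variant with teardown)
from notebook.notebookapp import NotebookApp

app = None

def dask_setup(scheduler):
    global app
    app = NotebookApp()
    app.initialize([])

def dask_teardown(scheduler):
    # stop serving Jupyter when the scheduler exits
    if app is not None:
        app.stop()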

That script will run at an appropriate time during scheduler startup. You can also put this into your Dask configuration:

distributed:
  scheduler:
    preload: ["/path/to/jupyter-preload.py"]
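
If you go the configuration route (for example by dropping that YAML into a file under ~/.config/dask/), you can sanity-check that Dask picked it up:

import dask
import distributed  # ensures distributed's config defaults are loaded

# should include "/path/to/jupyter-preload.py"
print(dask.config.get("distributed.scheduler.preload"))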

Really though, you should use something else

This is mostly a hack. If you’re at an institution then you should ask for something like JupyterHub.

Or, you might also want to run this in a separate subprocess so that Jupyter and the Dask scheduler don’t collide with each other. This shouldn’t be much of a problem (they’re both pretty lightweight), but isolating them probably makes sense.
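
A rough sketch of that subprocess variant, again run via the client; the flags below are standard Jupyter options and assume the jupyter command is on the scheduler machine’s PATH:

def start_jupyter_subprocess():
    import subprocess
    # run Jupyter as its own process so it doesn't share the scheduler's event loop
    proc = subprocess.Popen(["jupyter", "notebook", "--ip", "0.0.0.0", "--no-browser"])
    return proc.pid  # return something small and serializable

client.run_on_scheduler(start_jupyter_subprocess)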

Thanks Nick!

Thanks to Nick Bollweg, who answered a question on this topic here.

