Please take the Dask User Survey for 2019. Your response helps to prioritize future work.


We are pleased to announce the release of Dask version 2.0. This is a major release with bug fixes and new features.

Most major version changes of software signal many new and exciting features. That is not the case with this release. Instead, we’re bumping the major version number because we’ve broken a few APIs to improve maintainability, and because we decided to drop support for Python 2.

This blogpost outlines these changes.

Install

As always, you can conda install Dask:

conda install dask

or pip install from PyPI:

pip install "dask[complete]" --upgrade

Full changelogs are available here:

Drop support for Python 2

Python 2 reaches end of life in 2020, just six months away. Most major PyData projects are dropping Python 2 support around now. See the Python 3 Statement for more details about some of your favorite projects.

Python 2 users can continue to use older versions of Dask, which are in widespread use today. Institutions looking for long term support of Dask in Python 2 may wish to reach out to for-profit consulting companies, like Quansight.

Dropping Python 2 will allow maintainers to spend more of their time fixing bugs and developing new features. It will also allow the project to adopt more modern development practices going forward.

Small breaking changes

The list below briefly describes most of the breaking changes:

  • The distributed.bokeh module has moved to distributed.dashboard
  • Various ncores keywords have been moved to nthreads
  • Client.map/gather/scatter no longer accept iterators and Python queue objects. Users can handle this themselves with submit/as_completed or can use the Streamz library.
  • The worker /main route has moved to /status
  • Cluster.workers is now a dictionary mapping worker name to worker, rather than a list as it was before
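
To make a couple of these changes concrete, here is a minimal sketch, not an official migration recipe. It starts a small in-process cluster, reads Cluster.workers as a dictionary, and replaces an iterator passed to Client.map with plain submit/as_completed calls. The square function, the lazy_inputs generator, and the run_example wrapper are hypothetical names used only for illustration, and we disable the dashboard so the sketch does not need Bokeh.

```python
from dask.distributed import Client, LocalCluster, as_completed


def square(x):
    return x * x


def lazy_inputs(n):
    """A generator standing in for any iterator of inputs (hypothetical)."""
    for i in range(n):
        yield i


def run_example():
    # Small in-process cluster; the dashboard is disabled for this sketch.
    with LocalCluster(
        n_workers=2,
        threads_per_worker=1,
        processes=False,
        dashboard_address=None,
    ) as cluster:
        with Client(cluster) as client:
            # Cluster.workers is a dict mapping worker name to worker,
            # so iterating over it yields the worker names.
            worker_names = list(cluster.workers)

            # Client.map no longer accepts iterators, so we submit each
            # item ourselves and gather results as they complete.
            futures = [client.submit(square, x) for x in lazy_inputs(5)]
            results = sorted(f.result() for f in as_completed(futures))
    return worker_names, results


if __name__ == "__main__":
    print(run_example())
```

Results arrive from as_completed in completion order, which is why the sketch sorts them before returning.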

Some larger fun changes

We didn’t only break things. We also added some new things :)

Array metadata

Previously Dask Arrays were defined by their shape, chunk shape, and datatype (float, int, and so on).

Now, Dask Arrays also know the type of their chunks. Historically this was almost always a NumPy array, so it didn’t make sense to store. Now that Dask Arrays are used more frequently with sparse array chunks and GPU array chunks, we maintain this information as well, in a ._meta attribute. This is already how Dask dataframes work, so it should be familiar to advanced users of that module.

>>> import dask.array as da
>>> x = da.eye(1000000)
>>> x._meta
array([], shape=(0, 0), dtype=float64)

>>> import sparse
>>> s = x.map_blocks(sparse.COO.from_numpy)
>>> s._meta
<COO: shape=(0, 0), dtype=float64, nnz=0, fill_value=0.0>

This work was largely done by Peter Entschev.
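
Downstream code that needs to know what kind of chunks it is dealing with could read this attribute. The sketch below assumes NumPy-backed arrays only, and chunk_type is a hypothetical helper, not part of Dask. Because ._meta is a private attribute, its behavior may change between versions.

```python
import numpy as np
import dask.array as da


def chunk_type(arr):
    """Return the type of the chunks backing a Dask Array.

    Reads the private ``_meta`` attribute, so treat this as a sketch
    rather than a stable API.
    """
    return type(arr._meta)


x = da.ones((1000, 1000), chunks=(250, 250))

# The metadata array holds no data but carries the chunk type and dtype.
print(chunk_type(x))
print(x._meta.dtype == x.dtype)

# Elementwise operations keep the same chunk type.
y = (x + 1) * 2
print(chunk_type(y) is np.ndarray)
```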

Array HTML output

Dask arrays now print themselves nicely in Jupyter notebooks, showing a table of information about their size and chunk size, and also a visual diagram of their chunk structure.

import dask.array as da
x = da.ones((10000, 1000, 1000))
         Array                  Chunk
Bytes    80.00 GB               125.00 MB
Shape    (10000, 1000, 1000)    (250, 250, 250)
Count    640 Tasks              640 Chunks
Type     float64                numpy.ndarray

[Diagram: chunk structure of the array, with axes of length 10000, 1000, and 1000]
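
The numbers in this table can also be read from the array's attributes outside a notebook. The sketch below sets the chunk shape explicitly, which is an assumption made for reproducibility: the example above relies on Dask's automatic chunking, which may pick a different shape in other versions or configurations. The summary dictionary is just an illustrative container.

```python
import math

import dask.array as da

# Explicit chunks so the sketch matches the table regardless of defaults.
x = da.ones((10000, 1000, 1000), chunks=(250, 250, 250))

summary = {
    "array_bytes": x.nbytes,        # total size of the array in bytes
    "chunk_shape": x.chunksize,     # shape of the largest chunk
    "chunk_bytes": x.dtype.itemsize * math.prod(x.chunksize),
    "n_chunks": x.npartitions,      # number of chunks
    "dtype": str(x.dtype),
}

for key, value in summary.items():
    print(key, value)
```

None of these attributes trigger computation, so this runs quickly even though the array describes 80 GB of data.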

Proxy Worker dashboards from the Scheduler dashboard

If you’ve used Dask.distributed then you’re probably familiar with Dask’s scheduler dashboard, which shows the state of computations on the cluster with a real-time interactive Bokeh dashboard. However, you may not be aware that Dask workers also have their own dashboard, which shows a completely separate set of plots for the state of that individual worker.

Historically these worker dashboards haven’t been as commonly used because it’s hard to connect to them. Users don’t know their address, or network rules don’t enable direct web connections. Fortunately, the scheduler dashboard is now able to proxy a connection from the user to the worker dashboard.

You can access this by clicking on the “Info” tab and then selecting the “dashboard” link next to any of the workers. You will also need to install jupyter-server-proxy:

pip install jupyter-server-proxy

Thanks to Ben Zaitlen for this fun addition. We hope that now that these plots are more visible, people will invest more in developing plots for them.

Black everywhere

We now use the Black code formatter throughout most Dask repositories. These repositories include pre-commit hooks, which we recommend when developing on the project.

cd /path/to/dask
git checkout master
git pull upstream master

pip install pre-commit
pre-commit install

Git will then call black and flake8 whenever you attempt to commit code.

Dask Gateway

We would also like to inform readers about the somewhat new Dask Gateway project that enables institutions and IT to control many Dask clusters for a variety of users.


Acknowledgements

There have been several releases since the last time we had a release blogpost. The following people contributed to the following repositories since the 1.1.0 release on January 23rd:

  • dask/dask

    • (Rick) Richard J Zamora
    • Abhinav Ralhan
    • Adam Beberg
    • Alistair Miles
    • Álvaro Abella Bascarán
    • Anderson Banihirwe
    • Aploium
    • Bart Broere
    • Benjamin Zaitlen
    • Bouwe Andela
    • Brett Naul
    • Brian Chu
    • Bruce Merry
    • Christian Hudon
    • Cody Johnson
    • Dan O’Donovan
    • Daniel Saxton
    • Daniel Severo
    • Danilo Horta
    • Dimplexion
    • Elliott Sales de Andrade
    • Endre Mark Borza
    • Genevieve Buckley
    • George Sakkis
    • Guillaume Lemaitre
    • HSR05
    • Hameer Abbasi
    • Henrique Ribeiro
    • Henry Pinkard
    • Hugo
    • Ian Bolliger
    • Ian Rose
    • Isaiah Norton
    • James Bourbeau
    • Janne Vuorela
    • John Kirkham
    • Jim Crist
    • Joe Corbett
    • Jorge Pessoa
    • Julia Signell
    • JulianWgs
    • Justin Poehnelt
    • Justin Waugh
    • Ksenia Bobrova
    • Lijo Jose
    • Marco Neumann
    • Mark Bell
    • Martin Durant
    • Matthew Rocklin
    • Michael Eaton
    • Michał Jastrzębski
    • Nathan Matare
    • Nick Becker
    • Paweł Kordek
    • Peter Andreas Entschev
    • Philipp Rudiger
    • Philipp S. Sommer
    • Roma Sokolov
    • Ross Petchler
    • Scott Sievert
    • Shyam Saladi
    • Søren Fuglede Jørgensen
    • Thomas Zilio
    • Tom Augspurger
    • Yu Feng
    • aaronfowles
    • amerkel2
    • asmith26
    • btw08
    • gregrf
    • mbarkhau
    • mcsoini
    • severo
    • tpanza
  • dask/distributed

    • Adam Beberg
    • Benjamin Zaitlen
    • Brett Jurman
    • Brett Randall
    • Brian Chu
    • Caleb
    • Chris White
    • Daniel Farrell
    • Elliott Sales de Andrade
    • George Sakkis
    • James Bourbeau
    • Jim Crist
    • John Kirkham
    • K.-Michael Aye
    • Loïc Estève
    • Magnus Nord
    • Manuel Garrido
    • Marco Neumann
    • Martin Durant
    • Mathieu Dugré
    • Matt Nicolls
    • Matthew Rocklin
    • Michael Delgado
    • Michael Spiegel
    • Muammar El Khatib
    • Nikos Tsaousis
    • Olivier Grisel
    • Peter Andreas Entschev
    • Sam Grayson
    • Scott Sievert
    • Tom Augspurger
    • Torsten Wörtwein
    • amerkel2
    • condoratberlin
    • deepthirajagopalan7
    • jukent
    • plbertrand
  • dask/dask-ml

    • Alejandro
    • Florian Rohrer
    • James Bourbeau
    • Julien Jerphanion
    • Matthew Rocklin
    • Nathan Henrie
    • Paul Vecchio
    • Ryan McCormick
    • Saadullah Amin
    • Scott Sievert
    • Sriharsha Atyam
    • Tom Augspurger
  • dask/dask-jobqueue

    • Andrea Zonca
    • Guillaume Eynard-Bontemps
    • Kyle Husmann
    • Levi Naden
    • Loïc Estève
    • Matthew Rocklin
    • Matyas Selmeci
    • ocaisa
  • dask/dask-kubernetes

    • Brian Phillips
    • Jacob Tomlinson
    • Jim Crist
    • Joe Hamman
    • Joseph Hamman
    • Matthew Rocklin
    • Tom Augspurger
    • Yuvi Panda
    • adam
  • dask/dask-examples

    • Christoph Deil
    • Genevieve Buckley
    • Ian Rose
    • Martin Durant
    • Matthew Rocklin
    • Matthias Bussonnier
    • Robert Sare
    • Tom Augspurger
    • Willi Rath
  • dask/dask-labextension

    • Daniel Bast
    • Ian Rose
    • Matthew Rocklin
    • Yuvi Panda
