This work is supported by Anaconda Inc. and the Data Driven Discovery Initiative from the Moore Foundation.

I’m pleased to announce the release of Dask version 0.15.2. This release contains stability enhancements and bug fixes. This blogpost outlines notable changes since the 0.15.0 release on June 11th.

You can conda install Dask:

conda install dask

or pip install from PyPI

pip install dask[complete] --upgrade

Conda packages are available both on the defaults and conda-forge channels.

Full changelogs are available here:

Some notable changes follow.

New dask-core and dask conda packages

On conda there are now three relevant Dask packages:

  1. dask-core: Package that includes only the core Dask package. This has no dependencies other than the standard library. This is primarily intended for down-stream libraries that depend on certain parts of Dask.
  2. distributed: Dask’s distributed scheduler, depends on Tornado, cloudpickle, and other libraries.
  3. dask: Metapackage that includes dask-core, distributed, and all relevant libraries like NumPy, Pandas, Bokeh, etc.. This is intended for users to install

This organization is designed to both allow downstream libraries to only depend on the parts of Dask that they need while also making the default behavior for users all-inclusive.

Downstream libraries may want to change conda dependencies from dask to dask-core. They will then need to be careful to include the necessary libraries (like numpy or cloudpickle) based on their user community.

Improved Deployment

Due to increased deployment on Docker or other systems with complex networking rules dask-worker processes now include separate --contact-address and --listen-address keywords that can be used to specify addresses that they advertise and addresses on which they listen. This is especially helpful when the perspective of ones network can shift dramatically.

dask-worker scheduler-address:8786 \
            --contact-address 192.168.0.100:9000  # contact me at 192.168.0.100:9000
            --listen-address 172.142.0.100:9000  # I listen on this host

Additionally other services like the HTTP and Bokeh servers now respect the hosts provided by --listen-address or --host keywords and will not be visible outside of the specified network.

Avoid memory, file descriptor, and process leaks

There were a few occasions where Dask would leak resources in complex situations. Many of these have now been cleaned up. We’re grateful to all those who were able to provide very detailed case studies that demonstrated these issues and even more grateful to those who participated in resolving them.

There is undoubtedly more work to do here and we look forward to future collaboration.

Array and DataFrame APIs

As usual, Dask array and dataframe have a new set of functions that fill out their API relative to NumPy and Pandas.

See the full APIs for further reference:

Deprecations

Officially deprecated dask.distributed.Executor, users should use dask.distributed.Client instead. Previously this was set to an alias.

Removed Bag.concat, users should use Bag.flatten instead.

Removed magic tuple unpacking in Bag.map like bag.map(lambda x, y: x + y). Users should unpack manually instead.

Julia

Developers from the Invenia have been building Julia workers and clients that operate with the Dask.distributed scheduler. They have been helpful in raising issues necessary to ensure cross-language support.

Acknowledgements

The following people contributed to the dask/dask repository since the 0.15.0 release on June 11th

  • Bogdan
  • Elliott Sales de Andrade
  • Bruce Merry
  • Erik Welch
  • Fabian Keller
  • James Bourbeau
  • Jeff Reback
  • Jim Crist
  • John A Kirkham
  • Luke Canavan
  • Mark Dunne
  • Martin Durant
  • Matthew Rocklin
  • Olivier Grisel
  • Søren Fuglede Jørgensen
  • Stephan Hoyer
  • Tom Augspurger
  • Yu Feng

The following people contributed to the dask/distributed repository since the 1.17.1 release on June 14th:

  • Antoine Pitrou
  • Dan Brown
  • Elliott Sales de Andrade
  • Eric Davies
  • Erik Welch
  • Evan Welch
  • John A Kirkham
  • Jim Crist
  • James Bourbeau
  • Jeremiah Lowin
  • Julius Neuffer
  • Martin Durant
  • Matthew Rocklin
  • Paul Anton Letnes
  • Peter Waller
  • Sohaib Iftikhar
  • Tom Augspurger

Additionally we’re happy to announce that John Kirkham (@jakirkham) has accepted commit rights to the Dask organization and become a core contributor. John has been active through the Dask project, and particularly active in Dask.array.


blog comments powered by Disqus