Book,

Data Science at Scale with Python and Dask

.
Manning Early Access Program Manning, (2018)

Abstract

If you're doing data analysis using Pandas, NumPy, or Scikit, you know about THE WALL. At some point, you need to introduce parallelism to your system to handle larger-scale data or analytics tasks. The problem with THE WALL is that it can require you to rewrite your code, redesign your system, or start all over using an unfamiliar technology like Spark or Flink. Dask is a native parallel analytics tool designed to integrate seamlessly with the libraries you're already using, including Pandas, NumPy, and Scikit-Learn. It's built to help you parallelize your data tasks on a standalone system, a cluster, or even a massive supercomputer without radically changing the way you work.

Tags

Users

  • @gdmcbain

Comments and Reviews