Presentations/20190224_BHJ_DaskIntro.md at master · CSusingPython/Presentations

Dynamic task scheduling
"Big Data" collections : parallel arrays, dataframes, and lists -> run on top of dynamic task schedulers

import dask.dataframe as dd
df = dd.read_csv('who.csv')
df.groupby(df.user_id).value.mean().compute()

installs with conda or pip
extends the size of convenient datasets from "fits in memory" to "fits on disk"
scale to a cluster of 100s of machines -> resilient, elastic, data local, and low latency

참고) visualize task graph

import dask.array as ds
x da.ones((15, 15), chunks=(5,5))
y = x + x.T
# y.compute()
y.visualize(filename='transpose.svg')

정말 간략하게 Dask에 대한 소개 정도만...

Provide feedback