Data Science at the Command Line


Description

We data scientists love to create exciting data visualizations and insightful statistical models. Before we get to that point, however, a great deal of effort usually goes into obtaining, scrubbing, and exploring the required data. The command line, although invented decades ago, is an excellent environment for these data science tasks. By combining small yet powerful command-line tools, you can quickly explore your data and hack together prototypes. Newer tools such as parallel, jq, and csvkit let you apply the command line to today's data challenges. Even if you are already comfortable processing data with, say, R or Python, being able to use the command line as well can make you a more productive and efficient data scientist.
