Using Luigi for data science with d6tflow

Luigi is a great library for building data workflows, but it is mostly designed for data engineering, not data science. d6tflow is an open source Python library built on top of Luigi and optimized for data science workflows. You can quickly:
  • Load task input and output into Pandas and Dask dataframes
  • Save Pandas and Dask dataframes to parquet, CSV or SQL
  • Load/save trained scikit-learn and Keras models
  • Invalidate tasks including upstream/downstream tasks during trial-and-error research
  • Integrate with d6tpipe to quickly hand off data from data engineer to data scientist
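To make the invalidation idea concrete, here is a minimal sketch in plain Python. It is a toy illustration only and does not use d6tflow's or Luigi's API: the `Task` class, its `run` and `invalidate` methods, and the three example tasks are all hypothetical names invented for this example. It shows how resetting one step forces that step and everything downstream of it to rerun, while upstream results stay cached.

```python
# Toy illustration only: plain Python, NOT d6tflow's or Luigi's API.
# All names here (Task, run, invalidate, run_count) are hypothetical.


class Task:
    """A workflow step whose output is cached until it is invalidated."""

    def __init__(self, name, func, requires=()):
        self.name = name
        self.func = func                # computes output from upstream outputs
        self.requires = list(requires)  # upstream tasks
        self.dependents = []            # downstream tasks
        self.output = None
        self.complete = False
        self.run_count = 0              # how many times func actually ran
        for upstream in self.requires:
            upstream.dependents.append(self)

    def run(self):
        """Return the cached output, computing it (and upstream) if missing."""
        if not self.complete:
            inputs = [upstream.run() for upstream in self.requires]
            self.output = self.func(*inputs)
            self.complete = True
            self.run_count += 1
        return self.output

    def invalidate(self):
        """Discard this task's output and every output derived from it."""
        self.output = None
        self.complete = False
        for downstream in self.dependents:
            downstream.invalidate()


# A three-step workflow: get data -> preprocess -> "train".
get_data = Task("get_data", lambda: [1, 2, 3, 4])
preprocess = Task("preprocess", lambda rows: [r * 10 for r in rows], [get_data])
train = Task("train", lambda rows: sum(rows) / len(rows), [preprocess])

model = train.run()        # first call runs all three tasks
train.run()                # everything is cached, nothing reruns
preprocess.invalidate()    # resets preprocess and train, keeps get_data
model_again = train.run()  # reruns preprocess and train only
```

After this runs, `get_data` has executed once while `preprocess` and `train` have each executed twice, which is the behavior you want when you change a preprocessing step during trial-and-error research.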
Getting started:
  1. Visit the GitHub page - star the library
  2. Example project - go through a machine learning workflow
  3. Readthedocs - fully documented



Copyright © 2019, All rights reserved.