Using Luigi for data science with d6tflow

Luigi is a great library for building data workflows, but it is designed mostly for data engineering, not data science. d6tflow is an open source Python library built on top of Luigi and optimized for data science workflows. With it you can quickly:
  • Load task input and output into Pandas and Dask dataframes
  • Save Pandas and Dask dataframes to parquet, CSV or SQL
  • Load/save trained sklearn and keras models
  • Invalidate tasks including upstream/downstream tasks during trial-and-error research
  • Integrate with d6tpipe to quickly hand off data from data engineer to data scientist
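To make the invalidation idea concrete, here is a minimal conceptual sketch in plain Python. It does not use d6tflow's or Luigi's actual API. Every name in it (`Task`, `run`, `invalidate_downstream`, `store`, `run_log`) is hypothetical and only illustrates the behaviour described above: task outputs are cached, and invalidating one task forces it and everything downstream of it to rerun, while upstream results are reused.

```python
# Conceptual sketch only: plain Python, NOT the d6tflow or Luigi API.
# All names below are hypothetical and chosen for illustration.

store = {}     # cached outputs, keyed by task name
run_log = []   # records which tasks actually executed, in order


class Task:
    requires = ()  # upstream task classes this task depends on

    def compute(self, *inputs):
        raise NotImplementedError


def run(task_cls):
    """Return the task's output, executing it only if it is not cached."""
    name = task_cls.__name__
    if name in store:
        return store[name]
    # Resolve upstream dependencies first (they may be cached too).
    inputs = [run(dep) for dep in task_cls.requires]
    run_log.append(name)
    store[name] = task_cls().compute(*inputs)
    return store[name]


def invalidate_downstream(task_cls, all_tasks):
    """Drop the cached output of task_cls and of every task that depends on it."""
    store.pop(task_cls.__name__, None)
    for other in all_tasks:
        if task_cls in other.requires:
            invalidate_downstream(other, all_tasks)


class GetData(Task):
    def compute(self):
        return [1, 2, 3, 4]


class Preprocess(Task):
    requires = (GetData,)

    def compute(self, data):
        return [x * 2 for x in data]


class Train(Task):
    requires = (Preprocess,)

    def compute(self, features):
        # Stand-in for model training: just average the features.
        return sum(features) / len(features)


ALL_TASKS = [GetData, Preprocess, Train]

result = run(Train)    # first run: executes all three tasks
run(Train)             # second run: everything is cached, nothing executes

# Suppose the preprocessing logic changed during research:
invalidate_downstream(Preprocess, ALL_TASKS)
result_again = run(Train)  # reruns Preprocess and Train, reuses GetData
```

After the last call, `run_log` shows that `GetData` executed only once while `Preprocess` and `Train` each executed twice. That is the property that matters during trial-and-error research: expensive upstream steps are not repeated when only a later step changes.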
Getting started:
  1. Visit the GitHub page and star the library
  2. Work through the example project, which walks through a machine learning workflow
  3. Read the documentation on Readthedocs

Questions?

To learn more about the DataBolt tools and products that help you accelerate data science, check out www.databolt.tech

To see other blog posts check out our archive at blog.databolt.tech.

For questions and feedback email us at support@databolt.tech

Copyright © 2019 www.databolt.tech, All rights reserved.

