Accelerate Data Engineering

d6tjoin - Identify and analyze join problems

Joining datasets is a common data engineering operation, but merging data from different sources often fails because of mismatched identifiers, inconsistent date conventions and similar issues.

The d6tjoin.utils module lets you test join accuracy and quickly identify and analyze join problems.

The examples below show how to:

  • analyze join quality before attempting a join
  • detect and analyze a string-based identifier mismatch
  • detect and analyze a date mismatch
See jupyter notebook
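
To make the identifier-mismatch case concrete, here is a minimal sketch that uses only the Python standard library rather than d6tjoin itself. The ticker values are hypothetical, and difflib is a stand-in technique chosen for illustration, not necessarily how d6tjoin analyzes mismatches. It lists the ids that fail to match on each side and suggests the closest candidate for each one.

```python
import difflib

# Hypothetical identifiers: the right side carries an extra " US" suffix.
ids_left = ["AAPL", "MSFT", "GOOG"]
ids_right = ["AAPL US", "MSFT US", "GOOG US"]

# Values that appear on one side only, i.e. rows an inner join would drop.
unmatched_left = sorted(set(ids_left) - set(ids_right))
unmatched_right = sorted(set(ids_right) - set(ids_left))

# For each unmatched left id, look up the most similar unmatched right id.
suggestions = {}
for value in unmatched_left:
    close = difflib.get_close_matches(value, unmatched_right, n=1, cutoff=0.6)
    suggestions[value] = close[0] if close else None

print(suggestions)
```

Seeing every left id pair up with a right id that differs only by a suffix points to a naming convention problem rather than missing data.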


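import d6tjoin.utils

# Use Case: assert 100% join accuracy for data integrity checks
# df1 and df2 are the two pandas DataFrames to be joined on 'id' and 'date'
j = d6tjoin.utils.PreJoin([df1, df2], ['id', 'date'])
try:
    assert j.is_all_matched()  # fails
except AssertionError:
    print('assert fails!')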

# Use Case: detect and analyze id mismatch
  key left key right  all matched  inner  left  right  outer  unmatched total  unmatched left  unmatched right
0       id        id        False      0    10     10     20               20              10               10
1     date      date         True    366   366    366    366                0               0                0
2  __all__   __all__        False      0  3660   3660   7320             7320            3660             3660
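
One way to read the table above is as set arithmetic on the unique key values of each side. The sketch below is an interpretation of the columns written in plain Python, not d6tjoin's implementation. The helper name key_stats and the sample data are hypothetical, chosen so that 10 ids per side never agree while 366 dates agree fully.

```python
from datetime import date, timedelta
from itertools import product


def key_stats(left, right):
    """Count unique key values by set overlap, one entry per table column."""
    left, right = set(left), set(right)
    inner = left & right   # values present on both sides
    outer = left | right   # values present on either side
    return {
        "all matched": left == right,
        "inner": len(inner),
        "left": len(left),
        "right": len(right),
        "outer": len(outer),
        "unmatched total": len(outer) - len(inner),
        "unmatched left": len(left - right),
        "unmatched right": len(right - left),
    }


# Hypothetical data: ids differ in format, dates cover one leap year.
ids_left = [f"id{i}" for i in range(10)]
ids_right = [f"ID-{i}" for i in range(10)]
dates = [date(2016, 1, 1) + timedelta(days=d) for d in range(366)]

stats_id = key_stats(ids_left, ids_right)
stats_date = key_stats(dates, dates)
# The __all__ row treats each (id, date) combination as a single key.
stats_all = key_stats(product(ids_left, dates), product(ids_right, dates))

print(stats_id)
print(stats_date)
print(stats_all)
```

Under this reading the dates line up perfectly, so the id column alone explains why no combined key matches.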




To learn more about the DataBolt tools and products that help you accelerate data engineering, check out

To see other blog posts check out our archive at

For questions and feedback email us at

Copyright © 2018, All rights reserved.
