DATA SCIENCE IS EASY. GETTING THE RIGHT DATA READY FOR IT IS HARDER.

Updated: Feb 11



So how do you make it easier?

Starting with the high-priority use cases is a sound approach. More often than not, you already have the data for them anyway. For inspiration, check out McKinsey's Potential Value of AI Across Industries.

There is always a temptation to take in as much data as possible (including all legacy data), throw it in a data lake, and start mining. Such boil-the-ocean approaches rarely work well and take a long time, because data is complex to link and connect. Instead, proceed gradually, one use case at a time.

Sandboxes, as part of a comprehensive data lake strategy, are a great way to connect various data sets and create new insight. But remember that the data should measure something along the process or journey behind the business question you are trying to answer.

Place importance on metadata (for example, a data catalog) and on having a data quality strategy. Try to automate reconciliation processes. Of course, a data quality strategy presupposes a data governance strategy as well. Data governance is all about improving data quality, which is nothing more than enhancing the characteristics of the data so it can serve NEW uses (like analytics). The more automated and embedded your analytical needs are, the more critical data governance and quality will be.
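To make "automate reconciliation" a little more concrete, here is a minimal sketch of what such a check could look like. It assumes two extracts of the same records (say, a source system and the data lake) that share a key column. The names `reconcile`, `ReconciliationReport`, `id` and `amount` are hypothetical and chosen for illustration only; they do not come from any particular tool.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class ReconciliationReport:
    """Outcome of comparing a source extract with a target extract."""
    missing_in_target: list   # keys present in source but not in target
    missing_in_source: list   # keys present in target but not in source
    amount_mismatches: dict   # key -> (source amount, target amount)

    @property
    def is_clean(self) -> bool:
        # Clean means: same keys on both sides and identical amounts.
        return not (self.missing_in_target
                    or self.missing_in_source
                    or self.amount_mismatches)


def reconcile(source, target, key="id", amount="amount"):
    """Compare two lists of row dicts on a shared key and an amount column.

    Assumption: keys are unique within each extract. Amounts are converted
    via str() to Decimal so that 10.0 and "10.00" compare as equal.
    """
    src = {row[key]: Decimal(str(row[amount])) for row in source}
    tgt = {row[key]: Decimal(str(row[amount])) for row in target}

    missing_in_target = sorted(src.keys() - tgt.keys())
    missing_in_source = sorted(tgt.keys() - src.keys())
    amount_mismatches = {
        k: (src[k], tgt[k])
        for k in sorted(src.keys() & tgt.keys())
        if src[k] != tgt[k]
    }
    return ReconciliationReport(missing_in_target,
                                missing_in_source,
                                amount_mismatches)


if __name__ == "__main__":
    source_rows = [
        {"id": 1, "amount": "10.00"},
        {"id": 2, "amount": "5.50"},
        {"id": 3, "amount": "7.25"},
    ]
    target_rows = [
        {"id": 1, "amount": "10.00"},
        {"id": 2, "amount": "5.05"},   # value differs from the source
        {"id": 4, "amount": "1.00"},   # record unknown to the source
    ]
    report = reconcile(source_rows, target_rows)
    print("clean:", report.is_clean)
    print("missing in target:", report.missing_in_target)
    print("missing in source:", report.missing_in_source)
    print("amount mismatches:", report.amount_mismatches)
```

A check like this could run after every load and feed its report into the data catalog, so that quality issues surface before the data reaches an analytical model. Real reconciliation will usually need more, such as tolerances, duplicate handling and several compared columns.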

Remember as well that how the data is acquired, how it changes over time, and where it comes from will have just as much impact on AI as the data itself.

Finally, to avoid drowning, executives should connect their business strategy to the data and analytics strategy. Since data quality, labelling and "connecting" all the data take so much effort, a focus on valuable data sets will help.

©2020 by Modern Data Analytics