Updated: Oct 11, 2020
Thirteen years ago, when data science was not as popular as it is today, the college where I earned my engineering degree was smart enough to teach us a few very serious data science courses. I immediately fell in love with data science; I understood that this was the future. Companies these days produce enormous amounts of data, and many of them don't know how to get meaningful insights from it. It's a shame, because there is a ton of money to be made from this data...
In 2007 I worked at a tech company that provided software services to a forex and stock broker named iFOREX. My boss back then was a very techy, open-minded guy. Although he didn't know much about data mining, he gave me the green light to start the very first data mining project at iFOREX.
See, it was 2008, the beginning of the "subprime" crisis, and for those of you who don't know, forex companies flourish in times like these. Market volatility is very high, and it attracts lots of speculators. iFOREX grew from a few dozen employees to hundreds in less than a year. They started getting thousands of new leads every day, and they simply couldn't get in touch with each one of them, which in practice means that leads were going stale. The cost of a lead can range from a few dollars to $30-$40 depending on the market, so we are talking about millions of dollars every month that might go to waste.
You are most probably familiar with the Pareto principle and how it shows up in business profits: in most businesses, 20% of the customers produce 80% of the profit. In the forex industry the numbers are a bit different: 95% of the profit is generated by 5% of the customers. In practice, let's say that a company gets 1000 leads a day. Only 5% of them will generate profit; the rest will generate a loss. What is the problem that you see? Let me help you with that: in order to discover the 5% "diamonds" you need to deal with the 95% "garbage", which means wasting A LOT of money. With this data in front of us, we understood that we had to identify the 5% "diamonds" WITHOUT wasting time on the 95% "garbage".
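To make the arithmetic concrete, here is a minimal Python sketch of the daily lead funnel. The 1000 leads a day and the 5% profitable share come from the example above; the $20 cost per lead is only an assumed value inside the quoted range, and `lead_funnel` is a hypothetical helper name, not something we used at the time.

```python
def lead_funnel(leads_per_day, profitable_share, cost_per_lead):
    """Split a day's leads into profitable and unprofitable ones."""
    profitable = round(leads_per_day * profitable_share)
    unprofitable = leads_per_day - profitable
    return {
        "profitable_leads": profitable,
        "unprofitable_leads": unprofitable,
        # Acquisition money spent on leads that will never pay off.
        "spend_on_unprofitable": unprofitable * cost_per_lead,
    }

# 1000 leads/day and 5% are the numbers from the text;
# the $20 cost per lead is an assumption for illustration only.
funnel = lead_funnel(1000, 0.05, 20)
print(funnel)
```

Under these assumptions, 950 of the 1000 daily leads are the "garbage" you pay for just to reach the 50 "diamonds".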
Think about it: you have 10 sales agents dealing with 1000 leads on a normal day (just for the sake of the example), and only 50 of these leads will produce a profit. This means that you are wasting the salaries of 9.5 sales agents, plus the cost of all the leads they are dealing with. This is how my first commercial data science project started. In our project, we aimed to predict the chances that a lead would become a diamond during its first days on the iFOREX trading platform. We (a dear friend of mine, who was my partner in this project, and I) used software that back then was called "Clementine"; today, after IBM bought it, it is called "SPSS Modeler". The modeling process can be divided into 4 parts:
1. Data collection - You have to start gathering all the data that you think MIGHT be relevant to your model. Even if you think the chances that a specific piece of data is relevant are very, very low, you still have to put it in your data warehouse (DWH). We discovered along the way that data points that seemed completely irrelevant were actually very relevant.
2. Data cleaning - This part takes most of the time. The data that you gather usually comes from different parts of the system, and it is usually dirty, filled with inaccurate numbers, and ordered in the wrong way. You have to clean and fix these issues, otherwise your model won't produce what you want.
3. Building the model - This is the fun part... Here you experiment with different models (and approaches) on your data and see which one gives the best results. You do this on only 50% of the data; the other 50% will be used in the next step.
4. Validating the model - Once your model is ready, you have to check whether it produces the expected output. You do this on the other 50% of the data. This 50% is not "known" to the machine, so the machine has to run the model and predict the chances that a lead (out of this 50%) will become a "diamond". Because we are talking about historical data here, you already have the actual result, and you can see whether the "machine" predicted it well.
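Steps 3 and 4 can be sketched in a few lines of Python. This is not what we built in Clementine; it is only a toy illustration of the 50/50 idea. The leads are synthetic, the single feature (trades in the first days) is an assumption, and the "model" is just a cut-off value chosen on the training half.

```python
import random

def split_half(rows, seed=0):
    """Shuffle a copy of the rows and cut it into two halves."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def accuracy(rows, threshold):
    """Share of rows where (feature >= threshold) matches the known label."""
    hits = sum((value >= threshold) == label for value, label in rows)
    return hits / len(rows)

def fit_threshold(train):
    """Pick the cut-off that scores best on the training half only."""
    candidates = sorted({value for value, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))

# Synthetic stand-in for historical leads: (trades_in_first_days, is_diamond).
# Both the feature and the distributions are made up for this example.
rng = random.Random(42)
leads = []
for _ in range(1000):
    is_diamond = rng.random() < 0.05
    trades = rng.gauss(30, 8) if is_diamond else rng.gauss(10, 8)
    leads.append((trades, is_diamond))

train, holdout = split_half(leads)       # step 3 uses only `train`
threshold = fit_threshold(train)
holdout_accuracy = accuracy(holdout, threshold)  # step 4: unseen half
print(f"threshold={threshold:.1f}, holdout accuracy={holdout_accuracy:.1%}")
```

One caveat: when only about 5% of the leads are diamonds, plain accuracy flatters almost any model, so in a real project you would also look at how many of the predicted diamonds are real ones and how many real ones you missed.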
To make a long story short, at the end of the process we were able to predict with 70% confidence whether a lead would become a "diamond" or not after 2 weeks of trading on the iFOREX trading platform. It was crazy. I understood the power of data mining, and since then I've been a big fan of implementing data mining models in the critical areas of every company that I work with.
So how does it relate to product management?! THIS IS PRODUCT MANAGEMENT! Product management is about providing software solutions to achieve business goals. Product management is about being open to all kinds of possible solutions to needs that even your customers didn't know they had. Most customers have gotten used to working in a certain way, and they don't know that it can be done 100% more efficiently and sometimes 100% more profitably. Your product team has to include skills such as data science; it will give you the superpower that your company needs in order to skyrocket. You can argue that the data scientist's place is not in the product team. I think that it's the ultimate place for that role, but it's getting late and I need to save some topics for my next posts :-)