Taming The Mathematical Monster

Most of us face two big hurdles when deciding to learn Data Science. One is that we need to learn how to code, and the other is mathematics. And since maths bullied us in our school days, we tend to have more of an aversion towards it. Here, we will try to beat that bully back by understanding why we need to learn maths and what actually needs to be learnt. And along the way, we will see why it is not that difficult to learn.

There is a story. Once there was a wild horse. It would throw off anyone who tried to mount it. Everyone was afraid of it, so much so that people didn’t even try to tame it. Then came Alexander. He tamed the horse with his wits and soon became one of the best riders in the whole of Macedonia.

He later died, reportedly of malaria - Alexander, not the horse.

But what’s this story got to do with Mathematics?

Like the wild horse, most of us don’t even try because we have heard it’s really tough.
If you use a little wit, it is really easy to master.
Once you master mathematics, it will take you places.

An analogy like the one above is good for motivation (for the first 30 minutes after starting), but before any effort goes into learning mathematics, the question arises: why?

The simple answer is: because it is a requirement for being a Data Scientist.

If that realization has not convinced you to start learning mathematics, the long answer goes like this. First, you need maths for measurement and modeling. Measurement is easy to understand. A Dairy Milk starts at 10 Rs, and Mt Everest is 8848 meters in height (a little bit more than Maria Sharapova). What can scare you is the modeling. Say Tony is three times taller than Moni (Tony = 3*Moni). That was not scary. But what if I were to say the equation for height is as follows:

y = 3x^2 + log2(x) + 3e^(-z) + dy/dx

This is what a mathematical model might look like. What is needed from you is to be able to read it in plain English. It says y depends on both x and z. It equals the sum of 3 times the value of x squared, the log of x with base 2, three times the constant e raised to the power negative z (the negative sign means take the inverse, that is, use the fraction 1 over e raised to z), and the rate of change of y itself with respect to x at that point.

This equation becomes infinitely simpler if we know that x is 16, z is 1, and the rate of change of y is 13. Because then all you need is a calculator and you have your value of y.
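The calculator step can be sketched in a few lines of Python. This is only an illustration, and it assumes the equation reads y = 3x^2 + log2(x) + 3e^(-z) + dy/dx, which is how the plain-English description above is interpreted here. The variable names are made up for the example.

```python
import math

# Values quoted in the text; dy_dx is the given rate of change of y.
x, z, dy_dx = 16, 1, 13

# Assumed form of the model, read off the plain-English description:
# y = 3*x^2 + log2(x) + 3*e^(-z) + dy/dx
y = 3 * x**2 + math.log2(x) + 3 * math.exp(-z) + dy_dx

print(round(y, 2))  # roughly 786.1
```

Once the numbers are plugged in, the scary-looking expression is just four terms added together.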

So, what is our end goal? Do we need to solve this equation or write this or similar ones? Answer: neither. That is the job of the machine. Your job is two-pronged:

To make your clients understand that y is very dependent on the value of x, a little on z, and not at all on a variable t.
This equation will have many solutions (and the machine will learn to solve it, not you). You look at those solutions and find the one that best achieves your objective.

The second reason to learn mathematics is to be able to compare two things. Let’s say you want to know how two products are selling, and you are told that Toffees are very much in demand and Chocolates are much, much, much in demand. Can you make any sort of decision on this information? Maths helps you make sense of it by rephrasing it as: Toffees have a demand of 3 million units, and Chocolates have a demand of 19 million units. Additionally, since Toffees and Chocolates are negatively correlated, supplying 10 million units of Chocolates will decrease the demand for Toffees to 2.6 million units.

Also, if we raise the price of Toffees by 10 percent, our total sales will decrease by 3 percent, and the cost to the company will be lowered by 17 percent, increasing our total profit by 9 percent. (Isn’t that common sense? Sales may decrease, but if the manufacturing cost has also decreased, total profit may actually increase. The trick is to know that sweet spot where the universe will conspire to make you and your company rich, and maths with Data Science can help you find it.)

Maths also lets you analyze things that are difficult to imagine. We live in a four-dimensional world: three dimensions of space and one of time. Can you imagine a world with only two dimensions? Yes, because it has fewer dimensions. If we remove time, everything will be ageless, and if we remove only height, a spherical ball will look like a circle. That is easy. But what if I tell you to imagine a ten-dimensional universe where six dimensions are of space? In our world, we know that a ball bounces off a wall at exactly the same angle at which it made contact with the surface. Can you work out the same in the ten-dimensional universe? Yes, even if you cannot imagine it, you can calculate these arbitrary things with Maths and Data Science. A four-dimensional world is simply represented as f(x,y,z,t). A six-dimensional world would be f(x,y,z,t,a,b). It might hurt your feelings, but everything can be reduced to an equation and solved if enough data is available. Everything!
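To make the bouncing ball less abstract, here is a minimal Python sketch. It assumes an ideal bounce off a flat wall, where the reflected velocity is v - 2(v.n)n and n is the unit vector perpendicular to the wall. The function name and the numbers are invented for illustration. The same formula works in two, three, or six spatial dimensions.

```python
def reflect(velocity, normal):
    """Bounce a velocity vector off a flat wall with the given unit normal."""
    # Component of the velocity that points into the wall.
    dot = sum(v * n for v, n in zip(velocity, normal))
    # Flip that component and leave everything else untouched.
    return [v - 2 * dot * n for v, n in zip(velocity, normal)]

# Six spatial dimensions; the wall is perpendicular to the first axis.
velocity = [3.0, 1.0, -2.0, 0.5, 4.0, -1.0]
normal = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

bounced = reflect(velocity, normal)
print(bounced)  # only the first component changes sign
```

Nobody has to picture the six-dimensional wall. The arithmetic does not care how many entries the list has.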

Lastly, nobody is asking you to prove the Riemann hypothesis. We are being asked for the basic stuff only. So gather your courage and learn it.

By now, I hope you have understood the importance of learning mathematics. The actual trick lies in learning only the required stuff (it is not being lazy, it is being effective). So what do we need to master? Let’s break it down into the simplest terms. Here are the minimum requirements.

Statistics: I like to look at statistics as the way to summarize data. You can have 100 rows of data from a bank, but it helps to know that the average bank balance is 3500 golden coins. And that is what statistics is most of the time. Hence the important things to know are Mean, Median, Mode, Standard Deviation, Standard Error, Moments, and Correlation.
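Most of these summaries are one-liners in practice. The sketch below uses Python's built-in statistics module on a tiny, made-up set of bank balances and customer ages. The data and the pearson helper are hypothetical and written only for this example.

```python
import math
import statistics

# Hypothetical data: five customers' balances (golden coins) and ages.
balances = [1200, 3500, 3500, 4100, 5200]
ages = [22, 35, 41, 48, 60]

mean = statistics.mean(balances)      # the "average balance"
median = statistics.median(balances)  # middle value when sorted
mode = statistics.mode(balances)      # most frequent value
std_dev = statistics.stdev(balances)  # sample standard deviation
std_err = std_dev / math.sqrt(len(balances))  # standard error of the mean

def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

correlation = pearson(ages, balances)
print(mean, median, mode, round(std_dev, 1), round(std_err, 1), round(correlation, 2))
```

Six numbers now describe the whole table, which is the point of summarizing.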

Probability: There is something called the law of averages. Loosely, it says that over a large number of events, outcomes tend to settle around their average value (the rigorous version is called the law of large numbers). One of the most successful stock investors ever was a mathematician whose principal investment strategy was built around this. What made his model (it was a computer-based mathematical model) more successful was that he had better data, and he calculated the probabilities of those events happening better than anyone else. And that is the probability you need to learn. Learning about expectations, Bayes’ theorem, conditional probability, probability distribution functions, and the Central Limit Theorem will help you make sense of all the data and what affects it. Because sometimes all we are doing in Data Science is trying to find the hidden pattern in all that seemingly random data.
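The idea of averages settling down over many events can be seen in a small simulation. This sketch rolls a fair six-sided die and shows the average drifting towards its expected value of 3.5 as the number of rolls grows. It illustrates the law of large numbers only and says nothing about any investor's actual model. The function name and the seed are arbitrary choices.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

def average_of_rolls(n):
    """Average value of n rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# A handful of rolls can be far from 3.5; a large number rarely is.
for n in (10, 1_000, 100_000):
    print(n, average_of_rolls(n))
```

Individual rolls stay unpredictable, but the average of many rolls is not, and that is the kind of pattern probability lets you quantify.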

Calculus: Some of us believe that integral and differential calculus are two demons who snuck into the world to terrorize mankind. If you think the same, you are wrong. Differentiation is just a way to calculate the rate of change, and integration is just a way to calculate area.

In its simplest form,

Differentiation at a point = (Final Value - Initial Value) / (Change in the Input)

Integration of a constant value = Length * Breadth (the area of a rectangle)
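Both ideas can be tried out numerically without any formulas to memorize. The sketch below approximates a rate of change with a tiny step on either side of a point, and an area by adding up many thin rectangles (length times breadth, repeated). The function names, step sizes, and the example curve are assumptions chosen for illustration.

```python
def rate_of_change(f, x, h=1e-6):
    """Approximate slope of f at x: change in value over change in input."""
    return (f(x + h) - f(x - h)) / (2 * h)

def area_under(f, a, b, steps=100_000):
    """Approximate area under f between a and b using thin rectangles."""
    width = (b - a) / steps
    # Height of each rectangle is taken at the middle of its base.
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def square(x):
    return x * x

slope = rate_of_change(square, 3.0)  # exact answer is 6
area = area_under(square, 0.0, 3.0)  # exact answer is 9
print(round(slope, 4), round(area, 4))
```

The machine does the slicing and adding. What you need is to know that one number is a rate and the other is an area.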

If it was so easy, why the hell did it seem so hard when you tried to learn it in school? It is because you were trying to measure the rate of change of heat transfer in an alloy whose composition changed along its length. And the area being measured was under a curve that looks like a flag hoisted in the wind. Both would be equations of high order and degree. It does sound tough, but I am sure it also makes you curious. The reason they exist is very logical. Let us say you want to predict the rate at which your lemonade stand sells lemonade (y). What is the best bet? My answer is: take your sales from yesterday (x) and add the growth you had yesterday. That gives you the simplest model, which can be written as:

y=x+dy/dx
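As a rough sketch, the lemonade model amounts to the following, assuming "growth" means the change in sales between the last two days. The function name and the sales figures are made up.

```python
def predict_today(sales_yesterday, sales_day_before):
    """Yesterday's sales plus the growth seen yesterday."""
    growth = sales_yesterday - sales_day_before  # yesterday's rate of change
    return sales_yesterday + growth

# Hypothetical stand: 45 glasses two days ago, 50 glasses yesterday.
prediction = predict_today(50, 45)
print(prediction)  # 55
```

It is a crude model, but it already uses a rate of change to look one step ahead.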

Calculus allows us to take into account the rates of change and fluctuations hidden in the data. And once we can articulate the model well, it becomes a hundred times easier to understand.

Again, our goal is not to write these mathematical models or solve them. To be less abstract, our job is to measure the error in the model and to define the criteria against which a hypothesis is measured. In essence, it is measurement and comparison only.
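Measuring a model's error is itself simple arithmetic. The sketch below compares some hypothetical predictions with what actually happened, using two common yardsticks: mean absolute error and root mean squared error. The numbers and function names are invented for the example.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average size of the miss, ignoring direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Like the above, but large misses are punished more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical daily sales versus what a model predicted.
actual = [50, 55, 53, 60]
predicted = [48, 56, 55, 57]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, round(rmse, 2))  # 2.0 2.12
```

With a single number per model, comparing two models becomes the same exercise as comparing Toffees and Chocolates.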

Most of us would prefer it if we had learned all the maths back in school. But no matter how much we want to go back in time and beat some sense into that kid, we can’t. And whatever gaps there are in our knowledge are our problem. So, all we can do is roll up our sleeves and get to work.

There is also the small question of where to start learning from. There is a plethora of recommendations for free resources on the web. Those places are really good and quite free. However, we are always short on time and effort. There is a lot to do and little time to do it. So I would recommend you learn the maths not separately, but within the Data Science course you have decided to go for. We also offer the same with our courses here at InfoSecTrain. These courses are curated to cover the application of mathematics targeted at the field of Data Science only. But if you are carving a path for yourself, try to learn only the things included in the syllabi of the well-known Data Science courses out there and here. It will save you a great deal of time.

I would like to sum this up with a few points.

It’s really not that tough. Dare to start.
If you believe you are an absolute beginner, you will still win this race with Maths for Data Science.
It’s really not that tough. You can do it.
Be smart when you start.

If you are still not convinced to start, go watch ‘Forrest Gump’ and come back. Maybe that will make you move that lazy, procrastinating butt into action.

AUTHOR
Nishant Budakoti
Infosec Train

“Nishant Budakoti holds an M.Tech in the field of VLSI and has over 6 years of experience working with electrical utilities. He is a voracious reader. He loves working with different technologies and prefers the online medium to learn and share.”