We have all read the punchlines – data scientist is the sexiest job, there are not enough of them and the salaries are very high. The role has been sold so well that the number of data science courses and college programs is growing like crazy. After my previous blog post I received questions from people asking how to become a data scientist – which courses are the best, what steps to take, what is the fastest way to land a data science job?
I tried to really think it through and reflected on my personal experience – how did I get here? How did I become a data scientist? Am I a data scientist? My experience has been very mixed – I started out as a securities analyst in an investment house, using mainly Excel, then slowly shifted towards business intelligence in the banking industry and then in consulting, eventually doing the actual so-called “data science” – building predictive models, working with Big Data, crunching tons of numbers and writing code to do data analysis and machine learning – though in the earlier days it was called “data mining”.
When the data science hype started, I tried to understand how it was different from what I had been doing so far. Maybe I should learn new skills and become a data scientist instead of someone working “in analytics”?
Like everybody obsessed with it, I started taking multiple courses, reading data books, doing data science specializations (and not finishing all of them...) and coding a lot – I wanted to become THE one in the middle cross-section of the (in)famous data science Venn diagram. What I did learn is that these unicorns (yes, the people in the middle “Data Science” bucket are called unicorns) rarely exist, and even if they do, they are typically generalists who have knowledge in all of these areas but are “master of none”.
Although I now consider myself a data scientist – I lead a fantastically talented data science team at Amazon, build machine learning models, work with “Big Data” – I still think there’s too much chaos around the craft and too little clarity, especially for people who are new to the industry or trying to get in. Don’t get me wrong – there are a lot of very complex branches of data science, like AI, robotics, computer vision, voice recognition etc., which require very deep technical and mathematical expertise, and potentially a PhD… or two. But if you are interested in getting into a data science role that was called a business / data analyst just a few years ago – here are the four rules that have helped me survive in the data science world.
Rule 1 – Get your priorities and motivations straight. Be very realistic about what skills you have right now and where you want to arrive – there are so many different roles in data science that it’s important to understand and assess your current knowledge base. Let’s say you’re working in HR and want to change careers – learn about HR analytics! If you’re a lawyer – understand the data applications in the legal industry. The fact is that the hunger for insights is so big that all industries and business functions have started using data. If you already have a job, try to understand what can be optimized or solved by using data and learn how to do it yourself. It’s going to be a gradual and long shift, but you will still have a job and learn by doing it in the real world. If you are a recent graduate or a student – you have a perfect chance to figure out what you are passionate about – maybe movies, maybe music, or maybe cars? You wouldn’t imagine the number of data scientists these industries employ – and they are all crazy about the fields they’re working in.
Rule 2 – Learn the basics very well. Although the specifics of each data science field are very different, the basics are the same. There are three areas where you should develop very strong foundations – basic data analysis, introductory statistics and coding.
Data analysis. You should understand and practice (a lot!) the basic data analysis techniques – what a data table is, how to join tables, what the main techniques are for analyzing data organized in such a way, how to build summary views on your dataset and draw initial conclusions from it, what exploratory data analysis is, and which visualizations can help you understand and learn from data. This is very basic but believe me – master this and you’ll have the fundamental skill that is absolutely mandatory for the job.
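To make “join tables” and “summary views” a bit more concrete, here is a minimal sketch in plain Python. The two tables, their column names and the numbers are all made up for illustration; in practice you would more likely do this with SQL or a data analysis library, but the idea is the same.

```python
from collections import defaultdict

# Hypothetical "tables": each one is a list of rows, like rows in a spreadsheet.
customers = [
    {"customer_id": 1, "city": "Vilnius"},
    {"customer_id": 2, "city": "London"},
    {"customer_id": 3, "city": "Vilnius"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 120.0},
    {"order_id": 11, "customer_id": 2, "amount": 80.0},
    {"order_id": 12, "customer_id": 1, "amount": 40.0},
    {"order_id": 13, "customer_id": 3, "amount": 60.0},
]

# Join: attach each order's city by looking up the shared key, customer_id.
city_by_customer = {c["customer_id"]: c["city"] for c in customers}
joined = [{**o, "city": city_by_customer[o["customer_id"]]} for o in orders]

# Summary view: number of orders and total amount per city.
summary = defaultdict(lambda: {"orders": 0, "total": 0.0})
for row in joined:
    summary[row["city"]]["orders"] += 1
    summary[row["city"]]["total"] += row["amount"]

# Print one line per city, including the average order value.
for city, stats in sorted(summary.items()):
    average = stats["total"] / stats["orders"]
    print(f"{city}: {stats['orders']} orders, total {stats['total']:.2f}, average {average:.2f}")
```

Even on a toy example like this you can start drawing initial conclusions, for instance which city brings in the most orders and whether its average order is bigger or smaller.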
Statistics. Also, get a very good grasp of introductory statistics – what the mean and the median are and when to use one over the other, what standard deviation is and when it doesn’t make any sense to use it, why averages “lie” but are still the most used aggregated value everywhere. And when I say “introductory” I really mean “introductory”. If you are a mathematician and plan to become an econometrician who applies advanced statistical and econometric models to explain complex phenomena – then yes, learn advanced statistics. If you don’t have a PhD in mathematics, just take your time, be patient and get a really good grasp of basic statistics and probability.
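Here is a small sketch of why averages “lie”, using Python’s built-in statistics module. The salary figures are invented for illustration: five ordinary salaries plus one extreme value.

```python
from statistics import mean, median, stdev

# Hypothetical yearly salaries in a small team.
salaries = [40_000, 42_000, 45_000, 47_000, 50_000]

# The same team after one very highly paid person joins.
with_outlier = salaries + [1_000_000]

print("Without outlier: mean", mean(salaries), "median", median(salaries))
print("With outlier:    mean", mean(with_outlier), "median", median(with_outlier))

# The standard deviation is also dominated by the single extreme value.
print("Standard deviation without outlier:", round(stdev(salaries)))
print("Standard deviation with outlier:   ", round(stdev(with_outlier)))
```

The mean jumps from 44,800 to 204,000 because of a single person, while the median barely moves (45,000 to 46,000). For skewed data like this, the median is usually the more honest summary of a “typical” value, and the standard deviation tells you little beyond “there is an outlier”.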
Coding. And of course – learn how to code. This is the most over-used cliché advice but it’s actually sound. You should start by learning how to query a database with SQL – believe it or not, most of the time data science teams spend goes into data pulling and preparation, and a lot of that is done with SQL. So get your basics in place – build your own small database, write some “select * from my_table” lines and get a good grasp of the SQL fundamentals. You should also learn one (start with just one) data analysis language – be it R or Python, both are great – as that does make a difference and many positions, although not all, require it. First learn the basics of the language you chose, with a focus on how to do data analysis with it. You don’t have to become a programmer to succeed in the field; it’s all about knowing how to use the language to analyze and visualize data.
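If you want to try the “build your own small database” advice without installing anything, one possible route is the sqlite3 module that ships with Python. The sketch below is only an illustration: the table name my_table comes from the text above, while the columns and rows are made up.

```python
import sqlite3

# An in-memory database: nothing is written to disk, so it is safe to experiment.
conn = sqlite3.connect(":memory:")

# Create a small, hypothetical table and fill it with a few rows.
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, product TEXT, price REAL)")
conn.executemany(
    "INSERT INTO my_table (product, price) VALUES (?, ?)",
    [("apple", 0.5), ("bread", 2.0), ("coffee", 4.5)],
)
conn.commit()

# The classic first query: pull everything.
all_rows = conn.execute("SELECT * FROM my_table").fetchall()
print(all_rows)

# Filter and sort: products that cost more than 1.0, most expensive first.
expensive = conn.execute(
    "SELECT product, price FROM my_table WHERE price > ? ORDER BY price DESC",
    (1.0,),
).fetchall()
print(expensive)

# Aggregate: how many rows are there and what is the average price?
row_count, avg_price = conn.execute("SELECT COUNT(*), AVG(price) FROM my_table").fetchone()
print(row_count, avg_price)

conn.close()
```

Once SELECT, WHERE, ORDER BY and simple aggregates feel natural, joins and GROUP BY are the obvious next things to practice.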
Rule 3 – Data science is about solving problems – find and solve one. The thing I have learned over the years is that one of the fundamental requirements for a data scientist is to always be asking questions and looking for problems. Now, I don’t advise doing it 24/7 as you will definitely go insane, but be prepared to be the problem solver and to keep looking for problems. Start small, find areas in your own life that can benefit from some analysis – you will be amazed how much data is available out there. Maybe you will analyze your spending patterns, identify sentiment patterns in your emails, or just build nice charts to track your city’s finances. The data scientist is responsible for questioning everything – is this marketing campaign effective, are there any concerning trends in the business, do some products under-perform and should they be taken off the market, does the discount the company gives make sense or is it too big – these questions become hypotheses that are then validated or rejected by the data scientist. Hypotheses are the raw material of a data scientist: the more of them you test and explain, the better you’ll be at your job.
Rule 4 – Start doing instead of planning what you will do “when”. This applies to any kind of learning but it’s especially true in data science. Be sure you start “doing” from the very first day you start learning. It’s very easy to put off the actual learning by just reading “about” data science and how it “should” be done, or by copy-pasting data analysis code from a book and running it on very simple datasets of a kind you will never ever get in the real world.
With everything you learn – be sure you start applying it to the field you’re passionate about. That’s where the magic happens – writing your first line of code and seeing it fail, being stuck and not knowing what to do next, looking for the answer, finding a lot of different solutions none of which work, struggling to build your own one and finally passing a milestone – the “aha!” moment. This is how you will actually learn. Learning by doing is the only way to learn data science – you don’t learn how to ride a bike by reading about it, right? The same rule applies here – whatever you learn, be sure you apply it immediately and solve actual problems with real data.
“If you spend too much time thinking about a thing, you’ll never get it done.” – this quote from Bruce Lee, one of the most famous martial artists, captures the essence of this post. You have to apply what you learn and make sure you make your own mistakes – this is the only way you will learn and improve. And if Bruce Lee doesn’t convince you, maybe Shia LaBeouf will: