A Brief Introduction to: Data Science (Part 1/3)
Updated: Aug 26, 2020
Welcome to the first post in a three-part series in which we explore the field of Data Science, a field popularized by a Harvard Business Review article that famously called data scientist the "sexiest job of the 21st century".
The goal of this article is to help you understand the power of data science and get you excited about this incredibly innovative field of computing.
A simple Google search for Data Science returns the definition below, but what does it really mean?
Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, deep learning and big data.
This definition of Data Science is technically correct, but more often than not it fails to generate any excitement in the reader. By the end of this article, we hope you will see the potential this field has to offer and be excited by it rather than intimidated by the definition.
Over the years, technology has advanced and spread in the form of cell phones, IoT devices, wider Internet availability and online platforms (YouTube, Facebook, Netflix, Twitter, LinkedIn, Twitch, Spotify and Instagram, to name a few). With more people than ever having access to the Internet, the volume of data generated by users has grown enormously.
To give you an idea about just how much data is generated, here's an infographic:
With such high volumes of data being generated, many businesses realized the value of collecting and analyzing this data to gain better insight into their customers. However, gathering data from people and analyzing it isn't a new phenomenon; it has always been around. In some form or another, many people have already worked on data science without even knowing it.
Some applications of Data Science that were never explicitly called that would be:
Getting customer feedback through surveys and focus groups has long been a go-to strategy for businesses to gain better insight into their customers and build better, more informed business strategies.
Another interesting application, in which most people will have participated, is the census, which is a record of all the people living in a country.
Medical research involving case studies and clinical trials to understand the effect of certain medicines on patients.
So, what is Data Science, exactly?
Data Science is a relatively new paradigm of computation, and hence its definition is still evolving. It deals with the collection, processing and analysis of datasets to gather meaningful insights that can be used as a decision support tool: for instance, classifying data, grouping similar data, and finding patterns within data that can be used to develop strategies and make predictions.
What makes Data Science different?
Data Science is thought of as a new paradigm in comparison to traditional software development (in which developers program each and every aspect of the software to behave a certain way) because we usually use the data itself to derive some kind of rule that can explain the data.
To visualize what you've just read, take a look at this graphic, which makes it simpler to understand:
Machine Learning is a very important technique used by Data Scientists to analyse data, and it is something we shall explore in a future post.
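To make the difference between the two paradigms concrete, here is a minimal sketch in Python. In the first function the developer writes the rule by hand; in the second, the rule is estimated from examples. The temperature readings and function names are hypothetical and chosen only for illustration, and ordinary least squares is just one simple way of deriving a rule from data.

```python
# Traditional programming: the developer writes the rule explicitly.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


# Data-driven approach: estimate the rule (slope and intercept) from examples
# using ordinary least squares for a straight line y = slope * x + intercept.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: the same temperatures measured in both units.
celsius_readings = [-10.0, 0.0, 15.0, 25.0, 40.0]
fahrenheit_readings = [14.0, 32.0, 59.0, 77.0, 104.0]

slope, intercept = fit_line(celsius_readings, fahrenheit_readings)
print(f"learned rule: F = {slope:.2f} * C + {intercept:.2f}")
```

Nobody told the second function the conversion formula; it recovered the slope and intercept from the examples alone, which is the essence of the data-driven paradigm.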
Where is Data Science used?
One of the things that makes Data Science so popular and gives it so much potential is its wide range of applicability: its applications are truly limited only by one's imagination.
To be more specific, here are some of the most interesting applications we came across:
Data Science has already made a considerable impact in sports, helping teams make highly informed decisions and strategies. Here we look at two case studies: Liverpool FC and the Houston Rockets.
i) Liverpool FC:
If you're a football fan, you probably know where this is going.
A sporting director, or director of sport, is an executive management position in a body concerned with sport. The role is best known as a manager role at continental European football clubs, which are usually "sports clubs" offering many types of sports. While the coach takes care of the team in daily work, the manager or sporting director takes care of hiring the team. The sporting director is, in many cases, a member of the executive board and therefore an executive director.
Michael Edwards has been Liverpool FC's sporting director since 2016 and has played a key role in using analytics to engineer the transfers of players such as Mohamed Salah, Sadio Mané, Roberto Firmino, Andy Robertson, Georginio Wijnaldum, Fabinho, Alisson and Virgil van Dijk. When they joined the club, these players weren't necessarily considered to be the best in the game or highly valued.
But their playing styles were analyzed using data and statistical methods, and they were judged to be a good fit for the club.
With the help of data analytics and great management, the club won its first Premier League title this year, a huge achievement considering that it had not won a league title in 30 years.
ii) Houston Rockets:
Daryl Morey, the General Manager of the Houston Rockets, is a former consultant and MIT Sloan graduate who never played professional or college basketball. As a statistical consultant, he saw the opportunity to apply his knowledge to basketball. This worked brilliantly, as he gained a deeper understanding of players' styles and of how teams can adapt their game plans to win games.
One of Morey's first insights was to push the team to take more three-point shots. Three-point shots are more difficult because they are taken farther from the basket, but Morey recognized that the 50% increase in points (three instead of two) made them mathematically more efficient than almost all two-point shots other than dunks and lay-ups. In particular, Morey (and others) realized that three-point attempts from the corner of the court ("corner threes") had a higher chance of going in, because the shape of the three-point line puts corner threes closer to the basket, and were therefore more valuable.
Many set plays are now designed specifically to get strong three-point shooters open for corner threes. The on-court numbers back up the commitment to this strategy as the Rockets have already broken the record for most three-point shots made in a season, with games to spare.
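The arithmetic behind the three-point argument can be sketched in a few lines of Python. The shooting percentages below are hypothetical round numbers chosen for illustration, not actual NBA or Rockets statistics, and the function names are our own.

```python
def expected_points(points_per_make, make_probability):
    """Average points scored per attempt for a given shot type."""
    return points_per_make * make_probability


def break_even_three_point_rate(two_point_rate):
    """Three-point accuracy needed to match a two-point shot's expected value."""
    return 2 * two_point_rate / 3


# Hypothetical shooting percentages, for illustration only.
shots = {
    "mid-range two": (2, 0.40),
    "corner three": (3, 0.38),
    "layup or dunk": (2, 0.62),
}

for name, (points, probability) in shots.items():
    print(f"{name}: {expected_points(points, probability):.2f} points per attempt")
```

With these assumed numbers, a corner three is worth more per attempt than a mid-range two even though it goes in less often, while a layup or dunk is still worth the most. The break-even function shows why: a three-pointer only needs two thirds of a two-pointer's accuracy to match its value.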
Businesses were among the first adopters of data science practices, using them to better understand their customers. One of the most popular roles in data science is that of a Business Analyst.
Here's how businesses use data science:
i) Customer Segmentation: Identifying different groups of customers and the lifetime value each group represents, understanding which products they like and why they like them, and observing trends to predict future demand or patterns in customer behavior in order to design strategies and modify business plans.
ii) Descriptive, Predictive and Prescriptive Analytics: Descriptive and predictive analytics are used to analyse how a product has performed in a quarter or to predict how likely it is to succeed. They involve financial reporting and forecasting methods such as Return on Investment, cost-benefit analysis and time series analysis. Prescriptive methods deal with diagnosing a problem and finding the most appropriate solution for the context.
These methodologies are used by businesses to leverage consumer data to develop marketing and branding strategies.
iii) Research and Development:
Businesses also use data analytics in their R&D departments to improve the quality of their products.
Companies such as Tesla use Deep Learning, a specialized branch of machine learning, to develop self-driving cars. Tesla's Autopilot feature is one of the best-known steps toward fully autonomous vehicles, and it has pushed leaders in the automotive industry to take Artificial Intelligence more seriously.
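To give a flavour of the forecasting methods mentioned under predictive analytics, here is a minimal sketch of one of the simplest time series techniques, a moving-average forecast. The quarterly sales figures and the function name are hypothetical, and real business forecasting typically relies on richer models.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical quarterly sales (in thousands of units), oldest first.
quarterly_sales = [120, 135, 150, 160, 170, 185]

forecast = moving_average_forecast(quarterly_sales, window=3)
print(f"forecast for next quarter: {forecast:.1f}")
```

A larger window gives a smoother forecast that reacts more slowly to recent changes; a smaller window follows the latest quarters more closely.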
The idea of recommendation systems is to make suggestions that people will like.
Companies such as Amazon, Google, Instagram, LinkedIn and many others have built their businesses around the idea of recommending relevant products and services to their users, using techniques such as Nearest Neighbours, Collaborative Filtering and many newer algorithms (more on this in a future article).
Online digital media has grown rapidly in recent times, and it's now almost impossible to think of a world without Spotify or Netflix. One issue that arose as more people started using these services was that there was far too much content for a rising number of users to sift through. To tackle this, Netflix announced a movie recommender system competition open to all: the challenge was to beat the recommender system Netflix was using at the time, for a prize of a million dollars.
This was a turning point in the analytics industry, as more people started working on the problem and more people started learning about Data Science.
YouTube is another popular example, where people get video recommendations based on the kind of content they watch.
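As a rough illustration of the collaborative filtering idea mentioned above, here is a minimal user-based sketch in Python. The users, films and ratings are invented, and this is not how Netflix, YouTube or any other company actually implements recommendations; it only shows the core idea of scoring unseen items using the ratings of similar users.

```python
from math import sqrt

# Hypothetical ratings (1-5); a film missing from a user's dict is unwatched.
ratings = {
    "asha": {"Inception": 5, "Interstellar": 4, "Notting Hill": 1},
    "ben": {"Inception": 4, "Interstellar": 5, "The Matrix": 5},
    "chloe": {"Notting Hill": 5, "Love Actually": 4, "Inception": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = sqrt(sum(b[item] ** 2 for item in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank unseen items by the similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
                weights[item] = weights.get(item, 0.0) + similarity
    predicted = {
        item: scores[item] / weights[item] for item in scores if weights[item] > 0
    }
    return sorted(predicted, key=predicted.get, reverse=True)


print(recommend("asha", ratings))
```

Each unseen film gets a predicted rating that is a weighted average of other users' ratings, with more similar users counting for more. Production systems work with millions of users and use far more sophisticated models, but the underlying intuition is the same.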
Data Science has traditionally been a very academic discipline, as it involves designing mathematical models and was long regarded mainly as a tool for research on available data.
It has been used widely in medicine, for example in:
i) Medical Diagnosis:
Figuring out the cause behind a health problem is one of the main tasks of doctors. Applying data science techniques such as decision trees to patients' medical records creates an opportunity for higher-quality healthcare, as the computer could potentially detect an underlying condition that a doctor may not notice at first glance. Detecting potential health problems such as cancer cell growth and tumors, and diagnosing diseases based on a patient's condition, are some of the main applications in this area.
ii) Medical Prognosis:
Prognosis is the part of medicine concerned with predicting the future course of a patient's health.
It is useful for informing patients about their risk of developing an illness, and it can guide treatment toward the patients who need more attention.
This is done by creating a profile of the patient: a record containing information such as clinical history, physical examination results, and lab and imaging records. A mathematical model then uses the profile to generate a risk score, which indicates the level of risk to the patient's health.
iii) Medical Treatment: Medical treatment is the management and care of a patient to combat a disease or disorder. The effect of a medication can be analysed using patients' data and the drug's behavior characterized accordingly. This is often the strategy used in clinical trials of drugs before they are released to the public.
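To illustrate the risk score described under medical prognosis, here is a minimal sketch that turns a patient profile into a score between 0 and 1 using a logistic model. The features, weights and patients are entirely made up for illustration; a real prognostic model would be fitted to clinical data and carefully validated before any use.

```python
from math import exp

# Hypothetical model coefficients, invented for illustration only.
INTERCEPT = -6.0
WEIGHTS = {"age_years": 0.05, "systolic_bp": 0.02, "smoker": 0.7}


def risk_score(profile):
    """Map a patient profile to a probability-like score between 0 and 1."""
    linear = INTERCEPT + sum(WEIGHTS[name] * profile[name] for name in WEIGHTS)
    return 1 / (1 + exp(-linear))


# Two hypothetical patient profiles.
patient_a = {"age_years": 40, "systolic_bp": 120, "smoker": 0}
patient_b = {"age_years": 65, "systolic_bp": 150, "smoker": 1}

for name, profile in [("patient A", patient_a), ("patient B", patient_b)]:
    print(f"{name}: risk score {risk_score(profile):.2f}")
```

Under these assumed weights, the older smoker with higher blood pressure receives a noticeably higher score, which is the kind of signal that could flag a patient for closer attention.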
Data Science techniques are increasingly being used in the medical community and have led to the rise of many businesses as well, making medicine an interesting and useful application of Data Science. The possibilities are seemingly endless, which is what makes data analysis such a valuable skill to have.
Spacecraft and satellites operating in space generate huge amounts of data due to the complexity of their research missions. Transmitting this data from outer space to Earth is one area where scientists face challenges.
When data is sent back to Earth in the form of images or video of another planet, it is often analysed by experts to better understand the atmosphere, land composition and many more complex ecological features (this analysis often involves Computer Vision techniques such as convolutional neural networks, or CNNs).
NASA and SpaceX are leading organizations working to improve our understanding of the universe, and some of the technology they use in space vehicle navigation and landing involves Data Science. For instance:
i) NASA's Jet Propulsion Laboratory (JPL): Onboard machine learning allows a spacecraft to self-adjust parameters such as its orbit and velocity, and can support ground navigation systems that control a spacecraft’s flight path, engine power and orbital position. A spacecraft’s onboard machine learning algorithm also has the potential to perform autonomous navigation in deep space.
ii) SpaceX: The Falcon 9’s successful landing at Cape Canaveral Air Force Station in 2015 demonstrated the power of machine learning and computer vision to transform space exploration. SpaceX used a convex optimization algorithm to determine the best way to land the rocket, with real-time computer vision data aiding route prediction.
6) AUTONOMOUS VEHICLES:
The idea of self-driving cars has been around for a while, but it has gained real traction in the last 10 years or so, with Tesla leading the way with its Autopilot feature.
Many other organizations around the world have followed in pursuit of this dream of a world without car drivers, and some companies have made real progress in turning it into reality.
The theory behind self-driving cars is quite complex and requires a solid understanding of mathematics (calculus, algebra, differential equations) and physics (motion, aerodynamics, friction and many more concepts). A highly simplified way to understand how it works is to imagine what it is like to drive a car. If you already know how to drive, some of these things should be quite intuitive to you:
i) Perception: Having an understanding of the environment around us and of ourselves. Some tasks that are quite easy for us as humans are often quite challenging for machines, such as detecting objects on the road like people, other vehicles or road signs. Data Science techniques such as Computer Vision play a crucial role in solving this category of problems.
ii) Planning: Planning can be thought of as the process of thinking about the activities required to achieve a desired goal. It is one of the first steps towards achieving a goal, and in driving it is a very important skill, as one needs to make quick, rational and logical decisions to drive safely.
Some of the planning tasks that data scientists work on are deciding how to react to situations, anticipating other drivers' movements, making lane changes, and speeding up or slowing down.
Robotics is an interdisciplinary field of study in which computer science converges with traditional engineering. The idea of robots has long been popularized by movies such as I, Robot, a film featuring Will Smith that was arguably far ahead of its time, and more recently by movies such as Transformers, Elysium, Ex Machina and Blade Runner 2049.
The goal of robotics is to design intelligent machines that can assist humans in their day-to-day lives and keep everyone safe. Robotics is also one of the most potentially dangerous applications of artificial intelligence, as it threatens to automate millions of jobs. Leaders in the industry have raised concerns that as robots become smarter and smarter, they might reach human-level performance in daily tasks and become a physical threat to us as well, as portrayed in the movie I, Robot.
As the field is quite new and only starting to gain momentum, people are still learning about these techniques, and many businesses have started to see the value of analytics to their organizations. Data Science isn't just another trend in technology; it's the next chapter in the technological revolution, and the power of Artificial Intelligence will only become more visible over the coming years.
In the next part of this series, we will look at some of the most fundamental techniques used by Data Scientists, so stay tuned and stay safe.