A Deep Dive into Data Science and Career Opportunities

The Data Science Triangle

A question that comes up again and again is: what exactly is Data Science? Let us first understand the principal component of this question, which is data. Data is a piece of fact about any entity, be it a person, company, product or service. For example, an employee's name, a product's price and the location of a company are all pieces of fact, and each of them is called data.

Data is generated in huge quantities by businesses and individuals. Take the example of a bank: it generates a huge amount of data, such as customers' personal information, their transactions, nominee details and so on. Imagine a big e-commerce site, which holds data about millions of products and services, customer data, user reviews and replies. Based on how this data is organized, stored and managed, it can be classified into structured data, semi-structured data and unstructured data.

Structured Data

If data can be arranged in the form of a table, it is called structured data. In the bank example, the data of every customer account is arranged in rows and columns, as in a passbook or monthly statement. Because it is in tabular form, it is structured data.
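As a small illustration of the rows-and-columns idea, here is a minimal Python sketch that reads a tiny bank statement as a table. The column names and values are invented for this example and do not come from any real bank.

```python
import csv
import io

# Hypothetical monthly statement; column names and values are invented.
STATEMENT_CSV = """date,description,amount,balance
2024-01-02,Salary credit,50000,62000
2024-01-05,ATM withdrawal,-5000,57000
2024-01-09,Electricity bill,-1800,55200
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per table row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(STATEMENT_CSV)

# Every row has exactly the same columns, which is what makes the data structured.
for row in rows:
    print(row["date"], row["description"], row["amount"], row["balance"])
```

Because every row shares the same columns, the data can be sorted, filtered or summed column by column without any special handling.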

Unstructured Data

If data cannot be arranged in a tabular format, or in any other specific format, it is said to be unstructured data. Examples of unstructured data are news feeds, social media posts and collections that mix different types of data.

Semi-Structured Data

If data cannot be arranged in a tabular form but still follows a specific format, it is called semi-structured data. Program source code and data in XML or JSON format are all semi-structured.
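To see why JSON counts as semi-structured, consider this minimal Python sketch. The customer records and field names are made up for illustration. Both records follow the JSON format, yet they do not share the same set of fields, so they do not fit neatly into one fixed table.

```python
import json

# Hypothetical customer records; the field names are invented for illustration.
CUSTOMERS_JSON = """
[
  {"name": "Asha", "email": "asha@example.com",
   "nominees": [{"name": "Ravi", "relation": "spouse"}]},
  {"name": "Kiran", "phone": "555-0100"}
]
"""

customers = json.loads(CUSTOMERS_JSON)


def nominee_names(customer):
    """Return the nominee names, or an empty list when the record has none."""
    return [nominee["name"] for nominee in customer.get("nominees", [])]


# Each record is valid JSON, but the fields differ from record to record.
for customer in customers:
    print(customer["name"], sorted(customer.keys()), nominee_names(customer))
```

The code has to check whether a field exists before using it, which is typical when working with semi-structured data.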

Database Management System

When huge quantities of data are generated digitally every moment, we need technology and tools to store and manage that data. Software that helps in storing, retrieving and managing huge quantities of data is called a Database Management System, or simply DBMS. Even though there are various techniques to store and manage data digitally, the most popular is the Relational Database Management System, or RDBMS, model.

Oracle, MySQL and PostgreSQL are among the most popular database software in the RDBMS category; they are also called SQL databases. SQL, or Structured Query Language, is an easy-to-use language for managing the data in RDBMS products. The person who designs, creates and manages databases is called a Database Developer or SQL Developer. Join the MySQL Database Developer Course at skillphenix and learn everything to become an expert Database Developer.
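To give a feel for SQL, here is a minimal sketch that creates a table, inserts a few rows and queries them. It uses SQLite through Python's built-in sqlite3 module as a stand-in, only because it needs no server; it is not MySQL, Oracle or PostgreSQL, although these basic statements look much the same there. The table and the customer names are invented for this example.

```python
import sqlite3

# An in-memory SQLite database stands in for a full RDBMS server.
conn = sqlite3.connect(":memory:")

# Hypothetical table; the column names are chosen for illustration only.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Asha", "Chennai"), ("Kiran", "Mumbai"), ("Meena", "Chennai")],
)
conn.commit()

# Retrieve only the customers who live in one city, sorted by name.
chennai = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name",
    ("Chennai",),
).fetchall()
print(chennai)
```

The query describes what data is wanted, and the database decides how to fetch it.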

The limitation of RDBMS software is that it is specifically designed to efficiently handle structured data. Hence, to manage semi-structured and unstructured data, there are popular products such as MongoDB and Cassandra, which are called NoSQL databases. Unlike SQL, NoSQL is not a single programming language; each NoSQL product provides its own query language or API to manage its data.

Any large company needs both SQL and NoSQL databases to store and manage the huge quantity of its business data. Hence there is a huge demand for both SQL and NoSQL database developers.

Data Engineering

Every business needs to survive, optimize its operating costs and expand into new territories so as to increase its profits and grow bigger. To do this, businesses need insights into customer behaviour, market dynamics and changes in trends, so that they can better price their products and services or launch new ones.

As bigger enterprises have various functions and departments, they often use different types of applications to manage data. For example, the customer service department may maintain its data in Excel sheets, whereas the sales department may use CRM software such as Salesforce. The HR and Payroll department may maintain employee data in a separate application. All this data may be stored in different storage systems and file formats.

In order to analyze the data, all the data in different formats has to be collated and normalized so that various analyses can be applied. This is achieved by a process called Extraction, Transformation and Loading, or ETL. There are several popular tools to perform ETL operations, such as Informatica, Talend and so on. The extracted and transformed data is loaded into a Data Warehouse, which is a special database with advanced processing and transactional abilities to store both structured and unstructured data.

Data Engineers are responsible for ETL and for building and managing Data Warehouses. Their job involves understanding the data requirements, identifying the right data sources, planning the extraction and transformation of data, and loading it into Data Warehouse software.
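The three ETL steps can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the two department exports, their field names and the customer_summary table are all invented, and an in-memory SQLite database stands in for a real data warehouse. Tools such as Informatica or Talend do the same job at a much larger scale.

```python
import csv
import io
import json
import sqlite3

# Hypothetical exports from two departments; all names and values are invented.
SUPPORT_CSV = """Customer Name,Email,Open Tickets
Asha Rao,ASHA@EXAMPLE.COM,2
Kiran Shah,kiran@example.com,0
"""
SALES_JSON = '[{"customer": "asha rao", "email": "asha@example.com", "orders": 5}]'


def extract():
    """Extraction: read the raw records from each source in its own format."""
    support = list(csv.DictReader(io.StringIO(SUPPORT_CSV)))
    sales = json.loads(SALES_JSON)
    return support, sales


def transform(support, sales):
    """Transformation: normalize both sources into one record per customer."""
    merged = {}
    for row in support:
        email = row["Email"].strip().lower()
        merged[email] = {
            "email": email,
            "name": row["Customer Name"].strip().title(),
            "open_tickets": int(row["Open Tickets"]),
            "orders": 0,
        }
    for rec in sales:
        email = rec["email"].strip().lower()
        entry = merged.setdefault(email, {
            "email": email,
            "name": rec["customer"].strip().title(),
            "open_tickets": 0,
            "orders": 0,
        })
        entry["orders"] = int(rec["orders"])
    return list(merged.values())


def load(records, conn):
    """Loading: write the cleaned records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_summary ("
        "email TEXT PRIMARY KEY, name TEXT, open_tickets INTEGER, orders INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_summary "
        "VALUES (:email, :name, :open_tickets, :orders)",
        records,
    )
    conn.commit()


# SQLite in memory is only a stand-in for a real data warehouse.
warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)
result = warehouse.execute(
    "SELECT email, name, open_tickets, orders FROM customer_summary ORDER BY email"
).fetchall()
print(result)
```

Note how the same customer appears with different spelling and letter case in the two sources; the transformation step is where such differences are ironed out before loading.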

Data Analyst - Data Analysis or Data Analytics

The huge quantity of data stored in those data warehouses, often referred to as Big Data, has to be processed to get usable and relevant information that businesses can use for informed decision making. This process is called Data Analysis. The results are presented as easily understandable graphs and images, statistical tables and easily comparable charts, all together referred to as dashboards. These dashboards help in taking business decisions faster and reduce the errors and risks long associated with traditional methods of analysis.

If analysis focuses on processing past data, analytics is the process of predicting what may happen in the future by analysing the past data. This is achieved by identifying and analyzing patterns in the past data using statistics, operations research and computer programming. Please note that analysis and analytics are often used interchangeably in practice, yet they have their own subtle differences and purposes. When data analysis or analytics is performed on business data for business-related solutions, it is called Business Intelligence, and the person who performs it is a Business Intelligence Analyst or BI Analyst.

The role of a Data Analyst is to apply analysis and analytics by selecting, cleaning and processing data from a data warehouse or various other data sources. The tools used by a Data Analyst are Python for programmatic processing of data, MySQL or any other RDBMS, and Power BI or Tableau for data visualization.
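The difference between analysis and analytics can be shown with a small Python sketch. The monthly sales figures are invented, and a simple least-squares straight line is assumed as the pattern in the data. Real analytics work uses far richer models, so treat this only as an illustration of the idea.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the points by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales for months 1 to 5; the numbers are invented.
months = [1, 2, 3, 4, 5]
sales = [100.0, 110.0, 120.0, 130.0, 140.0]

# Analysis: summarize what has already happened.
total = sum(sales)
average = total / len(sales)

# Analytics: use the pattern in past data to estimate a future value.
slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept

print("total:", total, "average:", average)
print("forecast for month 6:", forecast_month_6)
```

The total and average describe the past five months, while the fitted line extends the observed trend to a month that has not happened yet.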

Data Scientist

Now, what if multi-disciplinary advanced analytics, combining advanced mathematics, statistics, scientific computing, scientific methods and algorithms, programming and data processing, is applied to this huge data? The result is insights that were previously unimaginable, which make possible very advanced solutions such as Artificial Intelligence, Machine Learning, Natural Language Processing, predictive models, generative solutions, pattern recognition, and the list is endless. This entire process is typically defined as Data Science.

The role of a Data Scientist is multi-disciplinary: it needs mathematical, statistical, programming and domain expertise to create advanced models and algorithms for solving complex real-life problems.





Shiva Saami
15+ years of experience in various disciplines of Information Technology