
What is Data Analytics?

July 30, 2020

Data analytics is a broad field unified by a single goal: deriving meaningful conclusions from raw, often unstructured, data. It can reveal trends that would otherwise stay hidden in the data. This information is mainly used to optimize processes and increase the overall efficiency of a business or system.

Types of Data Analytics

Generally, data analytics is divided into four main types: descriptive, diagnostic, predictive, and prescriptive. These can also be seen as steps to take within a full analytics process since every one of these types builds on the results of the previous one.

The goal of descriptive analytics is to answer questions about what has happened in the past. The techniques in this field are used to summarize large datasets and describe the outcomes to stakeholders via key performance indicators or similar metrics. Note that this is a purely descriptive process: the results are reported as they are, without interpretation. Diagnostic analytics takes this one step further.
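As a minimal sketch of the descriptive step, the snippet below condenses a small dataset into a handful of summary metrics. The `daily_orders` figures and the `describe` helper are invented for illustration and use only Python's standard library.

```python
from statistics import mean, median

# Hypothetical daily order counts; the numbers are invented for illustration.
daily_orders = [120, 135, 128, 140, 90, 150, 145]


def describe(values):
    """Summarize a list of numbers into a few descriptive metrics."""
    return {
        "count": len(values),
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


summary = describe(daily_orders)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The output only states what happened. It makes no attempt to explain, for example, why one day is so much lower than the others.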

After “what happened”, the natural next question, and the one diagnostic analytics answers, is “why did it happen”. This is generally achieved by taking a deep dive into the results of the descriptive analytics process, with special attention to anomalies in the data. The goal is to find such anomalies (e.g. an unexpected change in a performance metric), collect data related specifically to them, and then use statistical techniques to find an explanation.
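The first part of that process, finding anomalies, could look roughly like the sketch below. It flags values whose z-score (distance from the mean in standard deviations) exceeds a threshold. The data, the `find_anomalies` helper, and the threshold of 2.0 are assumptions chosen for illustration; explaining the anomaly still requires collecting and analyzing related data.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value, z_score) for points far from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    anomalies = []
    for index, value in enumerate(values):
        z_score = (value - mu) / sigma
        if abs(z_score) > threshold:
            anomalies.append((index, value, round(z_score, 2)))
    return anomalies


# Hypothetical daily order counts with one unusually low day.
daily_orders = [120, 135, 128, 140, 30, 150, 145, 132, 138, 141]
anomalies = find_anomalies(daily_orders)
print(anomalies)
```

A z-score is only one of many possible anomaly criteria, and it is itself sensitive to extreme values, so the threshold usually needs tuning for the data at hand.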

Up to this point, the goals have been backward-looking: we described and explained the current state. The next step, predictive analytics, goes beyond the current state and aims to answer the question “what will likely happen next”. Predictive analytics uses historical data to identify trends and assess whether they are likely to continue or recur. This can provide valuable insight into potential future paths, often together with a degree of confidence indicating how likely a specific outcome is.
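To make the idea of extrapolating a trend concrete, here is a minimal sketch that fits a straight line to historical values with ordinary least squares and projects it one step ahead. The `months` and `revenue` figures and the `fit_line` helper are invented for illustration, and the sketch does not compute any confidence measure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly revenue (in thousands) for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
revenue = [10.0, 12.1, 13.9, 16.2, 17.8, 20.0]

slope, intercept = fit_line(months, revenue)
forecast = slope * 7 + intercept  # projected revenue for month 7
print(f"trend: {slope:.2f} per month, forecast for month 7: {forecast:.2f}")
```

A straight line is the simplest possible model. Real predictive work would also check how well the model fits and whether the trend can reasonably be expected to hold.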

The techniques used in predictive analytics range from classical statistical models, such as regression, to more modern machine learning methods, such as decision trees and neural networks. The final type, prescriptive analytics, is the logical conclusion to this process. The goal here is to take the insights from the predictive analytics process and derive recommendations for future behavior from them.

This can again be done using machine learning techniques: by evaluating the likely effects of different decisions on the key performance indicators, prescriptive analytics can help businesses to make informed decisions about their future direction in the market.
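The core idea, scoring each candidate decision by its predicted effect on a key performance indicator and recommending the best one, can be sketched as follows. The demand model, the unit cost, and the candidate prices are all hypothetical; in practice the prediction would come from a fitted statistical or machine learning model rather than a hard-coded formula.

```python
UNIT_COST = 5.0  # hypothetical cost of producing one unit


def predicted_demand(price):
    """Hypothetical predictive model: demand falls linearly with price."""
    return max(0.0, 1000 - 40 * price)


def predicted_profit(price):
    """Key performance indicator to optimize: predicted profit at a price."""
    return (price - UNIT_COST) * predicted_demand(price)


# Candidate decisions to evaluate against the predictive model.
candidate_prices = [9, 12, 15, 18]

best_price = max(candidate_prices, key=predicted_profit)
for price in candidate_prices:
    print(f"price {price}: predicted profit {predicted_profit(price):.0f}")
print(f"recommended price: {best_price}")
```

The recommendation is only as good as the underlying prediction, which is why prescriptive analytics builds on the earlier steps.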

Sometimes the first two types (descriptive and diagnostic) are considered to be “data analysis” rather than “data analytics”. Nowadays, data analytics often implies work with big data, that is, the analysis of high-volume and/or high-velocity data. However, while this type of data does present unique data handling and management challenges, the general principles above apply to any kind of raw data.


The Data Analytics Pipeline

Each of the above types generally involves the data analysis pipeline of data mining, data management, statistical analysis, and data presentation. How involved each of these steps is depends on the specific goal in question. For example, predictive analytics that builds on existing descriptive results will be lighter on the data mining and data management parts, but heavily involved in the statistical analysis.

Data mining is an essential first step for many data analytics tasks. It involves extracting data from a variety of sources, many of them unstructured, which may include written text, large databases, raw sensor data, or other types of data. A key part of this step is the so-called “ETL process”: extract the data from the various sources, transform it into a useful and consistent format, and load it into a database or data warehouse for further work. This is often the most time-consuming step in the data analysis pipeline.
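The three ETL stages can be sketched in a few lines using Python's standard `csv` and `sqlite3` modules. The sensor data, the column names, and the `readings` table are invented for illustration; a real pipeline would read from files, APIs, or other systems and handle far more edge cases.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent IDs, mixed units, one missing reading.
RAW_CSV = """sensor_id,reading,unit
 A1 ,21.5,C
a2,70.7,F
A3,,C
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, normalize IDs, convert to Celsius."""
    cleaned = []
    for row in rows:
        if not row["reading"].strip():
            continue  # skip rows without a reading
        value = float(row["reading"])
        if row["unit"].strip().upper() == "F":
            value = (value - 32) * 5 / 9
        cleaned.append((row["sensor_id"].strip().upper(), round(value, 2)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, celsius REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
load(transform(extract(RAW_CSV)), conn)
stored = conn.execute(
    "SELECT sensor_id, celsius FROM readings ORDER BY sensor_id"
).fetchall()
print(stored)
```

Keeping the three stages as separate functions makes each one easy to test and replace independently.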

Data management or data warehousing is the design and implementation of databases that allow easy and structured access to the results of data mining. Nowadays this is often achieved using a combination of relational (SQL) databases and non-relational or NoSQL databases, depending on the specific data being considered.
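To illustrate the difference between the two models, the sketch below stores the same hypothetical customer data once in relational tables (using the standard `sqlite3` module) and once as a single nested document. The plain JSON document only stands in for what a document store such as MongoDB would hold; no NoSQL database is actually used here, and all names and values are invented.

```python
import json
import sqlite3

# Relational model: data is split into tables and recombined with a join.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
    """
)
relational_total = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    """
).fetchone()

# Document model: the customer and its orders live in one nested record.
document = json.loads(
    '{"name": "Ada", "orders": ['
    '{"id": 10, "amount": 25.0}, {"id": 11, "amount": 40.0}]}'
)
document_total = sum(order["amount"] for order in document["orders"])

print(relational_total, document_total)
```

The relational form enforces a fixed schema and supports flexible queries across tables, while the document form keeps related data together and tolerates records of varying shape.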

Statistical analysis is where the insights are created from the data. It is tempting to call this the most important step in the pipeline, but that is not necessarily true: statistical analysis is only useful if the input data was well collected and prepared. Otherwise, the results are likely to be useless at best and misleading at worst.
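A small example shows how badly prepared input can mislead even the simplest statistic. Here, a hypothetical placeholder value of -999 marks missing temperature readings; the data and the placeholder convention are assumptions made for illustration.

```python
from statistics import mean

MISSING = -999  # hypothetical placeholder a sensor writes for "no reading"

raw_temperatures = [21.0, 22.5, MISSING, 20.5, 23.0, MISSING]

# Analyzing the raw data treats the placeholders as real measurements.
naive_mean = mean(raw_temperatures)

# Removing the placeholders first gives a plausible result.
cleaned = [t for t in raw_temperatures if t != MISSING]
clean_mean = mean(cleaned)

print(f"mean of raw data:     {naive_mean}")
print(f"mean of cleaned data: {clean_mean}")
```

The calculation is identical in both cases; only the preparation of the input differs.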

Finally, data presentation is an easily overlooked but very important step. Data visualization allows the data analyst to take a collection of numbers and turn them into a compelling narrative that helps executives and managers understand the insights generated by the analysis and their importance for the business decisions that follow.
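Even a very simple chart can make a set of numbers easier to grasp than a table. The sketch below renders a text-only bar chart with the standard library so that it runs anywhere; the regional sales figures and the `text_bar_chart` helper are invented for illustration, and a real report would more likely use a dedicated visualization tool.

```python
def text_bar_chart(data, width=20):
    """Render a mapping of label -> value as a horizontal text bar chart."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Scale each bar relative to the largest value.
        bar = "#" * round(width * value / largest)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical sales by region.
sales_by_region = {"North": 40, "South": 30, "East": 10}
chart = text_bar_chart(sales_by_region)
print(chart)
```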

Tools for Data Analytics

Typically, data analytics combines a variety of tools. A non-exhaustive list of examples includes data management and data mining tools such as AWS Glue; SQL database servers such as PostgreSQL, MySQL/MariaDB, Microsoft SQL Server, and Oracle Database; and fully fledged data warehousing and big data solutions like Oracle Warehouse Builder, SAP BusinessObjects, IBM InfoSphere Information Server, and Apache Hadoop.

Examples of non-relational databases are MongoDB and Redis. Statistical analysis is typically done in a programming language like R or Python (with the pandas library). Further libraries and tools include scikit-learn and Apache Spark, as well as SQL. Data presentation is the realm of visualization tools such as Tableau and Microsoft Power BI, or the Python libraries Matplotlib and Plotly.
