Data analysis can be defined as a process involving the inspection, cleaning, transformation, and modeling of data with the aim of extracting information and actionable knowledge. This information or knowledge can then be used to draw inferences in a scientific study or to support decision-making in a business. Different data-analysis tools help you examine data sets and draw inferences from the information garnered. These tools help process data sets, analyze relationships and correlations between them, and identify patterns and trends within them. MATLAB, short for Matrix Laboratory, is a high-level programming language and interactive environment commonly used for scientific and engineering computation, visualization, image processing, and data analysis. SAS, which stands for Statistical Analysis System, is a statistical software suite built for data extraction, data transformation, and predictive and business analytics. Java, while not commonly used for analysis purposes, does provide APIs suitable for the purpose.
PHASES OF DATA ANALYSIS
- Data-requirement gathering
- Data collection
- Data cleaning
- Data analysis
- Data interpretation
- Data visualization
In data-requirement gathering, we describe the data-analysis problem statement, the data we require for the analysis, and the analysis procedures and techniques to be used in the data-analysis process. It is somewhat analogous to a software-requirement specification. We determine the purpose of the analysis, what data will be analyzed, what analysis processes and methods will be employed, and what results we anticipate.
In data collection, we determine how we will obtain the data we need. The data-collection step involves gathering data from various sources, such as organizational databases, survey responses, and unstructured text data on websites and other platforms.
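As a minimal sketch of this step, the snippet below parses survey responses from CSV text using Python's standard library; the embedded data and its column names are invented for illustration (in practice the rows would come from a file, database query, or web export):

```python
import csv
import io

# Hypothetical CSV export of survey responses.
raw = """respondent,age,satisfaction
alice,34,4
bob,29,5
carol,41,3
"""

# csv.DictReader turns each data row into a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    print(row["respondent"], row["satisfaction"])
```

The same `DictReader` call works unchanged on an open file handle instead of the `StringIO` wrapper.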
In data cleaning, we address the existence of duplicate records, white spaces, errors, or outliers in the data. The data should be cleaned and made error-free before it can be subjected to any analysis. There are multiple ways of cleaning data depending on whether the data is missing or noisy.
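Two of the cleaning operations mentioned above, stripping white space and removing duplicates, can be sketched in a few lines of Python; the records list is invented for illustration:

```python
# Raw records with stray white space and duplicate entries.
records = ["  alice ", "bob", "alice", "carol ", "bob"]

# Strip leading/trailing white space from every value.
stripped = [r.strip() for r in records]

# Remove duplicate records while preserving the original order.
seen = set()
deduplicated = []
for r in stripped:
    if r not in seen:
        seen.add(r)
        deduplicated.append(r)

print(deduplicated)  # → ['alice', 'bob', 'carol']
```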
For example, if we have missing data like missing attributes in a record, we can either ignore the entire record which contains the missing values, or we can fill in the missing values. There are multiple methods we can employ to fill in missing attributes of the data set. One example is to employ measures of central tendency, like mean or median, where the blank attribute is filled with the mean or median of all values in that attribute column.
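The mean/median imputation described above can be sketched with the standard library; the `ages` column and its missing values (represented as `None`) are invented for illustration:

```python
from statistics import mean, median

# An attribute column with missing values represented as None.
ages = [25, 30, None, 45, None, 50]

# Compute central-tendency measures from the known values only.
known = [a for a in ages if a is not None]
fill_mean = mean(known)
fill_median = median(known)

# Fill each missing attribute with the median of the column.
imputed = [a if a is not None else fill_median for a in ages]
print(imputed)
```

The choice between mean and median matters when the column contains outliers: the median is less sensitive to extreme values.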
In the data-analysis step itself, we apply the chosen statistical or computational techniques to the cleaned data to answer the questions defined during requirement gathering.
Data interpretation is the process of understanding, organizing, and interpreting the given data in order to make sense of it and reach a meaningful conclusion. The basic concept of data interpretation is to review the collected data by means of analytical methods and arrive at relevant conclusions.
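As one example of an analytical method supporting a conclusion, the sketch below computes Pearson's correlation coefficient by hand on invented paired observations; an r near +1 would support the interpretation that the two variables increase together:

```python
from math import sqrt
from statistics import mean

# Invented paired observations: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 63, 71, 79]

mx, my = mean(hours), mean(scores)

# Pearson's r: covariance divided by the product of the
# standard deviations (here via sums of squared deviations).
cov = sum((x - mx) * (y - my) for x, y in zip(hours, scores))
sx = sqrt(sum((x - mx) ** 2 for x in hours))
sy = sqrt(sum((y - my) ** 2 for y in scores))
r = cov / (sx * sy)

print(round(r, 3))
```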
Data visualization is the representation of information and data using charts, graphs, maps, and other visual tools. These visualizations allow us to easily understand any patterns, trends, or outliers in a data set. Data visualization also presents data in an accessible manner to the general public or to specific audiences without technical knowledge.
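Real visualizations are usually produced with plotting libraries, but the idea can be sketched dependency-free as a plain-text bar chart; the category counts below are invented for illustration:

```python
# Invented counts per category to visualize.
counts = {"North": 12, "South": 7, "East": 9, "West": 4}

# Render each value as a horizontal bar so relative sizes,
# patterns, and outliers stand out at a glance.
lines = []
for label, value in counts.items():
    lines.append(f"{label:<6}{'#' * value} {value}")

chart = "\n".join(lines)
print(chart)
```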
“Data is everywhere”: in spreadsheets, on social media platforms, in feedback and surveys. To extract the information, all you need to do is analyze it, that is, study the data and find patterns in it. The result of data analysis is a final data set that you can then use for data analytics.