After data collection, the data should be managed and processed in a systematic order so that appropriate analysis techniques yield valid results. Several data processing operations take place before the data go through the analysis procedure.
1. Editing:
Editing of data is the process of examining the collected raw data (especially in surveys) to detect errors and omissions and to correct these when possible. In practice, editing involves a careful scrutiny of the completed questionnaires and/or schedules. Editing is done to ensure that the data are accurate, consistent with other facts gathered, uniformly entered, as complete as possible, and well arranged to facilitate coding and tabulation. A brief sketch of such checks follows the two types of editing below.
- Field Editing: It consists of the review of the reporting forms by the investigator to complete (translate or rewrite) what the participants have written in abbreviated and/or illegible form at the time of recording the respondents’ responses.
- Central Editing: It should take place when all forms or schedules have been completed and returned to the office.
2. Coding:
Coding refers to the process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes. Such classes should be appropriate to the research problem under consideration. Coding is necessary for efficient analysis; through it, many replies are reduced to a small number of classes that contain the critical information required for analysis.
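A minimal Python sketch of coding, using a hypothetical codebook that maps response categories to numerals:

```python
import pandas as pd

# Hypothetical responses to be reduced to a small set of numeric codes.
answers = pd.Series(["Agree", "Disagree", "Neutral", "Agree", "Agree"])

# A codebook assigning a numeral to each response category.
codebook = {"Disagree": 1, "Neutral": 2, "Agree": 3}
coded = answers.map(codebook)

print(coded.tolist())   # [3, 1, 2, 3, 3]
```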
3. Classification:
Most research studies result in a large volume of raw data which must be reduced into homogeneous groups if we are to get meaningful relationships. Data having a common characteristic are placed in one class, and in this way the entire data set is divided into a number of groups or classes. The two bullets below illustrate this, and a short sketch of both approaches follows them.
- Classification according to attributes: For example, data are classified on the basis of common characteristics such as gender, education status, religion, etc.
- Classification according to class-intervals: For example, age groups (21-30, 31-40, 41-50).
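Both kinds of classification can be illustrated with a short Python (pandas) sketch on hypothetical respondent data:

```python
import pandas as pd

# Hypothetical respondent data used only to illustrate both kinds of classification.
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F"],
    "age": [23, 35, 41, 29, 47],
})

# Classification according to attributes: group by a common characteristic.
by_gender = df.groupby("gender").size()

# Classification according to class-intervals: bin ages into 21-30, 31-40, 41-50.
df["age_group"] = pd.cut(df["age"], bins=[20, 30, 40, 50],
                         labels=["21-30", "31-40", "41-50"])
by_age_group = df["age_group"].value_counts().sort_index()

print(by_gender)
print(by_age_group)
```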
4. Data Entry:
Data entry is the act of entering information into electronic formats using word processing or data processing software hosted on a computer; data entry operators perform these tasks. Software commonly used for data entry includes EpiData, MS Excel, etc. While entering data, it is advisable to enter only numbers as far as possible, which, first, speeds up the entry process and, second, reduces the chance of entering wrong, meaningless, or erroneous information.
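Entering only numbers also makes validation easy. A minimal Python sketch, with a hypothetical mistyped entry, shows how erroneous values surface immediately:

```python
import pandas as pd

# Hypothetical keyed-in values; "3o" is a typical mistyped entry.
raw = pd.Series(["25", "31", "3o", "44"])

# Coerce to numbers; anything that is not a valid number becomes NaN,
# so erroneous entries surface immediately instead of hiding in the data.
entered = pd.to_numeric(raw, errors="coerce")

print(entered)                                     # the "3o" entry shows up as NaN
print("Bad entries:", raw[entered.isna()].tolist())
```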
5. Data Cleaning:
Data cleansing or data cleaning is the process of identifying and removing (or correcting) inaccurate records from a dataset, table, or database. It involves recognizing incomplete, unreliable, inaccurate, or irrelevant parts of the data and then restoring, remodeling, or removing the dirty or crude data.
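A minimal Python (pandas) sketch of cleaning a hypothetical "dirty" dataset containing a duplicate, a missing value, and an out-of-range record:

```python
import pandas as pd

# Hypothetical dirty dataset; scores should lie in 0-100.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "score": [78, 85, 85, None, 420],
})

df = df.drop_duplicates()               # remove duplicate records
df = df.dropna(subset=["score"])        # remove incomplete records
df = df[df["score"].between(0, 100)]    # remove out-of-range records

print(df)
```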
6. Tabulation:
The process of placing classified data into tabular form is known as tabulation. Tabulation summarizes raw data and displays it in compact form (i.e., in statistical tables) for further analysis. In a broader sense, tabulation is an orderly arrangement of data in columns and rows.
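A minimal Python (pandas) sketch that tabulates hypothetical classified data into a compact two-way table:

```python
import pandas as pd

# Hypothetical classified data to be placed in tabular form.
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M"],
    "age_group": ["21-30", "21-30", "31-40", "41-50", "31-40", "21-30"],
})

# A two-way table: rows and columns summarize the raw data in compact form,
# with row and column totals included as margins.
table = pd.crosstab(df["gender"], df["age_group"], margins=True)
print(table)
```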
7. Data Analysis:
Data analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data. Popular software for data analysis is SPSS (Statistical Package for the Social Sciences); other packages include R, STATA, etc. Analysis may therefore be categorized as descriptive and causal analyses, and inferential analysis (inferential analysis is often known as statistical analysis).
a. Descriptive and Causal Analyses:
Descriptive analysis is largely the study of distributions of one variable. Causal analysis is concerned with how one or more variables affect changes in another variable; it is thus a study of the functional relationships existing between two or more variables. A short sketch of a few of these techniques follows the list below.
- Univariate Analysis (Analysis of one variable): Frequency analysis [Measures of Central Tendency (Mean, median, mode), Measures of dispersion (range, standard deviation, mean deviation), Percentile values], Measures of skewness, One-way ANOVA, etc.
- Bivariate Analysis (Analysis of two variables): Simple regression (causal analysis) and simple correlation, Association of attributes, Two-way ANOVA
- Multivariate Analysis (Analysis of more than two variables): Multiple regression, Multiple correlation, Multivariate ANOVA (MANOVA), Factor analysis
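A minimal Python (pandas/NumPy) sketch of univariate descriptive measures and a simple bivariate analysis on hypothetical data (the variable names and values are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical data: study hours and exam scores.
df = pd.DataFrame({
    "hours": [2, 4, 6, 8, 10],
    "score": [55, 63, 70, 78, 85],
})

# Univariate (descriptive) analysis: central tendency and dispersion of one variable.
print("mean:", df["score"].mean())
print("median:", df["score"].median())
print("std dev:", df["score"].std())

# Bivariate analysis: simple correlation and a simple (least-squares) regression.
print("correlation:", df["hours"].corr(df["score"]))
slope, intercept = np.polyfit(df["hours"], df["score"], 1)
print("fitted line: score =", round(slope, 2), "* hours +", round(intercept, 2))
```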
b. Inferential Analysis/Statistical Analysis:
Inferential analysis is concerned with the various tests of significance for testing hypotheses in order to determine with what validity the data can be said to indicate some conclusion or conclusions. It is also concerned with the estimation of population values. It is mainly on the basis of inferential analysis that the task of interpretation (i.e., the task of drawing inferences and conclusions) is performed. In inferential or statistical analysis we chiefly estimate parameter values (point estimates, interval estimates) and test hypotheses (parametric tests, non-parametric tests), as the sketch below illustrates.
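A minimal Python (SciPy) sketch of inferential analysis on a hypothetical sample: a point estimate, an interval estimate, and a one-sample parametric test:

```python
import numpy as np
from scipy import stats

# Hypothetical sample drawn from some population.
sample = np.array([52.1, 48.3, 55.0, 50.7, 49.9, 53.4, 51.2, 47.8])

# Point estimate of the population mean.
mean = sample.mean()

# Interval estimate: a 95% confidence interval based on the t distribution.
ci = stats.t.interval(0.95, df=len(sample) - 1,
                      loc=mean, scale=stats.sem(sample))

# Parametric hypothesis test: H0 says the population mean is 50.
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

print("point estimate:", mean)
print("95% CI:", ci)
print("t =", t_stat, "p =", p_value)
```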