How to Analyse Categorical Data with Frequency Tables
Categorical data classifies observations into a set of distinct categories. For example, an object might be rated good or bad, gender might be recorded as male or female, results can be divided into pass or fail, and a questionnaire can offer responses such as agree, disagree, and no opinion.
Useful techniques for analysing categorical data include frequency tables, contingency tables, chi-square tests, and charts. This blog explains how to analyse such data with frequency tables, which can be one-way or two-way. Consider the following examples.
One-way table
Suppose a study of the educational achievement of employed women has been carried out. The target population is 28-year-old women. Formal education was categorised as follows:
1 - college graduate
2 - high school
3 - senior secondary
4 - postgraduate
5 - high school dropout
6 - secondary level dropout
A sample of 200 subjects was drawn from the population of 28-year-old working women, and the following frequencies were obtained:
1 - 30
2 - 55
3 - 45
4 - 20
5 - 20
6 - 30
The frequency table will be as follows:
Category | Frequency | Percent (%) | Valid percent (%) | Cumulative percent (%) |
---|---|---|---|---|
1 | 30 | 15 | 15 | 15 |
2 | 55 | 27.5 | 27.5 | 42.5 |
3 | 45 | 22.5 | 22.5 | 65 |
4 | 20 | 10 | 10 | 75 |
5 | 20 | 10 | 10 | 85 |
6 | 30 | 15 | 15 | 100 |
Total | 200 | 100 | 100 | |
There were no missing values here, which is why the percent and valid percent columns are identical. When some responses are missing, the valid percent is calculated after deducting the missing values from the total frequency.
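The arithmetic behind the table can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and its `missing` parameter are hypothetical, and the sketch assumes that the cumulative column is built from the valid percentages.

```python
# Counts per education category, copied from the example above.
counts = {1: 30, 2: 55, 3: 45, 4: 20, 5: 20, 6: 30}


def frequency_table(counts, missing=0):
    """Return rows of (category, frequency, percent, valid percent, cumulative percent).

    `missing` is a hypothetical number of non-responses; it is 0 in the example.
    """
    valid_total = sum(counts.values())  # responses that fall into a category
    total = valid_total + missing       # every sampled subject
    rows = []
    cumulative = 0.0
    for category, freq in counts.items():
        percent = 100 * freq / total
        valid_percent = 100 * freq / valid_total
        cumulative += valid_percent     # assumption: cumulate the valid percentages
        rows.append((category, freq, percent, valid_percent, cumulative))
    return rows


table = frequency_table(counts)
for category, freq, percent, valid_percent, cumulative in table:
    print(f"{category} | {freq} | {percent:g} | {valid_percent:g} | {cumulative:g}")
```

Calling `frequency_table(counts, missing=50)` shows how the two percentage columns diverge once some responses are missing: the percent column is then computed out of 250 subjects, while the valid percent column is still computed out of the 200 valid responses.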
Two-way table
A two-way frequency table summarises two variables, one laid out in rows and the other in columns. For instance, suppose a group of 20 individuals was asked to identify their complexion and hair colour, with the following results.
Hair colour \ Skin complexion | Sallow | Fair | Black | Tan | Total |
---|---|---|---|---|---|
Red | 1 | 2 | 1 | 1 | 5 |
Blonde | 2 | 0 | 3 | 1 | 6 |
Brown | 1 | 2 | 2 | 0 | 5 |
Black | 1 | 0 | 3 | 0 | 4 |
Total | 5 | 4 | 9 | 2 | 20 |
The row and column totals are known as marginal distributions. They give the number of individuals in each category of one variable, ignoring the other variable. For instance, the total number of people with a sallow complexion, irrespective of their hair colour, is 5. Expressed as percentages of the 20 respondents, the marginal distributions are:
Hair Colour | % value |
---|---|
Red | 25 |
Blonde | 30 |
Brown | 25 |
Black | 20 |
Complexion | % value |
---|---|
Sallow | 25 |
Fair | 20 |
Black | 45 |
Tan | 10 |
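As a rough sketch, the marginal percentages can be derived from the two-way counts by summing across rows and columns. The variable names below are illustrative only; the counts are those of the hair colour and complexion table.

```python
hair = ["Red", "Blonde", "Brown", "Black"]
complexion = ["Sallow", "Fair", "Black", "Tan"]

# Rows follow `hair`, columns follow `complexion`.
observed = [
    [1, 2, 1, 1],
    [2, 0, 3, 1],
    [1, 2, 2, 0],
    [1, 0, 3, 0],
]

row_totals = [sum(row) for row in observed]        # one total per hair colour
col_totals = [sum(col) for col in zip(*observed)]  # one total per complexion
n = sum(row_totals)                                # grand total of respondents

# Marginal distributions as percentages of all respondents.
hair_percent = {h: 100 * t / n for h, t in zip(hair, row_totals)}
complexion_percent = {c: 100 * t / n for c, t in zip(complexion, col_totals)}

print(hair_percent)
print(complexion_percent)
```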
You have now analysed the percentage of each hair colour and each complexion separately. However, you might want to investigate the percentage of one variable in relation to the other. Of the 6 people who have blonde hair, 3 have a black complexion, which means 50% of people with blonde hair have a black complexion.
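Percentages of one variable within each category of the other can be sketched by dividing every cell by its row total. The helper name `row_percentages` is hypothetical; the counts are those of the two-way table above, with rows for hair colour and columns for complexion.

```python
hair = ["Red", "Blonde", "Brown", "Black"]
complexion = ["Sallow", "Fair", "Black", "Tan"]
observed = [
    [1, 2, 1, 1],
    [2, 0, 3, 1],
    [1, 2, 2, 0],
    [1, 0, 3, 0],
]


def row_percentages(table):
    """Express each cell as a percentage of its own row total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


by_hair = row_percentages(observed)

# Share of blonde-haired respondents who have a black complexion.
blonde_black = by_hair[hair.index("Blonde")][complexion.index("Black")]
print(f"{blonde_black:g}% of blonde-haired respondents have a black complexion")
```

Dividing by column totals instead would answer the reverse question, for example what share of people with a black complexion have blonde hair.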
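The chi-square test of independence for the above two-way table gives:
Chi-square = 7.489
Degrees of freedom = 9
p-value = 0.5864
Because the p-value is well above the conventional 0.05 level, these data provide no evidence of an association between hair colour and complexion. Every expected count in this table is below 5, so the chi-square approximation should be treated with caution.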
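The following sketch shows one way these figures could be reproduced using only Python's standard library. The function names are hypothetical, and the p-value helper relies on a closed form that holds only for odd degrees of freedom, which happens to cover this table. In practice a statistics library such as SciPy (`scipy.stats.chi2_contingency`) would be the more usual route.

```python
import math

# Rows: Red, Blonde, Brown, Black hair. Columns: Sallow, Fair, Black, Tan complexion.
observed = [
    [1, 2, 1, 1],
    [2, 0, 3, 1],
    [1, 2, 2, 0],
    [1, 0, 3, 0],
]


def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def chi2_upper_tail_odd_dof(x, dof):
    """Upper-tail chi-square probability; this closed form assumes odd dof."""
    if dof % 2 == 0:
        raise ValueError("this closed form only covers odd degrees of freedom")
    tail = math.erfc(math.sqrt(x / 2))                    # tail for dof = 1
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)  # added when moving to dof = 3
    for j in range(1, (dof - 1) // 2 + 1):
        tail += term
        term *= x / (2 * j + 1)                           # next increment (dof + 2)
    return tail


statistic, dof = chi_square_independence(observed)
p_value = chi2_upper_tail_odd_dof(statistic, dof)
print(f"Chi-square = {statistic:.3f}, df = {dof}, p-value = {p_value:.4f}")
```

The statistic sums (observed - expected)² / expected over all 16 cells, and the degrees of freedom are (rows - 1) × (columns - 1) = 3 × 3 = 9.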