1. Give a brief introduction about the assignment, and include a short summary of a related article with a proper citation.
2. Give a short description about this dataset. Is this primary or secondary data? What types of variable(s) is involved? Display the first 5 cases of your dataset.
3. Explain how you collect the data and discuss whether your sample is biased. Is this primary or secondary data? What type of variable(s) is/are involved? You don’t need to display your data in this section.
Answer:
Introduction
In Australia, many citizens lodge a tax return at the end of each financial year. A return can be lodged in two ways: by paying a registered tax agent to lodge it on the taxpayer's behalf, or by the taxpayer lodging it personally (Alghamdi and Rahim 2016). This study aims to evaluate the proportion of people who prefer to have a tax agent lodge their return and the proportion who prefer to do it on their own. The preference is assessed against factors such as age, income amount and deduction amount.
The data for this study has been collected from the website of the Australian Taxation Office (ATO). The dataset considered here is a subset of the original dataset obtained from this website: a sample of 1000 people of Australia has been selected from the whole dataset. The dataset contains the variables gender, age range, lodgement method, total income amount and total deduction amount. In the gender column, 0 indicates male and 1 indicates female. In the age range column, the codes run from the oldest group to the youngest: 0 indicates more than 70 years, 1 indicates 65 to 69 years, 2 indicates 60 to 64 years, 3 indicates 55 to 59 years, 4 indicates 50 to 54 years, 5 indicates 45 to 49 years, 6 indicates 40 to 44 years, 7 indicates 35 to 39 years, 8 indicates 30 to 34 years, 9 indicates 25 to 29 years, 10 indicates 20 to 24 years and 11 indicates below 20 years. In the lodgement method column, A indicates tax agent and S indicates self preparer. These three variables are therefore categorical variables, while total income amount and total deduction amount are continuous numeric variables. The data is secondary data, as it has been obtained from the website of the ATO. The first five cases of the data are shown in table 1.
There is another part to this study, in which data has been collected from international students about their preferred lodgement method for tax returns. Forty students from different nations studying in Australian universities were asked this question, and their responses were recorded in the second dataset. As this data has been collected through a survey, it is primary data. The variable involved in this case is a categorical variable.
Preference of Lodgement Methods of Australian People
First, the data on the Australian people has been summarised. Table 2 and figure 1 show that tax agents are preferred by 738 of the 1000 people. Thus, a very high proportion of the people of Australia (73.8 percent) prefer lodging their tax returns by hiring registered tax agents.
Table 2: Types of Lodgement Methods by Australian People
Lodgement_method | Frequency | Proportion
A | 738 | 0.738
S | 262 | 0.262
Grand Total | 1000 | 1
Figure 1: Proportion of different types of Lodgement Methods
The confidence interval for the proportion has been obtained at the 95 percent confidence level and is given in table 3. The table shows that the 95 percent confidence interval for the proportion of people preferring tax agents is (71.07%, 76.53%).
Table 3: Confidence Interval for the proportion of people preferring Tax Agents
Data
Sample Size | 1000
Count of Successes | 738
Confidence Level | 95%
Intermediate Calculations
Sample Proportion | 0.738
z Value | 1.9600
Standard Error of the Proportion | 0.0139
Margin of Error | 0.0273
Assumptions: n.p=738, n.q=262 | MET
Confidence Interval
Interval Lower Limit | 71.07%
Interval Upper Limit | 76.53%
Thus, the sample proportion of people preferring tax agents over self preparation for lodging tax returns is 0.738, and with 95 percent confidence the proportion of the people of Australia preferring tax agents lies between 71.07 percent and 76.53 percent.
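The interval reported in table 3 can be reproduced with a few lines of arithmetic. The sketch below assumes the normal-approximation (Wald) interval, which is consistent with the z value, standard error and margin of error listed in the table; the function name `wald_interval` is illustrative only and is not taken from the original analysis.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    margin = z * std_err                          # margin of error
    return p_hat - margin, p_hat + margin

# Values from table 3: 738 of 1000 people prefer tax agents.
lower, upper = wald_interval(738, 1000)
print(f"{lower:.2%} to {upper:.2%}")  # 71.07% to 76.53%
```

Calling `wald_interval(32, 40)` applies the same calculation to the international student sample.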
Preference of Lodgement Methods of International Students
Here, the preference of lodgement method for tax returns is evaluated for the international students. Table 4 shows that 80 percent of the students surveyed prefer tax agents and 20 percent do not. Thus, a high proportion of students also prefer tax agents over lodging for themselves.
Table 5 shows the calculations for the 95 percent confidence interval for the proportion of students preferring tax agents. With 95 percent confidence, between 67.60 percent and 92.40 percent of international students prefer tax agents for lodging tax returns.
Table 6 compares the two proportions for the preference of tax agents to lodge tax returns. To run this test, the following hypotheses have been defined:
Null Hypothesis (H0): There is no difference in the proportions.
Alternate Hypothesis (H1): There is a significant difference in the proportions.
The test gives z = -0.877 with a p-value of 0.380 (table 6), which is greater than the 0.05 level of significance, so the null hypothesis is not rejected. Hence there is no statistically significant difference between the international students and the people of Australia in the proportion preferring tax agents over self preparation for the lodgement of tax returns (Krishnamoorthy 2016).
Table 4: Types of Lodgement Methods by International Students
Lodgement Method | Frequency | Proportion
A | 32 | 0.8
S | 8 | 0.2
Grand Total | 40 | 1
Figure 2: Types of Lodgement Methods by International Students
Table 5: Confidence Interval for the proportion of International Students preferring tax agents
Data
Sample Size | 40
Count of Successes | 32
Confidence Level | 95%
Intermediate Calculations
Sample Proportion | 0.8
z Value | 1.9600
Standard Error of the Proportion | 0.0632
Margin of Error | 0.1240
Assumptions: n.p=32, n.q=8 | MET
Confidence Interval
Interval Lower Limit | 67.60%
Interval Upper Limit | 92.40%
Table 6: Z Test for Two Proportions
Hypotheses
Null Hypothesis H0: | p | = | 0%
Alternative Hypothesis HA: | p | <> | 0%
Test Type | Two
Level of Significance | α | 0.05
Sample Data
Sample Size Group 1 | 1000
Successes in Group 1 | 738
Sample Size Group 2 | 40
Successes in Group 2 | 32
Hypothesized Difference | 0
Intermediate Calculations
Proportion Group 1 | 0.738
Proportion Group 2 | 0.8
Average Proportion | 0.740385
Difference in Two Proportions | -0.062
Z | -0.87702
p-value | 0.380474
Hypothesis Test Decision | Do not reject H0
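The figures in table 6 can be checked by hand. The sketch below assumes a pooled two-proportion z test with a two-tailed p-value, which matches the "Average Proportion" and "Z" rows of the table; the function name is illustrative and is not part of the original spreadsheet.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z test; returns (z, two-tailed p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # "Average Proportion" in table 6
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Two-tailed p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Group 1: Australian sample (738 of 1000); group 2: international students (32 of 40).
z, p_value = two_proportion_z_test(738, 1000, 32, 40)
print(f"z = {z:.5f}, p-value = {p_value:.4f}")
```

Because the p-value is above 0.05, the sketch leads to the same decision as the table: the null hypothesis is not rejected.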
Lodgement Method and Age Group
The relationship between age group and lodgement method is given in table 7. Table 7 and figure 3 show that people of all age ranges prefer agents to lodge their tax returns rather than lodging for themselves. The difference between the two proportions is large in every age group except for people aged 24 years or below (age range codes 10 and 11), for whom the difference between the preference for tax agents and for self preparation is very small.
Table 7: Age range and Lodgement Method (Observed Frequency Table)
Age Range | Lodgement Method: A | Lodgement Method: S | Grand Total
0 | 43 | 10 | 53
1 | 29 | 10 | 39
2 | 59 | 13 | 72
3 | 69 | 14 | 83
4 | 89 | 18 | 107
5 | 88 | 20 | 108
6 | 70 | 21 | 91
7 | 89 | 25 | 114
8 | 60 | 33 | 93
9 | 79 | 40 | 119
10 | 45 | 43 | 88
11 | 18 | 15 | 33
Grand Total | 738 | 262 | 1000
To test for an association between age group and lodgement method, a chi square test of association has been conducted (Sharpe 2015). This test requires the expected frequencies, which are given in table 8. The null and alternate hypotheses for this test are given below:
Null Hypothesis (H0): There is no significant association between age range and Lodgement Methods.
Alternate Hypothesis (H1): There is significant association between age range and Lodgement Methods.
The significance value for this test has been found to be less than 0.05 (the level of significance), so the null hypothesis is rejected. Thus, there is an association between age range and lodgement method.
Table 8: Expected Values of Age Range and Lodgement Methods
Age Range | A | S | Grand Total
0 | 39.114 | 13.886 | 53
1 | 28.782 | 10.218 | 39
2 | 53.136 | 18.864 | 72
3 | 61.254 | 21.746 | 83
4 | 78.966 | 28.034 | 107
5 | 79.704 | 28.296 | 108
6 | 67.158 | 23.842 | 91
7 | 84.132 | 29.868 | 114
8 | 68.634 | 24.366 | 93
9 | 87.822 | 31.178 | 119
10 | 64.944 | 23.056 | 88
11 | 24.354 | 8.646 | 33
Grand Total | 738 | 262 | 1000
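The expected frequencies in table 8 and the test decision can be reproduced from the observed counts in table 7. The sketch below is a plain-Python version of the Pearson chi square calculation. Instead of computing an exact p-value, it compares the statistic with 19.675, the upper 5 percent critical value for 11 degrees of freedom taken from standard chi square tables; that comparison is an assumption of this sketch and not necessarily how the original analysis obtained its significance value.

```python
# Observed counts from table 7: (tax agent, self preparer) for age range codes 0 to 11.
observed = [
    (43, 10), (29, 10), (59, 13), (69, 14), (89, 18), (88, 20),
    (70, 21), (89, 25), (60, 33), (79, 40), (45, 43), (18, 15),
]

row_totals = [a + s for a, s in observed]
col_totals = [sum(a for a, _ in observed), sum(s for _, s in observed)]
grand_total = sum(row_totals)

# Expected frequency = row total * column total / grand total (as in table 8).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)

# Upper 5% critical value for 11 degrees of freedom, from standard tables.
CRITICAL_VALUE_DF11 = 19.675
reject_null = chi_square > CRITICAL_VALUE_DF11

print(f"chi square = {chi_square:.2f}, df = {degrees_of_freedom}, reject H0: {reject_null}")
```

The largest contributions to the statistic come from age range codes 10 and 11, where far more people lodge for themselves than the expected frequencies suggest.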
Lodgement Method and Total Income Amount
Table 9 summarises the average total income of the people preferring tax agents and of those lodging their own returns. The table shows that people who use tax agents have a higher average income ($67,612.58) than people who lodge their own returns ($45,202.08).
Table 9: Summary of average Total Income
Lodgement Methods | Average of Tot_inc_amt
A | 67612.57859
S | 45202.08397
Grand Total | 61741.029
Table 10 shows that the average income of the people hiring tax agents is $67,612.58 and that of self preparers is $45,202.08. The medians are considerably lower: half of the people hiring tax agents earn more than $46,353, and half of the self preparers earn more than $35,726. Because the mean is well above the median in both groups, the income distributions are right-skewed, which indicates that some people earn a lot more than the others.
It has been observed that 34 people hiring tax agents and 10 people lodging for themselves earn a lot more than the other people in their group (see the outlier ranges in table 11). Thus, the mean is not representative of a typical income: these few very high incomes pull the mean upward, so most people earn less than the average.
Table 10: Summary of Lodgement Methods
Statistic | A | S
Mean | 67612.58 | 45202.08
Standard Error | 6398.625 | 2732.934
Median | 46353 | 35726
Mode | 0 | 0
Standard Deviation | 173826 | 44236.4
Sample Variance | 3.02E+10 | 1.96E+09
Kurtosis | 484.0539 | 18.67934
Skewness | 20.36088 | 3.408487
Range | 4308305 | 380484
Minimum | -1942 | -7
Maximum | 4306363 | 380477
Sum | 49898083 | 11842946
Count | 738 | 262
First Quartile | 25848.75 | 17830.75
Third Quartile | 77811.25 | 58913
Interquartile Range | 51962.5 | 41082.25
Table 11: Outlier Ranges
Range | A | S
Lower Outlier Range | -52095 | -43792.6
Upper Outlier Range | 155755 | 120536.4
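The limits in table 11 are consistent with the common rule that flags values more than 1.5 interquartile ranges below the first quartile or above the third quartile. The sketch below assumes that rule and uses the quartiles from table 10; the function name `tukey_fences` is illustrative.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of total income from table 10.
agent_fences = tukey_fences(25848.75, 77811.25)  # lodgement method A
self_fences = tukey_fences(17830.75, 58913.0)    # lodgement method S

print(agent_fences)  # (-52095.0, 155755.0)
print(self_fences)   # (-43792.625, 120536.375)
```

Incomes above the upper limit of a group are the unusually high incomes discussed in this section.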
Total Income Amount and Total Deduction Amount
Figure 5 shows the relationship between the total income and the total deduction of the people of Australia. The figure shows that as total income increases, total deduction increases as well. Thus, there is a positive relationship between the two variables.
The strength of the relationship between income and deduction has been obtained through regression analysis (Costa 2017). The relationship between income and deduction can be expressed with the following equation:
Total deduction amount = 470.37 + (0.0325 * Total income amount)
About 54.98 percent of the variability in the deduction amount is explained by income (R Square = 0.5498 in table 12). The model is statistically significant, as the Significance F value in table 13 is far below the level of significance (0.05).
Table 12: Regression Statistics
Multiple R | 0.741468594
R Square | 0.549775676
Adjusted R Square | 0.54932455
Standard Error | 4455.001408
Observations | 1000
Table 13: ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 24187044267 | 2.42E+10 | 1218.673 | 3.9162E-175
Residual | 998 | 19807343468 | 19847038 | |
Total | 999 | 43994387735 | | |
Table 14: Coefficients of Regression Equation
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 470.366301 | 152.164921 | 3.091161 | 0.002049 | 171.7664046 | 768.9661973
Tot_inc_amt | 0.032515845 | 0.000931433 | 34.90949 | 3.9E-175 | 0.030688053 | 0.034343636
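The regression output in tables 12 to 14 is internally consistent, which can be confirmed from the sums of squares alone. The sketch below recomputes R Square and the F statistic from table 13 and uses the coefficients in table 14 to estimate a deduction. The income of $60,000 is a hypothetical value chosen for illustration and does not come from the dataset; the function name is likewise illustrative.

```python
# Sums of squares and degrees of freedom from table 13.
ss_regression = 24_187_044_267
ss_residual = 19_807_343_468
df_regression, df_residual = 1, 998

# R Square = explained variation / total variation.
r_squared = ss_regression / (ss_regression + ss_residual)

# F = mean square of regression / mean square of residual.
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)

# Coefficients from table 14.
INTERCEPT = 470.366301
SLOPE = 0.032515845

def predict_deduction(total_income):
    """Estimated total deduction amount for a given total income amount."""
    return INTERCEPT + SLOPE * total_income

example = predict_deduction(60_000)  # hypothetical income, for illustration only

print(f"R Square = {r_squared:.4f}, F = {f_statistic:.2f}")
print(f"Estimated deduction at $60,000 income: ${example:,.2f}")
```

The slope means that each additional dollar of total income is associated with an increase of about 3.25 cents in the total deduction amount.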
Conclusion
The analysis in the preceding sections shows that the preference for tax agents is much higher than the preference for self preparation when lodging tax returns. Both the people of Australia and the international students prefer to appoint tax agents to lodge their returns. People aged 24 years or below show nearly equal preference for the two lodgement methods. Total income and total deduction have a direct relationship: as income increases, the deduction amount increases as well.
So far, the study has not considered the variable gender. The study could be extended by evaluating the preference of lodgement methods according to gender.
References
Alghamdi, A. and Rahim, M., 2016. Development of a Measurement Scale for User Satisfaction with E-tax Systems in Australia. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XXVII (pp. 64-83). Springer Berlin Heidelberg.
Costa, V., 2017. Correlation and Regression. In Fundamentals of Statistical Hydrology (pp. 391-440). Springer International Publishing.
Krishnamoorthy, K., 2016. Handbook of statistical distributions with applications. CRC Press.
Sharpe, D., 2015. Your chi-square test is statistically significant: Now what? Practical Assessment, Research & Evaluation, 20.