Part 1: Summarizing Salaries
1. Shown below are salaries for administrators and teachers in a hypothetical secondary school district. Discussions over a possible salary increase are at an impasse. If you were the chair of the teachers' negotiating team for salary increases for the coming year, which measure of central tendency (mean, median, or mode) would you want to present to the arbitrator? Explain.
John Thomas, Principal $86,500.00
Alice Waters, Asst. Principal $64,000.00
Doris Adams, Head Counselor $52,500.00
Alice Johns, Asst. Principal $72,000.00
Joan Jeter, Teacher $32,500.00
Barbara Ho, Teacher $47,000.00
Susan Sing, Teacher $49,325.00
Bobbie Davis, Teacher $37,000.00
Jean Ellis, Teacher $36,500.00
Bill Hong, Teacher $41,000.00
Susan Sadler, Teacher $36,325.00
Allison Davis, Teacher $34,000.00
Bruce Owyoung, Teacher $34,000.00
Sallie Sucre, Teacher $34,000.00
Theodore Adams, Teacher $34,000.00
Danny Wong, Teacher $34,000.00
Jesus Contreras, Teacher $34,000.00
David Ellis, Teacher $34,000.00
Dan Simpson, Teacher $34,000.00
Philip Contrepper, Teacher $34,000.00
The mode is the best measure of central tendency to present to the arbitrator. The modal salary is $34,000, which sits near the bottom of the district's pay scale (only one salary, $32,500, is lower). A strong argument is that nine of the 20 individuals listed earn this low salary of $34,000.
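The comparison between the three measures can be checked with a short sketch. It uses Python's standard `statistics` module and the 20 salaries from the table above; the variable names are illustrative choices, not part of the original exercise.

```python
import statistics

# The 20 salaries listed in the table (administrators and teachers).
salaries = [
    86500, 64000, 52500, 72000,                      # administrators and counselor
    32500, 47000, 49325, 37000, 36500, 41000, 36325, # teachers with distinct salaries
] + [34000] * 9                                      # nine teachers earning $34,000

mean_salary = statistics.mean(salaries)      # pulled upward by the few high salaries
median_salary = statistics.median(salaries)  # midpoint of the sorted list
mode_salary = statistics.mode(salaries)      # most frequently occurring salary

print(f"mean   = {mean_salary:,.2f}")    # mean   = 43,032.50
print(f"median = {median_salary:,.2f}")  # median = 35,162.50
print(f"mode   = {mode_salary:,.2f}")    # mode   = 34,000.00
```

The mode is the lowest of the three figures, which is why it makes the strongest case for the teachers' side.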
2. Would you expect the following correlations to be positive or negative? Why?
(a) Bowling scores and golf scores
I would expect the correlation to be negative. Although both sports require good hand-eye coordination, a golf swing calls for controlled rather than maximal force, while in bowling the strength of the throw is key to high scores.
(b) Reading scores and arithmetic scores for sixth-graders
A positive correlation is expected because reading and arithmetic draw on overlapping cognitive abilities, so the two scores tend to go hand in hand.
(c) Age and weight for a group of 5-year-olds; for a group of people over 70
The age and weight of five-year-olds show a positive correlation because of the rapid growth that takes place in the formative years. For people over 70, there is a negative correlation between age and weight, since muscle atrophy and bone demineralization occur in old age.
(d) Life expectancy at age 40 and frequency of smoking
Frequency of smoking is negatively correlated with life expectancy, since smoking leads to a high risk of lung disease and cancers of the respiratory system.
(e) Size and strength for junior high students
There is a positive correlation because the bigger the student, the greater the muscle mass and hence the greater the strength.
3. Why do you think so many people mistrust statistics? How might such mistrust be alleviated?
Causes of distrust
Many people around the world do not trust statistics, for a variety of reasons listed below.
1. Inaccuracy of the data collected. Sometimes the data are incorrect because the wrong equipment was used or the equipment introduced errors.
2. Human bias in the collection of data, which results in critical information being neglected or deliberately left out.
3. Manipulation of data to suit political or economic motives. It is a common practice for institutions around the world to change raw data to suit their intentions.
4. Inappropriate choice of statistical methods that do not suit the type of data being collected, which makes people doubt the integrity of statistics.
5. Lack of understanding. Many people are simply unfamiliar with statistics and so disbelieve them for no particular reason.
Alleviation of mistrust
The following efforts should be put in place to relieve the levels of mistrust among people:
1. Transparency in revealing all the details about statistics.
2. Educating people about the importance of statistics so that they do not view it with skepticism.
3. Exercising integrity on the part of institutions to publish statistics that are true and correct.
4. Founding of bureaus that verify and ascertain the correctness of statistical data.
5. Choosing appropriate equipment and statistical methods, which reduces errors in the data obtained so that people can believe in the data collected.
4. Would it be possible for two different distributions to have the same standard deviation but different means? What about the reverse? Explain.
It is possible for two distributions to have different means and yet the same standard deviation. This is because the standard deviation measures the spread of the scores around the mean and does not depend on where the mean itself lies.
The reverse is also possible: a set of data can have the same mean as another and still have a different standard deviation. This is because the spread of the values around the central value can differ even when the central value is the same.
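Both cases can be illustrated with a minimal sketch. The three small data sets below are made-up values chosen only for illustration, and the population standard deviation (`statistics.pstdev`) is used for simplicity.

```python
import statistics

# Hypothetical data sets, invented for illustration.
a = [1, 2, 3, 4, 5]         # mean 3
b = [11, 12, 13, 14, 15]    # the values of a shifted up by 10: mean 13, same spread
c = [-7, -2, 3, 8, 13]      # mean 3 again, but values spread much further apart

mean_a, mean_b, mean_c = (statistics.mean(x) for x in (a, b, c))
sd_a, sd_b, sd_c = (statistics.pstdev(x) for x in (a, b, c))

# Case 1: different means, same standard deviation.
print(mean_a, mean_b, sd_a, sd_b)

# Case 2: same mean, different standard deviations.
print(mean_a, mean_c, sd_a, sd_c)
```

Adding a constant to every score moves the mean but leaves the deviations from the mean unchanged, which is why `a` and `b` share a standard deviation.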
5. The larger the standard deviation of a distribution, the more heterogeneous the scores in that distribution. Is this statement true? Explain.
This statement is true. The standard deviation measures how far the scores typically lie from the mean, so a larger standard deviation indicates that the values are spread further from the mean. Such scores range widely rather than clustering in one region, and the distribution can therefore be described as heterogeneous.
6. The most complete information about a distribution of scores is provided by a frequency polygon. Is this statement true? Explain.
This statement is false. Although the frequency polygon is a good way of presenting data and conveying its salient properties compactly, many values will be crowded into a single class if the number of classes is small. One is also likely to miss the individual characteristics of particular observations.
7. Grouping scores in a frequency distribution has its advantages but also its disadvantages. What might be some examples of each?
A grouped frequency distribution is a good method of statistical representation. It is especially useful when large data sets must be analyzed and presented, because it divides the data into groups that are easily manageable. It is also helpful when we want to know how heavily the values are concentrated in a particular region. One can also find the highest and lowest observations at a quick glance without combing through the full data set.
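As a rough sketch of what grouping does, the snippet below sorts a set of scores into class intervals. The scores and the helper name `group_scores` are hypothetical, invented for illustration.

```python
from collections import Counter

def group_scores(scores, width):
    """Count how many scores fall in each class interval of the given width."""
    counts = Counter((s // width) * width for s in scores)
    # Map each interval (lower bound, upper bound) to its frequency.
    return {(low, low + width - 1): counts[low] for low in sorted(counts)}

# Hypothetical test scores, invented for illustration.
scores = [52, 55, 61, 64, 67, 68, 71, 73, 75, 78, 79, 82, 85, 91]

table = group_scores(scores, 10)
for (low, high), freq in table.items():
    print(f"{low}-{high}: {freq}")
```

The resulting table shows at a glance where the scores are concentrated, but the individual values inside each class can no longer be read from it.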
Disadvantages of grouping scores are also present and normally include the following.
8. Any single raw score, in and of itself, tells us nothing. Would you agree? Explain.
I agree. A single score, although it is an original observation, cannot tell us anything about the rest of the observations. We must therefore make other observations in order to see a relationship that can be used to validate the single data entry. Such a relationship will show us that the method and the equipment used are correct and accurate. We therefore cannot deduce anything from a single raw entry, since we have no basis for determining whether it is valuable or just noise.
9. The relationship between age and strength is said to be curvilinear. What does this mean?
A curvilinear relationship is one in which two variables show a positive correlation up to a specific point, after which the relationship turns negative, so that a further increase in one is accompanied by a decrease in the other. A good example is the age of an individual and their strength. As an individual advances in age during their early years, their strength increases up to a peak in their middle years. Past this point the individual's strength declines while their age continues to increase.
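A small sketch can make this concrete. The ages and strength scores below are made-up numbers chosen to rise and then fall symmetrically, and `pearson_r` is an illustrative helper that computes the ordinary linear correlation coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: strength rises to a peak at age 40, then declines.
ages = [10, 20, 30, 40, 50, 60, 70]
strength = [40, 70, 90, 100, 90, 70, 40]

r_rising = pearson_r(ages[:4], strength[:4])    # ages 10 to 40: strongly positive
r_falling = pearson_r(ages[3:], strength[3:])   # ages 40 to 70: strongly negative
r_overall = pearson_r(ages, strength)           # whole range: close to zero

print(r_rising, r_falling, r_overall)
```

Over the whole range the linear coefficient is close to zero even though age and strength are clearly related, which is why a scatterplot is needed to detect a curvilinear relationship.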
Part 2: Descriptive Statistics
1. If you are designing a quantitative study, place an X after each of the descriptive statistics listed below that you will use to summarize your data:
Frequency polygons __X__ Five-number summary _____ Box plot _____
Percentages _____
Mean __X__ Median __X__ Standard deviation __X__ Frequency table __X__
Bar graph _____
Pie chart _____ Correlation coefficient __X__ Scatterplot __X__
Place an X after the technique(s) you would use to describe any relationships found in your study.
Comparison of frequency polygons __X__
Comparison of averages _____
Crossbreak table(s) _____
Correlation coefficient __X__
Scatterplot __X__
Reporting of percentages _____
Outliers
2. How will you deal with discrepant cases or outliers in your data analysis?
An outlier is a value that differs greatly from the rest of the values obtained in a study. Common causes of outliers include an error in the collection of the data or a genuinely unusual value that differs from the rest. Collection errors can result from an improper method of data collection, a misreading by the researcher, or faulty use of equipment.
Identification of Outliers
A number of statistical methods can be used to spot such large deviations in the data. The most common methods for identifying them can be grouped into graphical and model-based approaches. Hybrid approaches such as box plots also exist.
Model-based approaches assume that the data being analyzed come from a normal distribution, and they flag observations that are unlikely given the mean and the standard deviation. Common model-based techniques are Chauvenet's criterion, Grubbs' test, MMS, Peirce's criterion, and Dixon's Q test. In linear regression models, Mahalanobis distance and leverage are used.
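One widely used method of outlier identification is the Modified Thompson Tau test, which creates a rejection zone based on the mean and the standard deviation of the data set. After the absolute deviations from the mean are calculated, the rejection zone is determined using the formula

$$\text{Rejection Region} = \frac{t_{\alpha/2}\,(n-1)}{\sqrt{n}\,\sqrt{n-2+t_{\alpha/2}^{2}}}$$

where $n$ is the number of observations and $t_{\alpha/2}$ is the critical value of Student's t distribution with $n-2$ degrees of freedom. For each observation $x$, the value $\delta = |x - \text{mean}| / s$ is computed, where $s$ is the sample standard deviation. If $\delta$ is greater than the rejection region, the observation is an outlier.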
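The test can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the data are made up, the function names are hypothetical, and the critical value 2.306 (two-tailed, alpha = 0.05, 8 degrees of freedom) is taken from a standard t table rather than computed.

```python
import math
import statistics

def thompson_tau(n, t_crit):
    """Rejection region for n observations, given the t critical value
    for alpha/2 with n - 2 degrees of freedom."""
    return t_crit * (n - 1) / (math.sqrt(n) * math.sqrt(n - 2 + t_crit ** 2))

def is_outlier(values, candidate, t_crit):
    """True if the candidate's standardized deviation exceeds the rejection region."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)          # sample standard deviation
    delta = abs(candidate - mean) / s
    return delta > thompson_tau(len(values), t_crit)

# Hypothetical measurements with one suspicious reading (30).
values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]
T_CRIT = 2.306  # assumed table value: two-tailed alpha = 0.05, df = 10 - 2 = 8

tau = thompson_tau(len(values), T_CRIT)
print(round(tau, 4))
print(is_outlier(values, 30, T_CRIT))   # the extreme reading
print(is_outlier(values, 12, T_CRIT))   # an ordinary reading
```

In practice the test is applied to the most extreme value first. If that value is rejected, it is removed and the mean, standard deviation, and rejection region are recalculated before the next value is tested.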
Dealing with outliers
How an outlier is handled depends on its cause. The most common approaches are retention, exclusion, studentized residuals, data transformation, and robust regression.
Exclusion is commonly practiced when the measurement model and the distribution of the measurement errors are known. Even though excluding values is not strictly correct mathematically or scientifically, the outliers in a large set of data can be removed by either truncation or winsorizing. In truncation the outlier is removed from the data set, while winsorizing replaces it with the nearest value that lies within the accepted region...
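The difference between the two approaches can be shown with a short sketch. The data, the accepted region (5 to 20), and the function names are all assumptions made for illustration.

```python
def truncate(values, low, high):
    """Drop every value that falls outside the accepted region [low, high]."""
    return [v for v in values if low <= v <= high]

def winsorize(values, low, high):
    """Replace each value outside [low, high] with the nearest retained value."""
    kept = truncate(values, low, high)
    nearest_low, nearest_high = min(kept), max(kept)
    return [min(max(v, nearest_low), nearest_high) for v in values]

# Hypothetical data with one outlier (30); accepted region assumed to be 5..20.
data = [10, 11, 9, 10, 12, 30]

truncated = truncate(data, 5, 20)
winsorized = winsorize(data, 5, 20)

print(truncated)    # [10, 11, 9, 10, 12]
print(winsorized)   # [10, 11, 9, 10, 12, 12]
```

Truncation shortens the data set, while winsorizing keeps the number of observations unchanged and only pulls the extreme value in.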