
### 8.1 The database of exercises

The exercises to be used in this example are for a course in Descriptive Statistics. The full directory tree of the exercise database of this example is the following:

example_base -- 1num_m   -- exer1
                         -- exer2
                         -- exer3
                         -- stem1.eps
                         -- stem1.jpg
                         -- stem2.eps
                         -- stem2.jpg
             -- 1num_t   -- bimodal.eps
                         -- bimodal.jpg
                         -- exer1
                         -- exer2
                         -- exer3
                         -- exer4
                         -- exer5
             -- Normal_m -- exer1
                         -- exer2
                         -- exer3
             -- Normal_s -- exer1
                         -- exer2
             -- Normal_t -- exer1
                         -- exer2
                         -- exer3
                         -- exer4
                         -- exer5


COMMENT: Some of the subdirectories contain graphics files. Encapsulated PostScript files (.eps) are for normal LaTeX output, and JPEG files (.jpg) are for web forms. Manyex will move these files (if they are specified in the exercise definition) to the directory where it creates the exams.
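
As a rough illustration of the bookkeeping involved, the Python sketch below scans an exercise definition for `auxiliar` and `webfile` declarations (the keywords used in the listings of this section) and collects the file names. It is not manyex's own code: the function name `graphic_files` and the assumption of one declaration per line are hypothetical.

```python
import re

# Matches lines such as:  auxiliar "stem1.eps"   or   webfile "stem1.jpg"
# (assumption: one declaration per line, file name in double quotes)
DECLARATION = re.compile(r'^\s*(auxiliar|webfile)\s+"([^"]+)"\s*$')


def graphic_files(exercise_text):
    """Collect the graphics files declared in one exercise definition."""
    files = {"auxiliar": [], "webfile": []}
    for line in exercise_text.splitlines():
        match = DECLARATION.match(line)
        if match:
            keyword, filename = match.groups()
            files[keyword].append(filename)
    return files


example = '''title "Student pocket cash"
auxiliar "stem1.eps"
webfile "stem1.jpg"
block (type=multiplechoice rearrange=yes)
'''
print(graphic_files(example))
```

The `title` line is skipped because only the two graphics keywords are matched; the resulting lists are the files that would have to accompany the generated exam.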

We list next all exercises in the database:

1num_m/exer1:

#
#  Exercise 1 of example database of questions
#  Exercise on 1 numerical variable - multiplechoice
#  6 questions
#

title "Countries that have won the World Cup (1950-2002)"
block (type=multiplechoice rearrange=yes)
statement
The following table shows the list of countries that have won
the World Cup in the period 1950-2002. We are interested in
studying the percentage of times that a given country has won
the World Cup.

\begin{center}
\begin{tabular}{c c}
\hline\hline
Year & Country \\
\hline
1950 &	1 \\
1954 &	2 \\
1958 &	3 \\
1962 &	3 \\
1966 &	4 \\
1970 &	3 \\
1974 &	2 \\
1978 &	5 \\
1982 &	6 \\
1986 &	5 \\
1990 &	2 \\
1994 &	3 \\
1998 &	7 \\
2002 &	3 \\
\hline
\end{tabular}
\end{center}

Key: Uruguay=1, Germany=2, Brazil=3, England=4, Argentina=5,
Italy=6, France=7.

endstatement

question
The "Country" variable is:
. a numerical continuous variable.
. a numerical discrete variable.
. an absolute frequency.
. a relative frequency.
; None of the above options is correct.
It is a categorical variable, despite the fact that it is coded
with numbers. Numbers in this case are just labels for the country name.
endquestion

question
Organize the data in a frequency table. What values does the variable have?
. 1, 2, 3, 4, 5, 6, 7.
. 1950,1954,1958,1962, 1966, 1970, 1974, 1978, 1982, 1986, 1990,
1994, 1998, 2002.
. 1, 2, 3, 5.
. It does not take values.
: None of the above options is correct.
The variable is "Country" and it can have values that go from 1 to 7,
corresponding to the 7 countries mentioned in the sample.
endquestion

question
In this data set, an individual is...
. A country.
. A year.
. A number between 1 and 7.
. A number between 1 and 5.
: None of the above options is correct.
A country, and its frequency is how many times it has won the World Cup.
endquestion

question
The statement "86\% of the countries have won the World Cup no more
than 2 times since 1950" is referring to:
. An absolute frequency.
. A relative frequency.
. A cumulative absolute frequency.
. A cumulative relative frequency.
; None of the above options is correct.
It is clearly not an absolute frequency, since a percentage is given. But
furthermore it is not a cumulative frequency either, since cumulative
frequencies only make sense for numerical variables, where the values of
the variable can be ordered and their frequencies accumulated. Therefore
none of the options is correct.
endquestion

question
To represent this distribution graphically, one can use:
. A bar diagram.
. A histogram.
. A stem-and-leaf plot.
. An approximate drawing.
: None of the above options is correct.
The most appropriate graphical representation for a categorical variable is
a bar diagram.
endquestion

question
To describe this distribution one has to comment:
. A comparison between the percentage of times that countries have won
the World Cup.
. The center and the spread.
. The form and extreme values, if any.
. The center, the spread, the form and extreme values, if any.
: None of the above options is correct.
Being a categorical variable, there is only one appropriate numerical summary:
the proportion or percentage of cases in each category.
endquestion
endblock


COMMENT: This exercise consists of a single block with six questions of the multiplechoice type. Everything is permuted here: the questions and the options within each question, except options prefixed with “:” or “;”, which keep their position. The first option is normally the correct one; when an option prefixed with “;” is present, that option is the correct one instead. In the first question of this exercise, for instance, “None of the above options is correct” is the correct answer, and it cannot be permuted without losing its meaning.
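
The permutation rule for options can be sketched in a few lines of Python. This is only an illustration of the rule as described in the comment above, not manyex's actual implementation; the function name `permute_options` and the representation of an option as a `(prefix, text)` pair are assumptions made for the example.

```python
import random


def permute_options(options, rng):
    """Shuffle the '.' options; leave ':' and ';' options where they are.

    `options` is a list of (prefix, text) pairs in file order.
    Returns (new_options, index_of_the_correct_option).
    """
    # The correct option is the ';' one if present, otherwise the first one.
    correct = next((i for i, (p, _) in enumerate(options) if p == ";"), 0)

    # Only positions holding a '.' option take part in the shuffle.
    movable = [i for i, (p, _) in enumerate(options) if p == "."]
    targets = movable[:]
    rng.shuffle(targets)

    result = list(options)
    new_correct = correct
    for src, dst in zip(movable, targets):
        result[dst] = options[src]
        if src == correct:
            new_correct = dst
    return result, new_correct


options = [
    (".", "A bar diagram."),
    (".", "A histogram."),
    (".", "A stem-and-leaf plot."),
    (".", "An approximate drawing."),
    (":", "None of the above options is correct."),
]
shuffled, correct = permute_options(options, random.Random(0))
print(shuffled[correct][1])   # A bar diagram.
print(shuffled[-1][1])        # None of the above options is correct.
```

Whatever the seed, the fixed option stays last and the returned index follows the correct option to its new position, which is what an answer key needs.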

1num_m/exer2:

#
#  Exercise 2 of example database of questions
#  Exercise on 1 numerical variable - multiplechoice
#  6 questions
#

title "Student pocket cash"
auxiliar "stem1.eps"
webfile "stem1.jpg"
block (type=multiplechoice rearrange=yes)
statement
The following is a stem-and-leaf plot of the variable "pocket cash",
measured in euros with no cents, based on the answers to a survey of 15
students of group 3 of Data Analysis 101:

\bigskip

\tthdump{
\begin{center}
\leavevmode
\epsfxsize=55pt
\epsfbox{stem1.eps}
\end{center}
}
%%tth:\begin{html}<center><img SRC="stem1.jpg" height=300 width=250></center>\end{html}
endstatement

question
The "pocket cash" variable is:
.  Numerical.
.  Continuous categorical.
.  Categorical.
.  Discrete categorical.
: None of the above options is correct.
It is clearly a numerical variable, the amount of money that students
carry in their pockets.
endquestion

question
The fourth observation, in the ordered list of cases from the smallest to
the largest value, is:
. 35
. 4
. 12
. 17
; None of the above options is correct.
The fourth observation is 11, therefore no option is correct.
endquestion

question
The center of the distribution is:
. 22 euros
. Between 17 and 22 euros
. 12 euros
. 49 euros
: None of the above options is correct.
Since there are 15 cases (an odd number of cases), there is a
case which lies exactly in the center, the 8th case. We check the
ordered list and we see that this case has a value equal to 22.
endquestion

question
The form of the distribution is:
. Skewed to the right.
. Perfectly symmetric.
. Skewed to the left.
. It does not have a form.
: None of the above options is correct.
If there weren't some unusually high values, the distribution would be
more symmetric, therefore we have skewness to the right (to high values).
endquestion

question
The distribution
. has an outlier equal to 86.
. does not have outliers.
. has two outliers equal to 4 and 5.
. has one outlier equal to 49.
: None of the above options is correct.
The case with a value equal to 86 is clearly isolated from the rest and
therefore we can consider it an outlier.
endquestion

question
The leaf unit is equal to
. 1.
. 10.
. 100.
. euro cents.
: None of the above options is correct.
The values that we observe in the plot are equal to the actual values
in euros, therefore the leaf unit is equal to 1.
endquestion
endblock


1num_m/exer3:

#
#  Exercise 3 of example database of questions
#  Exercise on 1 numerical variable - multiplechoice
#  6 questions
#

title "Poverty in the world - 2005"
auxiliar "stem2.eps"
webfile "stem2.jpg"
block (type=multiplechoice rearrange=yes)
statement
The following data set shows the percentage of people under the
poverty line in different countries for 2005:

\begin{center}
\begin{tabular}{l c}
\hline\hline
Country & Poverty Percentage \\
\hline
Australia & 11.2 \\
Austria	 & 9.3 \\
Denmark & 4.3 \\
Finland & 6.4 \\
France	& 7.0 \\
Germany & 9.8 \\
Greece	& 13.5 \\
Italy & 12.9 \\
Portugal & 13.7 \\
United Kingdom & 11.4 \\
\multicolumn{2}{c}{Source: OECD 2005} \\
\hline
\end{tabular}
\end{center}

endstatement

question
In this data set, an individual is
. A country.
. A number between 4 and 14.
. A poverty percentage.
. A year.
: None of the above options is correct.
The cases that we have in our sample correspond to countries for which we
observe a characteristic, the percentage of poor people (according
to the poverty line criterion). Therefore the individuals are countries.
endquestion

question
The variable "Poverty percentage" is:
. Numerical.
. Categorical continuous.
. Categorical.
. Categorical discrete.
: None of the above options is correct.
It is a numerical variable, we quantify the percentage of poor people for
each country.
endquestion
endblock

# We start a new block, since we do not want to permute completely
# the questions because it would affect the logical flow of the exercise.

# \tthdump is a macro that has to be defined and included in the
# master LaTeX file. It is used to ignore the inclusion of the .eps
# file when you are building an HTML form exam, in which case
# the .jpg file will be included. See the tth manual for the %%tth
# construct.

block (type=multiplechoice rearrange=no)
question

Draw a stem-and-leaf plot for this distribution (do not round the
leaves or split the stems). The number of stems in the diagram that
you get is:
. 10
. 8
. 11
. Less than 8
: None of the above options is correct.
The stem-and-leaf plot is the following:

\bigskip

\tthdump{
\begin{center}
\leavevmode
\epsfxsize=40pt
\epsfbox{stem2.eps}
\end{center}
}
%%tth:\begin{html}<center><img SRC="stem2.jpg" height=300 width=250></center>\end{html}

There are 10 stems in the diagram.
endquestion

question
According to the stem-and-leaf plot, the center of the distribution is:
. 9.8
. Between 8 and 9
. 10.3
. 11.2
: None of the above options is correct.
The center is defined as the case which is larger than 50\% of the cases
and smaller than 50\%. Since there is an odd number of cases (11), in
the plot it corresponds to the case in the 6th place, that is, 9.8.
endquestion
endblock

# Now we allow for the last two questions to permute, but they will be
# always located at the end.

block (type=multiplechoice rearrange=yes)
question

We now want to reduce the number of stems to only 4. To achieve this,
we will have to:
. round and split the stems in 2.
. round.
. split the stems in 2.
. split the stems in 5.
: none of the above options is correct.
Rounding to the tens, we get two stems (0 and 1); therefore, to get 4 stems we
have to split them afterwards in 2. The correct answer is therefore
round and split in 2.
endquestion

question
The form of the distribution is:
. Skewed to the left.
. Quite symmetric.
. Skewed to the right.
. Neither symmetric nor skewed.
: None of the above options is correct.
There are some small values that break the symmetry of the distribution,
therefore the distribution is skewed to the left.
endquestion
endblock


COMMENT: Notice the use of different blocks in this last exercise. This is helpful if the meaning of the exercise would be lost when permuting the questions.
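
The effect of splitting an exercise into blocks can be illustrated with a short Python sketch. It assumes, as the listing above suggests, that blocks always keep their order and that only the questions inside a block with `rearrange=yes` are shuffled; the function name `arrange_exercise` and the data layout are hypothetical and are not taken from manyex.

```python
import random


def arrange_exercise(blocks, rng):
    """Return the questions of an exercise in exam order.

    `blocks` is a list of (rearrange, questions) pairs in file order.
    Blocks keep their position; questions are shuffled only inside
    blocks whose `rearrange` flag is True.
    """
    ordered = []
    for rearrange, questions in blocks:
        questions = list(questions)   # do not modify the caller's list
        if rearrange:
            rng.shuffle(questions)
        ordered.extend(questions)
    return ordered


# The three blocks of 1num_m/exer3, with questions named by topic.
exer3 = [
    (True, ["individual", "variable type"]),
    (False, ["number of stems", "center"]),
    (True, ["reduce stems", "form"]),
]
print(arrange_exercise(exer3, random.Random(1)))
```

In every generated order the two stem-and-leaf questions stay in the middle and in their original sequence, while the first two and the last two questions may swap places.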

Normal_m/exer1:

#
#  Exercise 1 of example database of questions
#  Exercise on normal distribution - multiplechoice
#  6 questions
#

title "Rainfall in Catalunya"
block (type=multiplechoice rearrange=yes)
statement
During the last 50 years, yearly average rainfall has followed an
approximately normal distribution with mean equal to 20 l/m$^2$
and a standard deviation of 3 l/m$^2$.
endstatement
question
What is the percentage of years in which it has rained more than 23 l/m$^2$?
. 16\%
. 32\%
. 22\%
. 24\%
: None of the above options is correct.
According to the 68-95-99.7\% rule, 100 - 68 = 32\% of the frequencies lie outside
the interval given by the mean plus/minus one standard deviation. Looking at only
one side of the distribution we have 32/2 = 16\%.

We can also compute the percentage standardizing $X$:

${{23 - 20} \over {3}} = 1$

Looking at the table of standard normal we get that the proportion of
frequencies lying to the right of $z=1$ is approximately 16\%.
endquestion

question
What is the approximate maximum rainfall of the 2.5\% of years with the least
rain?
. 14 l/m$^2$
. 12 l/m$^2$
. 16 l/m$^2$
. 10 l/m$^2$
: None of the above options is correct.
This can be computed with the 68-95-99.7\% rule: the mean plus/minus 2 standard
deviations leaves 5\% of the frequencies outside the limits, so looking at
only one side we have 2.5\%, and therefore $20 - 2\cdot 3=14$.

We can also look at
the table of the standard normal distribution: we find that 2.5\% of the
frequencies are on the left of $z=-1.96$, so we recover the corresponding
$X$:

$X = 20 - 1.96 \cdot 3 = 14.12$
endquestion

question
In what approximate percentage of years has it rained less than 16 l/m$^2$?
. 9.18 \%
. 12.34 \%
. 14.93 \%
. 6.12 \%
: None of the above options is correct.
We standardize:

${{ 16 - 20}\over{3}} = -1.33$

The percentage of frequencies on the left (smaller values)
of $z=-1.33$ in the standard normal table is: 9.18\%.
endquestion

question
What is the approximate maximum rainfall of the 25\% of years with the least rain?
. 18 l/m$^2$
. 22 l/m$^2$
. 25 l/m$^2$
. 15 l/m$^2$
: None of the above options is correct.
We look for the $z$ in the standard normal table that leaves 25\% of the
frequencies to the left, and we get $z=-0.67$. We recover $X$:

$20 - 0.67 \cdot 3 = 17.99 l/\mbox{m}^2$
endquestion

question
What is the approximate percentage of years in which it has rained between
14 and 23 l/m$^2$?
. 81.5 \%
. 24 \%
. 18.5 \%
. 15 \%
: None of the above options is correct.
According to the previous questions, in 2.5\% of the years it rains less than 14 l/m$^2$
and in 16\% of the years it rains more than 23 l/m$^2$, therefore between these
two limits it rains in

$100 - 2.5 - 16 = 81.5 \mbox{\%}$
of the years.
endquestion
endblock

block (type=multiplechoice rearrange=yes)
question
What should be the approximate shape of the rainfall distribution so that
the computations done in this exercise are valid?
. symmetric
. skewed to the left
. skewed to the right
. bimodal
: None of the above options is correct.
The computations here are valid if the underlying distribution is normal, and
the normal distribution is symmetric.
endquestion
endblock


Normal_m/exer2:

#
#  Exercise 2 of example database of questions
#  Exercise on normal distribution - multiplechoice
#  6 questions
#

title "Duration of the Data Analysis class"
block (type=multiplechoice rearrange=yes)
statement

The duration of the Data Analysis class follows an approximately normal
distribution with mean equal to 120 minutes and standard deviation equal
to 2 minutes.
endstatement
question
What is the approximate percentage of classes that last more than 126
minutes?
. 0.15 \%
. 1.5 \%
. 15 \%
. 99.7 \%
: None of the above options is correct.
This corresponds to the mean plus three times the standard deviation.
According to the 68-95-99.7\% rule, 0.3\% of the frequencies
lie outside these limits on both sides of the distribution; looking
only at one side (the largest values) we get 0.3/2 = 0.15\%.
endquestion

question
What is the approximate maximum duration of the shortest 2.5\% of classes?
. 116 minutes
. 106 minutes
. 126 minutes
. 96 minutes
: None of the above options is correct.
Using the rule, 120 - 2*2 = 116 minutes.
endquestion

question
What approximate percentage of classes last less than 117 minutes?
. 6.68 \%
. 12.34 \%
. 9.12 \%
. 14.93 \%
: None of the above options is correct.
We standardize:

$z = {{117 - 120}\over{2}} = -1.5$

Looking at the table of the standard normal, we find that for $z=-1.5$,
6.68\% of the frequencies are smaller.
endquestion

question
What is the approximate minimum duration of the longest 10\% of classes?
. 122.56 minutes
. 132.56 minutes
. 142.56 minutes
. 92.56 minutes
: None of the above options is correct.
The corresponding standard $z$ is 1.28, so we recover $X$:

$120 + 1.28 \cdot 2 = 122.56$
endquestion

question
What is the approximate percentage of classes that last between
116 and 120 minutes?
. 47.5 \%
. 24 \%
. 36 \%
. 16 \%
: None of the above options is correct.
2.5\% are shorter than 116 minutes, and 50\% are longer than 120 minutes
(since it is the mean and by symmetry also the median), therefore:

$100 - 50 - 2.5 = 47.5$
endquestion

question
What is the approximate duration of a class with standardized duration
equal to 1?
. 122 minutes
. 132 minutes
. 112 minutes
. 142 minutes
: None of the above options is correct.
Since it is one standard deviation above the mean (the standardized
variable has mean 0 and standard deviation 1), we get:

$120 + 2 = 122$
endquestion
endblock


Normal_m/exer3:

#
#  Exercise 3 of example database of questions
#  Exercise on normal distribution - multiplechoice
#  6 questions
#

title "Noise in the street"
block (type=multiplechoice rearrange=yes)
statement
The noise at the crossing of the Balmes and Aragó streets in Barcelona
has been recorded during 50 days and it has been determined that
the level of noise follows an approximately normal distribution with mean
equal to 20 deciBels and a standard deviation equal to 1 deciBel.
endstatement

question
What is the approximate number of days during the period of 50 days recorded
in which the noise level is equal to or larger than 21.28 deciBels?
. 5 days
. 10 days
. 15 days
. 2 days
: None of the above options is correct.
endquestion

question
What is the approximate percentage of days where the noise is less than 19
deciBels?
. 16 \%
. 20 \%
. 10 \%
. 90 \%
: None of the above options is correct.
endquestion

question
What is the approximate percentage of days when the noise is larger than
22 deciBels?
. 2.5\%
. 0.5 \%
. 5 \%
. 95 \%
: None of the above options is correct.
endquestion

question
What is the approximate percentage of days where the noise level lies
between 19.5 and 20.5 deciBels?
. 38.30 \%
. 48.30 \%
. 28.30 \%
. 18.30 \%
: None of the above options is correct.
endquestion

question
On a day with standardized normal level of -1, is there more or less noise
than the median of the distribution?
. Less.
. We do not have enough information to answer the question.
. There is the same noise.
. More.
: None of the above options is correct.
endquestion
endblock

block (type=multiplechoice rearrange=yes)
question
If during some days there are works and the noise level increases 2
deciBels, the mean of the standardized distribution
. does not change.
. increases.
. decreases.
. becomes negative.
: None of the above options is correct.
endquestion
endblock


1num_t/exer1:

block (type=truefalse)
question
. If the distribution is skewed to the right, the mean will always be larger
than the median.
True, the mean is sensitive to outliers and will be moved to the right
by abnormally high values in the distribution (right-skewness).
endquestion
endblock

COMMENT: True/false exercises are composed of questions with just one option, which can be true or false. True options are prefixed with “.” and false options with “:”. It is convenient to have just one question in each true/false exercise, so that they can be permuted more appropriately across different exams, although an exercise may also have more than one question.
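
Reading the answer key off a true/false option is then a matter of looking at its prefix. The Python sketch below shows the convention just described; the function name `truefalse_answer` is hypothetical and the code is an illustration only, not part of manyex.

```python
def truefalse_answer(option_line):
    """Return True for a '.'-prefixed option and False for a ':'-prefixed one."""
    prefix = option_line.lstrip()[:1]
    if prefix == ".":
        return True
    if prefix == ":":
        return False
    raise ValueError("not a true/false option line: %r" % option_line)


print(truefalse_answer(". The normal distribution is always symmetric"))     # True
print(truefalse_answer(": The first quartile is a measure of spread."))       # False
```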

1num_t/exer2:
block (type=truefalse)
question
: The coefficient of variation is a summary of the form of the distribution.
False, the coefficient of variation measures spread, it is simply the
standard deviation (a measure of spread) divided by the mean.
endquestion
endblock


1num_t/exer3:

block (type=truefalse)
auxiliar "bimodal.eps"
webfile "bimodal.jpg"
question
: If the mean and the median have the same value then the distribution is
unimodal.
False, the mean and the median having the same value implies that the
distribution is symmetric, but it can have two modes or any other form,
for instance:

\bigskip

\tthdump{
\begin{center}
\leavevmode
\epsfxsize=110pt
\epsfbox{bimodal.eps}
\end{center}
}
%%tth:\begin{html}<center><img SRC="bimodal.jpg" height=300 width=250></center>\end{html}

endquestion
endblock


COMMENT: We have included a graph in the answer. For the normal LaTeX exam the “epsf” package is used, but of course any other package could be used. For HTML output with “tth”, the macro “tthdump” is used to ignore the PostScript file, and the “jpg” version is included in the HTML code. See the “tth” manual for the inclusion of graphics using “tth”.

1num_t/exer4:

block (type=truefalse)
question
: The mode is a summary of the form of the distribution.
False, it is a summary of the center of the distribution.
endquestion
endblock


1num_t/exer5:

block (type=truefalse)
question
: The first quartile is a measure of spread.
False, it is a measure of position. With the first and third quartiles
we can get a measure of spread, since their difference, the interquartile
range, shows us the spread of the central 50\% of the frequencies around the median.
endquestion
endblock


Normal_s/exer1:

#
#  Exercise 1 of example database of questions
#  Exercise on normal distribution - shortanswer
#  3 questions
#

title "Normal Distribution"
statement
The factory "Mantecados S.R.L." manufactures Christmas sweets in
the traditional way, employing 1,000 workers. The management knows
that each worker produces a mean of 50 kgs of sweets, with a
standard deviation  of 3 kgs, and that this distribution is
approximately normal.
endstatement

question
What percentage of workers produces less than a worker with
a standardized production equal to 0.45?
endquestion

question
The management wants to give a prize to all the workers who
achieve a production of more than 57 kgs a day, and assigns a
budget for the prize of 10,000 euros. How much will each winning
worker get?
endquestion

question
The less productive workers will be moved to the packing department.
If all workers producing less than 43 kgs per day are moved, how many
workers will be moved to the packing department?
endquestion

endblock


COMMENT: Short answer questions have only a statement, no options. Few permutations can be performed on these questions, and in any case cheating is harder. It is also easier to hand out different exams if they are composed only of short answer questions. In any case, this feature is provided so that they can be combined with multiplechoice and true/false questions, where manyex can be really useful.

Normal_s/exer2:

#
#  Exercise 2 of example database of questions
#  Exercise on normal distribution - shortanswer
#  3 questions
#

title "Normal Distribution"
statement
The family business "Drink Team" produces grapes for wine, employing
1,000 workers. The management knows that each worker collects a mean
of 50 kgs of grapes, with a standard deviation of 3 kgs and the distribution
is approximately normal.
endstatement

question
What percentage of workers produce less than a worker with
standardized production of 0.45?
We look at the table of the standard normal distribution, which gives
the proportion of frequencies to the left of each value of $z$, and we
find that for $z=0.45$ the corresponding value is 0.6736. Therefore
67.36\% of workers will produce less than a worker with a standardized
value of 0.45.
endquestion

question
Workers producing more than 57 kgs will be rewarded, and there is a budget
of 1,000 euros for the rewards. How much will a rewarded worker get?
We standardize:

$z= (57-50)/3=2.33.$

Checking the table, we observe that 1\% of the frequencies are to the
right of 2.33, therefore 1\% of the workers will be rewarded. Out of 1,000
workers, 1\% corresponds to 10 workers. Since the budget is 1,000 euros,
each rewarded worker will get 100 euros.
endquestion

question
Less productive workers will be moved to the packing section. If all
workers producing less than 43 kgs are moved to packing, how many
workers will be moved?

We standardize:

$z= (43-50)/3=-2.33.$

In the table we observe that to the left of this value we have 1\% of the
frequencies, therefore out of 1,000 this represents 10 workers.
endquestion
endblock


Normal_t/exer1:

#
#  Exercise 1 of example database of questions
#  Exercise on normal distribution - true/false
#  1 question
#

block (type=truefalse)
question
. The normal distribution is always symmetric.
This is true.
The definition of a normal distribution implies a symmetric distribution.
endquestion
endblock


Normal_t/exer2:

#
#  Exercise 2 of example database of questions
#  Exercise on normal distribution - true/false
#  1 question
#

block (type=truefalse)
question
: In every normal distribution, 2\% of the frequencies are
less than 0.95.
This statement is false because it depends on the mean and the
standard deviation.
endquestion
endblock


Normal_t/exer3:

#
#  Exercise 3 of example database of questions
#  Exercise on normal distribution - true/false
#  1 question
#

block (type=truefalse)
question
. We can know exactly what a normal distribution looks like if we are
given its mean and its standard deviation.
True, a normal distribution is completely characterized by its mean
and its standard deviation.
endquestion
endblock


Normal_t/exer4:

#
#  Exercise 4 of example database of questions
#  Exercise on normal distribution - true/false
#  1 question
#

block (type=truefalse)
question
. In a normal distribution, 95\% of the frequencies fall
in the interval $[\mu-2\sigma,\mu+2\sigma]$.
True, this is a part of the 68-95-99.7\% rule.
endquestion
endblock


Normal_t/exer5:

#
#  Exercise 5 of example database of questions
#  Exercise on normal distribution - true/false
#  1 question
#

block (type=truefalse)
question
: If we subtract its mean from a variable that follows the normal distribution,
the resulting variable will continue to be normal and its
standard deviation will be equal to 0.