The exercises to be used in this example are for a course in Descriptive Statistics. The full directory tree of the exercise database of this example is the following:
example_base
 -- 1num_m
     -- exer1
     -- exer2
     -- exer3
     -- stem1.eps
     -- stem1.jpg
     -- stem2.eps
     -- stem2.jpg
 -- 1num_t
     -- bimodal.eps
     -- bimodal.jpg
     -- exer1
     -- exer2
     -- exer3
     -- exer4
     -- exer5
 -- Normal_m
     -- exer1
     -- exer2
     -- exer3
 -- Normal_s
     -- exer1
     -- exer2
 -- Normal_t
     -- exer1
     -- exer2
     -- exer3
     -- exer4
     -- exer5
COMMENT: Some of the subdirectories contain graphics files. Encapsulated PostScript files (.eps) are for normal LaTeX output and JPEG files (.jpg) are for LaTeX web forms. Manyex will move these files (if they are specified in the exercise definition) to the directory where it creates the exams.
We list next all exercises in the database:
1num_m/exer1:
#
# Exercise 1 of example database of questions
# Exercise on 1 numerical variable - multiplechoice
# 6 questions
#
title "Countries that have won the World Cup (1950-2002)"
block (type=multiplechoice rearrange=yes)
statement
The following table shows the list of countries that have won the
World Cup in the period 1950-2002. We are interested in studying the
percentage of times that a given country has won the World Cup.
\begin{center}
\begin{tabular}{c c}
\hline\hline
Year & Country \\
\hline
1950 & 1 \\
1954 & 2 \\
1958 & 3 \\
1962 & 3 \\
1966 & 4 \\
1970 & 3 \\
1974 & 2 \\
1978 & 5 \\
1982 & 6 \\
1986 & 5 \\
1990 & 2 \\
1994 & 3 \\
1998 & 7 \\
2002 & 3 \\
\hline
\end{tabular}
\end{center}
Key: Uruguay=1, Germany=2, Brazil=3, England=4, Argentina=5, Italy=6, France=7.
Answer the following questions:
endstatement
question
The "Country" variable is:
. a numerical continuous variable.
. a numerical discrete variable.
. an absolute frequency.
. a relative frequency.
; None of the above options is correct.
answer
It is a categorical variable, despite the fact that it is coded
with numbers. Numbers in this case are just labels for the country name.
endanswer
endquestion
question
Organize the data in a frequency table. What values does the variable have?
. 1, 2, 3, 4, 5, 6, 7.
. 1950, 1954, 1958, 1962, 1966, 1970, 1974, 1978, 1982, 1986, 1990, 1994, 1998, 2002.
. 1, 2, 3, 5.
. It does not take values.
: None of the above options is correct.
answer
The variable is "Country" and it can have values that go from 1 to 7,
corresponding to the 7 countries mentioned in the sample.
endanswer
endquestion
question
In this data set, an individual is...
. A country.
. A year.
. A number between 1 and 7.
. A number between 1 and 5.
: None of the above options is correct.
answer
A country, and its frequency is how many times it has won the World Cup.
endanswer
endquestion
question
The statement "86\% of the countries have won the World Cup no more
than 2 times since 1950" is referring to:
. An absolute frequency.
. A relative frequency.
. A cumulative absolute frequency.
. A cumulative relative frequency.
; None of the above options is correct.
answer
It is clearly not an absolute frequency, since a percentage is given.
But furthermore it is not a cumulative frequency either, since
cumulative frequencies only make sense for numerical variables, where
the values of the variable can be ordered and its frequencies
accumulated. Therefore none of the options is correct.
endanswer
endquestion
question
To represent this distribution graphically, one can use:
. A bar diagram.
. A histogram.
. A stem-and-leaf plot.
. An approximate drawing.
: None of the above options is correct.
answer
The most appropriate graphical representation for a categorical
variable is a bar diagram.
endanswer
endquestion
question
To describe this distribution one has to comment:
. A comparison between the percentage of times that countries have won the World Cup.
. The center and the spread.
. The form and extreme values, if any.
. The center, the spread, the form and extreme values, if any.
: None of the above options is correct.
answer
Being a categorical variable, there is only one appropriate numerical
summary: the proportion or percentage of cases in each category.
endanswer
endquestion
endblock
COMMENT: This exercise consists of a single block with six questions of the multiplechoice type. Everything will be permuted here, both the questions and the options within each question, except the options prefixed with “:” or “;”. Usually the first option is the correct one, except when an option prefixed with “;” is present, in which case that option is the correct one. An example is the first question in this exercise, where the correct option is “None of the above options is correct”, which cannot be permuted without losing its meaning.
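The permutation rule described in this comment can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not manyex's actual implementation: the function names `shuffle_options` and `correct_option`, the `(marker, text)` representation and the seed value are all assumptions made for the example.

```python
import random


def shuffle_options(options, rng):
    """Permute the options marked '.', leaving ':' and ';' options in place.

    `options` is a list of (marker, text) pairs in file order.
    Hypothetical helper; it only mirrors the rule described in the text.
    """
    movable = [i for i, (marker, _) in enumerate(options) if marker == "."]
    shuffled = movable[:]
    rng.shuffle(shuffled)
    result = list(options)
    for target, source in zip(movable, shuffled):
        result[target] = options[source]
    return result


def correct_option(options):
    """Return the correct option text: the ';' option if any, else the first.

    Must be called on the options in file order, before shuffling.
    """
    for marker, text in options:
        if marker == ";":
            return text
    return options[0][1]


# Options of the first question of 1num_m/exer1, in file order.
options = [
    (".", "a numerical continuous variable."),
    (".", "a numerical discrete variable."),
    (".", "an absolute frequency."),
    (".", "a relative frequency."),
    (";", "None of the above options is correct."),
]

rng = random.Random(2024)  # fixed seed, so the permutation is reproducible
permuted = shuffle_options(options, rng)

for marker, text in permuted:
    print(marker, text)
print("correct:", correct_option(options))
```

The fixed option keeps its last position in every permutation, which is what preserves the meaning of "None of the above".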
1num_m/exer2:
#
# Exercise 2 of example database of questions
# Exercise on 1 numerical variable - multiplechoice
# 6 questions
#
title "Student pocket cash"
auxiliar "stem1.eps"
webfile "stem1.jpg"
block (type=multiplechoice rearrange=yes)
statement
The following is a stem-and-leaf plot of the variable "pocket cash",
measured in euros with no cents, based on answers of a survey of 15
students of group 3 of Data Analysis 101:
\bigskip
\tthdump{
\begin{center}
\leavevmode
\epsfxsize=55pt
\epsfbox{stem1.eps}
\end{center}
}
%%tth:\begin{html}<p><img SRC="stem1.jpg" height=300 width=250></center>\end{html}
endstatement
question
The "pocket cash" variable is:
. Numerical.
. Continuous categorical.
. Categorical.
. Discrete categorical.
: None of the above options is correct.
answer
It is clearly a numerical variable, the amount of money that students
carry in their pockets.
endanswer
endquestion
question
The fourth observation, in the ordered list of cases from the smallest
to the largest value, is:
. 35
. 4
. 12
. 17
; None of the above options is correct.
answer
The fourth observation is 11, therefore no option is correct.
endanswer
endquestion
question
The center of the distribution is:
. 22 euros
. Between 17 and 22 euros
. 12 euros
. 49 euros
: None of the above options is correct.
answer
Since there are 15 cases (an odd number of cases), there is a case
which lies exactly in the center, the 8th case. We check the ordered
list and we see that this case has a value equal to 22.
endanswer
endquestion
question
The form of the distribution is:
. Skewed to the right.
. Perfectly symmetric.
. Skewed to the left.
. It does not have a form.
: None of the above options is correct.
answer
If there weren't some unusually high values, the distribution would be
more symmetric, therefore we have skewness to the right (to high values).
endanswer
endquestion
question
The distribution
. has an outlier equal to 86.
. does not have outliers.
. has two outliers equal to 4 and 5.
. has one outlier equal to 49.
: None of the above options is correct.
answer
The case with a value equal to 86 is clearly isolated from the rest and
therefore we can consider it an outlier.
endanswer
endquestion
question
The leaf unit is equal to
. 1.
. 10.
. 100.
. euro cents.
: None of the above options is correct.
answer
The values that we observe in the plot are equal to the actual values
in euros, therefore the leaf unit is equal to 1.
endanswer
endquestion
endblock
1num_m/exer3:
#
# Exercise 3 of example database of questions
# Exercise on 1 numerical variable - multiplechoice
# 6 questions
#
title "Poverty in the world - 2005"
auxiliar "stem2.eps"
webfile "stem2.jpg"
block (type=multiplechoice rearrange=yes)
statement
The following data set shows the percentage of people under the poverty
line in different countries for 2005:
\begin{center}
\begin{tabular}{l c}
\hline\hline
Country & Poverty Percentage \\
\hline
Australia & 11.2 \\
Austria & 9.3 \\
Canada & 10.3 \\
Denmark & 4.3 \\
Finland & 6.4 \\
France & 7.0 \\
Germany & 9.8 \\
Greece & 13.5 \\
Italy & 12.9 \\
Portugal & 13.7 \\
United Kingdom & 11.4 \\
\multicolumn{2}{c}{Source: OECD 2005} \\
\hline
\end{tabular}
\end{center}
endstatement
question
In this data set, an individual is
. A country.
. A number between 4 and 14.
. A poverty percentage.
. A year.
: None of the above options is correct.
answer
The cases that we have in our sample correspond to countries for which
we observe a characteristic, the percentage of poor people (according
to the poverty line criterion). Therefore the individuals are countries.
endanswer
endquestion
question
The variable ``Poverty percentage'' is:
. Numerical.
. Categorical continuous.
. Categorical.
. Categorical discrete.
: None of the above options is correct.
answer
It is a numerical variable, we quantify the percentage of poor people
for each country.
endanswer
endquestion
endblock
# We start a new block, since we do not want to permute completely
# the questions because it would affect the logical flow of the exercise.
# \tthdump is a macro that has to be defined and included in the
# master LaTeX file. It is used to ignore the inclusion of the .eps
# file when you are building a html form exam, in which case
# the .jpg file will be included. See the tth manual for the %%tth
# construct.
block (type=multiplechoice rearrange=no)
question
Draw a stem-and-leaf plot for this distribution (do not round the
leaves or split the stems). The number of stems in the diagram that you
get is:
. 10
. 8
. 11
. Less than 8
: None of the above options is correct.
answer
The stem-and-leaf plot is the following
\bigskip
\tthdump{
\begin{center}
\leavevmode
\epsfxsize=40pt
\epsfbox{stem2.eps}
\end{center}
}
%%tth:\begin{html}<p><img SRC="stem2.jpg" height=300 width=250></center>\end{html}
There are 10 stems in the diagram.
endanswer
endquestion
question
According to the stem-and-leaf plot, the center of the distribution is:
. 10.3
. Between 8 and 9
. 9.8
. 11.2
: None of the above options is correct.
answer
The center is defined as the case which is larger than 50\% of the
cases and smaller than the other 50\%. Since there is an odd number of
cases (11), in the plot it corresponds to the case in the 6th place of
the ordered list, that is 10.3.
endanswer
endquestion
endblock
# Now we allow for the last two questions to permute, but they will be
# always located at the end.
block (type=multiplechoice rearrange=yes)
question
We now want to reduce the number of stems to only 4 stems. To achieve
this, we will have to:
. round and split the stems in 2.
. round.
. split the stems in 2.
. split the stems in 5.
: None of the above options is correct.
answer
Rounding to the units, we get two stems (0 and 1, the tens), therefore
to get 4 stems we have to split them afterwards in 2. Therefore the
correct answer is round and split in 2.
endanswer
endquestion
question
The form of the distribution is:
. Skewed to the left.
. Quite symmetric.
. Skewed to the right.
. Neither symmetric nor skewed.
: None of the above options is correct.
answer
There are some small values that break the symmetry of the
distribution, therefore the distribution is skewed to the left.
endanswer
endquestion
endblock
COMMENT: Notice the use of different blocks in this last exercise. This is helpful if the meaning of the exercise would be lost when permuting the questions.
Normal_m/exer1:
#
# Exercise 1 of example database of questions
# Exercise on normal distribution - multiplechoice
# 6 questions
#
title "Rainfall in Catalunya"
block (type=multiplechoice rearrange=yes)
statement
During the last 50 years, yearly average rainfall follows an
approximately normal distribution with mean equal to 20 l/m$^2$ and a
standard deviation of 3 l/m$^2$.
endstatement
question
In what percentage of years has it rained more than 23 l/m$^2$?
. 16\%
. 32\%
. 22\%
. 24\%
: None of the above options is correct.
answer
According to the 68-95-99.7\% rule, 100 - 68 = 32\% of the frequencies
are outside the limits of the mean plus/minus one standard deviation.
Looking at only one side of the distribution we have 32/2 = 16\%.
We can also compute the percentage standardizing $X$:
\[ {{23 - 20} \over {3}} = 1 \]
Looking at the table of the standard normal we get that the proportion
of frequencies lying to the right of $z=1$ is approximately 16\%.
endanswer
endquestion
question
What is the approximate maximum rainfall of the 2.5\% of years with less rain?
. 14 l/m$^2$
. 12 l/m$^2$
. 16 l/m$^2$
. 10 l/m$^2$
: None of the above options is correct.
answer
This can be computed with the rule: since the mean plus/minus 2
standard deviations leaves 5\% of the frequencies outside the limits,
looking at only one side we have 2.5\%, therefore $20 - 2\cdot 3=14$.
We can also look at the table of the standard normal distribution,
where we find that 2.5\% of the frequencies are on the left of
$z=-1.96$, so we recover the corresponding $X$:
\[ X = 20 - 1.96 \cdot 3 = 14.12 \]
endanswer
endquestion
question
In what approximate percentage of years has it rained less than 16 l/m$^2$?
. 9.18 \%
. 12.34 \%
. 14.93 \%
. 6.12 \%
: None of the above options is correct.
answer
We standardize:
\[ {{ 16 - 20}\over{3}} = -1.33 \]
The percentage of frequencies on the left (smaller values) of
$z=-1.33$ in the standard normal table is: 9.18\%.
endanswer
endquestion
question
What is the approximate maximum rainfall of the 25\% of years with less rain?
. 18 l/m$^2$
. 22 l/m$^2$
. 25 l/m$^2$
. 15 l/m$^2$
: None of the above options is correct.
answer
We look for the $z$ in the standard normal table that leaves 25\% of
the frequencies to the left, and we get $z=-0.67$. We recover $X$:
\[ 20 - 0.67 \cdot 3 = 17.99 \mbox{ l/m}^2 \]
endanswer
endquestion
question
In what approximate percentage of years has it rained between 14 and 23 l/m$^2$?
. 82.5 \%
. 24 \%
. 18.5 \%
. 15 \%
: None of the above options is correct.
answer
According to the previous questions, in 2.5\% of the years it rains
less than 14 l/m$^2$ and in 16\% of the years it rains more than
23 l/m$^2$, therefore between these two limits it rains in
\[ 100 - 2.5 - 16 = 82.5 \mbox{\%} \]
of the years.
endanswer
endquestion
endblock
block (type=multiplechoice rearrange=yes)
question
What should be the approximate shape of the rainfall distribution so
that the computations done in this exercise are valid?
. symmetric
. skewed to the left
. skewed to the right
. bimodal
: None of the above options is correct.
answer
The computations here are valid if the underlying distribution is
normal, and the normal distribution is symmetric.
endanswer
endquestion
endblock
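The answers in this exercise come from the 68-95-99.7% rule and from a printed standard normal table, so they are rounded. As a cross-check, the following minimal sketch recomputes them with `statistics.NormalDist` from the Python standard library (this is not part of manyex; the variable names are chosen only for this example). Small differences from the printed answers are expected, because the rule and the two-decimal table are approximations.

```python
from statistics import NormalDist

# Yearly rainfall: mean 20 l/m^2, standard deviation 3 l/m^2.
rain = NormalDist(mu=20, sigma=3)

p_above_23 = 1 - rain.cdf(23)            # question 1: share of years above 23
x_low_2_5 = rain.inv_cdf(0.025)          # question 2: cutoff of the driest 2.5%
p_below_16 = rain.cdf(16)                # question 3: share of years below 16
x_low_25 = rain.inv_cdf(0.25)            # question 4: cutoff of the driest 25%
p_14_to_23 = rain.cdf(23) - rain.cdf(14) # question 5: share between 14 and 23

print(f"P(X > 23)      = {100 * p_above_23:.2f}%")
print(f"2.5% cutoff    = {x_low_2_5:.2f} l/m^2")
print(f"P(X < 16)      = {100 * p_below_16:.2f}%")
print(f"25% cutoff     = {x_low_25:.2f} l/m^2")
print(f"P(14 < X < 23) = {100 * p_14_to_23:.2f}%")
```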
Normal_m/exer2:
#
# Exercise 2 of example database of questions
# Exercise on normal distribution - multiplechoice
# 6 questions
#
title "Duration of the Data Analysis class"
block (type=multiplechoice rearrange=yes)
statement
The duration of the Data Analysis class follows an approximately normal
distribution with mean equal to 120 minutes and standard deviation
equal to 2 minutes.
endstatement
question
What is the approximate percentage of classes that last more than 126 minutes?
. 0.15 \%
. 1.5 \%
. 15 \%
. 99.7 \%
: None of the above options is correct.
answer
This corresponds to the mean plus three times the standard deviation,
and therefore according to the 68-95-99.7\% rule 0.3\% of the
frequencies are left out of these limits on both sides of the
distribution. Looking only at one side (the largest values) we get
0.3/2 = 0.15\%.
endanswer
endquestion
question
What is the approximate maximum duration of the 2.5\% of shortest classes?
. 116 minutes
. 106 minutes
. 126 minutes
. 96 minutes
: None of the above options is correct.
answer
Using the rule, $120 - 2 \cdot 2 = 116$ minutes.
endanswer
endquestion
question
What approximate percentage of classes last less than 117 minutes?
. 6.68 \%
. 12.34 \%
. 9.12 \%
. 14.93 \%
: None of the above options is correct.
answer
We standardize:
\[ z = {{117 - 120}\over{2}} = -1.5 \]
Looking at the table of the standard normal, we find that for $z=-1.5$
6.68\% of the frequencies are smaller.
endanswer
endquestion
question
What is the approximate minimum duration of the 10\% of longest classes?
. 122.56 minutes
. 132.56 minutes
. 142.56 minutes
. 92.56 minutes
: None of the above options is correct.
answer
The corresponding standard $z$ is 1.28, so we recover $X$:
\[ 120 + 1.28 \cdot 2 = 122.56\]
endanswer
endquestion
question
What is the approximate percentage of classes that last between 116 and 120 minutes?
. 47.5 \%
. 24 \%
. 36 \%
. 16 \%
: None of the above options is correct.
answer
2.5\% are shorter than 116 minutes, and 50\% are longer than 120
minutes (since it is the mean and by symmetry also the median),
therefore:
\[ 100 - 50 - 2.5 = 47.5\]
endanswer
endquestion
question
What is the approximate duration of a class with standardized duration equal to 1?
. 122 minutes
. 132 minutes
. 112 minutes
. 142 minutes
: None of the above options is correct.
answer
Since it is one standard deviation above the mean (for the
standardized variable, mean = 0 and standard deviation = 1), we get:
\[ 120 + 2 = 122 \]
endanswer
endquestion
endblock
Normal_m/exer3:
#
# Exercise 3 of example database of questions
# Exercise on normal distribution - multiplechoice
# 6 questions
#
title "Noise in the street"
block (type=multiplechoice rearrange=yes)
statement
The noise at the crossing of the Balmes and Aragó streets in Barcelona
has been recorded during 50 days and it has been determined that the
level of noise follows an approximately normal distribution with mean
equal to 20 decibels and a standard deviation equal to 1 decibel.
endstatement
question
What is the approximate number of days, during the period of 50 days
recorded, on which the noise level is equal to or larger than 21.28 decibels?
. 5 days
. 10 days
. 15 days
. 2 days
: None of the above options is correct.
endquestion
question
What is the approximate percentage of days on which the noise is less than 19 decibels?
. 16 \%
. 20 \%
. 10 \%
. 90 \%
: None of the above options is correct.
endquestion
question
What is the approximate percentage of days on which the noise is larger than 22 decibels?
. 2.5 \%
. 0.5 \%
. 5 \%
. 95 \%
: None of the above options is correct.
endquestion
question
What is the approximate percentage of days on which the noise level
lies between 19.5 and 20.5 decibels?
. 38.30 \%
. 48.30 \%
. 28.30 \%
. 18.30 \%
: None of the above options is correct.
endquestion
question
On a day with a standardized noise level of -1, is there more or less
noise than the median of the distribution?
. Less.
. We do not have enough information to answer the question.
. There is the same noise.
. More.
: None of the above options is correct.
endquestion
endblock
block (type=multiplechoice rearrange=yes)
question
If during some days there is construction work and the noise level
increases by 2 decibels, the mean of the standardized distribution
. does not change.
. increases.
. decreases.
. becomes negative.
: None of the above options is correct.
endquestion
endblock
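This exercise has no `answer` sections, so it may help to see where the first (correct) options come from. The sketch below recomputes the four numerical questions with Python's `statistics.NormalDist`; it is an independent check written for this document, with variable names chosen here, and the option "2.5 %" for the third question is the 68-95-99.7% rule approximation of the exact tail value.

```python
from statistics import NormalDist

# Noise level: mean 20 decibels, standard deviation 1 decibel, 50 recorded days.
noise = NormalDist(mu=20, sigma=1)
days = 50

days_at_least_21_28 = days * (1 - noise.cdf(21.28))   # question 1
p_below_19 = noise.cdf(19)                            # question 2
p_above_22 = 1 - noise.cdf(22)                        # question 3
p_19_5_to_20_5 = noise.cdf(20.5) - noise.cdf(19.5)    # question 4

print(f"days with noise >= 21.28: {days_at_least_21_28:.1f}")
print(f"P(X < 19)          = {100 * p_below_19:.2f}%")
print(f"P(X > 22)          = {100 * p_above_22:.2f}%")
print(f"P(19.5 < X < 20.5) = {100 * p_19_5_to_20_5:.2f}%")
```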
1num_t/exer1:
block (type=truefalse)
question
. If the distribution is skewed to the right, the mean will always be larger than the median.
answer
True, the mean is sensitive to outliers and will be moved to the right
by abnormally high values in the distribution (right-skewness).
endanswer
endquestion
endblock
COMMENT: True/false exercises are composed of questions with just one option that can be true or false. True options are prefixed with “.” and false options with “:”. It is convenient to have just one question in each true/false exercise, so that they can be permuted more appropriately for different exams. They could also have more than one question.
1num_t/exer2:
block (type=truefalse)
question
: The coefficient of variation is a summary of the form of the distribution.
answer
False, the coefficient of variation measures spread, it is simply the
standard deviation (a measure of spread) divided by the mean.
endanswer
endquestion
endblock
1num_t/exer3:
block (type=truefalse)
auxiliar "bimodal.eps"
webfile "bimodal.jpg"
question
: If the mean and the median have the same value then the distribution is unimodal.
answer
False, the mean and the median having the same value implies that the
distribution is symmetric, but it can have two modes or any other form,
for instance:
\bigskip
\tthdump{
\begin{center}
\leavevmode
\epsfxsize=110pt
\epsfbox{bimodal.eps}
\end{center}
}
%%tth:\begin{html}<p><img SRC="bimodal.jpg" height=300 width=250></center>\end{html}
endanswer
endquestion
endblock
COMMENT: We have included a graph in the answer. For the LaTeX normal exam the “epsf” package is used, but of course any other package could be used. For html output with “tth”, the macro “tthdump” is used to ignore the Postscript file and the “jpg” version is included in the html code. See the “tth” manual for the inclusion of graphics using “tth”.
1num_t/exer4:
block (type=truefalse)
question
: The mode is a summary of the form of the distribution.
answer
False, it is a summary of the center of the distribution.
endanswer
endquestion
endblock
1num_t/exer5:
block (type=truefalse)
question
: The first quartile is a measure of spread.
answer
False, it is a measure of position. With the first and third quartiles
we can get a measure of spread, since their difference, the
interquartile range, shows the spread of the 50\% of the frequencies
around the median.
endanswer
endquestion
endblock
Normal_s/exer1:
#
# Exercise 1 of example database of questions
# Exercise on normal distribution - shortanswer
# 3 questions
#
title "Normal Distribution"
block (type=shortanswer rearrange=yes)
statement
The factory "Mantecados S.R.L." manufactures Christmas sweets in the
traditional way, employing 1,000 workers. The management knows that
each worker produces a mean of 50 kgs of sweets, with a standard
deviation of 3 kgs, and that this distribution is approximately normal.
endstatement
question
What percentage of workers produces less than a worker with a
standardized production equal to 0.45?
endquestion
question
The management wants to give a prize to all the workers who achieve a
production of more than 57 kgs a day, and assigns a budget for the
prize of 10,000 euros. How much will each winning worker get?
endquestion
question
The less productive workers will be moved to the packing department.
If all workers producing less than 43 kgs per day are moved, how many
workers will be moved to the packing department?
endquestion
endblock
COMMENT: Short answer questions have only a statement and no options. Not many permutations can be performed on these questions, and in any case cheating is harder. It is also easier to hand out different exams if they are composed only of short answer questions. In any case this feature is provided so that they can be combined with multiplechoice and true/false questions, where manyex can be really useful.
Normal_s/exer2:
#
# Exercise 2 of example database of questions
# Exercise on normal distribution - shortanswer
# 3 questions
#
title "Normal Distribution"
block (type=shortanswer rearrange=no)
statement
The family business "Drink Team" produces grapes for wine, employing
1,000 workers. The management knows that each worker collects a mean of
50 kgs of grapes, with a standard deviation of 3 kgs, and that the
distribution is approximately normal.
endstatement
question
What percentage of workers produce less than a worker with a
standardized production of 0.45?
answer
We look at the table of the standard normal distribution and we find
that for $z=0.45$ the corresponding value (the proportion of
frequencies to the left of this value of $z$) is 0.6736. Therefore
67.36\% of workers will produce less than a worker with a standardized
value of 0.45.
endanswer
endquestion
question
Workers producing more than 57 kgs will be rewarded, and there is a
budget of 1,000 euros for the rewards. How much will a rewarded worker get?
answer
We standardize:
\[ z= (57-50)/3=2.33. \]
Checking the table, we observe that 1\% of the frequencies are on the
right of 2.33, therefore 1\% of the workers will be rewarded. Out of
1,000 workers, 1\% corresponds to 10 workers. Since the budget is
1,000 euros, each rewarded worker will get 100 euros.
endanswer
endquestion
question
Less productive workers will be moved to the packing section. If all
workers producing less than 43 kgs are moved to packing, how many
workers will be moved?
answer
We standardize:
\[ z= (43-50)/3=-2.33. \]
In the table we observe that to the left of this value we have 1\% of
the frequencies, therefore out of 1,000 workers this represents 10 workers.
endanswer
endquestion
endblock
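The three answers above can be reproduced numerically. The following is a minimal sketch using Python's `statistics.NormalDist` (a cross-check written for this document, not something manyex provides; the variable names are assumptions of the example). The expected head counts are rounded to whole workers, as in the answers.

```python
from statistics import NormalDist

workers = 1000
budget = 1000  # euros available for the rewards

# Production per worker: mean 50 kgs, standard deviation 3 kgs.
production = NormalDist(mu=50, sigma=3)

# Question 1: proportion of workers below a standardized production of 0.45.
p_below_z = NormalDist().cdf(0.45)

# Question 2: expected number of workers above 57 kgs, and reward per worker.
rewarded = workers * (1 - production.cdf(57))
reward_each = budget / round(rewarded)

# Question 3: expected number of workers below 43 kgs.
moved = workers * production.cdf(43)

print(f"below z = 0.45: {100 * p_below_z:.2f}%")
print(f"rewarded workers: {round(rewarded)}, reward each: {reward_each:.0f} euros")
print(f"workers moved to packing: {round(moved)}")
```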
Normal_t/exer1:
#
# Exercise 1 of example database of questions
# Exercise on normal distribution - true/false
# 1 question
#
block (type=truefalse)
question
. The normal distribution is always symmetric
answer
This is true. The definition of a normal distribution implies a
symmetric distribution.
endanswer
endquestion
endblock
Normal_t/exer2:
#
# Exercise 2 of example database of questions
# Exercise on normal distribution - true/false
# 1 question
#
block (type=truefalse)
question
: In every normal distribution 2\% of the frequencies are less than 0.95.
answer
This statement is false because it depends on the mean and the
standard deviation.
endanswer
endquestion
endblock
Normal_t/exer3:
#
# Exercise 3 of example database of questions
# Exercise on normal distribution - true/false
# 1 question
#
block (type=truefalse)
question
. We can know exactly how a normal distribution is if we are given its mean and its standard deviation.
answer
True, a normal distribution is completely characterized by its mean
and its standard deviation.
endanswer
endquestion
endblock
Normal_t/exer4:
#
# Exercise 4 of example database of questions
# Exercise on normal distribution - true/false
# 1 question
#
block (type=truefalse)
question
. In a normal distribution 95\% of the frequencies fall in the interval $[\mu-2\sigma,\mu+2\sigma]$.
answer
True, this is a part of the 68-95-99.7\% rule.
endanswer
endquestion
endblock
Normal_t/exer5:
#
# Exercise 5 of example database of questions
# Exercise on normal distribution - true/false
# 1 question
#
block (type=truefalse)
question
: If we subtract its mean from a variable that follows the normal distribution, the resulting variable will still be normal and its standard deviation will be equal to 0.
answer
False, a transformation implying a simple origin change (adding or
subtracting the mean) will not affect the spread (which is measured by
the standard deviation).
endanswer
endquestion
endblock
Once you have your exercises in your database you can start preparing your exams. Changing the seed, and of course adding more exercises, you can create different exams each time. It is also possible to use different subdirectories or different exercise filename prefixes (“exerfin”, “exermid”, ...) to define which exercises to use in different exams.
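The effect of the seed can be illustrated with a small Python sketch. It does not use manyex itself: the function `draw_exam`, the list of paths and the seed values are hypothetical, and the selection logic is only an assumption about what a seeded draw looks like in general. The same seed reproduces the same selection, while another seed will usually give a different one.

```python
import random

# Exercise paths taken from the example database listed above.
exercises = [
    "1num_m/exer1", "1num_m/exer2", "1num_m/exer3",
    "Normal_m/exer1", "Normal_m/exer2", "Normal_m/exer3",
]


def draw_exam(pool, count, seed, prefix="exer"):
    """Pick `count` exercises whose file name starts with `prefix`.

    Hypothetical helper: the seeded generator makes the draw reproducible.
    """
    candidates = [path for path in pool if path.split("/")[-1].startswith(prefix)]
    rng = random.Random(seed)
    return rng.sample(candidates, count)


exam_a = draw_exam(exercises, 3, seed=1)
exam_a_again = draw_exam(exercises, 3, seed=1)  # same seed, same exam
exam_b = draw_exam(exercises, 3, seed=2)        # new seed, usually another exam

print(exam_a)
print(exam_b)
```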