DIGMath: Statistics – Florence S. Gordon

Return to Florence Gordon’s Home Page

Graphical Simulations and Investigations for Statistics and Probability using Excel

Sheldon P. Gordon and Florence S. Gordon, authors

Most of the following DIGMath (Dynamic Investigatory Graphics for Mathematics) programs require the use of macros to operate. In order to use these spreadsheets, Excel must be set to accept macros. To change the security setting on macros: When you open any of the spreadsheets, a new bar appears near the top of the window that says something like: “Security Warning: Active content has been disabled”, depending on the version of Excel you are using. Just click on Options, then click on “Enable the Content”, and finally click OK.

The following are the DIGMath simulations and demonstrations that are currently (August, 2023) ready for use. (Several others are under development.) Please feel free to download and use any or all of these files. To access any individual files or all of the files to download, click on this link: DIGMath for Statistics. The first item shown there is the associated zip file, statistics.7z, containing all 99 of the current individual files. If you have any problems downloading or running any of these Excel files, please contact us at flogo@optonline.net or gordonsp@farmingdale.edu for assistance. If you have any suggestions for improvements or for new topics, please pass them on also.

1. Fair Coin A coin flipping simulation in which the user has the choice of the number of repetitions of flipping a fair coin.
2. Dice Roll A dice rolling simulation in which the user can choose the number of rolls of a pair of fair dice.
3. Coin Flips A coin flipping simulation in which the user has the choice of the number of fair coins being flipped and the number of repetitions.
4. Binomial Distribution This h module lets you investigate the binomial distribution based on n trials with a probability p of success. You can select values of n and p using sliders and the program draws the histogram for the corresponding binomial distribution. It also shows the mean and standard deviation of the distribution.
5. Binomial Simulation The user has the choice of the number of coins, the probability of success, and the number of repetitions.
6. Binomial Probabilities This module lets you investigate the probabilities associated with a binomial distribution. The spreadsheet covers six different cases, covering virtually al the standard kinds of problems with binomial probabilities. It will calculate and display the binomial probabilities of getting (1) exactly x successes in a binomial process with n trials with a probability p of success; (2) between two given numbers of successes; (3) at least a given number of successes; (4) more than a given number of successes; (5) at most a given number of successes; and (6) less than a given number of successes. In each case, you select values of n, p, and x using sliders and the program draws the histogram to show the corresponding binomial probability.
7. Law of Large Numbers The user has the choice of the desired probability of success and the number of repetitions to see the pattern of successes over the long run.
8. Chaos of Small Numbers The user has the choice of the desired probability of success and the number of repetitions (up to 25) to see the actual (simulated) outcomes and the cumulative frequency of success to demonstrate the unpredictable nature of the outcomes in the short run.
9. Outlier Effects on the Mean and Standard Deviation This module lets you investigate the effect that changing an additional point has on the values for the mean and the standard deviation (both graphically and numerically). The spreadsheet starts with a set of either 5, 10, or 20 points and you can change the value for an additional point to see how it affects the calculations and the extent of the changes depending on the number of points.
10. Effect of an Extra Point on Statistics This module lets you investigate the effects of an extra point on the mean, the median, the standard deviation, and the InterQuartile Range (IQR) from two perspectives: depending on how close to or far from the center an extra point lies and depending on the size of the dataset.
11. The Normal Distribution This module lets you investigate the normal distribution with mean μ and standard deviation σ. You enter both parameters using sliders and the program draws the corresponding normal distribution curve. You can watch the effects of changing the parameter values on the resulting curve.
12. Normal Probabilities This spreadsheet lets you investigate the probabilities associated with a normal distribution . You can enter, via sliders, values for the mean μ and standard deviation σ and then an interval of x-values from xL to xR to visualize the probability that x lies between these two values . The program draws the graph of the normal distribution, and highlights graphically the region under the normal curve between xL and xR; it also shows numerically the probability that x lies between these two values.
13. Simulating the Normal Distribution This spreadsheet lets you investigate the normal distribution in terms of a random simulation. You can enter, via sliders, values for the mean μ and standard deviation σ and then an interval of x-values from x-Min to x-Max. You can also select the number of random points you want in this normal distribution and the program will generate those points, plot them along with the graph of the normal distribution, and display graphically, with different colors, those that fall under the designated portion of the normal distribution curve and those that do not. The results are also shown numerically and compared to the theoretical values for the area under the normal curve.
14. Normal Approximation to the Binomial Distribution This program lets you investigate how well a normal distribution approximates the binomial distribution based on the parameters n and p. You enter the values for n and p via sliders and the program draws the histogram for the binomial distribution and the corresponding normal distribution curve using μ = np and σ = √np(1-p) to compare the two distributions.
15. The Poisson Distribution This spreadsheet lets you investigate two different aspects of the Poisson distribution that expresses the probability of a number of events occurring in a fixed period of time if these events occur with a known average rate and independently of the time since the last event. (1) The first looks at the shape of the Poisson distribution depending on its two parameters, the expected number of outcomes of an event in a given time period and the number of occurrences. You use a slider to vary the first parameters and see the effects on the shape of the distribution. (2) The second aspect is based on the idea that the Poisson distribution can be used to approximate the binomial distribution with probability of success p and number of trials n. Using sliders to change n and p, you can observe which combinations make for a good approximation and which do not.
16. Central Limit Theorem Simulation The user can choose any of four underlying populations (normal, uniformly distributed, skewed, and bimodal), the sample size n, and the number of random samples. The simulation randomly generates the samples and plots the means of each sample. From the graphical display and the associated numerical displays, it becomes apparent that (1) the distribution of sample means is centered very close to the mean of the underlying population, that (2) the spread in the sample means is a fraction of the standard deviation of the underlying population (about one-half as large when n = 4, about one-third as large when n = 9, about one-quarter as large when n = 16, etc.), so that students quickly conjecture that the formula for the standard deviation of the distribution of sample means is σ/√n, and that (3) as the sample size increases, the sampling distribution looks more and more like a normal distribution.
17. Visualizing the Sample Mean and the Sample Standard Deviation This program helps you visualize the sample means and the sample standard deviations drawn from each of the four underlying populations used in the Central Limit Theorem Simulation. You can select your choice of the population and select, using sliders, the sample size and the number of desired samples. Th
18. t-Distributions This module lets you investigate the properties of the t-distribution based on various numbers of degrees of freedom from 2 up to 31. You enter the desired number of degrees of freedom and the program draws the corresponding t-distribution curve as well as the curves for d.f. = 1, d.f. = 11, and d.f. = 21 and the limiting normal distribution curve when d.f. = 31.
19. Distribution of Sample Proportions The user chooses the probability of success p, the sample size n, and the number of random samples. The simulation randomly generates the samples, displays the corresponding proportion of successes, and displays the summary statistics. From these displays, the students quickly conjecture that (1) the mean of the distribution of sample proportions is equal to the proportion of successes in the underlying population, that (2) the simulated results with different values of the sample size n agree with the formula for the standard deviation of this sampling distribution, and that (3) the sampling distribution becomes more and more normal in appearance as the sample size increases.
20. Distribution of Sample Medians This simulation is similar to the Central Limit Theorem Simulation with the same choice of underlying populations and the ability to select the sample size n and the number of random samples to be generated, but with sample medians instead of sample means.
21. Distribution of Sample Midranges This simulation is similar to the Central Limit Theorem Simulation, but with sample midranges instead of sample means.
22. Distribution of Sample Modes This simulation is similar to the Central Limit Theorem Simulation, but with sample modes instead of sample means.
23. Distribution of Sample Variances This simulation is similar is similar to the Central Limit Theorem Simulation, but instead of simulating sample means from a population, the program now generates and displays the sample variances.
24. Distribution of Sample Standard Deviations This program is similar to the Central Limit Theorem Simulation, but instead of simulating sample means from a population, the program now generates and displays the sample standard deviations.
25. Sample IQR’s This simulation is similar to the Central Limit Theorem Simulation, but instead of simulating sample means from a population, the program now generates and displays the sample InterQuartile Ranges. The IQR is the difference between the first and third quartiles of a set of data and so represents a measure of the spread in the data.
26. Distribution of Sample Skewness Values This is similar to the Central Limit Theorem Simulation, but instead of simulating sample means from a population, the program now generates and displays the sample skewness values — a measure of how far a set of values deviates from a symmetric distribution.
27. Standard Deviations of Sample Proportions This program simulates the distribution of the standard deviation of sample proportions p. You enter the probability of success π, the sample size and the number of random samples and the program randomly generates, calculates, and displays the standard deviation of those sample proportions.
28. Simulating Confidence Intervals The user has the choice of the same four underlying populations as in the Central Limit Theorem simulation (to see that the population does not affect the results) and the confidence level (90%, 95%, 98%, or 99%). The simulation generates a fixed number of samples from the selected population, calculates and plots the corresponding confidence interval, and summarizes the number and percentage of confidence intervals that actually contain the mean of the underlying population. Students see that the actual (simulated) percentage is typically close to the selected value for the confidence level. They also see that typically the higher the confidence level, the longer the lines are that represent the actual confidence interval. They also see that typically those confidence intervals that do not contain the population mean are near-misses.
29. Constructing Confidence Intervals for Means This spreadsheet assists you in constructing a confidence interval for the mean of a population. You enter the sample data — the sample size n, the sample mean, and the sample standard deviation, and select the level of confidence (90%, 95%, 98%, or 99%) you want. The spreadsheet constructs the corresponding confidence interval and displays it, as well as compares it in size, to the confidence intervals with other levels of confidence.
30. Simulating Confidence Intervals for Proportions The user controls the choice of the population proportion π for the underlying population, the confidence level (90%, 95%, 98%, 99%), and the sample size n. The simulation generates a fixed number of random samples from that population, calculates and plots the corresponding confidence interval, and summarizes the number and percentage of confidence intervals that actually contain the proportion π of the underlying population. Students see that the actual (simulated) percentage is typically close to the selected value for the confidence level. They also see that typically the higher the confidence level, the longer the lines are that represent the actual confidence interval. They also see that, as the sample size increases, the lengths of the sample confidence intervals decrease and as the sample size decreases, the lengths of the confidence intervals increase. They also see that typically those confidence intervals that do not contain the population proportion are near-misses.
31. Constructing Confidence Intervals for Proportions This h spreadsheet assists you in constructing a confidence interval for the proportion of a population. You enter the sample data — the sample size n and the number of “successes” in that sample — and select the level of confidence (90%, 95%, 98%, or 99%) you want. The spreadsheet constructs the corresponding confidence interval and displays it, as well as compares it in size, to the confidence intervals with other levels of confidence.
32. Visualizing Confidence Intervals for the Mean This module lets you investigate the ideas associated with confidence intervals for the mean of a population in a dynamic fashion using sliders. You can control the values for the sample data — the sample size n, the sample mean, and the sample standard deviation. The spreadsheet constructs the corresponding 90%, 95%, 98%, and 99% confidence intervals and displays all of them, both graphically and numerically, so you can compare the lengths of each as you change the input values.
33. Visualizing Confidence Intervals for Proportions This module lets you investigate the ideas associated with confidence intervals for the proportion π of a population in a dynamic fashion using sliders. You can control the values for the sample data — the sample size n and the number of successes x. The spreadsheet constructs the corresponding 90%, 95%, 98%, and 99% confidence intervals and displays all of them, both graphically and numerically, so you can compare the lengths of each as you change the input values.
34. Confidence Intervals for the Difference of Means This module helps in constructing a confidence interval for the difference of mean based on summary sample data from two samples: the size of the samples, the sample means, and the sample standard deviations. You choose the confidence level desired — 90%, 95%, 98%, or 99% and the resulting intervals are shown graphically and numerically.
35. Confidence Intervals for the Difference of Proportions This module helps in constructing a confidence interval for the difference in population proportion based on summary sample data from two samples: the size of the samples and the number of “successes” in each sample. You choose the confidence level desired — 90%, 95%, 98%, or 99% — and the resulting intervals are shown graphically and numerically.
36. Simulating Hypothesis Testing You have the choice of the same four underlying populations (again, to see that the population does not affect the results) and the level of significance (10%, 5%, 2%, 1%). The simulation generates a fixed number of samples from the selected population, plots the mean of each sample with a vertical line at the appropriate location, and summarizes the number and percentage of sample means that fall into this region. The height of each line is equal to the standard deviation of that sample. Students see that the simulated percentage of sample means that fall in the rejection region is typically close to the selected level of significance. They also see that most of the sample means that fall into the rejection region tend to be quite close to the critical values. They also see that the lines representing the samples whose means are close to the population mean tend to be very tightly clustered compared to those that are near the extreme ends, which are sparsely distributed.
37. Visualizing Hypothesis Testing for the Mean This spreadsheet helps you visualize the fundamental ideas related to testing a hypothesis for the mean of a population. You enter the sample data — the sample size n, the sample mean, and the sample standard deviation, the choice of a two-tailed test or a one-tailed test with the tail on the right or the left, and select the significance level α you want highlighted. The spreadsheet displays all four corresponding critical values for a one-tailed test or all eight critical values for a two-tailed test and shows the position of the sample mean for the data. It also displays the associated z– or t-value and the conclusion of whether you can Reject or Fail to Reject the null hypothesis as the selected significance level.
38. Hypothesis Tests for the Mean This spreadsheet assists you in testing a hypothesis for the mean of a population. You enter the null hypothesis for the supposed value of μ and select the test you want — either two tailed or one tail with either tail. You then enter the sample data — the sample size n, the sample mean and the sample standard deviation — and the level of significance α. The spreadsheet displays the corresponding normal or t-distribution, the location of the critical value(s), and the location of the sample mean. It also shows the associated z– or t-value, as well as the corresponding P-value, and the conclusion as to whether you Reject or Fail to Reject the null hypothesis.
39. Hypothesis Tests for the Proportion This spreadsheet assists you in testing a hypothesis for the proportion of a population. You enter the null hypothesis for the supposed value of p and select the test you want — either two tailed or one tail with either tail. You then enter the sample data — the sample size n and the number of successes x in that sample — and the level of significance α. The spreadsheet displays the corresponding normal distribution (when appropriate), the location of the critical value(s), and the location of the sample proportion p. It also shows the associated z-value, as well as the corresponding P-value, and the conclusion as to whether you Reject or Fail to Reject the null hypothesis.
40. Simulating the P-Values for Hypothesis Tests on the Mean This module lets you visualize the P-values associated with sample means when conducting a hypothesis test on the population mean. The program randomly generates 100 samples of size n = 50 from your choice of the usual four underlying populations, calculates the P-value associated with each sample mean, and draws the P-values as a series of vertical lines. You can choose the significance level a for a two-tailed test and the program also shows the vertical line associated with the P-value for that critical value. You can then see the number, and percentage, of the sample means whose P-values are sufficiently unlikely that would indicate that you should Reject the null hypothesis. The spreadsheet also displays the scatterplot of the 100 sample P-values plotted against the values of the 100 sample means to demonstrate that the pattern in the points typically looks like a normal distribution pattern. A horizontal line is also included at the height corresponding to the P-value associated with the critical value for the hypothesis test on the population mean at the selected level a of significance.
41. Simulating the P-Values for Hypothesis Tests on the Proportion This module lets you visualize the P-values associated with sample proportions when conducting a hypothesis test on the population proportion π. The program randomly generates 100 samples of size 100, calculates the P-value associated with each sample proportion, and draws the P-values as a series of vertical lines. You can choose the significance level α for a two-tailed test and the program also shows the vertical line associated with the P-value for that critical value. You can then see the number, and percentage, of the sample proportions whose P-values are sufficiently unlikely that would indicate that you should Reject the null hypothesis. The spreadsheet also displays the scatterplot of the 100 sample P-values plotted against the values of the 100 sample proportions to demonstrate that the pattern in the points typically looks like a normal distribution pattern. A horizontal line is also included at the height corresponding to the P-value associated with the critical value for the hypothesis test on the population proportion at the selected level of significance α.
42. Hypothesis Test on the Difference of Means This spreadsheet assists you in testing a hypothesis for the difference in means of two populations. The null hypothesis is that the two means are equal, and you have to select the alternate hypothesis test you want — either two tailed or one tail with either tail. You then enter the sample data for the two samples — the sample size n, the sample mean and the sample standard deviation — and the level of significance α The spreadsheet displays the corresponding normal or t-distribution for the distribution of differences of sample means, the location of the critical value(s), and the location of the difference in the two sample means. It also shows the associated z– or t-value, as well as the corresponding P-value, and the conclusion as to whether you Reject or Fail to Reject the null hypothesis.
43. Hypothesis Test on the Difference of Proportions This spreadsheet assists you in testing a hypothesis for the difference in proportions of two populations. The null hypothesis is that the two proportions are equal, and you have to select the alternate hypothesis test you want — either two-tailed or one-tailed with either tail. You then enter the sample data for the two samples — the sample size n and the number of successes in each sample — and the level of significance α. The spreadsheet displays the corresponding normal distribution (if appropriate) for the distribution of differences of sample proportions, the location of the critical value(s), and the location of the difference in the two sample proportions. It also shows the associated z-value, as well as the corresponding P-value, and the conclusion as to whether you Reject or Fail to Reject the null hypothesis.
44. The Distribution of the Difference of Means This module lets you investigate the distribution of the difference of means based on summary sample data from two samples drawn from the choice of four underlying populations (to see that the population does not affect the results). You can choose the sample size (from n = 2 to n = 50) from each sample and the number (from 50 to 250) of samples. The simulation generates that number of samples from the selected populations, plots the difference in the sample means of each sample, and displays the mean and standard deviation of the differences in the sample means compared to the theoretical predictions based on the population of differences of means of all possible samples.
45. The Distribution of the Difference of Sample Proportions This module lets you investigate the distribution of the difference of proportions based on summary sample data from two samples drawn from two binomial populations. For each population, you can choose the probability π of success and the sample size (from n = 2 to n = 100) from each sample, as well as the number of random samples (between 50 and 300). The simulation generates that number of samples from the two populations, plots the difference in the sample proportions of each set of samples, and displays the mean and standard deviation of the differences in the sample proportions compared to the theoretical predictions based on the population of differences of proportions of all possible samples.
46. Linear Regression: Fitting a Line to Data This module performs a linear regression analysis on any set of up to 50 (x, y) data points. It shows graphically the points and the associated regression line and also displays the equation of the regression line, the value for the correlation coefficient r, and the value for the Sum of the Squares that measures how close the line comes to all the data points.
47. Sum of the Squares This spreadsheet allows you to investigate dynamically how the sum of the squares measures how well a line fits a set of data. You can enter a set of data and select the number of data points you want to use. You also enter the values you want for the slope and the vertical intercept of a line. The display shows the data points with the line based on those parameters and also shows the value for the sum of the squares associated with that linear fit.
48. Regression Simulation You have the choice of the sample size (n > 2) and the number of samples. The simulation generates the random samples, calculates the equation of and plots the corresponding sample regression line, and also draws the population regression line. The students quickly see that, with small sample sizes, the likelihood of the sample regression line being close to the population regression line may be very small with widely varying slopes for many of the sample lines. As the sample size increases, the sample regression lines become ever more closely matched to the population line.
49. Simulating the Correlation Coefficient This spreadsheet lets you investigate the sample distribution for the correlation coefficient r based on repeated random samples drawn from a bivariate population. You can choose between n = 3 and n = 50 random points for each sample and between 50 and 250 such samples from the underlying population. For each sample, it calculates the correlation coefficient and displays a histogram showing the values of r from the samples. It also calculates and displays the mean of the sample correlation coefficients and compares it to the correlation coefficient for the underlying bivariate population.
50. Simulating the Regression Coefficients This module lets you investigate the sample distributions for the two regression coefficients a and b in the regression equation y = ax + b based on repeated random samples drawn from a bivariate population. You can choose between n = 3 and n = 40 random points for each sample and between 50 and 250 such samples from the underlying population. For each sample, the program calculates the regression equation and displays the various regression lines along with the regression line for the underlying bivariate population. It then draws two histograms — one showing the distribution of the values of the slope a from the random samples and the other showing the distribution of the values of the vertical intercepts b from those samples. It also calculates and displays the mean of each of the sample regression coefficients and compares it to the regression coefficients for the underlying bivariate population.
51. The Effects of an Extra Point on the Regression Line and the Correlation Coefficient This module lets you investigate the effect that changing an additional point has on the regression line (both graphically and numerically) and on the correlation coefficient. You have the choice of 5, 10, or 20 fixed points and can move an additional point using sliders to see how it affects the calculations and the extent of the changes depending on the number of points.
52. Fitting a Median-Median Line to Data This module fits a median-median line to any set of up to 50 (x, y) data points. It shows graphically the points and the associated median-median line and also displays the equation of the median-median line and the value for the Sum of the Squares that measures how close the line comes to all the data points.
53. Simulating the Median-Median Line This spreadsheet lets you investigate the median-median line that fits a set of data via a simulation. You have the choice of the sample size and the number samples that will be drawn from an underlying population. The spreadsheet generates random samples and draws the corresponding median-median lines to help you see the effect of sample size on the consistency of the lines produced.
54. Simulating the Quartile-Quartile Line This module lets you investigate the quartile-quartile line that fits a set of data via a simulation. The quartile-quartile line is based on finding the 1st and 3rd quartiles for both the x and the y values in a set of data and then creating the line that passes through those two points. As such, it is a viable alternative to the usual least-squares regression line that is conceptually and computationally simpler. You have a choice of the sample size and the number of samples that will be drawn from an underlying population. The spreadsheet generates the random samples and draws all the corresponding quartile-quartile lines to help you see the effect of sample size on the consistency of the lines.
55. Comparing Lines that Fit Data This program lets you compare how well the least-squares line, the median-median line (that is built into many calculators), and the quartile-quartile line fit sets of data. You can choose the number of random data points from an underlying population and the spreadsheet generates a random sample and displays the three lines, along with the data points, so that you can compare how well the three lines fit the data and how they compare to one another, particularly as the sample size increases.
56. DataFit: Fitting Linear, Exponential, and Power Functions to Data This spreadsheet is a visual and computational tool for investigating the issue of fitting linear, exponential, and power functions to data and the underlying transformations used to create the nonlinear functions. You can enter a set of data and the spreadsheet displays six graphs: (1) For a linear fit: the regression line superimposed over the original (x, y) data; (2) For an exponential fit: the regression line superimposed over the transformed (x, log y) data values; (3) The exponential function superimposed over the original (x, y) data; (4) For a power fit: the regression line superimposed over the transformed (log x, log y) data values; (5) The power function superimposed over the original (x, y) data; (6) All three functions superimposed over the original (x, y) data. The spreadsheet also shows the values for the correlation coefficients associated with all three linear fits and the values for the sums of the squares associated with each of the three fits to the original data. On a separate page, the spreadsheet also shows the residual plots associated with each of the three function fits.
57. Multivariate Linear Regression This spreadsheet lets you perform multivariate linear regression when the dependent variable Y is a function of two independent variables X1 and X2 or a function of three independent variables X1, X2, and X3. You enter the number of data points (up to a maximum of 50) and then the values for the dependent and independent variables in the appropriate columns. The spreadsheet responds with the equation of the associated linear regression equation, the value for the sum of the squares, and the value for the coefficient of determination, R²; note that this value tells you the percentage of the variation that is explained by the linear function.
58. Comparing Moving Averages This program lets you compare different moving averages to one another as well as to the underlying set of data. Moving averages are used widely in many different fields to identify patterns and trends in data where there are wild fluctuations on a day-to-day basis. This spreadsheet uses some actual data on the spread of the COVID pandemic to illustrate the concept by plotting the underlying data, the 3-day moving average, the 20-day moving average, and a third moving average that you can choose based on either 4-days, 5-days, up through 19-days.
59. Fitting Functions to Moving Averages This spreadsheet lets you investigate how moving averages can be used to identify the pattern in a set of data that has extreme daily fluctuations. The program creates charts for the 5-day moving average, the 10-day moving average, the 15-day moving average, and the 20-day moving average. It also allows you to select the type of function you would like to use — linear, exponential, power, quadratic, or cubic — using a slider and displays the resulting functions that fit both the underlying data and the corresponding moving average. It also displays the resulting equations of the functions and the values of the corresponding correlation coefficients, r, (for the linear, exponential, and power fits) and the coefficient of multiple correlation, R, (for the polynomial fits) to assess how well each function fits the data.
60. The Kolmogorov-Smirnov Test for Normality This program is a test to determine whether or not a set of data might be normally distributed via an hypothesis test when you can safely assume that the underlying population is normally distributed. You enter a relatively small (up to 30 numbers) in ascending numerical order and the spreadsheet performs the test at the 5% significance level while providing a graphical interpretation of the procedure. It displays the associated test-statistic and reports whether you can reject the claim that the data is normally distributed or whether you fail to reject the claim.
61. The Lillifors Test for Normality This program is a test to determine whether or not a set of data might be normally distributed via an hypothesis test. It is a special case of the Kolmogorov-Smirnov Test when you don’t know if you can assume that the underlying population is itself normally distributed. You enter a relatively small (up to 30 numbers) in ascending numerical order and the spreadsheet performs the test at the 5% significance level while providing a graphical interpretation of the procedure. It displays the associated test-statistic and reports whether you can reject the claim that the data is normally distributed or whether you fail to reject the claim.
62. The Birthday Problem This module lets you investigate the Birthday Problem, which asks for the probability that two people in a group of n people will have the same birthday. The program lets you decide on the number of people in a group (between 1 and 100) and displays the graph of the probability of a match versus the number of people in the group.
63. Simulating the Birthday Problem This module lets you investigate the Birthday Problem from the point of view of a random simulation. You have the choice of the number of people in a group (from 2 to 50). The program then generates a random sample of birthdates for each of the people and displays the list, including highlighting those that match. It presents the results, including the theoretical probability of a match and the number of matches.
64. Number of Boys vs. Girls Born in a Family This spreadsheet lets you investigate the number of Boys and Girls born into a family based on the fact that 51.2% of all live births are Boys. You can choose the number of children (1-10) in a family and the number of such families. The program simulates this and displays the outcomes graphically in a histogram and numerically with a table of outcomes and the mean and standard deviation of the results.
65. The Drunkard’s (or Random) Walk Simulation This module lets you investigate the notion of a random walk in the plane. You have the choice of the number of random steps (between 1 and 1000) and the length of each step. The program then generates a random collection of steps and displays the results graphically, as well as some numerical analysis on the actual distance covered from the starting point compared to the theoretical predictions.
66. Buffon Needle Problem This module lets you experiment with a graphical simulation of Buffon’s Needle Problem — the probability that a needle of length L lands on the seam between parallel strips of flooring of width W when it falls to the floor. You can select the number of random “needles” that fall, the width of the strip of flooring, and the length of the needles and see the results graphically and numerically.
67. Buffon Needle Problem on Square Tiles This module lets you experiment with a graphical simulation of the Laplace/Buffon’s Needle Problem — the probability that a needle of length L lands on the seam between square tiles of width W when it falls to the floor. You can select the number of random “needles” that fall, the width of the tile, and the length of the needles and see the results graphically and numerically.
68. Buffon’s Disk Problem This spreadsheet lets you experiment with a graphical simulation of a variation of Buffon’s Needle Problem — the probability that a circular disk of radius r lands on the seam between parallel strips of flooring of width W when it falls to the floor. You can select the number of random “disks” that fall, the width of the strip of flooring, and the length of the needles and see the results graphically and numerically.
69. Buffon’s Disk in a Circle Problem This spreadsheet lets you experiment with a graphical simulation of a variation of Buffon’s Needle Problem — the probability that a circular disk of radius r lands entirely within a larger circle of radius R when it falls to the floor or if it crosses the boundary circle. You can select the number of random “disks” that fall, the radius r of each disk, and the radius R of the circle, and see the results graphically and numerically.
70. Buffon’s Disks on a Square Tile Problem This spreadsheet lets you experiment with a graphical simulation of a variation of Buffon’s Needle Problem — the probability that a circular disk of radius r lands on the seam between square tiles of width W when it falls to the floor. You can select the number of random “disks” that fall, the width of the tile, and the length of the needles and see the results graphically and numerically.
71. Buffon’s Needle Problem in Concentric Circles This spreadsheet lets you experiment with a graphical simulation of a variation of Buffon’s Needle Problem — the probability that a circular disk of radius r lands on the seam between a group of concentric circles when it falls to the floor. You can select the number of random “disks” that fall, the fixed difference in the radii of the concentric circles on the floor, and the length of the needles and see the results graphically and numerically.
72. Buffon Problem for Square Coins on a Square Tile This spreadsheet lets you experiment with a graphical simulation of a variation on Buffon’s Needle Problem — the probability that a square coin of length L lands on the edge of a square tile of width W . You can select the number of random “coins” that fall, the width of the flooring tile, and the number of coins and see the results graphically and numerically.
73. Simulation of Gambler’s Ruin This program lets you investigate the notion of Gambler’s Ruin in which a person starts a game of chance with a fixed amount to bet (the stake) and repeatedly bets a fixed amount on one particular outcome until he or she runs out of money or reaches a certain amount of winnings. You can enter the stake, the amount of the bet, the amount won on each successful bet, the fixed probability of winning on each bet, and the number of bets (up to 1000) that will be displayed using sliders. In most realistic situations, the “house” sets the payoff amount low enough to assure that the gambler will eventually run out of money — that is, lose his or her shirt. This is why it is known as Gambler’s Ruin.
74. Simulation of Dart Throwing This module lets you investigate the process of throwing random darts at a dartboard. You can select between 100 and 1000 random darts and the spreadsheet shows the position that each dart lands and displays the breakdown of how many, and what percentage, of the darts all into each of the rings in the dartboard.
75. Product of the Faces of Two Dice This program lets you investigate the product (instead of the sum) of the faces on a pair of dice. You can choose up to 720 rolls of the two dice and the spreadsheet shows the distribution of the outcomes and a list of the number and percentage of each possible outcome compared to the theoretical predictions.
76. Difference of the Faces of Two Dice This program lets you investigate the differences (instead of the sums) of the faces on a pair of fair dice. You can choose up to 720 rolls of the two dice and the spreadsheet shows the distribution of the outcomes and a list of the number and percentage of each possible outcome compared to the theoretical predictions.
77. Simulation of Rolling 4-Sided Dice This spreadsheet lets you investigate the probability experiment of rolling a pair of fair 4-sided dice, instead of the usual 6-sided dice, so that the possible sums are now 2, 3, …, 8. The simulation shows the results of repeated trials both graphically in a histogram and numerically in terms of the number of times each of the possible outcomes arises.
78. Simulation of Rolling 8-Sided Dice This spreadsheet lets you investigate the probability experiment of rolling a pair of fair 8-sided dice, instead of the usual 6-sided dice, so that the possible sums are now 2, 3, …, 16. The simulation shows the results of repeated trials both graphically in a histogram and numerically in terms of the number of times each of the possible outcomes arises.
79. Yahtzee: Rolling Five Dice Simulation The game of Yahtzee^TM involves rolling a set of five fair dice. This DIGMath module lets you investigate this experiment by simulating repeated random rolls (up to 720 times) of five dice. It displays the results in a histogram as well as a table showing the simulated outcomes and the expected theoretical outcomes.
80. Visualizing Conditional Probability This program helps you visualize the idea of conditional probability where the usual sample space for a probability experiment is reduced by knowing some other detail of the event. The spreadsheet looks at the sum of the faces of two dice and allows you to investigate what happens if either (1) you know the result of the second die or (2) you know the product of the two faces. The spreadsheet displays the corresponding histogram and the associated numerical outcomes.
81. Waiting Time Simulation This module lets you investigate the length of time a car will wait at a red light. You can select the total length of the cycle and the length of time that the light is red. The results — the number of times that the wait is 0, 1, 2, seconds — are shown graphically in a histogram and in a table listing the outcomes. The average wait over all repetitions is also shown.
82. Hypergeometric Probabilities This module helps you visualize the probabilities associated with a hypergeometric distribution, which is based on selecting a sample of size n from a population having N elements of which K are considered successes. The standard probability problem asks what is the probability of having exactly x successes in that sample? You can enter the three values N, K, and n and the desired number of successes x in that sample. The spreadsheet draws the corresponding hypergeometric distribution and highlights the desired outcome, as well as the numerical results.
83. Chi-Square Analysis This program is designed to perform a complete chi-square analysis on many different sized contingency tables, including 2 by 2, 2 by 3, 2 by 4, 3 by 2, 3 by 3, and 3 by 4. On the Set-Up screen, you first enter the number of rows and the number of columns of the desired contingency table and you then have to click on an appropriate tab to go to the corresponding input screen. On that screen, you then enters the observed values into the various positions in the contingency table. The spreadsheet displays the resulting table of expected frequencies, the number of degrees of freedom, and the value of the chi-square statistic based on the values in the table. It also draws the graph of the corresponding chi-square distribution and indicates the location of the critical value, based on the desired level of significance, separating the rejection region from the region where one cannot reject the null hypothesis. In addition, the program indicates the location of the chi-square statistic corresponding to the data in the contingency table. Finally, the program indicates whether or not one can reject the null hypothesis at that significance level.
84. The Chi-Square Distributions This module lets you explore the behavior of various chi-square distributions, which depend on the number n of degrees of freedom. The user can enter any desired number of degrees of freedom from 2 to 31 and the program draws the graphs of that chi-square distribution as well as those with 3, 7, 11, 15, …, 27 degrees of freedom. Because the chi-square distributions become more normal in shape as n increases, the program also draws the standard normal distribution with mean μ = 0 and standard deviation σ = 1 for comparison.
85. Chi-Square Simulation This spreadsheet lets you investigate the variation in the values that can arise for the chi-square statistic via a random simulation based on a two by three contingency table. You define the table by entering the column and row totals and the number of random samples drawn from that population and the spreadsheet generates and graphs the corresponding values of the chi-square statistic.
86. One-Way Analysis of Variance (ANOVA) This spreadsheet is designed to perform a complete one-way analysis of variance (ANOVA) to test whether the means of two or more (up to 5 sample means) may come from populations with the same mean (the null hypothesis) or from populations with different means (the alternate hypothesis). Each sample can contain up to 10 entries. The spreadsheet displays the resulting ANOVA table, including the value of the F-statistic. It also draws the graph of the corresponding F-distribution and indicates the location of the critical value, based on the 5% level of significance, that separates the rejection region from the region where one cannot reject the null hypothesis. In addition, the program indicates the location of the F-statistic corresponding to the data in the table. Finally, the program indicates whether or not one can reject the null hypothesis at the 5% significance level.
87. Simulating the Runs Test This spreadsheet lets you investigate the Runs Test both graphically and numerically. When there is a collection of outcomes consisting of A’s and B’s, the object is to see the number of runs that occur. The module lets you select the total number of A’s and B’s, the number of A’s, and the number of random samples. The distribution of the number of runs is drawn as well as numerical measures for the mean and standard deviation.

All of these files were developed under the support from a variety of grants from the National Science Foundation, to whom the authors are very appreciative.

To access any individual files or all of the files to download, click on this link: DIGMath for Statistics.

Return to Florence Gordon’s Home Page