Activity 6A: First Digit Phenomenon
A Descriptive Statistical Discovery
Instruction Sheet (Rev 2.5)

Introduction: Statisticians divide their subject into two major branches.

Descriptive Statistics: the concepts and methods needed to organize and summarize data, that is, describing data in the form of tables, graphs, or sample statistics.

Inferential Statistics: how to reach decisions about a large body of data by examining only a small part of it, that is, inferring (or estimating) population characteristics from sample data.

At times we tend to downplay descriptive statistics, perhaps thinking that it plays a minor role in making decisions and judgment calls or that it lacks the clout of inferential statistics. Specifically, this spreadsheet activity gives students an opportunity to use simple side-by-side bar graphs to discover the amazing First Digit Phenomenon. More generally, it helps them gain an appreciation for the shapes of sampling distributions.

This activity is based on material covered in Units 5C and 6A.

Estimated Time for Completion: This activity could be broken into two parts:
1) gathering population data, accompanied by the analysis;
2) an Internet search, accompanied by a modification of the left-hand theoretical distribution and the ensuing discussion questions.
The total time spent on the activity varies considerably: approximately 1 to 3 hours for the spreadsheet and analysis, depending on the depth each student goes into and his or her background with spreadsheets. Students save a considerable amount of time because the right-hand table and graph are already set up in the template.

Objectives:
The mathematical objectives of the activity are:
1. To discover, through U.S. Census data, the distributions of both the right-hand and left-hand digits and how dramatically they differ.
2. To learn about the First Digit Phenomenon, its history, and some of the possible applications of Benford's Law.
The spreadsheet objectives of the activity are:
1. To learn to use the Excel 'left', 'right', and 'countif' functions and how to sort data.
2. To construct bar graphs and draw appropriate conclusions from the data.
Materials Required:
Access to the Internet, to gather population data from the U.S. Census Bureau's website and to do a brief keyword search on Benford's Law and/or the first digit phenomenon.
Access to spreadsheet software.
The project template First_Digit_Phenomenon.xls.
A selected state whose city and town populations you will analyze for the distribution of left- and right-hand digits (a states chart is provided).
This activity handout, for recording, analyzing, and discussing the results.

Activity Overview: Most of the project can be self-driven. The following list outlines what you can do to feel satisfied with the project and confident with your spreadsheet work.
Become familiar with the sample completed spreadsheet, which centers on the populations of Idaho's 200 towns and cities (see the last pages of this activity sheet).
Make sure you are comfortable with the 'left', 'right', and 'countif' functions, as well as the Benford's Law formula log(1 + 1/Digit), where log is the base-10 logarithm.
Be clear on the right- and left-hand distributions in terms of why shape and center matter, and understand what both the heights of the curve and the area below the curve actually represent.
Remember to use an absolute cell reference for the left-digit column (column D) when building the 'countif' or theoretical count formulas. Without absolute cell references, the range shifts as you fill the formula down, and you will miss the first several towns or cities.
Depending on your spreadsheet skill level, the template can get you going with a modest amount of effort. Copying a bar graph and then modifying it (right-click and choose "Select Data...") can be extremely valuable in getting you started in the right direction.
It may also help to briefly review the key types of distributions, such as left- and right-skewed, uniform, normal, and bimodal, and where they are often found in life's various contexts.
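The 'left', 'right', and 'countif' functions can be hard to picture at first. The Python sketch below mimics what they do, assuming each population is a positive whole number; it is an illustration, not the template's actual formulas. The Excel formulas quoted in the comments (for example =COUNTIF($D$2:$D$201, 1)) are hypothetical, written as if the data began in row 2 of a 200-row sheet, and the sample populations are made up.

```python
# A minimal sketch of what the spreadsheet functions do, assuming each
# population is a positive whole number. Excel's LEFT and RIGHT return text;
# this sketch converts the digit back to a whole number for convenience.


def right_digit(population: int) -> int:
    """Like the hypothetical =RIGHT(B2, 1): the last (right-hand) digit."""
    return int(str(population)[-1])


def left_digit(population: int) -> int:
    """Like the hypothetical =LEFT(B2, 1): the first (left-hand) digit."""
    return int(str(population)[0])


def countif(values: list[int], target: int) -> int:
    """Like the hypothetical =COUNTIF($D$2:$D$201, 1): count values equal to target.

    The whole list is always scanned, which mirrors an absolute reference:
    the range does not shift from one digit's row to the next.
    """
    return sum(1 for v in values if v == target)


# Hypothetical populations, made up for illustration (not census data).
populations = [132, 1876, 245, 9, 31020, 1507]

rights = [right_digit(p) for p in populations]
lefts = [left_digit(p) for p in populations]

print(rights)             # [2, 6, 5, 9, 0, 7]
print(lefts)              # [1, 1, 2, 9, 3, 1]
print(countif(lefts, 1))  # 3
```

Scanning the full list on every call is the point of the absolute reference: each digit's count is taken over all the towns and cities, not over a window that slides down as the formula is filled.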
Before you begin: Review the section titled "Shapes of Distributions" in Unit 6A of your text.
1) What are the two branches of statistics?
(1)
(2)
2) Theorize what the distributions would approximately look like for both the right-hand and left-hand digits of the populations of towns and cities in your selected state. Sketch them in the rectangles below.
[Sketch areas: Right-hand Digit Distribution (digits 0 to 9) and Left-hand Digit Distribution (digits 1 to 9)]
Procedure: With your spreadsheet template ready to go, carefully follow the steps listed below. You must choose one of three options to complete this activity, and your total score depends on the option chosen.
Option 1: Regular score (7.0 points)
Option 2: 80% of the regular score (5.6 points)
Option 3: 50% of the regular score (3.5 points)

Note: States with a strikethrough line have too few cities and towns reported in the census database to make for a valid analysis.

States chart: Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, Wyoming

Option 1: Regular score (7.0 points)
1. Choose a state from the states chart shown above.
2. Use a web browser to locate the U.S. Census Bureau website. Try the following link: http://www.census.gov/popest/ or contact the state's census customer service and ask for a breakdown of the individual populations of the state's cities and towns. You are not restricted to a particular year; you may be able to obtain the latest figures, from 2010 or later.
3. Open the Excel file for your selected state in your spreadsheet software.
4. Copy all the cells containing the names of the state's towns and cities along with their projected populations (that is, the data from columns A and B, excluding the headers).
5. Paste this information into columns A and B of your spreadsheet template.
6. Carefully use the Sort command to arrange the towns and cities from lowest to highest population. (See your software's help documentation for the proper way to do this.)
7. After pasting the data into your template, you will notice that the right-hand digit of each town or city's population is automatically placed into column C, along with a frequency table to the right and a side-by-side bar graph linked directly to the table.
8. Double-click any of the right digits in column C to discover the built-in function used to copy the right-hand digit of each population. Double-click the Actual Count cells in column G to see how the 'countif' function is used; you can find more information about 'countif' in your software's help documentation. As you investigate the Theoretical Count cells in column H, notice that the total number of towns and cities is multiplied by 1/10, since there are 10 possible right-hand digits (0 through 9). On average, each digit should appear equally often.
9. Using what you learned from the right-hand distribution data, table, and side-by-side bar graph, construct the same information and graph for the left digits in a parallel way. Note carefully that the left digits run from 1 through 9, so the theoretical count requires multiplying the total number of towns and cities by 1/9.
10. Depending on your graphing experience, it may be easier to copy and paste the bar graph and then customize it, rather than creating one from scratch for the left-hand digits.
Refer to your software's help for plotting tips on making an effective side-by-side bar graph of your data based on the left-hand digits of the populations (optional).

Option 2: 80% of the regular score (5.6 points)
1. Use the data provided for the state of Idaho.
2. Do steps 4 through 10 listed above.
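Steps 7 through 9 of Option 1 (which Option 2 also uses) build a table of actual and theoretical counts for each digit position. The Python sketch below reproduces that logic under the same assumptions as the handout: every right-hand digit 0 to 9 is equally likely (total × 1/10), and, before the Benford modification, every left-hand digit 1 to 9 is treated as equally likely (total × 1/9). The helper name digit_table and the sample populations are hypothetical.

```python
from collections import Counter

# A minimal sketch of the frequency tables from steps 7 to 9, assuming each
# population is a positive whole number. The populations are made up for
# illustration; they are not census data.
populations = sorted([132, 1876, 245, 9, 31020, 1507, 118, 2640, 57, 1003])  # step 6: low to high


def digit_table(digits, possible):
    """Hypothetical helper: (digit, actual count, theoretical count) rows.

    The theoretical count assumes every possible digit is equally likely,
    so it is the total multiplied by 1/len(possible).
    """
    counts = Counter(digits)  # plays the role of the 'countif' cells
    total = len(digits)
    return [(d, counts[d], total / len(possible)) for d in possible]


# Right-hand digits 0 to 9: theoretical count is total * 1/10.
right_rows = digit_table([p % 10 for p in populations], range(10))

# Left-hand digits 1 to 9: theoretical count is total * 1/9.
left_rows = digit_table([int(str(p)[0]) for p in populations], range(1, 10))

for d, actual, theoretical in left_rows:
    print(d, actual, round(theoretical, 2))
```

Even in this tiny made-up sample, five of the ten left-hand digits are 1, well above the flat theoretical count of about 1.11, which is the mismatch the Analysis questions ask about.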
Option 3: 50% of the regular score (3.5 points)
If time or facilities make individual Internet research cumbersome for you, much of Benford's Law and/or the first digit phenomenon can be addressed collectively. The story of its discovery by the American astronomer Simon Newcomb and its rediscovery by an American electrical engineer is very interesting. Type up a detailed theoretical report on the subject of this activity.
Photo credits: WWH'.gutsbv.lId.ac.uk, http://en.wikipedia.org/wiki/simon_newcomb
Analysis: Now, with both side-by-side bar graphs completed, answer the questions below.
1. What type of distribution shape would you say fits (models) each of the following counts?
Right Actual Count:
Left Actual Count:
Right Theoretical Count:
Left Theoretical Count:
2. Which of the two types of actual digits best fits its theoretical count distribution? LEFT-HAND RIGHT-HAND (circle one)
3. Referring to the one you did not circle above, describe in a brief sentence what characteristic(s) of its shape kept it from being the best fit.

Discussion: Before addressing the items below, modify the formula for the Left-hand Theoretical Count to use the logarithmic formula from Benford's Law. A brief Internet search on "Benford's Law" and/or "first digit phenomenon" will be very helpful.
1. Why do you suppose the actual count distributions are not as smooth as the theoretical ones?
2. Having changed the left-hand theoretical count to reflect the logarithmic formula from Benford's Law, discuss how the new shape improved its fit to the actual count.
3. Why would the right-hand digit's distribution be approximately uniform (flat)?
4. Why would the left-hand digit's distribution be roughly right-skewed? (See #5 below.)
5. To better understand why there is a built-in bias toward the lower digits in the left-hand distribution, scan the sorted populations of your state from low to high. Discuss why a city, as it grows in population, would keep a left-hand digit of 1 longer than a 2, a 2 longer than a 3, and so on. You may come to a better appreciation of the first digit phenomenon that occurs in certain kinds of data by noting that there is a 100% increase from 1 to 2 but a dramatically tapering percentage thereafter. Fill in the rest of the table and discuss how this might apply to population changes in a town or city.
Digit       | 1   | 2    | 3 | 4 | 5 | 6 | 7 | 8 | 9
% Increase  | n/a | 100% |   |   |   |   |   |   |
6. Based on your Internet research, discuss a practical application of Benford's Law that interested you and why.
Also, include the Benford ratios for digits 1 through 9.
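The Discussion calculations can be checked the same way. The Python sketch below, a minimal illustration rather than the template's formula, computes the Benford ratio log10(1 + 1/d) for each leading digit d, the Benford-based theoretical count (total × ratio, replacing total × 1/9), and the percentage increase needed to move from one leading digit to the next, as in the table for item 5. The function names and the total of 200 (chosen to match the Idaho sample) are assumptions for illustration.

```python
import math

# A minimal sketch of the Discussion calculations, assuming Benford's Law
# in the form log10(1 + 1/d) for a leading digit d from 1 to 9.


def benford_ratio(d: int) -> float:
    """Expected proportion of values whose first digit is d."""
    return math.log10(1 + 1 / d)


def benford_count(total: int, d: int) -> float:
    """Benford-based theoretical count; replaces total * 1/9."""
    return total * benford_ratio(d)


def percent_increase(k: int) -> float:
    """Percentage increase from leading digit k - 1 to k (e.g. 1,000 to 2,000)."""
    previous = k - 1
    return (k - previous) / previous * 100


total = 200  # assumed total, matching the Idaho sample of 200 towns and cities

for d in range(1, 10):
    print(d, f"{benford_ratio(d):.3f}", f"{benford_count(total, d):.1f}")

# The % Increase row of the item 5 table, for digits 2 through 9.
print([round(percent_increase(k), 1) for k in range(2, 10)])
```

Because each ratio is log10((d + 1)/d), the nine ratios add up to log10(10) = 1, so the Benford-based counts always sum to the total number of towns and cities, just as the flat 1/9 counts did.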
Spreadsheet 1: The populations of Idaho's 200 towns and cities