Welcome to Introduction to Statistics! This book is written for introductory Statistics courses, both online and in the classroom. You should read this entire preface carefully. Whether you take this course online or in the classroom, you will greatly benefit from what is stated in this preface. That is my guarantee.

Confucius said, "Education without understanding is a futile exercise." Understanding is the key to success in this course, in college, and in life. There is no substitute for understanding. If you memorize something without understanding it, you often cannot use it correctly or effectively. In mathematics and Statistics, to learn and understand new material, you need to use material that you have already learned and understood. So, do not try to get through this course by memorization. Please study the course materials to understand them.

You should understand everything in the textbook. If you do, you will find nothing but fun in this course. Generally, learning math and Statistics is easy and fun with understanding; everything becomes obvious. You should understand the items given in bold letters, the examples (especially the reference examples), the definitions, the formulas and, in fact, everything in this textbook.

The tests and exams are based on the textbook. That is, you can find the answers to test and exam questions, and how to find those answers, in this book. Of course, you need certain things (like sets, inequalities and such) from the prerequisite. Other than that, you need nothing else to answer test and exam questions. A reference book is suggested in the course syllabus. Almost any introductory Statistics book could serve as a reference (check its topics, though). If there is a difference between this book and other books, we go by this textbook. Before taking a test, you should read the sections of the textbook covered by that test several times, to understand as much of the material as possible (if not all of it).
If you read the textbook just to find answers to test questions (without trying to understand its contents), you will get into trouble fast. You cannot get through this course that way. You should read the textbook several times and understand the material in it before taking a test for the first time.

The exams are for evaluation purposes only. The tests, however, are for both evaluation and learning. That is, you use the tests to learn the material, which is one reason why you are allowed to take each test many times before its deadline. You can also ask me about test questions, by e-mail or during office hours. If you think that your answer to a question is correct but it was graded as incorrect, please ask me about it. In fact, if there are still four or five questions that you cannot answer correctly after taking a test several times, ask me about them. I will help you. This means that you have no excuse for not getting excellent, if not perfect, scores on the tests.

Exercise questions are given at the end of each section. They are used to check whether or not you have understood the material in the section. You can find all the answers (or how to find them) in the section. If you truly understand the material that an exercise question is asking about, you can answer the question without any trouble. In fact, you will know that your answer is correct if you have understood that material. This is one reason why the answers to these exercise questions are not given anywhere. If you are not sure about your answer, or have no idea what an exercise question is asking, then you have not yet understood the material it covers. You need to go back to the textbook and study the material until you understand it. However, if you need help finding the answers to numerical/computational questions or any other exercise questions, please contact me; I am happy to help.
You may discuss exercise questions with other students, but you are prohibited from discussing specific questions on any test with other students, or with anyone else, before the test's deadline.

Like any applied mathematics, Statistics consists of a math component (mathematical definitions, notation, formulae, computations, and such) and an application component (practical purposes, practical functions, practical applications, and such). This course and the textbook emphasize both components, unlike many other Statistics courses and textbooks.

This book does not have a glossary because you construct a glossary in your head as you study and understand the material in the course and this textbook. If you need a hard copy, make it yourself as you study. Because of the sequential structure of mathematics (and Statistics), you need only material that you have already studied and understood in order to learn and understand new material. By the way, you can take the glossary in your head to the exams and use it, but you cannot do so with a hard copy.

This book does not have an index either. By studying and understanding the material, you should know where to find things in the book. Of course, you can construct a hard copy of an index if you need one, which could be a good learning process. Again, because of the sequential structure of this book, to study and understand new material you need only what you have already studied and understood.

For those who are taking the course in the classroom, lectures are based on this book. You will see the same definitions, examples and such in the lectures, along with more explanations and examples. You should go over your lecture notes and make sure you understand everything in them before your next lecture. I am available to every student to answer any questions, by e-mail or during my office hours.

Finally, do not memorize formulae; study and understand them. If you understand formulae, you can recall and use them anytime you need or want to. Often, once you understand what you are looking for, you can compute or obtain it without its formula. I have a couple of examples with the formulae of the sample mean and the sample standard deviation at the end of this preface. After studying the formulae, please come back and read the examples. They should make a lot more sense to you then.
Do not memorize; understand. This is generally true even outside mathematics and Statistics. Memorization causes a lot of trouble: what you memorize is easy to forget, cannot be recovered once forgotten, is limited in amount, is easily confused, and so on. However, the biggest problem is being unable to recognize which item to use, or when to use it and when not to. Also, even when you do use it, you often cannot use it correctly. These problems can be avoided by understanding. So, study to understand, and avoid memorization in this course. You cannot get through this course by memorizing this book. Confucius said, "He who memorizes is buying a car without an engine. It does not take him too far."

EXAMPLES WITH THE FORMULAS OF SAMPLE MEAN AND STANDARD DEVIATION:

One of the purposes of the sample mean (sample average) is to indicate the center of the data. If the data consist of two numbers, the center of the data is their midpoint. This midpoint is obtained by adding the two numbers and dividing the sum by two. With three numbers, the center is obtained by adding them up and dividing the sum by three. You can check that this indeed gives you the center of the numbers: add the three numbers and divide the sum by three to get the sample mean, then subtract the sample mean from each number and add the three differences. The result is zero (this is a definition of the center). Yes, the sample mean is the center of the data. So, if you have n numbers in the data, add them up and divide the sum by n. This is the formula for the sample mean. Students who understand the sample mean (as the center of the data) and its formula do not need to memorize the formula. They can come up with the formula from their understanding of it.

The formula for the sample mean is simple. Let us try the formula for the sample variance, one of the more complicated formulas in introductory Statistics. A purpose of the sample variance is to measure the amount of variation in the data. So, let us measure the variation (difference) of each datum, or number, from the center of the data (the sample mean) by subtracting the sample mean from each datum. To get the total variation, add all the differences. However, the sum always turns out to be zero (a
problem caused by the negative differences, which cancel the positive ones). So, square each of those differences (so that none of them is negative) and add them up. Here is another problem. If there are many numbers in the data, the sum becomes large even if they are tightly clustered (little variation). This is not fair. So, let us use the average variation (per datum) to remove this unfairness: divide the sum of squared differences by the sample size n. This is the formula for the sample variance. If you want to measure the amount of variation in the same unit as the original data, take the square root of the sample variance, which gives the formula for the sample standard deviation.

You do not have to be given these formulae. In fact, you do not need the formulae to compute the sample variance and standard deviation. The great thing about this is that you know exactly how the sample standard deviation can take the value zero (or a large value) and what a value of zero (or a large value) means. You do not have to memorize what a zero (or a large value) means. As a result, you do not have to memorize formulae or anything connected with them. You do not have to be given formulae either. You should learn formulae in math and Statistics this way (by understanding them); then there is no need for memorization and no need to be given formulae in mathematics and Statistics courses.

Note: The common formula for the sample variance (or sample standard deviation) uses the divisor n - 1 instead of n. There are two reasons for this. One is the unbiasedness of the estimator. The other is to prevent anyone from using the formula to measure variation from a single datum. One number does not contain any information about variation, so no formula for measuring the variation in data should be used with it. If someone tries to use the formula with one datum, then n = 1 and n - 1 = 0. When the denominator is 0, the result is undefined, as it should be.
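For readers who like to check such reasoning on a computer, the steps above can be sketched in a few lines of Python. This is only an illustration and not part of the course materials: the function names, the `use_n_minus_1` flag, and the three data values are assumptions made up for this sketch.

```python
import math


def sample_mean(data):
    """Center of the data: add the numbers up and divide the sum by n."""
    return sum(data) / len(data)


def deviations(data):
    """Difference of each datum from the center (the sample mean)."""
    mean = sample_mean(data)
    return [x - mean for x in data]


def sample_variance(data, use_n_minus_1=False):
    """Average squared difference from the mean.

    Divides by n by default, or by n - 1 when use_n_minus_1 is True
    (the common textbook divisor mentioned in the note).
    """
    divisor = len(data) - 1 if use_n_minus_1 else len(data)
    if divisor <= 0:
        # With one datum and the divisor n - 1 the denominator is 0:
        # the variation is undefined, so refuse to compute it.
        raise ValueError("not enough data to measure variation")
    return sum(d ** 2 for d in deviations(data)) / divisor


def sample_std(data, use_n_minus_1=False):
    """Square root of the variance: variation in the data's own unit."""
    return math.sqrt(sample_variance(data, use_n_minus_1))


data = [2.0, 4.0, 9.0]          # hypothetical data for illustration
print(sample_mean(data))        # 5.0
print(deviations(data))         # [-3.0, -1.0, 4.0]
print(sum(deviations(data)))    # 0.0, the differences cancel out
print(sample_variance(data))    # 26 / 3, about 8.67 (divisor n)
print(sample_variance(data, use_n_minus_1=True))  # 13.0 (divisor n - 1)
print(sample_std([5.0, 5.0, 5.0]))  # 0.0, identical numbers do not vary
```

The last line shows how the standard deviation can be zero: every difference from the mean is zero, so there is nothing to add up.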
By the way, when the sample size is large enough, dividing the sum by n or by n - 1 does not make any significant numerical difference.

The standard deviation formula is the most complicated formula in this course. However, if you understand it (as described above) when it is introduced to you, you will know the exact reason for everything in the formula. As a result, you understand the formula and can use it. You do not have to memorize it. You can come up with it yourself from your understanding of it anytime you need it. This is true for any formula in mathematics and Statistics.

What is given here might not be clear to you now. However, after learning about the sample variance and standard deviation, please come back and read the last couple of pages again. They should make a lot of sense to you then.

Copyrighted by Michael Greenwich, 01/2017