Lecture 1: Introduction to Data and Distributions Chapter 1
Important Things: www.stat.purdue.edu/~xuanyaoh/stat350; Syllabus; Textbook; Classroom Locations; Policy: Hw/Lab/Class Participation/Exams; Exam Schedule; SAS
The Required Textbook
What is Statistics? Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. Suppose we want an idea of how well Purdue students have done in MA162 over the past 5 years. What can one do to find out? Find all the MA162 records and check them: too many to look at, and not realistic in most cases. Draw a number of records and try to make a reasonable guess: this is where statistics comes into play!
Population vs. Samples Population: all objects of interest. Sample: a subset of the population.
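The population/sample idea can be sketched in a few lines of Python (used here only for illustration; the course itself uses SAS). This is a minimal sketch under stated assumptions: the "population" is 5000 made-up scores generated at random, not real MA162 records, and the sample size of 50 is arbitrary.

```python
import random
import statistics

random.seed(350)  # fixed seed so the sketch is reproducible

# Hypothetical population: 5000 made-up scores between 0 and 100 (not real records).
population = [min(100, max(0, round(random.gauss(70, 12)))) for _ in range(5000)]

# Sample: a subset of the population, drawn at random without replacement.
sample = random.sample(population, 50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population size = {len(population)}, sample size = {len(sample)}")
print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
```

In practice the population mean is unknown because checking every record is unrealistic; the sample mean serves as the reasonable guess.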
Examples of Data Data: results from making observations on one or more variables. x = score on a STAT 350 midterm exam: univariate data (one variable). (x, y) = height and weight of a STAT 350 student: bivariate data (two variables). Etc.
Types of Variables
Two Terminologies Descriptive Statistics: summarize and describe important features of data. Numerical summary measures: mean, median, standard deviation. Graphical, visual displays: histogram, scatter plot. Inferential Statistics: formal guesses we make about the population by looking at the sample. Common types of inferential statistics are confidence intervals and significance tests.
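The numerical summary measures named above can be computed with Python's standard `statistics` module. This is a small sketch, assuming a hypothetical list of seven scores that is not course data; `stdev` is the sample standard deviation (it divides by n - 1).

```python
import statistics

# Hypothetical exam scores (illustrative only, not course data).
scores = [60, 70, 75, 80, 85, 90, 100]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
sd = statistics.stdev(scores)       # sample standard deviation (divides by n - 1)

print(f"mean = {mean}, median = {median}, sd = {sd:.2f}")
```

The mean and median describe the center of the data, and the standard deviation describes its spread.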
1.2 Descriptive Statistics: Graphical The scores of 30 Undergraduate Students
Graphical displays such as the histogram in the previous slide give us a rough idea of the data as a whole; they are very informative and clear. Numerical measures such as the mean and standard deviation in the previous slide give us a quantitative measure of the center and spread of the data.
Visual Displays of Data Histogram: seen in the previous example, will be discussed in detail. Dot plot: self-reading (sec 1.2). Stem and leaf: see the later example (sec 1.2). Bar graph or chart: self-reading (sec 1.2). Scatterplot: discussed later. We won't discuss them all, but you should cover them in your reading and be comfortable with them all.
Histogram for Discrete Data Based on the previous example: to get the histogram, just count the occurrences of each value of the variable and plot the counts (frequencies) on the vertical axis. Can display as frequencies (counts) or percents.
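The counting step can be sketched as follows. The data are hypothetical (the number of courses taken by 12 students, invented for illustration), and the text bars stand in for a real plot.

```python
from collections import Counter

# Hypothetical discrete data: number of courses taken by 12 students.
courses = [4, 5, 4, 3, 5, 4, 6, 4, 5, 3, 4, 5]

counts = Counter(courses)  # frequency of each distinct value
n = len(courses)

for value in sorted(counts):
    freq = counts[value]
    percent = 100 * freq / n
    bar = "#" * freq  # one mark per observation
    print(f"{value}: {bar:<6} freq={freq} percent={percent:.1f}%")
```

Plotting `freq` or `percent` on the vertical axis gives the same shape; only the scale of the axis changes.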
Continuous Data Subdivide the x-axis into a number of class intervals (or classes), and plot the frequency or relative frequency for each class. Define the boundaries of the classes carefully to prevent observations from falling on boundaries (read Pg. 14). The class size may greatly influence how the histogram looks. Big class interval: a few big rectangles. Small class interval: many small rectangles.
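A minimal sketch of the class-interval idea, assuming hypothetical whole-number measurements and a hypothetical helper `class_frequencies` (not from the text). Boundaries are placed at .5 so that no observation can fall on a boundary, and the same data are counted with a big and a small class width.

```python
def class_frequencies(data, start, width, num_classes):
    """Count observations in each class [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        i = int((x - start) // width)  # index of the class containing x
        if 0 <= i < num_classes:
            counts[i] += 1
    return counts

# Hypothetical measurements (whole numbers), illustrative only.
data = [52, 55, 61, 63, 64, 68, 70, 71, 73, 75, 77, 78, 82, 85, 91, 96]

# Big class interval: a few big rectangles.
wide = class_frequencies(data, start=49.5, width=25, num_classes=2)
# Small class interval: many small rectangles.
narrow = class_frequencies(data, start=49.5, width=5, num_classes=10)

n = len(data)
print("wide  :", wide, [round(c / n, 3) for c in wide])
print("narrow:", narrow, [round(c / n, 3) for c in narrow])
```

Both calls describe the same 16 observations, but the two lists of counts would draw very different pictures.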
Example: Ex. 8 from Text (Pg. 14)
How to choose the class width? Although the class width doesn't change the distribution, it can change your visual understanding of the distribution. A rule of thumb for determining a reasonable number of classes is provided by your text.
Relative Frequency vs. Density
Why Densities?
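A short sketch of the relationship between relative frequency and density, assuming the usual histogram definition density = relative frequency / class width. The three classes and their frequencies are hypothetical and deliberately have unequal widths.

```python
# Hypothetical classes with unequal widths: (lower, upper, frequency).
classes = [(0, 10, 4), (10, 20, 10), (20, 50, 6)]

n = sum(freq for _, _, freq in classes)

rows = []
for lower, upper, freq in classes:
    width = upper - lower
    rel_freq = freq / n
    density = rel_freq / width  # height of the rectangle on the density scale
    rows.append((lower, upper, rel_freq, density))
    print(f"[{lower}, {upper}): rel. freq = {rel_freq:.2f}, density = {density:.3f}")

# On the density scale, the area of each rectangle is its relative frequency,
# so the areas add up to 1.
total_area = sum((upper - lower) * density for lower, upper, _, density in rows)
print(f"total area = {total_area:.2f}")
```

With unequal class widths, plotting relative frequency as the height would exaggerate the wide class; on the density scale, area rather than height represents the proportion of observations.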
Interpreting Histograms
DotPlot (Self-reading in Sec 1.2)
StemPlot (SelfReading)
Hank Aaron Example
1.3 Distributions
Continuous Distributions
Continuous Distribution: Density Function
Examples
About SAS Read the SAS section in the syllabus, and also the instructions on the course website.
When you go home Read over the syllabus carefully before you make a decision! Get the textbook. Read the SAS part of the syllabus and the instructions on the course website. Read/review sections 1.1, 1.2 and 1.3. Start doing Hw#1 and Lab#1, posted on the website. No lab this Wednesday, so go to the regular Wed classroom. To preview, read sections 1.3 (discrete distribution, mass function), 1.4 and 1.5.