Statistics
Slide 1: Statistics. From BSCS: Interaction of Experiments and Ideas, 2nd Edition, Prentice Hall, 1970, and Statistics for the Utterly Confused by Lloyd Jaisingh, McGraw-Hill, 2000.
Slide 2: r² is the fraction of the variation in the values of y that is explained by the least-squares regression line of y on x. (Graph: grades plotted against class attendance.) Example: if r² = 0.61 in the graph, this means that about 61% of the variation in grades is accounted for by the linear relationship with attendance. The other 39% could be due to a multitude of other factors.
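As a rough illustration of how r² is obtained, the sketch below computes it from the sums of squares of two lists. The attendance and grade values are hypothetical, invented only for this example, and `r_squared` is a helper name chosen here, not something from the slides.

```python
def r_squared(xs, ys):
    """Return r^2 for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data, invented for illustration only.
attendance = [10, 12, 15, 18, 20]   # days attended
grades = [60, 72, 70, 85, 88]       # grade in percent
r2 = r_squared(attendance, grades)
print(f"r^2 = {r2:.2f}")
```

A value near 1 means the line explains most of the variation in y; a value near 0 means it explains very little.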
Slide 3: What is statistics? A branch of mathematics that provides techniques to analyze whether or not your data are significant (meaningful). Statistical applications are based on probability statements. Nothing is “proved” with statistics. Statistics are reported. Statistics report the probability that similar results would occur if you repeated the experiment.
Slide 4: Statistics deals with numbers. You need to know the nature of the numbers collected. Continuous variables: numbers associated with measuring or weighing; they can take any value in a continuous interval of measurement. Examples: weight of students, height of plants, time to flowering. Discrete variables: numbers that are counted or categorical. Examples: numbers of boys, girls, insects, plants.
Slide 5: Can you figure out which type of numbers (discrete or continuous) each of these is?
1. Numbers of persons preferring Brand X in 5 different towns
2. The weights of high school seniors
3. The lengths of oak leaves
4. The number of seeds germinating
5. 35 tall and 12 dwarf pea plants
Answers: all are discrete except the 2nd and 3rd examples, which are continuous.
Slide 6: Populations and Samples. A population includes all members of a group. Examples: all 9th grade students in America; the number of 9th grade students at SR. No absolute number. A sample is a selection of the population and is used to make inferences about large populations. Example: 6th period Accelerated Biology. Why the need for statistics? Statistics are used to describe samples as estimators of the corresponding population. Many times, finding complete information about a population is costly and time consuming, so we can use samples to represent a population.
Slide 7: Sample Populations: Avoiding Bias. Individuals in a sample population must be a fair representation of the entire population. Therefore sample members must be randomly selected (to avoid bias). Example: if you were looking at strength in students, picking students from the football team would NOT be random.
Slide 8: Is there bias?
A cage has 1000 rats; you pick the first 20 you can catch for your experiment.
A public opinion poll is conducted using the telephone directory.
You are conducting a study of a new diabetes drug; you advertise for participants in the newspaper and on TV.
All are biased. Rats: you grab the slower rats. Telephone: you call only people with a phone (wealth?) and people who are listed (responsible?). Newspaper/TV: you reach only people with a newspaper (wealth/educated?) and a TV (wealth?).
Slide 9: Statistical Computations (the Math). If you are using a sample population: Arithmetic Mean (average). The mean is the sum of all the scores divided by the total number of scores. For a symmetric distribution, about half the members of the population fall on either side of the mean. http://en.wikipedia.org/wiki/Table_of_mathematical_symbols
Slide 10: Looking at the profile of data: Distribution. What is the frequency distribution; where are the data points?
Distribution Chart of Heights of 100 Control Plants
Class (height of plants, cm)    Number of plants in each class
0.0-0.9                         3
1.0-1.9                         10
2.0-2.9                         21
3.0-3.9                         30
4.0-4.9                         20
5.0-5.9                         14
6.0-6.9                         2
Slide 11: Histogram: Frequency Distribution Charts. This is called a “normal” curve or a bell curve. This is an “idealized” curve; it is theoretical, based on an infinite number of values derived from a sample.
Slide 12: Mode and Median. Mode: the most frequently seen value (if no value repeats, there is no mode). Median: the middle number of the ordered data. If you have an odd number of data points, the median is the value in the middle of the set. If you have an even number of data points, the median is the average of the two middle values in the set.
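The three measures of center can be sketched with Python's standard library. The scores are hypothetical, and `modes` is a helper written for this example; it treats a data set in which no value repeats as having no mode and returns an empty list.

```python
from collections import Counter
import statistics

def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return [v for v, c in counts.items() if c == highest]

scores = [2, 3, 3, 5, 7, 8]          # hypothetical scores, even count
print(statistics.mean(scores))       # sum of scores / number of scores
print(statistics.median(scores))     # average of the two middle values
print(modes(scores))                 # most frequent value(s)
print(modes([1, 2, 3]))              # no repeats: empty list
```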
Slide 13: Variance (s²). Mathematically expresses the degree of variation of scores (data) from the mean. A large variance means that the individual scores (data) of the sample deviate a lot from the mean. A small variance indicates the scores (data) deviate little from the mean.
Slide 14: Calculating the variance for a whole population. σ² = Σ(X − µ)² / N, where Σ = sum of; X = score or value; µ = mean; N = total number of scores or values. OR use the VARP function in Excel (the VAR function computes the sample variance instead). http://www.mnstate.edu/wasson/ed602calcvardevs.htm
Slide 15: Calculating the variance for a sample (an unbiased estimate). s² = Σ(xi − x̄)² / (n − 1), where Σ = sum of; xi = score or value; x̄ (often read as “x bar”) is the mean (average value of xi); n − 1 = total number of scores or values minus 1. Note that the sample variance is larger... why? http://www.mnstate.edu/wasson/ed602calcvardevs.htm
Slide 16: Heights in Centimeters of Five Randomly Selected Pea Plants Grown at 8-10 °C
Plant   Height (cm)   Deviation from mean   Square of deviation from mean
        (xi)          (xi − x̄)              (xi − x̄)²
A       10            2                     4
B       7             −1                    1
C       6             −2                    4
D       8             0                     0
E       9             1                     1
Σ xi = 40; Σ (xi − x̄) = 0; Σ (xi − x̄)² = 10
xi = score or value; x̄ = mean; Σ = sum of
Slide 17: Variance helps to characterize the data in a sample by indicating the degree to which individual members within the sample vary from the mean. Finish calculating the variance: Σ xi = 40; Σ (xi − x̄) = 0; Σ (xi − x̄)² = 10. There were five plants, so n = 5 and n − 1 = 4. So s² = 10/4 = 2.5.
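The pea plant calculation can be checked step by step with a short sketch. It uses the five heights from the table and compares the hand calculation with the standard library's `statistics.variance` (divides by n − 1) and `statistics.pvariance` (divides by n); the variable names are chosen for this example.

```python
import statistics

heights = [10, 7, 6, 8, 9]                 # plants A-E, in cm
n = len(heights)
mean = sum(heights) / n                    # 40 / 5 = 8.0
squared_devs = [(x - mean) ** 2 for x in heights]
sum_sq = sum(squared_devs)                 # 10.0

sample_variance = sum_sq / (n - 1)         # 10 / 4 = 2.5
population_variance = sum_sq / n           # 10 / 5 = 2.0

# The standard library gives the same results.
assert sample_variance == statistics.variance(heights)
assert population_variance == statistics.pvariance(heights)
print(sample_variance, population_variance)
```

Dividing by n − 1 instead of n is what makes the sample figure the larger of the two.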
Slide 18: Standard Deviation. An important statistic that is also used to measure variation in samples. s is the symbol for standard deviation. It is calculated by taking the square root of the variance. From the previous example of pea plants: the square root of 2.5 gives s = 1.6. This means the measurements typically vary by about ±1.6 cm from the mean.
Slide 19: What does “s” mean? We can predict the probability of finding a pea plant at a given height: the probability of finding a pea plant above 12.8 cm or below 3.2 cm (more than 3 standard deviations from the mean of 8 cm) is less than 1%. s is a valuable tool because it reveals the predicted limits for finding a particular value.
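The 3.2 cm and 12.8 cm limits can be reproduced as the mean plus or minus three standard deviations. This is a minimal sketch that assumes the heights are roughly normally distributed and that rounds s to one decimal place, as the pea plant example does.

```python
import math

mean = 8.0          # mean height from the table, in cm
variance = 2.5      # sample variance s^2 from the pea plant example
s = math.sqrt(variance)      # about 1.58
s_rounded = round(s, 1)      # the slides round this to 1.6

lower = mean - 3 * s_rounded
upper = mean + 3 * s_rounded
print(f"s = {s:.2f} cm")
print(f"3 s limits: {lower:.1f} cm to {upper:.1f} cm")
```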
Slide 20: Pea Plant Normal Distribution Curve with Std Dev
Slide 21: The Normal Curve and Standard Deviation. http://classes.kumc.edu/sah/resources/sensory_processing/images/bell_curve.gif A normal curve: each vertical line is a unit of standard deviation. 68% of values fall within +1 or −1 standard deviation of the mean. 95% of values fall within +2 and −2 units. Nearly all members (>99%) fall within 3 std dev units.
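These percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); `fraction_within` is a helper name chosen for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))
```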
Slide 22: Standard Error of the Sample Means, AKA Standard Error. The mean, the variance, and the std dev help estimate characteristics of the population from a single sample. If many samples were taken, the means of the samples would also form a normal distribution curve, centered close to the mean of the whole population. The larger the samples, the closer the means would be to the actual value. But taking that many samples would most likely be impossible, so use a simple method to estimate the spread of the sample means from a single sample.
Slide 23: A Simple Method for Estimating Standard Error. Standard error is the calculated standard deviation divided by the square root of the sample size (the number of individuals in the sample). Standard error of the means is used to test the reliability of the data. Example: if there are 10 corn plants with a standard deviation of 0.2, SE(x̄) = 0.2 / √10 = 0.2 / 3.16 = 0.063. So 0.063 represents one std dev of the sample means for samples of 10 plants. If there were 100 plants, the standard error would drop to 0.02. Why? Because when we take larger samples, our sample means get closer to the true mean value of the population. Thus, the distribution of the sample means would be less spread out and would have a lower standard deviation.
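The corn plant example can be sketched in a few lines. Since √10 is about 3.16 and √100 is exactly 10, the standard error falls from roughly 0.063 to 0.02 as the sample grows; `standard_error` is a helper name chosen for this example.

```python
import math

def standard_error(std_dev, n):
    """Standard error of the mean: s divided by the square root of n."""
    return std_dev / math.sqrt(n)

se_10 = standard_error(0.2, 10)     # 10 corn plants
se_100 = standard_error(0.2, 100)   # 100 corn plants
print(f"{se_10:.3f} {se_100:.3f}")
```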
Slide 24: Probability Tests. What do you do when you are comparing two samples to each other and you want to know if there is a significant difference between the two sample populations (for example, the control and the experimental setup)? How do you know there is a difference? How large is a “difference”? How do you know the “difference” was caused by a treatment and not by “normal” sampling variation or sampling bias?
Slide 25: Laws of Probability. The results of one trial of a chance event do not affect the results of later trials of the same event. p = 0.5 (a fair coin always has a 50:50 chance of coming up heads). The chance that two or more independent events will occur together is the product of their chances of occurring separately (one outcome has nothing to do with the other). Example: what’s the likelihood of a 3 coming up on a die? There are six sides to a die, so p = 1/6. Roll two dice and get 3’s on both: p = 1/6 × 1/6 = 1/36, which means there’s a 35/36 chance of rolling something else. Note that the probabilities of all possible outcomes must add up to 1.0.
Slide 26: Laws of Probability (continued). The probability that either of two or more mutually exclusive events will occur is the sum of their probabilities (only one can happen at a time). Example: what is the probability of rolling a total of either 2 or 12 with two dice? Rolling a 2 means a 1 on each of the dice; therefore p = 1/6 × 1/6 = 1/36. Rolling a 12 means a 6 on each of the dice; therefore p = 1/36. So the likelihood of rolling either is 1/36 + 1/36 = 2/36, or 1/18.
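Both rules can be checked with exact fractions: multiply for independent events, add for mutually exclusive ones. This is a small sketch assuming fair six-sided dice; the variable names are chosen for this example.

```python
from fractions import Fraction

p_face = Fraction(1, 6)                 # any one face of a fair die

# Independent events: multiply.
p_double_three = p_face * p_face        # a 3 on both dice
p_anything_else = 1 - p_double_three

# Mutually exclusive events: add.
p_total_2 = p_face * p_face             # 1 and 1
p_total_12 = p_face * p_face            # 6 and 6
p_total_2_or_12 = p_total_2 + p_total_12

print(p_double_three, p_anything_else, p_total_2_or_12)
```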
Slide 27: The Use of the Null Hypothesis. Is the difference between two sample populations due to chance, or is it a real statistical difference? The null hypothesis assumes that there will be no “difference,” no “change,” or no “effect” of the experimental treatment. If treatment A is no better than treatment B, then the null hypothesis is supported. If there is a significant difference between A and B, then the null hypothesis is rejected.
Slide 28: T-test or Chi Square? Testing the validity of the null hypothesis. Use the t-test (also called Student’s t-test) if using continuous variables from normally distributed sample populations (e.g., height). Use the Chi Square (χ²) if using discrete variables (if you are evaluating the differences between experimental data and expected or hypothetical data). Example: genetics experiments, expected distribution of organisms.
Slide 29: T-test. The t-test estimates how likely the observed difference between the means of two small samples would be if the null hypothesis were true. That is, it weighs whether the two samples are representative of a single population (supporting the null hypothesis) OR of two different populations (rejecting the null hypothesis).
Slide 30: STUDENT’S T TEST. The Student’s t test is a statistical method that is used to see if two sets of data differ significantly. The method assumes that the data follow the normal distribution; if the null hypothesis is true, the test statistic follows Student’s t-distribution. This null hypothesis will usually stipulate that there is no significant difference between the means of the two data sets. It is best used to try to determine whether there is a difference between two independent sample groups. For the test to be applicable, the sample groups must be completely independent, and it is best used when the sample size is too small to use more advanced methods. Before using this type of test it is essential to plot the sample data from the two samples and make sure that they have a reasonably normal distribution, or the Student’s t test will not be suitable. It is also desirable to randomly assign samples to the groups, wherever possible. Source: http://www.experiment-resources.com/students-t-test.html
Slide 31: EXAMPLE. You might be trying to determine if there is a significant difference in test scores between two groups of children taught by different methods. The null hypothesis might state that there is no significant difference in the mean test scores of the two sample groups and that any difference is down to chance. The Student’s t test can then be used to try to disprove the null hypothesis. RESTRICTIONS. The two sample groups being tested must have a reasonably normal distribution. If the distribution is skewed, then the Student’s t test is likely to throw up misleading results. The distribution should have only one main peak (mode) near the center of the group. If the data do not adhere to the above parameters, then either a large data sample is needed or, preferably, a more complex form of data analysis should be used. Source: http://www.experiment-resources.com/students-t-test.html
Slide 32: RESULTS. The Student’s t test can let you know if there is a significant difference in the means of the two sample groups, which would lead you to reject the null hypothesis. Like all statistical tests, it cannot prove anything, as there is always a chance of experimental error occurring, but the test can support a hypothesis. It is still useful for measuring small sample populations and determining if there is a significant difference between the groups. By Martyn Shuttleworth (2008). Source: http://www.experiment-resources.com/students-t-test.html
Slide 33: Use the t-test to determine whether sample populations A and B came from the same population or from different populations. t = (x̄1 − x̄2) / s(x̄1 − x̄2), where x̄1 = mean of A and x̄2 = mean of B; sx̄1 = std error of A and sx̄2 = std error of B; the denominator, s(x̄1 − x̄2), is the standard error of the difference between the two means. Example: Sample A mean = 8; Sample B mean = 12; std error of the difference = 1. Then (12 − 8) / 1 = 4 std deviation units.
Slide 34: Comparison of A and B. B’s mean lies outside the normal distribution curve of population A (less than 1% chance of belonging to it). Reject the null hypothesis.
Slide 35: Online calculators: http://www.physics.csbsju.edu/stats/t-test_bulk_form.html (calculates for you online and also draws a box plot); http://www.graphpad.com/quickcalcs/ttest1.cfm
Slide 36: The t statistic to test whether the means are different can be calculated as follows:
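A form of the equation consistent with the worked corn and pea calculation on Slide 40, assuming two independent samples of equal size n with sample variances s₁² and s₂², is sketched below. The symbols are chosen to match that worked example; the exact notation on the original slide is an assumption.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}},
\qquad df = 2n - 2
```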
Slide 37: Amount of O2 Used by Germinating Seeds of Corn and Pea Plants (mL O2/hour at 25 °C)
Reading Number               Corn     Pea
1                            0.20     0.25
2                            0.24     0.23
3                            0.22     0.31
4                            0.21     0.27
5                            0.25     0.23
6                            0.24     0.33
7                            0.23     0.25
8                            0.20     0.28
9                            0.21     0.25
10                           0.20     0.30
Total                        2.20     2.70
Mean                         0.22     0.27
Sum of squared deviations    0.0028   0.0106
Excel file located in AccBio file folder. How to do this all in EXCEL.
Slide 38: http://www2.cedarcrest.edu/academic/bio/hale/biostat/session19links/nachocurve2tail.jpg H0 = null hypothesis. If the calculated t value is larger than the chart value (that is, it falls in the yellow regions), then reject the null hypothesis and accept HA, that there is a difference between the means of the two groups: there is a significant difference between the treatment group and the control group.
Slide 39: T table of values (5% = 0.05). For example, for 10 degrees of freedom (2N − 2), the chart value to compare your t value to is 2.228. If your calculated t value is between +2.228 and −2.228, then fail to reject the null hypothesis: the means are similar. If your t value falls outside +2.228 and −2.228 (larger than 2.228 or smaller than −2.228), reject the null hypothesis (accept the alternative hypothesis): there is a significant difference.
Slide 40: So if the mean of the corn = 0.22 and the mean of the peas = 0.27, the variance (s²) of the corn is 0.000311 and that of the peas is 0.001178, and each sample size is ten, then:
t = (0.22 − 0.27) / √((0.000311 + 0.001178) / 10)
  = −0.05 / √(0.001489 / 10)
  = −0.05 / √0.0001489
  = −4.10 (ignore the negative sign), so t = 4.10
df = 2N − 2 = 2(10) − 2 = 18
Chart value = 2.101
The calculated t value is higher than the chart value, so reject the null hypothesis: there is a difference in the means.
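The same arithmetic can be sketched in code from the summary statistics alone (means, variances, and the common sample size). The function name `t_statistic` is chosen for this example, and the critical value is the standard two-tailed 5% entry for 18 degrees of freedom read from a t table, not something computed here.

```python
import math

def t_statistic(mean1, var1, mean2, var2, n):
    """t for two samples of equal size n, using the sample variances."""
    return (mean1 - mean2) / math.sqrt((var1 + var2) / n)

n = 10
t = t_statistic(0.22, 0.000311, 0.27, 0.001178, n)
df = 2 * n - 2
critical = 2.101   # two-tailed 5% value for 18 df, read from a t table

print(f"t = {abs(t):.2f}, df = {df}")
print("reject H0" if abs(t) > critical else "fail to reject H0")
```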
Slide 41: The “z” test. Used if your samples are larger than 30. Also used for normally distributed populations with continuous variables. Formula (note: “σ” (sigma) is used instead of the letter “s”): z = (mean of pop #1 − mean of pop #2) / √(variance of pop #1 / n1 + variance of pop #2 / n2). Also note that if you only have the standard deviation, you can square that value and substitute it for the variance.
Slide 42: Z table (sample table with 3 probabilities)
α       Zα (one tail)   Zα/2 (two tails)
0.1     1.28            1.64
0.05    1.645           1.96
0.01    2.33            2.576
Z table use: α (alpha) is the significance level (probability): 10%, 5%, and 1%. Zα (z alpha) refers to a test where the rejection region is on one side of the normal distribution curve only (“one tail”), either left or right of the mean; your alternative hypothesis predicts that one mean is greater (or less) than the other. Zα/2 (z alpha over 2) refers to an experiment where your null hypothesis predicts no difference between the means of the control and the experimental groups, and your alternative hypothesis is looking for a significant difference in either direction. Use a one-tail test to show that sample mean A is significantly greater than (or less than) sample mean B. Use a two-tail test to show a significant difference (either greater than or less than) between sample mean A and sample mean B.
Slide 43: Example z-test. You are looking at two methods of learning geometry proofs. One teacher uses method 1, the other teacher uses method 2, and they use a test to compare success. Teacher 1 has 75 students; mean = 85; stdev = 3. Teacher 2 has 60 students; mean = 83; stdev = 2. z = (85 − 83) / √(3²/75 + 2²/60) = 2 / 0.4321 = 4.629
Slide 44: Example continued. Z table (sample table with 3 probabilities)
α       Zα (one tail)   Zα/2 (two tails)
0.1     1.28            1.64
0.05    1.645           1.96
0.01    2.33            2.576
z = 4.6291. H0 = null hypothesis: method 1 is not better than method 2. HA = alternative hypothesis: method 1 is better than method 2. This is a one-tailed z test (since the alternative hypothesis predicts a difference in one direction only). At a probability of 0.05 (5% significance, or 95% confidence), the chart value is Zα = 1.645. To retain the null hypothesis that method 1 is not better, z would have to be less than 1.645. Since 4.629 is greater than 1.645, reject the null hypothesis: method 1 is indeed better.
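The geometry example can be reproduced with a short sketch. The standard deviation of 2 for Teacher 2 is read off the 2²/60 term in the worked formula, `z_statistic` is a helper name chosen for this example, and the cutoff is the 5% one-tail value from the Z table.

```python
import math

def z_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """z for two large independent samples, from means and standard deviations."""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

z = z_statistic(85, 3, 75, 83, 2, 60)
z_alpha_one_tail = 1.645   # 5% one-tail value from the Z table

print(f"z = {z:.3f}")
print("reject H0" if z > z_alpha_one_tail else "fail to reject H0")
```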
Slide 45: Chi square. Used with discrete values (phenotypes, choice chambers, etc.). Not used with continuous variables (like height; use the t-test for samples smaller than 30 and the z-test for samples larger than 30). O = observed values; E = expected values. http://www.jspearson.com/Science/chiSquare.html
Slide 46: χ² = Σ (O − E)² / E (equation image: http://course1.winona.edu/sberg/Equation/chi-squ2.gif)
Slide 47: Interpreting a chi square. Calculate the degrees of freedom: number of events, trials, or phenotypes − 1. Example: 2 phenotypes − 1 = 1. Generally use the column labeled 0.05 (which means there is a 95% chance that any difference between what you expected and what you observed is within accepted random chance). Any calculated value that is larger than the chart value means you reject your null hypothesis: there is a difference between the observed and expected values.
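A minimal sketch of the whole procedure, using the usual statistic χ² = Σ (O − E)² / E and the 35 tall and 12 dwarf pea plants mentioned on Slide 5. The expected 3:1 ratio is an assumption made for this example (it is not stated in the slides), and the critical value is the standard 0.05 entry for 1 degree of freedom from a chi-square table.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [35, 12]                        # tall, dwarf (counts from Slide 5)
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]  # assumed 3:1 ratio, not from the slides

chi2 = chi_square(observed, expected)
df = len(observed) - 1
critical = 3.841   # 0.05 column for 1 degree of freedom, from a chi-square table

print(f"chi-square = {chi2:.4f}, df = {df}")
print("reject H0" if chi2 > critical else "fail to reject H0")
```

Under that assumed ratio the calculated value is far below the chart value, so the observed counts would be treated as consistent with the expectation.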
Slide 48: How to use a chi square chart. http://faculty.southwest.tn.edu/jiwilliams/probab2.gif