Statistics for Data Science Hypothesis Testing

Published on Jun 18, 2023

Scene 1 (0s)

[Virtual Presenter] Good morning everyone. Today we are going to explore the world of statistics and how it applies to data science. Over the course of this introductory session, we are going to cover a variety of concepts. These include the formulation and testing of hypotheses, type I and type II errors, assumptions, critical points, one- and two-tailed tests, and the p-value approach. We will also look at some key ideas that you need to be aware of, as well as the link between confidence intervals and hypothesis testing. By the end of this session, you will have a strong foundation in the principles of statistics and their application to data science. So, let's get started.

Scene 2 (47s)

[Audio] As quality analysts, we often have to make decisions based on data. In this introduction to statistics for data science, we'll examine what it takes to make decisions using data. To see this in action, take the example of a bulb manufacturing company. As their quality analyst, you are asked to analyze the reliability of the bulbs. Historically, 70% of the bulbs have passed the reliability test. Now, a new manufacturing process (B) is introduced. Is the new process improving the reliability of the bulbs? To answer this, several aspects must be considered, such as hypothesis formulation, the test statistic, type I and II errors, assumptions, critical points, one- and two-tailed tests, the p-value approach and more. We will go through each of these in more detail in this session.

Scene 3 (1m 50s)

[Audio] We will look at how gathering evidence helps us make statistical inferences. A random sample of 100 bulbs is taken, and 73 of them are found to be reliable. Does this provide any evidence that the new manufacturing process is more reliable? To decide, we look at the probability of getting 73 or more reliable bulbs in a sample of 100 if the reliability rate were still the historical 70%. Let us proceed with the examination.

Scene 4 (2m 19s)

[Audio] From the slide, it can be observed that the new manufacturing process (C) produced a sample of 100 bulbs with 81 reliable bulbs. If the reliability rate were still 70%, the probability of obtaining 81 or more reliable bulbs in a sample of 100 bulbs would be nearly 0.01. A result this unlikely suggests that this process did indeed improve reliability.
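The two tail probabilities mentioned so far can be checked directly. The following is a minimal sketch, assuming the number of reliable bulbs in a sample follows a binomial distribution with n = 100 and the historical rate of 0.70; the function name `binomial_upper_tail` is an illustrative choice, not something from the slides.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p) by summing the exact pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: under the status quo, each bulb is reliable with probability 0.70.
p0 = 0.70
n = 100

p_73 = binomial_upper_tail(73, n, p0)  # sample with 73 reliable bulbs
p_81 = binomial_upper_tail(81, n, p0)  # sample with 81 reliable bulbs

print(f"P(X >= 73) = {p_73:.4f}")
print(f"P(X >= 81) = {p_81:.4f}")
```

A sample with 73 reliable bulbs is quite likely even if nothing has changed, while a sample with 81 reliable bulbs is rare under the 70% assumption. That contrast is what the hypothesis test formalizes.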

Scene 5 (2m 41s)

[Audio] Sample data are used in statistics for data science for both estimation and hypothesis testing. Estimation entails taking a random sample and computing a sample statistic to obtain point and interval estimates of a population parameter. Hypothesis testing uses sample data to assess whether a hypothesis about the population parameter is plausible. Estimation alone is insufficient to reach a conclusion in these types of cases.

Scene 6 (3m 11s)

[Audio] What is hypothesis testing? Hypothesis testing is a statistical process used to make an inference about a population parameter. Using a sample, hypothesis testing relies on a test statistic, assumptions, critical points, one- and two-tailed tests, the p-value approach, and more to draw conclusions about the population parameter. Hypothesis testing is used to SET a value for the parameter and then TEST whether that value is tenable given the evidence gathered from the sample.

Scene 7 (3m 44s)

[Audio] Hypothesis testing is a useful method for making decisions in a variety of contexts. It can be employed to check the truth of a statement or to determine the effect of a new product or service. An example of the former is to establish whether a manufacturer's 1L soft drink bottles contain an average of 0.99L or more, while an example of the latter is to determine whether a new car system raises the mean mpg. It can equally be used to measure the effectiveness of a business strategy, such as whether a fresh online advertisement has led to higher conversion rates for an e-commerce website. These are just some of the many potential applications of hypothesis testing.

Scene 8 (4m 29s)

[Audio] Hypothesis testing is a crucial part of the data science process. Two competing statements about the population parameter are established: the null hypothesis and the alternative hypothesis. The null hypothesis represents the existing situation or the expected value, whereas the alternative hypothesis is the competing claim or the target for improvement. For instance, the null hypothesis could be that the new bulb manufacturing process does not increase reliability, whilst the alternative hypothesis would be that it does. In this lesson, we will learn how to set up and apply hypotheses.

Scene 9 (5m 12s)

[Audio] The topic of discussion is null and alternative formulation, which is a critical part of hypothesis testing. The illustration chosen is checking that shipments meet the required standards. The specified value of the population mean, μ, is 8.5. The null hypothesis, H0, states that μ is equal to 8.5, while the alternative hypothesis, Ha, posits that μ is not equal to 8.5. This test is done to validate that the shipments comply with the set specifications.

Scene 10 (5m 50s)

[Audio] We are examining the population proportion p to see if the proportion of men on business travel abroad who bring a significant other with them is more than 20%. To determine this, we have to formulate the null and alternative hypotheses, which are H0: p = 0.2 and Ha: p > 0.2. This will help us identify whether the proportion is in fact higher than what is assumed.
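To see how hypotheses such as H0: p = 0.2 against Ha: p > 0.2 could be tested, here is a minimal sketch of a one-sample z-test for a proportion using the normal approximation. The sample counts (52 out of 200) are hypothetical and chosen only for illustration, and the function name `proportion_z_test_upper` is likewise an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test_upper(successes, n, p0):
    """Upper-tailed z-test for H0: p = p0 vs Ha: p > p0 (normal approximation)."""
    p_hat = successes / n                 # sample proportion
    std_error = sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
    z = (p_hat - p0) / std_error          # test statistic
    p_value = 1 - NormalDist().cdf(z)     # area in the upper tail
    return z, p_value


# Hypothetical data: 52 of 200 sampled travellers brought a significant other.
z, p_value = proportion_z_test_upper(successes=52, n=200, p0=0.2)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The normal approximation is reasonable here because both n * p0 and n * (1 - p0) are well above 10; for small samples an exact binomial calculation would be the safer choice.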

Scene 11 (6m 21s)

[Audio] Hypotheses are important components of research. They are statements or questions that research seeks to answer. Hypotheses can be divided into two types: the alternative hypothesis and the null hypothesis. The alternative hypothesis is the research question being asked, while the null hypothesis is the negation of this research question. Signs that include equality, such as =, ≥ or ≤, are typically used in the null hypothesis, which represents the status quo, while inequality signs such as ≠, > or < are used in the alternative hypothesis to answer questions beyond our current knowledge.

Scene 12 (7m 2s)

[Audio] Hypothesis testing is a method used to decide whether a claim concerning a population is supported or rejected based on sample data. It involves two conflicting claims and a test statistic to decide which one is supported by the data. Understanding these concepts is essential, as it will help us organize our data and answer questions such as 'is there a relationship between two variables?' or 'is a difference between treatments statistically significant?'. We will look over type I and II errors, assumptions, critical points, one- and two-tailed tests, the p-value approach, and more.

Scene 13 (7m 41s)

[Audio] Having a thorough understanding of the concept of a null hypothesis is essential. A null hypothesis typically states that there is no difference between two or more groups or variables. To decide whether the null hypothesis is tenable, an experiment must be done that uses a random sample and weighs the evidence against the null hypothesis. If the evidence is strong enough, the null hypothesis is rejected in favour of the alternative hypothesis. If, however, the evidence is not strong enough, then we fail to reject the null hypothesis. In this introduction to statistics in data science, topics like hypothesis formulation, the test statistic, type I and II errors, assumptions, critical points, one- and two-tailed tests, the p-value approach, and more will be discussed in order to understand the concept of the null hypothesis and how to assess it.

Scene 14 (8m 42s)

[Audio] Statistical tests are a significant component of hypothesis testing. The test statistic is computed from the sample data and compared against a predetermined decision rule. The test statistic is a random variable which follows a known distribution such as the Normal, t, F, or Chi-square distribution; tests are commonly referred to by the name of their test statistic. Since hypothesis testing depends on sampling distributions, the decisions taken are based on probability. Understanding test statistics and their values is essential for understanding hypothesis testing.

Scene 15 (9m 25s)

[Audio] Our aim now is to explore how Type I and Type II Errors can arise when making decisions in hypothesis testing. A Type I Error, also known as a false positive, is when you incorrectly reject a true null hypothesis. Conversely, a Type II Error, or false negative, is when you incorrectly fail to reject a false null hypothesis. Both errors must be taken into account when working in the field of data science.

Scene 16 (9m 56s)

[Audio] We will be looking at Type I and Type II Errors. A Type I Error is a false positive: the investigator wrongly rejects a null hypothesis that is actually true. A Type II Error is a false negative: the investigator fails to reject a null hypothesis that is false. The probabilities of a Type I and a Type II Error are denoted by α and β, respectively. The probabilities of the two correct decisions follow from these: a true null hypothesis is correctly retained with probability 1 - α, and a false null hypothesis is correctly rejected with probability 1 - β, which is called the power of the test. The level of significance, α, is used to decide whether or not to reject the given null hypothesis.
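The relationship between α, β and power can be made concrete with a small calculation. This is a minimal sketch for an upper-tailed z-test with known population standard deviation; the values of `mu0`, `mu1`, `sigma` and `n` are hypothetical, and the function name `upper_tail_z_test_power` is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test_power(mu0, mu1, sigma, n, alpha):
    """Probability of rejecting H0: mu = mu0 (vs Ha: mu > mu0) when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when z > z_crit
    shift = (mu1 - mu0) * sqrt(n) / sigma    # mean of the z statistic when mu = mu1
    return 1 - std_normal.cdf(z_crit - shift)


alpha = 0.05  # probability of a Type I error, fixed in advance

# Hypothetical scenario: H0 says mu = 5.0, but the true mean is 5.5.
power = upper_tail_z_test_power(mu0=5.0, mu1=5.5, sigma=1.5, n=40, alpha=alpha)
beta = 1 - power  # probability of a Type II error in this scenario

print(f"alpha = {alpha:.2f}, beta = {beta:.3f}, power = {power:.3f}")
```

Note that β is not a single number: it depends on how far the true mean is from the hypothesized one, and it shrinks as the sample size grows.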

Scene 17 (10m 40s)

[Audio] Let us make Type I and Type II errors concrete with a medical example, where the null hypothesis is that the patient does not have cancer. A Type I error (false positive) occurs when this null hypothesis is rejected although the patient does not have cancer: the doctor's report states they have cancer when they do not. A Type II error (false negative) occurs when the null hypothesis is not rejected although the patient does have cancer: the report states they do not have cancer when they do.

Scene 18 (11m 18s)

[Audio] Hypothesis testing can be a challenging process, so I suggest following a consistent approach. Using a template for hypothesis testing is a straightforward way to do this. The template takes you through a logical order of steps to structure and test your hypothesis. It includes establishing the null and alternative hypotheses, choosing a test statistic, fixing a significance level, choosing the critical points, and then determining the p-value to decide whether the null hypothesis should be rejected. By using this template, you can organize your hypothesis testing and obtain valid results.

Scene 19 (12m 0s)

[Audio] To begin, hypothesis formulation, the test statistic, type I and II errors, assumptions, critical points, one- and two-tailed tests, the p-value approach, and more are the concepts to be covered in this introduction to statistics for data science. This will enable us to determine the question we are trying to answer, set up the relevant hypotheses, prepare the data, choose the right test, check the assumptions, and ultimately carry out the test in order to reach a conclusion.

Scene 20 (12m 33s)

[Audio] Hypothesis testing is an important element of data science and statistics. It helps us assess whether a given set of data provides enough evidence against a claim about a population. In this part of our presentation, we will be looking into the fundamentals of hypothesis testing, such as how to formulate a hypothesis, decide on a test statistic, understand the assumptions and key elements, and use the p-value approach to interpret the results. Now let us begin!

Scene 21 (13m 5s)

[Audio] In this slide, we will go over two key concepts pertaining to hypothesis testing: the level of significance and the p-value. The level of significance, α, is a pre-fixed probability of rejecting the null hypothesis when it is true; it is set before the hypothesis test is conducted. The p-value, however, is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one computed from the sample. Alpha is fixed in advance, but the p-value depends on the value of the test statistic and therefore on the sample data.
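The contrast between a fixed α and a data-dependent p-value can be illustrated with a short calculation. This is a minimal sketch for an upper-tailed z-test; the observed z values are hypothetical, and `upper_tail_p_value` is an illustrative name.

```python
from statistics import NormalDist


def upper_tail_p_value(z):
    """P(Z >= z) for a standard normal Z: the p-value of an upper-tailed z-test."""
    return 1 - NormalDist().cdf(z)


alpha = 0.05  # level of significance, fixed before looking at the data

# Hypothetical test statistics from three different samples.
results = {}
for z_observed in (0.50, 1.80, 2.60):
    p_value = upper_tail_p_value(z_observed)
    results[z_observed] = (p_value, p_value < alpha)
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"z = {z_observed:.2f}  p-value = {p_value:.4f}  ->  {decision}")
```

The threshold α never changes across the three samples, while the p-value shrinks as the test statistic moves further into the tail.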

Scene 22 (13m 44s)

[Audio] Today, we are going to look at some of the key questions one needs to consider when performing hypothesis testing. The questions are: what are the null and alternative hypotheses, what is the preset level of significance, what is an appropriate test statistic, and how do we check whether the data give significant evidence against the null hypothesis? To help put this into perspective, I will go through an example where the population standard deviation is known and the sample size is more than 30. In this way, we can better understand the significance of these questions.

Scene 23 (14m 22s)

[Audio] Now is the time to use statistical tests to answer the manager's question. In this slide, we will look closely at a one-tailed test to determine whether there is sufficient evidence to conclude that the mean delivery time of products is greater than five days. We will discuss the hypothesis formulation, the test statistic, assumptions, type I and II errors, critical points, one- and two-tailed tests, the p-value approach and more. So let's dive into the details and find the answer to our question.
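The delivery-time question can be worked through with the steps discussed so far. The transcript does not give the sample figures, so the sample mean, population standard deviation, sample size and significance level below are hypothetical values chosen only to make this sketch runnable.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: hypotheses. H0: mu = 5 days, Ha: mu > 5 days (upper-tailed test).
mu0 = 5.0

# Step 2: level of significance (assumed; not stated in the transcript).
alpha = 0.05

# Step 3: sample summary (hypothetical values, sigma assumed known, n > 30).
sample_mean = 5.25
sigma = 1.3
n = 40

# Step 4: z test statistic for a mean with known population standard deviation.
z = (sample_mean - mu0) / (sigma / sqrt(n))

# Step 5a: critical-point approach, reject H0 when z exceeds the critical value.
z_crit = NormalDist().inv_cdf(1 - alpha)

# Step 5b: p-value approach, reject H0 when the p-value is below alpha.
p_value = 1 - NormalDist().cdf(z)

reject_by_critical_point = z > z_crit
reject_by_p_value = p_value < alpha

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_by_p_value else "Fail to reject H0")
```

With these made-up numbers the test statistic falls short of the critical value, so the data would not give sufficient evidence that the mean delivery time exceeds five days. The critical-point and p-value approaches always lead to the same decision.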

Scene 24 (14m 56s)

[Audio] Today we will be exploring a fundamental concept in the world of statistics: one-tailed and two-tailed tests. With these tests, hypotheses can be tested and evaluated mathematically. In order to determine which type of test is appropriate, we must consider what kind of statement we are testing. One-tailed tests are used when we are looking for whether a certain effect or phenomenon lies in one specific direction, either above or below a given value, while two-tailed tests are used when we are looking for whether it differs from a given value in either direction. We'll take a look at how to properly apply each type of test and the implications of each.

Scene 25 (15m 43s)

[Audio] This slide focuses on the decision between one-tailed or two-tailed tests for formulating a hypothesis. In a one-tailed test, the alternative hypothesis specifies that the population mean, μ, is either greater than or less than μ0, which is the hypothesized population mean. For a two-tailed test, the alternative hypothesis suggests the population mean is not equal to μ0. It is important to remember that the choice between a one-tailed or two-tailed test is independent of the sample data; instead, it is based on the nature of the issue.

Scene 26 (16m 22s)

[Audio] We are comparing one-tailed and two-tailed tests. In the left image, the hypothesis test has more power on one side, and the difference is not tested on the other side. The right image shows the difference being tested on both sides. There are two major distinctions between one-tailed and two-tailed tests. First, the value of the test statistic remains the same, but the critical value and the p-value associated with the test statistic differ. Second, in one-tailed tests, the hypothesis test has greater power on one side as compared to the other.
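The first distinction, same test statistic but different critical values and p-values, can be seen numerically. This is a minimal sketch for a z-test at a 5% level of significance; the observed value z = 1.80 is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05
z = 1.80  # hypothetical test statistic, identical for both tests

# One-tailed (upper) test: all of alpha sits in the right tail.
crit_one_tailed = std_normal.inv_cdf(1 - alpha)
p_one_tailed = 1 - std_normal.cdf(z)

# Two-tailed test: alpha is split equally between both tails.
crit_two_tailed = std_normal.inv_cdf(1 - alpha / 2)
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

print(f"one-tailed: critical value = {crit_one_tailed:.3f}, p-value = {p_one_tailed:.4f}")
print(f"two-tailed: critical value = {crit_two_tailed:.3f}, p-value = {p_two_tailed:.4f}")
```

Here the same test statistic leads to rejection in the one-tailed test but not in the two-tailed test, which is why the choice of test has to be made from the nature of the problem before looking at the data.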

Scene 27 (17m 3s)

[Audio] We have gained insight into hypothesis tests, so let us proceed to examine how to interpret the outcomes. Confidence intervals give us a range of plausible values for the true population value. They help us quantify the uncertainty associated with an estimate. By constructing confidence intervals, we can gauge the precision of our sample results and how representative they are of the population.

Scene 28 (17m 32s)

[Audio] There is a relationship between hypothesis testing and the confidence interval. Suppose we calculate the 95% confidence interval for the mean, that is, (100 - 5)%, and conduct the Z-test for the mean at a 5% significance level. The hypotheses of the Z-test are H0: μ = μ0 against Ha: μ ≠ μ0. The confidence interval contains all values of μ0 for which the null hypothesis will not be rejected. This is the connection between the estimated confidence interval and the hypothesis test.
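This connection can be checked numerically. The following is a minimal sketch, assuming a known population standard deviation; the sample summary values are hypothetical, and the two helper functions are illustrative names rather than anything from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, alpha):
    """(1 - alpha) confidence interval for the mean with known sigma."""
    z_half = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_half * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


def two_tailed_z_rejects(sample_mean, mu0, sigma, n, alpha):
    """True if the two-tailed z-test rejects H0: mu = mu0 at level alpha."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


# Hypothetical sample summary.
sample_mean, sigma, n, alpha = 8.7, 0.6, 36, 0.05

low, high = z_confidence_interval(sample_mean, sigma, n, alpha)
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")

for mu0 in (8.4, 8.5, 8.6, 8.8, 9.0):
    inside = low <= mu0 <= high
    rejected = two_tailed_z_rejects(sample_mean, mu0, sigma, n, alpha)
    print(f"mu0 = {mu0}: inside interval = {inside}, H0 rejected = {rejected}")
```

Every hypothesized value inside the interval is one the test fails to reject, and every value outside it is rejected, so the interval and the two-tailed test carry the same information.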