AP Statistics Unit 3 – Collecting Data

Collecting data is a crucial step in statistical analysis. This unit covers various methods for gathering information, from sampling techniques to survey design and experimental procedures. Understanding these concepts helps ensure that data collected is representative and reliable. The unit also delves into potential biases and errors that can affect data quality. By learning about these pitfalls and ethical considerations, students can design studies that yield accurate results while respecting participants' rights and well-being.

Key Concepts

  • Population refers to the entire group of individuals, objects, or events that a researcher is interested in studying
  • Sample is a subset of the population that is selected for study and is used to make inferences about the population
  • Parameter is a numerical summary that describes a characteristic of a population (population mean, population standard deviation); its value is usually unknown and is estimated from a sample
  • Statistic is a numerical summary that describes a characteristic of a sample (sample mean, sample standard deviation)
  • Variables are characteristics or attributes that can be measured or observed and vary among individuals in a population
    • Quantitative variables have numerical values and can be discrete (whole numbers) or continuous (any value within a range)
    • Qualitative variables are categorical and can be nominal (no inherent order) or ordinal (natural order)
  • Sampling bias occurs when some members of the population are more likely to be selected for the sample than others, leading to a sample that is not representative of the population
  • Nonresponse bias happens when individuals who respond to a survey differ systematically from those who do not respond
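
The parameter/statistic distinction above can be illustrated with a minimal sketch. The population here is simulated (hypothetical commute times), and the seed, sizes, and variable names are assumptions chosen for illustration only.

```python
import random
import statistics

# Hypothetical population: commute times (minutes) for 1,000 students.
# Simulated data, used only to illustrate the vocabulary.
rng = random.Random(42)
population = [rng.gauss(30, 8) for _ in range(1000)]

# Parameter: a summary computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: the same summary computed from a sample of the population.
sample = rng.sample(population, 50)  # 50 members, drawn without replacement
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In a real study the population mean would not be computable, which is why the sample mean is used to estimate it.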

Types of Data

  • Categorical data consists of observations that can be classified into distinct categories or groups (gender, race, political affiliation)
  • Numerical data involves observations that are measured on a numerical scale and can be either discrete or continuous
    • Discrete data can only take on certain values, often whole numbers (number of siblings, number of cars owned)
    • Continuous data can take on any value within a specified range (height, weight, temperature)
  • Cross-sectional data is collected at a single point in time from different individuals or groups
  • Time series data is collected over a period of time, typically at regular intervals, from the same individual or group
  • Observational data is collected by observing and recording information without manipulating any variables
  • Experimental data is collected by deliberately manipulating one or more variables while controlling other factors and measuring the effect on the response variable

Sampling Methods

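  • Simple random sampling (SRS) selects a sample of size n so that every possible group of n members has the same chance of being chosen, which also gives each member an equal chance of selection
    • Requires a complete list of all members of the population (sampling frame)
    • Can be done with replacement (a member can be selected more than once) or without replacement (each member can be selected at most once); in practice, samples are usually drawn without replacement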
  • Stratified random sampling divides the population into distinct subgroups (strata) based on a specific characteristic and then randomly samples from each stratum
    • With proportional allocation, each stratum is represented in the sample in proportion to its size in the population
  • Cluster sampling involves dividing the population into clusters (naturally occurring groups) and then randomly selecting entire clusters to include in the sample
    • Useful when a complete list of all members of the population is not available or when the population is geographically dispersed
  • Systematic sampling selects every kth member from a list of the population, starting with a randomly chosen member
    • Requires a complete list of all members of the population in a specific order
  • Convenience sampling selects members of the population who are easily accessible or readily available (mall intercept, online surveys)
    • Not a probability sampling method and may lead to biased results
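
The four probability sampling methods above can be sketched in a few lines each. This is an illustrative sketch, not a standard library of sampling routines: the function names, the student-ID frame, the grade strata, and the homeroom clusters are all hypothetical.

```python
import random

def simple_random_sample(frame, n, rng):
    # Every possible group of n members is equally likely; no repeats.
    return rng.sample(frame, n)

def stratified_sample(strata, fraction, rng):
    # strata: dict mapping stratum name -> list of members.
    # Take a separate SRS from each stratum, using the same fraction
    # in every stratum (proportional allocation).
    chosen = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, k))
    return chosen

def cluster_sample(clusters, n_clusters, rng):
    # clusters: list of lists. Pick whole clusters at random and
    # include every member of each selected cluster.
    picked = rng.sample(clusters, n_clusters)
    return [member for cluster in picked for member in cluster]

def systematic_sample(frame, k, rng):
    # Random start among the first k positions, then every k-th member.
    start = rng.randrange(k)
    return frame[start::k]

rng = random.Random(0)
frame = list(range(1, 121))  # hypothetical student IDs 1..120
strata = {
    "grade 9": frame[:60],
    "grade 10": frame[60:100],
    "grade 11": frame[100:],
}
clusters = [frame[i:i + 20] for i in range(0, 120, 20)]  # six homerooms of 20

srs = simple_random_sample(frame, 12, rng)
strat = stratified_sample(strata, 0.10, rng)
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(frame, 10, rng)

print("SRS:       ", sorted(srs))
print("Stratified:", sorted(strat))
print("Cluster:   ", sorted(clus))
print("Systematic:", syst)
```

Note the structural difference: stratified sampling takes some members from every group, while cluster sampling takes every member from some groups.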

Data Collection Techniques

  • Surveys involve asking a sample of individuals a set of questions to gather information about their opinions, behaviors, or characteristics
    • Can be conducted through various modes (face-to-face, telephone, mail, online)
    • Require careful design to ensure that questions are clear, unbiased, and elicit accurate responses
  • Interviews are a more in-depth form of data collection that involves asking open-ended questions to gather detailed information from respondents
    • Can be structured (fixed set of questions), semi-structured (mix of fixed and open-ended questions), or unstructured (no fixed questions)
  • Observations involve collecting data by watching and recording the behavior of individuals or groups in a natural setting
    • Can be participant (researcher is part of the group being observed) or non-participant (researcher is not part of the group)
  • Experiments involve deliberately manipulating one or more variables (independent variables) while controlling other factors and measuring the effect on the response variable (dependent variable)
    • Require random assignment of subjects to treatment and control groups so that the groups are roughly equivalent before treatment; differences in the response variable larger than chance alone would produce can then be attributed to the independent variable(s)
  • Secondary data analysis involves using data that has already been collected by someone else for a different purpose
    • Requires careful evaluation of the quality and appropriateness of the data for the current research question

Survey Design

  • Clearly define the research question and target population before designing the survey
  • Use simple, clear, and unbiased language in the questions to ensure that respondents understand what is being asked
  • Avoid leading questions that suggest a particular answer or double-barreled questions that ask about more than one thing at a time
  • Use closed-ended questions with a fixed set of response options for easier data analysis and open-ended questions to gather more detailed information
  • Consider the order of the questions and group related questions together to improve the flow of the survey
  • Pretest the survey with a small sample of the target population to identify any problems with the questions or response options
  • Include clear instructions and definitions for any technical terms or concepts used in the survey
  • Offer incentives for participation, if appropriate, to increase response rates

Experimental Design

  • Clearly define the research question and hypotheses before designing the experiment
  • Identify the independent variable(s) (factors that will be manipulated) and the dependent variable (outcome that will be measured)
  • Use a control group that does not receive the treatment to serve as a basis for comparison
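  • Randomly assign subjects to treatment and control groups so that confounding variables are balanced across groups on average, which allows differences in the dependent variable beyond chance variation to be attributed to the independent variable(s)
  • Control for extraneous variables (factors that could affect the dependent variable but are not of interest) by holding them constant or using blocking
  • Use blinding to prevent expectations from biasing the results: in a single-blind experiment the subjects do not know which treatment they received, and in a double-blind experiment neither the subjects nor those who interact with them and measure the response know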
  • Determine the appropriate sample size and power to detect a meaningful difference between the treatment and control groups
  • Use appropriate statistical methods to analyze the data and draw conclusions about the effect of the independent variable(s) on the dependent variable
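
Random assignment and blocking, as described above, can be sketched as follows. This is a minimal illustration under assumed names: the subject labels, the athlete/non-athlete blocks, and both helper functions are hypothetical.

```python
import random

def randomize(subjects, rng):
    # Completely randomized design: shuffle a copy, then split it in half.
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

def randomize_within_blocks(blocks, rng):
    # Randomized block design: carry out the random assignment
    # separately inside each block, then combine the groups.
    groups = {"treatment": [], "control": []}
    for members in blocks.values():
        assigned = randomize(members, rng)
        groups["treatment"].extend(assigned["treatment"])
        groups["control"].extend(assigned["control"])
    return groups

rng = random.Random(7)
subjects = [f"S{i:02d}" for i in range(1, 21)]  # 20 hypothetical subjects

crd = randomize(subjects, rng)

# Hypothetical blocking variable: athlete status.
blocks = {"athletes": subjects[:8], "non-athletes": subjects[8:]}
rbd = randomize_within_blocks(blocks, rng)

print("Completely randomized:", crd)
print("Randomized block:     ", rbd)
```

Blocking guarantees that each treatment group contains the same number of subjects from each block, so the blocking variable cannot be confounded with the treatment.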

Potential Biases and Errors

  • Selection bias occurs when the sample is not representative of the population due to the way in which subjects are selected
    • Can be reduced by using probability sampling methods and ensuring that the sampling frame is complete and up-to-date
  • Response bias occurs when respondents do not answer questions truthfully or accurately due to social desirability, acquiescence, or other factors
    • Can be reduced by using neutral language in questions, offering anonymity or confidentiality, and using multiple methods to measure the same construct
  • Nonresponse bias occurs when those who do not respond to a survey differ systematically from those who do respond
    • Can be reduced by using follow-up procedures to increase response rates and comparing the characteristics of respondents and nonrespondents
  • Measurement error occurs when the instruments or methods used to collect data are not reliable or valid
    • Can be reduced by using established and validated measures, pretesting instruments, and using multiple methods to measure the same construct
  • Sampling error occurs when the sample statistics differ from the population parameters due to chance variation in the sampling process; it is present even in a well-designed random sample
    • Can be reduced by increasing the sample size or by stratifying on a characteristic related to the response; cluster sampling is chosen mainly for convenience or cost and usually does not reduce sampling error
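
The effect of sample size on sampling error can be seen in a small simulation. This sketch assumes a simulated population of uniformly distributed values; the seed, the sample sizes (25 and 400), and the helper name are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population of 10,000 values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=500):
    # Draw many simple random samples of size n and measure how much
    # their means vary from sample to sample.
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(25)
spread_large = spread_of_sample_means(400)

print(f"spread of sample means, n = 25:  {spread_small:.2f}")
print(f"spread of sample means, n = 400: {spread_large:.2f}")
```

Larger samples give sample means that cluster more tightly around the population mean. A larger sample does not fix the biases listed above: a biased selection method stays biased at any sample size.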

Ethical Considerations

  • Obtain informed consent from participants by providing them with information about the purpose, procedures, risks, and benefits of the study and ensuring that they understand their rights as participants
  • Protect the privacy and confidentiality of participants by using secure data storage and reporting methods and not disclosing identifying information without permission
  • Avoid deception where possible by being truthful about the purpose and procedures of the study; if deception is necessary to the design, debrief participants afterward
  • Minimize harm to participants by carefully weighing the risks and benefits of the study and taking steps to prevent or mitigate any potential harm
  • Respect the autonomy of participants by allowing them to make their own decisions about whether to participate and to withdraw from the study at any time without penalty
  • Ensure that the study is justified by the potential benefits to society and that the risks to participants are reasonable in relation to the anticipated benefits
  • Report the results of the study accurately and honestly, including any limitations or negative findings, and make the data available for replication by other researchers


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
