1.1 Definitions of Statistics, Probability, and Key Terms

June 25, 2024

Statistics is a powerful tool for understanding data and making informed decisions. This introduction covers key concepts such as descriptive versus inferential methods, populations and samples, and types of variables. These fundamentals are essential for analyzing data effectively.

Understanding these basic statistical concepts lays the groundwork for more advanced analysis. By grasping terms such as population, sample, parameter, and statistic, you'll be better equipped to interpret data, draw meaningful conclusions, and apply statistical thinking to real-world problems.

Introduction to Statistics

Descriptive vs inferential statistics

  • Descriptive statistics summarize and describe the main features of a data set without drawing conclusions beyond the data at hand
    • Involve measures such as the mean, median, mode, range, and standard deviation, along with graphs, to characterize the data
    • Focus on organizing, summarizing, and presenting data in a meaningful way (tables, charts)
  • Inferential statistics use sample data to make inferences, predictions, or generalizations about a larger population
    • Involve hypothesis testing, confidence intervals, and regression analysis to draw conclusions that extend beyond the immediate data
    • Allow researchers to make data-driven decisions and predictions based on a sample (political polls, medical trials)
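To make the descriptive side concrete, here is a minimal sketch using Python's built-in `statistics` module on a small, made-up dataset. The sleep-hours values are hypothetical and chosen only for illustration. Every number the script prints describes only these ten values; nothing is inferred about a larger group.

```python
import statistics

# Hypothetical data: hours of sleep reported by ten students
hours = [6, 7, 7, 8, 5, 9, 7, 6, 8, 7]

summary = {
    "mean": statistics.mean(hours),      # arithmetic average
    "median": statistics.median(hours),  # middle value of the sorted data
    "mode": statistics.mode(hours),      # most frequent value
    "range": max(hours) - min(hours),    # largest value minus smallest value
    "stdev": statistics.stdev(hours),    # sample standard deviation (divides by n - 1)
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

`statistics.stdev` divides by n − 1, which gives the sample standard deviation. If the ten values were the entire population of interest, `statistics.pstdev`, which divides by N, would be used instead.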

Key terms in statistical studies

  • A population is the entire group of individuals, objects, or events of interest in a study
    • Often too large to study in its entirety (all college students, all smartphones)
  • A sample is a subset of the population selected for study; ideally, it is representative of the population
    • Allows for more efficient data collection and analysis (100 randomly selected college students)
  • A parameter is a numerical summary measure that describes a characteristic of a population
    • Usually unknown and estimated using sample data
    • Examples: population mean (μ), population standard deviation (σ)
  • A statistic is a numerical summary measure computed from sample data and used to estimate the corresponding population parameter
    • Examples: sample mean (x̄), sample standard deviation (s)
  • Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population
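A simulation can illustrate the distinction between a parameter and a statistic. The sketch below assumes a hypothetical population of 10,000 simulated heights. Because the whole population exists inside the simulation, the parameters μ and σ can be computed directly and compared with the statistics x̄ and s from one simple random sample of 100. In real studies the population is usually too large to measure, so only the sample side is available.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of 10,000 people, simulated from a
# normal distribution with mean 170 and standard deviation 10.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameters: numerical summaries of the whole population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population SD divides by N

# Simple random sample of 100 people, drawn without replacement.
sample = random.sample(population, 100)

# Statistics: numerical summaries computed from the sample, used as estimates.
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)  # sample SD divides by n - 1

print(f"parameter mu    = {mu:.2f}   statistic x_bar = {x_bar:.2f}")
print(f"parameter sigma = {sigma:.2f}   statistic s     = {s:.2f}")
```

Calling `random.sample(population, 100)` again draws a different sample and usually gives a different x̄, while μ for this fixed population stays the same. Inferential methods are designed to account for that sample-to-sample variation.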

Numerical and categorical variables

  • Numerical variables are quantitative variables that take on numeric values and can be discrete or continuous
    • Discrete variables have countable values, often integers (number of siblings, number of cars owned)
    • Continuous variables have measured values that can take on any value within a range (height, weight, temperature)
    • Examples: age, test scores, income
  • Categorical variables are qualitative variables whose values fall into distinct categories or groups; they can be nominal or ordinal
    • Nominal variables have categories with no inherent order (gender, race, blood type)
    • Ordinal variables have categories with a natural order (education level: high school, bachelor's, master's, doctorate; income brackets)
    • Examples: eye color, marital status, political affiliation
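Modeling the four kinds of variables as data types can make the differences between them easier to see. The following Python sketch uses an illustrative convention of its own, not a standard one. A plain `Enum` stands in for a nominal variable because its members can be compared for equality but not ordered. An `IntEnum` stands in for an ordinal variable because its members can be ranked. An `int` holds a discrete count, and a `float` holds a continuous measurement. All class and field names are hypothetical.

```python
from enum import Enum, IntEnum

# Nominal: categories with no inherent order, so only equality checks make sense.
class BloodType(Enum):
    A = "A"
    B = "B"
    AB = "AB"
    O = "O"

# Ordinal: categories with a natural order; IntEnum members support <, > and sorting.
class Education(IntEnum):
    HIGH_SCHOOL = 1
    BACHELORS = 2
    MASTERS = 3
    DOCTORATE = 4

# Hypothetical survey record mixing the four kinds of variables from the text.
respondent = {
    "blood_type": BloodType.O,       # categorical, nominal
    "education": Education.MASTERS,  # categorical, ordinal
    "siblings": 2,                   # numerical, discrete (a count)
    "height_cm": 172.4,              # numerical, continuous (a measurement)
}

# Ordering is meaningful for the ordinal variable...
print(respondent["education"] > Education.BACHELORS)
# ...but the integer codes 1-4 only rank the levels; the gaps between them
# do not represent equal amounts of education.
```

Comparing two `BloodType` members with `<` raises a `TypeError`, which matches the idea that nominal categories have no order.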

Probability and Statistical Inference

  • Probability is the measure of the likelihood that an event will occur
  • A distribution is the pattern of values in a dataset or population
  • Standard deviation measures the spread of data points around the mean
  • Correlation measures the strength and direction of the linear relationship between two variables
  • Hypothesis testing involves:
    • Null hypothesis: a statement of no effect or no difference
    • Alternative hypothesis: a statement that an effect or a difference exists
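Pearson's correlation coefficient is the usual measure of the strength and direction of a linear relationship. It is computed from deviations from the means: the sum of their cross-products divided by the product of the square roots of the sums of squared deviations. The sketch below implements that formula in plain Python. The function name `pearson_r` and the study-time data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (sx * sy)

# Hypothetical data: hours studied vs. exam score for five students
study_hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 78]
print(round(pearson_r(study_hours, scores), 3))
```

The result always lies between −1 and 1. A value near 1 indicates a strong positive linear relationship. The function raises `ZeroDivisionError` if either sequence is constant, because correlation is undefined in that case. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.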

Key Terms to Review

Alternative Hypothesis: The alternative hypothesis is the claim, opposed to the null hypothesis, that an effect, difference, or relationship exists in the population. Researchers look for evidence supporting it by testing whether the sample data are inconsistent with the null hypothesis.
Average: The average, or mean, is the sum of a set of values divided by the number of values. It provides a measure of central tendency for the data.
Categorical variables: Categorical variables represent categories or groups rather than numerical values, and they take a limited number of distinct values. Their values are qualitative, describing characteristics or attributes used to classify data, and they are commonly used in statistical analysis and data visualization.
Central limit theorem for means: The Central Limit Theorem for sample means states that, when the sample size is sufficiently large, the distribution of sample means is approximately normal, with mean μ and standard deviation σ/√n, regardless of the shape of the population's distribution (provided the population has a finite standard deviation). The approximation improves as the sample size increases.
Cluster sample: A cluster sample is a sampling method where the population is divided into groups, or clusters, and a random sample of these clusters is selected. All members of the chosen clusters are included in the final sample.
Confidence intervals: A confidence interval is a range of values, computed from a sample statistic, that is likely to contain the true value of a population parameter. It quantifies the uncertainty of an estimate. The confidence level (for example, 95%) is the proportion of intervals constructed this way that would capture the parameter over many repeated samples.
Continuous Variables: Continuous variables are quantitative measurements that can take on any value within a given range, rather than being limited to a set of discrete values. They represent data that can be measured on a continuous scale, such as height, weight, temperature, or time.
Correlation: Correlation is a statistical measure that describes the strength and direction of the linear relationship between two variables. It quantifies the degree to which changes in one variable are associated with changes in another variable.
Data: Data consists of facts, figures, and other evidence gathered through observations. In statistics, data is used to draw conclusions and make decisions.
Descriptive statistics: Descriptive statistics is the branch of statistics that organizes, summarizes, and presents data in a meaningful way. It uses measures such as the mean, median, mode, range, and standard deviation, along with tables and graphs, to describe the key characteristics of a dataset without drawing conclusions about a broader population.
Discrete Variables: Discrete variables are numerical variables that can take on only a finite or countable number of distinct values, such as counts. There are no possible values between adjacent ones; for example, a family can have 2 or 3 children, but not 2.5.
Distribution: In the context of statistics and data analysis, distribution refers to the arrangement or spread of data values within a dataset. It describes the pattern or shape in which the data points are dispersed, providing insights into the characteristics and behavior of the underlying phenomenon being studied.
Error bound for a population mean: The error bound for a population mean (EBM) is the margin of error in a confidence interval for the mean. It is the distance from the sample mean to either endpoint of the interval, so the interval runs from x̄ − EBM to x̄ + EBM.
Hypothesis Testing: Hypothesis testing is a statistical method for assessing whether sample data provide enough evidence against a claim about a population parameter. It involves formulating a null hypothesis and an alternative hypothesis, collecting and analyzing sample data, and deciding either to reject or to fail to reject the null hypothesis.
Inferential statistics: Inferential statistics is the branch of statistics that uses data from a sample, ideally a random sample, to draw conclusions about the larger population it came from. It includes estimating unknown population parameters, testing hypotheses, and making predictions.
Mean: The mean, also called the average, is a measure of central tendency calculated by summing all the values in a dataset and dividing by the number of values. It provides a single central point that summarizes the data.
Median: The median is the middle value of a dataset when the values are arranged in numerical order; if the dataset has an even number of observations, the median is the average of the two middle values. It is a measure of central tendency that separates the higher half of the data from the lower half.
Mode: The mode is the value, or values, that occur most frequently in a dataset. It is a measure of central tendency and, unlike the mean and median, can also be used with nominal data.
Nominal Variables: Nominal variables are categorical variables that represent qualitative characteristics or labels without any inherent numerical value or order. They are used to classify or group data into distinct, non-overlapping categories.
Numerical variables: Numerical variables are quantitative variables whose values are numbers obtained by counting or measuring, such as height, weight, age, or income. They can be either discrete or continuous and form the basis for many statistical analyses and calculations.
Ordinal variables: Ordinal variables are a type of categorical variable where the categories have a meaningful order or ranking but do not have a consistent difference between the values. This ordering helps in understanding the relative position of data points, making it essential in statistics for analyses that involve rankings or scales.
Parameter: A parameter is a numerical characteristic of a population, such as the population mean or standard deviation, or a fixed quantity that defines a statistical model. It describes the entire group rather than a sample taken from it and is usually unknown, so it is estimated from sample data.
Pearson: Pearson refers to Pearson's correlation coefficient, a measure of linear correlation between two variables. It ranges from -1 to 1, indicating the strength and direction of the relationship.
Population: A population is the complete set of individuals, objects, events, or measurements that a researcher wants to study or draw conclusions about. It includes every member that shares the characteristics of interest within a defined scope.
Population Mean: The population mean, denoted by the Greek letter μ, is the average value of a variable across an entire population. It represents the typical or expected value for that population.
Population Standard Deviation: The population standard deviation, denoted by σ, measures how much the values in an entire population vary around the population mean μ. It is the square root of the average squared deviation from μ. A small value means the values cluster tightly around the mean, and a large value means they are widely spread.
Probability: Probability is a number between 0 and 1 that measures how likely an event is to occur, where 0 means the event is impossible and 1 means it is certain. It quantifies the uncertainty of random outcomes and underpins the methods used to analyze data and draw valid conclusions.
Proportion: Proportion is the fraction of the total that possesses a certain attribute. It is calculated by dividing the number of elements with the attribute by the total number of elements.
Regression Analysis: Regression analysis is a statistical technique used to model the relationship between a dependent variable and one or more independent variables. It allows researchers to estimate the average change in the dependent variable associated with a one-unit change in the independent variable, while controlling for other factors.
Representative sample: A representative sample is a subset of a population that accurately reflects the members of the entire population. It is used to make inferences about the population without surveying every individual.
Sample: A sample is a subset of a population selected for measurement, observation, or questioning in order to make inferences about the entire population. Samples are used when collecting data from every member of the population is impractical or impossible, and conclusions are most reliable when the sample is representative.
Sample Standard Deviation: Sample standard deviation is a measure of the amount of variation or dispersion in a set of sample data points. It quantifies how much the individual data points differ from the sample mean, providing insight into the spread and reliability of the sample data. A smaller sample standard deviation indicates that the data points are closer to the mean, while a larger value suggests greater variability in the data.
Sampling: Sampling is the process of selecting a subset of individuals or observations from a population to estimate its characteristics or draw conclusions about it as a whole. It lets researchers study a manageable portion of a group when studying the entire population is impractical.
Standard Deviation: Standard deviation measures the dispersion, or spread, of a set of values around their mean. It quantifies how much individual data points typically differ from the average and is the square root of the variance.
Statistic: A statistic is a numerical value calculated from sample data, such as the sample mean or sample standard deviation, that summarizes a characteristic of the sample. Statistics are used to estimate population parameters, test hypotheses, and draw conclusions about populations.
Statistics: Statistics is the science of collecting, analyzing, interpreting, presenting, and organizing data. It involves using mathematical theories and methodologies to draw conclusions or make predictions based on data.
Variable: A variable is a characteristic or attribute that can take on different values. Variables are used in statistics to represent data collected from observations.
Variance: Variance measures the spread of a set of data points around their mean. It is the average of the squared deviations from the mean (for sample data, the sum of squared deviations is usually divided by n − 1 instead of n), and its square root is the standard deviation. A higher variance indicates that the data points are more spread out from the mean; a lower variance indicates that they are closer to it.
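The glossary entries for confidence intervals and the error bound for a population mean combine in one formula. Suppose the population standard deviation σ is known and the sample mean is approximately normal, either because the population is normal or, by the Central Limit Theorem, because n is large. A confidence interval for μ is then x̄ ± EBM, with EBM = z·σ/√n, where z is the normal critical value for the chosen confidence level. The sketch below is a minimal Python version built on the standard library's `statistics.NormalDist`. The function name `z_interval` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known.

    Returns (lower, upper, ebm), where ebm = z * sigma / sqrt(n).
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    ebm = z * sigma / math.sqrt(n)           # error bound for the mean
    return x_bar - ebm, x_bar + ebm, ebm

# Hypothetical numbers: sample mean 68, known population SD 3, sample size 36
lower, upper, ebm = z_interval(68, 3, 36)
print(f"EBM = {ebm:.3f}; 95% CI = ({lower:.2f}, {upper:.2f})")
```

When σ is unknown and is estimated by the sample standard deviation s, the usual approach replaces z with a critical value from Student's t-distribution. Python's standard library does not provide that distribution.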
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.