In machine learning, data leakage occurs when information that would not be available at prediction time, such as values from the test set or the target variable itself, is used to build the model. (The term is also used in security for the unintended exposure of sensitive or confidential data, which is a separate problem.) Leakage compromises the validity of predictions because the model is evaluated on data it has, in effect, already seen. It is particularly concerning during feature selection and engineering: if features are chosen, scaled, or encoded using the full dataset before the train/test split, the resulting performance metrics are overly optimistic and do not hold in real-world scenarios.
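One common way leakage enters during feature engineering is fitting a preprocessing step on all rows before splitting. The sketch below is a minimal illustration using only the Python standard library; the toy values and the helper names `fit_scaler` and `transform` are assumptions made up for this example, not part of any particular library.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Return (mean, std) computed from the given values only."""
    return mean(values), pstdev(values)

def transform(values, stats):
    """Standardize values using previously fitted (mean, std)."""
    mu, sigma = stats
    return [(v - mu) / sigma for v in values]

# Hypothetical toy feature, already split into training and held-out rows.
train = [1.0, 2.0, 3.0]
test = [10.0]

# Leaky: statistics are computed on train + test, so the held-out value
# influences how the training rows are scaled.
leaky_stats = fit_scaler(train + test)

# Leak-free: statistics come from the training rows alone and are then
# reused, unchanged, on the held-out rows.
clean_stats = fit_scaler(train)

train_scaled = transform(train, clean_stats)
test_scaled = transform(test, clean_stats)
```

Here the leaky mean is pulled toward the held-out value, while the leak-free mean reflects only what the model could legitimately know at training time. The same rule applies to feature selection, imputation, and encoding: fit on the training data, then apply to the test data.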