Data leakage occurs when information that would not be available at prediction time, most often information from the test set, is inadvertently used during model training. Because the model has effectively seen data it shouldn’t have, its performance metrics look overly optimistic, and it then generalizes poorly to genuinely unseen data. Recognizing and preventing data leakage is crucial for ensuring that a model’s measured performance reflects how it will behave in real-world applications.
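One common way test-set information slips into training is through preprocessing statistics computed on the full dataset. The sketch below is a minimal illustration, not a prescribed method: the toy data, the 80/20 split, and the helper name `standardize` are all assumptions made for the example. It contrasts scaling statistics computed on all rows (leaky) with statistics computed on the training rows only (safe).

```python
import random
import statistics


def standardize(values, mean, std):
    """Scale values using the given mean and standard deviation."""
    return [(v - mean) / std for v in values]


# Hypothetical toy data: one numeric feature, split 80/20 into train and test.
rng = random.Random(0)
feature = [rng.gauss(50.0, 10.0) for _ in range(100)]
train, test = feature[:80], feature[80:]

# Leaky: statistics computed on ALL rows, so test values shape the preprocessing.
leaky_mean = statistics.fmean(feature)
leaky_std = statistics.pstdev(feature)

# Safe: statistics computed on the training rows only, then reused for the test rows.
safe_mean = statistics.fmean(train)
safe_std = statistics.pstdev(train)

train_scaled = standardize(train, safe_mean, safe_std)
test_scaled = standardize(test, safe_mean, safe_std)

# Sanity check: swap in a very different test set and recompute both means.
# The leaky mean moves with the test data; the safe mean cannot, because it
# never looks at the test rows.
shifted_test = [v + 100.0 for v in test]
leaky_mean_shifted = statistics.fmean(train + shifted_test)
safe_mean_shifted = statistics.fmean(train)

print("leaky mean changed:", leaky_mean_shifted != leaky_mean)
print("safe mean changed:", safe_mean_shifted != safe_mean)
```

The same principle applies to any step that learns from data (imputation, feature selection, encoding): fit it on the training split and only apply it to the test split.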