Data leakage occurs when information that would not be available at prediction time, such as the target variable, future observations, or test-set data, is used, usually unintentionally, to train or tune a model. Because the model has effectively seen part of what it is later evaluated on, its measured performance is overestimated and its validity is compromised. Leakage can therefore seriously mislead model evaluation: a model may appear to perform exceptionally well on held-out data and still generalize poorly to genuinely new data.
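One common way this happens is preprocessing: statistics such as a mean and standard deviation get computed on the full dataset before it is split. The sketch below is a minimal illustration of that single mechanism, not a complete account of leakage. The toy feature values and the helper names `fit_scaler` and `standardize` are hypothetical and chosen only for this example.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Return (mean, population std) computed from the given values only."""
    return mean(values), pstdev(values)

def standardize(values, mu, sigma):
    """Scale values using previously fitted statistics."""
    return [(v - mu) / sigma for v in values]

# Hypothetical toy feature: the last two values play the role of held-out test data.
feature = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]
train, test = feature[:4], feature[4:]

# Leaky: the scaler is fitted on train + test, so test values shape the training inputs.
leaky_mu, leaky_sigma = fit_scaler(feature)
train_leaky = standardize(train, leaky_mu, leaky_sigma)

# Clean: the scaler is fitted on the training split only, then reused on the test split.
clean_mu, clean_sigma = fit_scaler(train)
train_clean = standardize(train, clean_mu, clean_sigma)
test_clean = standardize(test, clean_mu, clean_sigma)

print("leaky mean:", leaky_mu)
print("clean mean:", clean_mu)
```

In the leaky version the training inputs already encode the scale of the test values, so any model fitted on them has indirectly seen the test set. Fitting every preprocessing step on the training split alone and then applying it unchanged to the test split avoids this.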