Business Intelligence

Association rule mining uncovers hidden patterns in large datasets, revealing relationships between items. It's a powerful tool for businesses, helping them understand customer behavior and make data-driven decisions.

The Apriori algorithm is key in this process, efficiently finding frequent itemsets and generating rules. By iteratively building and pruning candidate sets, it discovers valuable insights while minimizing computational costs.

Association Rule Mining

Concept of association rule mining

  • Discovers interesting relationships and correlations between items in large datasets (retail transactions, web clickstreams)
  • Identifies frequent patterns, associations, or causal structures (product affinities, co-occurring events)
  • Helps uncover hidden patterns and insights for decision-making (product recommendations, market segmentation)

Apriori algorithm for itemsets

  • Iterative approach to discover frequent itemsets (sets of items that frequently appear together)
  • Generates candidate itemsets of length k from frequent itemsets of length k-1 (joins itemsets to create larger candidates)
  • Prunes candidate itemsets using the Apriori principle
    • If an itemset is infrequent, its supersets must also be infrequent (reduces search space)
  • Steps in the Apriori algorithm
    1. Set a minimum support threshold (percentage of transactions containing the itemset)
    2. Generate frequent itemsets of length 1 (individual items above the support threshold)
    3. Iteratively generate candidate itemsets of increasing lengths
      • Join frequent itemsets of length k-1 to generate candidates of length k (combine itemsets)
      • Prune candidate itemsets using the Apriori principle (remove infrequent itemsets)
      • Count the support of each candidate itemset (scan the database)
      • Retain itemsets that meet the minimum support threshold (frequent itemsets)
  • Generating association rules from frequent itemsets
    • Create rules from frequent itemsets (antecedent → consequent)
    • Calculate support and confidence for each rule
      • Support: $\text{supp}(A \Rightarrow B) = \frac{\sigma(A \cup B)}{N}$, where $\sigma(A \cup B)$ is the number of transactions containing both A and B, and N is the total number of transactions (prevalence of the rule)
      • Confidence: $\text{conf}(A \Rightarrow B) = \frac{\sigma(A \cup B)}{\sigma(A)}$, measures the strength of the rule (likelihood of the consequent given the antecedent)
    • Filter rules based on minimum confidence threshold (retain strong rules)
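The steps above can be sketched in plain Python. This is a minimal illustration over a toy transaction list, not a production implementation (libraries such as mlxtend provide optimized versions); item names and thresholds here are made up for the example, and support is computed as a fraction of transactions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset (frozenset): support} via the Apriori loop."""
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions containing every item in the itemset
        return sum(1 for t in tx if itemset <= t) / n

    # Steps 1-2: frequent itemsets of length 1
    items = {i for t in tx for i in t}
    frequent = {frozenset([i]) for i in items
                if support(frozenset([i])) >= min_support}
    all_frequent = {s: support(s) for s in frequent}

    # Step 3: iteratively join, prune, count, retain
    k = 2
    while frequent:
        # Join: combine frequent (k-1)-itemsets into length-k candidates
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Prune (Apriori principle): every (k-1)-subset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Count support against the database and retain frequent itemsets
        frequent = {c for c in candidates if support(c) >= min_support}
        all_frequent.update({c: support(c) for c in frequent})
        k += 1
    return all_frequent

def association_rules(freq, min_conf):
    """Generate (antecedent, consequent, support, confidence) rules."""
    out = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = supp / freq[ante]  # supp(A ∪ B) / supp(A)
                if conf >= min_conf:
                    out.append((ante, itemset - ante, supp, conf))
    return out
```

For example, with four transactions over {milk, bread, diapers} and `min_support=0.5`, the triple {milk, bread, diapers} appears in only one transaction and is pruned, while all three pairs survive; rule generation then filters the resulting pair rules by confidence.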

Sequential Pattern Mining

Sequential pattern mining basics

  • Discovers frequent subsequences in a sequence database (ordered sets of items or events)
  • Considers the order of events or items (temporal or positional relationships)
  • Useful for analyzing time-series data and user behavior (customer purchase sequences, web navigation paths)
  • Applications of sequential pattern mining
    • Customer purchase behavior analysis (identifying common purchase sequences over time)
    • Web usage mining (analyzing user navigation patterns on websites)
    • Biomedical sequence analysis (discovering patterns in DNA sequences or medical events)

Algorithms for sequential patterns

  • GSP (Generalized Sequential Patterns) algorithm
    • Apriori-based algorithm for sequential pattern mining (generates and prunes candidates)
    • Generates candidate sequences by joining frequent sequences (creates longer candidates)
    • Prunes candidates using the Apriori principle (removes infrequent subsequences)
    • Scans the database to determine frequent sequences (counts support)
  • PrefixSpan (Prefix-Projected Sequential Pattern Mining) algorithm
    • Pattern-growth approach for sequential pattern mining (avoids candidate generation)
    • Uses prefix-projection to reduce the search space
      • Projects the database based on frequent prefixes (creates smaller databases)
      • Recursively grows frequent subsequences in each projected database (expands patterns)
    • Avoids candidate generation and multiple database scans (more efficient than GSP)
    • Steps in PrefixSpan
      1. Find frequent items to form length-1 sequential patterns (individual items above support threshold)
      2. Divide the search space into smaller projected databases based on prefixes (create sub-databases)
      3. Recursively grow frequent subsequences in each projected database (expand patterns)
      4. Concatenate prefixes with frequent subsequences to generate sequential patterns (combine patterns)
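The PrefixSpan steps above can be sketched as a short recursion. This simplified version assumes each sequence element is a single item (no itemsets within a step) and treats `min_support` as an absolute count; the sequence database is a made-up toy example.

```python
def prefixspan(db, min_support):
    """Return [(pattern, support)] for frequent subsequences in db.

    db is a list of sequences (lists of single items); min_support is
    an absolute count. Simplified: one item per sequence element.
    """
    patterns = []

    def project(database, item):
        # Step 2: project each sequence onto the suffix after the
        # first occurrence of `item` (the prefix-projected database)
        return [seq[seq.index(item) + 1:]
                for seq in database if item in seq]

    def grow(prefix, projected_db):
        # Steps 1 and 3: count frequent items in the projected database
        counts = {}
        for seq in projected_db:
            for item in set(seq):  # count each item once per sequence
                counts[item] = counts.get(item, 0) + 1
        for item, count in counts.items():
            if count >= min_support:
                # Step 4: concatenate the prefix with the frequent item,
                # then recurse into its projected database
                pattern = prefix + [item]
                patterns.append((pattern, count))
                grow(pattern, project(projected_db, item))

    grow([], db)
    return patterns
```

For example, on the database `[['a','b','c'], ['a','c'], ['b','a','c']]` with `min_support=2`, the pattern `['a','c']` is frequent (a before c in all three sequences) while `['c','a']` is not, which shows how order distinguishes sequential patterns from plain itemsets.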