Log Pattern Anomaly Search

🎯 Level: Hard Score: 80 ⏱️ Estimated Time: 30 mins
📜 Problem Description

In cybersecurity and system administration, analyzing vast streams of log data is crucial for detecting anomalies, security breaches, or system failures. Often, specific sequences of characters within a log entry indicate a critical event or a known malicious pattern. Your task is to develop an efficient algorithm that can quickly locate the first occurrence of such a pattern within a much larger log stream.

Given two strings, logStream (representing the entire log data) and anomalyPattern (the specific pattern you're searching for), your goal is to find the starting index of the very first time anomalyPattern appears as a contiguous substring within logStream. If the pattern is not found anywhere in the log stream, you should report this by returning -1.

For example, if logStream is "sysalertservicefailurerestart" and anomalyPattern is "servicefailure", the pattern starts at index 9.

This problem requires an algorithm that is highly efficient, especially when dealing with very long log streams and potentially repetitive patterns. Consider using advanced string matching techniques to solve this challenge optimally.

Input Format

The first line of input contains an integer 'T', denoting the number of test cases.

For each test case:

  • The first line contains a string logStream.
  • The second line contains a string anomalyPattern.

Output Format

For each test case, your program should output a single integer on a new line. This integer should be the 0-based index of the first occurrence of anomalyPattern in logStream. If anomalyPattern is not found, output -1.

Constraints

  • 1 <= T <= 50
  • 1 <= logStream.length, anomalyPattern.length <= 105
  • Both logStream and anomalyPattern consist only of lowercase English alphabets ('a'-'z').
📌 Constraints

                    
📝 Sample Input/Output
Example 1: Basic Match
logStream: "systemalertcriticalprocessfailure"
anomalyPattern: "critical"
Explanation: The pattern "critical" starts at index 11 of "systemalertcriticalprocessfailure".
Answer: 11
Example 2: No Match
logStream: "webserverstartupok"
anomalyPattern: "databaseerror"
Explanation: The pattern "databaseerror" is not present in "webserverstartupok".
Answer: -1
Example 3: Match at Beginning
logStream: "errorlogid456eventdetected"
anomalyPattern: "errorlog"
Explanation: The pattern "errorlog" starts right at the beginning of the log stream, at index 0.
Answer: 0
💡 Hints
  • 💡 Consider the inefficiencies of a naive string search (O(NM) complexity) for large inputs. Can you pre-process the pattern to achieve a linear time complexity (O(N+M)) for searching?
  • 💡 The Knuth-Morris-Pratt (KMP) algorithm is designed for this specific problem. It builds a 'longest proper prefix which is also a suffix' (LPS) array for the pattern. How does this array help in avoiding redundant character comparisons during the search?
  • 💡 Pay close attention to how the KMP algorithm handles mismatches. Instead of simply shifting the pattern by one character, it leverages the LPS array to determine the maximum shift possible, allowing you to resume comparison from an already partially matched prefix.
📚 Topics

String & Tries

😕 No submissions found

Start coding and your submissions will appear here.