
We first note that [EJS10] shows that computing the period of a string in one pass requires Ω(n) space. Since periodicity for strings containing wildcards generalizes exact periodicity, the same lower bound applies.

Periodicity in Data Streams with Wildcards 103

Theorem 7 (Implied by Theorem 3 of [EJS10] and Theorem 16 of [EGSZ17]). Given a string S with at most k wildcard characters, any one-pass streaming algorithm that computes the smallest wildcard-period requires Ω(n) space.

To show a lower bound for randomized streaming algorithms that compute all wildcard-periods of S with probability at least 1 − 1/n, even under the promise that the wildcard-periods are at most n/2, consider the following construction.

Define an infinite string 1^1 0^1 1^2 0^2 1^3 0^3 ..., as in [GMSU16], and let ν be the prefix of length n/4. Define X to be the set of binary strings of length n/4 with Hamming distance k/2 from ν. For x ∈ X, let Y_x be the set of binary strings y of length n/4 with either Δ(x, y) = k/2 or Δ(x, y) = k/2 + 1. Pick (x, y) uniformly at random from (X, Y_x). Then Theorem 17 in [EGSZ17] shows a lower bound on the size of the sketches necessary to determine whether Δ(x, y) = k/2 or Δ(x, y) = k/2 + 1.
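The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`nu_prefix`, `flip`, `sample_instance`) are hypothetical, n is assumed divisible by 4 and k even, and the distance between x and y is chosen by a fair coin, which simplifies the uniform distribution over Y_x used in the text.

```python
import random
from itertools import count

def nu_prefix(length):
    """Prefix of the infinite string 1^1 0^1 1^2 0^2 1^3 0^3 ... of the given length."""
    out = []
    for i in count(1):
        for bit in (1, 0):
            out.extend([bit] * i)      # append the run 1^i, then the run 0^i
        if len(out) >= length:
            return out[:length]

def hamming(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(u != v for u, v in zip(a, b))

def flip(s, d, rng):
    """Copy of the binary list s with exactly d distinct positions flipped."""
    t = list(s)
    for i in rng.sample(range(len(s)), d):
        t[i] ^= 1
    return t

def sample_instance(n, k, rng):
    """Sample (nu, x, y): x at distance k/2 from nu, y at distance k/2 or k/2 + 1 from x.

    Assumption: the distance is picked by a fair coin, not uniformly over Y_x.
    """
    m = n // 4
    nu = nu_prefix(m)
    x = flip(nu, k // 2, rng)
    d = rng.choice([k // 2, k // 2 + 1])
    y = flip(x, d, rng)
    return nu, x, y

rng = random.Random(0)
nu, x, y = sample_instance(64, 4, rng)
print(hamming(nu, x), hamming(x, y))
```

The first printed value is always k/2; the second is either k/2 or k/2 + 1, which is exactly the distinction the sketching lower bound is about.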

Theorem 8 [EGSZ17]. Any sketching function S that determines whether Δ(x, y) = k/2 or Δ(x, y) > k/2 from S(x) and S(y), with probability at least 1 − 1/n for k = o(√n), uses Ω(k log n) space.

Suppose Alice has y, along with the locations of the first k/2 positions i in which y[i] ≠ x[i]. Alice replaces these locations with wildcard characters, runs the wildcard-period algorithm on the result, and forwards the state of the algorithm to Bob, who has x. Bob then continues running the algorithm on x ∘ x ∘ x to determine the wildcard-period of the string S(x, y) = y ∘ x ∘ x ∘ x. Observe that:

Lemma 5. If Δ(x, y) = k/2, then the string S(x, y) = y ∘ x ∘ x ∘ x has period n/4. On the other hand, if Δ(x, y) = k/2 + 1, then S(x, y) has period greater than n/4.

Combining Theorem 8 and Lemma 5:

Theorem 9. For k = o(√n) with k > 2, any one-pass randomized streaming algorithm that computes all wildcard-periods of an input string S with probability at least 1 − 1/n requires Ω(k log n) space, even under the promise that the wildcard-periods are at most n/2.
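The reduction behind Lemma 5 can be checked by brute force on small inputs. The sketch below is illustrative only and rests on assumptions: it takes a string to have wildcard-period p when the wildcards can be assigned so that s[i] = s[i + p] for every valid i, it represents the wildcard by `None`, and the helper names (`has_wildcard_period`, `alice_masks`, `build_stream`) are hypothetical. It tests only whether n/4 is a wildcard-period of S(x, y); it does not search for smaller periods.

```python
WILD = None  # stand-in for the wildcard character (an assumption of this sketch)

def has_wildcard_period(s, p):
    """True if wildcards can be assigned so that s[i] == s[i + p] for all valid i.

    Positions in the same residue class mod p must all end up equal, so the
    non-wildcard symbols in each class must agree.
    """
    for r in range(p):
        seen = {c for c in s[r::p] if c is not WILD}
        if len(seen) > 1:
            return False
    return True

def alice_masks(x, y, budget):
    """Replace the first `budget` positions where y differs from x with wildcards."""
    out, used = list(y), 0
    for i, (a, b) in enumerate(zip(x, y)):
        if a != b and used < budget:
            out[i] = WILD
            used += 1
    return out

def build_stream(x, y, k):
    """The string S(x, y): Alice's masked copy of y followed by three copies of x."""
    return alice_masks(x, y, k // 2) + list(x) * 3

x = [1, 0, 1, 1, 0, 0, 1, 1]
k = 4

y_close = x[:]          # Hamming distance k/2 = 2 from x
y_close[1] ^= 1
y_close[5] ^= 1

y_far = y_close[:]      # Hamming distance k/2 + 1 = 3 from x
y_far[7] ^= 1

print(has_wildcard_period(build_stream(x, y_close, k), len(x)))  # True
print(has_wildcard_period(build_stream(x, y_far, k), len(x)))    # False
```

In the first case every mismatch is hidden behind a wildcard, so the length of x (playing the role of n/4) is a wildcard-period. In the second case one mismatch survives and rules that period out.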

Acknowledgements. We would like to thank the anonymous reviewers for their helpful comments. The work was supported by the National Science Foundation under NSF Awards #1649515 and #1619081.

References

[AEL10] Amir, A., Eisenberg, E., Levy, A.: Approximate periodicity. In: Cheong, O., Chwa, K.-Y., Park, K. (eds.) ISAAC 2010. LNCS, vol. 6506, pp. 25–36. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17517-6_5

[AG97] Apostolico, A., Galil, Z. (eds.): Pattern Matching Algorithms. Oxford University Press, Oxford (1997)

[AGM+90] Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)

104 F. Ergün et al.

[AGMP13] Andoni, A., Goldberger, A., McGregor, A., Porat, E.: Homomorphic fingerprints under misalignments: sketching edit and shift distances. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pp. 931–940 (2013)

[BG11] Breslauer, D., Galil, Z.: Real-time streaming string-matching. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 162–172. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21458-5_15

[Bla08] Blanchet-Sadri, F.: Algorithmic Combinatorics on Partial Words. Discrete Mathematics and its Applications. CRC Press, Boca Raton (2008)

[BMRW12] Blanchet-Sadri, F., Mercas, R., Rashin, A., Willett, E.: Periodicity algorithms and a conjecture on overlaps in partial words. Theor. Comput. Sci. 443, 35–45 (2012)

[CC07] Clifford, P., Clifford, R.: Simple deterministic wildcard matching. Inf. Process. Lett. 101(2), 53–54 (2007)

[CEPR09] Clifford, R., Efremenko, K., Porat, E., Rothschild, A.: From coding theory to efficient pattern matching. In: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 778–784 (2009)

[CFP+15] Clifford, R., Fontaine, A., Porat, E., Sach, B., Starikovskaya, T.: Dictionary matching in a stream. In: Bansal, N., Finocchi, I. (eds.) ESA 2015. LNCS, vol. 9294, pp. 361–372. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48350-3_31

[CFP+16] Clifford, R., Fontaine, A., Porat, E., Sach, B., Starikovskaya, T.A.: The k-mismatch problem revisited. In: Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pp. 2039–2052 (2016)

[CH02] Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proceedings on 34th Annual ACM Symposium on Theory of Computing (STOC), pp. 592–601 (2002)

[CJPS13] Clifford, R., Jalsenius, M., Porat, E., Sach, B.: Space lower bounds for online pattern matching. Theor. Comput. Sci. 483, 68–74 (2013)

[CKP17] Clifford, R., Kociumaka, T., Porat, E.: The streaming k-mismatch problem. CoRR, abs/1708.05223 (2017)

[CM11] Crouch, M.S., McGregor, A.: Periodicity and cyclic shifts via linear sketches. In: Goldberg, L.A., Jansen, K., Ravi, R., Rolim, J.D.P. (eds.) APPROX/RANDOM 2011. LNCS, vol. 6845, pp. 158–170. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22935-0_14

[EAE06] Elfeky, M.G., Aref, W.G., Elmagarmid, A.K.: STAGGER: periodicity mining of data streams using expanding sliding windows. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM), pp. 188–199 (2006)

[EGSZ17] Ergün, F., Grigorescu, E., Azer, E.S., Zhou, S.: Streaming periodicity with mismatches. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, pp. 42:1–42:21 (2017)

[EGSZ18] Ergün, F., Grigorescu, E., Azer, E.S., Zhou, S.: Periodicity in data streams with wildcards. CoRR, abs/1802.07375 (2018)

[EJS10] Ergün, F., Jowhari, H., Sağlam, M.: Periodicity in streams. In: Serna, M., Shaltiel, R., Jansen, K., Rolim, J. (eds.) APPROX/RANDOM 2010. LNCS, vol. 6302, pp. 545–559. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15369-3_41


[EMS10] Ergün, F., Muthukrishnan, S., Sahinalp, S.C.: Periodicity testing with sublinear samples and space. ACM Trans. Algorithms 6(2), 43:1–43:14 (2010)

[Gaw13] Gawrychowski, P.: Optimal pattern matching in LZW compressed strings. ACM Trans. Algorithms (TALG) 9(3), 25 (2013)

[GKP16] Golan, S., Kopelowitz, T., Porat, E.: Streaming pattern matching with d wildcards. In: 24th Annual European Symposium on Algorithms, pp. 44:1–44:16 (2016)

[GMSU16] Gawrychowski, P., Merkurev, O., Shur, A.M., Uznanski, P.: Tight tradeoffs for real-time approximation of longest palindromes in streams. In: 27th Annual Symposium on Combinatorial Pattern Matching, CPM, pp. 18:1–18:13 (2016)

[GS83] Galil, Z., Seiferas, J.: Time-space-optimal string matching. J. Comput. Syst. Sci. 26(3), 280–294 (1983)

[HR14] Hermelin, D., Rozenberg, L.: Parameterized complexity analysis for the closest string with wildcards problem. In: Combinatorial Pattern Matching - 25th Annual Symposium, CPM Proceedings, pp. 140–149 (2014)

[IKM00] Indyk, P., Koudas, N., Muthukrishnan, S.: Identifying representative trends in massive time series data sets using sketches. In: VLDB, Proceedings of 26th International Conference on Very Large Data Bases, pp. 363–372 (2000)

[Ind98] Indyk, P.: Faster algorithms for string matching problems: matching the convolution bound. In: 39th Annual Symposium on Foundations of Computer Science, FOCS, pp. 166–173 (1998)

[Kal02] Kalai, A.: Efficient pattern-matching with don’t cares. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 655–656 (2002)

[KMP77] Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

[KR87] Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)

[LN11] Lachish, O., Newman, I.: Testing periodicity. Algorithmica 60(2), 401–420 (2011)

[LNV14] Lewenstein, M., Nekrich, Y., Vitter, J.S.: Space-efficient string indexing for wildcard pattern matching. In: 31st International Symposium on Theoretical Aspects of Computer Science (STACS), pp. 506–517 (2014)

[MMT14] Manea, F., Mercas, R., Tiseanu, C.: An algorithmic toolbox for periodic partial words. Discret. Appl. Math. 179, 174–192 (2014)

[MR95] Muthukrishnan, S., Ramesh, H.: String matching under a general matching relation. Inf. Comput. 122(1), 140–148 (1995)

[PL07] Porat, E., Lipsky, O.: Improved sketching of hamming distance with error correcting. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 173–182. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73437-6_19

[PP09] Porat, B., Porat, E.: Exact and approximate pattern matching in the streaming model. In: 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS, pp. 315–323 (2009)

[RS17] Radoszewski, J., Starikovskaya, T.A.: Streaming k-mismatch with error correcting and applications. In: 2017 Data Compression Conference, DCC, pp. 290–299 (2017)
