The Approximate Period Problem

Academic year: 2017

Vladimir Yu. Popov

∗†

Abstract: We show that the approximate period problem is NP-complete for a metric distance function and alphabet cardinality 7.

Keywords: approximate periods, computational complexity

1 Introduction

Most of the current work in music information retrieval has been done with monophonic sources. In monophonic music, no new note begins until the current note has finished sounding. The most basic approach to monophonic feature extraction reduces notes to a single dimension. Pitch is extracted and duration is ignored, or vice versa. Both pitch and duration information may be used in the final retrieval system, but as features they are treated separately. Most music information retrieval researchers favor relative measures because a change in tempo or transposition across keys does not significantly alter the music information expressed [1], [2], [3], [4], [5], [6], [7].

Relative pitch has three standard expressions: exact interval, rough contour, and simple contour. Exact interval is the signed magnitude between two contiguous pitches. Simple contour keeps the sign and discards the magnitude. Rough contour keeps the sign and groups the magnitude into a number of equivalence classes. Collectively, these techniques are known as unigrams. Longer sequences, or n-grams, are constructed from an initial sequence of interval or ratio unigrams. A more sophisticated approach to n-gram extraction is the detection of repeating patterns [8], [9], [10]. String-matching n-gram extraction techniques use notions such as insertions, deletions, and substitutions.
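The three relative-pitch expressions can be illustrated with a minimal sketch. It assumes pitches are given as integer semitone numbers (MIDI-style), and the `step_limit` threshold that separates small from large intervals in the rough contour is a hypothetical choice made here for illustration, not a value taken from the cited work.

```python
from typing import List


def exact_intervals(pitches: List[int]) -> List[int]:
    """Signed magnitude between each pair of contiguous pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def simple_contour(pitches: List[int]) -> List[int]:
    """Keep only the sign of each interval: -1 (down), 0 (repeat), +1 (up)."""
    return [(iv > 0) - (iv < 0) for iv in exact_intervals(pitches)]


def rough_contour(pitches: List[int], step_limit: int = 2) -> List[int]:
    """Keep the sign and group the magnitude into two classes.

    Class 1 holds intervals of at most `step_limit` semitones and class 2
    holds larger ones; the threshold is an assumption of this sketch.
    """
    contour = []
    for iv in exact_intervals(pitches):
        sign = (iv > 0) - (iv < 0)
        size_class = 0 if iv == 0 else (1 if abs(iv) <= step_limit else 2)
        contour.append(sign * size_class)
    return contour


melody = [60, 62, 62, 55, 67]
intervals = exact_intervals(melody)
simple = simple_contour(melody)
rough = rough_contour(melody)
```

Each of the three lists has one entry per pair of contiguous notes, so any of them can serve as the unigram sequence from which n-grams are built.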

Regularities in experimentally obtained data often reveal important knowledge about the underlying physical system. Regularities in a biological sequence can be used to identify the sequence among other sequences, or to infer information about the evolution of the sequence. The genomes of eukaryotes, i.e. higher-order organisms such as humans, contain many regularities. Tandem repeats, or tandem arrays, which are consecutive occurrences of the same string, are the most frequent. For example, the six nucleotides TTAGGG appear at the end of every human chromosome in tandem arrays that contain between one and two thousand copies [11]. Finding occurrences of repeated substrings in a string is a widely studied problem. In biological sequence analysis, searching for tandem repeats is used to reveal structural and functional information.

∗ Ural State University, Department of Mathematics and Mechanics, 620083 Ekaterinburg, RUSSIA. Email: Vladimir.Popov@usu.ru

† The work was partially supported by Grant of President of the Russian Federation MD-1687.2008.9 and Analytical Departmental Program ”Developing the scientific potential of high school” 2.1.1/1775.

DNA and protein sequences can be seen as long texts over specific alphabets. Those sequences represent the genetic code of living beings. Searching for specific sequences over those texts appeared as a fundamental operation for problems such as looking for given features in DNA chains, or determining how different two genetic sequences are. This was modeled as searching for given “patterns” in a “text”. However, exact searching was of no use for this application, since the patterns rarely matched the text exactly. The genetic sequences of two members of the same species are not identical; they are just very similar. Moreover, searching for exact tandem repeats can be too restrictive because of experimental errors. This gave a motivation to “search allowing errors”.

2 Preliminaries

Traditionally, the alignment notation has been used to illustrate a comparison between two or more sequences. Given a set of strings

$$X = \{x_1, x_2, \ldots, x_k\}$$

on an alphabet $\Sigma$, a multiple alignment of $X$ is a set of strings

$$A = \{A_1, A_2, \ldots, A_k\},$$

$|A_1| = |A_2| = \ldots = |A_k| = n$, on the augmented alphabet $\Gamma = \Sigma \cup \{\Delta\}$ such that each string $A_i$ is a copy of $x_i$ into which $n - |x_i|$ copies of the special symbol $\Delta$ have been inserted. The symbol $\Delta$ is called an indel and represents the insertion or deletion of a particular symbol in one string relative to another [12].
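As a small illustration of this definition, the sketch below checks whether a list of rows is a multiple alignment of a list of strings. It assumes the indel symbol ∆ is written as the ASCII character `-` and that this character does not occur in the original strings; the function name is hypothetical.

```python
from typing import Sequence

GAP = "-"  # stands in for the indel symbol (an assumption of this sketch)


def is_multiple_alignment(strings: Sequence[str], rows: Sequence[str]) -> bool:
    """Check that rows[i] is strings[i] with indels inserted and that all rows have equal length."""
    if len(strings) != len(rows):
        return False
    if len({len(row) for row in rows}) > 1:
        return False  # the aligned strings must share one common length n
    # Deleting every indel from a row must give back the original string.
    return all(row.replace(GAP, "") == s for s, row in zip(strings, rows))


ok = is_multiple_alignment(["ACGT", "AGT"], ["ACGT", "A-GT"])
unequal = is_multiple_alignment(["ACGT", "AGT"], ["ACGT", "AGT"])
altered = is_multiple_alignment(["ACGT", "AGT"], ["ACGT", "A-CT"])
```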

A conventional way to measure the approximate similarity between two sequences $a_1 \ldots a_m$ and $b_1 \ldots b_n$ is to calculate local transformations or costs of local transformations. Usually the considered local transformations are the following:

• substitution: $a_i \to b_j$;

• insertion: $\Delta \to b_j$;

• deletion: $a_i \to \Delta$.

To define a distance between sequences, one should first fix the set of local transformations and a non-negative valued cost function $\delta$ that gives for each transformation $a \to b$ a cost $\delta(a, b)$. A penalty matrix specifies the substitution cost for each pair of characters and the insertion/deletion cost for each character. The differences appearing in the two considered sequences can be viewed in more than one way, e.g., one substitution can be viewed as one insertion and one deletion. Therefore, it is natural to consider the minimum number of such differences. The weighted edit distance between $x$ and $y$ is the minimum cost to convert $x$ to $y$ using a penalty matrix. The general framework of the edit distance and its many variations is given in [13].
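The weighted edit distance can be computed with the standard dynamic program over prefixes. The following is a minimal sketch of that textbook method, not code from the paper; it assumes the penalty matrix is supplied as a function `cost(a, b)` and that the indel symbol ∆ is written as `-`.

```python
from typing import Callable

GAP = "-"  # stands in for the indel symbol (an assumption of this sketch)


def weighted_edit_distance(x: str, y: str, cost: Callable[[str, str], int]) -> int:
    """Minimum total cost of converting x into y by substitutions, insertions and deletions."""
    rows, cols = len(x) + 1, len(y) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):  # delete every symbol of a prefix of x
        dist[i][0] = dist[i - 1][0] + cost(x[i - 1], GAP)
    for j in range(1, cols):  # insert every symbol of a prefix of y
        dist[0][j] = dist[0][j - 1] + cost(GAP, y[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost(x[i - 1], y[j - 1]),  # substitution or match
                dist[i - 1][j] + cost(x[i - 1], GAP),           # deletion
                dist[i][j - 1] + cost(GAP, y[j - 1]),           # insertion
            )
    return dist[rows - 1][cols - 1]


def unit_cost(a: str, b: str) -> int:
    """Penalty matrix of the plain (unweighted) edit distance."""
    return 0 if a == b else 1


example = weighted_edit_distance("kitten", "sitting", unit_cost)
```

With `unit_cost` the function reduces to the ordinary edit distance; any other penalty matrix can be passed in the same way.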

3 Problem Definition

We consider the notion of approximate periods, which is an approximate version of periods. This notion was first discussed in [14]. Let $\delta$ be a distance function which is specified by a penalty matrix. Given two strings $x$, $p$ and a distance function $\delta$, we define approximate periods as follows. If there exists a partition of $x$ into disjoint blocks of substrings, i.e., $x = p_1 \ldots p_r$, $p_i \neq \epsilon$, such that $\delta(p, p_i) \le t$ for $1 \le i < r$, and $\delta(p', p_r) \le t$ where $p'$ is some prefix of $p$, we say that $p$ is a $t$-approximate period of $x$ (see [14]).
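For a fixed candidate $p$, the definition can be checked directly by trying every partition of $x$. The sketch below is a brute-force illustration under stated assumptions: the penalty matrix is a function `cost(a, b)`, the indel symbol is written as `-`, and the prefix $p'$ matched against the last block is taken to be non-empty, which the definition does not state explicitly. It is not the algorithm of [14].

```python
from typing import Callable, List

GAP = "-"  # stands in for the indel symbol (an assumption of this sketch)


def edit_table(a: str, b: str, cost: Callable[[str, str], int]) -> List[List[int]]:
    """table[i][j] is the weighted edit distance between a[:i] and b[:j]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        table[i][0] = table[i - 1][0] + cost(a[i - 1], GAP)
    for j in range(1, len(b) + 1):
        table[0][j] = table[0][j - 1] + cost(GAP, b[j - 1])
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            table[i][j] = min(
                table[i - 1][j - 1] + cost(a[i - 1], b[j - 1]),
                table[i - 1][j] + cost(a[i - 1], GAP),
                table[i][j - 1] + cost(GAP, b[j - 1]),
            )
    return table


def is_approximate_period(p: str, x: str, t: int, cost: Callable[[str, str], int]) -> bool:
    """Return True if x splits into blocks within distance t of p, the last one of a prefix of p."""
    n = len(x)
    reachable = [False] * (n + 1)  # reachable[i]: x[:i] splits into blocks close to p
    reachable[0] = True
    for start in range(n):
        if not reachable[start]:
            continue
        for end in range(start + 1, n + 1):
            block = x[start:end]
            table = edit_table(p, block, cost)
            if table[len(p)][len(block)] <= t:
                reachable[end] = True
            # The final block may instead match a non-empty prefix of p.
            if end == n and min(table[i][len(block)] for i in range(1, len(p) + 1)) <= t:
                return True
    return False


def unit_cost(a: str, b: str) -> int:
    return 0 if a == b else 1


with_one_error = is_approximate_period("abc", "abcabdab", 1, unit_cost)
with_no_error = is_approximate_period("abc", "abcabdab", 0, unit_cost)
```

The AP problem defined next asks whether any such $p$ exists, which is a much harder question than verifying a given one.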

Let us consider the following problem:

The approximate period problem (AP)

Instance: A finite alphabet $\Gamma$, a string $x$ from $\Gamma^{*}$, a penalty matrix $M$, and a positive integer $t$.

Question: Is there a string $u \in \Gamma^{*}$ such that $u$ is a $t$-approximate period of $x$?

4 The Main Result

If $|\Gamma| \ge 9$ then there exists $\delta$ such that $\delta(a, a) = 0$, $\delta(a, b) = \delta(b, a)$ for all $a, b \in \Gamma$, and the AP problem is NP-complete [14]. It is shown in [15] that the AP problem is NP-complete for $|\Gamma| \ge 5$ and the penalty matrix $M$ as in figure 1.

Theorem. If $|\Gamma| \ge 7$ then there exists $\delta$ such that $\delta(a, a) = 0$, $\delta(a, b) = \delta(b, a)$ for all $a, b \in \Gamma$, and the AP problem is NP-complete.

        A      G      C      T      ∆
A       0      m      d      t+1    t+1
G       m      0      d      t+1    t+1
C       d      d      2d     t+1    t+1
T       t+1    t+1    t+1    0      t+1
∆       t+1    t+1    t+1    t+1    0

Figure 1: The penalty matrix M

Proof. It is easy to see that the AP problem is in NP. Consider the familiar Hamming distance, $D$, where the local transformations are of the form $a \to b$ with cost $D(a, a) = 0$ and $D(a, b) = 1$ for $a \neq b$. Let $y_1$ and $y_2$ be finite strings. Let us consider the following problem:

The closest string problem (CS)

Instance: A finite alphabet $\Sigma$, a set $S = \{s_1, s_2, \ldots, s_n\}$ of strings each of length $m$, $S \subseteq \Sigma^{*}$, and a positive integer $d$.

Question: Is there a string $s$ of length $m$ such that for every string $s_i \in S$, $D(s, s_i) \le d$?
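To make the CS question concrete, here is a brute-force sketch that enumerates every candidate string of length $m$ over the alphabet. It runs in exponential time and is meant only to illustrate the problem statement on tiny inputs; the function names are hypothetical.

```python
from itertools import product
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def closest_string(strings: List[str], d: int, alphabet: str = "01") -> Optional[str]:
    """Return some s with D(s, s_i) <= d for every s_i, or None if there is none."""
    m = len(strings[0])
    for candidate in product(alphabet, repeat=m):
        s = "".join(candidate)
        if all(hamming(s, si) <= d for si in strings):
            return s
    return None


center = closest_string(["000", "011", "101"], 1)
no_center = closest_string(["000", "111"], 1)
```

In the second call no center exists: for any binary $s$ of length 3 the distances to `000` and `111` sum to 3, so one of them exceeds 1.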

The CS problem is NP-complete even for the restriction to a binary alphabet [16], [17]. We will assume that $\Sigma = \{0, 1\}$. Now we transform an instance of the CS problem to an instance of the AP problem as follows.

• $\Gamma = \Sigma \cup \{2, 3, 4, T, \Delta\}$

• $x = T 2^m T^2 3^m T^2 2^m T^2 2^m T^2 3^m T^2 3^m T^2 4^m T^2 s_1 T^2 s_2 T^2 s_3 T \ldots T s_n T$

• $t = md$. Note that we can assume that $d > \frac{m}{2}$.

• Define the penalty matrix $M$ as in figure 2.

        0      1      2      3      4      T      ∆
0       0      m      d      d      d      t+1    t+1
1       m      0      d      d      d      t+1    t+1
2       d      d      0      2d     2d     t+1    t+1
3       d      d      2d     0      2d     t+1    t+1
4       d      d      2d     2d     0      t+1    t+1
T       t+1    t+1    t+1    t+1    t+1    0      t+1
∆       t+1    t+1    t+1    t+1    t+1    t+1    0

Figure 2: The penalty matrix M

It is easy to see that this transformation can be done in polynomial time.
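The transformation can be sketched as a short routine that builds the AP instance from a binary CS instance. The function name, the use of `-` for ∆, and the dictionary representation of the penalty matrix are assumptions of this sketch; the block order of $x$ and the matrix entries follow the construction and figure 2.

```python
from typing import Dict, List, Tuple

GAP = "-"  # stands in for the indel symbol (an assumption of this sketch)


def build_ap_instance(strings: List[str], d: int) -> Tuple[List[str], str, int, Dict[Tuple[str, str], int]]:
    """Map a binary closest-string instance (S, d) to an AP instance (Gamma, x, t, M)."""
    m = len(strings[0])
    assert all(len(s) == m and set(s) <= {"0", "1"} for s in strings)
    t = m * d
    gamma = ["0", "1", "2", "3", "4", "T", GAP]
    # Seven fixed blocks 2^m, 3^m, 2^m, 2^m, 3^m, 3^m, 4^m, then s_1, ..., s_n,
    # each block wrapped in a pair of T symbols.
    blocks = [c * m for c in "2322334"] + list(strings)
    x = "".join("T" + block + "T" for block in blocks)
    binary = {"0", "1"}
    penalty: Dict[Tuple[str, str], int] = {}
    for a in gamma:
        for b in gamma:
            if a == b:
                value = 0
            elif "T" in (a, b) or GAP in (a, b):
                value = t + 1
            elif a in binary and b in binary:
                value = m
            elif a in binary or b in binary:
                value = d      # one symbol from {0,1}, the other from {2,3,4}
            else:
                value = 2 * d  # two distinct symbols from {2,3,4}
            penalty[(a, b)] = value
    return gamma, x, t, penalty


gamma, x, t, penalty = build_ap_instance(["01", "11"], 2)
```

The string $x$ has length $(n + 7)(m + 2)$ and the matrix has 49 entries, so the construction is clearly polynomial in the size of the CS instance.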

Let us show that if there is a string $u$ such that $u$ is a $t$-approximate period of $x$, then $u = T u' T$ where $u'$ is a string in alphabet $\{0, 1\}$.

Let us consider a partition of $x$ into disjoint blocks of substrings: $x = p_1 \ldots p_r$.

First suppose that $u$ has no $T$. Clearly, there exists a partition block of $x$ which has at least one $T$, and the distance between $u$ and the partition block is greater than $t$. Therefore, $u$ must have at least one $T$.

Suppose that $u$ has no more than one $T$. In this case, $u = u'(0,1,2,3,4,\Delta)\, T\, u''(0,1,2,3,4,\Delta)$. Consider the alignment of $p_1, \ldots, p_r$ induced by $u$. It is easy to see that $p_1 = p'_1(\Delta)\, T\, p''_1(2,\Delta)$. It is clear that either $\delta(u, p_1) > t$ or

$$\delta(u'(0,1,2,3,4,\Delta), p'_1(\Delta)) + \delta(u''(0,1,2,3,4,\Delta), p''_1(2,\Delta)) \le t. \tag{1}$$

Since $u$ is a $t$-approximate period of $x$, $\delta(u, p_1) \le t$. In view of (1),

$$\delta(u'(0,1,2,3,4,\Delta), p'_1(\Delta)) \le t \tag{2}$$

and

$$\delta(u''(0,1,2,3,4,\Delta), p''_1(2,\Delta)) \le t.$$

Since $\delta(\Delta, y) = t + 1$, $y \in \{0,1,2,3,4\}$, in view of (2), it is easy to see that

$$u'(0,1,2,3,4,\Delta) \in \{\Delta\}^{*}. \tag{3}$$

Since $p_1 = p'_1(\Delta)\, T\, p''_1(2,\Delta)$, $p_2 = p'_2(2,\Delta)\, T\, p''_2(\Delta)$. It is clear that either $\delta(u, p_2) > t$ or

$$\delta(u'(0,1,2,3,4,\Delta), p'_2(2,\Delta)) + \delta(u''(0,1,2,3,4,\Delta), p''_2(\Delta)) \le t. \tag{4}$$

Since $u$ is a $t$-approximate period of $x$, $\delta(u, p_2) \le t$. Therefore, in view of (4),

$$\delta(u'(0,1,2,3,4,\Delta), p'_2(2,\Delta)) \le t \tag{5}$$

and

$$\delta(u''(0,1,2,3,4,\Delta), p''_2(\Delta)) \le t. \tag{6}$$

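Since $\delta(\Delta, y) = t + 1$, $y \in \{0,1,2,3,4\}$, in view of (3) and (5), $p'_2(2,\Delta) \in \{\Delta\}^{*}$. Similarly, in view of (6), $u''(0,1,2,3,4,\Delta) \in \{\Delta\}^{*}$ and $p''_1(2,\Delta) \in \{\Delta\}^{*}$. Since $p''_1(2,\Delta) \in \{\Delta\}^{*}$, $p'_2(2,\Delta) \in \{\Delta\}^{*}$ and $2^m$ is a subsequence of $p''_1(2,\Delta)\, p'_2(2,\Delta)$, $u$ must have at least two $T$.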

Now suppose that

$$u = u'(0,1,2,3,4,\Delta)\, T\, u''(0,1,2,3,4,\Delta)\, T \ldots u^{(k)}(0,1,2,3,4,\Delta)\, T\, u^{(k+1)}(0,1,2,3,4,\Delta).$$

It is easy to see that in this case

$$p_1 = p'_1\, T\, p''_1\, T \ldots p^{(k)}_1\, T\, p^{(k+1)}_1,$$

where $p'_1, p''_1, \ldots, p^{(k)}_1, p^{(k+1)}_1 \in \{0,1,2,3,4,\Delta\}^{*}$. Since $u$ is a $t$-approximate period of $x$, $\delta(u, p_1) \le t$. Therefore,

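$$\begin{aligned}
&\delta(u'(0,1,2,3,4,\Delta), p'_1) + \delta(u''(0,1,2,3,4,\Delta), p''_1) + \ldots \\
&+ \delta(u^{(k)}(0,1,2,3,4,\Delta), p^{(k)}_1) + \delta(u^{(k+1)}(0,1,2,3,4,\Delta), p^{(k+1)}_1) \le t.
\end{aligned} \tag{7}$$

In view of (7),

$$\begin{aligned}
\delta(u'(0,1,2,3,4,\Delta), p'_1) &\le t, \\
\delta(u''(0,1,2,3,4,\Delta), p''_1) &\le t, \\
&\;\;\vdots \\
\delta(u^{(k)}(0,1,2,3,4,\Delta), p^{(k)}_1) &\le t, \\
\delta(u^{(k+1)}(0,1,2,3,4,\Delta), p^{(k+1)}_1) &\le t.
\end{aligned} \tag{8}$$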

Suppose that $k = 2p + 1$ where $p \ge 1$. In this case

$$p_2 = p'_2\, T\, p''_2\, T \ldots p^{(k)}_2\, T\, p^{(k+1)}_2,$$

where $p''_2 \in \{\Delta\}^{*}$. It is easy to see that $p''_1 \in \{2,\Delta\}^{*}$, and $2^m$ is a subsequence of $p''_1$. Since $\delta(\Delta, 2) = t + 1$, in view of (8), it is easy to check that $u''(0,1,2,3,4,\Delta) \notin \{\Delta\}^{*}$.

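Since $\delta(u, p_2) \le t$,

$$\begin{aligned}
&\delta(u'(0,1,2,3,4,\Delta), p'_2) + \delta(u''(0,1,2,3,4,\Delta), p''_2) + \ldots \\
&+ \delta(u^{(k)}(0,1,2,3,4,\Delta), p^{(k)}_2) + \delta(u^{(k+1)}(0,1,2,3,4,\Delta), p^{(k+1)}_2) \le t.
\end{aligned} \tag{9}$$

In view of (9),

$$\delta(u''(0,1,2,3,4,\Delta), p''_2) \le t.$$

Since $k = 2p + 1$, it is easy to check that $p''_2 \in \{\Delta\}^{*}$. In view of $\delta(\Delta, y) = t + 1$, $y \in \{0,1,2,3\}$, $u''(0,1,2,3,4,\Delta) \in \{\Delta\}^{*}$. Therefore, $k \neq 2p + 1$.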

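Suppose that $k = 2p$ where $p \ge 7$. It is easy to see that in this case

$$\begin{aligned}
p_1 = {} & W_1(\Delta)\, T\, (2^m)_\Delta\, T\, W_2(\Delta)\, T\, (3^m)_\Delta\, T\, W_3(\Delta)\, T\, (2^m)_\Delta \\
& T\, W_4(\Delta)\, T\, (2^m)_\Delta\, T\, W_5(\Delta)\, T\, (3^m)_\Delta\, T\, W_6(\Delta)\, T\, (3^m)_\Delta \\
& T\, W_7(\Delta)\, T\, (4^m)_\Delta\, T\, W_8(\Delta)\, T\, (s_1)_\Delta\, T\, W_9(\Delta)\, T\, (s_2)_\Delta\, T \ldots \\
& \ldots T\, (s_{p-7})_\Delta\, T\, W_{p+1}(\Delta),
\end{aligned}$$

where $W_j(\Delta) \in \{\Delta\}^{*}$, $1 \le j \le p + 1$,

$$(Y)_\Delta = \Delta^{q_1} Y_1 \Delta^{q_2} Y_2 \ldots \Delta^{q_r} Y_r \Delta^{q_{r+1}},$$

$$Y = Y_1 Y_2 \ldots Y_r.$$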

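Since $u$ is a $t$-approximate period of $x$, $\delta(u, p_1) \le t$. Therefore,

$$\begin{aligned}
&\delta(u'(0,1,2,3,4,\Delta), W_1(\Delta)) \\
&+ \delta(u''(0,1,2,3,4,\Delta), (2^m)_\Delta) \\
&+ \delta(u'''(0,1,2,3,4,\Delta), W_2(\Delta)) \\
&+ \delta(u^{(4)}(0,1,2,3,4,\Delta), (3^m)_\Delta) \\
&+ \delta(u^{(5)}(0,1,2,3,4,\Delta), W_3(\Delta)) \\
&+ \delta(u^{(6)}(0,1,2,3,4,\Delta), (2^m)_\Delta) \\
&+ \delta(u^{(7)}(0,1,2,3,4,\Delta), W_4(\Delta)) \\
&+ \delta(u^{(8)}(0,1,2,3,4,\Delta), (2^m)_\Delta) \\
&+ \delta(u^{(9)}(0,1,2,3,4,\Delta), W_5(\Delta)) \\
&+ \delta(u^{(10)}(0,1,2,3,4,\Delta), (3^m)_\Delta) \\
&+ \delta(u^{(11)}(0,1,2,3,4,\Delta), W_6(\Delta)) \\
&+ \delta(u^{(12)}(0,1,2,3,4,\Delta), (3^m)_\Delta) \\
&+ \delta(u^{(13)}(0,1,2,3,4,\Delta), W_7(\Delta)) \\
&+ \delta(u^{(14)}(0,1,2,3,4,\Delta), (4^m)_\Delta) \\
&+ \delta(u^{(15)}(0,1,2,3,4,\Delta), W_8(\Delta)) \\
&+ \delta(u^{(16)}(0,1,2,3,4,\Delta), (s_1)_\Delta) \\
&+ \delta(u^{(17)}(0,1,2,3,4,\Delta), W_9(\Delta)) \\
&+ \ldots + \delta(u^{(k)}(0,1,2,3,4,\Delta), (s_{p-7})_\Delta) \\
&+ \delta(u^{(k+1)}(0,1,2,3,4,\Delta), W_{p+1}(\Delta)) \le t
\end{aligned} \tag{10}$$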

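and

$$\begin{aligned}
\delta(u'(0,1,2,3,4,\Delta), W_1(\Delta)) &\le t \\
\delta(u''(0,1,2,3,4,\Delta), (2^m)_\Delta) &\le t \\
\delta(u'''(0,1,2,3,4,\Delta), W_2(\Delta)) &\le t \\
\delta(u^{(4)}(0,1,2,3,4,\Delta), (3^m)_\Delta) &\le t \\
\delta(u^{(5)}(0,1,2,3,4,\Delta), W_3(\Delta)) &\le t \\
\delta(u^{(6)}(0,1,2,3,4,\Delta), (2^m)_\Delta) &\le t \\
\delta(u^{(7)}(0,1,2,3,4,\Delta), W_4(\Delta)) &\le t \\
\delta(u^{(8)}(0,1,2,3,4,\Delta), (2^m)_\Delta) &\le t \\
\delta(u^{(9)}(0,1,2,3,4,\Delta), W_5(\Delta)) &\le t \\
\delta(u^{(10)}(0,1,2,3,4,\Delta), (3^m)_\Delta) &\le t \\
\delta(u^{(11)}(0,1,2,3,4,\Delta), W_6(\Delta)) &\le t \\
\delta(u^{(12)}(0,1,2,3,4,\Delta), (3^m)_\Delta) &\le t \\
\delta(u^{(13)}(0,1,2,3,4,\Delta), W_7(\Delta)) &\le t \\
\delta(u^{(14)}(0,1,2,3,4,\Delta), (4^m)_\Delta) &\le t \\
\delta(u^{(15)}(0,1,2,3,4,\Delta), W_8(\Delta)) &\le t \\
\delta(u^{(16)}(0,1,2,3,4,\Delta), (s_1)_\Delta) &\le t \\
\delta(u^{(17)}(0,1,2,3,4,\Delta), W_9(\Delta)) &\le t \\
&\;\;\vdots \\
\delta(u^{(k)}(0,1,2,3,4,\Delta), (s_{p-7})_\Delta) &\le t \\
\delta(u^{(k+1)}(0,1,2,3,4,\Delta), W_{p+1}(\Delta)) &\le t
\end{aligned} \tag{11}$$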

Since $W_j(\Delta) \in \{\Delta\}^{*}$, $1 \le j \le p + 1$, in view of (11), without loss of generality, we can assume that $W_j(\Delta)$, $u^{(2j-1)}(0,1,2,3,4,\Delta)$, $1 \le j \le p + 1$, are empty words. In view of (11), it is easy to check that if $(2^m)_\Delta = Y_1 Y_2 \ldots Y_r$ and $u''(0,1,2,3,4,\Delta) = Z_1 Z_2 \ldots Z_r$, where $Y_1, Y_2, \ldots, Y_r \in \{2, \Delta\}$, $Z_1, Z_2, \ldots, Z_r \in \{0,1,2,3,4,\Delta\}$, then

$$Y_i = \Delta \Leftrightarrow Z_i = \Delta, \quad 1 \le i \le r.$$

Therefore, without loss of generality, we can assume that $(2^m)_\Delta \in \{2\}^{*}$ and $u''(0,1,2,3,4,\Delta) \in \{0,1,2,3,4\}^{*}$. Since $(2^m)_\Delta \in \{2\}^{*}$, it is easy to check that $(2^m)_\Delta = 2^m$. Similarly, we can assume that $u^{(2j)}(0,1,2,3,\Delta) \in \{0,1,2,3\}^{*}$, $(3^m)_\Delta = 3^m$, $(4^m)_\Delta = 4^m$, $(s_i)_\Delta = s_i$, where $1 < j \le p$, $1 \le i \le p - 2$.

Let $\mathrm{occ}(y, v)$ denote the number of occurrences of the letter $y$ in the word $v$. Since $u''(0,1,2,3,4,\Delta), u^{(4)}(0,1,2,3,4,\Delta) \in \{0,1,2,3,4\}^{*}$ and $(2^m)_\Delta = 2^m$, $(3^m)_\Delta = 3^m$, in view of (10), it is easy to see that

$$\begin{aligned}
d \cdot \mathrm{occ}(0, u'') + d \cdot \mathrm{occ}(1, u'') + 2d \cdot \mathrm{occ}(3, u'') &= \delta(u'', 2^m), \\
d \cdot \mathrm{occ}(0, u^{(4)}) + d \cdot \mathrm{occ}(1, u^{(4)}) + 2d \cdot \mathrm{occ}(2, u^{(4)}) &= \delta(u^{(4)}, 3^m), \\
d \cdot \mathrm{occ}(0, u^{(6)}) + d \cdot \mathrm{occ}(1, u^{(6)}) + 2d \cdot \mathrm{occ}(3, u^{(6)}) &= \delta(u^{(6)}, 2^m), \\
d \cdot \mathrm{occ}(0, u^{(8)}) + d \cdot \mathrm{occ}(1, u^{(8)}) + 2d \cdot \mathrm{occ}(3, u^{(8)}) &= \delta(u^{(8)}, 2^m), \\
d \cdot \mathrm{occ}(0, u^{(10)}) + d \cdot \mathrm{occ}(1, u^{(10)}) + 2d \cdot \mathrm{occ}(2, u^{(10)}) &= \delta(u^{(10)}, 3^m), \\
d \cdot \mathrm{occ}(0, u^{(12)}) + d \cdot \mathrm{occ}(1, u^{(12)}) + 2d \cdot \mathrm{occ}(2, u^{(12)}) &= \delta(u^{(12)}, 3^m),
\end{aligned}$$

where each $u^{(i)}$ stands for $u^{(i)}(0,1,2,3,4,\Delta)$, and

$$\sum_{i=1}^{6} \bigl( \mathrm{occ}(2, u^{(2i)}(0,1,2,3,4,\Delta)) + \mathrm{occ}(3, u^{(2i)}(0,1,2,3,4,\Delta)) \bigr) \ge 5m. \tag{12}$$

It is not hard to check that

$$p_2 = T s_{p-6} T^2 s_{p-5} T^2 \ldots s_{2p-8} T^2 s_{2p-7} T.$$

Since $\delta(a, b) = d$, $a \in \{0,1\}$, $b \in \{2,3\}$, in view of (12), $\delta(u, p_2) \ge 5t$. Therefore, $p < 7$. Similarly, it is not hard to check that $p < 3$.

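Suppose that $p = 2$. It is easy to see that in this case

$$p_1 = W_1(\Delta)\, T\, (2^m)_\Delta\, T\, W_2(\Delta)\, T\, (3^m)_\Delta\, T\, W_3(\Delta).$$

Since $u$ is a $t$-approximate period of $x$, $\delta(u, p_1) \le t$. Therefore,

$$\begin{aligned}
&\delta(u'(0,1,2,3,4,\Delta), W_1(\Delta)) \\
&+ \delta(u''(0,1,2,3,4,\Delta), (2^m)_\Delta) \\
&+ \delta(u'''(0,1,2,3,4,\Delta), W_2(\Delta)) \\
&+ \delta(u^{(4)}(0,1,2,3,4,\Delta), (3^m)_\Delta) \\
&+ \delta(u^{(5)}(0,1,2,3,4,\Delta), W_3(\Delta)) \le t
\end{aligned} \tag{13}$$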

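and

$$\begin{aligned}
\delta(u'(0,1,2,3,4,\Delta), W_1(\Delta)) &\le t \\
\delta(u''(0,1,2,3,4,\Delta), (2^m)_\Delta) &\le t \\
\delta(u'''(0,1,2,3,4,\Delta), W_2(\Delta)) &\le t \\
\delta(u^{(4)}(0,1,2,3,4,\Delta), (3^m)_\Delta) &\le t \\
\delta(u^{(5)}(0,1,2,3,4,\Delta), W_3(\Delta)) &\le t
\end{aligned} \tag{14}$$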

Since $W_j(\Delta) \in \{\Delta\}^{*}$, $1 \le j \le p + 1$, in view of (14), without loss of generality, we can assume that $W_j(\Delta)$, $u^{(2j-1)}(0,1,2,3,4,\Delta)$, $1 \le j \le p + 1$, are empty words. In view of (14), it is easy to check that if $(2^m)_\Delta = Y_1 Y_2 \ldots Y_r$ and $u''(0,1,2,3,4,\Delta) = Z_1 Z_2 \ldots Z_r$, where $Y_1, Y_2, \ldots, Y_r \in \{2, \Delta\}$, $Z_1, Z_2, \ldots, Z_r \in \{0,1,2,3,4,\Delta\}$, then

$$Y_i = \Delta \Leftrightarrow Z_i = \Delta, \quad 1 \le i \le r.$$

Therefore, without loss of generality, we can assume that $(2^m)_\Delta \in \{2\}^{*}$ and $u''(0,1,2,3,4,\Delta) \in \{0,1,2,3,4\}^{*}$. Since $(2^m)_\Delta \in \{2\}^{*}$, it is easy to check that $(2^m)_\Delta = 2^m$. Similarly, we can assume that $u^{(2j)}(0,1,2,3,4,\Delta) \in \{0,1,2,3,4\}^{*}$, $(3^m)_\Delta = 3^m$, where $1 < j \le p$.

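Since $u''(0,1,2,3,4,\Delta), u^{(4)}(0,1,2,3,4,\Delta) \in \{0,1,2,3,4\}^{*}$ and $(2^m)_\Delta = 2^m$, $(3^m)_\Delta = 3^m$, in view of (13), it is easy to see that

$$\begin{aligned}
d \cdot \mathrm{occ}(0, u'') + d \cdot \mathrm{occ}(1, u'') + 2d \cdot \mathrm{occ}(3, u'') &= \delta(u'', 2^m), \\
d \cdot \mathrm{occ}(0, u^{(4)}) + d \cdot \mathrm{occ}(1, u^{(4)}) + 2d \cdot \mathrm{occ}(2, u^{(4)}) &= \delta(u^{(4)}, 3^m),
\end{aligned}$$

where $u''$ and $u^{(4)}$ stand for $u''(0,1,2,3,4,\Delta)$ and $u^{(4)}(0,1,2,3,4,\Delta)$, and

$$\sum_{i=1}^{2} \bigl( \mathrm{occ}(2, u^{(2i)}(0,1,2,3,4,\Delta)) + \mathrm{occ}(3, u^{(2i)}(0,1,2,3,4,\Delta)) \bigr) \ge m.$$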

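It is not hard to check that

$$p_2 = T 2^m T^2 2^m T, \quad p_3 = T 3^m T^2 3^m T.$$

Therefore,

$$\mathrm{occ}(2, u^{(2)}(0,1,2,3,4,\Delta)) + \mathrm{occ}(2, u^{(4)}(0,1,2,3,4,\Delta)) \ge m,$$

$$\mathrm{occ}(3, u^{(2)}(0,1,2,3,4,\Delta)) + \mathrm{occ}(3, u^{(4)}(0,1,2,3,4,\Delta)) \ge m,$$

and

$$\sum_{i=1}^{2} \bigl( \mathrm{occ}(2, u^{(2i)}(0,1,2,3,4,\Delta)) + \mathrm{occ}(3, u^{(2i)}(0,1,2,3,4,\Delta)) \bigr) \ge 2m. \tag{15}$$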

It is not hard to check that

$$p_5 = T s_2 T^2 s_3 T.$$

Since $\delta(a, b) = d$, $a \in \{0,1\}$, $b \in \{2,3\}$, in view of (15), $\delta(u, p_5) \ge 2t$. Therefore, $p = 1$.

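Since $p = 1$, it is easy to check that $u = T u' T$, where $u'$ is a string in alphabet $\{0,1,2,3,4\}$, and $p_1 = T 2^m T$, $p_2 = T 3^m T$, $p_7 = T 4^m T$. Clearly,

$$\delta(u, p_1) \le t, \quad \delta(u, p_2) \le t, \quad \delta(u, p_7) \le t.$$

Therefore,

$$\begin{aligned}
d \cdot \mathrm{occ}(0, u) + d \cdot \mathrm{occ}(1, u) + 2d \cdot \mathrm{occ}(3, u) + 2d \cdot \mathrm{occ}(4, u) &\le t, \\
d \cdot \mathrm{occ}(0, u) + d \cdot \mathrm{occ}(1, u) + 2d \cdot \mathrm{occ}(2, u) + 2d \cdot \mathrm{occ}(4, u) &\le t, \\
d \cdot \mathrm{occ}(0, u) + d \cdot \mathrm{occ}(1, u) + 2d \cdot \mathrm{occ}(2, u) + 2d \cdot \mathrm{occ}(3, u) &\le t.
\end{aligned} \tag{16}$$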

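In view of (16),

$$\begin{aligned}
\mathrm{occ}(0, u) + \mathrm{occ}(1, u) + 2\,\mathrm{occ}(3, u) + 2\,\mathrm{occ}(4, u) &\le m, \\
\mathrm{occ}(0, u) + \mathrm{occ}(1, u) + 2\,\mathrm{occ}(2, u) + 2\,\mathrm{occ}(4, u) &\le m, \\
\mathrm{occ}(0, u) + \mathrm{occ}(1, u) + 2\,\mathrm{occ}(2, u) + 2\,\mathrm{occ}(3, u) &\le m.
\end{aligned} \tag{17}$$

In view of (17),

$$3\,\mathrm{occ}(0, u) + 3\,\mathrm{occ}(1, u) + 4\,\mathrm{occ}(2, u) + 4\,\mathrm{occ}(3, u) + 4\,\mathrm{occ}(4, u) \le 3m.$$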

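Since $u = T u' T$, where $u'$ is a string in alphabet $\{0,1,2,3,4\}$,

$$\mathrm{occ}(0, u) + \mathrm{occ}(1, u) + \mathrm{occ}(2, u) + \mathrm{occ}(3, u) + \mathrm{occ}(4, u) = m.$$

Therefore,

$$\mathrm{occ}(2, u) + \mathrm{occ}(3, u) + \mathrm{occ}(4, u) = 0,$$

and $u = T u' T$, where $u'$ is a string in alphabet $\{0, 1\}$.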
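The counting step can be spelled out as follows. The shorthand $n_c = \mathrm{occ}(c, u)$ is introduced here only for brevity and is not notation from the paper; the three inequalities are those of (17).

```latex
% Shorthand (introduced for this derivation only): n_c = occ(c, u).
\begin{align*}
% Adding the three inequalities of (17):
3n_0 + 3n_1 + 4n_2 + 4n_3 + 4n_4 &\le 3m. \\
% Splitting off three copies of the length identity n_0 + n_1 + n_2 + n_3 + n_4 = m:
3(n_0 + n_1 + n_2 + n_3 + n_4) + (n_2 + n_3 + n_4) &\le 3m, \\
3m + (n_2 + n_3 + n_4) &\le 3m, \\
n_2 + n_3 + n_4 &\le 0.
\end{align*}
% Occurrence counts are non-negative, so n_2 = n_3 = n_4 = 0.
```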

Since $u = T u' T$, where $u'$ is a string in alphabet $\{0, 1\}$, for every string $s_i \in S$, $\delta(u', s_i) \le t$. In view of $\delta(0, 1) = \delta(1, 0) = m$, $D(u', s_i) \le d$. Therefore, we can take $s = u'$.

Now suppose that there is a string $s$ of length $m$ such that for every string $s_i \in S$, $D(s, s_i) \le d$. It is easy to see that $u = T s T$ is a $t$-approximate period of $x$.
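This direction can be checked numerically on small instances. The sketch below compares $s$ position by position with the inner part of every block $T w T$ of $x$, using the substitution costs of figure 2; since the surrounding $T$ symbols match at cost 0, this sum is an upper bound on $\delta(T s T, T w T)$. The function names are hypothetical, and the check covers only substitution alignments.

```python
from typing import List


def pair_cost(a: str, b: str, m: int, d: int) -> int:
    """Substitution cost from figure 2, restricted to the symbols 0, 1, 2, 3, 4."""
    binary = {"0", "1"}
    if a == b:
        return 0
    if a in binary and b in binary:
        return m
    if a in binary or b in binary:
        return d
    return 2 * d


def blocks_within_threshold(s: str, strings: List[str], d: int) -> bool:
    """Check that s is within cost t = m*d of every block of x by substitutions only."""
    m = len(s)
    t = m * d
    blocks = [c * m for c in "2322334"] + list(strings)
    return all(
        sum(pair_cost(a, b, m, d) for a, b in zip(s, block)) <= t
        for block in blocks
    )


good_center = blocks_within_threshold("001", ["000", "011", "101"], 2)
bad_center = blocks_within_threshold("111", ["000", "011", "101"], 2)
```

For a binary $s$ each fixed block costs exactly $md = t$, and a block $s_i$ costs $m \cdot D(s, s_i)$, which is at most $t$ exactly when $D(s, s_i) \le d$.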


References

[1] Blackburn, S., DeRoure, D., “A tool for content-based navigation of music,” Proceedings of ACM International Multimedia Conference (ACMMM), Bristol, UK, pp. 361-368, 9/98.

[2] Dillon, M., Hunter, M., “Automated identification of melodic variants in folk music,” Computers and the Humanities, V16, N2, pp. 107-117, 6/82.

[3] Ghias, A., Logan, J., Chamberlin, D., Smith, B., “Query by humming-musical information retrieval in an audio database,” Proceedings of ACM International Multimedia Conference (ACMMM), San Francisco, CA, pp. 231-236, 11/95.

[4] Kornstädt, A., “Themefinder: A web-based melodic search tool,” Melodic Comparison: Concepts, Procedures, and Applications, Computing in Musicology, V11, pp. 231-236, 98.

[5] Lemström, K., Laine, P., Perttu, S., “Using relative interval slope in music information retrieval,” Proceedings of the International Computer Music Conference (ICMC), Beijing, China, pp. 317-320, 9/99.

[6] Lindsay, A.T., Using contour as mid-level representation of melody, Master’s thesis, MIT Media Lab, 1996.

[7] Selfridge-Field, E., “Conceptual and representational issues in melodic comparison,” Melodic Comparison: Concepts, Procedures, and Applications, Computing in Musicology, V11, pp. 3-64, 98.

[8] Hsu, J.-L., Liu, C.-C., Chen, A.L.P., “Efficient repeating pattern finding in music databases,” Proceedings of the 1998 ACM CIKM International Conference on Information and Knowledge Management, Bethesda, Maryland, USA, pp. 281-288, 11/98.

[9] Liu, C.-C., Hsu, J.-L., Chen, A.L.P., “Efficient theme and non-trivial repeating pattern discovering in music databases,” Proceedings 15th International Conference on Data Engineering, Sydney, Australia, IEEE Computer Society, pp. 14-21, 3/99.

[10] Tseng, Y.-H., “Content-based retrieval for music collections,” Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR), Berkeley, CA, pp. 176-182, 8/99.

[11] Moyzis, R.K., “The human telomere,” Scientific American, pp. 48-55, 8/91.

[12] Pevzner, P.A., “Multiple alignment, communication cost, and graph matching,” SIAM Journal on Applied Mathematics, V52, N6, pp. 1763-1779, 12/92.

[13] Ukkonen, E., “Algorithms for approximate string matching,” Information and Control, V64, N1-3, pp. 100-118, 1-3/85.

[14] Sim, J.S., Iliopoulos, C.S., Park, K., Smyth, W.F., “Approximate periods of strings,” Theoretical Computer Science, V262, N1-2, pp. 557-568, 7/01.

[15] Popov, V.Yu., “The approximate period problem for DNA alphabet,” Theoretical Computer Science, V304, N1-3, pp. 443-447, 7/03.

[16] Frances, M., Litman, A., “On covering problems of codes,” Theory of Computer Systems, V30, N2, pp. 113-119, 3/97.

[17] Lanctot, J.K., Li, M., Ma, B., Wang, S., Zhang, L., “Distinguishing string selection problems,” Proceedings of 10th ACM-SIAM Symposium on Discrete Algorithms, pp. 633-642, 1/99.
