Jorge Ferreira Alencar Lima
Euclidean Distance Geometry and Applications
CAMPINAS
2015
Cataloging record
Universidade Estadual de Campinas
Library of the Institute of Mathematics, Statistics and Scientific Computing
Ana Regina Machado - CRB 8/5467
L628g
Lima, Jorge Ferreira Alencar.
Geometria de distâncias euclidianas e aplicações / Jorge Ferreira Alencar Lima. – Campinas, SP : [s.n.], 2015.
Advisor: Carlile Campos Lavor.
Co-advisor: Tibérius de Oliveira e Bonates.
Thesis (doctorate) – Universidade Estadual de Campinas, Institute of Mathematics, Statistics and Scientific Computing.
1. Distance geometry. 2. Euclidean distance matrices. 3. Multidimensional scaling. I. Lavor, Carlile Campos, 1968-. II. Bonates, Tibérius de Oliveira e. III. Universidade Estadual de Campinas. Institute of Mathematics, Statistics and Scientific Computing. IV. Title.
Information for the Digital Library
Title in another language: Euclidean distance geometry and applications
Keywords in English:
Distance geometry
Euclidean distance matrices
Multidimensional scaling
Area of concentration: Applied Mathematics
Degree: Doctor in Applied Mathematics
Examination committee:
Carlile Campos Lavor [Advisor]
José Mario Martínez Pérez
Douglas Soares Gonçalves
Marcos Napoleão Rabelo
Manoel Bezerra Campêlo Neto
Defense date: 23-01-2015
Graduate program: Applied Mathematics
Abstract
Euclidean distance geometry (EDG) is the study of Euclidean geometry based on the concept of
distance. This is useful in several applications, where the input data consists of an incomplete set
of distances and the output is a set of points in some Euclidean space realizing the given distances.
The key problem in EDG is known as the Distance Geometry Problem (DGP), where an
integer K>0 is given, as well as a simple undirected weighted graph 𝐺 = (𝑉, 𝐸, 𝑑), whose edges are
weighted by a non-negative function 𝑑. The problem consists in determining whether or not there
is a (realization) function that associates the vertices of 𝑉 with coordinates of the 𝐾-dimensional
Euclidean space, in such a way that those coordinates satisfy all distances given by 𝑑.
We considered both theoretical issues and applications of EDG. In theoretical terms, we determined
the exact number of solutions of a subclass of the DGP that is very important in molecular
conformation problems. Moreover, we described necessary and sufficient conditions for determining
whether a complete graph associated with a DGP instance is realizable, as well as the minimum
dimension of such a realization. In practical terms, we developed an algorithm that computes such
a realization and outperforms a classical algorithm from the literature. Finally, we showed a direct
application of the DGP to multidimensional scaling.
Keywords: Distance Geometry, Euclidean Distance Matrices, Multidimensional Scaling.
Summary
Euclidean Distance Geometry (EDG) is the study of Euclidean geometry based on the concept of distance. It is a useful theory in several applications in which the data consist of a set of distances and the possible solutions are points in some Euclidean space that realize the given distances.
The key problem in EDG is known as the Distance Geometry Problem (DGP), in which we are given an integer $K > 0$ and a simple undirected weighted graph $G = (V, E, d)$, whose edges are weighted by a nonnegative function $d$, and we want to determine whether there exists a function (realization) that maps the vertices of $V$ to coordinates in $K$-dimensional Euclidean space, satisfying all the distance constraints given by $d$.
We consider both theoretical problems and applications of EDG. In theoretical terms, we establish the exact number of solutions of a class of DGPs that is very important for molecular conformation problems. In addition, we obtain necessary and sufficient conditions for determining when a complete graph associated with a DGP is realizable and what the Euclidean space of minimum dimension for such a realization is. In practical terms, we develop an algorithm that computes such a realization in minimum dimension, with results superior to those of a classical algorithm from the literature. Finally, we show a direct application of the DGP to multidimensional scaling problems.
Keywords: Distance Geometry, Euclidean Distance Matrices, Multidimensional Scaling.
Contents
Dedication
Acknowledgments
1 Introduction
2 Counting the number of solutions of $^K$DMDGP instances
2.1 Introduction
2.2 Motivation
2.3 Background material
2.3.1 Incongruence
2.3.2 Probability 1
2.3.3 Partial reflections
2.4 Counting incongruent realizations
3 An algorithm for realizing Euclidean distance matrices
3.1 Introduction
3.2 Some results about EDM
3.3 Numerical Experiments
3.4 Conclusions
4 A Distance Geometry-Based Combinatorial Approach to Multidimensional Scaling
4.1 Introduction
4.1.1 Application: MDS of Clustered Data
4.2 Notation and Definitions
4.3 Distance Geometry and Multidimensional Scaling
4.3.1 An Approach to MDS via EDMCP
4.4 Branch-and-Prune Algorithm for Multidimensional Scaling
4.5 Cluster Partition-Preserving MDS
4.6 The Backtrack Problem and a Naive Randomization Approach
4.7 Computational Experiments
4.7.1 Application as a Confirmatory MDS Technique
4.7.2 BP-Based Confirmatory MDS for Large Datasets
4.8 Conclusion
5 Conclusion
References
A Counting the number of solutions of the Discretizable Molecular Distance Geometry Problem
A.1 Introduction
A.2 The Euclidean Distance Matrix Completion Problem
A.3 Counting the number of solutions of the DMDGP
B Branch-and-prune algorithm for multidimensional scaling preserving cluster partition
B.1 Introduction
B.2 A Cluster-Partition Preserving MDS Algorithm
B.3 Computational Experiments
C Learning Forbidden Subtrees in Branch-and-Prune-Based MDS
To my mother and my daughter.
Acknowledgments
First and foremost, I thank my mother Olivia for her unconditional support and my daughter Nicolle for being the main reason behind this achievement.
I am immensely grateful to my great friend Germano for the hours of work, conversation, exchange of ideas, advice and, above all, for his friendship. Likewise, I thank my friend Christianne for standing by my side through the ups and downs of these years.
I also thank my friends Estevão, Felipe, Carlos, Douglas, Jardel, Mateus, Michael and Luciano, among others, for everything that happened during these years of doctoral studies.
To Prof. Carlile, my gratitude for the opportunity to carry out this work, always with great respect and friendship. I will carry what I learned for the rest of my life.
To Prof. Tibérius, my deepest thanks for the opportunity to work alongside someone who values quality and the work itself, always with exemplary respect and dedication, which I intend to carry with me for the rest of my academic life.
To the professors and staff of IMECC who contributed in some way, directly or indirectly, my recognition and gratitude, especially to professors Aurélio, Márcia, Plínio, Cristiane and Laecio.
I also thank CNPq and CAPES for the financial support.
List of Figures
2.1 The action of the reflection $R_x^v$ in $\mathbb{R}^K$.
2.2 On the left: the set $X_D$ of realizations of the graph induced by the discretization edges. On the right: the effect of the pruning edge $\{1, 4\}$ on $X_D$.
3.1 A plot that shows the Stress values associated with the resulting embedding for each artificial molecular instance. The instances are ordered with respect to the Stress value of $\mathit{isedm}_2$.
3.2 A plot that shows the Stress values obtained on protein instances. The instances are ordered with respect to the Stress value of $\mathit{isedm}_2$.
C.1 Example of failed subtree rooted on an embedding of point $x_i$.
List of Tables
3.1 Stress values obtained on Moré-Wu instances.
3.2 Stress values obtained on Proteins instances.
4.1 Comparison between the BP algorithm for MDS (Algorithm 4.1), rBP and the Metric Multidimensional Scaling algorithm.
4.2 Comparison between rBP algorithm and the Metric Multidimensional Scaling algorithm on Parkinsons dataset using different orders of the points.
4.3 Comparison between the standard BP algorithm (Algorithm 4.1) and the proposed cluster partition-preserving BP algorithm (Algorithm 4.2).
B.1 Comparison between the standard BP algorithm and the proposed cluster-partition preserving BP algorithm.
List of Algorithms
3.1 $K = \mathrm{edmsph}(D, x)$
4.1 Branch-and-prune algorithm for MDS.
4.2 Cluster partition-aware branch-and-prune algorithm.
B.1 Pseudocode of cluster-partition preserving BP algorithm.
Chapter 1
Introduction
In the first half of the twentieth century, Menger characterized several geometric concepts (for example, congruence and convexity) in terms of distances [45]. These and other results were organized, completed and presented by Blumenthal [11], giving rise to an entire field of knowledge called Distance Geometry (DG). This work is concerned with applications of this field and, in particular, with the solution of its fundamental problem [36]:
Distance Geometry Problem (DGP). Given an integer $K > 0$ and a simple undirected graph $G = (V, E)$ whose edges are weighted by a function $d : E \to \mathbb{R}_+$, determine whether there exists a function $x : V \to \mathbb{R}^K$ such that
$$\forall \{u, v\} \in E, \quad \|x(u) - x(v)\| = d(\{u, v\}). \tag{1.0.1}$$
Throughout this work, we write $x_v$ instead of $x(v)$ and $d_{uv}$ (or $d(u, v)$) instead of $d(\{u, v\})$. Moreover, we assume that $\|\cdot\|$ denotes the Euclidean norm, which places us in a subarea of DG called Euclidean Distance Geometry.
A function $x$ satisfying (1.0.1) is called a realization of $G$ in $\mathbb{R}^K$. If $H$ is a subgraph of $G$ and $\bar{x}$ is a realization of $H$, then $\bar{x}$ is a partial realization of $G$. Given a graph $G$, we denote its vertex set by $V(G)$ and its edge set by $E(G)$.
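As a concrete reading of (1.0.1), the following minimal sketch checks whether a candidate map is a realization. The function name `is_realization`, the dictionary-based data layout and the tolerance `tol` are assumptions made for illustration; they do not come from the thesis.

```python
import math


def is_realization(x, edges, tol=1e-9):
    """Return True if ||x[u] - x[v]|| equals d_uv (up to tol) for every edge.

    x     : dict mapping each vertex to a tuple of K coordinates (assumed layout)
    edges : dict mapping a pair (u, v) to the prescribed distance d_uv
    tol   : absolute tolerance for the floating-point comparison (assumption)
    """
    for (u, v), d_uv in edges.items():
        if abs(math.dist(x[u], x[v]) - d_uv) > tol:
            return False
    return True


# The four sides and one diagonal of a unit square, with K = 2.
edges = {(1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0, (1, 4): 1.0, (1, 3): math.sqrt(2)}
square = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0), 4: (0.0, 1.0)}
folded = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0), 4: (1.0, 0.0)}
```

Here `square` satisfies every constraint, while `folded` satisfies the four side lengths but violates the diagonal, so it is not a realization.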
It is worth noting that, for Blumenthal, the fundamental problem of DG was what he called the "subset problem" [11], that is, finding necessary and sufficient conditions for deciding whether a given matrix is a distance matrix. Necessary conditions, for the particular case in which the distances are Euclidean, were discovered implicitly by Cayley, who proved that five points in $\mathbb{R}^3$, four points in the plane and three points on a line have Cayley-Menger determinant equal to zero [16]. Some sufficient conditions were determined by Menger [44], who proved that it suffices to verify that every $(K + 3) \times (K + 3)$ square submatrix of the given matrix is a distance matrix [11]. The main difference between the subset problem and the DGP is that a distance matrix represents a complete weighted graph, whereas the DGP does not impose any structure on $G$. The first explicit mention of the DGP was probably the following [60]:
The positioning problem arises when it is necessary to locate a set of geographically
distributed objects using measurements of the distances between some object pairs.
(Yemini)
The explicit mention that only some pairs of objects have known distances marks the crucial transition from classical DG to the DGP. In the years following that publication, Yemini wrote another paper on the computational complexity of some problems involving rigid graphs [59], in which he introduced the position-location problem as the problem of determining the coordinates of a set of objects in space from a sparse set of distances. This was in contrast with typical structural results, which focused on determining the rigidity of given structures (see [56] and references therein). Meanwhile, Saxe [50] introduced the DGP as a $K$-embedding problem and showed it to be NP-complete when $K = 1$ and strongly NP-hard for $K > 1$.
Interest in the DGP stems from its wide range of applications, as well as from the beauty of the associated mathematics. Moreover, the DGP has a long list of variants. Our work deals with one of them, the Discretizable Distance Geometry Problem (DDGP):
Discretizable Distance Geometry Problem (DDGP). The subset of DGP instances for which there exists an order on the vertex set $V(G)$ such that:
1. A realization of the first $K$ vertices is given;
2. Each vertex $v$ at a position $i > K$ is adjacent to at least $K$ vertices that precede $v$ in the given order; these vertices form a $K$-clique and, given a partial realization $\bar{x}$ of $G$, their realization spans an affine subspace of dimension $K - 1$.
The main idea behind the discretization is that the intersection of $K$ spheres in $K$-dimensional space yields at most two points, provided that their centers lie on a hyperplane but not on a $(K - 2)$-dimensional affine subspace. Consider $K + 1$ points $\{u_i\}_{i=1}^{K}$ and $v$ in $\mathbb{R}^K$. If the coordinates of $\{u_i\}_{i=1}^{K}$ are known, as well as the distances $\{d(u_i, v)\}_{i=1}^{K}$, then $K$ spheres can be defined and their intersection provides at most two possible positions for the point $v$. Defining an order on the vertex set $V(G)$ that satisfies these conditions suggests a recursive search on a binary tree containing the possible coordinates for the vertices. The binary tree of possible solutions is explored from its top, where the first $K$ vertices are fixed, by placing one vertex at a time. At each step, two possible positions for the current vertex $v$ are computed and two new branches are added to the tree. As a consequence, the size of the tree can grow very quickly, but the presence of additional distances not used in the construction of the tree can help verify the feasibility of the added points. As soon as an infeasible position is found, the corresponding branch of the tree can be pruned and the search backtracks. This strategy defines an efficient algorithm called Branch-and-Prune (BP) [32]. It is worth emphasizing that the notion of infeasibility mentioned above may differ from the notions presented in [32], depending on the application at hand, as we shall see in one of our applications.
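For $K = 3$, the sphere-intersection step behind the discretization can be sketched as follows. This is the standard trilateration construction, written under the assumption that the three centers are not collinear; the helper names and the tolerance are hypothetical, and this is not the BP implementation of [32].

```python
import math


def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def _axpy(a, s, b):
    """Return a + s * b for vectors a, b and scalar s."""
    return tuple(p + s * q for p, q in zip(a, b))


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect_three_spheres(c1, r1, c2, r2, c3, r3, tol=1e-9):
    """Return the 0, 1 or 2 points at distances r1, r2, r3 from c1, c2, c3."""
    e = _sub(c2, c1)
    d = math.sqrt(_dot(e, e))
    if d <= tol:
        raise ValueError("centers c1 and c2 coincide")
    ex = tuple(p / d for p in e)                 # unit vector from c1 to c2
    t = _sub(c3, c1)
    i = _dot(ex, t)
    w = _axpy(t, -i, ex)                         # part of c3 - c1 orthogonal to ex
    j = math.sqrt(_dot(w, w))
    if j <= tol:
        raise ValueError("centers are collinear")
    ey = tuple(p / j for p in w)
    ez = _cross(ex, ey)                          # normal to the plane of the centers
    # Local frame: c1 = (0, 0, 0), c2 = (d, 0, 0), c3 = (i, j, 0).
    x = (r1 * r1 - r2 * r2 + d * d) / (2 * d)
    y = (r1 * r1 - r3 * r3 + i * i + j * j - 2 * i * x) / (2 * j)
    z2 = r1 * r1 - x * x - y * y
    if z2 < -tol:
        return []                                # the spheres do not meet
    base = _axpy(_axpy(c1, x, ex), y, ey)
    if z2 <= tol:
        return [base]                            # tangent case: a single point
    z = math.sqrt(z2)
    return [_axpy(base, z, ez), _axpy(base, -z, ez)]
```

The two returned points are mirror images of each other with respect to the plane through the three centers, which is the reflection symmetry that the branching step exploits.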
The next three chapters describe some contributions of the author, in collaboration with other researchers, whose purpose is to show the applicability of the discretization approach mentioned above. We briefly summarize each chapter below.
In Chapter 2, we explore a particular case of the DGP, the Molecular Distance Geometry Problem (MDGP), associated with Nuclear Magnetic Resonance (NMR) experiments, which provide interatomic distances $d_{ij}$ for certain pairs of atoms $(i, j)$ of a given protein [20]. The problem is how to use this set of distances to compute positions $x_1, \dots, x_n \in \mathbb{R}^3$ for the atoms that form the molecule [18]. A simple undirected weighted graph $G = (V, E, d)$ can be associated with the problem, where $V$ is the set of atoms, $E$ models the set of pairs of atoms whose Euclidean distances are known, and the function $d : E \to \mathbb{R}_+$ assigns the distance value to each pair in $E$. Thus, we can formally define the MDGP as follows:
Given a simple undirected weighted graph $G = (V, E, d)$, does there exist a function $x : V \to \mathbb{R}^3$ such that $\|x_i - x_j\| = d_{ij}$ for all $(i, j) \in E$?
By exploiting some rigidity properties of the graph $G$, the search space can be discretized. The resulting subset of MDGP instances is called the Discretizable Molecular Distance Geometry Problem (DMDGP), which is simply a particular case of the DDGP with $K = 3$ and the following extra condition added to item 2 of the DDGP definition: the set of adjacent predecessors of each vertex $v$ at a position $i > 3$ contains at least the 3 vertices immediately preceding it in the order. As a first contribution [39], we propose a way of counting the number of solutions of a given DMDGP instance, based on the symmetry properties established in [42].
In Chapter 3, we address one of the most classical problems in Distance Geometry: the distance fitting problem. This problem has been studied since the first decades of the last century [36]. Given a nonnegative matrix with zero diagonal, we wish to determine whether or not it is a Euclidean distance matrix and, if so, to find a set of points that realizes this matrix in a Euclidean space of the smallest possible dimension.
There are several works in the literature on this type of problem [52, 36] and several algorithms have been proposed [19, 21, 53]. Some of them require the "minimum" dimension as input, while others can be quite sensitive in practical cases.
As a contribution, we developed an algorithm that solves this problem requiring only the matrix as input and identifying whether it is a Euclidean distance matrix. If it is, a realization and the minimum dimension are returned. The algorithm is based on the problem of determining the intersection of $K$ spheres in $\mathbb{R}^K$, where $K$ varies throughout the algorithm [4].
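For comparison, the classical way of testing whether a matrix is a Euclidean distance matrix goes through the Gram matrix of the points relative to the first one: the matrix is Euclidean exactly when this Gram matrix is positive semidefinite, and its rank is the minimum dimension. The sketch below implements that classical test with an unpivoted Cholesky factorization. It is not the sphere-intersection algorithm of Chapter 3, and the function name `realize_edm` and the tolerance are assumptions.

```python
import math


def realize_edm(D, tol=1e-9):
    """Return (dim, points) if D is a Euclidean distance matrix, else None.

    D is a symmetric list of lists of distances (not squared) with zero diagonal.
    The first point is placed at the origin.
    """
    n = len(D)
    m = n - 1
    # Gram matrix of the vectors x_i - x_0, from the law of cosines.
    G = [[(D[0][i] ** 2 + D[0][j] ** 2 - D[i][j] ** 2) / 2
          for j in range(1, n)] for i in range(1, n)]
    L = [[0.0] * m for _ in range(m)]
    for j in range(m):
        s = G[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s < -tol:
            return None                      # negative pivot: G is not PSD
        pivot = math.sqrt(s) if s > tol else 0.0
        L[j][j] = pivot
        for i in range(j + 1, m):
            r = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if pivot > 0:
                L[i][j] = r / pivot
            elif abs(r) > tol:
                return None                  # zero pivot with nonzero column: not PSD
    used = [j for j in range(m) if L[j][j] > 0]   # one coordinate per nonzero pivot
    points = [tuple(0.0 for _ in used)]
    points += [tuple(L[i][j] for j in used) for i in range(m)]
    return len(used), points
```

For the 3-4-5 triangle `[[0, 3, 4], [3, 0, 5], [4, 5, 0]]` the sketch returns dimension 2, while three collinear points yield dimension 1 and a matrix violating the triangle inequality yields `None`. The fixed tolerance makes the sketch fragile on noisy data, which is the kind of sensitivity mentioned above.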
In Chapter 4, we develop a Multidimensional Scaling (MDS) technique associated with the following problem: given information on the dissimilarities between pairs of $n$ objects of a given set, find a low-dimensional representation of these objects that minimizes a loss function measuring the error between the original dissimilarities and the distances resulting from the low-dimensional embedding [14]. Such a low-dimensional representation is usually called an MDS representation.
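One common choice of loss function is the raw Stress, the sum of squared differences between dissimilarities and embedded distances. The text above does not fix a particular loss, so the sketch below is only an illustration under that assumption; `raw_stress` is a hypothetical name.

```python
import math


def raw_stress(points, delta):
    """Sum over pairs i < j of (delta[i][j] - ||points[i] - points[j]||)^2.

    points : list of coordinate tuples (the low-dimensional representation)
    delta  : symmetric matrix of dissimilarities between the same objects
    """
    n = len(points)
    return sum((delta[i][j] - math.dist(points[i], points[j])) ** 2
               for i in range(n) for j in range(i + 1, n))


embedding = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
exact = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]      # dissimilarities matched exactly
perturbed = [[0, 3, 4], [3, 0, 6], [4, 6, 0]]  # one dissimilarity off by 1
```

With these inputs, `raw_stress(embedding, exact)` is zero and `raw_stress(embedding, perturbed)` is 1, since only the pair with dissimilarity 6 contributes an error.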
Consider a set of points in $\mathbb{R}^N$ to which a clustering procedure (for example, $k$-means) has been applied. Applying a standard MDS procedure does not guarantee that, if the same clustering method is applied again to the MDS representation, the cluster structure obtained will be similar to the one obtained for the original data set.
Attempts to integrate MDS and clustering into a single technique already exist in the literature; Cluster Differences Scaling (CDS) is one such technique [27]. Unlike these methods, in which the clusters are determined during the process, our approach requires information about the clustering of the data set from the outset. More specifically, we assume that, in addition to the dissimilarities between pairs of objects, cluster membership data are given as part of the input, specifying the cluster to which each object belongs. Our goal is, given an initial clustering of the data set, to obtain a low-dimensional representation of the data that preserves the dissimilarities while still allowing the initial cluster structure to be recovered [3, 2].
In Chapter 5, we outline future work and close with the main conclusions.
Chapter 2
Counting the number of solutions of $^K$DMDGP instances
Leo Liberti$^1$, Carlile Lavor$^2$, Jorge Alencar$^3$ and Germano Abud$^4$
$^1$ École Polytechnique, CNRS LIX, Paris.
$^2$ Universidade Estadual de Campinas, IMECC-Unicamp, Campinas, São Paulo.
$^3$ Instituto Federal de Educação, Ciência e Tecnologia do Sul de Minas Gerais, IFSULDEMINAS, Inconfidentes, Minas Gerais, Brazil.
$^4$ Universidade Federal de Uberlândia, FAMAT-UFU, Uberlândia, Minas Gerais, Brazil.
Abstract
We discuss a method for finding the number of realizations in $\mathbb{R}^K$ of certain simple undirected weighted graphs.
2.1 Introduction
In this paper we deal with Euclidean realizations of weighted graphs such that the Euclidean distance between each pair of adjacent realization points equals the weight of the corresponding edge. The vertex sets of our graphs are assumed to be ordered in certain ways, formally described below. An order is given as a rank function $\rho$ mapping the vertex set (of cardinality $n$) one-to-one onto the set $\bar{n} = \{1, \dots, n\}$. For clarity of notation, we may identify a vertex with its rank: for any two vertices $u, v$ and an integer $K$, we write $u < v$ or $v > K$ to mean $\rho(u) < \rho(v)$ and $\rho(v) > K$.
The $K$-Discretizable Molecular Distance Geometry Problem ($^K$DMDGP) is as follows. Given a positive integer $K$, a simple undirected weighted graph $G = (V, E, d)$ where $d : E \to \mathbb{R}_+$, an order $<$ on $V$ such that $\{u, v\} \in E$ for each $v > K$ and $v - K \le u \le v - 1$, and a partial realization $\bar{x} : \{1, \dots, K\} \to \mathbb{R}^K$, does there exist a realization $x : V \to \mathbb{R}^K$ such that:
$$\forall \{u, v\} \in E \quad \|x_u - x_v\| = d_{uv} \tag{2.1.1}$$
and such that $x_v = \bar{x}_v$ for each $v \in \{1, \dots, K\}$? We remark that solving Eq. (2.1.1) for any given weighted graph and integer $K$ is known as the Distance Geometry Problem (DGP). Both the DGP [50] and the $^K$DMDGP [41] are NP-hard, even for fixed $K$.
The word "molecular" in the name stems from the natural application to finding molecular conformations (so $K = 3$). The vertices of the input graph $G$ are atoms, and the edges are pairs of atoms for which the distance is known. We focus on the important case of proteins: since all proteins consist of a backbone with some side chains, we consider the backbone as a natural vertex order. Since covalent bond lengths are known, and the angles between covalent bonds are also known [51], distances corresponding to pairs of atoms $\{v - 1, v\}$ and $\{v - 2, v\}$ are known. Nuclear Magnetic Resonance (NMR) experiments provide an estimate of distances shorter than around 6Å, which covers the case of pairs $\{v - 3, v\}$ (as well as other pairs: the backbone folds in space, and it often happens that two atoms that are far apart in the order are actually close in Euclidean space) [51]. Moreover, for elementary geometrical reasons, it is always possible to fix the positions of the first, second and third atoms in the protein backbone so that the inter-atomic distances over $\{1, 2, 3\}$ are satisfied. Thus, protein backbones provide natural examples of $^K$DMDGP instances [32]. In the following, we shall partition the edge set $E$ into the discretization edges $E_D = \{\{v, v - j\} \mid j \in \{1, \dots, K\}\}$ and the pruning edges $E_P = E \setminus E_D$. We let $m = |E|$.
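The ordering condition and the edge partition can be checked mechanically. In the sketch below, vertices are identified with their ranks $1, \dots, n$; the function names are hypothetical, and edges among the first $K$ vertices are counted as discretization edges, which is an assumption about the intended reading of $E_D$.

```python
def is_kdmdgp_order(n, K, edges):
    """Check that every vertex v > K is adjacent to its K immediate predecessors."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in edge_set
               for v in range(K + 1, n + 1)
               for u in range(v - K, v))


def split_edges(K, edges):
    """Split edges into discretization edges (rank gap <= K) and pruning edges."""
    discretization = [e for e in edges if abs(e[0] - e[1]) <= K]
    pruning = [e for e in edges if abs(e[0] - e[1]) > K]
    return discretization, pruning


# A small instance with n = 5 and K = 2; {1, 4} is its only pruning edge.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5), (1, 4)]
```

Removing any edge of the form $\{v - j, v\}$ with $j \le K$, such as `(3, 5)` here, makes the order check fail.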
Finding realizations for general graphs usually involves a continuous search [37], but if the graph is rigid [26] then a discrete search is possible [35]. It was observed in [31] that $^K$DMDGP graphs are Henneberg graphs, which are known to be rigid [54]. In [40] we proposed a discrete search algorithm called Branch-and-Prune (BP), where the discretization edges are used to make sure that only a discrete set of points needs to be checked for feasibility w.r.t. Eq. (2.1.1), and the pruning edges are used to reduce the search space.
Since every vertex $v > K$ is adjacent to (at least) its $K$ immediate predecessors, if we know the position $x_u$ of each of these predecessors $u$ of $v$, then $x_v$ is at the intersection of $K$ spheres in $\mathbb{R}^K$ [17]. Provided the strict simplex inequalities (a generalization of the triangle inequalities to $\mathbb{R}^K$ [31]) hold, this intersection is either empty or consists of exactly two points. This provides an inductive step to find the next vertex in the order, having placed all its predecessors. The base case is dealt with since we are given the partial realization $\bar{x}$. Since at each step there may be two feasible positions $x_v$ for the next vertex $v$, in the worst case the BP yields an exponentially large search tree, where each node $x_v$ at level $v$ is a possible position for vertex $v$. Since the first branch occurs at level $K + 1$, this worst-case tree has $2^{n-K}$ leaf nodes. Each leaf node $x_n$ corresponds to a unique path from the root $x_1$ to $x_n$, which therefore encodes a valid realization $x = (x_1, \dots, x_n)$. We let $X$ be the set of all these realizations.
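To make the search tree concrete, here is a minimal sketch of the branching and pruning steps for $K = 2$, where the $K$ spheres are two circles in the plane. It is a simplified illustration, not the BP implementation of [40]: the function names, the data layout and the tolerances are assumptions, and the strict simplex inequalities are assumed to hold.

```python
import math


def circle_intersection(c1, r1, c2, r2, tol=1e-9):
    """Return the 0, 1 or 2 intersection points of two circles in the plane."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    d = math.hypot(dx, dy)
    if d <= tol:
        raise ValueError("coincident centers")
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)   # offset from c1 along c1 -> c2
    h2 = r1 * r1 - a * a
    if h2 < -tol:
        return []
    h = math.sqrt(max(h2, 0.0))
    mx, my = c1[0] + a * dx / d, c1[1] + a * dy / d
    if h <= tol:
        return [(mx, my)]
    return [(mx - h * dy / d, my + h * dx / d),
            (mx + h * dy / d, my - h * dx / d)]


def branch_and_prune(n, dist, x_init, tol=1e-6):
    """Enumerate all realizations in the plane of an instance with K = 2.

    n      : number of vertices, identified with their ranks 1..n
    dist   : dict {(u, v): d_uv} with u < v, containing (v-1, v) and (v-2, v)
    x_init : positions of vertices 1 and 2 (the given partial realization)
    """
    solutions = []
    x = {1: x_init[0], 2: x_init[1]}

    def place(v):
        if v > n:
            solutions.append([x[i] for i in range(1, n + 1)])
            return
        # Branching: intersect the circles centered at the two predecessors.
        for p in circle_intersection(x[v - 2], dist[(v - 2, v)],
                                     x[v - 1], dist[(v - 1, v)]):
            # Pruning: test p against every other known distance to a placed vertex.
            feasible = all(abs(math.dist(p, x[u]) - d) <= tol
                           for (u, w), d in dist.items()
                           if w == v and u < v - 2)
            if feasible:
                x[v] = p
                place(v + 1)
                del x[v]

    place(3)
    return solutions
```

On the unit square with vertices ranked 1 to 4 and no pruning edge, the sketch explores the full tree and finds $2^{n-K} = 4$ realizations; adding the pruning edge $\{1, 4\}$ with distance 1 leaves 2.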
In this paper, we propose an efficient method for computing the cardinality of 𝑋.
2.2 Motivation
Knowing $|X|$ is important for at least two practical reasons. First, in the application of the DGP to proteomics, the set $X$ is of interest to biochemists, who will evaluate each potential backbone according to chemical criteria. If $|X|$ is too large, this evaluation might be too costly; on the other hand, if $|X|$ is too small, $X$ might not contain the "correct" backbone. This observation might sound strange to mathematicians, but one must not forget that the $^K$DMDGP provides a model of reality, rather than being reality itself: none of the realizations in $X$ might be really correct from the point of view of the biochemical practitioner, but some may be close enough for him or her to recognize them. From a different point of view, the experimental data set usually contains errors [10] which might influence the number of realizations in $X$: a small $|X|$ might be evidence of wrong data.
Secondly, the class of globally rigid (also known as "uniquely realizable" [28]) graphs, i.e. those for which it can be shown that $|X| = 1$, is interesting because in several DGP applications, such as wireless sensor localization, ensuring that enough distance data are known for the graph to have a unique realization is of paramount importance: recovering a large set of different possible networks obviously prevents practitioners from understanding the actual network geometry. Necessary (combinatorial) conditions for a graph to be globally rigid are given in [28]. Informally speaking, if removing a certain edge from a rigid graph still yields a rigid graph, the edge is redundant; in a redundantly rigid graph, all edges are redundant. Redundant rigidity turns out to be a necessary condition for unique realizability. Although there are exact methods for verifying whether a graph is redundantly rigid for $K \in \{1, 2\}$, no such method is known for higher dimensions. A randomized $O(n^2 m)$ method is given in [28].
Although no necessary and sufficient conditions for unique realizability are known so far, several different sufficient conditions are known. Cliques are obviously globally rigid, and the realization can be found in polynomial time [21]. Trilateration graphs are those for which there exists a vertex order where each $v > K$ has at least $K + 1$ adjacent predecessors: these can be shown to have a unique realization, which can be found in polynomial time [23]. The graphs occurring in the $^K$DMDGP are a natural generalization of trilateration graphs, insofar as they require at least $K$ adjacent predecessors. As shown in [42], in general such graphs are not globally rigid, but the number of realizations can be counted in time $O(n + m)$, as shown in Sect. 2.4 below; so those $^K$DMDGP graphs that are globally rigid can be recognized in polynomial time (under some genericity assumptions, see Sect. 2.3.2). Uniquely localizable graphs possess a unique realization for a given $K$, and no other realization for any higher value of $K$. It is shown in [43] that these graphs can be realized in polynomial time (up to some approximation constant) by solving a semidefinite programming problem.
2.3 Background material
Although our method for computing $|X|$ is straightforward, it rests on many known but nontrivial results, which we summarize here.
2.3.1 Incongruence
Two sets of points in $\mathbb{R}^K$ are congruent if there is a sequence of translations, rotations and reflections that turns one into the other. Since any realizable graph has uncountably many congruent realizations, we are only interested in the number of incongruent ones. Unfortunately, the way we defined $X$ above (i.e. $X$ is the set of solutions found by the BP algorithm on $^K$DMDGP instances) is only partially correct in this respect. Because the realizations of Henneberg graphs are rigid frameworks, each realization in $X$ is rigid; so the fact that the first $K$ vertices are fixed in given positions $\bar{x}_1, \dots, \bar{x}_K$ eliminates rotations and translations. By [32, Thm. 2], with $K = 3$ there is a "fourth-level symmetry" in $X$: half of the realizations in $X$ are reflections of the other half along the plane through $\bar{x}_1, \bar{x}_2, \bar{x}_3$. This was generalized in [42] for any $K$.
So that the definition of $X$ is consistent with $X$ being a set of incongruent realizations, we simply modify the BP algorithm to choose either of the two possible positions for $x_{K+1}$ (without exploring the other), and start branching from level $K + 2$.
2.3.2 Probability 1
The theory supporting the BP algorithm is always based on the edge weight function $d$ satisfying the strict simplex inequalities (i.e. the Cayley-Menger determinant of each $K$-subsequence of vertices in the given order, multiplied by $(-1)^{K+1}$, is strictly positive). Otherwise, the intersection of $K$ spheres in $\mathbb{R}^K$ might have uncountable cardinality, or be a singleton set. These occurrences only happen when the $^K$DMDGP instance is YES and the values assigned to $d$ yield zero Cayley-Menger determinants [31], i.e. they satisfy a certain given system of polynomial equations. Such systems define manifolds of Lebesgue measure zero in $\mathbb{R}^K$. Moreover, it is easy to prove that all points in all realizations in $X$ are in a ball centered at $\bar{x}_1$ with radius bounded by the sum of all edge distances. So, the probability of uniformly sampling $d$ satisfying these equations is zero. This in turn means that the probability of uniformly sampling $d$ such that it yields a YES $^K$DMDGP instance satisfying the strict simplex inequalities is 1. Accordingly, we state most of our results "with probability 1".
There are at least three related concepts in the literature. The first, genericity (in the standard sense), requires that there should be no nonzero polynomial with rational coefficients satisfied by the instance data $d$. This condition is "too strong", in the sense that it would require at least one value of $d$ to be transcendental, which makes little sense for computers. The second concept requires that all minors of the complete rigidity matrix are nontrivial [25]. The third requires that $d$ is a rational function contained in the (open) complement of the set of those rational functions $d'$ yielding zero Cayley-Menger determinants [46, 48]. The notion we employ is very similar to both the second and the third concept.
2.3.3 Partial reflections
For any realization $x \in X$ and $v \in V$ with $v > K$, we let $R_x^v$ be the reflection along the hyperplane through $x_{v-K}, \dots, x_{v-1}$, as shown in Fig. 2.1.
Figure 2.1: The action of the reflection $R_x^v$ in $\mathbb{R}^K$.
Now, for any $v > K$, we define a partial reflection operator with respect to $x$ as:
$$g_v(x) = (x_1, \dots, x_{v-1}, R_x^v(x_v), R_x^v(x_{v+1}), \dots, R_x^v(x_n)). \tag{2.3.1}$$
The partial reflection $g_v$ acts on a realization $x$ by reflecting all vectors from rank $v$ onwards. We define a product between partial reflections by setting $g_u g_v = g_u \circ g_v$ for all $u, v > K$, i.e. $g_u g_v$ is the operation consisting in applying $g_v$ first, and then $g_u$, to a realization $x \in X$. More precisely, for $v > u > K$ and $x \in X$,
$$\begin{aligned} g_u g_v(x) &= g_u(g_v(x)) \\ &= g_u(x_1, \dots, x_{v-1}, R_x^v(x_v), \dots, R_x^v(x_n)) \\ &= (x_1, \dots, x_{u-1}, R_x^u(x_u), \dots, R_x^u(x_{v-1}), R_{g_v(x)}^u(x_v), \dots, R_{g_v(x)}^u(x_n)) \end{aligned}$$
(the case $u > v$ is similar). Notice that the action of the left operand $g_u$ from rank $v$ onwards does not apply $R_x^u$ to the components of the argument, but $R_{g_v(x)}^u$. By [41, Lemma 2], this product is commutative.
Now let Γ𝐷 = {𝑔𝑣 | 𝑣 > 𝐾}, and consider the set 𝒢𝐷 = ⟨Γ𝐷⟩ generated by all possible products of elements
in Γ𝐷. By [41], 𝒢𝐷 turns out to be the invariant group of the set of realizations 𝑋𝐷 consisting of all the possible
realizations found by the BP algorithm on the graph 𝐺𝐷 = (𝑉, 𝐸𝐷) induced by the discretization edges (see
Fig. 2.2). Our purpose is to find the invariant group $\mathcal{G}_P$ of the set of realizations of the given graph $G$, which we assume to have a nontrivial set $E_P$ of pruning edges. Let the span of a pruning edge $\{u, w\} \in E_P$ be the set $S^{uw} = \{u + K + 1, \dots, w\}$ (assuming $u < w$; if $w < u$ we let $S^{uw} = S^{wu}$). By [41], $\mathcal{G}_P$ is the subgroup of $\mathcal{G}_D$ generated by
$$\Gamma_P = \{g_v \mid v > K \wedge \forall \{u, w\} \in E_P \ (v \notin S^{uw})\}. \qquad (2.3.2)$$
In other words, only those vertices that are not in the span of any pruning edge give rise to partial reflection operators that generate the group $\mathcal{G}_P$.
Figure 2.2: On the left: the set $X_D$ of realizations of the graph induced by the discretization edges (vertices labeled 1 to 5). On the right: the effect of the pruning edge {1, 4} on $X_D$.
2.4 Counting incongruent realizations
By [41, Thm. 4], there is an integer $\ell$ such that $|X| = 2^{\ell}$ with probability 1. We can easily refine the proof of this result so that it says something more precise on $\ell$.
Proposition 2.4.1. With probability 1, $|X| = 2^{|\Gamma_P|}$.
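Proof. The following statements hold with probability 1. By [41], $\mathcal{G}_D \cong C_2^{n-K}$ (where $C_2$ is the cyclic group of order 2), so that $|\mathcal{G}_D| = 2^{n-K}$. Since $\mathcal{G}_P \le \mathcal{G}_D$, $|\mathcal{G}_P|$ divides $|\mathcal{G}_D|$. By elementary group theory, $|\mathcal{G}_P| = 2^{|\Gamma_P|}$. By [42, Thm. 6.4], the action of $\mathcal{G}_P$ on $X$ only has one orbit, i.e. $\mathcal{G}_P x = X$ for any $x \in X$. We remark that every partial reflection operator is an involution, i.e. $g_v^2 = 1$, and hence $g_v^{-1} = g_v$ for all $v > K$; since the product is commutative, the same holds for every element of $\mathcal{G}_P$. Thus, if $g x = g' x$ for two $g, g' \in \mathcal{G}_P$ and $x \in X$, then $(g')^{-1} g x = x$, which implies $g' g x = x$, which implies $g' g = 1$, whence $g' = g$. This means that $|\mathcal{G}_P x| = |\mathcal{G}_P|$. Thus, for any $x \in X$, $|X| = |\mathcal{G}_P x| = |\mathcal{G}_P| = 2^{|\Gamma_P|}$.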
Now, all that remains to do is to present an algorithm to compute $|\Gamma_P|$. This follows directly from the definition in Eq. (2.3.2). We let $b = (b_{K+1}, \dots, b_n)$ be an array initialized so that $b_i = 1$ for all $i$ in $\{K + 1, \dots, n\}$. Then we scan every edge $\{u, v\}$ in $E_P$, and for each $i$ in $S^{uv}$ we set $b_i = 0$. Finally, $|\Gamma_P| = \sum_{i=K+1}^{n} b_i$. This algorithm runs in $O(n + m)$. We remark that, by Sect. 2.3.1, if $X$ is required to only contain incongruent realizations, then the first component of $b$ should be $b_{K+2}$ rather than $b_{K+1}$.
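The counting procedure can be sketched as follows. This is only an illustration under stated assumptions: the function name `count_realizations` and its arguments are hypothetical, vertices are numbered 1 to $n$, and pruning edges are given as pairs. Instead of zeroing $b_i$ entry by entry for each span, the sketch marks spans with a difference array, which is one way (an implementation choice made here, not stated in the text) of keeping the total work proportional to $n + m$.

```python
def count_realizations(n, K, pruning_edges, incongruent=False):
    """Return (|Gamma_P|, 2**|Gamma_P|) for vertices 1..n and pruning edges {u, w}."""
    # First index of the array b: K+2 if only incongruent realizations count.
    first = K + 2 if incongruent else K + 1
    # diff encodes, via prefix sums, how many spans cover each vertex.
    diff = [0] * (n + 2)
    for u, w in pruning_edges:
        u, w = min(u, w), max(u, w)
        lo, hi = u + K + 1, w              # span {u+K+1, ..., w}
        if lo <= hi:
            diff[lo] += 1
            diff[hi + 1] -= 1
    gamma_size = 0
    covered = 0
    for i in range(1, n + 1):
        covered += diff[i]
        if i >= first and covered == 0:    # b_i = 1: vertex in no span
            gamma_size += 1
    return gamma_size, 2 ** gamma_size
```

With $n = 5$, $K = 2$ and the single pruning edge {1, 4}, the span is {4}, so `count_realizations(5, 2, [(1, 4)])` counts the vertices 3 and 5.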
Obviously, if |Γ𝑃| = 1, then the 𝐾DMDGP graph is globally rigid (with probability 1).
Acknowledgments
Financial support is gratefully acknowledged from French National Research Agency project ANR-10-BINF-03-08 “Bip:Bip”, and the Brazilian research agencies FAPESP, CNPq and CAPES.
Chapter 3
An algorithm for realizing Euclidean distance matrices
Jorge Alencar¹, Tibérius Bonates², Carlile Lavor³ and Leo Liberti⁴
¹ Instituto Federal de Educação, Ciência e Tecnologia do Sul de Minas Gerais, IFSULDEMINAS, Inconfidentes, Minas Gerais, Brazil.
² Universidade Federal do Ceará, DEMA-UFC, Fortaleza, Ceará, Brazil.
³ Universidade Estadual de Campinas, IMECC-Unicamp, Campinas, São Paulo.
⁴ École Polytechnique, CNRS LIX, Paris.
Abstract
We present an efficient algorithm to find a realization of a (full) 𝑛 × 𝑛 Euclidean distance matrix in the smallest possible dimension. Most existing algorithms work in a given dimension:
most of these can be transformed to an algorithm to find the minimum dimension, but gain a
logarithmic factor of 𝑛 in their worst-case running time. Our algorithm performs linearly in
𝑛 (and quadratically in another parameter which is fixed for most applications).
3.1 Introduction
The problem of adjustment of distances among points has been studied since the first decades of the 20th century [36]. It can be formally defined as follows. Let $D$ be an $n \times n$ symmetric hollow (i.e., with zero diagonal) matrix with non-negative elements. We say that $D$ is a squared Euclidean Distance Matrix (EDM) if there are $x_1, x_2, \dots, x_n \in \mathbb{R}^K$, for a positive integer $K$, such that
$$D(i, j) = D_{ij} = \|x_i - x_j\|^2, \quad i, j \in \{1, \dots, n\},$$
where $\|\cdot\|$ denotes the Euclidean norm. The smallest $K$ for which such a set of points exists is called the embedding dimension of $D$, denoted by $\dim(D)$. If $D$ is not an EDM, we define $\dim(D) = \infty$.
We are concerned with the problem of determining dim(𝐷) for a given symmetric hollow matrix 𝐷. If dim(𝐷) =
𝐾 < ∞, we also want to determine a sequence 𝑥 = (𝑥1, . . . , 𝑥𝑛) of 𝑛 points in R𝐾 such that 𝐷 is the EDM of 𝑥.
We emphasize that 𝐷 is a full matrix.
The literature mostly offers efficient methods for a related problem, in which $K$ is given as part of the input (see e.g. [21]). Each of these algorithms can be used within a bisection search to determine the embedding dimension, incurring a multiplicative factor of $\mathcal{O}(\log n)$ in their running time. These algorithms also require the embedding of a clique in $\mathbb{R}^K$, a procedure that incurs a multiplicative factor of $\mathcal{O}(K^3)$ in their running time. Therefore, in the worst case, using these algorithms within a bisection search accomplishes the required task in $\mathcal{O}(n^3 \log n)$. We propose an algorithm which accomplishes the required task in $\mathcal{O}(n^3)$. If the embedding dimension is known, this reduces to linear time in $n$.
Our algorithm, detailed below, is based on the problem of determining the intersection of 𝐾 spheres in R𝐾,
where 𝐾 varies during the algorithm. The problem of determining the intersections of spheres is well known, as are its applications, which include navigation problems, molecular conformation, network location, robotics, as well as many other problems of distance geometry (see, e.g., [36]). We also numerically compare the algorithm with an existing technique available in the literature, and show we also do better in terms of realization quality.
3.2 Some results about EDM
It is well known [7] that a symmetric hollow matrix $D$ with nonnegative entries is an EDM if and only if $D$ is negative semidefinite on
$$M = \{x \in \mathbb{R}^n : x^t e = 0\},$$
the orthogonal complement of $e$, the $n$-dimensional vector of all ones. Let $J = I_n - \frac{1}{n} e e^t$ be the orthogonal projection matrix onto the subspace $M$. The result can then be stated as follows.
Theorem 3.2.1. Let $D$ be a symmetric hollow matrix with nonnegative entries. Then $D$ is an EDM if and only if $\tau(D) = -\frac{1}{2} J D J$ is positive semidefinite. Moreover, if $D$ is an EDM, then its embedding dimension is the rank of $\tau(D)$.
Actually, we can easily see that, if $D$ is an EDM, then $\tau(D)$ is the Gram matrix associated with the points (vectors) which realize $D$, i.e., the matrix of inner products of such points. Based on this result, Dattorro developed a routine to verify whether or not a matrix $D$ is an EDM and, if so, to determine an embedding in the least possible dimension. The routine, called isedm, was written in Matlab and can be downloaded for free at http://www.convexoptimization.com/wikimization. Since it requires a spectral decomposition step, we estimate the complexity of this algorithm to be $\mathcal{O}(n^3)$.
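The test of Theorem 3.2.1 can be sketched in a few lines of NumPy. This is not Dattorro's isedm routine, only a minimal illustration of the same classical idea: the function name `classical_edm_check` and the tolerance `tol` are assumptions made here, and the realization is read off the eigendecomposition of $\tau(D)$.

```python
import numpy as np

def classical_edm_check(D, tol=1e-9):
    """Return (dim, X) if the squared-distance matrix D is an EDM, else (inf, None)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # projector onto the complement of e
    G = -0.5 * J @ D @ J                     # tau(D), the candidate Gram matrix
    G = (G + G.T) / 2.0                      # symmetrize against rounding
    eigvals, eigvecs = np.linalg.eigh(G)     # eigenvalues in ascending order
    scale = max(1.0, np.abs(eigvals).max())
    if eigvals.min() < -tol * scale:
        return np.inf, None                  # tau(D) is not positive semidefinite
    keep = eigvals > tol * scale             # numerically nonzero eigenvalues
    X = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return int(keep.sum()), X                # rank of tau(D) and one realization
```

For the four corners of the unit square the routine should report dimension 2, and the rows of `X` should reproduce the original squared distances up to rounding.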
Our algorithm does not rely on this classical result. In the rest of this section we present the theoretical basis of our algorithm. Let $[n] = \{1, \dots, n\}$ and $[n_1, n_2] = \{n_1, n_1 + 1, \dots, n_2 - 1, n_2\}$. Furthermore, if $U, V \subseteq [n]$ are such that $V = \{v_1, \dots, v_{n_1}\}$, $U = \{u_1, \dots, u_{n_2}\}$ and $D$ is an $n \times n$ matrix, then $D(V, U) = (d_{ij})$ is the submatrix of $D$ such that $d_{ij} = D(v_i, u_j)$ with $i \in [n_1]$ and $j \in [n_2]$. Given a positive integer $n$, we define $\{x_i\}_{i=1}^{n} = \{x_1, x_2, \dots, x_{n-1}, x_n\}$.
The following is a well-known result of the literature and provides an upper bound on the embedding dimension of a given EDM in terms of its order. For the sake of completeness we will prove this result using a different approach.
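Proposition 3.2.2. Let $D$ be an $n \times n$ EDM. Then $\dim(D) \le n - 1$.
Proof. If $D$ is an $n \times n$ EDM, then for any $m \ge \dim(D)$ there exists $\{x_i\}_{i=1}^{n} \subseteq \mathbb{R}^m$ which realizes $D$. Let $k$ be the dimension of the linear subspace generated by the vectors $\{x_i - x_1\}_{i=2}^{n} \subseteq \mathbb{R}^m$; since there are $n - 1$ such vectors, $k \le n - 1$. Since this space is a $k$-dimensional subspace of $\mathbb{R}^m$, it is isomorphic to $\mathbb{R}^k$ by a linear isometry $Q$. Let $y_1 = 0$ and $y_i = y_1 + Q(x_i - x_1)$ for $i \in [n]$. Thus:
$$\|y_i - y_j\| = \|y_1 + Q(x_i - x_1) - y_1 - Q(x_j - x_1)\| = \|Q(x_i - x_j)\| = \|x_i - x_j\|$$
for all $i, j \in [n]$. From this, we have that $\{y_i\}_{i=1}^{n} \subseteq \mathbb{R}^k$ also realizes $D$. Therefore $\dim(D) \le k \le n - 1$.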
The next couple of results will help us in our main result and in establishing some interesting properties about the points that realize a given EDM.
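Lemma 3.2.3. Let $D$ be an $n \times n$ EDM and let $\{x_i\}_{i=1}^{n}, \{y_i\}_{i=1}^{n} \subseteq \mathbb{R}^m$, for any $m \ge \dim(D)$, be sets of points which realize $D$. For $i, j, k \in [n]$, we have:
$$(x_i - x_j)^t (x_k - x_j) = (y_i - y_j)^t (y_k - y_j).$$
Proof. Without loss of generality, let us assume that $i < j < k$. Let $D'$ be the EDM realized by the subsets of points $\{x_i, x_j, x_k\}$ and $\{y_i, y_j, y_k\}$. From Proposition 3.2.2 there are $\{\bar{x}_i, \bar{x}_j, \bar{x}_k\}, \{\bar{y}_i, \bar{y}_j, \bar{y}_k\} \subseteq \mathbb{R}^2$ which realize $D'$. We notice that, by the isometry used in Proposition 3.2.2, $(x_i - x_j)^t (x_k - x_j) = (\bar{x}_i - \bar{x}_j)^t (\bar{x}_k - \bar{x}_j)$ and $(y_i - y_j)^t (y_k - y_j) = (\bar{y}_i - \bar{y}_j)^t (\bar{y}_k - \bar{y}_j)$. Since
$$\|\bar{x}_i - \bar{x}_j\| = \|\bar{y}_i - \bar{y}_j\|, \quad \|\bar{x}_i - \bar{x}_k\| = \|\bar{y}_i - \bar{y}_k\|, \quad \|\bar{x}_j - \bar{x}_k\| = \|\bar{y}_j - \bar{y}_k\|,$$
the two triangles obtained are congruent. Therefore,
$$(\bar{x}_i - \bar{x}_j)^t (\bar{x}_k - \bar{x}_j) = (\bar{y}_i - \bar{y}_j)^t (\bar{y}_k - \bar{y}_j).$$
Thus,
$$(x_i - x_j)^t (x_k - x_j) = (\bar{x}_i - \bar{x}_j)^t (\bar{x}_k - \bar{x}_j) = (\bar{y}_i - \bar{y}_j)^t (\bar{y}_k - \bar{y}_j) = (y_i - y_j)^t (y_k - y_j).$$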
We say that two subsets of points $\{x_i\}_{i=1}^{n}, \{y_i\}_{i=1}^{n} \subseteq \mathbb{R}^m$, for any $m \ge \dim(D)$, are orthogonally similar if there is an orthogonal operator $Q$ on $\mathbb{R}^m$ such that $Q(x_i - x_j) = y_i - y_j$, for $i, j \in [n]$.
Proposition 3.2.4. Let $D$ be an $n \times n$ EDM and let $\{x_i\}_{i=1}^{n}, \{y_i\}_{i=1}^{n} \subseteq \mathbb{R}^m$, for any $m \ge \dim(D)$, be subsets of points which realize $D$. Then $\{x_i\}_{i=1}^{n}$ is orthogonally similar to $\{y_i\}_{i=1}^{n}$.
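Proof. We define the sets of vectors $\{v_{i1} = x_i - x_1\}_{i=2}^{n}, \{u_{i1} = y_i - y_1\}_{i=2}^{n} \subseteq \mathbb{R}^m$, and we write $[v_{i1}]_{i=2}^{n}$ for the linear subspace generated by the vectors $v_{i1}$. Let $T : [v_{i1}]_{i=2}^{n} \to [u_{i1}]_{i=2}^{n}$ be a linear transformation such that $T(v_{i1}) = u_{i1}$ with $i \in [2, n]$. If $v = \sum_{i=2}^{n} a_i v_{i1}$, then
$$T(v)^t T(v) = \sum_{i=2}^{n} \sum_{j=2}^{n} a_i a_j u_{i1}^t u_{j1}.$$
By Lemma 3.2.3, we have $u_{i1}^t u_{j1} = v_{i1}^t v_{j1}$. Thus
$$T(v)^t T(v) = \sum_{i=2}^{n} \sum_{j=2}^{n} a_i a_j u_{i1}^t u_{j1} = \sum_{i=2}^{n} \sum_{j=2}^{n} a_i a_j v_{i1}^t v_{j1} = v^t v.$$
Therefore, $T$ is a linear isometry and, in particular, an isomorphism. This implies that there is a linear isometry $\bar{T} : ([v_{i1}]_{i=2}^{n})^{\perp} \to ([u_{i1}]_{i=2}^{n})^{\perp}$, so we can define $Q : \mathbb{R}^m \to \mathbb{R}^m$ such that, if $v = v_1 + v_2 \in \mathbb{R}^m$ where $v_1 \in [v_{i1}]_{i=2}^{n}$ and $v_2 \in ([v_{i1}]_{i=2}^{n})^{\perp}$, then $Q(v) = T(v_1) + \bar{T}(v_2)$. We have that $Q$ is linear and
$$Q(v)^t Q(v) = T(v_1)^t T(v_1) + \bar{T}(v_2)^t \bar{T}(v_2) = v_1^t v_1 + v_2^t v_2 = v^t v,$$
implying that $Q$ is an orthogonal operator. Since $x_i - x_j = (x_i - x_1) - (x_j - x_1)$, linearity gives $Q(x_i - x_j) = y_i - y_j$ for $i, j \in [n]$.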
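An orthogonal operator of the kind guaranteed by Proposition 3.2.4 can also be computed numerically. The sketch below does not follow the construction in the proof: it swaps in the standard orthogonal Procrustes solution via a singular value decomposition, which attains zero residual whenever an exact orthogonal map exists. The function name `orthogonal_map` is hypothetical, and realizations are assumed to be stored as $n \times m$ arrays with one point per row.

```python
import numpy as np

def orthogonal_map(X, Y):
    """Return an orthogonal Q with Q (x_i - x_1) = y_i - y_1, assuming X and Y
    realize the same EDM in the same space R^m."""
    A = np.asarray(X, dtype=float)
    B = np.asarray(Y, dtype=float)
    A = A - A[0]                        # rows are v_{i1} = x_i - x_1
    B = B - B[0]                        # rows are u_{i1} = y_i - y_1
    # Orthogonal Procrustes: the SVD of the m x m matrix B^t A gives Q = U V^t.
    U, _, Vt = np.linalg.svd(B.T @ A)
    return U @ Vt
```

Given two realizations `X` and `Y` of the same matrix, `(X[i] - X[j]) @ orthogonal_map(X, Y).T` should then agree with `Y[i] - Y[j]` up to rounding.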
Corollary 3.2.5. Let $D$ be an $n \times n$ EDM and let $\{x_i\}_{i=1}^{n} \subseteq \mathbb{R}^m$, for any $m \ge \dim(D)$, be a subset of points which realizes $D$. Then the dimension of $[x_i - x_1]_{i=2}^{n}$ is equal to $\dim(D)$.
In the proof of Proposition 3.2.4 we verified that if $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n} \subseteq \mathbb{R}^m$, for any $m \ge \dim(D)$, are subsets of points which realize $D$, then the linear subspaces $[x_i - x_1]_{i=2}^{n}$ and $[y_i - y_1]_{i=2}^{n}$ have the same dimension.
This means that any two subsets of points which realize $D$ generate linear subspaces with the same dimension; in particular, this holds for a subset of points in the embedding dimension.
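Corollary 3.2.5 suggests a simple numerical way of reading the embedding dimension off any realization, even one given in a space of larger dimension. The following sketch is only illustrative: the function name `span_dimension` and the tolerance are assumptions, and the rank is estimated with NumPy's `matrix_rank`.

```python
import numpy as np

def span_dimension(X, tol=1e-9):
    """Dimension of the span of {x_i - x_1}, for points stored as rows of X."""
    X = np.asarray(X, dtype=float)
    differences = X[1:] - X[0]          # rows are x_i - x_1, i = 2, ..., n
    return int(np.linalg.matrix_rank(differences, tol=tol))
```

For instance, the corners of a unit square rotated and translated inside $\mathbb{R}^5$ should still give dimension 2.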
Given an EDM of order $n$, the following lemma establishes that its embedding dimension exceeds the embedding dimension of any of its principal submatrices of order $n - 1$ by at most one.
Lemma 3.2.6. Let $D$ be an $(n + 1) \times (n + 1)$ EDM. If $\dim(D([n], [n])) = K$, then $\dim(D) \in \{K, K + 1\}$.
Proof. Let $\{x_i\}_{i=1}^{n+1}$ be a subset of points in $\mathbb{R}^m$, for some $m \ge \dim(D)$, which realizes $D$. Defining the vectors $v_{i1} = x_i - x_1$ for $i \in [2, n + 1]$, we have that $[v_{i1}]_{i=2}^{n}$ is a $K$-dimensional linear subspace, since $\{x_i\}_{i=1}^{n}$ realizes $D([n], [n])$ and $\dim(D([n], [n])) = K$ (Corollary 3.2.5). Therefore, we have
$$[v_{i1}]_{i=2}^{n} \subseteq [v_{i1}]_{i=2}^{n+1} = [v_{i1}]_{i=2}^{n} + [v_{(n+1)1}]$$
$$\Rightarrow \dim([v_{i1}]_{i=2}^{n}) \le \dim([v_{i1}]_{i=2}^{n+1}) \le \dim([v_{i1}]_{i=2}^{n}) + \dim([v_{(n+1)1}])$$
$$\Rightarrow \dim(D([n], [n])) \le \dim(D) \le \dim(D([n], [n])) + 1 \Rightarrow K \le \dim(D) \le K + 1.$$
The next lemma ensures that, given a set $S \subset \mathbb{R}^m$ of $n$ points that realizes the $n$-th principal submatrix of an EDM of order $n + 1$ and embedding dimension at most $m$, $S$ can be augmented into a realizing set for the full matrix without any change in the space dimension.
Lemma 3.2.7. Let $D$ be an $(n + 1) \times (n + 1)$ EDM, and let $\dim(D) \le m$. Additionally, let $\{x_i\}_{i=1}^{n} \subseteq \mathbb{R}^m$ be a set of points which realizes $D([n], [n])$, the $n$-th principal submatrix of $D$. Then there exists $x_{n+1} \in \mathbb{R}^m$ such that $\{x_i\}_{i=1}^{n+1}$ realizes $D$.
Proof. Let $\{y_i\}_{i=1}^{n+1} \subseteq \mathbb{R}^m$ be a subset of points which realizes $D$ and let $\{x_i\}_{i=1}^{n}$ be a subset of points which realizes $D([n], [n])$. By Proposition 3.2.4, we have that $\{y_i\}_{i=1}^{n}$ and $\{x_i\}_{i=1}^{n}$ are orthogonally similar, i.e., there is an orthogonal operator $Q$ on $\mathbb{R}^m$ such that $Q(y_i - y_j) = x_i - x_j$, for $i, j \in [n]$. If $x_{n+1} = x_1 + Q(y_{n+1} - y_1)$, then
$$\begin{aligned} \|x_{n+1} - x_i\| &= \|x_1 - x_i + Q(y_{n+1} - y_1)\| \\ &= \|Q(y_1 - y_i) + Q(y_{n+1} - y_1)\| \\ &= \|Q(y_1 - y_i + y_{n+1} - y_1)\| \\ &= \|Q(y_{n+1} - y_i)\| \\ &= \|y_{n+1} - y_i\|, \end{aligned}$$
with $i \in [n]$. Therefore, $\{x_i\}_{i=1}^{n+1}$ realizes $D$.
The following theorem establishes necessary and sufficient conditions for an 𝑛 × 𝑛 symmetric hollow matrix with nonnegative elements to be an EDM. If this matrix is an EDM with dim(𝐷) = 𝐾, then there exists a set of points which realizes the given matrix such that 𝐾 + 1 of them form a triangular structure in some sense, as explained below.
Theorem 3.2.8. Let $K$ be a positive integer and $D$ be an $n \times n$ symmetric hollow matrix with nonnegative elements, with $n \ge 2$. $D$ is an EDM with $\dim(D) = K$ if and only if there exist $\{x_i\}_{i=1}^{n} \subseteq \mathbb{R}^K$ and an index set $I = \{i_j\}_{j=1}^{K+1} \subseteq [n]$ such that
$$\begin{cases} x_{i_1} = 0 \\ x_{i_j}(j - 1) \ne 0, & j \in [2, K + 1] \\ x_{i_j}(i) = 0, & j \in [2, K + 1],\ i \in [j, K], \end{cases}$$
where $\{x_i\}_{i=1}^{n}$ realizes $D$.
Proof. Let $K$ be a positive integer and $D$ be an $n \times n$ EDM such that $\dim(D) = K$. We want to prove that there are $\{x_i\}_{i=1}^{n} \subseteq \mathbb{R}^K$ and an index set $I \subseteq [n]$ with $K + 1$ elements such that
$$\begin{cases} x_{i_1} = 0 \\ x_{i_j}(j - 1) \ne 0, & j \in [2, K + 1] \\ x_{i_j}(i) = 0, & j \in [2, K + 1],\ i \in [j, K] \end{cases}$$
for $\{i_j\}_{j=1}^{K+1} \subseteq I$ and $\{x_i\}_{i=1}^{n}$ realizes $D$.
We remark that, since $K$ is a positive integer, $D \ne 0$. We proceed by induction on $n$. For $n = 2$ we have
$$D = \begin{pmatrix} 0 & D(1, 2) \\ D(1, 2) & 0 \end{pmatrix},$$
therefore $\dim(D) = 1$, and taking $\{x_1 = 0, x_2 = \sqrt{D(1, 2)}\} \subset \mathbb{R}^1$ and $I = \{1, 2\}$ shows that the statement is true.
As induction hypothesis, suppose that the statement is true for some $n \ge 2$, i.e., given an $n \times n$ EDM $D$ such that $\dim(D) = K$, there exist $\{x_i\}_{i=1}^{n} \subseteq \mathbb{R}^K$ and an index set $I \subseteq [n]$ with $K + 1$ elements such that
$$\begin{cases} x_{i_1} = 0 \\ x_{i_j}(j - 1) \ne 0, & j \in [2, K + 1] \\ x_{i_j}(i) = 0, & j \in [2, K + 1],\ i \in [j, K] \end{cases}$$
for $\{i_j\}_{j=1}^{K+1} \subseteq I$ and $\{x_i\}_{i=1}^{n}$ realizes $D$.
Let $D$ be an $(n + 1) \times (n + 1)$ EDM such that $\dim(D) = K$, so $\bar{D} = D([n], [n])$ is an EDM such that, by Lemma 3.2.6, $\dim(\bar{D}) = k$, with $k = K$ or $k = K - 1$. By the induction hypothesis, there exist $\{x_i\}_{i=1}^{n} \subseteq \mathbb{R}^k$ which realizes $\bar{D}$ and an index set $I \subseteq [n]$ with $k + 1$ elements such that
$$\begin{cases} x_{i_1} = 0 \\ x_{i_j}(j - 1) \ne 0, & j \in [2, k + 1] \\ x_{i_j}(i) = 0, & j \in [2, k + 1],\ i \in [j, k] \end{cases}$$
for $\{i_j\}_{j=1}^{k+1} \subseteq I$. Without any loss of generality, we can assume $\{x_i\}_{i=1}^{n} \subseteq \mathbb{R}^{k+1}$: we can do that by defining the $(k + 1)$-st coordinate of each vector to be zero. Since $\dim(D) \le k + 1$, by Lemma 3.2.7, there exists $y = (y_1, y_2, \dots, y_{k+1})$ such that $\{x_i\}_{i=1}^{n} \cup \{y\}$ realizes $D$.
This means that $y$ belongs to the intersection of the spheres centered at $\{x_i\}_{i=1}^{n} \subseteq \mathbb{R}^{k+1}$, the $i$-th one with radius $\sqrt{D(i, n + 1)}$. Therefore, $y$ is a solution of the following non-linear system:
$$\begin{cases} \|x_1 - y\|^2 = D(1, n + 1) \\ \|x_2 - y\|^2 = D(2, n + 1) \\ \quad\vdots \\ \|x_n - y\|^2 = D(n, n + 1). \end{cases}$$
Reordering the equations in such a way that the $j$-th equation is $\|x_{i_j} - y\|^2 = D(i_j, n + 1)$ for $j \in [k + 1]$, we have
$$\begin{cases} \|x_{i_1} - y\|^2 = D(i_1, n + 1) \\ \|x_{i_2} - y\|^2 = D(i_2, n + 1) \\ \quad\vdots \\ \|x_{i_{k+1}} - y\|^2 = D(i_{k+1}, n + 1) \\ \|x_{j_1} - y\|^2 = D(j_1, n + 1) \\ \|x_{j_2} - y\|^2 = D(j_2, n + 1) \\ \quad\vdots \\ \|x_{j_{n-k-1}} - y\|^2 = D(j_{n-k-1}, n + 1) \end{cases}$$
where $\{j_i\}_{i=1}^{n-k-1} = [n] \setminus I$. By the induction hypothesis, we know the points $\{x_{i_j}\}_{j=1}^{k+1}$. Using this information (in particular $x_{i_1} = 0$) and subtracting the first equation from the others, we obtain:
$$\begin{cases} \|y\|^2 = D(i_1, n + 1) \\ x_{i_2}^t y = b_{i_2} \\ \quad\vdots \\ x_{i_{k+1}}^t y = b_{i_{k+1}} \\ x_{j_1}^t y = b_{j_1} \\ x_{j_2}^t y = b_{j_2} \\ \quad\vdots \\ x_{j_{n-k-1}}^t y = b_{j_{n-k-1}} \end{cases} \qquad \text{where} \qquad b_i = \frac{\|x_i\|^2 - D(i, n + 1) + D(i_1, n + 1)}{2}$$
for $i \in [n] \setminus \{i_1\}$. Let $B$ be the $(n - 1) \times k$ matrix associated with the linear part of the non-linear system and $b$ the right-hand-side vector, both of them ordered according to the system above. Then, we can rewrite the system of equations as
$$\begin{cases} \|y\|^2 = D(i_1, n + 1) \\ B\, y([k]) = b. \end{cases}$$
By construction, the first $k$ rows of $B$ form a lower triangular matrix with no null elements on the diagonal; therefore the linear part of the system has only two possible outcomes: there is a unique solution or there is no solution. If the system has no solution, then the intersection of the spheres in $\mathbb{R}^{k+1}$ is empty and, thus, $D$ is not an EDM, which is a contradiction. So, the linear part of the system has exactly one solution $y^*$.
Substituting this solution into $\|y\|^2 = \|y([k])\|^2 + y_{k+1}^2 = D(i_1, n + 1)$, we obtain
$$y_{k+1}^2 = D(i_1, n + 1) - \|y^*\|^2.$$
If $D(i_1, n + 1) - \|y^*\|^2$ is negative, then the system has no solution, i.e., the intersection of the spheres in $\mathbb{R}^{k+1}$ is empty and, thus, $D$ is not an EDM, which is, again, a contradiction. Therefore, the difference is nonnegative. If the difference is zero, then $K = k$, the last entry of each point is unnecessary, the index set of the induction hypothesis remains valid, and the statement is true.
If $k = K - 1$, then the difference is strictly positive, which yields two possible solutions, of which we choose one. We then set $x_{n+1} = y$ and take $\bar{I} = I \cup \{n + 1\}$ as the index set, so that there exist $\{x_i\}_{i=1}^{n+1} \subseteq \mathbb{R}^{k+1}$ which realizes $D$ and an index set $\bar{I} \subseteq [n + 1]$ with $K + 1$ elements such that
$$\begin{cases} x_{i_1} = 0 \\ x_{i_j}(j - 1) \ne 0, & j \in [2, K + 1] \\ x_{i_j}(i) = 0, & j \in [2, K + 1],\ i \in [j, K] \end{cases}$$
for $\{i_j\}_{j=1}^{K+1} = \bar{I}$. Therefore, the statement is true.
The converse of the statement is true by the definition of EDM and by the dimension of the linear space generated by the defined set of points (see Corollary 3.2.5).
This induction process suggests an algorithm to verify whether or not a matrix $D$ is an EDM and, if so, to determine an embedding in the least possible dimension. The procedure is shown in Alg. 3.1, and we refer to it as edmsph, from “EDM” and “sphere”. The pseudocode makes use of a function expand($x$) which endows point vectors in the sequence $x$ with an additional zero component. We denote the sphere centered in $p \in \mathbb{R}^{K+1}$ with radius $r$ by $S^K(p, r)$.
We remark that, given $K$ spheres in $\mathbb{R}^K$, we assume their centers are in general position, i.e. they span a $(K - 1)$-dimensional affine space. Then we have at most two points in the intersection of these spheres. More specifically, we have no point if the intersection is empty, one point if the intersection lies in the $(K - 1)$-dimensional affine space generated by the centers, and two points if there are no points of the intersection in the $(K - 1)$-dimensional affine space generated by the centers.
Using trilateration on the appropriately indexed points guaranteed by Thm. 3.2.8, finding $\Gamma$ in Alg. 3.1 requires solving triangular linear systems of order from 2 to the embedding dimension of an EDM, which can be carried out in time proportional to $\bar{K}^2$, for $\bar{K} \in [2, \dim(D)]$. This leads to a total time of $\mathcal{O}(n^3)$ in the worst case.
Alg. 3.1 $K$ = edmsph($D$, $x$)
    $I = \{1, 2\}$
    $K = 1$
    $(x_1, x_2) = (0, \sqrt{D_{12}})$
    for $i \in \{3, \dots, n\}$ do
        $\Gamma = \bigcap_{j \in I} S^K(x_j, \sqrt{D_{ij}})$
        if $\Gamma = \emptyset$ then
            return $\infty$
        else if $\Gamma = \{p_i\}$ then
            $x_i = p_i$
        else if $\Gamma = \{p_i^+, p_i^-\}$ then
            $x_i = p_i^+$
            $x \leftarrow$ expand($x$)
            $I \leftarrow I \cup \{i\}$
            $K \leftarrow K + 1$
        else
            error: $\dim \mathrm{aff}(\mathrm{span}(x_I)) < K - 1$
        end if
    end for
    return $K$
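The procedure of Alg. 3.1, combined with the triangular system from the proof of Thm. 3.2.8, can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the name `edmsph_sketch`, the tolerance `tol`, and the 0-based indexing are assumptions. Two deviations are made deliberately: the sketch starts from $K = 0$ with a single point at the origin (so the first step reproduces $x_2 = \sqrt{D_{12}}$ when $D_{12} > 0$), and it adds a check of the new point against all earlier points, which Alg. 3.1 does not list.

```python
import math
import numpy as np

def edmsph_sketch(D, tol=1e-8):
    """Return (K, X) with X an n x K realization of the squared EDM D,
    or (math.inf, None) if D does not appear to be an EDM."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    eps = tol * max(1.0, D.max())
    X = np.zeros((n, max(n - 1, 1)))      # dim(D) <= n - 1 (Prop. 3.2.2)
    I = [0]                               # index set; the first point is the origin
    K = 0
    for i in range(1, n):
        # Forward substitution on the lower triangular system B y = b.
        y = np.zeros(K)
        for r in range(K):
            j = I[r + 1]
            b = (X[j, :K] @ X[j, :K] - D[j, i] + D[I[0], i]) / 2.0
            y[r] = (b - X[j, :r] @ y[:r]) / X[j, r]
        s = D[I[0], i] - y @ y            # squared extra coordinate
        if s < -eps:
            return math.inf, None         # empty sphere intersection
        X[i, :K] = y
        if s > eps:                       # two mirror points: keep the "+" one
            X[i, K] = math.sqrt(s)
            I.append(i)
            K += 1
        # Added check (assumption): compare with all previously placed points.
        d2 = ((X[:i, :K] - X[i, :K]) ** 2).sum(axis=1)
        if np.any(np.abs(d2 - D[:i, i]) > 10.0 * eps):
            return math.inf, None
    return K, X[:, :K]
```

Because the points indexed by `I` have the triangular structure of Thm. 3.2.8, the diagonal entries `X[j, r]` used as pivots are nonzero, so the degenerate branch of Alg. 3.1 does not arise in this sketch. On the eight corners of the unit cube it should return dimension 3.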
3.3 Numerical Experiments
As mentioned earlier, Dattorro [19] developed isedm, an algorithm for checking whether a given symmetric hollow matrix 𝐷 with nonnegative entries is an EDM. In what follows we compare the experimental results obtained with isedm and edmsph based on a series of tests.
In the first series of experiments, we used the constructions proposed by Moré and Wu [More1997]. Those constructions consist of structures with $s^3$ elements ($s \in \mathbb{N}$) positioned on the three-dimensional lattice defined by
$$\{(i_1, i_2, i_3) \in \mathbb{Z}^3 \mid 0 \le i_k \le s - 1,\ k \in [3]\},$$
for $s \in [2, 10]$. The second and third columns in Table 3.1 show the results of those experiments. Each entry reports the Stress value (the Frobenius norm of the difference) between the original EDM and the one obtained by the algorithms. In all experiments all algorithms obtained the correct embedding dimension.
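The test instances and the Stress measure described above can be reproduced along the following lines. The helper names `more_wu_lattice`, `squared_edm` and `stress` are hypothetical, and the sketch assumes that Stress is the Frobenius norm of the difference between the original squared EDM and the one induced by the computed realization.

```python
import itertools
import numpy as np

def more_wu_lattice(s):
    """The s**3 points (i1, i2, i3) with integer coordinates 0 <= ik <= s - 1."""
    return np.array(list(itertools.product(range(s), repeat=3)), dtype=float)

def squared_edm(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=-1)

def stress(D_original, X_realized):
    """Frobenius norm of the difference between D and the EDM of a realization."""
    return np.linalg.norm(D_original - squared_edm(X_realized), ord="fro")
```

A realization returned by any of the compared routines can then be scored with `stress(squared_edm(more_wu_lattice(s)), X)`; an exact realization gives a value of zero up to rounding.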
We can see a significant difference between the Stress values obtained by the algorithms. One possible explanation for such a difference is the sensitivity of the spectral decomposition applied in the isedm routine. The edmsph algorithm is more stable in that respect because it solves triangular linear systems in each step. To address this sensitivity of isedm, we reimplemented the algorithm using other routines, e.g. svd. This “new” routine is called isedm2 and its results on the Moré-Wu constructions are presented in the last column of Table 3.1; as we can see, these results are very similar to the ones obtained by edmsph. For the rest of the section we will use the routine isedm2 instead of the usual isedm.