Conclusions - Sistema de Geração de Texto Para Performance (Text Generation System for Performance)

With the tool built and tested, I will now draw conclusions on several aspects: the construction process, the results obtained and possible future directions.

I set out to build a tool that automatically generates text in reaction to an audio signal. The original idea was for the tool to serve as visual support for a musical performance: a simple system with audio input and text output. Over the course of this work, however, the tool became something more interactive. It is semi-automatic and semi-autonomous, but responsive. In other words, it became more of an instrument in the performance itself and less of an automatic visual accessory, although it can still serve that purpose. During the research and construction process, I therefore ended up turning it into a more active tool, shaped around my own interaction with it as a musician.

The tool is intentionally simple and elementary on the technical side. From the outset, I intended to focus mainly on three things. The first was building the sound/text mapping model. The second was exploring low-level descriptors as a “pseudo-universal” way of extracting semantic and perceptual content from an audio signal. The third was the personal and historical references I brought to this work from my own background. I wanted to explore the simple systems used by the pioneers of artificial intelligence and, above all, their potential for serendipity. The aim was to assemble an elementary but modular system, with the greatest possible open-ended potential for adaptation and growth. I am aware that this is a markedly personal work, specific to its context. Even so, I believe its documentation contains material relevant to different fields, and that the resulting tool, as it was built, may serve other applications.

Writing a dissertation in the context of action research, especially arts-based research, raises some problems. One is the difficulty of documenting, in the traditional scientific way, the more personal characteristics of the work, which relate mainly to background and artistic references. Nevertheless, this kind of research seems to me increasingly relevant in today's academic context, particularly in multimedia. It can generate new knowledge and, above all, new ways of reaching that knowledge. Multimedia already bridges different technological components. In this way, it could also build more connections between technological and theoretical knowledge, as the pioneers of artificial intelligence discussed here did when they crossed mathematics with psychology, or computing with cryptography.

This tool bridges two components that are hard to relate: sound and text. The difficulty lies in the perceptual, sensory and interpretive dimension involved, which introduces a personal subjectivity that is hard to study. Concepts such as perception and interpretation are extremely difficult to treat scientifically, and different authors offer countless definitions of each. Yet it was precisely these concepts, and perhaps even more their correlation, that I set out to explore here. I sought to build understanding of these topics by documenting the processes, concepts and personal influences extensively. I also justified the decisions made and the experimental results obtained.

The field of automatic text generation also has too many branches to cover in full. I made a point of referencing not only those most central to this work but also those most relevant to situating it in its cultural, geographical and temporal context. I hope this makes it a document that can serve, now and in the future, anyone interested in fields as different and scattered as the ones I have tried to bring together here.

5.1 Future work

Over the course of this work, many ideas for future applications emerged. The main one, present from the start, is the possibility of adapting the visual output graphically to any context, thanks to the capabilities of Processing. Working with text opens the door, on its own, to typography, which in turn brings the whole semantic dimension of text as a visual element. Typeface, layout, colours and formatting are all easy to control in this tool and can be used to adapt the visual result to the intended context.

I also intend to explore text-to-speech technology, voicing the generated text in real time, which could likewise be used in musical performance. I further intend to study turning the system into a standalone device, for example using a Raspberry Pi. The tool could then be used as a box with an audio input and a video output, without needing a computer.

I will also use the tool purely for its text-generation component. One example is creating books of texts generated by mixing two literary works as “read” by the audio input of different pieces of music (e.g. the Bible and the Quran as reinterpreted by a black metal or techno album). Finally, the tool could be used in an installation. There it could, for example, generate text from the ambient sound of a space, reacting to the sound of passers-by and letting them interact with the system through sound.


8. Annexes

Annex 1

Full code of the first prototype, which generates text by mixing two different text sources, written in Processing.

import rita.*;

RiMarkov markov;

String line = "Click Me!";
String[] files = { "A.txt", "B.txt" };
int x = 160, y = 240;

void setup()
{
  size(500, 500);
  fill(255);
  textFont(createFont("verdana", 21));

  // create a Markov model with n=2 from the two files
  markov = new RiMarkov(2);
  markov.loadFrom(files, this);
}

void draw()
{
  background(0, 0, 255);
  text(line, x, y, 400, 400);
}

void mouseReleased()
{
  // ignore clicks until the model has finished loading
  if (!markov.ready()) return;
  x = y = 50;
  // generate three sentences and join them into a single string
  String[] lines = markov.generateSentences(3);
  line = RiTa.join(lines, " ");
}
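The prototype above leaves all of the statistics to RiTa's RiMarkov. The following plain-Java sketch illustrates only the underlying principle. It builds a word-level Markov chain of order 2 from two texts. Each pair of consecutive words is mapped to every word that follows it in either source, so a word pair shared by both texts is where the generated sequence can cross from one text to the other. The class and method names (TwoTextMarkov, load, generate) are hypothetical. The sketch also omits the sentence handling that RiMarkov performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical word-level Markov chain fed by two texts: a sketch of the idea
// behind Annex 1, not a reimplementation of RiTa's RiMarkov (no sentence detection).
class TwoTextMarkov {
    private final int order;
    // key: `order` consecutive words; value: every word observed right after them
    private final Map<List<String>, List<String>> transitions = new HashMap<>();
    // opening n-gram of each loaded text, used as possible starting states
    private final List<List<String>> starts = new ArrayList<>();

    TwoTextMarkov(int order) {
        this.order = order;
    }

    // Adds every n-gram of the text. N-grams shared by both texts collect
    // successors from both, which is where the mixing happens.
    void load(String text) {
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i + order < words.length; i++) {
            List<String> key = List.of(Arrays.copyOfRange(words, i, i + order));
            transitions.computeIfAbsent(key, k -> new ArrayList<>()).add(words[i + order]);
            if (i == 0) {
                starts.add(key);
            }
        }
    }

    // Random walk that grows the text up to maxWords words, stopping earlier
    // when the current n-gram has no recorded successor.
    String generate(int maxWords, Random rng) {
        if (starts.isEmpty()) {
            return "";
        }
        List<String> state = starts.get(rng.nextInt(starts.size()));
        List<String> out = new ArrayList<>(state);
        while (out.size() < maxWords) {
            List<String> successors = transitions.get(state);
            if (successors == null) {
                break;
            }
            out.add(successors.get(rng.nextInt(successors.size())));
            // copy the last `order` words so the key is independent of `out`
            state = new ArrayList<>(out.subList(out.size() - order, out.size()));
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        TwoTextMarkov markov = new TwoTextMarkov(2);
        markov.load("the sea is calm tonight and the moon is full");
        markov.load("the moon is a harsh mistress and the sea is wide");
        System.out.println(markov.generate(12, new Random()));
    }
}
```

With the two sample sentences in main, the shared pairs “the sea” and “the moon” allow a walk that starts in one sentence to continue with words from the other.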

Annex 2

Final text-generation code, implemented in Processing.

import rita.*;
import oscP5.*;
import netP5.*;

OscP5 oscP5;
RiMarkov[] markov;
PFont font;

// Names of the two external *.txt files to load
String firstFileName = "dada.txt";
String secondFileName = "reich.txt";

// Lines of each txt file, as returned by loadStrings()
String[] linesA, linesB;

// Array of feeds. Each feed is made of a certain number of copies of text A and text B (e.g. AAAAAAAA, BBBAAAAA, BBBBBBAA)
String[] feed;

// Text returned by the Markov generation
String longText = "";

// Text displayed on screen
String textNoEcra = "";

// Initial n-gram order (do not use 1 or less)
int nGramInicial = 2;

// Number of text copies per feed. stepCount + 1 = 9 feeds are created,
// covering every proportion of text A to text B from 8:0 to 0:8.
int stepCount = 8;

// Maximum length of the displayed text (number of characters)
int textLength = 1000;

// Feed index used when loading the Markov chains
int feedMarkov = 0;

// AUDIO-RESPONSIVE CONTROLS
// Text font size
int tamanhoFonteTexto = 54;

// Index of the Markov chain used for generation (0 to 26). Chain i is built from
// feed i / 3 with n-gram order (i % 3) + nGramInicial, so 0-2 use only text A
// and 24-26 use only text B.
int mix = 3;

// Background light intensity
int intensity = 0;

// Number of sentences to generate and display

int frases = 1;

void setup() {
  // Screen size or fullscreen
  size(1280, 800);
  //fullScreen(1);

  // Load both texts and join each one's lines into a single string
  linesA = loadStrings(firstFileName);
  linesB = loadStrings(secondFileName);
  String textA = "";
  for (int i = 0; i < linesA.length; i++) {
    textA += linesA[i] + " ";
  }
  String textB = "";
  for (int i = 0; i < linesB.length; i++) {
    textB += linesB[i] + " ";
  }

  // Set up the Markov chains (27 chains = (8+1)*3). No text is assigned to any chain yet;
  // this only creates the placeholders (RiMarkov objects).
  markov = new RiMarkov[(stepCount+1)*3];
  for (int i = 0; i < (stepCount+1)*3; i++) {
    // n-gram orders 2, 3 and 4, cycling for each feed
    int grauMarkov = (i%3) + nGramInicial;
    markov[i] = new RiMarkov(grauMarkov, true);
    //println("index"+i+" "+"grau"+grauMarkov);
  }

  // Create nine feeds, each with a specific number of copies of text A and text B.
  // Each feed is then used by three chains, each with its own n-gram order
  // (e.g. BBBAAAAA n-gram 2, BBBAAAAA n-gram 3, BBBAAAAA n-gram 4),
  // hence 9 feeds * 3 n-grams = 27 Markov chains.
  feed = new String[stepCount+1];

  // Number of copies of text B in the current feed (the rest are copies of text A)
  int deciderText = 0;

  // Fill each feed with its copies of each text
  for (int i = 0; i <= stepCount; i++) {
    // start from an empty string; += on a null entry would prepend "null"
    feed[i] = "";
    for (int j = 0; j < stepCount; j++) {
      if (j < deciderText) {
        feed[i] += textB;
      } else {
        feed[i] += textA;
      }
    }
    // one more copy of text B in the next feed
    deciderText++;
  }

  // Load the 27 Markov chains: every three consecutive chains share the same feed
  for (int i = 0; i < (stepCount+1)*3; i++) {
    if (i%3 == 0 && i != 0) {
      feedMarkov++;
    }
    markov[i].loadText(feed[feedMarkov]);
    println("indexMarkov"+i+" "+"feed"+feedMarkov);
  }

  // Start oscP5, listening for incoming messages on port 12000
  oscP5 = new OscP5(this, 12000);
  // Route each OSC address to its handler method
  oscP5.plug(this, "setLight", "/light");
  oscP5.plug(this, "setBang", "/bang");
  oscP5.plug(this, "setMix", "/mix");
  oscP5.plug(this, "setNGram", "/ngram");
  oscP5.plug(this, "setFrases", "/frases");
}

void draw() {
  background(intensity);
  displayTexto();
}

// Triggers brightness: sets the background intensity
public void setLight(int light) {
  intensity = light;
}

// Triggers text generation
public void setBang(int b) {
  // Max always sends 1 when the bang is triggered
  if (b == 1) {
    // Generate the requested number of sentences with the selected Markov chain
    String[] lines = markov[mix].generateSentences(frases);

    // Join the sentences into a single String, separated by spaces
    longText = join(lines, " ");

    // Copy the generated text to the on-screen text
    textNoEcra = longText;

    // If the text is longer than textLength, keep only its first textLength characters
    if (longText.length() > textLength) {
      textNoEcra = longText.substring(0, textLength);
    }
  }
}

// Text mixer: selects the Markov chain (index into markov[], clamped to 0..26)
public void setMix(int mixMax) {
  mix = constrain(mixMax, 0, markov.length - 1);
}

// Receives the number of sentences to generate
public void setFrases(int frasesMax) {
  frases = frasesMax;
}

// Text options
void displayTexto() {
  // Create the font once rather than on every frame
  if (font == null) {
    font = createFont("IBM.ttf", tamanhoFonteTexto);
  }
  textFont(font);
  textSize(tamanhoFonteTexto);
  textAlign(CENTER, CENTER);
  fill(0);
  text(textNoEcra, 50, 20, width-100, height-90);
}
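The indexing in setup() is hard to follow from the listing alone. The following plain-Java sketch mirrors only that indexing, using the labels "A" and "B" in place of the actual texts, so it uses neither RiTa nor oscP5. It rebuilds the nine feeds and reports, for each of the 27 chain indices, which feed and which n-gram order the chain uses. The class FeedLayout and its methods are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java mirror of the feed/chain indexing in Annex 2 (no RiTa).
class FeedLayout {
    static final int STEP_COUNT = 8;      // text copies per feed (stepCount)
    static final int N_GRAM_INICIAL = 2;  // lowest n-gram order (nGramInicial)
    static final int ORDERS_PER_FEED = 3; // n-gram orders 2, 3 and 4

    // Feed k holds k copies of text B followed by (STEP_COUNT - k) copies of text A.
    static List<String> buildFeeds(String textA, String textB) {
        List<String> feeds = new ArrayList<>();
        for (int k = 0; k <= STEP_COUNT; k++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < STEP_COUNT; j++) {
                sb.append(j < k ? textB : textA);
            }
            feeds.add(sb.toString());
        }
        return feeds;
    }

    // Feed used by chain i: the loading loop advances feedMarkov every three chains.
    static int feedOf(int chainIndex) {
        return chainIndex / ORDERS_PER_FEED;
    }

    // n-gram order of chain i, as in grauMarkov = (i % 3) + nGramInicial.
    static int orderOf(int chainIndex) {
        return chainIndex % ORDERS_PER_FEED + N_GRAM_INICIAL;
    }

    public static void main(String[] args) {
        List<String> feeds = buildFeeds("A", "B");
        int chains = (STEP_COUNT + 1) * ORDERS_PER_FEED;
        for (int i = 0; i < chains; i++) {
            System.out.println("chain " + i + ": feed " + feedOf(i)
                + " (" + feeds.get(feedOf(i)) + "), n-gram " + orderOf(i));
        }
    }
}
```

For example, with the default mix = 3, setBang() would generate from chain 3. That chain uses feed 1 (one copy of text B followed by seven copies of text A) with an n-gram of order 2.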

In the document Sistema de Geração de Texto Para Performance (pages 54-63)