Scrape Elegy (2022)
Scrape Elegy installation at the Science Gallery Melbourne. Photos by Alan Weedon
Project Summary
Scrape Elegy was conceived by Gabby Bush and Willoh Weiland as part of the SWARM Exhibition at Science Gallery Melbourne from 13 August – 3 December 2022 and supported by the Centre for Artificial Intelligence and Digital Ethics (CAIDE) at the University of Melbourne. I was called into the project as a Sound Designer in April 2022.
The installation combined sound, a physical installation in the form of a pink public toilet, and participatory practice through participants' Instagram accounts. As they entered the toilet, participants input their Instagram handle into an iPad interface. A scraping algorithm then collected the captions from the participant's posts, selected random captions of appropriate length and passed them to a text-to-speech service for rendering as speech audio. These speech samples were combined with background music and pre-recorded voice samples from our voice actor to generate a personalised audio journey of about five minutes for each participant.
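The caption-selection step can be sketched roughly as below. This is a minimal illustration in Python, not the production code; the function name, length bounds and sample captions are all hypothetical, and it assumes the captions have already been scraped into a list of strings.

```python
import random


def select_captions(captions, n=20, min_len=15, max_len=280, seed=None):
    """Pick up to n random captions whose length falls within bounds.

    captions: list of caption strings scraped from a participant's posts.
    Captions that are too short (e.g. a lone emoji) or too long are
    discarded; the remainder is shuffled and truncated to n, ready to
    be passed to a text-to-speech service.
    """
    rng = random.Random(seed)
    suitable = [c for c in captions if min_len <= len(c) <= max_len]
    rng.shuffle(suitable)
    return suitable[:n]


# Hypothetical scraped captions for illustration.
posts = ["ok", "Sunday brunch with the crew #blessed", "x" * 500,
         "new year, new me", "lol"]
selected = select_captions(posts, n=2, seed=42)
```

Random selection (rather than, say, chronological ordering) matters here: it is what produces the absurd juxtapositions against the underscore discussed later in the sound design.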
For visitors who did not have an Instagram account and as a back-up, we created a dummy audio journey from a dummy Instagram account. To create this, we asked our own Instagram followers to share their most "cringe internet speak". Inspired by this input, we scripted the banal, generic content reflective of many social media posts.
The emergence of banality and the aesthetics of the everyday in art has been traced to a widespread reaction against the elitism of the modern movement. Combined with social media's dishing up of the private details of our everyday lives for public scrutiny and admiration, art has turned into a background to life (Varnelis 2008). In this reality-as-entertainment culture, art movements such as New Aesthetics emerged, in which the imagery and structures of the networked world are superimposed on the physical. Artists have increasingly harnessed this new aesthetics to expose the insidious way in which Big Tech and network capitalism have co-opted us into their algorithms and frameworks without our awareness, presenting this very ordinariness in contexts that expose its un-ordinariness.
"A revolution normally lies ahead of us and is heralded with sound and fury. The algorithmic revolution lies behind us and nobody noticed it. That has made it all the more effective – there is no longer any area of social life that has not been touched by algorithms’.... In this sense, a new aesthetics – along with former digital aesthetics – is a priori bound to the domain of the vernacular, and any critique must begin by addressing this ordinariness behind 'the new'. New aesthetics is the transcoding of evil and grey media that become sensible to us as banal effects of the everyday." (Andersen and Pold 2015)
In choosing the form of a public toilet, we drew attention to the toilet as a space where we often reflect upon our most private thoughts. At the same time, the toilet is where our digested waste is dumped and conjoined with everyone else's, an apt metaphor for our social media output. On this physical world we superimposed not the visual imagery of the networked world, but its sonic counterpart. The participant's Instagram captions escaped the network, transformed into physical soundwaves. Listening to an unfamiliar neural voice reading out their captions, the participant re-experienced their own voice through a ruptured body, rendering the ordinariness of social media un-ordinary.
System Design
The Scrape Elegy system consisted of a Python-based backend and React frontend. Below is a technical description by our developer Misha Mikho (Bush and Lim 2024):
Scrape Elegy has five containers:

- The Frontend (which builds static files during docker-compose build using webpack and shoots them off into a volume, to be picked up by Nginx, immediately exiting during docker-compose up);
- The Backend (which runs Daphne, an ASGI Django server with support for Channels to facilitate the use of websockets);
- The task queue (Huey);
- Redis (an in-memory database used both by Django Channels to facilitate websocket connections and by the task queue Huey); and
- The web server (Nginx), which serves all the static files, namely:
  a. the optimised frontend (React) production build;
  b. our backend static files, e.g. for the Django admin site; and
  c. the audio clips, which are generated by the Huey task queue and passed on to Nginx.
Participants interfaced with our frontend via two iPads. The first, mounted on the exterior wall of the work, indicated whether the toilet was vacant or occupied. The second, mounted on the inner circle, presented a series of prompts that asked for consent and stepped the participant through the process. Participants could input their Instagram handle here or choose to experience our dummy scrape. If a participant's account was private, a friend request had to be sent from our own Scrape Elegy Instagram account before their captions could be accessed. The participant was then invited into the structure to experience their audio journey.
The stereo sound was played through a dome parabolic speaker located above the participant as they sat on the toilet. The parabolic speaker was chosen to reduce sound bleed into the rest of the exhibition and to preserve some privacy for the participant. A more detailed explanation of Scrape Elegy has been published in the Journal of Artistic Research co-authored by Gabby Bush and myself (Bush and Lim 2024).
Transcorporeality
Scrape Elegy physically consisted of two semi-circular structures built with acoustic panels and steel frame, with a non-functioning toilet at its heart. The structure led the participant to explore the space as the sound drew them into the centre, eventually encouraging them to sit on the toilet beneath the dome speaker. The structure gave a sense of privacy, but was in fact not completely enclosed, reflecting social media’s blurring of public and private spaces.
Render of Scrape Elegy physical installation by Lauren Stellar
The physical form of a toilet placed the work in a long line of absurd toilets in conceptual art, most famously Marcel Duchamp's Fountain (1917), Gelitin's Locus Focus (2004) and Maurizio Cattelan's America (2016). This lineage of postmodernist critique, together with the exceptional pink colour and the gallery placement, was crucial in supporting the sound design's subversion of participants' experience of social media. The embodied experience of encountering, queuing for, navigating and listening within the physical space of the pink toilet, with its fake toilet brush and non-functioning plumbing, was a critical part of the transcorporeal experience.
Sound Design
"...if you can model the voice, you can serve the one thing which generative AI cannot - a body."
The audio journey for Scrape Elegy had to grapple with a few key issues. Firstly, as a representation of the participants' sense of self (Choi, Williams, and Kim 2020), it was a sonic narration or staging of their life, yet during the design process we did not know who their social media selves were, let alone the content, style or length of their captions. Secondly, regardless of each participant's individual profile, we wanted the sound design to reinforce the affect of the physical toilet with its sense of absurdity and exaggerated banality, using banality to question and ultimately fracture itself. And finally, we wanted to create a dramaturgical and emotional narrative, drawing on techniques of film scoring to craft an intimate emotional journey that would support the sense of nostalgia or "cringe" participants felt on hearing their own captions.
A critical part of the sound design was the neural voice used to read out participants' captions. Wanting a gender-neutral voice with a certain deadpan ennui, we explored training our own custom neural voice with providers such as Resemble.ai, Overdub by Descript and Amazon Polly. However, due to cost considerations (custom neural voices are expensive), being denied access because we could not detail in advance what the voice would say, and a wariness of contributing to the coffers of Big Tech, we settled on a free pre-trained voice from Microsoft Azure Cognitive Services in two different emotional modes – "whispering" and "unfriendly".
Microsoft Azure offers the option of training a custom artificial voice through an application process, and our application was returned with the request for more details of the words spoken by the voice. As the intention was to use the Instagram captions of participants, we were unable to provide a script or word list to Microsoft and therefore were denied access.
A pre-trained female voice was pitched down 20%. We adjusted the rate of speech, volume, emotional mode and gaps between sentences to achieve variation in the vocal journey; these changes were pre-programmed to occur at certain points in time. Different section durations and parameter values were trialled during the design process to fine-tune the emotional narrative.
Part of the code used to define the neural voice, pitch, rate of speaking, volume and timing for converting participants’ captions into audio files.
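A rough sketch of how such a request might be assembled is given below, using Azure's SSML prosody and express-as elements. This is an illustration rather than the production code: the voice name, pitch, rate and pause values are assumptions, not the values used in the work.

```python
def caption_ssml(text, style="whispering", pitch="-20%", rate="-10%",
                 volume="medium", gap_ms=800, voice="en-US-JennyNeural"):
    """Wrap a scraped caption in SSML for Azure text-to-speech.

    style selects the neural voice's emotional mode ("whispering" or
    "unfriendly" in Scrape Elegy); pitch, rate and volume shape the
    prosody; gap_ms inserts a pause after the caption so that gaps
    between sentences can vary across the journey.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
        f'<voice name="{voice}">'
        f'<mstts:express-as style="{style}">'
        f'<prosody pitch="{pitch}" rate="{rate}" volume="{volume}">'
        f'{text}'
        f'<break time="{gap_ms}ms"/>'
        '</prosody></mstts:express-as></voice></speak>'
    )


# Switching the style and shrinking the gap at a pre-programmed point
# in the journey is then just a change of arguments.
ssml = caption_ssml("another day, another coffee #mood",
                    style="unfriendly", gap_ms=300)
```

Because every parameter is an argument, the emotional narrative reduces to a schedule of argument sets applied at set points in time, which is what made the trial-and-error tweaking described above practical.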
Rather than bringing in participants' captions (whose content we could not know in advance) from the start, we used pre-rendered captions recorded by artist Sullivan Patten to help participants settle into the intimate mental space of the cognitive toilet. These pre-rendered captions supplemented the vocal captions throughout the audio journey. They consisted of the sounds we make in everyday conversation that usually disappear in our written texts (e.g. "hmmm", sighs, "like"), as well as emoji phrases that usually exist only in the textual domain and sound unfamiliar when read out (e.g. "two hearts, two hearts").
We also introduced whole phrases to provide bookmarks at certain narrative points, such as "Do not stand in my grave and weep, I am not there" marking the dramatic midpoint and the song "bye...bye.... sorry" at the end. These provided a consistent narrative structure across different participants and their individual Instagram handles, and also gave the caption-selection algorithm anchor points even where a participant had very few social media posts. Interestingly, a number of participants did not realise that some captions were pre-rendered, instead absorbing them into their own social media self-image – perhaps attesting to the homogenisation of language caused by "like" culture, search engine optimisation and character limits (Andersen and Pold 2015).
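The interleaving of fixed anchor phrases with scraped captions can be sketched as follows. This is a simplified Python illustration under assumed conventions (anchor positions expressed as fractions of the journey), not the production logic.

```python
def build_journey(scraped, anchors):
    """Interleave scraped captions with fixed anchor phrases.

    scraped: list of caption strings, played in order across the journey.
    anchors: list of (position, phrase) pairs, position in [0, 1],
    marking where each pre-rendered phrase lands. Anchors guarantee a
    narrative arc even when a participant has very few posts.
    """
    # Spread the scraped captions evenly over the journey...
    timeline = [(i / max(len(scraped), 1), c) for i, c in enumerate(scraped)]
    # ...then drop the fixed bookmarks in at their narrative positions.
    timeline += anchors
    timeline.sort(key=lambda item: item[0])
    return [phrase for _, phrase in timeline]


journey = build_journey(
    ["brunch pics", "#tbt", "missing the beach"],
    [(0.5, "Do not stand in my grave and weep, I am not there"),
     (1.0, "bye...bye.... sorry")],
)
```

Note that with an empty scraped list the journey still contains the anchors in order, which is the property that made the piece robust for sparse Instagram accounts.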
To create banality in sound – like the pink toilet, a type of shiny, other-worldly, something-is-wrong banality – we designed an underscore of sounds to accompany the vocal captions. Starting with a simple ascending arpeggio reminiscent of a Public Service Announcement, the underscore called the participant into the inner sanctum of the cubicle. The first part of the journey built on the familiar, kitsch sound of a Wurlitzer electric piano in a pastiche Alberti bass pattern. These familiar sounds and patterns were, however, made uneasy by pitch slides at the ends of phrases.
The subtle pitch slides became large string glissandi by the middle of the audio journey, marking a dramatic point where the underscore nearly overwhelms the vocal captions. The voice becomes faster with smaller gaps between captions, making a sort of mental crescendo irrespective of the actual content of the captions (which in many cases were just descriptions of food and the humdrum of everyday life). The final section of the audio journey brought the participant literally into the world of laments and elegies, using the sound of a funereal pipe organ as the final accompaniment to their digital graveyard. Rather than recording "real" instruments, we used sampled digital instruments as a parallel to the transcoding of our real lives into a digitised, compressed bit version.
Participants’ captions were not filtered or ordered based on any content or emotional valence, to allow for absurd juxtaposition of captions with the melodramatic and over-the-top underscore. As it built to its dramatic peaks or fizzled to its melancholic ending, the nothingness of endless hashtags, text-to-speech emojis and netspeak acronyms highlighted the tragicomedy of our social media selves.
The following audio clips are samples of our algorithm run on scraped captions of friends who fit the Science Gallery demographic. They were compiled during the development phase and therefore may not use the final audio files, volumes and timings; however, they give an indication of how the algorithm works across different Instagram profiles. They also provide insight into how we evaluated the algorithms during our development process.
Evaluation
Scrape Elegy, ironically, became something of an Instagrammable phenomenon sought out by influencers, to be experienced and then shared on their own social media posts. To the extent that success is affirmed through others, then, it was a successful work, allowing participants to act as our delegated performers, performing their selves for themselves.
Sonically, the work could be further developed by incorporating procedural gaming audio techniques to generate the background soundtrack accompanying the voice. This would have the advantage of modularity, allowing different configurations of sound layers for different participants, for example, or for more accurately integrating the timing of the neural voice captions with the other sonic elements. Although we toyed with this idea in the development of Scrape Elegy, we ran out of time to explore the technical requirements to incorporate such an approach into the backend.
In addition, as text-to-music and text-to-sound generative AI models develop, it may also be possible to integrate generated sonic elements from the participants’ captions into the audio journey. This would allow a further transcoding of the networked self into simulated physical reality, blurring real memories with virtually-created ones.
Significance to Research
This project was significant to my research as it opened up the potential for neural voices to be used for real-time sound generation in creative work. This has been explored in further projects that investigate what "voice" signifies in socio-political relations (e.g. I have used neural voices with non-native English accents to give participants the experience of being a migrant) and posthumanist thought (e.g. in a later work, I subverted human-computer relations by placing the human participant as the "speaker" for an AI chatbot). This demonstrates the potential for new technology to create novel affordances for creative interaction.
This project also sparked my interest in digital ethics surrounding new technologies, which were further explored in Guài and Echo Chamber.
Credits
Creative concept: Gabby Bush, Willoh Weiland
Sound design: Monica Lim
Coding: Misha Mikho
Installation design: Lauren Stellar
Voice actor: Sullivan Patten
Supported by: CAIDE, University of Melbourne and Science Gallery Melbourne