Sound files offer the advantage of access to a large volume of data and to speech features that cannot be captured on paper. Sound recordings can preserve a diasporic dialect on the verge of extinction. Cataloging sound files and making them accessible to both the lay community and professionals present new challenges. Collection-level, rather than item-level, cataloging helps to deal with commercial music recordings. Double coding, using both the International Ethnographic Thesaurus and specifically Slavic metadata, can make field recordings accessible to both the Slavic community and specialists. Archives of important artists created by amateurs need scholars to generate finding aids.
- finding aids
- language documentation
- metadata selection
Sound Recordings in the Archival Setting: Issues of Collecting, Documenting, Categorizing, and Copyright. / Cope, Lida; Kononenko, Natalie; Qualin, Anthony; Yoffe, Mark. In: Slavic and East European Information Resources, Vol. 20, No. 3-4, 02.10.2019, p. 85-100.
Research output: Contribution to journal › Article › peer-review
TY - JOUR
T1 - Sound Recordings in the Archival Setting
T2 - Issues of Collecting, Documenting, Categorizing, and Copyright
AU - Cope, Lida
AU - Kononenko, Natalie
AU - Qualin, Anthony
AU - Yoffe, Mark
N1 - Keywords: cataloging; metadata selection; finding aids; copyright; language documentation. Working with digital sound recordings presents many advantages in terms of the amount of data that can be made accessible. Recordings also provide access to non-verbal information, such as intonation, which can be captured in a recording, but is difficult to communicate through transcription. Of course, sound file databases present their own issues and challenges. Some are old problems, such as copyright, which now appear with a new and more complicated twist. Working with digitized sound, however, requires new solutions; old cataloging systems may not work and need significant modification to make them appropriate to the material. Some problems are new, such as coming up with adequate terminology for digital databases of sound and finding a balance between existing metadata and new terms needed to cover categories not dealt with previously. The choice between existing terminology and more culture-appropriate terms is particularly pressing when dealing with non-English texts. Even with English texts that provide information about Slavic émigrés to the United States and Canada, Western terminology often proves deficient: imposing Western terminology may not adequately convey the features of the Slavic World. These and other topics were the subject of two roundtables on digital sound presented at the 2018 meeting of the Association for Slavic, East European, and Eurasian Studies held in Boston. The article that follows discusses the work of four roundtable participants: Lida Cope, Natalie Kononenko, Anthony James Qualin, and Mark Yoffe. Cope and Kononenko recorded their own data. Qualin and Yoffe worked with recordings made by others. Yoffe is the curator of the Rock Music Under Dictatorships Audio Collection at the International Counterculture Archive, Global Resources Center, of the George Washington University Libraries.
He works with commercial recordings and, at the roundtable, he talked about cataloging issues and the very thorny problem of copyright. The collection that Yoffe described consists of over 700 audio recordings in CD format. Its thematic scope covers historical rock, folk, jazz, and avant-garde music from the countries that suffered either dictatorial or culturally abusive and conservative political regimes. The time period during which the recordings were made begins in the late 1960s, concentrates on the 1970s-1990s, and includes a few recordings made in the first decade of the twenty-first century. The geographical scope of the collection encompasses the USSR, Russia, Ukraine, Moldova, Armenia, Uzbekistan, Estonia, Poland, the Czech and Slovak Republics, Slovenia, Croatia, Macedonia, Serbia, Hungary, and Turkey. Some recordings from Latin America, specifically from Argentina and Uruguay, are also included, as are a few items from Western Europe, notably from Denmark and Finland. The inclusion of Denmark and Finland is justified by the fact that, although these countries did not experience dictatorial or authoritarian regimes, when rock music was first being introduced into this area in the 1960s, it faced culturally conservative resistance, forcing musicians to put special effort into overcoming negativity and rejection. The Rock Music Under Dictatorships collection is held entirely in CD format for copyright reasons. All items in the collection are cataloged and placed in the library stacks; they are available for overnight checkout. The CDs in the collection are cataloged in two ways: initial cataloging was done individually with each CD receiving an individual cataloging record and accession number. Each record processed in this manner was given appropriate Library of Congress subject headings. 
The recordings are searchable by the name of the artist: Machine Readable Cataloging (MARC) field 100, or by band (MARC field 110), or by the name of the album (MARC field 245). Where appropriate, author-added entries are given (MARC field 700 or 710 for a band or artist). Approximately half of the collection was processed in this way. The cataloging method described above proved labor-intensive and time-consuming. Therefore, for the sake of expediency, Yoffe changed his approach, switching to placing the remaining items into groups for collection-level cataloging. This means that the CDs were grouped either according to the country of origin or the type of music. For each collection, generic titles were used. Examples include: “17 rock bands from Slovenia,” “24 rock music bands from Croatia, Serbia and Bosnia,” “Punk rock music from the Czech Republic” and “Avant-garde jazz from Russia.” These collection-level titles were placed in field 245 of the MARC record. To ensure that the materials could be searched in the most detailed way possible, a numbered list of individual titles and the names of authors or bands was given in MARC record field 505 (table of contents). Such a field typically includes up to thirty entries. Thus, in addition to MARC fields 700 and/or 710, there are entries for musicians and bands listed in field 505. Individual album titles appear in MARC field 740. This system makes each author, band, or album title individually searchable in the library catalog. Each CD has an individual item record (entered by the Resource Description Department). CDs were also physically processed by the Acquisitions Department where they were labeled and re-boxed in protected, lockable jewel boxes. The result is that an individual catalog search (by title, author, band) will pull up the collection within which the required item can be found. 
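The field layout described above can be illustrated with a minimal Python sketch. It is not the library's actual catalog software: the CollectionRecord class, the exact-match index, and the band and album names are hypothetical placeholders. Only the collection title and the MARC field numbers (245, 505, 700/710, 740) come from the description above, and for simplicity the sketch indexes the added entries while keeping the 505 contents note for display only.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionRecord:
    """Hypothetical stand-in for one collection-level MARC record."""
    title_245: str                                       # generic collection title
    contents_505: list = field(default_factory=list)     # numbered contents note
    names_700_710: list = field(default_factory=list)    # added entries: artists, bands
    titles_740: list = field(default_factory=list)       # individual album titles

    def searchable_terms(self):
        # Simplification: names and album titles are searched via the
        # added entries; the 505 note is kept for display only.
        return [self.title_245, *self.names_700_710, *self.titles_740]


def build_index(records):
    """Map each case-folded term to the titles of the collections holding it."""
    index = {}
    for record in records:
        for term in record.searchable_terms():
            index.setdefault(term.casefold(), set()).add(record.title_245)
    return index


def find_collections(index, query):
    """Return collection titles for an exact, case-insensitive query."""
    return sorted(index.get(query.casefold(), set()))


# Placeholder data: the band and album names are invented for illustration.
records = [
    CollectionRecord(
        title_245="17 rock bands from Slovenia",
        contents_505=[
            "1. Example Album One / Example Band A",
            "2. Example Album Two / Example Band B",
        ],
        names_700_710=["Example Band A", "Example Band B"],
        titles_740=["Example Album One", "Example Album Two"],
    ),
]
index = build_index(records)
print(find_collections(index, "example band b"))
```

In this sketch, a search for a single band or album name returns the title of the collection that contains it, which mirrors the behavior reported for the catalog: the individual search term leads to the collection-level record.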
Because each individual album within the collection is assigned an accession number, any desired item can be easily retrieved from the stacks. Processing CDs on the collection level, Yoffe reported, turned out to be a most effective approach, saving time and providing uniformity and contextual clarity. While time savings were not substantial, the “findability” of each item proved excellent, leading Yoffe to recommend collection-level cataloging to anyone dealing with similar materials. A positive by-product of the project for the International Counterculture Archive was the creation of new and close collaborative ties with the Resource Description and Acquisitions departments of the library. Collection-level cataloging proved to be a feasible alternative to bulky cataloging projects of the type used at the beginning of the Rock Music Under Dictatorships Audio work, especially when dealing with repetitive items. As a result, the approach is being more frequently implemented in the practices of other areas at George Washington University Libraries. Cataloging of physical items, namely individual CDs, was the result of copyright considerations. Copyright is an ongoing issue for libraries. Most university libraries, including the one at George Washington University, prohibit patrons from copying and distributing the content of their collections, even though there are no practical ways of enforcing this rule. Yoffe’s team, like others, is contemplating the still greater copyright challenge that will arise when their institution decides to move toward digitizing its sound collections and making them available online. University legal counsels, including Yoffe’s own, warn that copyright infringement issues are many and require extra caution. For the moment, George Washington University Libraries follow the lead of the Library of Congress and allow scholars to use copyright-restricted materials on an individual and as-needed basis via temporary, password-protected access.
Copyright issues were the part of Yoffe’s presentation that elicited the most heated discussion by the panel and the audience, and Yoffe wishes to offer special thanks to Janice Pilch, the Copyright Librarian at Rutgers University Library, for her suggestions and comments. Anthony James Qualin’s presentation also dealt with a database of sound recordings made by others. Although the archive does contain a small percentage of commercial products, the sound files are, for the most part, not commercial recordings. What predominates are Vladimir Vysotsky’s studio recordings and recordings made at his many concerts. While the studio recordings tend to be of good quality, the hundreds of performances and thousands of smaller audio files captured during Vysotsky’s concerts, then digitized and made available on the Internet by collectors and researchers, vary a great deal. Some were recorded directly through the concert soundboard, while others were made with amateur recording equipment and microphones located somewhere in the audience and off-stage. The positive aspect of including all this material, regardless of quality, is that it makes almost two-thirds of Vysotsky’s known performances accessible. The negative aspect, in addition to inconsistent sound quality, is the difficulty of navigation. In his talk, Qualin tried to make the Vysotsky recordings more accessible to researchers by describing two resources that make the task of searching the archive more manageable. The first of these is Svetlozar Kovachev’s index to the database. 1 The index allows researchers to search songs by their first lines. Many of the song lists also contain links to written transcripts of the songs. The second resource is a collection of transcribed lyrics from nearly all the recordings.
These transcriptions were compiled by Andrei ‘Andy’ Ivanov under the title “Geneses,” a title chosen to underscore the fact that this compendium allows researchers to trace changes in song lyrics over the course of Vysotsky’s career. Unfortunately, Geneses is not available at the time of this writing. Qualin’s purpose in presenting the Vysotsky digital archives and encouraging scholars to use them comes from his conviction that, even some forty years after his death, Vladimir Vysotsky remains one of the most significant cultural figures of the late Soviet period. While in Russia he has come to be viewed as a leading poet of the era, his verse receives little attention in English-language publications, and even major works on guitar poetry tend to contain qualifications about Vysotsky’s abilities as a poet. The groundbreaking English-language study by Gerald Stanton Smith, Songs to Seven Strings, may have set the tendency toward skepticism about Vysotsky’s stature as a poet. Smith’s evaluation of Vysotsky’s talent contains such caveats as “Undeniably, Vysotsky is limited,” and “Whether Vysotsky was a phenomenon whose significance will not outlast his epoch – the late sixties and the seventies – cannot be predicted.” 2 Perhaps another reason that Vysotsky’s work is underrepresented in English-language scholarship is that in the West, this artist is still viewed as an underground phenomenon, one too amorphous to be the subject of systematic study. Another problem with Vysotsky’s material is again the issue of copyright. Vysotsky’s heirs own the rights to his recordings but are not highly active in their distribution. In fact, they do not seem to have any interest in publishing a complete set of the poet’s collected works or in distributing his art in other ways. Neither do they seem concerned with challenging the fair use of the Vysotsky audio files for research purposes.
There was an instance in 2003–2004 when Nikita Vysotsky, the poet’s son, sued a collector for distributing a Vladimir Vysotsky compendium on disc. The charge against Genadii Silkin, the man who issued the Vysotsky compendium, was not the illegal recording of the poet’s performances, but his attempts to profit from their sales. Silkin denied the charges and, during the trial, Nikita Vysotsky is reported to have asserted that such complete collections of his father’s works, among which many recordings are of poor sound quality, do not have any cultural value. As should be obvious from Qualin’s efforts to promote Vysotsky scholarship, this is a statement with which he strongly disagrees. 3 The major cause of the ambiguity concerning ownership and copyright issues connected to Vysotsky’s oeuvre comes from the artistic situation in effect during the late Soviet period. The government policies of the period meant that Vysotsky had only limited opportunities to make commercial quality studio recordings and to organize his work into albums or printed collections. Only very few of his creations were officially published. 4 What Vysotsky could do, and did with vigor, was organize concert programs numbering in the hundreds. While concert program recordings create the quality problems already mentioned, they do offer scholars some unique research opportunities. Not only can researchers explore the work of an important artist, they can also study the interrelationships among the songs, viewing each concert as a literary cycle of a special, even unique, type. There are many challenges that scholars face in the study of Vysotsky, one being the fact that his archive in RGALI (Russian State Archive of Literature and Art) is off-limits to researchers. Yet what is available online does allow interested scholars to interact with the most important part of the poet’s oeuvre: that which he presented to his concert audience.
The major problem with the Vysotsky online archive is its lack of transparency. It is difficult to search when one knows what one is looking for and even more challenging when one wants to explore. The easiest way to access the recordings is to open vv.uka.ru and click on Torrent Tracker, which opens a discussion forum. From there, the user should click on “Fonogrammy iz fonda saita vysotsky.km.ru,” where there are Torrent files showing digitized recordings organized by year. Once the researcher has gained access to the sound files, Kovachev’s index can be used to find specific recordings. Several features of Kovachev’s work make it especially useful. For one thing, he cataloged songs by their first line because Vysotsky gave multiple names to many songs. For another, Kovachev uses the same identification numbers for concerts that are used in the archive, thus making his index a valuable archive guide. Working with this material, a researcher might want to explore the variations of a song such as “Ia ne liubliu.” Kovachev’s index reveals that there are 245 recordings of this song in the archive. The researcher can then analyze variations in the song’s lyrics, its proximity to other songs in the concert as a whole, and Vysotsky’s comments about the song before and after its performance. This is only one possible approach. Another productive approach would be to delve into the performative aspects of the work. As noted at the beginning of this article, access to sound opens a rich field for the exploration of works such as Vysotsky’s that cannot be provided on a printed page. The number of recordings of Ia ne liubliu and its relatively high level of variability make it a particularly rich subject for exploration and a good example of what can be done with the Vysotsky archive.
What, for example, is the significance of the variants “No esli nado – vystreliu v upor” and “Ia takzhe protiv vystrelov v upor,” “I mne ne zhal’ raspiatogo Khrista” and “Vot tol’ko zhal’ raspiatogo Khrista,” or “V kotoroe boleiu i ne p’iu” and “V kotoroe boleiu ili p’iu”? The poet’s commentary when introducing the song, or avtometaparatekst in Russian, also offers new ways of approaching the material. For example, it shows the extent to which these comments are a recurring part of the concert program: a part of the overall text, though more variable than the body of the songs themselves. There are periods when Vysotsky emphasizes that the song was written for the play Svoi ostrov and, thus, reflects the personality of the character who sings it. However, in many concerts, particularly later in his career, the poet claims the point of view in the song as his own. Of course, the two are not mutually exclusive, but such statements stand in contrast to the commentary to many other songs, in which Vysotsky distances himself from the poetic persona from whose point of view he sings. While Vysotsky’s comments during concerts are frequently taken as authoritative statements about the artist’s songs or his world view, Qualin believes that they warrant a more nuanced study. He feels that they, like the songs they accompany, frequently contain irony and that a detailed study of these texts would reveal that Vysotsky can be as much of a trickster in his commentary as he can be in his songs. Of course, the thoughts of the poet himself and those of a character created by him overlap, even as, at times, Vysotsky distances himself from the poetic persona from whose point of view he sings. These are but examples of important artistic features that can be extracted from the Vysotsky database.
In sum, less polished and less carefully organized archives are very much worth exploring, according to Qualin, especially when they contain the production of a major artist whose work has yet to be properly investigated in English language scholarship. While Yoffe and Qualin work with the recordings made by others, Cope and Kononenko deal primarily with their own sound files. The types of sound found in their archives are also different. These are not recordings of the work of major artists produced commercially or by enthusiastic fans. Rather, these two researchers record the speech and stories of ordinary people, materials that would not be sold for profit. Kononenko’s presentation focused on processing sound recordings she made as a member of the Sacral Heritage Documentation Project, or Sanctuary Project for short. This effort was conducted with John Paul Himka and Frances Swyripa, also of the University of Alberta, and was intended as a record of Ukrainian sacral culture on the Canadian Prairies, specifically the provinces of Alberta and Saskatchewan. Manitoba was excluded because a similar documentation effort is being made by that province’s main university. The motivation for this project was two-fold. The prairie provinces are the area where many Ukrainians settled upon arrival in Canada, partially of their own accord and partially at the encouragement of the government. This created the highest concentration of Byzantine rite religious practice outside of Ukraine and made the prairies into a landscape dotted by beautiful, domed churches. The second reason, and the impetus for undertaking this project at this specific time, was the fact that these churches are disappearing at an alarming rate. 
Changes in farming practices and the use of sophisticated, digitally programmable equipment make it possible for just a few people to work large tracts of land and, with fewer hands needed on family farms, less physically strenuous office jobs located in urban environments become increasingly attractive. The services provided by small prairie towns are no longer needed because the increase in car ownership makes it easy to drive to cities for goods and services. Rural areas, including small towns, are becoming depopulated, many disappearing altogether. The entire province of predominately rural Saskatchewan is currently populated by just over one million people. Without people to attend and financially support rural churches, more and more of them are closing. Some, especially those belonging to the Ukrainian Orthodox Church of Canada, are burnt down after desacralization. Some are sold and then repurposed, becoming, for example, storage sheds at church-run summer camps. Many are left to decay into the landscape. Keenly aware of this situation, the Sanctuary Team set out to document as much as possible before this tangible and intangible cultural heritage suffered even greater loss. During the project, Himka and Swyripa produced the visual documentation, photographing the churches themselves, all related buildings such as belfries and church halls, and all church contents such as tetrapods, icons, banners, vestments, etc. The photo documentation of church interiors and contents made the Sanctuary Project unique. Kononenko’s job was to record the intangible aspect of sacred culture. Specifically, she asked for accounts of rituals such as weddings, baptisms, and funerals, stories about holidays such as Christmas and Easter, and any recollections about parish life that her respondents wished to share. Over the ten-year course of the project, the sound files amounted to over two hundred hours, while the number of photographs grew to over 300,000.
From the beginning, the purpose of the Sanctuary Project was to build a repository of data that could be used by researchers, the Ukrainian community, and any parties interested in Ukrainian culture and heritage. 5 Also, early in the project, on the advice of University of Alberta Library staff, the photo database was separated from the sound file database, because, while potentially damaging parts of the sound files could be ‘silenced,’ allowing the sound files to be opened for public access, potentially sensitive information could not be removed from the photo repository. In a number of cases, the photographs showed objects that might have tempted thieves and vandals and, with the impossibility of regular oversight of rural church properties, many of which are quite isolated, it was best to keep a full record of church contents in a password-protected, University of Alberta-controlled website to discourage break-ins by people trolling for potentially valuable items. Kononenko’s job was to organize and present the sound files through a public, library-maintained portal. The programming for the site was done by Omar Rodriguez of the Arts Resource Center at the University of Alberta. The program is constructed in such a way that the user can search by geographical area or by topic. A keyword search then brings up a list of audio points where discussions of the desired topic can be listened to. Clicking on one of these takes the user to that point and, because the files have not been cut, the user can scroll ahead or backwards and listen to the context of the desired discussion. While the program now works well and the database should be open as soon as Kononenko supplies the needed photo illustrations and explanatory text, getting to that point was not easy. 6 Recording quality was a problem. Like some of the files discussed by Qualin, these are not studio recordings.
Someone scraping a chair across the floor or yelling to a friend in the background impedes understanding of the speech on the recording. More important were issues of what could and should be included. While Kononenko did not face copyright problems, she did have the burden of deciding what could be made public. Any discussions that implied that valuable objects were present in a particular church were ‘silenced,’ in the sense that access to those sections through the public site was removed. The other sensitive issue was the presence of possibly inappropriate comments about other members of a congregation or its clergy. These comments also were silenced, even though some respondents specifically asked to have their criticisms of others included. Silencing, however, meant that, in a sense, Kononenko was practicing censorship. No database can work without metadata to guide the digital search. Kononenko had built an earlier sound file database using the recordings she made in Ukraine and coding them according to the categories and terminology used by the people from whom she collected her data. This worked well because the categories were entirely suitable to the subject matter they represented. At the same time, working with this set of metadata required extensive knowledge of Ukrainian culture and ritual practice. Both the Kule Institute for Advanced Study, one of the granting agencies which supported the Sanctuary Project, and Kononenko herself wanted metadata that would be universally accessible and not restricted to specialists. The International Ethnographic Thesaurus (IET), an initiative of the American Folklore Society and currently part of the United States Library of Congress, provided such terminology. 7 Of course, the more widely applicable the metadata, the more problematic they can be when applied to a specific data set. For example, the IET lacks Slavic terminology.
While there are terms for Hispanic and Hindi festivals, and for those of other nationalities, there are practically no Slavic-specific terms. Equally problematic is the fact that the terminology trees within the IET are cumbersome, with some terms needed for the Sanctuary Project, such as “church,” embedded eight layers beneath the top category. The IET is also periodically updated. Such an update occurred in the middle of coding work, confounding the graduate student working on assigning metadata. Furthermore, any standard set of metadata, the IET included, is ambiguous. For example, does “icon” belong in the category of “art object” or in the category of “religious object?” It is both; placing it in one category or the other depends on a scholar’s research interests. Does a ritual meal, such as the Christmas Eve supper still held in most Ukrainian Canadian communities, belong under foodways or under ritual? Kononenko worked with two very knowledgeable doctoral students, Lina Ye and Daria Polianska, on the coding. Still, decisions were difficult and chances for inconsistencies high. Kononenko’s preference for the IET was not based solely on the demands of one of her granting agencies. During her study of folkloric, ethnographic, and anthropological scholarship, she had noticed that when examples of a particular feature or phenomenon are given, they are typically drawn from Western Europe or from an exotic location, such as the South Sea Islands. Eastern Europe in general, and Slavic countries in particular, were seldom used as illustrative material. Wanting to remedy this situation and draw more attention to the Slavic world, she decided that ensuring Slavic data would come up in any IET search was the perfect way to make non-Slavists aware of, and interested in, Slavic phenomena.
Kononenko did struggle with her database, especially with the issue of long and cumbersome terminology chains, and she presented her struggles at several conferences, each time taking a step, or several steps, forward. The lack of Slavic terms was a huge problem and, at one point, Kononenko and her team considered having two databases, one using IET categories and another with Ukrainian vocabulary that would cater to the community wanting to hear their co-ethnics discuss rituals and church-related practices. All problems have now been solved. Discussions with the American Folklore Society and the Library of Congress led Kononenko to go back through her IET coding and abbreviate the long data chains, leaving only the most essential terminology. This decision means her work may not appear in all possible data searches, but it does improve “findability,” to use Yoffe’s term, of material specific to the database. As for missing Slavic terminology, the database double-codes Ukrainian terms, meaning that festivals like Malanka are keyed both as “carnival” and as Malanka. In short, the same point in the recording will be accessed regardless of whether the user searches for “carnival” or “Malanka.” “Carnival” does not adequately describe Malanka, but there are enough common features to justify cross-coding, especially considering the goal of attracting non-Slavists to this material. The database is finally workable and no longer clunky. The only reason it is not currently open to the public is that Kononenko retired and moved to a different province. As soon as she settles in and provides the images and the explanatory text that the database needs, it should be available. Cope’s work also documents interviews with ordinary people, specifically Texas Czechs. Her materials were recorded by her and by several predecessors, notably Svatava Pírková Jakobson, a former professor at the University of Texas at Austin, and Karel Kučera of Charles University in Prague.
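As a brief aside on the Sanctuary database discussed above, its double coding and its keyword search leading to audio points can be sketched in Python. This is an illustration under stated assumptions, not the code written for the project: the recording identifiers, time offsets, and function names are hypothetical, and only the pairing of “carnival” with Malanka comes from the text.

```python
from collections import defaultdict

# Hypothetical coded audio points: (recording id, offset in seconds) plus
# the metadata terms assigned to that point. The first point is
# double-coded with a general thesaurus term and a Ukrainian term.
coded_points = [
    (("interview_001", 754), ["carnival", "Malanka"]),
    (("interview_002", 120), ["carnival"]),
]


def build_term_index(points):
    """Map each case-folded term to the set of audio points coded with it."""
    index = defaultdict(set)
    for point, terms in points:
        for term in terms:
            index[term.casefold()].add(point)
    return index


def search(index, keyword):
    """Return the audio points for a keyword, sorted by recording and offset."""
    return sorted(index.get(keyword.casefold(), set()))


term_index = build_term_index(coded_points)

# Either term leads to the same point in the uncut recording.
print(search(term_index, "Malanka"))
print(search(term_index, "carnival"))
```

Because each result is an offset into an uncut file rather than a clipped excerpt, a player could start at that offset and still let the listener move backward or forward for context, as the description of the portal suggests.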
As an applied linguist and a Czech herself, Cope is concerned with language loss and its profound consequences for biolinguistic diversity and human knowledge. She is interested in capturing and preserving what she can of Texas Czech, a diasporic variety of European Czech on the verge of extinction. Cope’s goal is twofold: to provide a rich record of spoken Texas Czech for scholars examining everything from archaic base dialects, diasporic variety formation, and diachronic language change, to the process of assimilation and language loss that plagues both indigenous and many immigrant language communities; and to serve the Texas Czech community, assuring that younger generations of Texas Czechs have access to their language heritage and the voices of their ancestors when this diasporic speech is long gone. Cope began by explaining the sociohistorical context for this work. Most Czech Moravian emigrants to Texas came from the poverty-stricken Eastern Moravian and Northeastern Bohemian regions of Austria-Hungary (after 1918 from the first Czechoslovak republic) between the 1870s and the 1920s. By 1920, the population of Czech Moravians in Texas was around 50,000. Between the 1920s and 1940s, the numbers of new immigrants from Europe declined, mainly due to the US Emergency Quota Act of 1921 and the Immigration Act of 1924. By the mid-1940s, the language was no longer transmitted in Texas Czech families. The 2010 Texas census recorded over 136,000 persons of “Czech” or “Czechoslovak” descent, of whom only 8,700 were Czech speakers (although the latter number is too broad to capture just the remaining speakers from the historically Czech diaspora in Texas). Cope emphasized that the long history of self-sufficient, relatively isolated communities built around the farms dotting the vast Texas landscape prolonged the lifespan of Texas Czech through the turn of the current century.
Today, while the Texas Czech ethnoculture continues to thrive, the number of speakers has declined drastically, and what remains is only a sentimental attachment to the idea of having a heritage language. According to the Expanded Graded Intergenerational Disruption Scale, Texas Czech is best classified as “nearly extinct/critically endangered” 8 and is likely to disappear within five to ten years. Cope began collecting speech recordings from Texas Czechs in 1997. After many return trips to the community and visits to the University of Texas at Austin (UT), she decided to follow in the footsteps of the Texas German Dialect Project and began working with its director, Hans Boas, and with the chair and faculty of the Department of Slavic and Eurasian Studies to create a digital archive for Texas Czechs. The Texas Czech Legacy Project (TCLP) was born in 2012. Because there is a vibrant Texas Czech community in Texas, and because UT has a long history of Czech language teaching and holds a community-initiated Czech endowment, it seemed only logical to create a home for this project at that institution even though the TCLP director, a research associate at UT, lives and works in North Carolina. Cope noted the Project’s main challenges: balancing the goal of serving both scholarly and lay audiences, identifying qualified and interested Czech speakers to transcribe the data, finding time and funding in the face of the massive database, and for the Jakobson collection, piecing together metadata for each unlabeled recording this prolific folklorist collected. Understanding these challenges, the TCLP team set out to create a central place for documenting the language, culture, and history of ethnic Czech Moravians, starting with assembling a digital repository of audio-recordings gathered from Texas Czech speakers between the 1970s and the 2000s. 
The goal is to create both a scholarly resource for linguists, dialectologists, historians, folklorists, genealogists, and educators and a legacy archive for the local community, featuring oral histories delivered in the Texas Czech dialect. 9 The work of the Project is supported by the Czech Endowment of the Department of Slavic and Eurasian Studies at UT, which also covered start-up costs. Additional funding has come from two Humanities Texas media grants and a generous donation from the Texas Czech Heritage Society of South Texas. As Cope reported, the team currently consists of the TCLP director (Lida Cope), programmer (Ryan Miller), and two transcribers on contract (Alžběta Vítková and Lucie Salzmannová). Finding proficient Czech speakers both capable of, and interested in, transcribing Texas Czech was a challenge at first. Since 2016, UT has allowed Cope to employ two to three students from Charles University in Prague through international contracts. The TCLP site is built on Drupal 7, an open-source CMS platform that affords flexibility in constructing the envisioned infrastructure. While Drupal provides all the desired features, the design of a searchable corpus requires programming expertise. With the infrastructure in place, however, a lay editor can upload and edit the content, including the voice files and transcriptions. In addition to “the Archive,” the website includes “About,” “Our News,” “References,” useful “Links” for all things Texas Czech, “Who We Are,” “Contact Us,” and “Giving to TCLP” pages. Archive visits are tracked by Google Analytics. Currently, the TCLP Archive of audio recordings draws from two major databases: eighteen hours recorded in 1986 by Karel Kučera of Charles University and 327+ hours recorded by Cope. All analog tapes from early stages of recording were digitized by the Liberal Arts Instructional Services (LAITS) at UT (the Cope Collection) and by Charles University (the Kučera Collection). 
All files are semi-structured interviews; the Cope data also include a series of language tasks. In the future, the team will tackle 189 hours recorded in the 1970s-1980s by Svatava Pírková Jakobson (1904-2000) and fifteen hours collected by John Tomecek during his graduate work at UT in 2007–2009. Jakobson’s material, also digitized by LAITS, includes her analog tapes and some of the 800 reel-to-reel tapes stored at the Briscoe Center for American History in Austin and the Texas Czech Heritage and Cultural Center in La Grange, Texas. Apart from being maintained by LAITS, multiple copies of the recordings in the WAVE and MP3 formats are stored on UT Box, which also facilitates collaboration with the Project transcribers, on computer hard drives, and on external portable hard drives. Cope explained that the Project’s purpose and audiences require that data be accessible and interesting to community members, while providing an adequate amount of linguistic detail for researchers. For example, what makes Texas Czech interesting to both linguists and dialectologists is that this diasporic variety of Czech is rooted largely in the dialects spoken in nineteenth-century Eastern Moravia. On the one hand, dialectologists can focus on its Eastern Moravian traits, particularly those no longer attested in the source language, and linguists can examine its reduced features developed in contact, studying the dynamics of internal and external causes of change in diasporic dialects of Czech and in language contact situations in general. 10 On the other hand, community historians, educators and anyone with an interest in Texas Czech communities will likely wish to listen for content, and most of them will appreciate finding it in English. Therefore, the TCLP Transcription and Translation Guide was developed to strike a balance among the three principal requirements: accuracy, authenticity, and accessibility. 
The transcripts are broadly accessible, using the modified Standard Czech orthography that offers phonetic transcription only for select dialectal variants. Cope reported that it takes at least sixty hours to process one hour-long voice file. Each recording is divided into one to three-minute segments, based on topic. This choice was made to suit both types of audiences as it increases the “findability” of bits of interviews that address one’s topic of interest. Transcriptions are completed using the ELAN software, a free transcription tool that enables time-aligned transcription and supports open formats such as WAVE, MPEG1/2 and UNICODE. 11 Once logged into the Archive, the user can browse topics and access the corresponding MP3 files and transcripts. The transcript displays on the website in three layers: the Texas Czech utterance, tokenized English translation with only essential linguistic coding, and free English translation. The user can customize the viewing experience by turning off either language (e.g. reading only the English translation), and by switching from the table mode (drawn from the ELAN transcript generated in the HTML format) to the “Read Along” mode, most suitable for those interested only in content. One can listen and read along or just read selected transcripts. Each recording is coded and linked to essential metadata: “segment information” includes the collection, interviewer, event type, date of recording, and location; “interviewee information” includes the speaker number (embedded in the code identifying each segment), gender, age when recorded, birth year, and childhood residence. Users new to the TCLP archive have the option to take a live tour prior to conducting their first searches. They can search by topic (e.g. reading Czech newspapers, farming and picking cotton, or Easter), specific linguistic features, or facets drawn from the metadata listed above. Speaker locations at the time of interview can be viewed on a map. 
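The metadata and faceted search described above can be pictured with a minimal Python sketch. It is offered only as an illustration under stated assumptions and is not the TCLP's actual Drupal implementation: the field names are paraphrased from the "segment information" and "interviewee information" lists, while the class names, the `facet_search` helper, the segment code format, and every sample value are hypothetical placeholders.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Interviewee:
    # Fields paraphrased from the "interviewee information" list.
    speaker_number: int
    gender: str
    age_when_recorded: int
    birth_year: int
    childhood_residence: str


@dataclass(frozen=True)
class Segment:
    # Fields paraphrased from the "segment information" list, plus a topic.
    code: str  # hypothetical format that embeds the speaker number
    topic: str
    collection: str
    interviewer: str
    event_type: str
    date_of_recording: str
    location: str
    interviewee: Interviewee


def facet_search(segments, **facets):
    """Return the segments whose metadata match every requested facet."""
    matches = []
    for seg in segments:
        # Flatten segment and interviewee fields into one lookup table.
        flat = {**asdict(seg), **asdict(seg.interviewee)}
        if all(flat.get(name) == value for name, value in facets.items()):
            matches.append(seg)
    return matches


# Placeholder records: all values below are invented for illustration.
SAMPLE = [
    Segment("SPK001-01", "Easter", "Cope", "Interviewer A", "interview",
            "2001", "Placeholder Town A",
            Interviewee(1, "F", 78, 1923, "Placeholder County A")),
    Segment("SPK001-02", "farming and picking cotton", "Cope",
            "Interviewer A", "interview", "2001", "Placeholder Town A",
            Interviewee(1, "F", 78, 1923, "Placeholder County A")),
    Segment("SPK002-01", "reading Czech newspapers", "Kučera",
            "Interviewer B", "interview", "1986", "Placeholder Town B",
            Interviewee(2, "M", 81, 1905, "Placeholder County B")),
]

easter_segments = facet_search(SAMPLE, topic="Easter")
kucera_men = facet_search(SAMPLE, collection="Kučera", gender="M")
```

Keeping each one-to-three-minute segment as its own record, rather than one record per interview, is what lets a topic or facet query return only the relevant stretch of a recording.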
The interviewees (barring a few exceptions depending on whether consent was given) remain anonymous. Most participated when the idea of sharing oral histories in digital archives did not exist, and some when interview consent forms were not customary. Ultimately, the TCLP will offer a searchable corpus of both spoken and written word in Texas Czech and about Texas Czechs. A visual archive will have print materials, such as the Texas Czech newspaper Svoboda (1885–1966), photographs (starting with Texas Czech tombstone inscriptions), letters, memoirs, songbooks, and other historical documents. The Texas Czech Legacy Project is a long-term undertaking. Processing data with this amount of attention to detail is enormously time-consuming, which affects the pace of adding recorded sound and transcripts to the archive. In the meantime, Cope emphasized the importance of keeping the collected data safe and organized for future generations of documentary linguists and Texas Czech enthusiasts wishing to continue building TCLP as a virtual sociolinguistic space for community learning, education, and multidisciplinary scholarly research. The experiences of the four authors contributing to this article demonstrate that having an enormous volume of data is both a boon and a burden. If all data are to be made accessible, then old classification methods need to be simplified, as in Yoffe’s case, and even new metadata must be kept to a minimum, as Kononenko discovered. Cope has retained her commitment to full transcription and translation, but she may decide that she too needs to simplify data processing to make a meaningful volume of her recorded sound accessible. Qualin faced the special situation of a database built by enthusiastic amateurs rather than professionals. Both he and others are creating work-arounds external to the Vysotsky archive that make it more user-friendly. The other issue that dominated the roundtable was the problem of what can be made public. 
Digital presentation through the internet affords access to a potentially very large audience. Such outsider attention is desirable, especially in projects like Cope’s and Kononenko’s, and it motivates Qualin, who hopes to bring more attention to a singer and poet largely neglected in the West. But creating worldwide digital access threatens copyright, as in the case of Yoffe’s work and possibly also in the case of the Vysotsky archive. Ordinary people such as those recorded by Cope and Kononenko do not face threats of copyright infringement, but they may, especially in Kononenko’s case, face theft of tangible property if too much information about their possessions is revealed. Furthermore, the person in charge of the database faces the difficult choice of retaining or silencing comments that may be hurtful to others. We, the contributors to this article, hope this account of our experiences with sound file archiving and digital website maintenance can help others involved in the creation, coding, and maintenance of sound archives. This is a rapidly evolving field with much potential, and we hope that we have made a small contribution to its advancement. Publisher Copyright: Published with license by Taylor & Francis. © Lida Cope, Natalie Kononenko, Anthony Qualin and Mark Yoffe.
PY - 2019/10/2
Y1 - 2019/10/2
N2 - Sound files offer the advantage of access to a large volume of data and speech features that cannot be captured on paper. Sound recordings can preserve a diasporic dialect on the verge of extinction. Cataloging sound files and making them accessible to both the lay community and professionals present new challenges. Collection-level, rather than item-level cataloging helps to deal with commercial music recordings. Double coding, using both the International Ethnographic Thesaurus and specifically Slavic metadata, can make field recordings accessible to both the Slavic community and specialists. Archives of important artists created by amateurs need scholars to generate finding aids.
AB - Sound files offer the advantage of access to a large volume of data and speech features that cannot be captured on paper. Sound recordings can preserve a diasporic dialect on the verge of extinction. Cataloging sound files and making them accessible to both the lay community and professionals present new challenges. Collection-level, rather than item-level cataloging helps to deal with commercial music recordings. Double coding, using both the International Ethnographic Thesaurus and specifically Slavic metadata, can make field recordings accessible to both the Slavic community and specialists. Archives of important artists created by amateurs need scholars to generate finding aids.
KW - Cataloging
KW - copyright
KW - finding aids
KW - language documentation
KW - metadata selection
UR - http://www.scopus.com/inward/record.url?scp=85076432796&partnerID=8YFLogxK
U2 - 10.1080/15228886.2019.1694373
DO - 10.1080/15228886.2019.1694373
M3 - Article
AN - SCOPUS:85076432796
VL - 20
SP - 85
EP - 100
JO - Slavic and East European Information Resources
JF - Slavic and East European Information Resources
SN - 1522-8886
IS - 3-4