Antoniou, M. (2018). Text analytics & topic modelling on music genres song lyrics. Towards Data Science. https://towardsdatascience.com/text-analytics-topic-modelling-on-music-genres-song-lyrics-deb82c86caa2
Uses a Kaggle dataset of 380k songs released since 1970 and analyzes song characteristics by genre, including comparisons across genres. Shows that jazz has the lowest average number of lyrics and the lowest median length. Uses box plots and a word cloud (of the lyrics from the top genres) to visualize the findings.
Franzke, A.S., Bechmann, A., Zimmer, M., & Ess, C. M. (2020). Internet Research: Ethical Guidelines 3.0. Association of Internet Researchers. https://aoir.org
The Association of Internet Researchers (AoIR) is a professional organization of academics and other stakeholders who develop methods and approaches for internet-based research. They have published guidelines for researchers to consider when studying born-digital content, particularly its implications as human-subjects research. These guidelines will be useful if we choose to collect information about the users who contribute to Genius.com or about their specific comments.
Hartmann, J., Huppertz, J., Schamp, C., & Heitmann, M. (2019). Comparing automated text classification methods. International Journal of Research in Marketing, 36(1), 20–38. https://doi.org/10.1016/j.ijresmar.2018.09.009
This article focuses on marketing research applications but is concerned more broadly with the effectiveness of different methods for classifying unstructured text data, word-choice classification and sentiment analysis methods in particular. Naïve Bayes works well with small samples of unstructured texts. Lexicon-based methods aren’t traditionally used in marketing research.
Jänicke, S., Franzini, G., Cheema, M. F., & Scheuermann, G. (2017). Visual Text Analysis in Digital Humanities. Computer Graphics Forum, 36(6), 226–250. https://doi.org/10.1111/cgf.12873
Discussion of existing research on the visualization process and on techniques for close reading (annotations) and distant reading (abstraction) in DH (text sources, data transformation steps, types of visualizations used).
Pre-processing steps include: XML-based TEI, with XSLT stylesheets to transform TEI into visualizations; tokenization and normalization (chunking and frequency analysis); part-of-speech (POS) tagging; named entity recognition (NER) for people and place names; and topic modeling (LDA is the most popular).
For close reading, the structure of the text is generally retained, which allows deep analysis and comparison of text editions and linguistic patterns. Visualizations use color, font size, glyphs, and connections. For distant reading, summaries of information about corpora are important, and the visualizations include heat maps, maps, tag/word clouds, timelines, and graphs.
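The tokenization, normalization, and frequency-analysis steps listed above can be sketched in a few lines of Python. This is only a minimal illustration, not the pipeline described by the authors: the stop-word list, the sample sentence, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "and", "of", "in", "to", "my", "i"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (keeps internal apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count the tokens that are not in the stop-word list."""
    return Counter(token for token in tokenize(text) if token not in stop_words)


# Invented sample line, not taken from any real song.
sample = "The night is blue, the moon is blue, and I don't sleep."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

Counts like these are the kind of input a tag/word cloud or heat map would typically be built from.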
Lin, J., Milligan, I., Oard, D. W., Ruest, N., & Shilton, K. (2020). We Could, but Should We?: Ethical Considerations for Providing Access to GeoCities and Other Historical Digital Collections. Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, 135–144. https://doi.org/10.1145/3343413.3377980
This article considers the ethical issues of web archiving, particularly for researchers who want to work with older content. The authors suggest that research using certain online data sets, particularly the archived collection of GeoCities websites, necessarily changes the context of publication. Despite being published on the public web, GeoCities sites were once relatively private, and making the archive freely available could disclose personal information in ways that were never anticipated. While our use of the Genius.com database does not raise the same issues of private information, this article serves as an important reminder that by using publicly available data we are nevertheless transforming it and changing the context of publication.
Moser, S. (2007). Media modes of poetic reception: Reading lyrics versus listening to songs. Poetics, 35(4), 277–300. https://doi.org/10.1016/j.poetic.2007.01.002
In this article, Moser argues that “songs are a multisensorial mode of linguistic communication” and therefore analyses of how song texts are received may need to consider many factors. Although lyrics exist in multiple modalities, such as oral, printed, and audiovisual forms, most analyses of lyrics follow traditional methods of textual analysis. Moser suggests that lyrics that have been separated from their melody and vocal reproduction do not necessarily represent the full song text. This is an important caveat for our project, where lyrics specifically take a central role. While we may be able to describe broad trends in jazz lyrics, we must be wary of overgeneralizing our findings as representative of jazz music more broadly.
Myers, M. (2013). Why Jazz Happened. University of California Press.
This book provides a social history of mid-century jazz, specifically 1942-1972, that connects changes in style to changes in the music industry and in American culture at large. The narrative focuses almost entirely on the major commercial and technological forces that allowed jazz to be recorded and broadcast. The major extra-musical factors that made this possible include developments in business, technology, the economy, demographics, and race relations. Myers draws on a combination of sources, many of them insider interviews he personally conducted between 2008 and 2011 with performers, producers, and many others within the industry. When we encode song lyrics, this history can help us tag or annotate words and phrases that correspond to specific social history events.
Rhody, L. M. (2012). Topic Modeling and Figurative Language. Journal of Digital Humanities, 2(1). http://journalofdigitalhumanities.org/2-1/topic-modeling-and-figurative-language-by-lisa-m-rhody/
LDA models a finite number of topics within a corpus of texts. Topic modeling of figurative texts does not produce topics with the same clarity as it does for non-fiction or academic texts in general (how would this work for song lyrics?). We cannot label topics in the same way, because labeling rests on the assumption that the topics are “thematic,” especially if you know the texts and are predisposed to read them a certain way (meanings are more fluid in figurative texts). Topics are representations of discourse (language as it is used and as it participates in recognized social forms) rather than thematic strings of coherent terms; in other words, there are TYPES OF TOPICS. The next step is to examine the documents, or samples of documents, that the model assigns to each type of discourse to see what they tell you about the generated topic.
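The last step in these notes, examining the documents the model associates with each topic, can be sketched as follows. This is a hedged illustration only: the file names and topic weights are invented stand-ins for the document-topic output of a real model such as LDA, and `top_documents` is a hypothetical helper.

```python
# Hypothetical output of a topic model: one list of topic weights per document.
doc_topic_weights = {
    "song_a.txt": [0.70, 0.20, 0.10],
    "song_b.txt": [0.10, 0.80, 0.10],
    "song_c.txt": [0.55, 0.05, 0.40],
    "song_d.txt": [0.15, 0.25, 0.60],
}


def top_documents(weights, topic, n=2):
    """Return the n documents with the largest weight for the given topic index."""
    ranked = sorted(weights.items(), key=lambda item: item[1][topic], reverse=True)
    return [(name, row[topic]) for name, row in ranked[:n]]


# List the documents to read closely for each topic.
for topic in range(3):
    print(topic, top_documents(doc_topic_weights, topic))
```

Reading the highest-weighted lyrics for a topic alongside its top terms is one way to judge what kind of discourse the topic captures.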
Rustin-Paschal, N., & Tucker, S. (Eds.). (2008). Big ears: Listening for gender in jazz studies. Duke University Press.
Tucker and Rustin-Paschal put together a collection of articles by eminent scholars in multiple disciplines, all centered on jazz and gender. The articles cover women and men, masculinity and femininity, race, class, and space in varied ways. In particular, in the article "Separated at 'Birth': Singing and the History of Jazz", the author critiques the ways in which singing (gendered female) and actual female singers have been removed from the genre and history of jazz in favor of male-coded instrumentalism. Because our DH project focuses (almost) exclusively on song lyrics as textual analysis, we are re-emphasizing the importance of singers and their significant role within the jazz genre. We will not use every article in this book, since some chapters go beyond our scope, such as Ursel Schlicht's article on women musicians and audiences in post-war Germany.
Stratton, V. N., & Zalanowski, A. H. (1994). Affective Impact of Music Vs. Lyrics. Empirical Studies of the Arts, 12(2), 173–184. https://doi.org/10.2190/35T0-U4DT-N09Q-LQHW
This article demonstrates that lyrics alone can evoke different affective responses, as compared to music alone or music and lyrics combined. This research demonstrates the value in a project such as ours, which focuses solely on the lyrics of music. However, it also rightly reminds us that lyrics have a different impact on individuals when they are separated from their original music. Therefore, our project must be precise in how we describe our methodologies, as well as recognize the limited scope of our analysis.
Sugimoto, G. (2019). Introduction to Populating a Website with API Data. Programming Historian. https://programminghistorian.org/en/lessons/introduction-to-populating-a-website-with-api-data
The article explains how to integrate API data into a web page. It walks through the steps of registering for an API, using the Europeana API as an example. Then, it provides the steps to set up a virtual server using XAMPP and goes over basic HTML/PHP syntax. Finally, it walks through the process of integrating a JSON file, built from the API data in the previous example, into a new web page. This is a useful tutorial if our group decides to display elements of our project on a web page. While PHP can become complicated, this particular use of it shouldn’t be too difficult.
Tucker, S. (1999). Telling Performances: Jazz History Remembered and Remade by the Women in the Band. Oral History Review, 26(1), 67–84. https://doi.org/10.1093/ohr/26.1.67
In this article, Sherrie Tucker argues that more attention should be given to the oral histories of women jazz musicians. By doing so, jazz historians would gain a more complex understanding of the history of jazz and the contributions that women have made to the genre. Tucker contrasts this proposed methodology with the contemporary scholarship that focuses on “favored artistes” and “superior genres.” The article offers an interesting feminist historical analysis and critique that highlights the overlooked contributions of women to jazz. It is also an important reminder that academic methodology can shape the way history is told.
Turkel, W. J., & Crymble, A. (2012). Output Keywords in Context in an HTML File with Python. Programming Historian. https://programminghistorian.org/en/lessons/output-keywords-in-context-in-html-file
This tutorial in Programming Historian describes a method for using Python to write data into an HTML file. While the tutorial’s example (a dictionary of n-grams) may differ from the analysis we eventually conduct, its description of writing a Python function to wrap data in HTML tags is useful for getting our work into a presentable format.
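As a rough sketch of the idea, and not the tutorial's own code, a Python function that wraps data in HTML tags might look like the following. The function name `wrap_in_html`, the sample counts, and the output file name are all hypothetical.

```python
import html
import tempfile
from pathlib import Path


def wrap_in_html(title, rows):
    """Return a complete HTML page with one table row per (term, count) pair."""
    body = "\n".join(
        f"<tr><td>{html.escape(str(term))}</td><td>{count}</td></tr>"
        for term, count in rows
    )
    return (
        "<!DOCTYPE html>\n<html>\n"
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n<table>\n{body}\n</table>\n</body>\n</html>\n"
    )


# Hypothetical counts, for illustration only.
counts = {"blue": 12, "night & day": 7}
rows = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
page = wrap_in_html("Word counts", rows)

# Write the page to the system temp directory; any writable path would do.
out_path = Path(tempfile.gettempdir()) / "word_counts.html"
out_path.write_text(page, encoding="utf-8")
```

Escaping each term with `html.escape` keeps characters such as `&` or `<` in lyrics from breaking the generated page.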
Walsh, M. (2021). Song Genius Data Collection. In Introduction to Cultural Analytics & Python. https://doi.org/10.5281/ZENODO.4411250
In her textbook, Melanie Walsh describes how to use Python for cultural analytics. In this section, she provides a helpful tutorial on how to interact with the Genius.com API via Python. The tutorial outlines basic concepts for working with the API, examples for collecting song lyrics, and some initial scripts to begin processing song lyrics that have been saved to text files. Most importantly, she introduces John Miller’s Python package LyricsGenius, which is a helpful wrapper for interacting with the Genius API.
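A minimal sketch of how our group might collect and save lyrics with LyricsGenius is given below. It is not taken from Walsh's tutorial. It assumes the package's `Genius` class and its `search_song` method, and it requires a Genius API access token; the helper names `fetch_lyrics`, `slugify`, and `save_lyrics` and the file-naming scheme are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def fetch_lyrics(title, artist, access_token):
    """Look up one song through LyricsGenius; return None if nothing is found."""
    import lyricsgenius  # third-party package: pip install lyricsgenius

    genius = lyricsgenius.Genius(access_token)
    song = genius.search_song(title, artist)
    return song.lyrics if song else None


def slugify(text):
    """Reduce text to lowercase letters and digits separated by hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def save_lyrics(lyrics, title, artist, directory):
    """Write lyrics to '<artist>_<title>.txt' inside directory and return the path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{slugify(artist)}_{slugify(title)}.txt"
    path.write_text(lyrics, encoding="utf-8")
    return path


# Demonstration with placeholder text, so no network access or token is needed.
out_dir = Path(tempfile.mkdtemp())
path = save_lyrics("placeholder lyrics text", "Example Song", "Example Artist", out_dir)
print(path.name)
```

With a real token, `fetch_lyrics("Song Title", "Artist Name", token)` would return the lyrics as a string (or `None`), which can then be passed to `save_lyrics`.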