Character dialogue was fairly easy to find within the game files. The dialogue was stored in .xnb files, which were converted into .yaml and then into .xml files for analysis.
The files containing character dialogue fall into four categories: Character XML, Festival XML, Location XML, and Misc XML.
Throughout the game, the player can interact with characters found in different locations. A character typically has one or two lines before their dialogue 'runs out' for the day. Character XML consists of a separate file for each character in the game, with metadata specifying what the character says depending on the day of the week and the weather.
Stardew Valley has at least two festivals each season, each held on specific days. Unlike Character XML, Festival XML stores the phrases and lines for all characters in a single file. Some metadata defines the positions and directions of characters, as well as festival cutscenes.
Throughout the game, cutscenes play once certain requirements are met. These cutscenes are tied to the location where they play rather than to the characters. Location XML files carry much more metadata per line than Character or Festival XML, including positions, movements, animations, facing directions, and even sounds. Dialogue is often woven in between this metadata.
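To illustrate how dialogue sits among staging metadata in a cutscene, the following minimal sketch pulls the spoken lines out of a single event script. It assumes the game's event layout of slash-separated commands, in which dialogue appears only in commands of the form `speak <Name> "<text>"`; the function name `extract_speech` and the sample script are hypothetical, and this is not the project's own extraction method.

```python
import re

# Assumed event-script layout: commands separated by "/", with dialogue
# carried only by commands of the form  speak <Name> "<text>".
SPEAK = re.compile(r'^speak\s+(\w+)\s+"(.*)"$')


def extract_speech(script: str) -> list[tuple[str, str]]:
    """Return (speaker, line) pairs, skipping movement, animation, sound, etc."""
    pairs = []
    for command in script.split("/"):
        match = SPEAK.match(command.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


script = ('none/farmer 5 7 0 Abigail 6 7 3/speak Abigail "Oh, hi!"'
          '/move farmer 0 -1 0/speak Abigail "See you later."/end')
print(extract_speech(script))
# [('Abigail', 'Oh, hi!'), ('Abigail', 'See you later.')]
```

Splitting on `/` is the simplest reading of the layout; it would misfire if a spoken line itself contained a slash, so a real pass over Location XML would need to account for that.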
The Misc XML is a collection of XML files that don't fit neatly into any other category. These files are typically structured like Festival XML or Location XML. They include dialogue spoken by characters after receiving a gift, dialogue from mail, text from signs, rainy-day dialogue, and other lines.
Regex was used to produce properly tagged and cleaned dialogue. Most lines of dialogue were preceded by some metadata indicating the scenario in which the line would play; this metadata was wrapped in a <context> element. Dialogue was enclosed in quotation marks, which made it easy to match with regex.
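The tagging step can be sketched in Python, assuming each entry sits on its own line as a scenario key, a colon, and the quoted dialogue (for example `Mon: "Good morning!"`). The <context> element and the who attribute on <dialogue> come from this write-up; the input layout and the helper `tag_line` are hypothetical, and the project's actual regex may have differed.

```python
import re
from typing import Optional
from xml.sax.saxutils import escape

# Assumed input: one entry per line, the scenario key, a colon, then the
# dialogue in double quotes, e.g.   Mon: "Good morning!"
ENTRY = re.compile(r'^\s*([^:]+):\s*"(.*)"\s*$')


def tag_line(line: str, speaker: str) -> Optional[str]:
    """Wrap the scenario key in <context> and the quoted text in <dialogue>."""
    match = ENTRY.match(line)
    if match is None:
        return None  # not a keyed dialogue entry; leave it for manual review
    context, text = match.groups()
    who = escape(speaker, {'"': "&quot;"})  # also escape quotes for the attribute
    return (f"<context>{escape(context.strip())}</context>"
            f'<dialogue who="{who}">{escape(text)}</dialogue>')


print(tag_line('Mon: "Good morning!"', "Abigail"))
# <context>Mon</context><dialogue who="Abigail">Good morning!</dialogue>
```

Escaping `&` and `<` matters here because the tagged output has to remain well-formed XML for the later XQuery steps.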
Metadata appeared inside the dialogue text as well as outside it. Character sequences such as ^, #$e#, and $r mark, respectively, places where the dialogue changes based on the player's gender, where a new dialogue box appears, and where the player can choose a response from a list. Other dialogue metadata, such as | or {}, appeared less frequently.
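A minimal sketch of how these inline markers can be separated from the spoken text follows. It assumes the new-box marker is written `#$e#` as in the game's dialogue strings and, as a simplification, treats `^` as splitting a single box into male and female variants; `parse_dialogue` is a hypothetical helper, not part of the project's pipeline.

```python
import re

NEW_BOX = "#$e#"                 # assumed marker for "start a new dialogue box"
GENDER_SPLIT = "^"               # text before ^ for a male player, after it for a female player
RESPONSE = re.compile(r"\$r\b")  # a $r command offers the player a list of responses


def parse_dialogue(raw: str) -> list[dict]:
    """Split a raw dialogue string into boxes, each with gender variants."""
    boxes = []
    for box in raw.split(NEW_BOX):
        male, _, female = box.partition(GENDER_SPLIT)
        boxes.append({
            "male": male,
            "female": female or male,  # no ^ means both players see the same text
            "has_response": RESPONSE.search(box) is not None,
        })
    return boxes


for box in parse_dialogue("Hi, sir!^Hi, ma'am!#$e#See you."):
    print(box)
```

Here the first box yields separate male and female lines and the second box is shared by both. Rarer markers such as | or {} would need their own rules.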
We used XQuery to create our network and SVG graphs. For the network, we wrote XQuery that output the names of the characters, the SPEAKER label, the file in which each name is present, and the name of the location that corresponds to that file. These values had to appear in exactly that order, separated by TABs, for Cytoscape to read them properly. The character names were derived from the auto-tagged name elements whose type attribute equaled PERSON. The location names came from each file's location element, whose place attribute held the location's name. Each output line then became one connection in our Cytoscape network.
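The network query is written in XQuery, but the same row-building logic can be sketched in Python for readers who want to check the output format. This sketch assumes each file has one location element with a place attribute and auto-tagged name elements with type="PERSON", as described above. The function names are hypothetical, and removing duplicate names per file is our assumption (repeated rows would otherwise become parallel edges in Cytoscape).

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def network_rows(xml_text: str, filename: str) -> list[str]:
    """Build TAB-separated rows: character, SPEAKER, file, location."""
    root = ET.fromstring(xml_text)
    location = root.find(".//location")
    place = location.get("place", "") if location is not None else ""
    # Keep only auto-tagged people; sort so the output is deterministic.
    names = sorted({el.text.strip() for el in root.iter("name")
                    if el.get("type") == "PERSON" and el.text})
    return [f"{name}\tSPEAKER\t{filename}\t{place}" for name in names]


def network_tsv(paths: list[Path]) -> str:
    """Concatenate rows from every file into one table for Cytoscape."""
    rows = []
    for path in paths:
        rows.extend(network_rows(path.read_text(encoding="utf-8"), path.name))
    return "\n".join(rows) + "\n"
```

Calling `network_tsv(sorted(Path("xml").glob("*.xml")))` on a hypothetical folder of tagged files would produce the four-column table that Cytoscape imports as a network.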
For our SVG graphs, we wrote XQuery that output a graph showing each character's name, a corresponding bar, and the percentage that bar represents. The character names came from the who attributes on the dialogue elements we had added to our XML files using regex, and the percentages were calculated within the XQuery itself. We organized the graphs in two ways: alphabetically by character name, and by percentage of dialogue from highest to lowest. Finally, we applied a color gradient so that the bars become lighter toward the bottom of the graph. These graphs are viewable on the Visual Data page.
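The steps behind these graphs (counting dialogue per who attribute, converting counts to percentages, ordering the bars, and lightening the color down the chart) can be sketched in Python. This is an illustrative reimplementation rather than the project's XQuery. The function names, bar dimensions, and the blue HSL gradient are assumptions, and bar widths are scaled so that the largest share fills the available width.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from xml.sax.saxutils import escape


def dialogue_shares(xml_text: str) -> list[tuple[str, float]]:
    """Percentage of dialogue elements per speaker, highest first."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.get("who") for el in root.iter("dialogue") if el.get("who"))
    total = sum(counts.values())
    shares = [(who, 100 * n / total) for who, n in counts.items()]
    # Highest percentage first; ties fall back to alphabetical order.
    return sorted(shares, key=lambda pair: (-pair[1], pair[0]))


def bar_chart_svg(shares: list[tuple[str, float]], bar_height: int = 20,
                  max_width: int = 400, label_width: int = 100) -> str:
    """Horizontal bar chart whose bars get lighter from top to bottom."""
    height = bar_height * len(shares)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{label_width + max_width + 60}" height="{height}">']
    top = max((pct for _, pct in shares), default=0) or 1
    for i, (who, pct) in enumerate(shares):
        y = i * bar_height
        # Lightness rises from 30% on the first bar to 80% on the last.
        lightness = 30 + 50 * i / max(len(shares) - 1, 1)
        width = max_width * pct / top
        parts.append(f'<text x="0" y="{y + bar_height - 5}">{escape(who)}</text>')
        parts.append(f'<rect x="{label_width}" y="{y}" width="{width:.1f}" '
                     f'height="{bar_height - 2}" fill="hsl(210, 60%, {lightness:.0f}%)"/>')
        parts.append(f'<text x="{label_width + width + 5:.1f}" '
                     f'y="{y + bar_height - 5}">{pct:.1f}%</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

An alphabetical version of the same chart only needs `bar_chart_svg(sorted(shares))`, since tuples sort by name first.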