Do European M&Ms Actually Taste Better than American M&Ms?

An overly-enthusiastic application of science and data visualization to a question we’ve all been asking

(Oh, I am the only one who’s been asking this question…? Hm. Well, if you have a minute, please enjoy this exploratory data analysis — featuring experimental design, statistics, and interactive visualization — applied a bit too earnestly to resolve an international debate.)

1. Introduction

1.1 Background and motivation

Chocolate is enjoyed around the world. From ancient practices harvesting organic cacao in the Amazon basin, to chocolatiers sculpting edible art in the mountains of Switzerland, and enormous factories in Hershey, Pennsylvania churning out 70 million Kisses per day, the nuanced forms and flavors of chocolate have been integrated into many cultures and their customs. While quality can greatly vary across chocolate products, a well-known, shelf-stable, easily shareable form of chocolate is the M&M. Readily found by convenience store check-out counters and in hotel vending machines, the brightly colored pellets are a popular treat whose packaging is re-branded to fit nearly any commercializable American holiday.

While living in Denmark in 2022, I heard a concerning claim: M&Ms manufactured in Europe taste different, and arguably “better,” than M&Ms produced in the United States. While I recognized that fancy European chocolate is indeed quite tasty and often superior to American chocolate, it was unclear to me if the same claim should hold for M&Ms. I learned that many Europeans perceive an “unpleasant” or “tangy” taste in American chocolate, which is largely attributed to butyric acid, a compound resulting from differences in how milk is treated before incorporation into milk chocolate.

But honestly, how much of a difference could this make for M&Ms? M&Ms!? I imagined M&Ms would retain a relatively processed/mass-produced/cheap candy flavor wherever they were manufactured. As the lone American visiting a diverse lab of international scientists pursuing cutting-edge research in biosustainability, I was inspired to break out my data science toolbox and investigate this M&M flavor phenomenon.

1.2 Previous work

To quote a European woman, who shall remain anonymous, after she tasted an American M&M while traveling in New York:

“They taste so gross. Like vomit. I don’t understand how people can eat this. I threw the rest of the bag away.”

Vomit? Really? In my experience, children raised in the United States had no qualms about eating M&Ms. Growing up, I was accustomed to bowls of M&Ms strategically placed in high traffic areas around my house to provide readily available sugar. Clearly American M&Ms are edible. But are they significantly different and/or inferior to their European equivalent?

In response to the anonymous European woman’s scathing report, two other Americans visiting Denmark and I sampled M&Ms purchased locally in the Lyngby Storcenter Føtex. We hoped to experience the incredible improvement in M&M flavor that was apparently hidden from us throughout our youths. But curiously, we detected no obvious flavor improvements.

Unfortunately, neither preliminary study was able to conduct a side-by-side taste test with proper controls and randomized M&M sampling. Thus, we turn to science.

1.3 Study Goals

This study seeks to remedy the previous lack of thoroughness and investigate the following questions:

  1. Is there a global consensus that European M&Ms are in fact better than American M&Ms?
  2. Can Europeans actually detect a difference between M&Ms purchased in the US vs in Europe when they don’t know which one they are eating? Or is this a grand, coordinated lie amongst Europeans to make Americans feel embarrassed?
  3. Are Americans actually taste-blind to American vs European M&Ms? Or can they taste a difference but simply don’t describe this difference as “an improvement” in flavor?
  4. Can these alleged taste differences be perceived by citizens of other continents? If so, do they find one flavor obviously superior?

2. Methods

2.1 Experimental design and data collection

Participants were recruited by luring — er, inviting them to a social gathering (with the promise of free food) that was conveniently co-located with the testing site. Once a participant agreed to pause socializing and join the study, they were positioned at a testing station with a trained experimenter who guided them through the following steps:

  • Participants sat at a table and received two cups: 1 empty and 1 full of water. With one cup in each hand, the participant was asked to close their eyes, and keep them closed through the remainder of the experiment.
  • The experimenter randomly extracted one M&M with a spoon, delivered it to the participant’s empty cup, and the participant was asked to eat the M&M (eyes still closed).
  • After eating each M&M, the experimenter collected the taste response by asking the participant to report if they thought the M&M tasted: Especially Good, Especially Bad, or Normal.
  • Each participant received a total of 10 M&Ms (5 European, 5 American), one at a time, in a random sequence determined by random.org.
  • Between eating each M&M, the participant was asked to take a sip of water to help “cleanse their palate.”
  • Data collected: for each participant, the experimenter recorded the participant’s continent of origin (if this was ambiguous, the participant was asked to list the continent on which they have the strongest memories of eating candy as a child). For each of the 10 M&Ms delivered, the experimenter recorded the M&M origin (“Denmark” or “USA”), the M&M color, and the participant’s taste response. Experimenters were also encouraged to jot down any amusing phrases uttered by the participant during the test, recorded under notes (data available here).

2.2 Sourcing materials and recruiting participants

Two bags of M&Ms were purchased for this study. The American-sourced M&Ms (“USA M&M”) were acquired at the SFO airport and delivered by the author’s parents, who visited her in Denmark. The European-sourced M&Ms (“Denmark M&M”) were purchased at a local Føtex grocery store in Lyngby, a little north of Copenhagen.

Experiments were conducted at two main time points. The first 14 participants were tested in Lyngby, Denmark in August 2022. They mostly consisted of friends and housemates the author met at the Novo Nordisk Foundation Center for Biosustainability at the Technical University of Denmark (DTU) who came to a “going away party” into which the experimental procedure was inserted. A few additional friends and family who visited Denmark were also tested during their travels (e.g. on the train).

The remaining 37 participants were tested in Seattle, WA, USA in October 2022, primarily during a “TGIF happy hour” hosted by graduate students in the computer science PhD program at the University of Washington. This second batch mostly consisted of students and staff of the Paul G. Allen School of Computer Science & Engineering (UW CSE) who responded to the weekly Friday summoning to the Allen Center atrium for free snacks and drinks.

Figure 1. Distribution of participants recruited to the study. In the first sampling event in Lyngby, participants primarily hailed from North America and Europe, and a few additionally came from Asia, South America, or Australia. Our second sampling event in Seattle greatly increased participants, primarily from North America and Asia, and a few more from Europe. Neither event recruited participants from Africa. Figure made with Altair.

While this study set out to analyze global trends, unfortunately data was only collected from 51 participants the author was able to lure to the study sites and is not well-balanced nor representative of the 6 inhabited continents of Earth (Figure 1). We hope to improve our recruitment tactics in future work. For now, our analytical power with this dataset is limited to response trends for individuals from North America, Europe, and Asia, highly biased by subcommunities the author happened to engage with in late 2022.

2.3 Risks

While we did not acquire formal approval for experimentation with human test subjects, there were minor risks associated with this experiment: participants were warned that they may be subjected to increased levels of sugar and possible “unpleasant flavors” as a result of participating in this study. No other risks were anticipated.

After the experiment however, we unfortunately observed several cases of deflated pride when a participant learned their taste response was skewed more positively towards the M&M type they were not expecting. This pride deflation seemed most severe among European participants who learned their own or their fiancé’s preference skewed towards USA M&Ms, though this was not quantitatively measured and cannot be confirmed beyond anecdotal evidence.

3. Results & Discussion

3.1 Overall response to “USA M&Ms” vs “Denmark M&Ms”

3.1.1 Categorical response analysis — entire dataset

In our first analysis, we count the total number of “Bad”, “Normal”, and “Good” taste responses and report the percentage of each response received by each M&M type. M&Ms from Denmark more frequently received “Good” responses than USA M&Ms but also more frequently received “Bad” responses. M&Ms from the USA were most frequently reported to taste “Normal” (Figure 2). This may result from the elevated number of participants hailing from North America, where the USA M&M is the default and thus more “Normal,” while the Denmark M&M was more often perceived as better or worse than the baseline.

Figure 2. Qualitative taste response distribution across the whole dataset. The percentage of taste responses for “Bad”, “Normal” or “Good” was calculated for each type of M&M. Figure made with Altair.

Now let’s break out some statistics, such as a chi-squared (χ²) test to compare our observed distributions of categorical taste responses. Using the scipy.stats chi2_contingency function, we built contingency tables of the observed counts of “Good,” “Normal,” and “Bad” responses to each M&M type. Using the χ² test to evaluate the null hypothesis that there is no difference between the two M&Ms, we found the p-value for the test statistic to be 0.0185, which is significant at the common p-value cutoff of 0.05, but not at 0.01. So a solid “maybe,” depending on whether you’d like this result to be significant or not.
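For those curious about the mechanics, the call looks roughly like this (a sketch with illustrative counts, not the study’s actual tallies):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Illustrative (not actual) counts of [Good, Normal, Bad] responses
# for each M&M type: 51 participants x 5 M&Ms per type = 255 each
observed = np.array([
    [90, 110, 55],   # Denmark M&Ms
    [70, 140, 45],   # USA M&Ms
])
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```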

3.1.2 Quantitative response analysis — entire dataset.

The χ² test helps evaluate if there is a difference in categorical responses, but next, we want to determine a relative taste ranking between the two M&M types. To do this, we converted taste responses to a quantitative distribution and calculated a taste score. Briefly, “Bad” = 1, “Normal” = 2, “Good” = 3. For each participant, we averaged the taste scores across the 5 M&Ms they tasted of each type, maintaining separate taste scores for each M&M type.
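As a minimal sketch, assuming a tidy dataframe `df` with hypothetical columns `participant_id`, `mnm_type`, and `taste_response`:

```python
import pandas as pd

# convert categorical responses to numbers
score_map = {"Bad": 1, "Normal": 2, "Good": 3}
df["taste_score"] = df["taste_response"].map(score_map)

# average the 5 scores each participant gave each M&M type
avg_scores = (
    df.groupby(["participant_id", "mnm_type"])["taste_score"]
    .mean()
    .reset_index()
)
```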

Figure 3. Quantitative taste score distributions across the whole dataset. Kernel density estimation of the average taste score calculated for each participant for each M&M type. Figure made with Seaborn.

With the average taste score for each M&M type in hand, we turn to scipy.stats ttest_ind (“T-test”) to evaluate if the means of the USA and Denmark M&M taste scores are different (the null hypothesis being that the means are identical). If the means are significantly different, it would provide evidence that one M&M is perceived as significantly tastier than the other.

We found the average taste scores for USA M&Ms and Denmark M&Ms to be quite close (Figure 3), and not significantly different (T-test: p = 0.721). Thus, across all participants, we do not observe a difference between the perceived taste of the two M&M types (or if you enjoy parsing triple negatives: “we cannot reject the null hypothesis that there is not a difference”).
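Continuing the hypothetical `avg_scores` dataframe from the sketch above:

```python
from scipy.stats import ttest_ind

usa = avg_scores.loc[avg_scores["mnm_type"] == "USA", "taste_score"]
dk = avg_scores.loc[avg_scores["mnm_type"] == "Denmark", "taste_score"]

# null hypothesis: the two means are identical
t_stat, p_value = ttest_ind(usa, dk)
```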

But does this change if we separate participants by continent of origin?

3.2 Continent-specific responses to “USA M&Ms” vs “Denmark M&Ms”

We repeated the above χ² and T-test analyses after grouping participants by their continents of origin. The Australia and South America groups were combined as a minimal attempt to preserve data privacy. Due to the relatively small sample size of even the combined Australia/South America group (n=3), we will refrain from analyzing trends for this group but include the data in several figures for completeness and enjoyment of the participants who may eventually read this.

3.2.1 Categorical response analysis — by continent

In Figure 4, we display both the taste response counts (upper panel, note the interactive legend) and the response percentages (lower panel) for each continent group. Both North America and Asia follow a similar trend to the whole population dataset: participants report Denmark M&Ms as “Good” more frequently than USA M&Ms, but also report Denmark M&Ms as “Bad” more frequently. USA M&Ms were most frequently reported as “Normal” (Figure 4).

On the contrary, European participants report USA M&Ms as “Bad” nearly 50% of the time and “Good” only 18% of the time, which is the most negative and least positive response pattern, respectively (when excluding the under-sampled Australia/South America group).

Figure 4. Qualitative taste response distribution by continent. Upper panel: counts of taste responses — click the legend to interactively filter! Lower panel: percentage of taste responses for each type of M&M. Figure made with Altair.

This appeared striking in bar chart form; however, only North America had a significant χ² p-value (p = 0.0058) when evaluating each continent’s difference in taste response profile between the two M&M types. The European p-value is perhaps “approaching significance” in some circles, but we’re about to accumulate several more hypothesis tests and should be mindful of multiple hypothesis testing (Table 1). A false positive result here would be devastating.

When comparing the taste response profiles between two continents for the same M&M type, there are a couple of interesting notes. First, we observed no major taste discrepancies between any pair of continents when evaluating Denmark M&Ms — the world seems generally consistent in its range of feelings about M&Ms sourced from Europe (right column χ² p-values, Table 2). To visualize this comparison more easily, we reorganize the bars in Figure 4 to group them by M&M type (Figure 5).

Figure 5. Qualitative taste response distribution by M&M type, reported as percentages. (Same data as Figure 4 but re-arranged). Figure made with Altair.

However, when comparing continents to each other in response to USA M&Ms, we see larger discrepancies. We found one pairing to be significantly different: European and North American participants evaluated USA M&Ms very differently (p = 0.000007) (Table 2). It seems very unlikely that this observed difference is by random chance (left column, Table 2).

3.2.2 Quantitative response analysis — by continent

We again convert the categorical profiles to quantitative distributions to assess continents’ relative preference of M&M types. For North America, we see that the taste score means of the two M&M types are actually quite similar, but there is a higher density around “Normal” scores for USA M&Ms (Figure 6A). The European distributions maintain a bit more of a separation in their means (though not quite significantly so), with USA M&Ms scoring lower (Figure 6B). The taste score distributions of Asian participants are the most similar to each other (Figure 6C).

Reorienting to compare the quantitative means between continents’ taste scores for the same M&M type, only the comparison between North American and European participants on USA M&Ms is significantly different based on a T-test (p = 0.001) (Figure 6D), though now we really are in danger of multiple hypothesis testing! Be cautious if you are taking this analysis at all seriously.

Figure 6. Quantitative taste score distributions by continent. Kernel density estimation of the average taste score calculated for each continent for each M&M type. A. Comparison of North America responses to each M&M. B. Comparison of Europe responses to each M&M. C. Comparison of Asia responses to each M&M. D. Comparison of continents for USA M&Ms. E. Comparison of continents for Denmark M&Ms. Figure made with Seaborn.

At this point, I feel myself considering that maybe Europeans are not just making this up. I’m not saying it’s as dramatic as some of them claim, but perhaps a difference does indeed exist… To some degree, North American participants also perceive a difference, but the evaluation of Europe-sourced M&Ms is not consistently positive or negative.

3.3 M&M taste alignment chart

In our analyses thus far, we did not account for the baseline differences in M&M appreciation between participants. For example, say Person 1 scored all Denmark M&Ms as “Good” and all USA M&Ms as “Normal”, while Person 2 scored all Denmark M&Ms as “Normal” and all USA M&Ms as “Bad.” They would have the same relative preference for Denmark M&Ms over USA M&Ms, but Person 2 perhaps just does not enjoy M&Ms as much as Person 1, and the relative preference signal is muddled by averaging the raw scores.

Inspired by the Lawful/Chaotic x Good/Evil alignment chart used in tabletop role playing games like Dungeons & Dragons©™, in Figure 7, we establish an M&M alignment chart to help determine the distribution of participants across M&M enjoyment classes.

Figure 7. M&M enjoyment alignment chart. The x-axis represents a participant’s average taste score for USA M&Ms; the y-axis is a participant’s average taste score for Denmark M&Ms. Figure made with Altair.

Notably, the upper right quadrant where both M&M types are perceived as “Good” to “Normal” is mostly occupied by North American participants and a few Asian participants. All European participants land in the left half of the figure where USA M&Ms are “Normal” to “Bad”, but Europeans are somewhat split between the upper and lower halves, where perceptions of Denmark M&Ms range from “Good” to “Bad.”

An interactive version of Figure 7 is provided below for the reader to explore the counts of various M&M alignment regions.

Figure 7 (interactive): click and brush your mouse over the scatter plot to see the counts of continents in different M&M enjoyment regions. Figure made with Altair.

3.4 Participant taste response ratio

Next, to factor out baseline M&M enjoyment and focus on participants’ relative preference between the two M&M types, we took the log ratio of each person’s USA M&M taste score average divided by their Denmark M&M taste score average.

Equation 1: Each participant’s overall M&M preference ratio:

preference ratio = log(average USA M&M taste score / average Denmark M&M taste score)

As such, positive scores indicate a preference towards USA M&Ms while negative scores indicate a preference towards Denmark M&Ms.

On average, European participants had the strongest preference towards Denmark M&Ms, with Asians also exhibiting a slight preference towards Denmark M&Ms (Figure 8). To the two Europeans who exhibited deflated pride upon learning their slight preference towards USA M&Ms, fear not: you did not think USA M&Ms were “Good,” but simply ranked them as less bad than Denmark M&Ms (see participant_id 4 and 17 in the interactive version of Figure 7). If you assert that M&Ms are a bad American invention not worth replicating and return to consuming artisanal European chocolate, your honor can likely be restored.

Figure 8. Distribution of participant M&M preference ratios by continent. Preference ratios are calculated as in Equation 1. Positive numbers indicate a relative preference for USA M&Ms, while negative indicate a relative preference for Denmark M&Ms. Figure made with Seaborn.

North American participants are pretty split in their preference ratios: some fall quite neutrally around 0, others strongly prefer the familiar USA M&M, while a handful moderately prefer Denmark M&Ms. Anecdotally, North Americans who learned their preference skewed towards European M&Ms displayed signals of inflated pride, as if their results signaled posh refinement.

Overall, a T-test comparing the distributions of M&M preference ratios shows a possibly significant difference in the means between European and North American participants (p = 0.049), but come on, this is like the 20th p-value I’ve reported — this one is probably too close to call.

3.5 Taste inconsistency and “Perfect Classifiers”

For each participant, we assessed their taste score consistency by averaging the standard deviations of their responses to each M&M type, then plotting that against their preference ratio (Figure 9).
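In pandas terms, that consistency metric might be computed like so (again on the hypothetical dataframe from earlier):

```python
# standard deviation of each participant's 5 scores per M&M type,
# then the average of their two per-type standard deviations
consistency = (
    df.groupby(["participant_id", "mnm_type"])["taste_score"]
    .std()
    .groupby("participant_id")
    .mean()
)
```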

Figure 9. Participant taste consistency by preference ratio. The x-axis is a participant’s relative M&M preference ratio. The y-axis is the average of the standard deviation of their USA M&M scores and the standard deviation of their Denmark M&M scores. A value of 0 on the y-axis indicates perfect consistency in responses, while higher values indicate more inconsistent responses. Figure made with Altair.

Most participants were somewhat inconsistent in their ratings, ranking the same M&M type differently across the 5 samples. This would be expected if the taste difference between European-sourced and American-sourced M&Ms is not actually all that perceptible. Most inconsistent were participants who gave the same M&M type “Good”, “Normal”, and “Bad” responses (e.g., points high on the y-axis, with wider standard deviations of taste scores), indicating lower taste perception abilities.

Intriguingly, four participants — one from each continent group — were perfectly consistent: they reported the same taste response for each of the 5 M&Ms from each M&M type, resulting in an average standard deviation of 0.0 (bottom of Figure 9). Excluding the one who simply rated all 10 M&Ms as “Normal”, the other three appeared to be “Perfect Classifiers” — either rating all M&Ms of one type “Good” and the other “Normal”, or rating all M&Ms of one type “Normal” and the other “Bad.” Perhaps these folks are “super tasters.”

3.6 M&M color

Another possible explanation for the inconsistency in individual taste responses is that there exists a perceptible taste difference based on the M&M color. Visually, the USA M&Ms were noticeably more smooth and vibrant than the Denmark M&Ms, which were somewhat more “splotchy” in appearance (Figure 10A). M&M color was recorded during the experiment, and although balanced sampling was not formally built into the experimental design, colors seemed to be sampled roughly evenly, with the exception of Blue USA M&Ms, which were oversampled (Figure 10B).

Figure 10. M&M colors. A. Photo of each M&M color of each type. It’s perhaps a bit hard to perceive on screen in my unprofessionally lit photo, but with the naked eye, USA M&Ms seemed to be brighter and more uniformly colored while Denmark M&Ms have a duller and more mottled color. Is it just me, or can you already hear the Europeans saying “They are brighter because of all those extra chemicals you put in your food that we ban here!” B. Distribution of M&Ms of each color sampled over the course of the experiment. The Blue USA M&Ms were not intentionally oversampled — they must be especially bright/tempting to experimenters. Figure made with Altair.

We briefly visualized possible differences in taste responses based on color (Figure 11), however we do not believe there are enough data to support firm conclusions. After all, on average each participant would likely only taste 5 of the 6 M&M colors once, and 1 color not at all. We leave further M&M color investigations to future work.

Figure 11. Taste response profiles for M&Ms of each color and type. Profiles are reported as percentages of “Bad”, “Normal”, and “Good” responses, though not all M&Ms were sampled exactly evenly. Figure made with Altair.

3.7 Colorful commentary

We assured each participant that there was no “right answer” in this experiment and that all feelings are valid. While some participants took this to heart and occasionally spent over a minute deeply savoring each M&M and evaluating it as if they were a sommelier, many participants seemed to view the experiment as a competition (which occasionally led to deflated or inflated pride). Experimenters wrote down quotes and notes in conjunction with M&M responses, some of which were a bit “colorful.” We provide a hastily rendered word cloud for each M&M type for entertainment purposes (Figure 12) though we caution against reading too far into them without diligent sentiment analysis.

Figure 12. A simple word cloud generated from the notes column of each M&M type. Fair warning — these have not been properly analyzed for sentiment and some inappropriate language was recorded. Figure made with WordCloud.

4. Conclusion

Overall, there does not appear to be a “global consensus” that European M&Ms are better than American M&Ms. However, European participants tended to more strongly express negative reactions to USA M&Ms while North American participants seemed relatively split on whether they preferred M&Ms sourced from the USA vs from Europe. The preference trends of Asian participants often fell somewhere between the North Americans and Europeans.

Therefore, I’ll admit that it’s probable that Europeans are not engaged in a grand coordinated lie about M&Ms. The skew of most European participants towards Denmark M&Ms is compelling, especially since I was the experimenter who personally collected much of the taste response data. If they found a way to cheat, it was done well enough to exceed my own passive perception such that I didn’t notice. However, based on this study, it would appear that a strongly negative “vomit flavor” is not universally perceived and does not become apparent to non-Europeans when tasting both M&M types side by side.

We hope this study has been illuminating! We look forward to extensions of this work with improved participant sampling, additional M&M types sourced from other continents, and deeper investigations into possible taste differences due to color.

Thank you to everyone who participated and ate M&Ms in the name of science!

Figures and analysis can be found on github: https://github.com/erinhwilson/mnm-taste-test

Article by Erin H. Wilson, Ph.D.[1,2,3] who decided the time between defending her dissertation and starting her next job would be best spent on this highly valuable analysis. Hopefully it is clear that this article is intended to be comedic — I do not actually harbor any negative feelings towards Europeans who don’t like American M&Ms, but enjoyed the chance to be sassy and poke fun at our lively debates with overly-enthusiastic data analysis.

Shout out to Matt, Galen, Ameya, and Gian-Marco for assisting in data collection!

[1] Former Ph.D. student in the Paul G. Allen School of Computer Science and Engineering at the University of Washington

[2] Former visiting Ph.D. student at the Novo Nordisk Foundation Center for Biosustainability at the Technical University of Denmark

[3] Future data scientist at LanzaTech

Visualizing My Data Science Job Search

Reflections from a humbling journey trying to find a job in 2023

2023 was a turbulent year for many job seekers. At least for me, it felt like quite a journey. Over the 11 months between January and November, I had 107 career-related conversations and applied to 80 positions, resulting in 2 offers (I took a "break" in May to defend my PhD 😅 ). Some applications were stretches, sure, but many of them were posts to which I thought I’d be a great match – or at least worth a conversation!

The frequency with which my efforts were met with silence surprised me. I felt like most of my cover letters were quite earnest and my resume had unique experiences, prompting me to second guess my communication skills. Is my resume not clear? Cover letter too wordy? Am I misrepresenting my skills? Am I not actually good enough for any of these positions?? It was certainly a disheartening time – a feast for the impostor syndrome goblins lurking in the corners of my mind.

From a learning perspective, it’s rough that rejections tend to come without any sort of feedback. Which parts can I improve for my next application to a job for which I think I am definitely a strong candidate? Many kind souls offered helpful suggestions on my materials, which I do think got better over time, but it’s hard to say what actually made a difference in the end. Maybe I finally started submitting strong/competitive applications after months of weaker attempts? Maybe 2023 was a particularly rough year? Maybe I got unlucky with factors out of my control, like hitting fake job ads simply posted to "drum up interest" in a company, or ones pre-destined for an internal hire?

In the spirit of sharing potentially helpful nuggets with others on the job hunt (and ok, maybe an attempt at catharsis through visualizing an experience that frequently activated my tear ducts) here are some data-supported reflections from my 2023 job search process. I was most eagerly searching for positions at the intersection of Biology x Data Science x Climate Solutions, either in Seattle or remote.

The Path of a Job Application

For my 80 job applications, what route did they take from submission to a final "yes" or "no"? Figure 1 is a Sankey chart, designed to show the flow of volumes between nodes.

In the second row of Figure 1, we can see that of those 80, about half were positions to which I uploaded my info to a company’s portal without knowing anyone there, which I call a "Cold Application." For the other half, I had a "Personal Contact" – sometimes this was someone who knew me well and could refer me, but often it was someone who I had met once (for coffee, a zoom chat, at a networking event). They didn’t necessarily know my specific skills super well, but they knew I was a real human, hopefully thought I was friendly/thoughtful, and could potentially relay a message to another real human to at least look for my application in the pile.

Figure 1. Sankey chart showing the volume of applications that flowed through each stage of the application/interview process. Image by author.

The third row shows how many applications resulted in an Interview request or some form of rejection/no answer. Some folks insist that you should never apply to a job where you don’t already know someone internally that can refer you – it’s just a waste of time.

My data support that having a personal contact improved my rate of interview requests: 11/42 Personal Contact applications led to a conversation for a ~25% interview rate, compared to 4/38 Cold Applications for a ~10% interview rate. So about a quarter (4/15) of my interview requests came from a Cold Application. I would not go so far as to say never apply to a position without a personal contact – it can be fruitful for the right position – but definitely put effort into making personal contacts. At least for me, this application path was more successful in reaching a real human with decision-making power.

Some companies are diligent enough to send you an automated rejection when they have moved your application to the "No" pile (n=21). Very rarely I had a human tell me "no" from just the application stage (n=3). Just over half of my applications were "Lost to the Abyss" (n=41), where I heard back from neither human nor robot, and can only assume some void-dwelling AIs are munching on my cover letters.

In the cases where a human requested an interview, I usually proceeded to an initial screen, either by HR or the hiring manager. If that went well, I met more folks from the wider team, and in 3 cases I was invited to give a job talk about my research. None of my interviews included formal coding challenges – most of the positions I applied to had more of a science/research focus over software/infrastructure – but I was often asked for links to my personal github account.

In the bottom row, we see the final tally. I got 74 "No’s" before 2 companies finally said "yes" (within a day of each other, which was some wildly good luck after being on the hunt for 11 months!). I withdrew my other in-progress applications after I accepted my current job.

Pre-Application Groundwork

A big note: applications were not my first steps on my job search. After setting my PhD defense date for Spring 2023, I started setting up "informational interviews" with folks starting in late 2022. Many early conversations were with closer colleagues/mentors/peers whom I felt comfortable asking for advice and trusted with my detailed career goals. This helped me 1) feel better starting a process I was anxious about, and 2) communicate that I was officially ~on the hunt~ so people could let me know if any interesting positions crossed their radar in the coming months. Definitely tug on your support network as you venture into this process!

Next, I started setting up meetings with new people – either through a warm intro from a mutual connection or just an out-of-the-blue message on LinkedIn (some cold-LinkedIn messages got no response, but some did! Maybe ~half?). These requests were generally framed as "I’m a soon-to-be-graduating PhD student exploring career routes in the [Computational Biology/Data Science/Climate] industry – may I ask you about your experience working at [your current position], your career path that led you there, and any advice you have for someone early on this journey?" I also briefly personalized each message so it was clear I was interested in speaking to this person specifically and wasn’t just spamming everyone at that company.

Figure 2. Overall timeline for each informational conversation, job application, interview, and offer from late 2022 to late 2023, colored by the sector most closely related to the company or person I was talking to/position I was applying to. Most job applications were to computational positions with an added focus on biology (blue), climate (orange), or a combination of biology and climate (green), while a few were purely related to data science without biology or climate (pink). Progressive interviews with the same company are connected by lines. Image by author.

The top panel of Figure 2 shows I kept up a relatively steady rate of reaching out to people. Even though most conversations were not about an active job opening, they did not feel like a waste of time. Some key benefits of informational interviews:

  1. Since I wasn’t immediately looking for a job (but would be soon), the casual framing of "just chatting over coffee" or "just learning about your experience" removed a ton of pressure. It was easier to be myself and get practice explaining my interests and experiences out loud without constantly fearing that I wasn’t saying things optimally enough to get the job. It was also great to practice active listening and asking questions to dig deeper into technical/science details with smart people.
  2. It was helpful to keep a pulse on the industry – people could tell me more candidly if their company was not likely to be hiring in the near future, or if they were somewhat likely to have openings soon. This helped focus my attention more directly on certain companies’ career pages while attending more casually to others. Also, now that they knew me, these new contacts could possibly let me know if their company’s hiring status changed in the future.
  3. Just meeting people friendly enough to have a chat/coffee with a stranger was pretty cool. Sure some chats were a bit awkward, and occasionally I could tell the other person kind of wanted to leave, but most folks were very encouraging/supportive and were happy to elaborate about their own journeys. Meeting folks working on cool things kept me inspired, and often they’d suggest another person they thought would have a useful perspective for me, so the conversation train continued.
  4. After chatting with someone, if I later came across a relevant job post at their company, I would always let them know that I applied! They didn’t know me well enough to deeply vouch for my skills after one conversation, but hopefully they at least had a positive impression of me. If they felt like it, they might mention my application to a real human, who then might actually read my cover letter rather than leaving it stuck in the AI filters 🙃

The second panel in Figure 2 shows job application events: I submitted a sprinkling of applications in early 2023, paused right around my defense date in May 😮💨 and then ramped up late summer. The third panel shows when (finally!) a trickle of interview requests started coming through!

In one rather strange experience, I got an offer from a company, attempted to politely negotiate, and then swiftly received an "actually, we changed our mind, no more offer" email. This was pretty jarring. It was the first time I had ever tried negotiating, and as a woman looking for a computational job, it has been hammered into my brain that even if it feels unnatural/awkward/uncomfortable, you just gotta try! I had assumed the worst that would happen was they would say "no, that’s really our maximum offer, take it or leave it." I’m pretty sure I was not being rude or unreasonable in my request, but having an offer retracted was quite upsetting 😖 I feared I had committed some big faux pas in trying to advocate for myself and fair compensation. Several mentors assured me that this was a highly unusual outcome and not a reflection on me, but of something weird happening at that company. I hope it does not happen regularly to others, but it’s not impossible (n=1).

The bottom panel shows that in November, I finally received two real job offers, thus concluding the Great Hunt.

To put this job search timeline in the context of the wider economic climate, I grouped each interaction type – casual informational convo (dark blue), job application (turquoise), and formal interview (green) – and plotted a cumulative count over time, overlaid with the trend of the S&P 500 index (purple) (Figure 3).

Figure 3. Cumulative sum of each job search interaction type over time (left y-axis), overlaid with the daily high of the S&P 500 index (right y-axis) and several big tech company layoffs (>10,000 people). Image by author.

Other than the gap in May around my PhD defense, my informational convo rate was pretty steady over the year, while my job application rate increased in late summer, eventually catching up to my informational convo total. I’m not practiced in reading stock market trends and variability, but if I squint at it, it seems like an S&P 500 upswing after some lows in late 2022 roughly coincides with the steeper slope of my job applications (perhaps when more positions were being posted?) and a corresponding uptick in interviews 👀 Notably, this came after a series of big tech layoffs (>10,000 people) in early 2023 near some S&P 500 local minima.

Focusing my job search energy

In grad school, I knew my "Science Happy Place" was to work on a project where I was:

  1. thinking about biology (because genetics is just the coolest!)
  2. doing computer science (because I enjoy the puzzle solving nature of programming work)
  3. with applications in sustainability (because climate change is devastatingly urgent to solve)

Excitingly, I found my way into a grad school lab where I was using computational methods to analyze genetic data in methane-eating bacteria. I hoped my career would similarly encompass the intersection of my knowledge, skills, and enthusiasm in these 3 pillars of Science Happiness.

There were several climate-minded synthetic biology companies I had set my sights on, but in early 2023, I learned that nearly all of the ones with a computing team were in hiring freezes 🥶 Maybe hitting all 3 of these Science Happiness pillars was too much to expect in a fragile economic climate and I’d have to accept a job with only 1 or 2, especially if I was intent on living in Seattle?

My job search energy shifted in focus throughout the year, indicated by the density of colored circles back in Figure 2. Conversations, applications, and interviews are colored by company focus (as a computer science student, most positions I applied to had at least some computational element, and the broader companies were generally tackling problems related to biology, climate, or a combination). Though working towards climate solutions is where my heart is, I initially convinced myself that I would be ok working on something not-climate related while building experience and skills. Accordingly, in early 2023, we see I focused more on conversations and applications in "Biology + Data Science" (blue circles, typically health/pharma applications) when "Biology + Climate + Data Science" positions were quite sparse.

A few existential crises later, my thinking flipped: I decided I’d rather work on a climate application, even if it meant my job would not leverage my years of study in genetics/microbiology. Later in 2023, my job application effort reflects this mindset shift to "Climate + Data Science" (orange circles, climate work without biology). While I was initially nervous to pursue positions further outside my biology wheelhouse, I enjoyed the chance to learn about adjacent fields and consider how to adapt my skills to the types of non-biology climate solutions companies were tackling.

Figure 4 roughly summarizes the total energy I dedicated to each job field within the intersection of biology, climate, and data science, where each informational convo, job application, and formal interview counts as an energy unit. Notably, I didn’t get to pick when I got formal interviews – that depended on someone else choosing to talk to me – but I decided to sum these three energy units together as ~instances of intense focus/stress.~ This sum does not capture other types of effort, like reading, research, and preparation, but those tended to be more spread out over time.

Figure 4. Total sum of energy units dedicated to each job sector of interest. Image by author.

Luckily for me, Fall 2023 brought about a thaw to some previous hiring freezes and a few super exciting positions opened at companies that combined computing and biology for climate applications (green circles in Figure 2, and dark green center circle in Figure 4). THESE were the exact kinds of positions I was hoping to find because they so strongly aligned with my interests, skills, and values! I eagerly messaged all the folks with whom I had previously made connections at these companies and expressed my most earnest enthusiasm about my applications!

At long last, I landed one of those jobs at the perfect intersection of my career goals: I’m in my early days as a data scientist at LanzaTech, a biotech company that engineers bacteria to eat carbon emissions from industrial waste streams and convert them into sustainable materials 🤓

Concluding thoughts – plant a career garden

In summary, I’m not totally sure which aspects of my job search led to my applications being received particularly well or poorly. It felt long and stressful, and the lack of feedback from the big pool of Automated Rejections and applications Lost to the Abyss was hard to train on.

Speculating a bit, my biggest piece of advice to pass on is to talk to people! Often! We all hear about the importance of ~networking~ (building out your professional contacts, etc). It can feel awkward and exhausting to initiate conversations with strangers all the time. But coming to each conversation with an earnest intent to connect, listen, and ask questions can be super helpful! Informational interviews are low pressure settings in which to form connections and share your interests.

Not every conversation will directly feed into a job opportunity, so it can seem like a lot of time to invest without a direct result. But you never know which ones may eventually open a door. An anecdote in my case: I happened to sit at the same lunch table with someone from LanzaTech at a conference in 2016 and kept up a casual conversation with that person via email (once every 1–3 years?). In late 2023 (7 years later!), a job posting opened that matched my skillset, so I messaged this person (and a couple other folks whom I had met from LanzaTech recently). I told them how genuinely excited I was about the position, which perhaps helped flag my application to real humans in hiring positions to continue the conversation 🙂

If "professional networking" feels intimidating, maybe think of your connections as a garden – not every seed will yield a fruit, and some things grow fast while others grow slow. Gardens don’t always flourish exactly when you want them to (like when your grad school health insurance runs out and you really would like a tomato to magically appear and offer you a new medical plan). But gardens make healthy, steady progress when periodically tended, and are generally a lovely breath of fresh air when you need support.

It’s not too late to start a career connection garden – it can be as simple as an earnest-but-professional message to someone that seems interesting 🌱

Best of luck out there, job seekers 💚


Thanks to Daniel and Matt for feedback on my early drafts!

Modeling DNA Sequences with PyTorch

A beginner-friendly tutorial


DNA is a complex data stream. While it can be represented by a string of ACTGs, it is filled with intricate patterns and structural nuances that are difficult for humans to understand just by looking at a raw sequence of nucleotides. In recent years, much progress has been made towards modeling DNA data using Deep Learning.

Researchers have applied methods such as convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and even transformers to predict various genomic measurements directly from DNA sequences. These models are particularly useful because with enough high-quality training data, they can automatically pick up on sequence patterns – or motifs – that are relevant to your prediction task rather than requiring an expert to specify which patterns to look for ahead of time. Overall, enthusiasm is growing for using deep learning in genomics to help map DNA sequences to their biological functions!

As a grad student interested in using computational approaches to address challenges in sustainability and synthetic biology, I’ve been learning how to use PyTorch to study DNA sequence patterns. There’s no shortage of tutorials on how to get started with PyTorch, however many tend to focus on image or language input data. For using DNA as an input, there are many great projects out there that have developed PyTorch frameworks to model all sorts of biological phenomena [1,2,3], but they can be quite sophisticated and difficult to dive into as a beginner.

I had some trouble finding beginner examples for those new to PyTorch that were also focused on DNA data, so I compiled a quick tutorial in case any future DNA modelers find it helpful for getting started!

The tutorial itself can be run interactively as a Jupyter Notebook, or you may follow along with the summary of key concepts and Github gists in the rest of this article.


Build a PyTorch model to predict a score from a DNA sequence

This tutorial shows an example of a PyTorch framework that can use raw DNA sequences as input, feed these into a neural network model, and predict a quantitative label directly from the sequence.

Tutorial Overview:

  1. Generate synthetic DNA data
  2. Prepare data for PyTorch training
  3. Define PyTorch models
  4. Define training loop functions
  5. Run the models
  6. Check model predictions on test set
  7. Visualize convolutional filters
  8. Conclusion

It assumes the reader is already familiar with ML concepts like:

  • What is a neural network, including the basics of a convolutional neural network (CNN)
  • Model training over epochs
  • Splitting data into train/val/test sets
  • Loss functions and comparing train vs. val loss curves

It also assumes some familiarity with biological concepts like:

  • DNA nucleotides
  • What is a regulatory motif?
  • Visualizing DNA motifs

Note: The following methods aren’t necessarily the optimal way to do this! I’m sure there are more elegant solutions, this is just my attempt while learning. But if you’re just getting started with PyTorch and are also using DNA sequences as your input, perhaps this tutorial can be a helpful example of how to "connect some PyTorch tubes together" in the context of DNA sequence analysis.

1. Generate synthetic DNA data

Usually scientists might be interested in predicting something like a binding score, an expression strength, or classifying a transcription factor binding event. But here, we are going to keep it simple: the goal in this tutorial is to observe if a deep learning model can learn to detect a very small, simple pattern in a DNA sequence and score it appropriately (again, just a practice task to convince ourselves that we have actually set up the PyTorch pieces correctly such that it can learn from input that looks like a DNA sequence).

So, arbitrarily, let’s say that given an 8-mer DNA sequence, we give it points for each letter as follows:

  • A = +20 points
  • C = +17 points
  • G = +14 points
  • T = +11 points

For every 8-mer, sum up its total points then take the average. For example,

AAAAAAAA would score 20.0

mean(20 + 20 + 20 + 20 + 20 + 20 + 20 + 20) = 20.0

ACAAAAAA would score 19.625

mean(20 + 17 + 20 + 20 + 20 + 20 + 20 + 20) = 19.625

These values for the nucleotides are arbitrary – there’s no real biology here! It’s just a way to assign sequences a score for the purposes of our PyTorch practice.

However, since many recent papers use methods like CNNs to automatically detect "motifs," or short patterns in the DNA that can activate or repress a biological response, let’s add one more piece to our scoring system. To simulate something like motifs influencing gene expression, let’s say a given sequence gets a +10 bump if TAT appears anywhere in the 8-mer, and a -10 bump if it has a GCG in it. Again, these motifs don’t mean anything in real life, they are just a mechanism for simulating a really simple activation or repression effect.

A simple scoring system for 8-mer DNA sequences. Image by author.

Here’s an implementation of this simple scoring system:
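The exact gist lives in the notebook linked above; a minimal sketch of the same rules might look like:

```python
import numpy as np

def score_seq(seq):
    """Score an 8-mer: average the per-nucleotide points,
    then bump the score for the (made-up) motifs."""
    points = {"A": 20, "C": 17, "G": 14, "T": 11}
    score = np.mean([points[nt] for nt in seq])
    if "TAT" in seq:
        score += 10  # fake "activating" motif
    if "GCG" in seq:
        score -= 10  # fake "repressing" motif
    return score

score_seq("AAAAAAAA")  # 20.0
score_seq("ACAAAAAA")  # 19.625
```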

Plotting the score distribution for 8-mer sequences, we see them fall into 3 groups:

  • sequences with GCG (score = ~5)
  • sequences without a motif (score = ~15)
  • sequences with TAT (score = ~25)
Distribution of 8-mer scores. Image by author.

Our goal is now to train a model to predict this score by looking at the DNA sequence.

2. Prepare data for PyTorch training

For neural networks to make predictions, you have to give it your input as a matrix of numbers. For example, to classify images by whether or not they contain a cat, a network "sees" the image as a matrix of pixel values and learns relevant patterns in the relative arrangement of pixels (e.g. patterns that correspond to cat ears, or a nose with whiskers).

We similarly need to turn our DNA sequences (strings of ACGTs) into a matrix of numbers. So how do we pretend our DNA is a cat?

One common strategy is to one-hot encode the DNA: treat each nucleotide as a vector of length 4, where 3 positions are 0 and one position is a 1, depending on the nucleotide.

This one-hot encoding scheme has the nice property that it makes your DNA appear like how a computer sees a picture of a cat! Image by author.
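Here’s one way such an encoder might look (a sketch, not necessarily the tutorial’s exact helper):

```python
import numpy as np

def one_hot_encode(seq):
    """Encode a DNA string as a (len(seq), 4) matrix of 0s and 1s."""
    mapping = {"A": 0, "C": 1, "G": 2, "T": 3}
    onehot = np.zeros((len(seq), 4), dtype=np.float32)
    for i, nt in enumerate(seq):
        onehot[i, mapping[nt]] = 1.0
    return onehot

one_hot_encode("AC")
# array([[1., 0., 0., 0.],
#        [0., 1., 0., 0.]], dtype=float32)
```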

With this one-hot encoding scheme, we can prepare our train, val, and test sets. This quick_split just randomly picks some indices in the pandas dataframe to split (sklearn has a function to do this too).
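A sketch of what such a helper might look like, assuming a dataframe `df` with one row per sequence:

```python
import numpy as np

def quick_split(df, split_frac=0.8):
    """Randomly partition a dataframe's rows into two sets."""
    shuffled = np.random.permutation(df.index)
    split = int(split_frac * len(shuffled))
    return df.loc[shuffled[:split]], df.loc[shuffled[split:]]

full_train_df, test_df = quick_split(df)
train_df, val_df = quick_split(full_train_df)
```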

Note: In real/non-synthetic tasks, you might need to be more clever about your splitting strategy depending on your prediction task: often papers will create train/test splits by chromosome or other genome location features.

A big step when preparing your data for PyTorch is using DataLoader and Dataset objects. It took me a lot of googling around to figure something out, but this is a solution I was able to concoct from combing through docs and Stack Overflow posts!

In short, a Dataset wraps your data in an object that can smoothly serve properly formatted X examples and Y labels to the model you’re training. The DataLoader accepts a Dataset and some other details about how to form batches from your data, and makes it easier to iterate through training steps.
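Here’s one way those pieces could fit together (a sketch that assumes hypothetical `seq` and `score` columns, plus the `one_hot_encode` helper from above):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SeqDataset(Dataset):
    """Serves (one-hot sequence, score) pairs to the training loop."""
    def __init__(self, df):
        self.seqs = [one_hot_encode(s) for s in df["seq"]]
        self.scores = df["score"].values

    def __len__(self):
        return len(self.seqs)

    def __getitem__(self, idx):
        x = torch.tensor(self.seqs[idx])                         # (seq_len, 4)
        y = torch.tensor(self.scores[idx], dtype=torch.float32)
        return x, y

train_dl = DataLoader(SeqDataset(train_df), batch_size=32, shuffle=True)
val_dl = DataLoader(SeqDataset(val_df), batch_size=32)
```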

These DataLoaders are now ready to be used in a training loop!

3. Define PyTorch models

The primary model I was interested in trying was a Convolutional Neural Network, as these have been shown to be useful for learning motifs from genomic data. But as a point of comparison, I included a simple Linear model. Here are some model definitions:
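The full gists aren’t reproduced here, but sketches consistent with the descriptions below might look like:

```python
import torch.nn as nn

class DNA_Linear(nn.Module):
    """Predicts a score from a weighted sum of the one-hot positions."""
    def __init__(self, seq_len=8):
        super().__init__()
        self.lin = nn.Linear(seq_len * 4, 1)

    def forward(self, x):              # x: (batch, seq_len, 4)
        return self.lin(x.flatten(1))

class DNA_CNN(nn.Module):
    """Scans the sequence with 32 filters of width 3."""
    def __init__(self, seq_len=8, num_filters=32, kernel_size=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(4, num_filters, kernel_size=kernel_size),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(num_filters * (seq_len - kernel_size + 1), 1),
        )

    def forward(self, x):              # Conv1d expects (batch, 4, seq_len)
        return self.net(x.permute(0, 2, 1))
```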

Note: These are not optimized models, just something to start with (again, we’re just practicing connecting the PyTorch tubes in the context of DNA).

  • The Linear model tries to predict the score by simply weighting the nucleotides that appear in each position.
  • The CNN model uses 32 filters of length (kernel_size) 3 to scan across the 8-mer sequences for informative 3-mer patterns.

4. Define the training loop functions

Next, we need to define the training/fit loop. I admit I’m not super confident here and spent a lot of time wading through matrix dimension mismatch errors – there are likely more elegant ways to do this! But maybe this is ok? –shrug– (Shoot me a message if you have feedback 🤓 )

In any case, I defined functions that stack like this:

# adds default optimizer and loss function
run_model()
    # loops through epochs
    fit()
        # loop through batches
        train_step()
            # calc train loss for batch
            loss_batch()
        val_step()
            # calc val loss for batch
            loss_batch()
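The original implementations aren’t included in this extract, but here is a sketch of one way to fill in that stack, assuming MSE loss and SGD as the defaults (the device handling is my own addition):

import torch
import torch.nn.functional as F

def loss_batch(model, loss_func, xb, yb, opt=None):
    # compute the loss for one batch; take an optimizer step only in train mode
    loss = loss_func(model(xb), yb)
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)

def train_step(model, train_dl, loss_func, opt, device):
    model.train()
    total_loss, total_n = 0.0, 0
    for xb, yb in train_dl:
        loss, n = loss_batch(model, loss_func, xb.to(device), yb.to(device), opt)
        total_loss, total_n = total_loss + loss * n, total_n + n
    return total_loss / total_n

def val_step(model, val_dl, loss_func, device):
    model.eval()
    total_loss, total_n = 0.0, 0
    with torch.no_grad():
        for xb, yb in val_dl:
            loss, n = loss_batch(model, loss_func, xb.to(device), yb.to(device))
            total_loss, total_n = total_loss + loss * n, total_n + n
    return total_loss / total_n

def fit(epochs, model, loss_func, opt, train_dl, val_dl, device):
    train_losses, val_losses = [], []
    for epoch in range(epochs):
        train_losses.append(train_step(model, train_dl, loss_func, opt, device))
        val_losses.append(val_step(model, val_dl, loss_func, device))
    return train_losses, val_losses

def run_model(train_dl, val_dl, model, device, epochs=50, lr=0.01):
    # adds a default optimizer (SGD) and loss function (MSE), then fits
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    return fit(epochs, model, F.mse_loss, opt, train_dl, val_dl, device)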

5. Run the models

First let’s try running a Linear Model on our 8-mer sequences.
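Using the sketches above, that might look like:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model_lin = DNA_Linear(seq_len=8).to(device)
lin_train_losses, lin_val_losses = run_model(train_dl, val_dl, model_lin, device)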

After collecting the train and val losses, let’s look at them in a quick plot:

Linear model training and validation loss curves. Image by author.

At first glance, not much learning appears to be happening.

Next let’s try the CNN and plot the loss curves.

Loss curves for both CNN and Linear model. Image by author.

It seems clear from the loss curves that the CNN is able to capture a pattern in the data that the Linear model is not! Let’s spot check a few sequences to see what’s going on.
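(The spot-check output itself doesn’t survive in this extract; a tiny helper like the one below – the function name and example 8-mers are mine – gets the idea across:)

def quick_seq_pred(model, seqs):
    # print a model's predicted score for a few example sequences
    for seq in seqs:
        x = torch.tensor(one_hot_encode(seq)).unsqueeze(0).to(device)
        print(f"{seq}: predicted score = {model(x).item():.2f}")

quick_seq_pred(model_lin, ["CATAGCGC", "GCGTACAC", "AACTTCGC"])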

From the above examples, it appears that the Linear model badly under-predicts sequences with a lot of G’s and over-predicts those with many T’s. This is probably because it noticed that sequences containing GCG have unusually low scores and sequences containing TAT have unusually high scores. However, since the Linear model has no way to account for the different contexts of GCG vs GAG, it simply learns that sequences with G’s should score lower. We know from our scoring scheme that this isn’t the case: it’s not that G’s in general are detrimental, but specifically GCG is.

The CNN is better able to adapt to the differences between 3-mer motifs! It predicts quite well on both the sequences with and without motifs.

6. Check model predictions on the test set

An important evaluation step in any machine learning task is to check if your model can make good predictions on the test set, which it never saw during training. Here, we can use a parity plot to visualize the difference between the actual test sequence scores vs the model’s predicted scores.

Comparison of actual vs predicted scores for test set sequences. Image by author.

Parity plots are useful for visualizing how well your model predicts individual sequences: with a perfect model, every point would land on the y=x line, meaning each prediction exactly matched the sequence’s actual value. Points off the y=x line mean the model is over- or under-predicting.
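A minimal parity-plot helper (using matplotlib and sklearn’s r2_score; the function name is mine) might look like:

import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

def parity_plot(model_name, actual, predicted):
    # scatter actual vs predicted scores, with the ideal y=x line for reference
    plt.scatter(actual, predicted, alpha=0.4)
    lims = [min(actual), max(actual)]
    plt.plot(lims, lims, 'k--', label='y = x')
    plt.xlabel('Actual score')
    plt.ylabel('Predicted score')
    plt.title(f"{model_name} (R² = {r2_score(actual, predicted):.2f})")
    plt.legend()
    plt.show()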

In the Linear model’s plot, we can see that it captures the overall trend in the test set sequences, but it gets confused by the clusters of sequences at the high and low ends of the distribution (the ones with a motif).

The CNN, however, is much better at predicting scores close to the actual values! This is expected, given that the architecture of our CNN uses 3-mer kernels to scan along the sequence for influential motifs.

But the CNN isn’t perfect. We could probably train it longer or adjust the hyperparameters, but the goal here isn’t perfection – this is a very simple task relative to actual regulatory grammars. Instead, I thought it would be interesting to use the Altair visualization library to interactively inspect which sequences the models get wrong:
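(The interactive chart doesn’t survive in this extract, but a sketch of the idea – an Altair scatter with tooltips, assuming a results dataframe res_df with seq/actual/pred columns, all names mine – is below:)

import altair as alt

alt.Chart(res_df).mark_circle(size=60).encode(
    x=alt.X('actual:Q', title='Actual score'),
    y=alt.Y('pred:Q', title='Predicted score'),
    tooltip=['seq', 'actual', 'pred'],  # hover to see which sequence each point is
).interactive()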

Notice that the sequences off the diagonal tend to have multiple instances of a motif! The scoring function only gave a sequence its +/- bump for having at least 1 motif, but it would have been just as reasonable to add a bonus for every occurrence. I arbitrarily chose the former, but we could have made a different scoring function.

In any case, I thought it was cool that the model noticed the multiple occurrences and predicted them to be important. I suppose we did fool it a little, though an R² of 0.95 is pretty respectable 🙂

7. Visualize convolutional filters

When training CNN models, it can be useful to visualize the first layer convolutional filters to try to understand more about what the model is learning. With image data, the first layer convolutional filters often learn patterns such as borders or colors or textures – basic image elements that can be recombined to make more complex features.

In DNA, convolutional filters can be thought of like motif scanners. Similar to a position weight matrix for visualizing sequence logos, a convolutional filter is like a matrix showing a particular DNA pattern, but instead of being an exact sequence, it can hold some uncertainty about which nucleotides show up in which part of the pattern. Some positions might be very certain (i.e., there’s always an A in position 2; high information content) while other positions could hold a variety of nucleotides with about equal probability (high entropy; low information content).

The calculations that occur within the hidden layers of neural networks can get very complex, and not every convolutional filter will correspond to an obviously relevant pattern – but sometimes patterns do emerge in the filters, and they can help explain the model’s predictions.

Below are some functions to visualize the first layer convolutional filters, both as a raw heatmap and as a motif logo.
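The original functions aren’t included in this extract; here is a minimal heatmap sketch, assuming the DNA_CNN definition above (where model.net[0] is the Conv1d layer):

import matplotlib.pyplot as plt

def plot_filters_heatmap(model, num_filters=32):
    # show each first-layer filter's raw weights as a 4 x kernel_size heatmap
    weights = model.net[0].weight.detach().cpu().numpy()  # (num_filters, 4, kernel_size)
    fig, axes = plt.subplots(4, 8, figsize=(16, 8))
    for i, ax in enumerate(axes.flat[:num_filters]):
        ax.imshow(weights[i], cmap='RdBu_r')
        ax.set_yticks(range(4))
        ax.set_yticklabels(['A', 'C', 'G', 'T'])
        ax.set_title(f"filter {i}")
    plt.tight_layout()
    plt.show()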

Ok, maybe this is a little helpful, but usually people like to visualize sequences with some uncertainty as motif logos: the x-axis has positions in the motif and the y-axis is the probability of each nucleotide appearing in each position. Often these probabilities are converted into bits (aka information content) for easier visualization.

To convert raw convolutional filters into position weight matrix visuals, it is common to collect filter activations: apply the weights of the filter along a one-hot encoded sequence and measure the filter activation (aka how well the weights match the sequence).

Filter weight matrices that correspond to a close match to a given sequence will activate strongly (yield higher match scores). By collecting the subsequences of DNA that yield the highest activation scores, we can create a position weight matrix of "highly activated sequences" for each filter, and therefore visualize the convolutional filter as a motif logo.
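A sketch of this activation-collection idea, again assuming the model structure and helpers above (the name filter_pwm is mine, and the model is assumed to be on CPU):

import numpy as np
import torch

def filter_pwm(model, seqs, filter_idx, kernel_size=3, top_n=100):
    # build a position probability matrix from the subsequences that most
    # strongly activate one first-layer filter
    conv = model.net[0]
    hits = []
    for seq in seqs:
        x = torch.tensor(one_hot_encode(seq)).unsqueeze(0)  # (1, 4, seq_len)
        acts = conv(x)[0, filter_idx].detach().numpy()      # activation at each position
        for pos, act in enumerate(acts):
            hits.append((act, seq[pos:pos + kernel_size]))
    # keep the top-activating subsequences and tally nucleotides per position
    top_subseqs = [s for _, s in sorted(hits, key=lambda h: h[0], reverse=True)[:top_n]]
    counts = np.zeros((4, kernel_size))
    for s in top_subseqs:
        for pos, nt in enumerate(s):
            counts['ACGT'.index(nt), pos] += 1
    return counts / counts.sum(axis=0)  # columns sum to 1: a position probability matrix

A library like logomaker can then render the resulting matrix as a sequence logo.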

Diagram of how strongly activating subsequences can be collected and converted to a motif logo for a given convolutional filter. Image by author.

From this particular CNN training, we can see a few filters have picked up on the strong TAT and GCG motifs, but other filters have focused on other patterns as well.

There is some debate about how relevant convolutional filter visualizations are for model interpretability. In deep models with multiple convolutional layers, convolutional filters can be recombined in more complex ways inside the hidden layers, so the first layer filters may not be as informative on their own (Koo and Eddy, 2019). Much of the field has since moved towards attention mechanisms and other explainability methods, but should you be curious to visualize your filters as potential motifs, these functions may help get you started!

8. Conclusion

This tutorial shows some basic PyTorch structure for building CNN models that work with DNA sequences. The practice task used in this demo is not reflective of real biological signals; rather, we designed the scoring method to simulate the presence of regulatory motifs in very short sequences that were easy for us humans to inspect and verify that PyTorch was behaving as expected. From this small example, we observed how a basic CNN with sliding filters was able to predict our scoring scheme better than a basic linear model that only accounted for absolute nucleotide position (without local context).

To read more about CNNs applied to DNA in the wild, check out the following foundational papers:

I hope other new-to-ML folks interested in tackling biological questions may find this helpful for getting started with using PyTorch to model DNA sequences 🙂

9. Footnotes

FOOTNOTE 1

In this tutorial, the CNN model definition uses a 1D convolutional layer – since DNA isn’t a 2-dimensional image, Conv1d is sufficient: the filter just slides along the length dimension without scanning up and down. (In fact, sliding a filter "up" and "down" doesn’t apply to one-hot encoded DNA matrices: separating the A and C rows from the G and T rows doesn’t make sense – you need all 4 rows to accurately represent a DNA sequence.)

However, I once found myself needing to use an analysis tool built with Keras and turned to a pytorch2keras conversion script. The conversion script only knew how to handle Conv2d layers and gave errors for models with Conv1d layers 🙁

In case this happens to you, here is an example of how to reformat the CNN definition using a Conv2d while ensuring that it still scans along the DNA as if it were a Conv1d:
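The reformatted definition isn’t shown in this extract, but the idea is a single-channel Conv2d whose kernel height spans all 4 nucleotide rows, so it can only slide horizontally – a sketch:

import torch.nn as nn

class DNA_CNN_2D(nn.Module):
    # same scan as the Conv1d version, but expressed with Conv2d: a
    # (4 x kernel_size) kernel covers all 4 rows at once and slides along length only
    def __init__(self, seq_len=8, num_filters=32, kernel_size=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, num_filters, kernel_size=(4, kernel_size)),  # -> (batch, 32, 1, seq_len-2)
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(num_filters * (seq_len - kernel_size + 1), 1),
        )

    def forward(self, x):
        # x arrives as (batch, 4, seq_len); Conv2d wants (batch, channels=1, 4, seq_len)
        return self.net(x.unsqueeze(1))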

FOOTNOTE 2

If you’re doing a classification task instead of a regression task, you may want to use CrossEntropyLoss. However, CrossEntropyLoss expects a slightly different format than MSELoss – try this:

# yb arrives as a (batch, 1) float tensor; CrossEntropyLoss wants integer class labels of shape (batch,)
loss = loss_func(xb_out, yb.long().squeeze(1))

The post Modeling DNA Sequences with PyTorch appeared first on Towards Data Science.

Mistborn: The Final Eyebrow https://towardsdatascience.com/mistborn-the-final-eyebrow-54466c815285/ Sun, 03 Jan 2021 05:20:29 +0000

TLDR: Enjoy some interactive visualizations summarizing eyebrow interaction data in Mistborn.

If you’ve found yourself raising your eyebrows at the Mistborn series by Brandon Sanderson, you are not alone. Yes, Mistborn is a fun fantasy read: the first era features a lovable bunch of thieves who navigate social disparity, political intrigue, and revolution amidst the steely rule of an oppressive Empire. And while Mistborn’s magic system of Allomancy – the ability to wield cleverly versatile powers from ingesting small shavings of various metals – is perhaps the coolest magic system I’ve ever read, there’s a certain feature of the story that really rises above the rest: eyebrows.

Yes, eyebrows. The number of times the characters of this series “raise an eyebrow” at each other is astounding. Perhaps it was the intonation of the audiobook narrator that made the phrase jump out so distinctly, but it became such a joke with my partner as we were listening that whenever a character “raised an eyebrow,” we’d immediately turn to each other with our wildest eyebrow-contorting expression.

But then… I got curious.

How many times did Kelsier “raise an eyebrow”? Was there a pattern to it? Did he raise his eyebrows equally at all his crewmates or did he find certain crewmates particularly perplexing? As an aspiring data scientist… I decided to dig in a bit further.

Below is a pilot analysis of the social dynamics present in the first book of the Mistborn series, The Final Empire, as conveyed through the raising of eyebrows. Books 2 and 3 will be covered in a future post. This article contains spoilers for The Final Empire, so if you haven’t read it yet and want to, stop here and save this for later. But if you’re caught up and intrigued, read on for data and interpretation covering three main analyses:

  1. Which characters raised their eyebrows most often? Which characters were the most frequent recipients of eyebrow raises?
  2. Is there a temporal significance to characters’ eyebrow raising behaviors?
  3. What can we infer about the social dynamics between characters based on the frequency and directionality of their eyebrow exchanges?

Eyebrow data collection

Just to provide a brief summary of how the data were collected:

  • We borrowed an eBook copy of The Final Empire, The Well of Ascension, and The Hero of Ages.
  • We ‘control-F’ed for instances of the word “eyebrow.”
  • We read the passage surrounding the eyebrow interaction to ensure it was indeed a “raising” or “cocking” or “lifting” of said eyebrow – basically we counted any sort of elevation.
  • Notably, the word “eyebrow” only occurred twice in the entire series without an associated interaction between characters: in book 3, The Hero of Ages, Marsh raises his eyebrows upon entering an empty room, and Slowswift is described as having “bushy eyebrows.” These were not included in the data.
  • Finally, we recorded: 1) the character raising the eyebrow (“source”), 2) the character at whom the eyebrow was raised (“target”), and 3) the page number of the interaction.

In total, we recorded 53 eyebrow interactions in The Final Empire, 48 in The Well of Ascension, and 42 in The Hero of Ages.
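(For the curious, here’s a sketch of how the tally might look in pandas, assuming one row per recorded interaction – the rows below are made-up examples, not real data points:)

import pandas as pd

brows = pd.DataFrame([
    ('Kelsier', 'Vin', 102),
    ('Breeze', 'Ham', 215),
    ('Vin', 'Elend', 404),
    # ... one row per recorded eyebrow raise
], columns=['source', 'target', 'page'])

raiser_counts = brows['source'].value_counts()  # who raises the most brows
raisee_counts = brows['target'].value_counts()  # at whom brows are most raised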

For whom the brows raise

My first analysis was to simply count the total number of eyebrow source (“raiser”) and target (“raisee”) instances for each character (Figure 1).

Figure 1. Raw counts of eyebrow interaction types (“source” or “target”) for each character. (Image by author)

Unsurprisingly, Kelsier is far and away the most frequent eyebrow source, doling out 19 distinct raises. This makes sense: his position as crew leader combined with his roguish charisma naturally lends itself to witty banter and friendly condescension towards his oddball crew. Additionally, if “raising an eyebrow” is one of Sanderson’s author quirks, Kelsier’s role as one of the two primary point-of-view (POV) characters simply gives him more page time during which to waggle those brows.

The next most frequent eyebrow source is Kelsier’s apprentice and co-primary-POV character, Vin, clocking in with 8 raises. Again, Vin gets a ton of page time so her second place “raiser” status is not too surprising, though sarcastic Soother Breeze is not far behind her with 6. But what really stands out about Vin is her first place eyebrow target status: Vin is the recipient of a whopping 22 eyebrow raises! Quiet and strange, the crew don’t always know what to make of Vin and often regard her with concern or skepticism. However, this veritable mountain of eyebrow targeting can’t be purely explained by page time as Kelsier receives only 6 raises and says plenty of ridiculous things.

So far, these results aren’t all that exciting: the two POV characters are involved in the bulk of the eyebrow interactions, with the main support cast – Breeze, Ham, Dockson, Sazed, and Elend – racking up a handful each. Several minor characters fire off a brow or two in passing.

Next, I investigated the temporal dynamics of eyebrow interactions to see if characters’ brow behaviors changed throughout the story.

Temporal arcs of eyebrow snark

To get a better sense of the distribution of eyebrow raises across the series, for each character, I plotted each of their eyebrow interactions – both as a source or a target – over time. The x-axis is the page number (time) and y-axis includes the main characters involved in at least 3 interactions (Figure 2).
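(A sketch of how such a plot could be built – here with Altair, reusing the brows dataframe sketched earlier, though the original figure may have been made differently:)

import altair as alt

# reshape so each interaction appears twice: once for its source, once for its target
melted = brows.melt(id_vars='page', value_vars=['source', 'target'],
                    var_name='role', value_name='character')

alt.Chart(melted).mark_circle(size=60).encode(
    x=alt.X('page:Q', title='Page number'),
    y=alt.Y('character:N', title='Character'),
    color='role:N',  # split between raisers (source) and targets
)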

Figure 2. Main character eyebrow interactions over the course of The Final Empire. (Image by author)

From this view, we can still clearly see that Kelsier and Vin are involved in the most eyebrow interactions as they have the most dots, but there is an intriguing difference in when these interactions are occurring! Kelsier has a relatively regular pattern of eyebrow raises throughout the beginning of the story, interspersed with a sprinkling of eyebrows targeted at him. But at about ⅔ of the way through the book, he suddenly stops raising his brows at others… It’s not his death – that happens towards the very end of the book and in the figure we can see that he is still alive and targetable (orange dots) in the final few hundred pages. I believe this sudden brow ceasefire approximately coincides with a major setback to his plans (the destruction of Yeden’s hidden forces). Though aware that his plan to overthrow the Lord Ruler is extremely risky and the odds of success are highly improbable, perhaps Kelsier realizes that the time for playful banter has come to an end and he needs to get serious. Sure, he still spiels the crew about the importance of smiling and keeping a positive outlook, but his highly animated eyebrows are no longer part of his leadership style as he adjusts his plan for the final act…

Conversely, Vin spends the entire first half of the book as a target of eyebrow raises. As discussed earlier, her skittish disposition is something that takes a while for the others to get used to and she remains wary as she tries to find her place among this goofy crew.

But then! Something shifts: right at the halfway point through the story, Vin suddenly becomes an eyebrow raiser 5 times in quick succession! While she still receives a handful of raises after her sudden cluster as an eyebrow source, this moment seems to have opened her up and she becomes comfortable raising her eyebrows in the future…

So what happened? What triggered this shift? To dig further into this question, I reformatted the figure design to additionally visualize the connection between the source and target of each eyebrow encounter (Figure 3). This figure is a bit wild, and I apologize in advance for the visual complexity! While I enjoy the simplicity of Figure 2, I designed this alternate version to allow for investigation of the specific eyebrow interchanges over time. I think the figure is slightly more effective as an interactive visualization, available here.

Just to orient you, it’s set up the same way as Figure 2: the x-axis is page number (time) and the y-axis has a section for each character to mark their eyebrow exchanges. Now instead of blue and orange, a point is a diamond if the character was the eyebrow source and a circle if the character was the eyebrow target. Vertical lines now directly connect the source and target of each interaction, with the color tied to the source character.

Figure 3. All character eyebrow interactions over the course of The Final Empire, now with the eyebrow “source” (diamond) and “target” (circle) of each interaction connected by a vertical line. (Interactive version, image by author)

Getting back to the analysis of Vin’s sudden venture into the league of eyebrow raisers, let’s focus on her pattern of interactions, emphasized in bold below:

A closer look at Vin’s eyebrow interactions in Figure 3. (Image by author)

Sliding along Vin’s track in the middle of the plot, we still see the open circles marking her as the target of many eyebrow raises – they primarily come from Kelsier but Ham, Breeze, Marsh, and Sazed all join in. But Vin’s eyebrow source cluster mentioned above is entirely directed at Elend! Elend sneaks in a raise at Vin a few pages earlier upon their initial encounter at the ball at Keep Venture, but by Vin’s second and third balls, she has built up the nerve to raise her eyebrows at a prominent nobleman.

This marks a key moment in Vin’s transformation: we meet her as a street urchin – skirting in the shadows, trying to avoid scrutiny – and watch her become a real player in the game of courtly intrigue and grow into a self-confident crew member as she comes into her power as a Mistborn. Brow raises are clearly an important marker of this arc: after flexing her eyebrow muscles openly and often at Elend, she then proceeds to fire shots at her closest mentors, Sazed and Kelsier. In fact, Kelsier’s final eyebrow interaction of the series is as a recipient from Vin. Oh, how the Apprentice indeed becomes the Master…

One Brow to raise them all, and in the Deepness bind them

In my final analysis, I was curious about the overall social network of characters as defined by their eyebrow exchanges. Here I dropped the time component and summarized the data in a directed graph (Figure 4).

Figure 4. Directed graph capturing the frequency and directionality of eyebrow raises between all characters in The Final Empire. Every character is a node, sized proportionally to the total number of eyebrow interactions in which they were involved. Every edge between nodes indicates the direction of an eyebrow interaction between the source and target character; arrows are also sized proportionally to the total number of eyebrow raises from the source to the target. (Image by author)
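(A sketch of how such a graph could be assembled with networkx, again assuming the brows dataframe from earlier; the interactive version mentioned below was built separately with D3.)

import networkx as nx
import matplotlib.pyplot as plt

# tally raises per (source, target) pair and build a weighted directed graph
edge_counts = brows.groupby(['source', 'target']).size().reset_index(name='n')
G = nx.DiGraph()
for _, row in edge_counts.iterrows():
    G.add_edge(row['source'], row['target'], weight=row['n'])

# node size ~ total involvement; edge width ~ number of raises in that direction
node_sizes = [100 * G.degree(n, weight='weight') for n in G.nodes]
edge_widths = [G[u][v]['weight'] for u, v in G.edges]
nx.draw(G, with_labels=True, node_size=node_sizes, width=edge_widths,
        arrows=True, node_color='lightblue')
plt.show()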

Again, this network shows that Vin and Kelsier are the hubs of eyebrow activity, but there is a clear distinction between raises from Kelsier to Vin (12) versus from Vin to Kelsier (1). This directionality feels important to capture as it emphasizes the Master of Snark vs Perplexing Apprentice power dynamic.

We can also see more crew dynamics: Breeze and Ham create a tightly connected component with Kelsier and Vin. Perhaps we can perceive Breeze’s heightened exasperation with Ham’s philosophizing given the thicker flow of brow raises from Breeze’s node to Ham’s. Additionally, Vin and Kelsier exchange eyebrows at about equal rates with Sazed while stalwart Dockson only has outgoing eyebrows: towards Kelsier and an unnamed soldier. Ever logical, no one finds cause to raise an eyebrow at Dox.

Notably, Vin is the only crew member to exchange eyebrows with the members of the nobility. She is also a bit of a mystery to the nobles and earns both friendly and not-so-friendly raises from Elend and Shan. But the thickest arrow again marks that solid outflow of brow-raises towards Elend, demonstrating her growing confidence. While her noblewoman improvisation is a bit rough around the edges, she eventually embraces her role and evolves from political pawn to political player.

(An interactive version of the network is available below. While interactivity is not essential to understanding the network, I’m learning/practicing D3 and thought it was fun to drag the character bubbles around.)

Low brows and no brows

While “raising an eyebrow” is a mannerism shared by many characters in the Mistborn universe, there are a few notable exceptions. Among the crew, Clubs is involved in zero eyebrow exchanges. The grizzled general isn’t outwardly fazed by any of the crew’s shenanigans, nor does he say anything particularly perplexing. Among the villainy, two Obligators raise several brows, but we get none from the Lord Ruler or any Steel Inquisitors (do they even have eyebrows?). When these baddies arrive on scene, intense action swiftly follows, leaving no extra time for brow-based banter.

Eyebrows rise, Empires fall

(🎵 We have seen the Crew do it all! 🎵)

By their brows combined, this goofy crew pulled off the impossible. While their metallic mastery takes center stage as they Steel Push, Iron Pull, and Pewter… Pummel their way to a shattering victory over the Lord Ruler, the brows in the background really knit them all together. Overall, the exchange of eyebrows seems to play an important role in shaping social dynamics between characters, as well as underscore key moments of character transformation as they reckon with their changing worlds.

And that’s The Final Eyebrow of The Final Empire! It’s always wonderful when a series nails character development across books, so examining changes in characters’ eyebrow behaviors between this book and the rest of the series may reveal some interesting patterns. Especially with the kingpin brow-raiser, Kelsier, sidelined for the remainder of the series, whose brows will rise to the occasion?


Keep an eye out for a follow-up analysis delving further into the eyebrow antics in The Well of Ascension and The Hero of Ages, books 2 and 3 of the Mistborn series.

A big thanks to Matt Johnson, Claire Johnson, and Kylie Fournier for help with data collection/visualization and early feedback!

The post Mistborn: The Final Eyebrow appeared first on Towards Data Science.
