Lessons Learned from the Friends Episode Tagging Product
The Basic Theory and Definitions
Tagging involves associating an item with a concept. We're saying Item X is somehow related to Tag Y. In its purest form, we're not commenting on how these things are related, just that they are: Item X is somehow a member of the set identified by Tag Y.
Some definitions (for the sake of this document only; I don't claim these are universal):
Item: a content object to which we apply a tag; any number of tags (or none at all) can be applied to a single item; a given tag can only be applied to an item once (there is no concept of applying the same tag more than one time)
Tag: a label we apply to one or more items; in most applications, this label is simply a short bit of text, but other implementations might have descriptions attached; a tag is not standalone: it has no reason to exist if not applied to at least one item
Tag Assignment: the singular application of a specific tag to a specific item
[Diagram: Item 1 is tagged with Tag A, Tag C, and Tag D; Item 2 is tagged with Tag A, Tag B, and Tag D (each item-to-tag arrow is one tag assignment); Item 3 has no tags.]
I didn't want to get pedantic about definitions, because tagging content is literally one of the simplest and most basic information architecture models. However, in some places below, I need to refer to some structural aspects, and I wanted to establish a baseline vocabulary.
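To make these definitions concrete, here is a minimal Python sketch that assumes an in-memory store. The `TagStore` class and its methods are illustrative and do not come from the actual project. Tag assignments are stored as a set of (item, tag) pairs, so applying the same tag twice is a no-op. The list of tags is derived from the assignments, so a tag with no assignments simply doesn't exist.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical store: each tag assignment is one (item, tag) pair."""

    def __init__(self) -> None:
        self.assignments: set[tuple[str, str]] = set()

    def assign(self, item: str, tag: str) -> None:
        # Re-adding an existing pair changes nothing: a tag applies at most once.
        self.assignments.add((item, tag))

    def tags(self) -> set[str]:
        # Tags are derived from assignments; an unassigned tag has no entry.
        return {tag for _, tag in self.assignments}

    def items_by_tag(self) -> dict[str, set[str]]:
        index: dict[str, set[str]] = defaultdict(set)
        for item, tag in self.assignments:
            index[tag].add(item)
        return dict(index)


# The items and tags from the diagram; Item 3 has no assignments at all.
store = TagStore()
for item, tag in [("Item 1", "Tag A"), ("Item 1", "Tag C"), ("Item 1", "Tag D"),
                  ("Item 2", "Tag A"), ("Item 2", "Tag B"), ("Item 2", "Tag D")]:
    store.assign(item, tag)
store.assign("Item 1", "Tag A")  # duplicate assignment: ignored
print(len(store.assignments))  # 6
print(sorted(store.tags()))    # ['Tag A', 'Tag B', 'Tag C', 'Tag D']
```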
What does applying a tag say about an item?
When you apply a tag, what are you saying?
- This Item is fundamentally about the concept represented by this tag
- This Item has some connection, however tenuous, to this tag
How this relationship is implied depends on what the tag represents. The tag #ross-rachel meant, "This episode advanced the storyline of the romance between Ross and Rachel," while the tag #treeger just meant, "Treeger appears in this episode somewhere."
The former is more all-encompassing than the latter. The latter is basically trivia, while the former almost always meant one of the storylines was fundamentally about Ross and Rachel, and the contents of that storyline would affect later episodes.
When is something "tag-worthy"?
When does the relationship between… whatever, and a tag rise to the level of applying that tag?
For example, the tag #chandler-monica means that the episode somehow advanced the relationship between Chandler and Monica… or does it? Should I have applied it to any episode that involved Chandler and Monica in a relationship? There are kind of three "levels" to that situation:
1. Chandler and Monica appear in an episode
2. There is some reference, either spoken or visual, to Chandler and Monica being in a relationship
3. One plotline of the episode is centered on the progression of their relationship ("progression" is important here: a plotline could involve them but not be about them, if that makes sense)
#1 is silly because they were in every episode. #2 would be important during a specific period of time when their relationship was… novel; when it was new and a specific story arc. However, as the relationship wore on, #2 didn't really apply much because the relationship kind of… settled (?) into the background, which leaves us with #3: episodes that very specifically involved some aspect of their relationship.
The larger point here is that a concept or theme is "tag-worthy" only in relation to the larger context. Something might have been tag-worthy at one point in time, and not at another.
When is something too common to tag?
Earlier I talked about tagging #treeger. This is because Treeger only appeared in 5 episodes, so it was vaguely interesting whenever he showed up.
But consider Gunther. Do we tag his appearances? He appeared in 150 episodes (64%; it was more odd when he wasn't in an episode…), and to my knowledge, he was rarely important to any plot point (one major exception: he told Rachel that Ross cheated on her, but this happened off-screen). At best, Gunther was mostly… set dressing? He would show up and say a joke or two, but that's about it.
So, do I tag every episode with Gunther? At what point does something just become… background noise? (Sorry, Gunther…)
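One way to make "background noise" concrete is to measure each tag's coverage, meaning the share of all items it is assigned to. Gunther's 150 of the show's 236 episodes works out to about 64%. The sketch below flags any tag above a cutoff. The 50% threshold and the `background_tags` name are assumptions for illustration, not something the project used.

```python
def background_tags(tag_counts: dict[str, int], total_items: int,
                    threshold: float = 0.5) -> dict[str, float]:
    """Return tags whose coverage (share of all items) exceeds `threshold`.

    The 0.5 default is an arbitrary, hypothetical cutoff.
    """
    coverage = {tag: count / total_items for tag, count in tag_counts.items()}
    return {tag: share for tag, share in coverage.items() if share > threshold}


counts = {"treeger": 5, "gunther": 150}  # appearance counts from the text
noisy = background_tags(counts, total_items=236)
print({tag: round(share, 2) for tag, share in noisy.items()})  # {'gunther': 0.64}
```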
When do you have to explain the tag assignment?
The reasoning for some tags might not be obvious. If I tag something with #guest-star, for example, the immediate question is: who was the star?
I solved this by allowing for some explanation: if a tag has a little information icon on it, you can mouse over it for more information. I used this with guest stars to explain who the star was.
I did it a few other times, like explaining a #monica-obssessive assignment when the moment only happened in the end credits. I also did it a couple of times with #real-world-person.
It's handy, but I don't think this is common in most tagging systems.
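One hypothetical way to model these explanations is to let each tag assignment, rather than the tag itself, carry an optional note. The note is then rendered as a mouse-over tooltip. The `Assignment` class and `render_tag` function are illustrative names. The HTML `title` attribute is just one simple way to get a tooltip, and `&#9432;` is a circled "i" standing in for the info icon.

```python
import html
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    item: str
    tag: str
    note: Optional[str] = None  # optional explanation for this one assignment


def render_tag(a: Assignment) -> str:
    """Render a tag as HTML; a note becomes a mouse-over tooltip (title attribute)."""
    label = html.escape(f"#{a.tag}")
    if a.note is None:
        return f"<span>{label}</span>"
    note = html.escape(a.note, quote=True)  # escape quotes so the attribute stays valid
    return f'<span title="{note}">{label} &#9432;</span>'


print(render_tag(Assignment("Episode X", "ross-rachel")))
# <span>#ross-rachel</span>
print(render_tag(Assignment("Episode Y", "guest-star", note='Guest star: "Name"')))
# <span title="Guest star: &quot;Name&quot;">#guest-star &#9432;</span>
```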
Can/should a tag ever only have one assignment?
Remember that the entire purpose of a tag is to group things together. So what would be the point of having a solitary tag assignment?
There's mostly no point to this, because tags are designed to "connect" things. However, it does happen, because tags can be exploratory. One of the main points of tagging is to organize a domain of information from the "bottom up," which means you might tag something expecting that other things will join it, without knowing this for sure.
Also, maybe you're just tagging something descriptively? If I tagged an episode as #tear-jerker (I didn't…), then I'm saying something about the episode, even if I'm not connecting it to anything. It's a label I'm slapping on it: a channel to provide a dimension or perspective.
Can tags ever be hidden for administration or search biasing?
In this project, all the tags were open and available for review and browsing, but in some projects, I've had hidden tags. These were usually for search biasing, but a few times I used them for administrative organization as well.
For example, a tag of #this-writing-sucks is probably not something you want to display to the content consumer, but a report of all content tagged like that can be a handy way to create a "work list" to manage tasks and keep things organized.
To "hide" tags, I've done one of two things:
- Had a separate field for "admin tags"
- Allowed parentheticals. If you put a tag in parentheses, it was assumed to be "hidden" and was handled as such, like: politics, congress, (this-sucks), foreign-policy
Which brings me back to the question: are hidden tags still tags? I think so. If you're only using them for search biasing, they tend to be called "keywords," but if they function the same as tags for all other purposes (they're just not displayed to the end user), then calling them tags is fine.
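The parenthetical convention is simple to parse. Here is a minimal sketch, assuming the tag field is comma-separated and a hidden tag is wrapped in one pair of parentheses. `split_tags` is a hypothetical helper.

```python
def split_tags(raw: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated tag field into (visible, hidden) tags.

    A tag wrapped in parentheses, like "(this-sucks)", is treated as hidden.
    """
    visible, hidden = [], []
    for part in raw.split(","):
        tag = part.strip()
        if not tag:
            continue  # tolerate stray commas
        if tag.startswith("(") and tag.endswith(")"):
            inner = tag[1:-1].strip()
            if inner:
                hidden.append(inner)
        else:
            visible.append(tag)
    return visible, hidden


visible, hidden = split_tags("politics, congress, (this-sucks), foreign-policy")
print(visible)  # ['politics', 'congress', 'foreign-policy']
print(hidden)   # ['this-sucks']
```

Both lists could be stored the same way. Only the display layer would need to leave the hidden ones out.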
Could we apply an intensity level to tags?
What if tagging something wasn't binary, but a matter of degree? Would there be value in applying an "intensity level" to a tag?
For example, if we were applying a tag of #joey-actor, we could apply it at a low intensity to the episode where Joey misses the Days of Our Lives float on Thanksgiving. Technically, that happened because Joey was an actor, but it's more about Joey being kinda dumb, and the related plotline is more about Phoebe trying to teach him to lie ("A raccoon…!")
But what about the episode where Brooke Shields plays the deranged fan who believes Joey is Dr. Drake Ramoray? That's much more "about" being an actor than the Thanksgiving thing. What if we could apply… more "tag-ness" to it, like joey-actor:3? So we're applying three times as much joey-actor as a normal tag assignment?
This wouldn't be hard technically, and the colon-based notation above would work fine as an annotation. But what would we do with this information?
…I have no idea? When we list the items assigned to that tag, we could sort them by this value in descending order (an unspecified intensity would be assumed to be 1). That way, the episodes that were really about something would be listed first.
(That doesn't apply to this project, since items are episodes and I was always sorting them by air date. But maybe you could give the user the option to sort by intensity?)
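Here is a minimal sketch of that listing. It assumes the colon notation above and a default intensity of 1 when none is given. The function names are hypothetical, and the two entries are just the episodes described above, not the project's actual data.

```python
def parse_assignment(raw: str, default: int = 1) -> tuple[str, int]:
    """Parse "tag" or "tag:N" into (tag, intensity)."""
    tag, sep, level = raw.partition(":")
    return tag.strip(), int(level) if sep else default


def items_by_intensity(assignments: dict[str, list[str]], tag: str) -> list[str]:
    """List items carrying `tag`, strongest intensity first.

    `assignments` maps each item to its raw tag strings.
    """
    scored = []
    for item, raw_tags in assignments.items():
        for raw in raw_tags:
            name, intensity = parse_assignment(raw)
            if name == tag:
                scored.append((intensity, item))
    # sorted() is stable even with reverse=True, so ties keep their input order
    # (for example, air-date order if the input is already sorted that way).
    return [item for _, item in sorted(scored, key=lambda pair: pair[0], reverse=True)]


episodes = {
    "Joey misses the parade float": ["joey-actor"],
    "Brooke Shields plays the deranged fan": ["joey-actor:3"],
}
print(items_by_intensity(episodes, "joey-actor"))
# ['Brooke Shields plays the deranged fan', 'Joey misses the parade float']
```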
However, intensities would be tricky, because we'd be asking multiple people to coordinate their opinions on intensity. Or we'd be asking one person to know about all potential intensities to form a mental scale. For example, say you leave Item 13 at the default, but decide that Item 43 has 2x the intensity. That's fine until you come to Item 96, which is more intense than 13 but less intense than 43. Can we do an intensity of 2.5?
I feel like you'd have to keep it very simple:
- Assume a "2" value if no intensity was specified
- Use a "1" if the tag only sort of applies
- Use a "3" if the tag really applies
(Although, on that simplified scale, we might just use a "+" for more intensity and "-" for less…)
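The simplified scale, including the "+"/"-" shorthand, could be parsed as follows. The suffix notation (joey-actor+, joey-actor-) is a hypothetical syntax. It only works as long as no tag name legitimately ends in a dash.

```python
SCALE = {"-": 1, "+": 3}  # "sort of applies" / "really applies"
DEFAULT_INTENSITY = 2     # no marker: the tag applies normally


def parse_intensity(raw: str) -> tuple[str, int]:
    """Parse "tag", "tag+", or "tag-" into (tag, intensity) on a 1-3 scale."""
    raw = raw.strip()
    if raw and raw[-1] in SCALE:
        return raw[:-1], SCALE[raw[-1]]
    return raw, DEFAULT_INTENSITY


for raw in ["joey-actor-", "joey-actor", "joey-actor+"]:
    print(parse_intensity(raw))
# ('joey-actor', 1)
# ('joey-actor', 2)
# ('joey-actor', 3)
```

With only three fixed values, there's no 2.5 to argue about.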
This is one of those things that might be a solution in search of a problem. However, it would help solve the "tag indecision" problem. I would have been much more likely to tag something if I knew that I could specify an intensity level.
My problem was that I almost didn't want to tag the episode where Joey missed the parade, because I would think about other episodes that had a higher intensity of joey-actor-ness, and I'd think this wasn't equivalent. I might have been more likely to tag it, because by leaving it at the default intensity (and giving a corresponding raise to the Brooke Shields episode's intensity), I could accurately represent how I felt about it.
What does it mean when something has no tags?
There were a handful of episodes to which I didn't assign any tags. Does that mean nothing happened in those episodes? Well, no; it just means nothing happened that merited a tag assignment… in my subjective opinion, based on my natural tag-selecting tendencies.
Looking at all the notes above this one, there were some episodes that involved things that didn't merit a tag, either because they were ubiquitous, or unique, or otherwise didn't rise to the right level. If we decided to tag Gunther, for example, then some untagged episodes would get tagged for him.
Still, I always felt bad about it. And if I watched all the episodes in detail, I might find I missed something, or might come up with a way to tag them: some aspect of the episode that I could turn into a tag.
…but I didn't want to manufacture tags for their own sake.
Should we provide a query language for tags?
Searching for items associated with a single tag is simple and straightforward. However, should we allow users to search for items based on a "query" that combines multiple tags?
For example:
If I want to see items about Paolo, the Italian guy from the first season, I can search for #paolo. If I want to see items about Phoebe's career as a masseuse, I can search for #phoebe-masseuse. But what if I want to see if those two things intersect?
And they did: in The One with the Dozen Lasagnas, Paolo hit on Phoebe while she was massaging him, leading to the end of his relationship with Rachel. How would we search for this?
To this end, do we need a tag query language? Could we search for paolo+phoebe-masseuse to refer to that intersection? Could we go further, with paolo|phoebe-masseuse to find items that have either of those tags? Or bing-adoption--erika to find items about Chandler and Monica trying to adopt before they met Erika (the birth mother of the twins they eventually adopted)? (Note that I had to use -- there because the single dash is already used as a word separator.)
Could we do parentheticals? If I wanted to see items that involved Carol or Ben, without Susan, could I do (carol|ben)--susan?
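None of this exists in the project. Still, here is a minimal sketch of how such a query language might be evaluated with the operators above: `+` for intersection, `|` for union, `--` for exclusion, and parentheses for grouping. The precedence is an assumption: `+` and `--` bind tighter than `|`, and evaluation runs left to right. The episode data is made up for illustration; only the Dozen Lasagnas pairing of #paolo and #phoebe-masseuse comes from the text.

```python
import re

# Assumed grammar (the precedence is a design choice, not from the original):
#   expr   := term ("|" term)*
#   term   := factor (("+" | "--") factor)*
#   factor := TAG | "(" expr ")"
# A tag is lowercase words joined by single dashes, so "--" is never part of a tag.
TOKEN = re.compile(r"\s*(--|\+|\||\(|\)|[a-z0-9]+(?:-[a-z0-9]+)*)")


def tokenize(query: str) -> list[str]:
    tokens, pos, query = [], 0, query.strip()
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"bad query near {query[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(query: str, items: dict[str, set[str]]) -> set[str]:
    """Return the items whose tags satisfy `query`; `items` maps item -> tags."""
    index: dict[str, set[str]] = {}
    for item, tags in items.items():
        for tag in tags:
            index.setdefault(tag, set()).add(item)

    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def expr():  # union of terms
        result = term()
        while peek() == "|":
            take()
            result = result | term()
        return result

    def term():  # intersection (+) or exclusion (--) of factors
        result = factor()
        while peek() in ("+", "--"):
            op = take()
            rhs = factor()
            result = result & rhs if op == "+" else result - rhs
        return result

    def factor():  # a single tag or a parenthesized sub-query
        token = take()
        if token == "(":
            result = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        if token in ("+", "|", "--", ")"):
            raise ValueError(f"unexpected {token!r}")
        return set(index.get(token, set()))  # unknown tags match nothing

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r}")
    return result


episodes = {  # toy data; "episode-a" etc. are placeholders
    "The One with the Dozen Lasagnas": {"paolo", "phoebe-masseuse"},
    "episode-a": {"paolo", "ross-rachel"},
    "episode-b": {"phoebe-masseuse"},
    "episode-c": {"bing-adoption"},
    "episode-d": {"bing-adoption", "erika"},
    "episode-e": {"carol", "ben", "susan"},
    "episode-f": {"carol", "ben"},
}
print(evaluate("paolo+phoebe-masseuse", episodes))  # {'The One with the Dozen Lasagnas'}
print(evaluate("bing-adoption--erika", episodes))   # {'episode-c'}
print(evaluate("(carol|ben)--susan", episodes))     # {'episode-f'}
```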
If we provided a query language, would this affect how we did tag assignments?
If we did allow tag querying, I feel like this would fundamentally change how we tag things. A lot of tagging boils down to trying to make sure someone can find something. However, with a query language, we would be much less concerned with intersections.
If I tag something #las-vegas and #ross-rachel, I might still tag it with #ross-rachel-marriage, because that speaks to a specific intersection.
However, with a query language, someone with enough domain knowledge could query for las-vegas+ross-rachel to find this, since their drunken marriage was their major plotline in Vegas.
(This is a little contrived, because it was hard to find an example in this project. But consider a blog post that I tag with history and technology. There's a specific intersection there: history-of-technology. If I have a query language, that's implicitly covered by the query history+technology.)
However, this leads to another point…
How much are tags about "exploration," rather than just organization?
Do we tag things to search them? Or do we tag things to explore a domain of information?
I feel like tags are… opportunistic. People are reactive about tags, not proactive. They're not going to search tags; they're going to see one, be reminded of some aspect of the content they're consuming, and want to see what else fits into that tag.
So, if we did provide a comprehensive tag query language, would anyone use it?
Or could we…
Could tags be algorithmic?
Here's one of the benefits of a tag: it's a promise to produce content related to the tag. By default, the model is that the tag produces content that has been proactively tagged with the same thing; this is how tags are presumed to work.
But what if we supplemented that model with algorithmic tag assignment? To go back to the example above, what if, when we were rendering tags, we detected that an item was tagged with both history and technology, so we automatically added history-of-technology?
We could handle the search two ways:
If we actually added this as a "true" tag when the item is saved, then there would presumably be others tagged with the same thing, so it would just work normally.
Alternately, we could bind certain "auto tags" to a tag search, so if someone tried to access history-of-technology, we would actually produce the items from a tag query search for history+technology.
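Both options can be sketched around one hypothetical rule table that maps an auto tag to the tags it implies. The names `AUTO_TAGS`, `apply_auto_tags`, and `items_for_tag` are illustrative. This version only handles intersections, so every listed tag must be present.

```python
# Hypothetical rule table: auto tag -> tags that must all be present.
AUTO_TAGS = {"history-of-technology": {"history", "technology"}}


def apply_auto_tags(tags: set[str]) -> set[str]:
    """Option 1: on save, add every auto tag whose required tags are all present."""
    added = {auto for auto, required in AUTO_TAGS.items() if required <= tags}
    return tags | added


def items_for_tag(tag: str, items: dict[str, set[str]]) -> set[str]:
    """Option 2: at search time, also match items that satisfy an auto tag's rule."""
    required = AUTO_TAGS.get(tag)
    return {
        item for item, tags in items.items()
        if tag in tags or (required is not None and required <= tags)
    }


posts = {  # toy data
    "post-1": {"history", "technology"},
    "post-2": {"history"},
    "post-3": {"technology", "history-of-technology"},  # tagged by hand
}
print(sorted(apply_auto_tags({"history", "technology"})))
# ['history', 'history-of-technology', 'technology']
print(sorted(items_for_tag("history-of-technology", posts)))  # ['post-1', 'post-3']
```

The save-time version turns the auto tag into a real tag that shows up everywhere. The search-time version leaves the stored data alone, so the auto tag only exists when someone asks for it.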
Auto tag assignment wouldn't have to be based only on other tags, either. We could search the item for keywords, like "marcel": if the description or text of an item included "marcel," we could check for the #marcel tag and auto-assign it if it wasn't already there.
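Here is a minimal sketch of keyword-based auto assignment. It assumes a hand-maintained map from each tag to the keywords that should trigger it; `KEYWORDS` and `auto_assign` are hypothetical. Matching whole words case-insensitively avoids some false positives, such as "Marcelle". A bare keyword match still can't tell whether Marcel actually appears in an episode or is only mentioned.

```python
import re

# Hypothetical map: tag -> keywords that should trigger it.
KEYWORDS = {"marcel": ["marcel"], "treeger": ["treeger"]}


def auto_assign(text: str, tags: set[str]) -> set[str]:
    """Return `tags` plus any tag whose keyword appears as a whole word in `text`."""
    found = set(tags)
    for tag, words in KEYWORDS.items():
        if tag in found:
            continue  # already assigned; nothing to do
        if any(re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE) for w in words):
            found.add(tag)
    return found


synopsis = "Made-up synopsis: Ross can't find Marcel anywhere."
print(sorted(auto_assign(synopsis, {"ross"})))  # ['marcel', 'ross']
print(auto_assign("Marcelle visits.", set()))   # set()
```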
How important is tag naming, and is that static?
Ron Leibman played Dr. Leonard Green, Rachel's father. He was a notable, recurring character; any episode he was in always had a plotline just for him.
But what do we call him? #rachels-dad or #dr-green? And does this matter?
Clearly, the core question is which name/label users identify with more. When users see this on a tag, which one are they going to recognize?
But there's another interesting angle here:
Note that #rachels-dad is relational. It frames Dr. Green as having an identity only in relationship to another character. He matters only because of his connection to Rachel.
However, let's pretend for a second that Dr. Green fell in love with Chandler's mom, and they both became regular characters in the later seasons. Can he still be #rachels-dad? Or has his role in the show morphed into something self-sustaining that doesn't need Rachel as an "anchor"?
Maybe that's only specific to this domain of information (a narrative that continues over time), but the phenomenon of a tag changing context and meaning could be more prevalent.
Another example is #ross-professor. He was only a professor in the later seasons (after he stopped working at the museum). This was a notable career change, because it drove some story arcs (the entire #elizabeth arc and a non-trivial part of the #charlie arc). So the #ross-professor tag is really a transformation of the #ross-museum tag: it refers to the same concept and characterization (Ross having a career that requires a high level of education), but they're two separate tags.
Is there a point in a tag thatâs only used once?
If something is tagged one time, then what's the point? If I can't view other things tagged that way… should the tag exist? Should we suppress one-off tags from the list of tags? Should a tag only become "active" when more than one thing is tagged?
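If a tag only became "active" at its second assignment, the public tag list could simply be filtered by usage count. One-off tags would stay stored and would appear as soon as they're reused. Here is a sketch, with a hypothetical `min_uses` parameter and toy data. "episode-x" is a placeholder, and #fourth-wall is the tag the text considers below.

```python
from collections import Counter


def active_tags(assignments: list[tuple[str, str]], min_uses: int = 2) -> list[str]:
    """Return tags with at least `min_uses` assignments, most-used first.

    `assignments` is a list of (item, tag) pairs.
    """
    counts = Counter(tag for _, tag in assignments)
    # most_common() orders equal counts by first appearance.
    return [tag for tag, n in counts.most_common() if n >= min_uses]


pairs = [("S1E1", "ross-rachel"), ("episode-x", "ross-rachel"), ("S8E14", "fourth-wall")]
print(active_tags(pairs))              # ['ross-rachel']
print(active_tags(pairs, min_uses=1))  # ['ross-rachel', 'fourth-wall']
```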
When we tag something, are we saying:
- "This thing is about this tag"
- "Click this tag to find more things like this thing"
Are we just labeling a body of information, or are we providing exploration opportunities?
Single-use tags are also inherent in the "bottom up" nature of tagging. Put another way: every tag is single-use at some point, because we tag incrementally. The first time I applied a tag, that was the only place it was assigned, until I applied it somewhere else.
And this leads us to another point: when we apply a tag for the first time, are we naturally assuming we will apply it somewhere else?
I am very familiar with this particular body of information. So when I tagged #ross-rachel for the first time in Season 1, Episode 1, I did so knowing that I would tag many future episodes the same way. So, is that why I tagged it? Because I knew this was a recurring thing?
In Season 8, Episode 14 (The One with the Secret Closet), you see the "fourth wall" for the one and only time in the entire series. There's a shot of Joey and Chandler standing in front of the titular closet, from the perspective of the closet, and you can clearly see the presumed wall that all the cameras shoot through (it's purple, it turns out).
This is the only time that happened. Should I have tagged this #fourth-wall? When I tagged that episode, I knew this moment happened, and I also knew that it never happened before or since. Is this why I didn't tag it as such?
(Honestly, I don't remember my actual thought process at the time, but this is what I recollect…)
If I saw the fourth wall again, would I have thought to myself, "I've seen that once before. So that's two. I need to go back and tag the first occurrence"?
It's like the kids' game Concentration. You flip over a card and think, "I've seen this picture somewhere before," and you go looking for it.
The problem of tag consistency
I had an inherent advantage in this situation, because all the tagging lay with me. I was the only one applying tags, which leads to some consistency:
- Active consistency, in the sense that I can remember tags I've applied and apply them again
- Passive consistency, in the sense that all tag applications are the product of a single mind that works (relatively) the same way from application to application
But, clearly, even I made mistakes: forgetting I had used a tag previously, misspelling it, or thinking up some new angle.
Tags, by definition, don't have a central authority behind them. People can make up any tag they want. Some scenarios might force people to use a specific set of tags, but I would argue those aren't tags anymore; those are categories.
Also, if you have more than one person, there can be little coordination between them. They could use different tag names, or different tag… levels or attitudes. Someone might apply just one tag per episode, thinking that they should consider the episode in its entirety and sum it up in a single tag. Someone else might be looking at every individual plotline per episode (usually two or three), and also any trivia or random appearances of things or concepts.
Some things I've seen:
Autocomplete. When users start typing a tag, they see tags that match what they're typing. This helps with misspellings and keeps people from inventing new tags, but it's necessarily dependent on how the tag is spelled, and how well you can match it based on that. What's trickier is to autocomplete the concept, not just the spelling.
Restrictions on tag invention. Some systems will prevent most users from creating a new tag. To use a tag for the first time, a user has to clear some bar: either be placed in a specific user group, or have contributed X number of content items, or meet some other rubric to figure out if they know what they're doing.
Tag suggestions. Some systems might examine the item being tagged and suggest tags based on existing tags applied ("Items tagged sports-car are often also tagged with cars") or based on the content itself ("Content like this is often tagged politics"). This is essentially some level of system-powered organization, but it's just asking for the user's permission. If they don't take that suggestion (give their permission), do you tag it anyway? Or do you do this on the consumption side: when someone is viewing the items assigned to the sports-car tag, do you say something like, "You might also be interested in the cars tag"?
Human intervention. Some systems (using that word to include human-powered processes) have human editors who review new tags (all tags?) to make sure they're assigned correctly. This group (cabal?) of editors presumably communicates about consistency, discussing new and emerging tags and how they should be handled. Wired had a long article about this process on a popular fan fiction site. Additionally, Bob Boiko was pushing the concept of the "metator" (an "editor for metadata") as far back as the late 1990s.
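Two of these ideas are easy to sketch with the Python standard library. The first is autocomplete that tolerates misspellings. It uses `difflib.get_close_matches`, which is a stand-in for whatever matching a real system would use. The second is co-occurrence suggestions of the "items tagged X are often also tagged Y" kind. The function names, the toy data, and the 0.5 co-occurrence threshold are all placeholders.

```python
import difflib
from collections import Counter


def complete(prefix: str, known: list[str], n: int = 3) -> list[str]:
    """Autocomplete: prefix matches first, then fuzzy matches to catch misspellings."""
    exact = sorted(t for t in known if t.startswith(prefix))
    fuzzy = difflib.get_close_matches(prefix, known, n=n)  # best match first
    return (exact + [t for t in fuzzy if t not in exact])[:n]


def suggest(tags: set[str], items: dict[str, set[str]], min_share: float = 0.5) -> list[str]:
    """Suggest tags that appear on at least `min_share` of the items carrying any of `tags`."""
    suggestions = Counter()
    for tag in tags:
        tagged = [t for t in items.values() if tag in t]
        together = Counter(other for t in tagged for other in t - tags)
        for other, n in together.items():
            if n / len(tagged) >= min_share:
                suggestions[other] += 1
    return sorted(suggestions)


known = ["phoebe-masseuse", "paolo", "chandler-monica",
         "ross-rachel", "ross-professor", "ross-museum"]
print(complete("ross", known))                # ['ross-museum', 'ross-professor', 'ross-rachel']
print(complete("pheobe-masseuse", known)[0])  # phoebe-masseuse

posts = {  # toy data
    "post-1": {"sports-car", "cars"},
    "post-2": {"sports-car", "cars", "racing"},
    "post-3": {"sports-car"},
    "post-4": {"cars"},
}
print(suggest({"sports-car"}, posts))  # ['cars']
```

As the text notes, this only matches spellings. Autocompleting the concept would need something beyond string similarity.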
- When do you need to explain the tag assignment?
- Should anything ever be tagged only once, or is the core idea of tagging to string multiple things together?
- Should tags ever be hidden, so they can be used for biasing or navigation?
- Could we apply an "intensity" value to tags? #ross-rachel
- What if something has no tags? Does it mean nothing happened, or that it just can't be linked to anything else?
- Tags only make sense in the context of a specific domain of information
- Tag specification: tag querying
- Tag queries allow you to avoid "tag clutter": history+technology, rather than history, technology, and history-of-technology
- What do you call a tag? #elizabeths-dad, #rachel-dad, or #dr-geller? What if he became a doctor halfway through the series?
- What is the point of a tag, really? To find individual things, to link things together, to find intersections?
- What are we saying when we tag something? That this thing is merely present in the content? That the content is about the thing? #recliners
- It's really helpful to have tag help: when you come back to something after a while away, it's really hard to remember what tags you used.
- When does a tag evolve? #ross-professor
- The user interface is everything
- Do I have to explain when something is NOT tagged for some reason? When Anna Faris played Erica, she was not famous, so she is not a "guest star"
- Can a tag be a proxy for something else? Does "adoption" mean "Erica"? Can adoption exist without Erica?
- In what "direction" do we apply tags? From the item perspective, or the tag perspective? Example: I'm looking at a tag page and thinking of all the items I should add to it…