The Bike Shed: 425: Modeling Associations in Rails

About this Episode

Stephanie shares an intriguing discovery about the origins of design patterns in software, tracing them back to architect Christopher Alexander's ideas in architecture. Joël is an official member of the Boston bike share system, and he loves it. He even got a notification on the app this week: "Congratulations. You have now visited 10% of all docking stations in the Boston metro area." #AchievementUnlocked, Joël!

Joël and Stephanie transition into a broader discussion on data modeling within software systems, particularly how entities like companies, employees, and devices interconnect within a database. They debate the semantics of database relationships and the practical implications of various database design decisions, providing insights into the complexities of backend development.

Transcript:

We're excited to announce a new workshop series for helping you get that startup idea you have out of your head and into the world. It's called Vision to Value. Over a series of 90-minute working sessions, you'll work with a thoughtbot product strategist and a handful of other founders to start testing your idea in the market and make a plan for building an MVP.

Join for all seven of the weekly sessions, or pick and choose the ones that address your biggest challenge right now. Learn more and sign up at tbot.io/visionvalue.

JOËL: Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Joël Quenneville.

STEPHANIE: And I'm Stephanie Minn. And together, we're here to share a bit of what we've learned along the way.

JOËL: So, Stephanie, what's new in your world?

STEPHANIE: So, I learned a very interesting tidbit. I don't know if it's historical; I don't know if I would label it that. But, I recently learned about where the idea of design patterns in software came from. Are you familiar with that at all?

JOËL: I read an article about that a while back, and I forget exactly, but there is, like, a design patterns movement, I think, that predates the software world.

STEPHANIE: Yeah, exactly. So, as far as I understand it, there is an architect named Christopher Alexander, and he's kind of the one who proposed this idea of a pattern language. And he developed these ideas from the lens of architecture and building spaces. And he wrote a book called A Pattern Language that compiles, like, all these time-tested solutions to how to create spaces that meet people's needs, essentially. And I just thought that was really neat that software design adopted that philosophy, kind of taking a lot of these interdisciplinary ideas and bringing them into something technical.

But also, what I was really compelled by was that the point of these patterns is to make these spaces comfortable and enjoyable for humans. And I have that same feeling evoked when I'm in a codebase that's really well designed, and I am just, like, totally comfortable in it, and I can kind of understand what's going on and know how to navigate it. That's a very visceral feeling, I think.

JOËL: I love the kind of human-centric approach that you're using and the language that you're using, right? A place that is comfortable for humans. We want that for our homes. It's kind of nice in our codebases, too.

STEPHANIE: Yeah. I have really enjoyed this framing because instead of just saying like, "Oh, it's quote, unquote, "best practice" to follow these design patterns," it kind of gives me more of a reason. It's more of a compelling reason to me to say like, "Following these design patterns makes the codebase, like, easier to navigate, or easier to change, or easier to work with." And that I can get kind of on board with rather than just saying, "This way is, like, the better way, or the superior way, or the way to do things."

JOËL: At the end of the day, design patterns are a means to an end. They're not an end in of itself. And I think that's where it's very easy to get into trouble is where you're just sort of, I don't know, trying to rack up engineering points, I guess, for using a lot of design patterns, and they're not necessarily in service to some broader goal.

STEPHANIE: Yeah, yeah, exactly. I like the way you put that. When you said that, for some reason, I was thinking about catching Pokémon or something like filling your Pokédex [laughs] with all the different design patterns. And it's not just, you know, like you said, to check off those boxes, but for something that is maybe a little more meaningful than that.

JOËL: You're just trying to, like, hit the completionist achievement on the design patterns.

STEPHANIE: Yeah, if someone ever reaches that, you know, gets that achievement trophy, let me know [laughs].

JOËL: Can I get a badge on GitHub for having PRs that use every single Gang of Four pattern?

STEPHANIE: Anyway, Joël, what's new in your world?

JOËL: So, on the topic of completing things and getting badges for them, I am a part of the Boston bike share...project makes it sound like it's a, I don't know, an exclusive club. It's Boston's bike share system. I have a subscription with them, and I love it. It's so practical. You can go everywhere. You don't have to worry about, like, a bike getting stolen or something because, like, you drop it off at a docking station, and then it's not your responsibility anymore. Yeah, it's very convenient. I love it.

I got a notification on the app this week that said, "Congratulations. You have now visited 10% of all docking stations in the Boston metro area."

STEPHANIE: Whoa, that's actually a pretty cool accomplishment.

JOËL: I didn't even know they tracked that, and it's kind of cool. And the achievement shows me, like, here are all the different stations you've visited.

STEPHANIE: You know what I think would be really fun? Is kind of the equivalent of a Spotify Wrapped, but for your biking in a year kind of around the city.

JOËL: [laughs]

STEPHANIE: That would be really neat, I think, just to be like, oh yeah, like, I took this bike trip here. Like, I docked at this station to go meet up with a friend in this neighborhood. Yeah, I think that would be really fun [laughs].

JOËL: You definitely see some patterns come up, right? You're like, oh yeah, well, you know, this is my commute into work every day. Or this is that one friend where, you know, every Tuesday night, we go and do this thing.

STEPHANIE: Yeah, it's almost like a travelogue by bike.

JOËL: Yeah. I'll bet there's a lot of really interesting information that could surface from that. It might be a little bit disturbing to find out that a company has that data on you because you can, like, pick up so much.

STEPHANIE: That's --

JOËL: But it's also kind of fun to look at it. And you mentioned Spotify Wrapped, right?

STEPHANIE: Right.

JOËL: I love Spotify Wrapped. I have so much fun looking at it every year.

STEPHANIE: Yeah. It's always kind of funny, you know, when products kind of track that kind of stuff because it's like, oh, like, it feels like you're really seen [laughs] in terms of what insights it's able to come up with. But yeah, I do think it's cool that you have this little badge. I would be curious to know if there's anyone who's, you know, managed to hit a hundred percent of all the docking stations. They must be a Boston bike messenger or something [laughs].

JOËL: Now that I know that they track it, maybe I should go for completion.

STEPHANIE: That would be a very cool flex, in my opinion.

JOËL: [laughs] And, you know, of course, they're always expanding the network, which is a good thing. I'll bet it's the kind of thing where you get, like, 99%, and then it's just really hard to, like, keep up.

STEPHANIE: Yeah, nice.

JOËL: But I guess it's very appropriate, right? For a podcast titled The Bike Shed to be enthusiastic about a bike share program.

STEPHANIE: That's true.

So, for today's topic, I wanted to pick your brain a little bit on a data modeling question that I posed to some other developers at thoughtbot, specifically when it comes to associations and associations through other associations [laughs]. So, I'm just going to kind of try to share in words what this data model looks like and kind of see what you think about it.

So, if you had a company that has many employees and then the employee can also have many devices and you wanted to be able to associate that device with the company, so some kind of method like device dot company, how do you think you would go about making that association happen so that convenience method is available to you in the code?

JOËL: As a convenience for not doing device dot employee dot company.

STEPHANIE: Yeah, exactly.

JOËL: I think a classic is, at least the other way, is that it has many through. I forget if you can do a belongs to through or not. You could also write, effectively, a delegation method on the device to effectively do dot employee dot company.

STEPHANIE: Yeah. So, I had that same inkling as you as well, where at first I tried to do a belongs to through, but it turns out that belongs to does not support the through option. And then, I kind of went down the next path of thinking about if I could do a has one, a device has one company through employee, right? But the more I thought about it, the kind of stranger it felt to me in terms of the semantics of saying that a device has a company as opposed to a company having a device. It made more sense in plain English to think about it in terms of a device belonging to a company.

JOËL: That's interesting, right? Because those are ways of describing relationships in sort of ActiveRecord's language. And in sort of a richer situation, you might have all sorts of different adjectives to describe relationships. Instead of just belongs to has many, you have things like an employee owns a device, an employee works for a company, you know because an employee doesn't literally belong to a company in the literal sense. That's kind of messed up. So, I think what ActiveRecord's language is trying to use is less trying to, like, hit maybe, like, the English domain language of how these things relate to, and it's more about where the foreign keys are in the database.

STEPHANIE: Yeah. I like that point where even though, you know, these are the things that are available to us, that doesn't actually necessarily, you know, capture what we want it to mean. And I had gone to see what Rails' recommendation was, not necessarily for the situation I shared. But they have a section for choosing between which model should have the belongs to, as opposed to, like, it has one association on it. And it says, like you mentioned, you know, the distinction is where you place the foreign key, but you should kind of think about the actual meaning of the data. And, you know, we've talked a lot about, I think, domain modeling [chuckles] on the show.

But their kind of documentation says that...the has something relationship says that one of something is yours, that it can, like, point back to you. And in the example I shared, it still felt to me like, you know, really, the device wanted to point to the company that it is owned by. And if we think about it in real-world terms, too, if that device, like, is company property, for example, then that's a way that that does make sense.

But the couple of paths forward that I saw in front of me were to rework that association, maybe add a new column onto the device, and go down that path of codifying it at the database level. Or kind of maybe something as, like, an in-between step is delegating the method to the employee. And that's what I ended up doing because I wasn't quite ready to do that data migration.

JOËL: Adding more columns is interesting because then you can run into sort of data consistency issues. Let's say on the device you have a company ID to see who the device belongs to. Now, there are sort of two different independent paths. You can ask, "Which company does this device belong to?" You can either check the company ID and then look it up in the company table. Or you can join on the employee and join the employee back under company. And those might give you different answers and that can be a problem with data consistency if those two need to stay in sync.

STEPHANIE: Yeah, that is a good point.

JOËL: There could be scenarios where those two are allowed to diverge, right? You can imagine a scenario where maybe a company owns the device, but an employee of a potentially different company is using the device. And so, now it's okay to have sort of two different chains because the path through the employee is about what company is using our devices versus which company actually owns them. And those are, like, two different kinds of relationships. But if you're trying to get the same thing through two different paths of joining, then that can set you up for some data inconsistency issues.

STEPHANIE: Wow. I really liked what you said there because I don't think enough thought goes into the emergent relationships between models after they've been introduced to a codebase. At least in my experience, I've seen a lot of thought go up front into how we might want to model an ActiveRecord, but then less thought into seeing what patterns kind of show up over time as we introduce more functionality to these models, and kind of understand how they should exist in our codebase. Is that something that you find yourself kind of noticing? Like, how do you kind of pick up on the cue that maybe there's some more thought that needs to happen when it comes to existing database tables?

JOËL: I think it's something that definitely is a bit of a red flag, for me, is when there are multiple paths to connect to sort of establish a relationship. So, if I were to draw out some sort of, like, diagram of the models, boxes, and arrows or something like that, and then I could sort of overlay different paths through that diagram to connect two models and realize that those things need to be in sync, I think that's when I started thinking, ooh, that's a potential danger.

STEPHANIE: Yeah, that's a really great point because, you know, the example I shared was actually a kind of contrived one based on what I was seeing in a client codebase, not, you know, I'm not actually working with devices, companies, and employees [laughs]. But it was encoded as, essentially, a device having one company. And I ended up drawing it out because I just couldn't wrap my head around that idea.

And I had, essentially, an arrow from device pointing to company when I could also see that you could go take the path of going through employee [laughs]. And I was just curious if that was intentional or was it just kind of a convenient way to have that direct method available? I don't currently have enough context to determine but would be something I want to pay attention to. Like you said, it does feel like, if not a red flag, at least an orange one.

JOËL: And there's a whole kind of science to some of this called database normalization, where they're sort of, like, they all have rather arcane names. They're the first normal form, the second normal form, the third normal form, you know, it goes on. If you look at the definition, they're all also a little bit arcane, like every element in a relation must depend solely upon the primary key. And you're just like, well, what does that mean? And how do I know if my table is compliant with that? So, I think it's worth, if you're Googling for some of these, find an article that sort of explains these a little bit more in layman's terms, if you will.

But the general idea is that there are sort of stricter and stricter levels of the amount of sort of duplicate sources of truth you can have. In a sense, it's almost like DRY but for databases, and for your database schema in particular. Because when you have multiple sources of truth, like who does this device belong to, and now you get two different answers, or three different answers, now you've got a data corruption issue. Unlike bugs in code where it's, you know, it can be a problem because the site is down, or users have incorrect behavior, but then you can fix it later, and then go to production, and disruption to your clients is the worst that happened, this sort of problem in data is sometimes unrecoverable. Like, it's just, hey, --

STEPHANIE: Whoa, that sounds scary.

JOËL: Yeah, no, data problems scare me in a way that code problems don't.

STEPHANIE: Whoa. Could you...I think I interrupted you. But where were you going to go about once you have corrupted data? Like, it's unrecoverable. What happens then?

JOËL: Because, like, if I look at the database, do I know who the real owner of this...if I want to fix it, let's say I fix my schema, but now I've got all this data where I've got devices that have two different owners, and I don't know which one is the real one. And maybe the answer is, I just sort of pick one and say, "Oh, the one that was through this association is sort of the canonical one, and we can just sort of ignore the other one." Do I have confidence in that decision? Well, maybe depending on some of the other context maybe, I'm lucky that I can have that.

The doomsday scenario is that it's a little bit of one, a little bit of the other because there were different code paths that would write to one way or another. And there's no real way of knowing. If there's not too many devices, maybe I do an audit. Maybe I have to, like, follow up with all of my customers and say, "Hey, can you tell me which ones are really your devices?" That's not going to scale. Like, real worst case scenario, you almost have to do, like, a bit of a bankruptcy, where you say, "Hey, all the data prior to this date there's a bit of a question mark on it. We're not a hundred percent sure about it." And that does not feel great. So, now you're talking about mitigation strategies.

STEPHANIE: Oof. Wow. Yeah, you did make it sound [laughs] very scary. I think I've kind of been on the periphery of a situation like this before, where it's not just that we couldn't trust the code. It's that we couldn't trust the data in the database either to tell us how things work, you know, for our users and should work from a product perspective. And I was on a previous client project where they had to, yeah, like, hire a bunch of people to go through that data and kind of make those determinations, like you said, to kind of figure out it out for, you know, all of these customers to determine the source of truth there. And it did not sound like an easy feat at all, right? That's so much time and investment that you have to put into that once you get to that point.

JOËL: And there's a little bit of, like, different problems at different layers. You know, at the database layer, generally, you want all of that data to be really in a sort of single source of truth. Sometimes that makes it annoying to query because you've got to do all these joins. And so, there are various denormalization strategies that you can use to make that. Or sometimes it's a risk you're going to take. You're going to say, "Look, this table is not going to be totally normalized. There's going to be some amount of duplication, and we're comfortable with the risk if that comes up."

Sometimes you also build layers of abstractions on top, so you might have your data sort of at rest in database tables fully normalized and separated out, but it's really clunky to query. So, you build out a database view on top of that that returns data in sort of denormalized fashion. But that's okay because you can always get your correct answer by querying the underlying tables.

STEPHANIE: Wow. Okay. I have a lot of thoughts about this because I feel like database normalization, and I guess denormalization now, are skills that I am certainly not an expert at. And so, when it comes to, like, your average developer, how much do you think that people need to be thinking about this? Or what strategies do you have for, you know, a typical Rails dev in terms of how deep they should go [laughs]?

JOËL: So, the classic advice is you probably want to go to, like, third to fourth normal form, usually three. There's also like 3.5 for some reason. That's also, I think, sometimes called BNF. Anyway, sort of levels of how much you normalize. Some of these things are, like, really, really basic things that Rails just builds into its defaults with that convention over configuration, so things like every table should have a primary key. And that primary key should be something that's fixed and unique.

So, don't use something like combination of first name, last name as your primary key because there could be multiple people with the same name. Also, people change their names, and that's not great. But it's great that people can change their names. It's not great to rely on that as a primary key.

There are things like look for repeating columns. If you've got columns in your schema with a number prefix at the end, that's probably a sign that you want to extract a table. So, I don't know, you have a movie, and you want to list the actors for a movie. If your movie table has actor 1, actor 2, actor 3, actor 4, actor 5, you know, like, all the way up to actor 20, and you're just like, "Yeah, no, we fill, like, actor 1 through N, and if there's any space left over, we just put nulls in those columns," that's a pretty big sign that, hey, why don't you instead have a, like, actor's table, and then make a, like, has many association?

So, a lot of the, like, really basic normalization things, I think, are either built into Rails or built into sort of best practices around Rails. I think something that's really useful for developers to get as a sense beyond learning the actual different normal forms is think about it like DRY for your schema. Be wary of sort of multiple sources of truth for your data, and that will get you most of the way there.

When you're designing sort of models and tables, oftentimes, we think of DRY more in terms of code. Do you ever think about that a little bit in terms of your tables as well?

STEPHANIE: Yeah, I would say so. I think a lot of the time rather than references to another table just starting to grow on a certain model, I would usually lean towards introducing a join table there, both because it kind of encapsulates this idea that there is a connection, and it makes the space for that idea to grow if it needs to in the future.

I don't know if I have really been disciplined in thinking about like, oh, you know, there should really...every time I kind of am designing my database tables, thinking about, like, there should only be one source of truth. But I think that's a really good rule of thumb to follow. And in fact, I can actually think of an example right now where we are a little bit tempted to break that rule. And you're making me reconsider [laughter] if there's another way of doing so.

One thing that I have been kind of appreciative of lately is on my current client project; there's just, like, a lot of data. It's a very data-intensive and sensitive application. And so, when we introduce migrations, those PRs get tagged for review by someone over from the DevOps side, just to kind of provide some guidance around, you know, making sure that we're setting up our models to scale well. One of the things that he's been asking me on my couple of code changes I introduced was, like, when I introduced an index, like, it happened to be, like, a composite index with a couple of different columns, and the particular order of those columns mattered.

And he kind of prompted me to, like, share what my use cases for this index were, just to make sure that, like, some thought went into it, right? Like, it's not so much that the way that I had done it was wrong, but just that I had, like, thought about it. And I like that as a way of kind of thinking about things at the abstraction that I need to to do my dev work day to day and then kind of mapping that to, like you were saying, those best practices around keeping things kind of performant at the database level.

JOËL: I think there's a bit of a parallel world that people could really benefit from dipping a toe in, and that's sort of the typed programming world, this idea of making impossible states impossible or making illegal states unrepresentable. That in the sort of now it's not schemas of database tables or schemas of types that you're creating but trying to prevent data coming into a state where someone could plausibly construct an instance of your object or your type that would be nonsensical in the context of your app, kind of trying to lock that down.

And I think a lot of the ways that people in those communities think about...in a sense, it's kind of like database normalization for developers. So, if you're not wanting to, like, dip your toe in more of the sort of database-centric world and, like, read an article from a DBA, it might be worthwhile to look at some of those worlds as well. And I think a great starting point for that is a talk by Richard Feldman called Making Impossible States Impossible. It's for the Elm language. And there are equivalents, I think, in many others as well.

STEPHANIE: That's really cool that you are making that connection. I know we've kind of briefly talked about workshops in the past on the show. But if there were a workshop for, you know, that kind of database normalization for developers, I would be the first to sign up [laughs].

JOËL: Hint, hint, RailsConf idea. There's something from your original question that I think is interesting to circle back to, and that's the fact that it was awkward to work through in Ruby to do the work that you wanted to do because the tables were laid out in a certain way. And sometimes, there's certain ways that you need the tables to be in order to be sort of safe to represent data, but they're not the optimal way that we would like to interact with them at the Ruby level.

And I think it's okay for not everything in Ruby to be 100% reflective of the structure of the tables underneath. ActiveRecord gives us a great pattern, but everything is kind of one-to-one. And it's okay to layer on some things on top, add some extra methods to build some, like, connections in Ruby that rely on this normalized data underneath but that make life easier for you, or they better just represent or describe the relationships that you have.

STEPHANIE: 100%. I was really compelled by your idea of introducing helpers that use more descriptive adjectives for what that relationship is like. We've talked about how Rails abstracted things from the database level, you know, for our convenience, but that should not stop us from, like, leaning on that further, right? And kind of introducing our own abstractions for those connections that we see in our domain. So, I feel really inspired. I might even kind of reconsider the way I handled the original example and see what I can make of it.

JOËL: And I think your original solution of doing the delegation is a great example of this as well. You want the idea that a device belongs to a company or has an association called company, and you just don't want to go through that long chain, or at least you don't want that to be visible as an implementation detail. So, in this case, you delegate it through a chain of methods in Ruby.

It could also be that you have a much longer chain of tables, and maybe they don't all have associations in Rails and all that. And I think it would be totally fine as well to define a method on an object where, I don't know, a device, I don't know, has many...let's call it technicians, which is everybody who's ever touched this device or, you know, is on a log somewhere for having done maintenance. And maybe that list of technicians is not a thing you can just get through regular Rails associations. Maybe there's a whole, like, custom query underlying that, and that's okay.

STEPHANIE: Yeah, as you were saying that, I was thinking about that's actually kind of, like, active models are the great spot to put those methods and that logic. And I think you've made a really good case for that.

JOËL: On that note, shall we wrap up?

STEPHANIE: Let's wrap up. Show notes for this episode can be found at bikeshed.fm.

JOËL: This show has been produced and edited by Mandy Moore.

STEPHANIE: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes. It really helps other folks find the show.

JOËL: If you have any feedback for this or any of our other episodes, you can reach us @_bikeshed, or you can reach me @joelquen on Twitter.

STEPHANIE: Or reach both of us at hosts@bikeshed.fm via email.

JOËL: Thanks so much for listening to The Bike Shed, and we'll see you next week.

ALL: Byeeeeeeee!!!!!!!!!!

AD:

Did you know thoughtbot has a referral program? If you introduce us to someone looking for a design or development partner, we will compensate you if they decide to work with us.

More info on our website at: tbot.io/referral. Or you can email us at: referrals@thoughtbot.com with any questions.

Support The Bike Shed

Simple Sharing Page

Embeddable Audio Player

Download URL

Social Network Quick Links