Jessica Chan, Web Engineer: Taking On Your First Big Project

Experience-based advice, best-practice tips and resources for executing your first major web development project

Jessica Chan is a full-stack Web Engineer at Pinterest, with an extensive design and self-taught programming background.

In this Codecast, Jessica presents tips and code examples to help new developers navigate large-scale web platform work on a company team. She shares direct experience, explains the browser-server relationship in a stack that involves Python, Node.js and React, suggests supporting technologies, and offers friendly reminders on staying calm and organized through a steep learning curve.

Key Theme » Structured Curiosity

"That juncture where I decided to be a full engineer was a very difficult choice, but in the end it was so hard to be good at both. There were designers that were so good and engineers that were so good and I just felt like being both (the unicorn) I almost think is truly a myth. But who knows? There could be people definitely much more talented than I am that could do well at both. For me, I don’t think I could learn everything I needed to learn to be a good engineer and be a good designer, at the same time." – Jessica Chan

Quotes

"Exploration takes time but you don’t need to know everything to change something."

"I did a lot of prototyping and a lot of hacking into various parts of the code where I assumed rendering happened or had to do with rendering. I was just trying to get something to work. So iteratively I was just trying to say I want to see this and then I would try to hack the code to make that happen."

Full Episode » Transcribed

ERIK

Welcome to the Viking Codecast. My name is Erik Trautman, founder of Viking Code School, and I’m here with Jessica Chan, who’s a web engineer, designer, writer and editor. All sorts of different things. She comes from a non-traditional background via a design path, is currently working at Pinterest, and is here today to talk to us about a major project that she took on at Pinterest around reorganizing the entirety of how the whole site is presented. Which is essentially the meat of what we’re going to hear about today.

I’m really excited to welcome you, Jessica. The first question that I have for you is really just who are you, how can we get to know you and the path that you took to get to us, a little bit better.

JESSICA

Sure, hi I’m Jessica. It’s really nice to be here. I started coding when I was a kid so I borrowed an HTML book from the library when I was like 12 years old and I started just coding personal web pages. I signed up for every single free hosting company available, like GeoCities, Tripod, and all that stuff. It was free and I just started making fun websites for myself through middle school, through high school, and in college I was just playing around.

In college I didn’t major in computer science. I actually majored in English and Economics. But on the side, I started getting into freelance graphic design for websites. I actually kept these clients for years after I graduated; it was a pretty awesome way to make side money through college. After that, I still didn’t go the technical route. After college I went into corporate marketing, at first. I thought I was going to be a corporate marketing manager, and then I went into project management, and then I became an editor and thought I was going to be an editor. My career kind of meandered a bit, but during that whole time I still freelanced on the side, coding web pages.

Eventually I did decide to take it full time and do design and development, as a hybrid, for a small agency. Then I went from that small agency to a larger company and I joined the design team, also as a hybrid, because I was kind of lonely. I was learning by myself, I was doing a lot of coding by myself, I wasn’t really interacting with other developers, so that was my move to a bigger company.

Moving from design to development.

Then, at that company I also moved to the engineering team from the design team because somebody convinced me that engineering was a path to take rather than trying to do both, or trying to be good at both design and engineering. That was really hard. Being able to focus and being in an engineering organization I really learned about cadences and sprints. They were really hardcore with agile methodology so I got to really be agile. That was just eye-opening for me and kind of life changing. So being an engineer, committed to being an engineer, I learned a lot. I ramped up, especially on JavaScript, especially on building client side applications, and then I got a recruiting email from a very big company. I got a recruiting email from Facebook and that actually motivated me to see if I could do it.

A lot of that involved learning stuff that I didn’t learn before, so I got an email from Facebook saying "here’s the prep – make sure you know these things." These things included algorithms, data structures, and I was like “oh no, what are those?!” (Laughs) I went on study mode, at night, evenings, I just borrowed a lot of books again, almost to reprise what I did when I was 12 – but I borrowed a lot of books from libraries. There was one that was really helpful called Data Structures and Algorithms in JavaScript.

I didn’t know C++, I was mainly front end, so I basically implemented binary sort, all these different algorithms, all these different data structures: linked list, doubly linked list. I kind of had to learn what those were on my own. I didn’t get Facebook, (laughs) but I did get Pinterest so I was able to move from that company to Pinterest. Here I’ve just grown immensely. It’s been amazing to work here and I feel very lucky every day.

There was a really big project that I am at the tailend of right now that is basically what I’m going to be talking about today. That kind of summarizes my journey here.

ERIK

Thanks, it sounds like the path that you’ve taken - basically both as a designer and a coder that officially makes you a unicorn. (Laughs)

JESSICA

That juncture where I decided to be a full engineer was a very difficult choice but in the end it was so hard to be good at both. There were designers that were so good and engineers that were so good and I just felt like being both - the unicorn, I almost think is truly a myth. But who knows? There could be people definitely much more talented than I am that could do well at both. But for me, I don’t think I could learn everything I needed to learn to be a good engineer and be a good designer at the same time, just for me personally.

ERIK

Why exactly did you make that shift over? I think one of the big misconceptions about code is that it is not very creative. (It is) but in a different way (than) design. So how did you go from that graphic level creativity to the code level creativity, and why did that become the right choice for you?

JESSICA

I think maybe it had to do with the way I was really seeking challenges at the time. My life might have turned out differently if I was in a different position or if I was in maybe a different company. But at that time I think the technical challenges were just so much more rewarding to me and I really wanted to pursue it. And at that company, there wasn’t really room for somebody to dive into that as well as succeed on the design level. I think my choice was more about following what, at the time, was going to make me the happiest. It’s true that technical engineering challenges are so different from design challenges, so I think it was either make a choice or just kind of be unhappy in both worlds.

ERIK

Nice. The other question I had was actually about the mindset of a designer versus the mindset of an engineer. What do you think carries over one from the other? Do you think being a good designer makes you a good engineer or vice versa or how do those two cross?

JESSICA

It’s funny because I’ve been an engineer now for like three, plus, years and so the design side is kind of slipping from me. But I remember design being a lot about - they both have elements of iteration so there’s that phase where you’re not even on a computer. I remember as a designer using a pencil and paper and like sketching out ideas, sketching out layouts, putting together compositions, and in code, as well. Having that architecture out first, just doing that exploration, that kind of hit and miss crumpling the stuff and throwing it in the trash can type phase, and then you go and you get more clarity. My process at least, is to get a pencil and paper and just kind of write out APIs in code. I think those two were very similar. I feel like design is different in that it’s a lot about presentation, knowing what your stakeholders need, and making sure you’re meeting the requirements and that you’re kind of iterating on the requests coming in and what you know about who the client is.

Engineering is interesting because it’s kind of combining everything that you know with everything you’re discovering about the system you’re implementing it into. It’s almost like the client is like, good practices and all this stuff, so I don’t know it’s a little different. Design I think is more about people and making sure that what you’re doing communicates and speaks to the person you’re doing it for. Engineering is a lot more about designing great systems that work well and the really good principles, and things like that.

ERIK

Great - thank you. I know you had some stuff you wanted to show us, speaking of engineering, about a project you’ve worked with.

JESSICA

Yeah, just to touch upon the main topic of this presentation. I know that when I was first starting as an engineer I was getting a lot of projects that were pretty well scoped. It was technically challenging but also the end result was pretty clear.

One thing that I wanted to share was my experience in taking on a bigger project that was a little less well defined and impacted a lot of people, which was really challenging for me because that’s just something I’ve never done before. I’ve never rolled out a project this large on this scale. So, I guess this is more my story and not necessarily "advice" for everyone. I just want to make that clear, I just want to kind of share my experience and hopefully there’s some takeaways that you can then apply for yourselves, but mainly I wanted to just share.

I will bring up a slide, and what I’m going to attempt to do is I’m going to attempt to describe the scope of the project in pretty simple terms so you can kind of walk through the story with me. Don’t feel like you have to understand everything but I’m going to try to scope it down and make sure.

ERIK

Would you prefer to have questions - wait until the end or kind of just group up in the middle?

JESSICA

No, I’m happy to answer questions throughout, if you’d like.

Okay - is everyone able to see this?

ERIK

Yep.

JESSICA

Okay. Most of you I think, are familiar with Pinterest.com, how it works. You basically search for something that you want to see and Pinterest is really good at serving images that match your criteria. It’s great for hobbyists and people who want ideas, clothing ideas, beard ideas, (laughs) apparently that’s very popular.

Pinterest.com is actually made up of hundreds of modules. So every single thing that’s outlined in red here and more, things that are inside of these red things are all individual modules that are reused over and over again in the site. Basically in our code base everything is a module, we try to reuse as much as we can, button module is an example. We have a pin module, we have lots of modules.

The browser-server relationship.

I’m just going to go into a little bit about how we serve Pinterest.com when you actually type it into your browser. Just a really quick overview: when you type Pinterest.com, that request goes to our server and our server sends the HTML back to your browser, plus some JavaScript and other stuff.

The important part to remember here is that when the server actually takes that request and gets all the HTML, it’s using this engine called Jinja. It’s a Python engine and what happens is that initial request that’s coming from the browser and goes straight to the server that only happens once, that happens on the very first time you type out Pinterest and you send it. After that what happens is every time you click around or you want to look at a pin or something like that, the JavaScript then takes over and it starts to handle all these events, these clicks and the scrolling and everything like that.

What it does is when you do click on a pin and you see the pin appear in your browser, the browser doesn’t then make a huge page reload or anything like that. You can actually see that the pin is showing up on your browser right away – and that’s what we call client rendered. The client is doing the render of the HTML or the re-rendering of the HTML and what the client is using is something called Nunjucks.

What you have to remember is:

  1. Server / Jinja / Python.

  2. Client / JavaScript / Nunjucks.

Is that pretty clear so far? Yeah? Okay. So knowing that, this is a quick little snippet of what our template looks like. Here’s a template that sets a variable called "greeting" and a variable called "name", which come from some kind of data source, and it just prints out:

{{ greeting }}, {{ name }}, welcome to Pinterest!

Jinja and Nunjucks are ports of each other – meaning, that they can see this code, this template code and they both can read it. It’s nice because on the server side, we use Jinja to parse this template and then when the user then goes and clicks around on the site they use Nunjucks, and we can reuse the same template on those hundreds of modules around the site. Nunjucks is able to just read it and spit out the same thing. So that’s great, everything is great. There is nothing that needs to be changed.

There are some not so great things about this.

First of all, when Jinja goes ahead and parses this template we actually have to add utilities and libraries in Python in order for Jinja to use them. But, any utilities or libraries that we need to add that Jinja uses then also needs to be added to the Nunjucks instance on the client, so we have duplication here and that’s not great.

There are also some language differences between Python and JavaScript that are very inconvenient for developers. There’s "truthiness" differences, and there’s some other language specific differences that give our developers a lot of headaches, plus having to implement a utility twice in two different languages, that’s kind of annoying. But other than that everything is okay, everything runs. People are getting their pins, that’s great.
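As an illustration of the "truthiness" differences mentioned above (this sketch is not from the talk), the snippet below compares Python's `bool()` with a hypothetical helper, `js_truthy`, that approximates JavaScript's rules for common JSON-like values. The helper name and the sample values are assumptions chosen for the example. The point is that a shared template condition such as "if items" can take a different branch on the server and on the client when `items` is an empty list.

```python
import math


def js_truthy(value):
    """Approximate JavaScript truthiness for common JSON-like values (hypothetical helper)."""
    if value is None or value is False:
        return False
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # In JavaScript, 0 and NaN are falsy; every other number is truthy.
        is_nan = isinstance(value, float) and math.isnan(value)
        return not (value == 0 or is_nan)
    if isinstance(value, str):
        # Only the empty string is falsy; "0" is truthy.
        return value != ""
    # Arrays and objects are always truthy in JavaScript, even when empty.
    return True


# Empty lists and dicts are where the two languages disagree.
for sample in ([], {}, "", 0, "0"):
    print(repr(sample), "python:", bool(sample), "javascript:", js_truthy(sample))
```

An empty list is falsy in Python but truthy in JavaScript, so a template that guards on the list itself rather than on its length is one place where the same template can render differently in the two engines.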

What’s the big deal? Why do we need to change anything? Well, Pinterest and our platform team are really interested in new technologies. So right now, we’re running the client application on Backbone, we have Django on the server side, we have Jinja and Nunjucks, great. However, we heard about this really cool technology called React.js and we want to move to it. We’ve kind of weighed a lot of the pros and cons and decided yay! React is totally the way to go, let’s go.

In order to get from here to React what we need to do is we need to somehow get the server rendering React templates, because look, they look completely different. If we change one we have to change the other, because like I said server and client are sharing the template. So if we change from Nunjucks/Jinja templates syntax to React syntax we’re going to need to make sure they’re both running React. However there’s no Python–React, so somehow we’re going to need to get our server which is Django which has tons of code in it in our application logic, to call JavaScript somehow so we can server render React templates.

Then, we also while we’re doing this, we’d like to share those utilities and libraries among the server and the client. There’s a big problem there because to get from here to here or from here to here, that’s huge. We can’t just shut Pinterest down for a couple months and convert all of our templates and tell everyone to wait until we’re done. We need to find a way to make this change iteratively so we don’t interrupt the many developers developing new features on the site, and we don’t interrupt the experience of all the users that are going to our site which is a lot – a lot of users.

We developed a roadmap to get from our current state to here, and I didn’t do the entire roadmap. I only did a piece of that roadmap and that roadmap piece I’m calling "Server-side Nunjucks." What this does is this actually gets our server to call Nunjucks. Why aren’t we calling React?

Well, in order to do this incrementally, we want to be able to just take that first baby step. Which is just have our server be able to even call JavaScript even though it’s Python. Then, have it read the templates that are currently being shared between server and client. And then from here, we can then start iteratively changing the templates. So instead of going straight from Jinja to React there’s this intermediate step. Let’s try to get the server to call Nunjucks instead and then we can focus on switching up the template engine.

Once the server and the client are using the same engine, once they’re both using Nunjucks it will be much easier to change both of them out for React, so that made a lot of sense.

This was my project, and I had to do it while hundreds of developers were still developing and millions of users were visiting the site – we can’t tell them to stop, like I said.

Exploration.

So, the first step for me was exploration, and here is kind of my first approach to this project, which looking back I think I could have done a little bit more efficiently. I didn’t come into Pinterest knowing all this. I definitely had to dive into the code base and just figure out what our application logic was doing. I had never worked with Django before: what does Django do? I had never worked with Backbone.js: what does Backbone do? Routing, middleware. I basically chased the life of a request through the entire application and took a lot of notes, annotated the code everywhere.

This was fine, I think it’s something that anyone should do when they first join a company anyway, like get really familiar with the stack, get really familiar with whatever part of the code that you’re definitely going to be working in. However, I would say that this was not the most efficient use of my time for this project, specifically. I actually didn’t really need to know that much about, for example the routing or even Django really, or Backbone. All I was really focused on was the utilities and the libraries and the template renders. So all that stuff that is around it, like routing, middleware, app logic, Django - I didn’t really need to know that much about that.

So I guess my takeaway here is:

Exploration takes time, but you don’t need to know everything to change something.

The advice given to me was to:

Get a general sense of what everything does and just dive deep in the system that needs to change and expand your knowledge incrementally.

One of my hesitations when I started this project was how can I refactor this entire engine without even knowing everything about this whole system? It turns out that a lot of people, a lot of people way more senior than me, gave me the advice of “oh yeah, I don’t know anything about the whole system, I just kind of hack into the part that I need to hack into.”

That really kind of gave me a little more courage to say, "okay, I don’t need to know everything about everything. I just need to go in and try something out." That’s kind of my first takeaway here. I don't want to keep talking and talking. Should I give breaks for questions or anything like that, in between slides?

ERIK

Why not? Let’s just see if anyone has anything queued up. If there are any questions, fire away.

PARTICIPANT

Yeah I had one. I was just wondering - I know a little bit about React, I know it’s like the hottest new thing as far as the JavaScript ecosystem. I haven’t really done a lot with JavaScript but basically my question is: what are your reasons for transitioning to React? What are the pros there, because it’s a huge undertaking I’m sure to transition. So I’m sure there must be a pretty long list of pros.

JESSICA

Benefits of using React.js.

Yeah, I probably won’t have time to go through all of them but some of the major ones are, first of all it’s open source. What’s great about that is a lot of our code currently is kind of a "Frankensteined" Backbone custom modules, rendering system, module tree rendering system. So with React, it does a lot of what our custom code does but it’s open source and it’s well supported. Instead of having to maintain documentation for example to onboard new developers, we could just point to the React tutorial and say, "hey, just go do that."

We’re actually moving, in general, to more open source type stuff, to either open source our own stuff or to consume more open source stuff just because the power of that community really is a multiplier in terms of onboarding.

There are also performance gains to be had. React performance on a Node.js server, React’s virtual DOM, just leads to crazy performance increases and performance gains. The way that we’re re-rendering currently with our architecture, we are completely destroying and re-constructing the DOM. Whereas React’s virtual DOM is a lot smarter about diffing the DOM trees and enacting that on the browser. Things like that: performance, developer productivity. There’s an ecosystem around React; we’re using Jest right now. We’re having a little bit of trouble with Jest code coverage, but Jest itself is a great testing framework that we’re pretty excited about. Just in general, Redux.js, the way that React has opinions about the way we should be propagating data around the application. A lot of those patterns we’re pretty on board with, so those are just a few of the reasons. But it’s pretty neat, so far, learning about it.

Cool, so I’ll continue.

The next slide talks about...so that was exploration just kind of like diving in and not feeling like I needed to know everything about everything in order to just start hacking.

Where to actually start, here’s another example. I did a lot of prototyping and a lot of hacking into various parts of the code where I assumed rendering happened or had to do with rendering. I was just trying to get something to work. So iteratively I was just trying to say I want to see this and then I would try to hack the code to make that happen.

As I was doing that, I was learning more and more about the code. I think that in the beginning that spirit of exploration and that spirit of having an expectation, "I want this to work so let’s see what this does, I want that to work let’s see what that does", was good. But then it came time to figure out how to roll this out to production. A little background: Pinterest has an experimentation framework that allows me to say, "let’s experiment this new code on just a few users." I’ll go into that a little bit later, it’s an amazing framework. I haven’t worked in many companies so I don’t know if all companies have this, I know that a lot of companies and a lot of people are familiar with A/B testing, but the way Pinterest approaches it is pretty amazing. Again I’ll go into that later.

But anyway the point is I needed to find a place in the code where I could fork the code and make it so that the system went to my new rendering system in one place – so I can turn it on and off, almost like a light switch.

Here, I had to find one place where instead of Python using Jinja, it would actually use what we call a "sidecar" Node process. What happens is we have a Python process serving a website, we’re actually spawning up another process that’s running Node which is a JavaScript engine. It actually gets a request from Python and spits back out HTML using Nunjucks. Ultimately, this was the way that we would have Python calling Nunjucks, or JavaScript code.
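To make the sidecar idea concrete, here is a minimal sketch, not Pinterest's implementation. The talk only says that Python sends a request to a Node process and gets HTML back; the transport used here (one JSON object per line over stdin/stdout), the `SidecarRenderer` class and the request fields are all assumptions. So that the example runs without Node installed, the child process is a tiny Python script standing in for the Node process that would run Nunjucks.

```python
import json
import subprocess
import sys

# Stand-in for the Node sidecar (assumption): reads one JSON request per line
# from stdin and writes one JSON response per line to stdout. In the setup
# described in the talk, this process would be Node rendering with Nunjucks.
FAKE_SIDECAR = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    context = request["context"]
    html = "<p>%s, %s, welcome!</p>" % (context["greeting"], context["name"])
    sys.stdout.write(json.dumps({"html": html}) + "\n")
    sys.stdout.flush()
"""


class SidecarRenderer:
    """Keeps one long-lived child process and sends it render requests."""

    def __init__(self):
        self.process = subprocess.Popen(
            [sys.executable, "-c", FAKE_SIDECAR],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def render(self, template_name, context):
        # Send one request line, then block until one response line arrives.
        request = {"template": template_name, "context": context}
        self.process.stdin.write(json.dumps(request) + "\n")
        self.process.stdin.flush()
        response = json.loads(self.process.stdout.readline())
        return response["html"]

    def close(self):
        # Closing stdin ends the child's read loop so it can exit cleanly.
        self.process.stdin.close()
        self.process.wait()


renderer = SidecarRenderer()
print(renderer.render("welcome", {"greeting": "Hello", "name": "Ada"}))
renderer.close()
```

Keeping the child process alive across requests avoids paying process start-up cost on every render, which is one plausible reason to run it as a sidecar rather than spawning it per request.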

Here, my takeaway from that was:

Find that place where the system interfaces with the piece you need to change and hack that place. If you can.

So this caveat "if you can": I’m lucky in that Pinterest code was pretty modular. There were really nice interfaces between the systems. The systems were pretty orthogonal; it was like this system did something, that system did something else. This system expects this response from this other system, so I was able to kind of hack that response. Where the application calls render and Jinja gives it back HTML, I was able to say, "okay, let’s have this application call render, and if these conditions are met, actually use my new system to return the HTML."

I was returning back the same response as Jinja, but the application was calling my code rather than Jinja at this one place in the fork. So that made experimentation easy because I was able to say "if you’re in the experiment use my system, if you’re not in the experiment use this other system."
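The "light switch" at a single fork point can be sketched as follows. This is an illustration under stated assumptions, not the actual Pinterest code: the two `render_with_*` functions are placeholders for the existing Jinja path and the new sidecar path, and the `CALLS` list exists only so the example can show which path ran.

```python
CALLS = []  # records which render path ran (for illustration only)


def render_with_jinja(template_name, context):
    # Placeholder for the existing server-side Jinja render.
    CALLS.append("jinja")
    return f"<h1>{context['greeting']}, {context['name']}</h1>"


def render_with_sidecar(template_name, context):
    # Placeholder for the new path; it must return the same HTML as Jinja.
    CALLS.append("sidecar")
    return f"<h1>{context['greeting']}, {context['name']}</h1>"


def render(template_name, context, in_experiment):
    """Single fork point: the rest of the application only ever calls render()."""
    if in_experiment:
        return render_with_sidecar(template_name, context)
    return render_with_jinja(template_name, context)


context = {"greeting": "Hello", "name": "Ada"}
control_html = render("welcome", context, in_experiment=False)
experiment_html = render("welcome", context, in_experiment=True)
print(control_html == experiment_html, CALLS)
```

Because callers see the same return value either way, the new system can be switched on or off in one place without touching the code that calls `render`.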

That actually brings me to my next point where:

If you can help it, don’t try to cowboy a huge change into production.

It would be great if all code was like that where you could just experiment, but there actually was one place where I did have to refactor a huge piece of the module rendering system. I've been talking a lot about the server side – where I made it so that it didn’t call Jinja, it called my Node sidecar process. But, on the client side, actually, there was a huge refactor that needed to happen in order for both Node and the browser to be able to call the same code.

I don’t know if you’ve run into this before, but there’s kind of a new area here of, some people call it "isomorphic code" some people call it "agnostic code", but basically, JavaScript that can be called from both Node and the browser – a lot of people like to reuse that code like we’re trying to do.

Which presents its own problems because that JavaScript, a lot of our JavaScript actually expected a browser environment and would break if it was being called by Node. A big refactor had to happen so that Node and the browser could call the same code. Yeah, I had to land that without experimenting, because the change was so big. So if you can help it don’t do that. But if you can’t help it, here’s my takeaway from that:

You want to plan it in incremental stages so that if something goes wrong, you know where and how to fix it.

At first I was actually trying to change the server side and the client side at the same time and push it out and I got a big "no, like don’t do that." If something breaks like where is it going to break, are you going to go look in Python code, are you going to go look in Node code, are you going to look in browser code? Where do you think the break is going to happen?

In rolling out these huge big changes you just want to make sure to stage them so that when you roll something big out, if you can’t experiment on it, roll out the piece that you know if something will break how it might break and where that code might be.

For us, we have a lot of monitoring systems, so if a lot of errors suddenly start happening after a deploy, we know somebody broke something during that deploy. I did totally break stuff when I rolled out my huge client side change, and luckily I was able to fix it very quickly because somebody stopped me from rolling out everything, so I knew it was a client side change and a client side problem. Had I rolled everything out at once I would have been so lost and I’m really glad I didn’t do that.

I’m going to take a break here, too, to see if anybody has any questions, as well.

ERIK

As before, feel free to unmute yourself here or ask questions in the Hangout, sorry in the Q&A app. I think we might be safe.

JESSICA

Okay, cool.

PARTICIPANT

Actually I did have a question. Maybe you could talk about this more towards the end of the talk, but I’ve gotten questions from peers about when they break something; people get a little overwhelmed by that thought process. It’d be great for you to talk about what you did to fix it, if there’s time later on, like how to not be overwhelmed with actually breaking something – particularly for a major project.

JESSICA

Sure, I can actually speak now to that, because I broke Pinterest a few times. (Laughs) In this particular case when I did roll out the client side refactor there are two big things that broke.

One was the experiments that some people were doing completely broke. They pretty much, all the people they were exposing to this experiment and getting metrics from were actually not being exposed to the right things, so their metrics were kind of screwed up.

The second thing that happened was the search page. You try to search for something – you can’t scroll down! (Laughs) So that was really bad. The things that I would say about that is debugging those things while there’s a breakage going on is very stressful, it’s not pleasant. You’re definitely kind of working against the clock, there are people who aren’t very happy. So all you can really do is kind of take notes on how you might prevent this in the future.

So for me, with the experiment thing I actually knew exactly what was going on. I actually forgot to implement the refactored experiment's client, so I fixed that really quickly and what was missing there were tests which would have caught that. So definitely a lesson in making sure that your code is well tested so that when you land things you know if something is missing, or you know something that was working before doesn’t work anymore.

The other issue was just something I don’t even to this day think I would have been able to catch. It was just an implementation detail. It’s a little too technical to really talk about but it had to do with the search API or the search interfaces with our data, and making sure that we were passing the right data over. That’s just something maybe testing could have also caught.

Am I answering the question okay? I think it’s a stressful thing and yeah, the advice is just like when you get over it and you’re past fixing it and everybody is like "okay, what went wrong, how could we have prevented this in the future?" It’s great learning and it’s great opportunity to kind of be a better engineer.

PARTICIPANT

Yeah that’s perfect, that’s food for thought for the community. (Laughs)

A/B Testing.

JESSICA

Okay, so, the next slide: A/B Testing. If you can A/B test your big changes, it gives you amazing insight. Pinterest has an experiment framework, like I said, where it automatically buckets a percentage of users either inside or outside your experiment, and there are a lot of experiments going on at the same time.

When you kind of have in your code this branch where it’s like - if you’re in this experiment I’ll call it experiment Nunjucks, you will go through my new Nunjucks rendering system. If you’re outside the experiment you’re just going to go through the normal system as before, the fact that I had that one place allowed me to make that logic change where I could bucket you into my experiment.
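One common way to bucket a fixed percentage of users is to hash the user ID together with the experiment name. The talk does not describe how Pinterest's framework assigns users, so the approach below, the function name `in_experiment` and the experiment name `nunjucks_rendering` are all assumptions made for illustration.

```python
import hashlib


def in_experiment(user_id, experiment_name, percent):
    """Deterministically place roughly `percent`% of users in the experiment."""
    # Including the experiment name means different experiments get
    # independent buckets for the same user.
    key = f"{experiment_name}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100  # 0..99
    return bucket < percent


# The same user always gets the same answer, so their experience is stable.
first = in_experiment(12345, "nunjucks_rendering", 10)
second = in_experiment(12345, "nunjucks_rendering", 10)
print(first == second)

# Roughly 10% of a large population lands inside the experiment.
inside = sum(in_experiment(uid, "nunjucks_rendering", 10) for uid in range(10000))
print(inside)
```

The result of a check like this is what would feed the single branch in the render path: users inside the experiment go through the new rendering system, everyone else goes through the existing one.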

The cool thing is that Pinterest has this really amazing dashboard that shows all these key metrics that the company cares about. With my experiment, I’m not doing a feature change, I’m not changing the design of anything so the hypothesis is everything’s flat, right? I’m not going to make people pin more, I’m not going to make people want to leave the site faster or stay on the site more. If metrics change, that’s actually an anomaly that I’m not expecting. So all I’m doing is I’m rendering templates differently so really the hypothesis is: everything’s flat.

What happened was board edits went down and there was nothing I could reproduce. I couldn’t figure out - I used a bunch of different browsers and I tried to reproduce a bug and I couldn’t find any bugs inside my experiment. I asked the experiment people, I’m saying: "oh, everything is fine except for board edit, do I have to worry about that?" They said "yes, yes you have to worry about that!" (Laughs) There wasn’t any way I could ship my experiment if some metric was down.

I actually found out through some pretty intensive debugging that there was a bug. The way I found out was I actually put stats in production code to narrow down at what layer the bug was happening. Here’s a strategy that you can try: there’s an open source stats aggregation tool called StatsD, whose output you can graph.

I was able to put points in my code that would increment if that code path was hit. So I had some nice graphs of how many people were hitting this piece of code at any given point in time and those graphs over time gave me a picture of deltas between people in my experiment and people outside my experiment. I noticed that at the data layer it was true that my experiment people were editing boards around five to seven percent less than the people outside my experiment. I knew it wasn’t a problem with the data layer. Then in the JavaScript, I also had graphs showing that at the JavaScript layer even, there was that delta, as well. It definitely had to do with template rendering but I couldn’t reproduce it.
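As a rough sketch of what such a counter can look like, the snippet below sends StatsD counter lines (`<metric>:<value>|c`) over UDP. The metric naming scheme, the `UdpStats` class and the `record_board_edit` helper are assumptions made up for this example, and the host and port are the common StatsD defaults rather than anything from the talk.

```python
import socket


def counter_payload(metric: str, value: int = 1) -> bytes:
    """Format a StatsD counter line: <metric>:<value>|c"""
    return f"{metric}:{value}|c".encode("ascii")


class UdpStats:
    """Tiny fire-and-forget StatsD client (hypothetical helper)."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(self, metric: str, value: int = 1) -> None:
        try:
            self._sock.sendto(counter_payload(metric, value), self._addr)
        except OSError:
            pass  # a metrics failure should never break the request


def record_board_edit(stats, layer: str, in_experiment: bool) -> str:
    """Increment one counter per layer and per experiment group."""
    group = "experiment" if in_experiment else "control"
    metric = f"board_edit.{layer}.{group}"
    stats.incr(metric)
    return metric
```

Calling `record_board_edit(stats, "data", in_experiment)` at the data layer and again with `"js"` at the JavaScript layer gives one pair of graphs per layer, which is what makes the delta between groups visible at each step.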

This is kind of more luck, I actually got a bug from a different area of the site not related to board edits where elements were appearing in weird places on the screen only in certain browsers. That was an "aha" moment and fixing that bug actually resolved the board edit regression, and I was able to see my metrics kind of start to consolidate and the delta was disappearing. So actually I was able to even monitor as the deploy went out that..."Yay! People are now editing boards at the same rate inside and outside the experiment."

Just the value of experimenting and making sure that things don’t break, even when I couldn’t reproduce it I knew something was broken. That was pretty amazing and that probably saved a lot of really bad pinning experiences.

The takeaway here is just if you don’t have a framework in place, just something to think about is to try and:

Identify metrics that matter and just monitor those metrics after exposing a percentage of users to your new code.

That turned out to be really important when rolling out this thing at this scale.

Profiling & Performance.

Okay, and then profiling. So performance is really important, as well. What I noticed in my experiment is that the page load got a whole second slower, which is unacceptable. That’s really bad to take away the quality of the speed like that.

Again, debugging this was really important and solving this was really important and here are just some things I did. I profiled pretty much everything about my rendering system. I profiled in Python on the Python side like calling the new system, the network calls, all that stuff, how long they took.

I profiled the Node side, and there were two ways that I did it. One was just manual timing statements. I wanted to profile every line of code that I was pretty much writing to kind of compare between Jinja and Nunjucks what was slower. I noticed a really big chunk of time that was slower on the Python side between the two rendering systems, and was able to do a little bit of refactoring there to make that faster.

There’s also logging again, using StatsD. On the right side of the slide is the logging to an external stat service; this stat client is something that we have at Pinterest. We have a StatsD service that outputs graphs, so I’m able to put timing statements here that send the time out to the service, and it gives me a nice graph of how long these things are taking. Here you can see that "some stuff" in all four of these examples is just some arbitrary function that’s doing something. I’m putting timing statements right before it starts and right when it concludes, and printing out the time that it took.
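The slides are not reproduced in this transcript, so here is a minimal sketch of the same idea: take a timestamp right before and right after a piece of work and report the difference. The `timed` helper, the `report` callback and the metric name are assumptions for illustration; `timing_payload` uses the StatsD timer format (`<metric>:<milliseconds>|ms`).

```python
import time
from contextlib import contextmanager


def timing_payload(metric: str, elapsed_ms: float) -> str:
    """Format a StatsD timer line: <metric>:<milliseconds>|ms"""
    return f"{metric}:{round(elapsed_ms)}|ms"


@contextmanager
def timed(metric: str, report):
    """Time the enclosed block and hand (metric, elapsed_ms) to report."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        report(metric, elapsed_ms)


def some_stuff() -> int:
    """Arbitrary work standing in for a render call."""
    return sum(range(10_000))


if __name__ == "__main__":
    # Print locally; a real setup would send the payload to a stats service.
    with timed("render.nunjucks", lambda m, ms: print(timing_payload(m, ms))):
        some_stuff()
```

Wrapping the same block once for the old renderer and once for the new one, under different metric names, gives two timing graphs to compare.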

You’ll have to do some different things if you’re running asynchronous code or if you’re working with greenlets, or something like that. There’s a lot of profiling tools and Node flamegraphs are great for CPU stuff. You can Google all that stuff, there are a lot of different strategies for profiling, but this is the one I think I used the most to just kind of get a really quick profile of everything that was happening in my application.

The takeaway:

Just knowing how well my code performed against the old system and profiling to know what parts take how long is really important. What you can measure is often a great way of knowing what you need to fix.

Before this project, I had never profiled anything. I had never really worried about performance before, and when I saw this, I was pretty freaked out. People were saying "yeah, you can’t ship that that’s not going to fly", so having to solve that really forced me to learn about how to worry about the time that code takes to execute in a real life way, not just in a (inaudible) way.

I’ll take a break here and just see, does anybody have any questions at this point?

ERIK

As before, feel free to just unmute yourself or you can ask in the Q&A app.

PARTICIPANT

When you were doing your refactoring to get your time back down, what exactly kinds of refactoring did you have to do? Did you have to think about data structures and I’m going to use a hash because a hash is quicker, whatever. Is that the kind of thing you had to do?

JESSICA

No, actually I thought that might be something that I would have uncovered, like maybe I was using a bad data structure but it was actually very specific to Python, the language. One thing, with this context this might actually be worth explaining just for kicks. (Laughs)

Python, when it was running Jinja it’s just importing a Jinja package and it’s just running Jinja code. So it’s all in process, it’s all just Python calling Python. But when Python is calling Node it’s actually doing so over a network interface, so it’s actually making an HTTP call.

What happens is, with Jinja that’s like blocking code, you’re blocking the process when you’re calling Jinja. So when Jinja takes a long time to render that process is kind of blocked.

But, when Python calls Node, that’s a network call. So you can actually use greenlets to "faux parallelize" that. Basically I found an optimization where when Python calls Node I can actually fire off a whole bunch of greenlet workers to make parallel requests to Node. They would just come back and when those workers resolve I could do rendering that way. Whereas before, I was blocking on those calls. In a way, having Python actually call Node was good for the system because it was able to then fire off all these parallel requests asynchronously, whereas with Jinja it would have to wait until each one’s done.
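To illustrate the shape of that optimization, here is a small sketch that fires several simulated network calls at once instead of one after another. It swaps in the standard library's `ThreadPoolExecutor` for greenlets so that it runs without extra dependencies; the effect on I/O-bound calls is similar, but this is not the greenlet-based code from the project. `render_remote` and its 50 ms sleep are hypothetical stand-ins for an HTTP call to the Node service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def render_remote(template_name: str) -> str:
    """Pretend to ask a remote renderer for one template over the network."""
    time.sleep(0.05)  # simulated network latency
    return f"<div>{template_name}</div>"


def render_sequential(names):
    """Blocking version: total time is the sum of all the calls."""
    return [render_remote(name) for name in names]


def render_parallel(names, max_workers: int = 8):
    """Concurrent version: the waits overlap, results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render_remote, names))


if __name__ == "__main__":
    names = ["header", "board", "pin", "footer"]
    for fn in (render_sequential, render_parallel):
        start = time.perf_counter()
        fn(names)
        print(fn.__name__, f"{(time.perf_counter() - start) * 1000:.0f} ms")
```

The gain only exists because the work is a network wait. An in-process renderer that holds the CPU would not speed up this way, which matches the contrast drawn above between Jinja and the Node service.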

That was just something that I found out after talking to an SRE. He mentioned something about, "network calls and greenlets don’t have to block the thread." I was like, "wait! That’s what I’m doing with my system, I can totally use that", and that’s when I went back to my code and I tried it and I got a gain of like 850 milliseconds from that, so that was an awesome performance improvement. And actually, the hypothesis is that when we do switch from Nunjucks to React, the speed at which React server-side renders a template will more than make up for the remaining delta.

That’s kind of how we justified the speed regression. But I would have never known that thing about Python without talking to the SRE, who had done his own debugging, done something similar and had a presentation about that. It’s kind of serendipitous to be able to loop that back into the project. But yeah, thanks for asking.

ERIK

We will always be learning, always be listening.

Automated testing.

JESSICA

Exactly. So my last two things, automated testing - huge. The performance and the board edit, that stuff, that affects pinners and we definitely don’t want users to have a bad experience. At the same time while we’re rolling out this huge change we also don’t want developers to be having a bad experience, too.

We also don’t want developers to break our code. So there were actually a lot of things that I had to do to make sure that developers were kind of going along with the things that I needed them to do in order to make this successful.

There were actually a lot of template changes. I had to make codemods across the whole repository to make sure that these templates could be read by Nunjucks, and some of the templates were actually legacy templates that were only being read by Jinja. I’m not going to go into that, that’s just a whole other can of worms.

But basically, I had to make sure that a lot of developers were doing the right thing. There were usages of the template that they could have committed that would have broken my system completely.

While an experiment is going on, of course two different systems are running at the same time. So this percentage of users are going to see a perfectly fine experience, but the people that are in my experiment, if you do something that breaks my system, are going to have a bad experience, and therefore screw up the metrics. It may not actually be a reflection of the system, it may be just that some developer didn’t know this was the new right thing to do. That’s really important, because a lot of the time with a big organization of developers where you’re making a platform change, you do want developers to actually change the way that they’re used to developing, and it’s really difficult to go to each one and yell at them, or look over their shoulder and make sure they’re coding the right way.

Automated testing, I learned a lot at this company about linters, which yell at you even if you just try to build. It will just break your build because it will give an error that you can customize and you can say, "you did this wrong don’t do it, do this instead." It’s not really punitive it’s more actually educational for developers because then they can know what the new rule is very quickly and easily without having to read anything, they just kind of run into it.

Utilize linters.

Then also testing obviously is really important, so that was linters, and linters basically "lint" your code. There’s a lot of common linters out there like JSLint or ESLint that make sure your semicolons are in the right places and that all variables are declared for example, but ESLint especially has the ability for you to actually implement your own custom rules where you can create a rule that traverses the syntax tree and finds conditions where you can throw an error.

I actually wrote a Nunjucks linter that lints templates. There’s actually no "Nunjucks linter" out there so I wrote one and basically was able to add a whole bunch of rules to make sure your template was JavaScript-safe and not using any Python stuff that wouldn’t have worked in JavaScript.
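The linter itself is not shown in the talk. As a much simplified illustration of the idea, the sketch below scans template source line by line with regular expressions and reports Python-only constructs. The two rules are hypothetical examples rather than the actual rule set, and a real linter would more likely walk a parsed syntax tree than match text.

```python
import re

# Hypothetical rules: (pattern, message shown to the developer).
RULES = [
    (re.compile(r"\.iteritems\(\)"),
     "uses a Python dict method that will not exist when rendered in JavaScript"),
    (re.compile(r"\.startswith\("),
     "uses a Python str method that will not exist when rendered in JavaScript"),
]


def lint_template(source: str):
    """Return (line_number, message) for every rule violation found."""
    problems = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for pattern, message in RULES:
            if pattern.search(line):
                problems.append((line_no, message))
    return problems


if __name__ == "__main__":
    template = "{% for k, v in board.iteritems() %}\n{{ k }}\n{% endfor %}"
    for line_no, message in lint_template(template):
        print(f"line {line_no}: {message}")
```

Running a check like this as part of the build means a developer sees the message at the moment they introduce the problem.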

Learning about linters, writing a linter was awesome. That was really cool, and then learning about automated testing was really cool, too, because again we want to make sure the templates look the same – no matter what system we’re using. I had an automated test that went through every single template every time you tried to commit a change and made sure they were all the same, and if they weren’t, it would throw an error because that means that would have broken something in my experiment.
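A parity check of that kind might look roughly like the sketch below: render every template through both systems and fail if any output differs. The function name, the shape of the `templates` mapping and the renderer callables are all assumptions for illustration.

```python
import difflib


def find_mismatches(templates, render_old, render_new):
    """Render each template both ways; return {name: diff} where they differ.

    templates maps a template name to the context used to render it.
    """
    mismatches = {}
    for name, context in templates.items():
        old = render_old(name, context)
        new = render_new(name, context)
        if old != new:
            diff = difflib.unified_diff(
                old.splitlines(), new.splitlines(),
                fromfile="old", tofile="new", lineterm="",
            )
            mismatches[name] = "\n".join(diff)
    return mismatches


if __name__ == "__main__":
    # Toy renderers: the "new" one has a deliberate bug on one template.
    old = lambda name, ctx: f"<h1>{ctx['title']}</h1>"
    new = lambda name, ctx: "<h1></h1>" if name == "pin" else f"<h1>{ctx['title']}</h1>"
    report = find_mismatches(
        {"board": {"title": "A"}, "pin": {"title": "B"}}, old, new
    )
    for name, diff in report.items():
        print(name)
        print(diff)
```

Hooked into a commit check, a non-empty result would block the change and show the developer which template diverged and how.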

So this is great, just learning about how to make things automatically do things for you (laughs) is really great.

Takeaway:

Automate your yelling. If developers try to land something that will break your system, they should be prevented by tests and linters.

Then finally the last one:

Monitoring & Alerts.

This is like literally a graph, this is the gist of what a graph looks like when something is going really wrong. Basically, when we spawned a new service like Node, it’s another process that’s serving templates. That’s just a new process to kind of worry about. What I needed to do was, I needed some way to track and make sure across the many, many hosts that are serving Pinterest.com that all the Node processes on them are really healthy.

How do you do that? I again used an external graphing service to just increment once when the Node service was giving back a response, like a 200 successful response or 500 error response. Once that got released into production I started getting numbers back. I set a threshold to make sure that this is the normal number of 200s that I should be expecting at any given point in time. If it ever goes down below that there’s something wrong.

Again I did that also, it wouldn't be shown here but a different kind of graph where it's like: "here are the number of 500s that we deem acceptable, if it spikes there’s something wrong." Any time that threshold is breached, I would get a call by a robot. (Laughs) PagerDuty is a service that we use where it will tell me, "hey, watch out, Node's going down!" Or "watch out, somebody committed some code that got past our tests and linters that is causing 500 errors to spike so you better watch out for this."
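The two threshold rules can be sketched in a few lines. The function name, the message wording and the numbers in the usage example are made up for illustration; in practice the counts would come from the graphing service and the alerts would go to a paging service.

```python
def check_health(ok_count: int, error_count: int,
                 min_ok: int, max_errors: int) -> list:
    """Compare one time window's response counts against the thresholds."""
    alerts = []
    if ok_count < min_ok:
        alerts.append(
            f"200 responses dropped to {ok_count} (expected at least {min_ok})"
        )
    if error_count > max_errors:
        alerts.append(
            f"500 responses spiked to {error_count} (acceptable up to {max_errors})"
        )
    return alerts


if __name__ == "__main__":
    # Hypothetical window: too few successes and too many errors.
    for alert in check_health(ok_count=120, error_count=40,
                              min_ok=1000, max_errors=10):
        print("ALERT:", alert)
```

Both checks are needed: a service that is down produces few 500s and few 200s, while a bad deploy can keep 200s near normal and still push 500s over the line.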

The takeaway here is just:

Make sure you can see when errors happen on production and what "normal" looks like. When something abnormal happens, make sure you know about it no matter the hour.

There was one time an Amazon service went down and it had nothing to do with our code, had nothing to do with our infrastructure. It was just a vendor that we rely on that was down for a little bit. I got a call at 11 PM and was like, "oh my god, what happened?" It was nice to know that it was working.

On the flipside of that, before I had all this implemented I did have an experiment that was specific to SEO so whenever we touch HTML we should do an SEO experiment because we want to make sure that Google still likes us after our change.

What happened was everything was going great, the bots that were in my experiment were happy. But then suddenly over the weekend, again before I had any of this monitoring happening, somehow – I didn’t even know until Tuesday morning, I’m ashamed to say – but I saw these metrics were just totally bombing. I saw this like right during a deploy and for some reason, something in that deploy fixed it and I’ll never know what.

I had no monitoring, and the logs were only being kept around for like 10 hours; we’ve since changed that to a week. (Laughs) The errors that were happening were just long gone. I’ll just never know, it’s going to haunt me to this day what happened that weekend. There was nothing to debug, there was nothing to look at and nothing let me know that happened. Those bots took a while to recover too, so they weren’t happy and I wasn’t happy, and the SEO team wasn’t happy.

It’s just important when you make changes like these to know what normal is and make sure that you know what happened, what to do when things go bad.

ERIK

Even good code goes bad.

JESSICA

Yeah.

ERIK

Thank you so much, that’s awesome. I think we have time here for a last little batch of questions from anyone out there. Who’s up?

PARTICIPANT

So coming from freelancing and coming into a team, you obviously ran into more large problems with a larger project.

Were there times when it was overwhelming but then also times that you felt like you had more support from a team? I have been freelancing for a couple years now and a lot of this is very familiar to me on a small scale. Middle of the night or middle of the day I’m doing something else, I’m working my other job and there’s a problem. So I’m just curious how is moving into more of a team development?

JESSICA

Yeah, I feel ya! It is like night and day. Freelancing was so crazy. I remember working through like Christmas, there’s no mercy. When things go down it’s just on you and it’s great because you have the freedom – you don’t have to deal with office politics, you don’t have to deal with land grabbing, you don’t have to deal with a lot of that stuff. But at the same time, on a project like this where it was a huge platform change and a lot of people were kind of waiting for this. It’s a lot of pressure, too.

So it was great to have the support of the people. I, again, had never done a big project like this before and people trusted me to do it, and I am forever grateful for that. I think I learned so much. At a freelance level or an agency, it’s kind of like "sink or swim", right? You’re doing it or, even at an agency, they pick the people who can do it; they’re not going to take risks on somebody who maybe hadn’t done this before or something like that, sometimes.

I would say there’s something cool about having extra resources that can help you and support you. Whereas as a freelancer I feel like you are so responsible for everything and that’s a great opportunity for growth but it’s also very like "Aargh! I’ve got to do this." The support is really nice and I think that’s kind of why I sought out a team more, yeah. Maybe one day when I’m more comfortable I might go back to freelancing but right now I feel like I have so much to learn still.

PARTICIPANT

Thank you.

ERIK

Alright. I do hate to cut it short, but we are actually at the end of our time here. I think typically the last thing that we cover here is actually from you which is if you put yourself in the shoes of someone who is just getting started, who’s climbing the learning curve, who is pushing that boulder up the hill before it starts rolling down again (if it ever does), what advice do you offer someone who’s in that position?

JESSICA

Well I’m still here. (Laughs) I feel like I’m still doing that. I think - I don’t know, it’s hard for me to say.

Just learning all you can and just that everyone is going to do it differently. I would say that there is an element, especially at a company. I guess the pros of a freelancer is that you’re on your own – you’re your own bar – you kind of like have your own bar. At a company like this I’m always kind of looking around and seeing. I’m inspired by so many people around me, so just remembering that everyone’s journey is different and it’s pretty exciting, learning, keeping up with everything that’s going on. Just enjoy it, enjoy the curiosity and enjoy the journey. I think it will take you, as long as it still gives you joy, I think it will take you places in your career.

ERIK

Beautiful. That’s a great note to go out on. Jessica thank you so much for joining us. Everyone else here, everyone who’s out there, you guys thank you. Thank you all, we will be back when we’re back, this is a regular event. Join us next time.

Jessica, if people want to thank you, how do they get in touch, if you’re comfortable putting anything out in public, your Twitter or anything like that?

JESSICA

Sure, my website is JessicaChanStudios.com. There’s a contact me link there, there’s Twitter @MissyJCat. Feel free to tweet me, I’m actually not on Twitter that much, but I’ll be more diligent about it.

ERIK

Okay. We’ll post up the slides and the transcript as well on the blog, so visit VikingCodeSchool.com/blog and you’ll see that. Thanks a lot everyone and have a great night!

Contacting Jessica

Personal Site

Twitter

Jessica's Presentation Slides
