When I wrote my last update in September, the days of summer were retreating and I was sleeping on an air mattress on the floor of my new Brooklyn apartment, waiting for my household goods to arrive from California and, more importantly, quietly begging for the summer heat to subside. I got my wish, and was rewarded with one of the coldest winters in my personal history. On my frigid morning walks to work past Brooklyn brownstones, my two key accessories have been a scarf I bought on Etsy and a messenger bag to carry the tools of my trade (a laptop). The ritual of wrapping the scarf and putting my laptop in the messenger bag each morning reminds me of how special Etsy is and how important it is for me and my team to deliver a stable, high-performing web experience for the buyers and sellers who pour their passions into their work and our community every day. The work of delivering that experience is never done and we are continually making improvements, but we have made progress on many fronts, all while Etsy has continued to grow rapidly and welcome lots of new talent into the Engineering Team.
Monitoring and Communication
I realized early on that the Engineering Team needed to communicate more transparently with the buyers and sellers on Etsy. We also needed to improve our systems monitoring so that, when issues did arise, we would immediately understand the underlying problem and could communicate it appropriately and reliably. When I arrived at Etsy, the systems for proactively monitoring the site were very limited and the approach was primarily reactive. After a lot of hard work and planning, we rolled out a sophisticated monitoring system in November that today enables us to keep an eye on 700 services running on over 170 pieces of hardware, including servers, network gear and storage systems. When there is an issue with one of our systems, our engineers are proactively alerted. The monitoring system performs over one million automated checks every day (about 12 checks per second) and notifies us of any problems 24×7. The end result is that we know about systems problems faster and typically fix systems and networking problems before users notice them.
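For readers who like to check the math, here is a small sketch of where "about 12 checks per second" comes from. It uses only the round figures quoted above (one million checks a day, 700 services). The function names are made up for this illustration, and the per-service number assumes the checks are spread evenly across services, which is a simplification and not a description of how our monitoring is actually configured.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def checks_per_second(checks_per_day: float) -> float:
    """Average check rate implied by a daily total."""
    return checks_per_day / SECONDS_PER_DAY


def checks_per_service_per_day(checks_per_day: float, services: int) -> float:
    """Daily checks per service, ASSUMING an even spread (illustrative only)."""
    return checks_per_day / services


if __name__ == "__main__":
    rate = checks_per_second(1_000_000)
    per_service = checks_per_service_per_day(1_000_000, 700)
    print(f"{rate:.1f} checks per second")                      # 11.6 checks per second
    print(f"about {per_service:.0f} checks per service per day")  # about 1429
```

Under that even-spread assumption, each service would be checked roughly once a minute, since a day has 1,440 minutes.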
Occasionally, we experience a problem that we haven’t seen before, and when that happens we add a detection mechanism for it to our monitoring system. The monitoring system has become the central nervous system for Etsy’s site operations team and improves every day as we make it smarter. We also use a third-party service from Gomez to proactively measure page load times for Etsy at various points around the world. Gomez (which is used by 14 of the top 15 most visited sites on the Web) is an incredibly rich information source: it can tell us how fast Etsy’s home page is loading right now in Los Angeles, how fast it loaded last week in Madrid, or what the worldwide average was last Tuesday.
While we have made strides in monitoring, sometimes our various monitoring systems indicate problems that can’t be fixed immediately and cause inconvenience or disruption for you, so we have been posting updates to fix.etsy.com to keep the community informed. We use fix.etsy.com because it is hosted separately from our main site. In the event that our main site is unavailable and we need to update the community, we can post updates there. We will continue to use fix.etsy.com as a place to post about issues that affect large numbers of Etsy users.
In some cases, the issues we discover as we rearchitect Etsy’s systems, database, network and hardware require disruptive maintenance to correct. We know that downtime without warning is disruptive for both buyers and sellers, so in late January we announced monthly scheduled maintenance times for the rest of 2009. I hope this approach makes it much easier for buyers and sellers to plan. While we may still need to take the site or particular services offline in emergency scenarios, we will do our best to limit any disruptive maintenance to these windows. When we do have to take functions offline, we will always provide as much notice as the particular situation allows. I would like to thank you in advance for your patience in those situations as we make Etsy better. I watch services like Twitter very closely during our maintenance windows and know the sense of withdrawal that even planned downtime creates, so we take downtime very seriously, planned or not.
Performance and Scalability
In Maria’s most recent Talking Shop update, you might have noticed that site performance was in our list of priorities. Site performance is not a product per se, but it is fundamental to user experience on the Web. We consider site speed a critical feature of Etsy. In order to make and keep the site fast, we are constantly tuning our systems to seamlessly meet the increased demand on them. As a site grows, what worked well in the prior six months may not work well now, or may not work at all. In technology circles, you often hear engineers talk about “scalability” along with performance. In simple terms, a scalable Web site is one that can maintain its existing performance as the number of users, transactions and page views climbs. In Etsy’s case, our performance as measured by our own systems and Gomez has vastly improved in the past six months even as site usage has grown considerably. In other words, the changes we have made to our systems and approach are demonstrating scalability. The numbers from Gomez, our third-party measurement service, tell the story in clear, measurable terms. Our home page now loads 2-3 times faster in most locations around the world compared to October, and as much as nine times faster in some places. Our average home page load time over a 24-hour period as measured by Gomez in the US was 4.6 seconds in October, and today it is 1.5 seconds. Singapore? From 18.6 seconds in October to 2.2 seconds today.
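If you want to see how the "times faster" figures relate to the load times, the calculation is simply the old load time divided by the new one. The sketch below uses only the two pairs of numbers quoted above; the function and variable names are invented for this example.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster a page loads: old time divided by new time."""
    return before_seconds / after_seconds


# Average home page load times in seconds (October, today), as quoted above.
LOAD_TIMES = {
    "US": (4.6, 1.5),
    "Singapore": (18.6, 2.2),
}

if __name__ == "__main__":
    for region, (october, today) in LOAD_TIMES.items():
        print(f"{region}: {speedup(october, today):.1f}x faster")
    # US: 3.1x faster
    # Singapore: 8.5x faster
```

The US figure sits at the top of the 2-3 times range, and Singapore is one of the places approaching nine times faster.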
We achieved some of these improvements by using the services of Akamai, a web infrastructure company. Akamai directs 25% of the world’s Web traffic using 40,000 servers in 70 locations around the world. When you load Etsy.com in your browser, the images and other pieces of our pages are served via Akamai’s worldwide network from a server that is nearest to you. For buyers and sellers on Etsy, that means Etsy is now almost as fast in Paris, France as it is in Paris, Texas.
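To give a feel for what "a server that is nearest to you" means, here is a deliberately simplified sketch that picks the closest server by straight-line distance on the globe. This is not how Akamai actually routes requests (real content delivery networks also weigh network conditions and server load), and the server list and coordinates are hypothetical, chosen only for the illustration.

```python
import math

# HYPOTHETICAL edge locations (latitude, longitude); not Akamai's real network.
EDGE_SERVERS = {
    "new-york": (40.71, -74.01),
    "dallas": (32.78, -96.80),
    "frankfurt": (50.11, 8.68),
    "singapore": (1.35, 103.82),
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points on Earth."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_edge(lat: float, lon: float) -> str:
    """Name of the edge server geographically closest to the visitor."""
    return min(
        EDGE_SERVERS,
        key=lambda name: haversine_km(lat, lon, *EDGE_SERVERS[name]),
    )


if __name__ == "__main__":
    print(nearest_edge(48.86, 2.35))    # Paris, France -> frankfurt
    print(nearest_edge(33.66, -95.56))  # Paris, Texas  -> dallas
```

The point of the sketch is only that a visitor in either Paris gets content from somewhere close by, instead of every request traveling to one central location.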
Taken together, the changes outlined above are helping make Etsy faster and more stable for you as our traffic continues to grow. These improvements were especially evident during the holiday shopping season, and I was very pleased that our site was fast and stable during this critical period for buyers and sellers (this was not true of every Web site). I would like to thank the Etsy Engineering Team for its efforts to make this possible.
Search and Developer API
When I arrived in September, Etsy’s search function was unacceptably slow, plain and simple. As I wrote in that update: “Some searches are taking as long as 60 seconds to return results and others are timing out altogether.” We began work on Search immediately. You should have noticed speed improvements in search and across the whole site beginning in November as we improved existing search and eventually migrated the backend entirely to a widely used open source platform (Solr) in January. Today, search on Etsy is faster than ever and we have measured 8-30x performance increases since October based on our Gomez measurements. We have solved the Search speed problem. There is more to come with search, as described by Sean Flannagan on Etsy’s Product Team.
As I wrote in early February, the Etsy API has also been released to a small group of developers and a public beta launch will follow soon. I’m looking forward to letting developers outside of Etsy access the infrastructure that we are working so hard to build.
While I am pleased with some of the progress we have made on infrastructure, we still have a lot of work to do. (Of course, the engineering team is only partly about infrastructure – we have been working closely with the Product Team to develop a number of new features, but I am going to leave it to my colleague Sara Hicks to tell you about that in an upcoming article.) Looking ahead, our goals in the Engineering Team are very simple: to continue to refine Etsy’s technology architecture towards a high-performance and scalable model, and to deliver the new features to buyers and sellers in the areas described in Maria’s update on Etsy’s priorities for 2009. The Engineering Team at Etsy and I are really excited about the rest of the year, and we look forward to continuing to make Etsy even better for all of you.
P.S. If you want to get breaking, vital Etsy news as it happens, delivered straight to your email inbox, sign up for Etsy News Alerts emails. Read more about it here. If you already subscribe to the list, expect this update as an Etsy Alert email in your inbox.
You can also get an RSS feed of all TechUpdates here.