How we do visual regression testing
We first investigated visual regression testing in 2012, starting with Wraith and subsequently forking it to use Selenium WebDriver rather than PhantomJS. We’ve also tried Backstop and PhantomCSS, but neither stuck in our workflow. A major stumbling block was deciding when a frontend engineer should run the tests. Ad hoc during feature development? Before submitting a pull request for a feature? On the develop branch once a feature is merged? Nightly? Maybe a QA automation engineer should be running these instead, as part of a test suite?
After a quick brainstorm with our engineering team, we decided that we needed the following basics:
- Separate tools for taking screenshots and reporting differences, because there are so many ways to define and set up a test (load a single URL; crawl a whole site; target a specific DOM element; change DOM or session state first, etc.).
- Use our preferred tool, Selenium, so that we can run tests across multiple browsers and platforms.
- Negligible setup for each new project.
- Simple integration with continuous integration tools (we use Jenkins) so that developers do not need to run tests manually.
- Integration with existing test automation tools to reduce the overhead of maintaining yet another test suite.
So we made Spectre and it’s open source.
Spectre is a Ruby on Rails application that manages your visual regression test suites. It provides an API for runner scripts to submit screenshots and receive a pass or fail in real time, and a simple UI for browsing and inspecting diffs.
Here’s how we use it:
Each of our projects has a test runner (usually a Rake task), triggered nightly (1), that contains a list of URLs or is pointed at a styleguide to crawl. The script uses our Selenium grid (2) to load the URLs and snap a full-height screenshot of each at multiple viewport widths (3), then posts the result to Spectre along with other metadata such as test name, viewport width and source URL (4).
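As a rough illustration of the first half of that flow, here is a minimal sketch of how a runner might expand a URL list into one screenshot job per viewport width. The widths, URLs and the `ScreenshotJob` and `build_jobs` names are all assumptions made up for this example; the actual browser automation and the upload to Spectre are left out.

```ruby
# Hypothetical sketch: expand a list of named URLs into one screenshot job
# per viewport width. Nothing here talks to Selenium or Spectre.

# Assumed viewport widths; a real project would choose its own breakpoints.
VIEWPORT_WIDTHS = [320, 768, 1280].freeze

# The metadata that travels with each screenshot: test name, source URL, width.
ScreenshotJob = Struct.new(:name, :source_url, :width)

# Returns one job for every combination of URL and viewport width.
def build_jobs(urls, widths = VIEWPORT_WIDTHS)
  urls.flat_map do |name, url|
    widths.map { |width| ScreenshotJob.new(name, url, width) }
  end
end

# Placeholder URLs on a QA environment, invented for this example.
urls = {
  "Homepage" => "http://qa.example.test/",
  "Buttons"  => "http://qa.example.test/styleguide/buttons"
}

jobs = build_jobs(urls)
jobs.each { |job| puts "#{job.name} @ #{job.width}px -> #{job.source_url}" }
```

Two URLs at three widths give six jobs, each carrying the metadata that would accompany its screenshot.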
Spectre ingests the screenshot and compares it against the screenshot from a previous test of the same name. If the images are sufficiently different, the test fails (5) and the Rake task reports a Jenkins build failure. Jenkins notifies the team via HipChat (6).
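To make "sufficiently different" concrete, here is a minimal sketch of one possible rule: count the pixels that differ and fail the test when their share exceeds a threshold. This is an assumption for illustration, not a description of Spectre's actual comparison. Images are modelled as plain arrays of `[r, g, b]` pixels, and the `diff_percentage` and `passes?` names and the 0.1% default threshold are hypothetical.

```ruby
# Hypothetical sketch: an image is an array of rows, each row an array of
# [r, g, b] pixels. A real tool would decode PNG files instead.

# Returns the percentage of pixels that differ between two images.
def diff_percentage(baseline, candidate)
  same_shape = baseline.size == candidate.size &&
               baseline.zip(candidate).all? { |a, b| a.size == b.size }
  # Images with different dimensions are treated as completely different.
  return 100.0 unless same_shape

  total = baseline.sum(&:size)
  return 0.0 if total.zero?

  changed = baseline.zip(candidate).sum { |row_a, row_b|
    row_a.zip(row_b).count { |pixel_a, pixel_b| pixel_a != pixel_b }
  }
  changed * 100.0 / total
end

# A test passes when the share of changed pixels stays within the threshold.
def passes?(baseline, candidate, threshold: 0.1)
  diff_percentage(baseline, candidate) <= threshold
end

white = [255, 255, 255]
black = [0, 0, 0]
baseline  = [[white, white], [white, white]]
candidate = [[white, black], [white, white]]

puts diff_percentage(baseline, candidate)
puts passes?(baseline, candidate)
```

One changed pixel out of four is a 25% difference, so this comparison fails. A Rake task could call `abort` on any failure so that the non-zero exit status marks the Jenkins build as failed.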
Static and dynamic content
Any change, even a simple content change, will naturally register as a regression. So we only ever test against static content in a controlled QA or demo environment. Never against a production website. We don’t want our clients making CMS changes that result in false positives and noise for the development team.
But while testing against a frontend styleguide or set of HTML templates is all fine and dandy, and will catch the most obvious regressions, it doesn’t guarantee complete coverage. The main reason we developed Spectre in the way we did was to completely integrate visual regression testing into our existing testing toolsets of choice: RSpec and Cucumber.
Frontend and QA, sitting in a tree, visually
Friday’s QA automation engineers maintain large test suites that sit neatly in our CI pipeline. These are generally implemented as Gherkin scenarios brought to life using Cucumber step definitions written in Ruby. So we created the Spectre Ruby gem to make it easy to bank a screenshot while end-to-end tests run and an application’s state is manipulated.
Landed on a product page? Take a screenshot. Opened an accordion? Take a screenshot. Added four products to your basket? Take a screenshot. Submitted a lead form to Salesforce? Take a screenshot. The result is *full* visual regression coverage, from static frontend styleguides through to end-to-end integrations with content management systems, CRMs and payment gateways. No stone left unturned.
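The pattern can be sketched as a small helper that a step definition calls whenever the application reaches an interesting state. Everything below is hypothetical: `ScreenshotBank` is not part of the Spectre gem, and the two lambdas are stand-ins for a real browser driver and a real Spectre client so that the sketch runs on its own.

```ruby
# Hypothetical helper: wraps "capture the page and submit it" so that a step
# definition can bank a screenshot in one line.
class ScreenshotBank
  attr_reader :results

  # capture: callable that takes a viewport width and returns image data.
  # submit:  callable that sends the image plus metadata and returns a result
  #          hash containing a :pass key.
  def initialize(capture:, submit:)
    @capture = capture
    @submit = submit
    @results = []
  end

  # Captures the current page and submits it under the given test name.
  def bank(name, width:)
    image = @capture.call(width)
    result = @submit.call(name: name, width: width, image: image)
    @results << result
    result
  end

  # Results whose comparison did not pass.
  def failures
    @results.reject { |result| result[:pass] }
  end
end

# Stubs standing in for a browser driver and a Spectre client.
fake_capture = ->(width) { "png-bytes-at-#{width}" }
fake_submit = lambda do |name:, width:, image:|
  # Pretend that only the basket screenshot differs from its baseline.
  { name: name, width: width, bytes: image.size, pass: name != "Basket with four products" }
end

bank = ScreenshotBank.new(capture: fake_capture, submit: fake_submit)
bank.bank("Product page", width: 1280)
bank.bank("Accordion open", width: 1280)
bank.bank("Basket with four products", width: 1280)

puts bank.failures.map { |result| result[:name] }.inspect
```

In a Cucumber suite, each `bank` call would sit inside a step definition, directly after the step that puts the application into the state worth capturing.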
So many uses
We currently run 783 nightly tests across just two projects. Some examples of where Spectre has already been immensely useful:
- Day-to-day maintenance of large-scale component libraries with many concurrent frontend and QA engineers.
- Screenshotting a local site after making large-scale typography changes for client review.
- Performing a full regression of 120 components and 40 HTML templates after swapping ruby-sass for libsass, to ensure that differences in float rounding in the CSS output would not impact layout.
- Performing a full site regression after refactoring a Handlebars data binding implementation.
Give Spectre a try and let us know how you get on!
Are you an engineer looking for a new home? Want to see what else we can do? Why not drop us a line at firstname.lastname@example.org.