JavaScript: Knowing which apps you’re breaking using static analysis

This post was featured in JavaScript Weekly #213.

When you join large projects and fix bugs or build features, you typically rely on senior (in terms of time at the company) engineers to say “don’t forget to test feature X (that you probably didn’t know existed).”

Being new, you have limited knowledge of the entire architecture. Our tools should help fill this gap in knowledge. Our tools should help us traverse and understand large application architectures: the primary reason why I built and continue to work on the Sublime Dependents plugin.

I recently implemented a new feature that allows you to better understand which features or pages would break with a change to a module by showing you which app entry points depend on that module. In this post, I’ll talk about some of the tools and performance considerations that went into building this feature.

Motivation

At Bēhance, we have a multi-bundle JavaScript architecture: where each feature/page has its own (built via rjs) bundle. This allows us to avoid a monolithic bundle and only ship code that’s necessary for a given page’s features.

Sublime Dependents already has the ability to see which files directly require/import the current file that you’re modifying, but that’s not enough. Often, it’s not just a question of which modules use this module, but which apps/features use this module.

Dependents of a Dependent’s Dependents

What we’d like to do is uncover each layer of the dependency tree – starting with the module that we’re modifying. The feature for finding the dependents of the current file does just that. However, we’d have to perform this once for the current file, then for every one of those dependents. We’d repeat this process recursively until we hit scripts that are entry points (i.e., driver scripts) to the application. The entry points that were reached in the dependency tree of the current module are the “relevant entry points” that the feature should display.

This is an outward approach to finding the results – starting from the module and going out to the ends of the dependency tree until we hit roots. Though this would work, it would take a large number of redundant iterations across all JS files to uncover each layer of the outwardly-growing tree.

Even with the continued performance enhancements that I’m making to get the “finding of the dependents” to be as fast as possible, that execution time (currently ~2s) is multiplied by a large factor with this approach.

Going from the Outside in

Another approach is to start from the total set of driver scripts and work our way in to find the current module. This method requires a few steps:

  1. Programmatically find all driver scripts (i.e., entry points)
  2. For each driver, generate its dependency tree
  3. If the current module is in the dependency tree for a driver, that driver should be in the set of “relevant” drivers

Each step resulted in a new npm module:

  • app-root for automatically finding the application’s entry points
  • dependency-tree for generating the dependency tree for a given module
  • taxicab for a (relevant) driver(s)

Programmatically Finding the Roots of an Application

I wrote about how to do this when building YA to auto-generate the configuration for grunt’s requirejs and browserify plugins. The gist of the algorithm is that a root is a module that no other module requires.

It turns out that this simple definition yielded a new edge case that I hadn’t thought of when building YA: test files are also roots.

Every test file is a module that has its own dependency tree (typically pulling in the module its testing plus all of that module’s dependencies) and no other module requires/imports that test.

You could sniff the AST of a file to determine if it’s a test (looking for CallExpressions with names like ‘it’, ‘describe’, ‘beforeEach’, and ‘afterEach’. However, I’ve opted to ignore these false positives for now. Despite the fact that the dependency trees for these false positives have to be generated in step 2, the way in which the dependency trees are computed yield no redundant computations.

Generating the Dependency Tree of a Module

The algorithm for this is pretty textbook: a recursive, graph traversal – collecting the dependencies (via precinct) of every visited module until we hit leaf nodes (i.e., modules that have no dependencies). Dependency trees can also have cycles – which need to be avoided during the traversal.

If you took every found driver script from step 1 and ran this tree generation algorithm, you’d perform a lot of redundant traversals and dependency extractions – particularly across core, heavily-reused application modules. The solution for this is also pretty textbook: memoization.

On our way back up from the recursion we remember the dependency tree so far for each visited module. This uses a bit of memory (caching potentially long lists of absolutely-pathed strings), but that’s the tradeoff to get this process as fast as possible.

At Bēhance, we have ~75 driver scripts, so you can likely imagine how slow a non-memoized version of this process would be.

The memoization works well because the trees are serially generated for each driver. I always lean towards parallelizing algorithms, but we’d definitely risk race conditions here: where redundant tree generations occur because of one process didn’t see a module in the cache right before another process added that module (and its tree) to the cache.

Finding Relevant Entry Points

The taxicab module coordinates steps 1 and 2. It creates the cache object that’s used to memoize the tree generations across all drivers. We then loop through the trees looking for a particular module and collect the names of the drivers that have trees where the module is a member.

As currently described, this feature takes anywhere from ~35 to 50 seconds to complete about the Bēhance codebase (around ~1000 JS modules). This is due to the app-root module; I need to dig deeper to see what’s causing that. However, Bēhance coincidentally uses AMD and RequireJS – where it’s possible to bypass programmatically finding the roots by supplying the RequireJS build configuration file. If you’re unfamiliar, that json file is used by the rjs optimizer to build out the bundles. That file supports a “modules” list where you manually describe the bundles to generate (corresponding to entry points of your application).

With the list of modules from the build config, this process takes ~5s. So yeah, it’s obvious where the ~30 to 45 second difference lies: app-root. Though, small codebases will still be fast and if your architecture is that of a single-bundle, you won’t really find value in this feature.

Shipped

This feature works for all supported module types (AMD, CommonJS, ES6, and even SASS stylesheets) thanks to precinct. Now you’ll be able to see which apps/features/pages will be affected by changes to a particular module. Try it out and feel free let me know what you think.

Cheers and thanks for reading!