Navigating an AMD codebase in Sublime Text using Static Analysis and Node.js – Mr. Joel Kemp

This post was featured in Node Weekly #43

When you have a large JS codebase that uses the AMD module pattern, like we do at Bēhance, it becomes tedious to perform certain tasks within your editor. This hurts productivity and adds up to wasted developer time in the long term. Here are some of the problems and possible/existing solutions.

  1. Jumping to the module(s) that requires the current module.
  2. Jumping to a dependency of the current module.
  3. Jumping to the dependency that a function (constructor or utility) maps to.

I’ll focus on problem 1 in this post, but will briefly touch on the process for solving problems 2 and 3.

Assume the following module definition as an example:

```javascript
define([
  'path/to/my/util'
], function(util) {
  // ...
  util('foobar');
});
```

2. Jumping to a dependency of the current module

Problem 2 involves looking at the module’s dependency list (['path/to/my/util']), reading the dependency’s path, pressing CMD + P to bring up the Goto Anything panel, typing in part of the path, and then navigating to or clicking on the file you want.

A more effective solution would be to hold a key, click on the dependency’s path within the dependency list, and be taken immediately to that file. I know of no Sublime Text plugin that does this, though the functionality is common in more full-fledged IDEs like Eclipse. No, the solution isn’t to start using Eclipse or WebStorm. Update: Version 1.1.0 of Sublime Dependents now does something similar.

Similarly, for CommonJS modules, you should also be able to click on the file path within a require call and, for custom modules (not npm-installed packages or core modules like path and fs), be taken to that file.
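The core of such a “click to jump” feature is mapping a dependency string to a file on disk. Below is a minimal sketch, assuming a RequireJS-style setup where AMD ids resolve against a single baseUrl (no `paths` or `map` configuration) and every module lives in a `.js` file. `resolveDependencyPath`, `currentFile`, and `baseUrl` are hypothetical names; `builtinModules` is Node’s real list of core module names.

```javascript
const path = require('path');
const { builtinModules } = require('module');

// Hypothetical helper: turn a dependency id into an absolute file path.
// Assumes ids resolve against one baseUrl and map directly to .js files.
function resolveDependencyPath(depId, { currentFile, baseUrl }) {
  // Core modules (fs, path, ...) have no project file to jump to.
  if (builtinModules.includes(depId)) return null;

  const withExt = depId.endsWith('.js') ? depId : depId + '.js';

  // Relative ids ('./foo', '../bar') resolve against the requiring file.
  if (depId.startsWith('./') || depId.startsWith('../')) {
    return path.resolve(path.dirname(currentFile), withExt);
  }

  // Bare ids: AMD ids resolve against baseUrl. Without a baseUrl (the
  // CommonJS case), a bare id is usually an npm package, so we skip it.
  return baseUrl ? path.resolve(baseUrl, withExt) : null;
}
```

For example, `resolveDependencyPath('path/to/my/util', { currentFile: '/proj/js/app/main.js', baseUrl: '/proj/js' })` points at `path/to/my/util.js` under `/proj/js`. A real project would also need to honor its loader configuration.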

3. Jumping to the dependency that a function (constructor or utility) maps to

Imagine that util is used somewhere deep in the module’s definition. Most often, we’d like to see more of util’s API to understand how or why it’s used. To do this, we have to figure out the path to the module that defines/exports util, perhaps by hovering over the tab’s name or revealing the file in the left side panel to deduce its path. We then need to look at the current module’s factory function, which has util as a parameter, and positionally trace that parameter back to the matching string in the dependency list (i.e., if util is first in the parameter list, then its path must be first in the dependency list).
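That positional trace is mechanical enough to automate. Here is a minimal sketch, assuming the dependency strings and parameter names have already been extracted from the define call; `pathForParam` is a hypothetical name.

```javascript
// Hypothetical helper for the dependency-list form: AMD passes the module
// for deps[i] as the factory's i-th parameter, so the index of a parameter
// name is also the index of its path in the dependency list.
function pathForParam(deps, params, name) {
  const index = params.indexOf(name);
  // -1 means no such parameter; index >= deps.length means the parameter
  // has no corresponding dependency string.
  return index !== -1 && index < deps.length ? deps[index] : null;
}
```

With the example module above, `pathForParam(['path/to/my/util'], ['util'], 'util')` returns `'path/to/my/util'`.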

Even if you used the factory form of AMD (below), you’d still have to map the variable to a path.

```javascript
define(function(require) {
  var util = require('path/to/my/util');
  // ...
  util('foobar');
});
```

A more effective solution would be to hold some key and click on the variable util, and have tooling parse the current module into an AST (using a parser like esprima), find the node that assigns a required module to util, read the dependency’s path from that require CallExpression, and then jump to that path.
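A minimal sketch of the “find the node that assigns a required module to util” step, assuming an ESTree-shaped AST such as the one esprima’s `parseScript` produces. The tiny `walk` function stands in for a real traversal library, and `requirePathForVariable` is a hypothetical name; it only handles the `var name = require('literal')` pattern.

```javascript
// Minimal ESTree walker: visits every object that has a string `type`.
function walk(node, visit) {
  if (!node || typeof node.type !== 'string') return;
  visit(node);
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) value.forEach((child) => walk(child, visit));
    else if (value && typeof value === 'object') walk(value, visit);
  }
}

// Return the path in `var <name> = require('<path>')`, or null if absent.
function requirePathForVariable(ast, name) {
  let found = null;
  walk(ast, (node) => {
    if (found !== null || node.type !== 'VariableDeclarator') return;
    const init = node.init;
    if (
      node.id.type === 'Identifier' && node.id.name === name &&
      init && init.type === 'CallExpression' &&
      init.callee.type === 'Identifier' && init.callee.name === 'require' &&
      init.arguments.length > 0 &&
      init.arguments[0].type === 'Literal' &&
      typeof init.arguments[0].value === 'string'
    ) {
      found = init.arguments[0].value;
    }
  });
  return found;
}
```

For the factory-form example above, looking up `'util'` yields `'path/to/my/util'`, which could then be fed to a path resolver.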

At this point, you might be wondering “why use esprima and not a regular expression?” Valid point. Regexes are hard to write, read, and maintain, but they’re potentially faster: a regex can stop at the first match in a single pass over the string, whereas the AST approach transforms the string into a JavaScript object and then traverses every node, checking it against a condition. Despite the theoretical performance hit, ASTs let you express the pattern you’re looking for more clearly, simply by checking properties of a JavaScript object. Plus, it would take one beast of a regex to, for example, match the second parameter or the factory function of a define call. It’s possible, just messy.

1. Jumping to the module(s) that requires the current module

If I’m looking at a module that defines a certain API, I’d most often like to see how and where that module is being used. To do this, I currently have to find the path to the module I’m looking at (to know what its require path would be), press CMD + Shift + F to bring up the Find All panel (Sublime’s “Find in Files”), enter the path for the current module, and then look through the results to see where that module was required.

A better solution would be to automate this. Ideally, a key binding would take me to the file that requires the current module. If multiple modules require the current module, a panel should open listing all of those files and letting me pick which one to jump to.

Implementing this feature has an easy solution and a more involved solution.

The easy solution is to bind a key that opens the Find All panel pre-populated with a trimmed path of the current module. Unfortunately, Sublime Text’s API doesn’t let you prefill the fields of the Find All panel. Since Sublime Text is also (unfortunately) closed source, it’s not like I can submit a pull request with that functionality.

The harder solution involves traversing the dependency tree of every JS file in your source directory and aggregating the paths of all files that require the current module.

This is the solution that I took in the tool Sublime Dependents, which is a Python layer that interacts with a node tool that I wrote called node-dependents; it’s the node tool that does the heavy lifting. I’ll elaborate on the steps necessary to solve this problem – showing how static analysis is used.

Finding all JS files within a given directory

I really like fshost’s node-dir module for this problem, though you could just roll your own recursive directory traversal function. Node-dir has a clean-ish API that gives you access to the contents and filename of each JS file during traversal. After traversal, you also have access to the list of found files.
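If you’d rather roll your own, a recursive traversal is short. This is a minimal synchronous sketch using only Node’s `fs` and `path` (not node-dir’s API); `findJsFiles` is a hypothetical name, and it requires Node 10.10+ for `withFileTypes`.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect every .js file under `dir`.
// Symlinks are neither files nor directories to Dirent, so they are skipped.
function findJsFiles(dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...findJsFiles(fullPath));
    } else if (entry.isFile() && entry.name.endsWith('.js')) {
      files.push(fullPath);
    }
  }
  return files;
}
```

Unlike node-dir, this doesn’t read file contents; you would `fs.readFileSync` each path before parsing it.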

Finding the dependencies of a module

For each file, we need to get its list of dependencies. For the CommonJS case, there are some great tools for this. Substack’s detective module walks the AST of a JS file and looks for all ‘require’ calls within it. Defunctzombie’s required module uses detective and adds a bit of metadata to the dependencies, identifying which are missing or not resolvable and distinguishing core (like fs or path) from non-core dependencies.
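To make the idea concrete, here is a rough sketch of detective-style extraction plus a core/non-core split in the spirit of required’s metadata. It is not either module’s actual code: `walk`, `findRequires`, and `partitionCore` are hypothetical names, and the input is assumed to be an ESTree AST.

```javascript
const { builtinModules } = require('module');

// Minimal ESTree walker: visits every object that has a string `type`.
function walk(node, visit) {
  if (!node || typeof node.type !== 'string') return;
  visit(node);
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) value.forEach((child) => walk(child, visit));
    else if (value && typeof value === 'object') walk(value, visit);
  }
}

// Collect the string argument of every require('...') call.
// Dynamic calls like require(name) are skipped: they can't be resolved statically.
function findRequires(ast) {
  const deps = [];
  walk(ast, (node) => {
    if (node.type !== 'CallExpression') return;
    const [firstArg] = node.arguments;
    if (node.callee.type === 'Identifier' && node.callee.name === 'require' &&
        firstArg && firstArg.type === 'Literal' && typeof firstArg.value === 'string') {
      deps.push(firstArg.value);
    }
  });
  return deps;
}

// Separate core modules (fs, path, ...) from everything else.
function partitionCore(deps) {
  return {
    core: deps.filter((d) => builtinModules.includes(d)),
    nonCore: deps.filter((d) => !builtinModules.includes(d))
  };
}
```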

For the AMD case, there’s no love. I had to write a detective-like module, called detective-amd, that finds all of the dependencies of an AMD module. Finding dependencies within an AMD module is tricky, mainly because there are four different ways to define AMD modules: named form, dependency list form, factory form, and no-dependency form. That’s what I call them, anyway.

To clarify:

Named form

```javascript
define('util', ['my/dep'], function(myDep) {
  // ...
});
```

Dependency List form

```javascript
define(['my/dep'], function(myDep) {
  // ...
});
```

Factory form

```javascript
define(function(require) {
  // ...
});
```

No-dependency form

```javascript
define({
  // ...
});
```

The first two forms are similar in how you extract the dependencies, but the Factory form obviously requires a different approach since it doesn’t have that array of dependencies.
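For the first two forms, the only real difference is where the array sits. A minimal sketch, assuming `defineCall` is an ESTree `CallExpression` node for a define call; `dependencyListOf` is a hypothetical name, not detective-amd’s API.

```javascript
// Return the dependency ids of a define() call in the named form
// define('id', [deps], factory) or the dependency-list form define([deps], factory).
// Other forms return []: factory-form deps live in nested require() calls,
// and the no-dependency form has none.
function dependencyListOf(defineCall) {
  const args = defineCall.arguments;
  // Named form: a string id comes first, so the list is the second argument.
  const isNamed = args.length > 1 &&
    args[0].type === 'Literal' && typeof args[0].value === 'string';
  const list = isNamed ? args[1] : args[0];
  if (!list || list.type !== 'ArrayExpression') return [];
  return list.elements
    .filter((el) => el && el.type === 'Literal' && typeof el.value === 'string')
    .map((el) => el.value);
}
```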

The Technical Details

Extracting the dependencies involves first identifying which AMD module definition syntax is used. To do that, you first look for a node in the AST representing a ‘define’ function call. A given node is a ‘define’ function call if:

```javascript
node.type === 'CallExpression' &&
node.callee.type === 'Identifier' &&
node.callee.name === 'define';
```

In other words, a ‘define’ function call is a CallExpression whose callee is an Identifier named ‘define’.

Once you find the define call, you know you’re dealing with an AMD module, so you continue traversing the AST.
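Wrapping that condition in a depth-first search gives a way to locate the define call in a whole program. A minimal sketch over a plain ESTree AST; `isDefineCall` and `findDefineCall` are hypothetical names.

```javascript
// The condition above, as a predicate.
function isDefineCall(node) {
  return node.type === 'CallExpression' &&
    node.callee.type === 'Identifier' &&
    node.callee.name === 'define';
}

// Depth-first search for the first define() call; returns null if none.
function findDefineCall(node) {
  if (!node || typeof node.type !== 'string') return null;
  if (isDefineCall(node)) return node;
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (child && typeof child === 'object') {
        const found = findDefineCall(child);
        if (found) return found;
      }
    }
  }
  return null;
}
```

Note that a method call like `loader.define()` is rejected, since its callee is a MemberExpression rather than an Identifier.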

The next step is to determine which of the four AMD forms the node represents. I’ll talk about the factory form (since it’s the most interesting) and refer you to ast-module-types for the conditionals that test for the other forms.

Again, the factory form looks like:

```javascript
define(function(require) {
  // ...
});
```

The AST for the factory form looks roughly like this (abridged):

```json
{
  "type": "ExpressionStatement",
  "expression": {
    "type": "CallExpression",
    "callee": {
      "type": "Identifier",
      "name": "define"
    },
    "arguments": [
      {
        "type": "FunctionExpression",
        "id": null,
        "params": [
          {
            "type": "Identifier",
            "name": "require"
          }
        ]
      }
    ]
  }
}
```

This top portion represents the define() call:

```json
{
  "type": "ExpressionStatement",
  "expression": {
    "type": "CallExpression",
    "callee": {
      "type": "Identifier",
      "name": "define"
    },
```

We’re then interested in the “arguments” portion, to see whether its first argument is a function whose first parameter is the identifier ‘require’. The conditional for this looks like:

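```javascript
var args = node['arguments'],
    // Guard against define() being called with no arguments.
    firstArg = args.length ? args[0] : null,
    firstParamNode = firstArg && firstArg.params ? firstArg.params[0] : null;

var isFactoryForm = !!firstArg &&
  firstArg.type === 'FunctionExpression' &&
  !!firstParamNode &&
  firstParamNode.type === 'Identifier' &&
  firstParamNode.name === 'require';
```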

Once that condition passes, we know that we’re looking at a node that represents the factory form. We can then traverse that node’s subtree looking for all nested require function calls.
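A minimal sketch of that last traversal, assuming an ESTree AST and reusing the factory-form check from above; `walk`, `isFactoryForm`, and `factoryDependencies` are hypothetical names, not detective-amd’s internals.

```javascript
// Minimal ESTree walker: visits every object that has a string `type`.
function walk(node, visit) {
  if (!node || typeof node.type !== 'string') return;
  visit(node);
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) value.forEach((child) => walk(child, visit));
    else if (value && typeof value === 'object') walk(value, visit);
  }
}

// Mirrors the conditional above.
function isFactoryForm(defineCall) {
  const firstArg = defineCall.arguments[0];
  const firstParam = firstArg && firstArg.params ? firstArg.params[0] : null;
  return Boolean(firstArg && firstArg.type === 'FunctionExpression' &&
    firstParam && firstParam.type === 'Identifier' && firstParam.name === 'require');
}

// Collect the ids of every require('...') call nested in the factory body.
function factoryDependencies(defineCall) {
  if (!isFactoryForm(defineCall)) return [];
  const deps = [];
  walk(defineCall.arguments[0].body, (node) => {
    if (node.type !== 'CallExpression') return;
    const [firstArg] = node.arguments;
    if (node.callee.type === 'Identifier' && node.callee.name === 'require' &&
        firstArg && firstArg.type === 'Literal' && typeof firstArg.value === 'string') {
      deps.push(firstArg.value);
    }
  });
  return deps;
}
```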

Computing the Dependents

Now that we can get the dependencies for each module in a directory, we need to create a lookup table that maps each file to the files that require (depend on) it.

Here’s an example:

Suppose modules A and C require module B, and module B has no dependencies. That makes A and C dependents of module B (i.e., they depend on it). Adding that to the lookup table results in a structure like:

```javascript
var dependents = {
  B: ['A', 'C'],
  C: [],
  A: []
};
```
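Building that table is an inversion of the per-file dependency lists. A minimal sketch, assuming the input maps each file to its dependencies using the same identifiers as the keys; `computeDependents` is a hypothetical name, not node-dependents’ actual code. It returns a `Map` so file names can never collide with `Object.prototype` keys.

```javascript
// Invert "file -> files it requires" into "file -> files that require it".
// Every known file gets an entry, even when nothing requires it.
function computeDependents(dependencies) {
  const dependents = new Map();
  for (const file of Object.keys(dependencies)) {
    dependents.set(file, []);
  }
  for (const [file, deps] of Object.entries(dependencies)) {
    for (const dep of deps) {
      if (!dependents.has(dep)) dependents.set(dep, []);
      const list = dependents.get(dep);
      if (!list.includes(file)) list.push(file);
    }
  }
  return dependents;
}
```

With `computeDependents({ A: ['B'], C: ['B'], B: [] })`, `get('B')` returns `['A', 'C']` and A and C map to empty arrays, matching the table above.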

We generate a single lookup table for the entire directory of JS modules. If you’re looking at module B and want to jump to the files that require it, you should see a drop-down list with the file paths of modules A and C. Clicking on either entry takes you to the respective module.

Sublime Dependents

Again, the node-dependents module is responsible for computing the dependents lookup table and printing the dependents of a given file. The Sublime Text plugin, Sublime Dependents, is a simple interface that invokes the node tool, scrapes its stdout, and either jumps directly to the dependent (if there’s only one) or shows the drop-down list.

Why does the node tool, rather than Python, do the heavy lifting? I had already built most of the libraries mentioned here for another static analysis project, YA, so it was easier to build on those than to port everything to Python.

The obvious benefit was faster development turnaround. However, updates to the node tool aren’t automatically reflected as updates to the Sublime Text plugin. This means any enhancement to the node tool requires a commit-less tag (version bump) of the Sublime Text plugin just to notify users (via a changelog) to upgrade the node tool. This isn’t optimal. One idea is for the Python code to check for updates to the node tool after each run and automatically run npm install to fetch the latest code.

Performance Concerns

The plugin works very well, but our codebase at Bēhance has about 1200 modules, and we found that computing the dependents of a file took ~2700ms (2.7 seconds) before showing a result. That’s too slow.

Iterating over all files in the directory using node-dir took ~200ms, which left 2500ms to optimize. After profiling, I found that deducing each module’s dependencies with detective-amd was the bottleneck, in aggregate. I say “in aggregate” because computing the dependencies of a single file took only ~2ms. That’s no time at all, really. However, processing 1200 files serially at ~2ms each adds up to roughly 2400ms, which accounts for nearly all of the remaining 2500ms.

I considered a number of solutions to this problem: a standalone server that maintains an in-memory dependents graph, caching the results of previous runs to disk and purging the cache on file modification, or using Node’s cluster API. I ultimately went with clustering since it felt like the least over-engineered option. I won’t go into how to cluster your programs (that’s worth another, more introductory post), but I’ll discuss why this problem fit that solution.

Node’s cluster API allows a master process to fork worker processes that run the same code. You can send a worker some data after it’s spawned and have the worker send data back to the master process.

I decided to spawn workers (as many as there are available CPU cores) and split the list of JS files across them. When you want to find the dependents of the current module, each worker receives that module’s path and computes its dependents using its own subset of the JS files in the directory. This lets the dependents be computed in parallel. The master waits for all workers to respond with their findings, uniquely aggregates the results, and then prints them to the console. I figured that the time spent coordinating with and waiting for the workers was a worthwhile tradeoff.
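The data-handling half of that design is easy to show in isolation. The sketch below covers only splitting the file list and uniquely merging results; the messaging itself (`cluster.fork()`, `worker.send()`, `process.send()` in the worker, and a `'message'` listener on each worker) is omitted. `chunkFiles` and `mergeResults` are hypothetical names, and contiguous chunking is an assumption, not necessarily how node-dependents splits the work.

```javascript
const os = require('os');

// Split `files` into at most `workerCount` contiguous, near-equal chunks,
// one per worker. Defaults to one worker per CPU core.
function chunkFiles(files, workerCount = os.cpus().length) {
  const count = Math.max(1, Math.min(workerCount, files.length));
  const size = Math.ceil(files.length / count);
  const chunks = [];
  for (let i = 0; i < files.length; i += size) {
    chunks.push(files.slice(i, i + size));
  }
  return chunks;
}

// Uniquely aggregate each worker's list of dependents into one sorted list.
function mergeResults(results) {
  return [...new Set(results.flat())].sort();
}
```

For example, `chunkFiles(['a', 'b', 'c', 'd', 'e'], 2)` yields `[['a', 'b', 'c'], ['d', 'e']]`.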

Clustering shaved an entire second off the time it takes to get results: we went from 2700ms to ~1700ms. It’s not as significant as I would have hoped, but the plugin feels snappier, for sure. If you have any ideas on how to make this process faster, I’d love to hear them.

What’s next?

I’m hoping to build out more of these tools and plugins to solve problems 2 and 3. Hopefully, you’ve seen that it’s pretty straightforward to interact with the AST of a JavaScript file. Go build tools to expedite your own workflows! Also, check out Sublime Dependents and let me know how you like it.

Thanks for reading!

Questions, concerns, or suggestions? Ping me @mrjoelkemp