Julie Ng

JavaScript Best Practices and node_modules

The way the JavaScript ecosystem stores modules locally in the project can be fickle. Developers coming from another language, especially, often struggle with dependency hell in Node.js. Let’s look at some best practices to avoid that.

I also struggled when I first started developing with JavaScript. Node.js has a package manager, npm, which helps a great deal. Unfortunately, most people don’t leverage its power. But once you master it, it usually just works.

Note: In this article, I always refer to npm. But these best practices also apply to yarn.

Why use a dependency manager?

The best thing about JavaScript is the vibrant community. JavaScript is the most popular language on GitHub - by far, with 2.3 million pull requests. The runner-up, Python, has fewer than half, at 1 million. This activity is reflected in the sheer number of node modules and in the nested dependency structure common in the JavaScript ecosystem.

Dependency managers help us normalize our code across environments and different developer machines, preventing the “it works for me” problem. But npm is unlike most other dependency managers. It stores dependencies locally in the project, in the node_modules folder, which can cause headaches for many.

Misunderstanding node_modules

With other package managers like Bundler for Ruby, you don’t actually see the code of your dependencies in your project. You can import them out of thin air - well, out of a central cache. Here are the largest frustrations with the node_modules folder:

  • it can be very large and easily explode to thousands of files
  • there is no global caching
  • you need internet with every npm install

While I also find these points frustrating, at some point I chose to stop fighting and always focus on shipping. This is only possible when you follow JavaScript best practices.

Use npm scripts, ALWAYS

Problem: It works for me 🤷‍♂️

Global installations are the biggest cause of the “it works for me” problem. While in most cases you can get away with this sin, frameworks that are actively developed and constantly improving (which is good!) can behave differently from one version to the next. The most common case I have seen at work is generating a production-ready, optimized SPA frontend: some optimization may suddenly fail because the globally installed version differs between machines.

Why? The developer was running:

# Bad practice
$ ng build

which executes the globally installed angular-cli 🤦‍♀️

Solution: create an npm script

Instead, you should follow this pattern:

# Best practice
$ npm run build:production

which would execute the command defined in the scripts block inside package.json, using the version of the angular-cli installed locally in the project rather than the global one.
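One way to picture why the npm script picks up the local version: when npm runs a script, it puts the project's node_modules/.bin folder at the front of the PATH. The sketch below is a simplified model of that behavior, not npm's actual implementation. The helper name scriptPath and the example directories are assumptions for illustration.

```javascript
const path = require('path');

// Simplified model of the PATH that an npm script sees.
// scriptPath is a hypothetical helper, not part of npm's API.
function scriptPath(projectDir, systemPath) {
  const localBin = path.join(projectDir, 'node_modules', '.bin');
  // Local binaries come first, so they win over global installs.
  return [localBin, systemPath].join(path.delimiter);
}

// Example directories are made up for illustration.
const example = scriptPath('/work/my-app', '/usr/local/bin');
console.log(example.split(path.delimiter)[0]);
```

Because the local folder is searched first, a bare command name inside a script resolves to the version pinned in package.json.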

This simple difference will solve 95% of your problems. Most new developers probably stumble because documentation will simply say <MODULE_NAME> init or node dist/app.js and the npm run-script practice is at best a footnote.

Pro Tip: Use Conventions when naming scripts

Note my script is named build:production, not just build. Also note I used a colon to scope my command. I prefer to use the <VERB>:<CONTEXT> format to concisely communicate what a command does. Here is an example from my express-es6-starter:

"scripts": {
  "lint": "eslint .",
  "lint:fix": "eslint --fix .",
  "lint:watch": "nodemon -w specs -w src --exec 'npm run lint'"
}

Conventions help remove the mental effort required when figuring out what a command does. Keep in mind the context won’t necessarily be development vs. production environments.
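To make the <VERB>:<CONTEXT> convention concrete, here is a small sketch that groups script names by their verb. The helper groupScripts is hypothetical, and the scripts object simply mirrors the lint example above.

```javascript
// Group npm script names of the form <VERB>:<CONTEXT> by verb.
// groupScripts is an illustrative helper, not an npm feature.
function groupScripts(scripts) {
  const groups = {};
  for (const name of Object.keys(scripts)) {
    // Split on the first colon only; a bare name has no context.
    const index = name.indexOf(':');
    const verb = index === -1 ? name : name.slice(0, index);
    const context = index === -1 ? null : name.slice(index + 1);
    if (!groups[verb]) groups[verb] = [];
    groups[verb].push(context);
  }
  return groups;
}

const scripts = {
  'lint': 'eslint .',
  'lint:fix': 'eslint --fix .',
  'lint:watch': "nodemon -w specs -w src --exec 'npm run lint'",
};

console.log(groupScripts(scripts));
```

With a consistent scheme, every variant of a task sits under one verb, which is what makes the names easy to guess.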

Never reference node_modules

New JavaScript developers may think OK, I will use the versioned dependency like this:

$ ./node_modules/.bin/tap

Just use an npm script as described above. The effect is the same. I’ve also seen references to node_modules when importing modules. Don’t. If you need to do that, your module isn’t packaged properly as a module. At work I see teams doing this because they are writing work-in-progress libraries. My advice is to use a git submodule until your library is mature enough to be published. Clean code is more important than forcing yourself to adhere to best practices. When you or your code aren’t ready yet, you lose too much time in your development workflow.

Never check in node_modules

This code is system generated, which means the source is elsewhere. A general best practice is to never check in generated code. That’s why a standard .gitignore for Node.js projects will look like this:

node_modules/
dist/

Native Bindings

Most importantly, dependencies may depend on your operating system. Your development and production environments probably use different operating systems, which is why continuous integration is important.
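A quick way to see what “depends on your operating system” means: Node.js exposes the platform and CPU architecture of the current machine, and native modules are compiled for that combination. The sketch below compares the local machine with an assumed production target. The function name buildTarget and the value linux-x64 are assumptions for illustration.

```javascript
// Describe the platform a native module would be compiled for here.
// buildTarget is a hypothetical helper name.
function buildTarget(proc = process) {
  return `${proc.platform}-${proc.arch}`;
}

// Hypothetical production target, chosen for illustration only.
const productionTarget = 'linux-x64';

const local = buildTarget();
if (local !== productionTarget) {
  console.log(`Local build target ${local} differs from ${productionTarget}:`);
  console.log('native modules compiled here may not load in production.');
}
```

A real compatibility check would also need to consider the Node.js version, so treat this only as a first indicator.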

It sounds terrible to have native bindings in a dependency but this is not uncommon in Node.js. According to the Node.js Foundation, “30 percent of all modules rely indirectly on native modules”.

30%! That’s crazy, right? How is that even maintainable? Well, the JavaScript ecosystem is open source. It is supported by many individual contributors, with significant support from companies like Google and Microsoft, who also use these technologies. So as crazy as this sounds, it works thanks to the community.

Never push node_modules

So now that you understand that your dependencies are system specific and might include native modules, you should never assume that your node_modules folder will work in production. I’ve seen developers try because the npm install step can take a long time and you want to minimize deployment time.

So how do you avoid slow deployments?

A good PaaS caches for you

The same way you do it on your computer. The node_modules folder is not generated from scratch each time. Once it already exists, the install step runs significantly faster.

So a good PaaS will handle any caching between deployments and builds for you. For example, Heroku has a node modules caching feature in their platform, which is on by default, but customizable. The Microsoft Azure Cloud also offers a caching feature for node modules.
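As a rough illustration of how such a cache could decide whether an existing node_modules folder is still valid, the sketch below derives a cache key from the lockfile contents and the build target. This is one possible approach under my own assumptions, not how Heroku or Azure actually implement their caching. The helper name cacheKey and the sample inputs are hypothetical.

```javascript
const crypto = require('crypto');

// Derive a cache key for node_modules: the same lockfile on the same
// platform gives the same key, so the cached folder can be reused.
// cacheKey is a hypothetical helper, not a platform API.
function cacheKey(lockfileContents, target) {
  return crypto
    .createHash('sha256')
    .update(target)
    .update('\n')
    .update(lockfileContents)
    .digest('hex');
}

// Made-up inputs for illustration only.
const before = cacheKey('{"lockfileVersion":1}', 'linux-x64');
const after = cacheKey('{"lockfileVersion":1,"changed":true}', 'linux-x64');
console.log(before === after ? 'reuse cache' : 'reinstall dependencies');
```

Including the target in the key means a cache built on one operating system is never reused on another.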

Cloud Foundry does not

(Note: the following opinion is my own and not that of my employer.)

But Cloud Foundry goes against best practice. They recommend you push your node_modules folder too. WTF, seriously?! When you push an app, you see in the console:

Starting app node-es6-starter in org pcfdev-org / space pcfdev-space as user...
Downloading nodejs_buildpack...
Downloaded nodejs_buildpack
Creating container
Successfully created container
Downloading app package...
Downloaded app package (58.4K)
Staging...
-------> Buildpack version 1.5.32
       PRO TIP: It is recommended to vendor the application's Node.js dependencies
                See http://docs.cloudfoundry.org/buildpacks/node/index.html#vendoring for more information
-----> Creating runtime environment

In their documentation it says:

The cf push command uploads the vendored dependencies with the app.

It doesn’t make sense to me to recommend that developers risk failure in production because of an incompatibility due to native bindings in dependencies. Exact versioning with package-lock.json can help you quickly apply a hotfix, rolling back and locking your app to the last functioning version. But that’s a code smell.

I spent an hour on Friday debugging a reference project for teams that used to deploy fine - and still does with PCF dev. So the cause is our implementation of Cloud Foundry in our cloud. But the deployment fails unless I push the node_modules folder.

Note that Heroku’s default documentation and configuration reflect best practice, ignoring the possibly system-dependent node_modules folder. Unfortunately Cloud Foundry does the opposite, defaulting to bad practice - because someone requested it. It should be the other way around. A product should let you override best practice for edge cases. But defaulting to bad practice is just 🤦‍♀️.

Other Tips for Production

Only install what you need

The package.json file lets you categorize your dependencies:

  • dependencies are the packages your app needs at runtime.
  • devDependencies are not installed when running npm install --production, which is best practice for production.
  • optionalDependencies override the default behavior: if one of these packages fails to install, npm continues instead of failing.
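Since an optional dependency may be missing after install, your code has to handle that case itself. A minimal sketch of that pattern follows. The helper name loadOptional and the package name fast-native-addon are made up for illustration.

```javascript
// Load a package that may be missing because it is listed under
// optionalDependencies and failed to install on this system.
// loadOptional is a hypothetical helper name.
function loadOptional(name) {
  try {
    return require(name);
  } catch (error) {
    // Only swallow "module not found"; rethrow any other error.
    if (error.code === 'MODULE_NOT_FOUND') return null;
    throw error;
  }
}

// 'fast-native-addon' is a made-up package name for illustration.
const addon = loadOptional('fast-native-addon');
console.log(addon ? 'using native addon' : 'falling back to pure JavaScript');
```

The fallback branch is the important part: without it, a failed optional install would still crash the app at startup.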

Don’t take all your dev tools overhead with you to production. To test that you’ve categorized your dependencies properly without deploying, just remove the node_modules folder, reinstall with the production flag and try to start your app:

$ rm -rf node_modules/
$ npm install --production
$ npm start

If it doesn’t work, check that you didn’t accidentally include a dev dependency in your source code or still referred to dev tools like babel in your npm scripts.
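The check described above can also be sketched in code: given the parsed package.json and a list of package names your source code imports, report the ones that are not listed under dependencies. The helper findMisplaced, the manifest and the import list are all hypothetical. Collecting the real import list from your source files is left out here.

```javascript
// Return packages that source code needs at runtime but that are
// only listed under devDependencies (or not listed at all).
// findMisplaced is an illustrative helper, not an npm feature.
function findMisplaced(pkg, requiredNames) {
  const runtime = new Set(Object.keys(pkg.dependencies || {}));
  const dev = new Set(Object.keys(pkg.devDependencies || {}));
  return requiredNames
    .filter((name) => !runtime.has(name))
    .map((name) => ({ name, listedAsDev: dev.has(name) }));
}

// Hypothetical manifest and import list, for illustration only.
const pkg = {
  dependencies: { express: '^4.16.0' },
  devDependencies: { lodash: '^4.17.0', eslint: '^5.0.0' },
};

console.log(findMisplaced(pkg, ['express', 'lodash']));
```

Here lodash would be flagged, because it is imported at runtime but only installed as a dev dependency.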

Run Built Code, not Source Code

When you run code in production, make sure it has already been compiled to ES5, instead of compiling the source on the fly.

"scripts": {
  "good": "node dist/app.js",
  "bad": "babel-node src/app.js"
}

Don’t use Node.js

Maybe you don’t need it. If you have a single page application, you can avoid runtime dependencies altogether by having your build server generate the production-optimized frontend and serving it from an nginx server, for example.

That’s it for now. Later next week I’ll finish and publish a post about best practices for using Node.js in Docker, especially when it comes to the node_modules challenge.