Webpack and yarn magic against duplicates in bundles

This page describes the theory and some technical details behind webpack-deduplication-plugin, which helped us reduce the JavaScript size in Jira by ~10%.

Setting up the scene

Some terminology used in the article:

Direct dependencies: packages on which your project relies explicitly. Typically installed via yarn add package-name. The full list of those can be found in the dependencies field in package.json at the root of the project.

Transitive dependencies: packages on which your project relies implicitly, i.e. the dependencies of your direct dependencies. Typically you won’t see them in package.json, but they can be seen, for example, in the yarn.lock file.

Duplicated dependencies: transitive dependencies with mismatched versions. If one of the project dependencies has button 4.0.0 as a transitive dependency and another one has button 3.0.0, both versions will be installed and the button dependency will be duplicated.

De-duplication: the process of eliminating duplicated dependencies according to their semver versions (x.x.x, i.e. major.minor.patch). Typically, versions within the same major version contain no breaking changes, so only the latest version within that range needs to be installed. For example, button versions 4.0.0 and 4.5.6 can be “de-duplicated” and only version 4.5.6 will be installed.

yarn.lock file: an auto-generated file that contains the exact and full list of all direct and transitive dependencies and their exact versions in yarn-based projects.
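As a rough illustration of the de-duplication rule above, here is a minimal sketch that keeps only the highest version within each major version. It assumes plain major.minor.patch strings (no pre-release tags) and treats everything inside one major as compatible; dedupeByMajor and its helpers are hypothetical names, and yarn itself works on ranges via the semver rules rather than on a flat list of versions.

```javascript
// Assumption: plain "major.minor.patch" versions, no pre-release tags, and
// every version within one major treated as compatible.
function parseVersion(version) {
  const [major, minor, patch] = version.split(".").map(Number);
  return { major, minor, patch };
}

// Negative if a < b, positive if a > b, 0 if equal.
function compareVersions(a, b) {
  const va = parseVersion(a);
  const vb = parseVersion(b);
  return va.major - vb.major || va.minor - vb.minor || va.patch - vb.patch;
}

// Hypothetical helper: keep only the highest version within each major.
function dedupeByMajor(versions) {
  const latestPerMajor = new Map(); // major -> highest version seen so far
  for (const version of versions) {
    const { major } = parseVersion(version);
    const current = latestPerMajor.get(major);
    if (current === undefined || compareVersions(version, current) > 0) {
      latestPerMajor.set(major, version);
    }
  }
  return [...latestPerMajor.values()].sort(compareVersions);
}

console.log(dedupeByMajor(["4.0.0", "4.5.6", "3.0.0"])); // [ '3.0.0', '4.5.6' ]
```

The 4.x versions collapse into 4.5.6, while 3.0.0 survives because it is a different major.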

The problem of duplicated dependencies

Since all of these dependencies are bundled together and served to customers, keeping the number of duplicates to a minimum is important for reducing the final JavaScript size. This is where the deduplication process comes into play.

Deduplication in yarn

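Imagine a project that depends on modal-dialog@^3.0.0 and button@^2.5.0, where modal-dialog itself depends on button@^2.4.1. In yarn.lock we will see something like this: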

modal-dialog@^3.0.0:
  version "3.0.0"
  resolved "exact-link-to-where-download-modal-dialog-3.0.0-from"
  dependencies:
    button@^2.4.1

button@^2.5.0:
  version "2.5.0"
  resolved "exact-link-to-where-download-2.5.0-version-from"

button@^2.4.1:
  version "2.4.1"
  resolved "exact-link-to-where-download-2.4.1-version-from"

Now, we know that according to semver button@2.4.1 and button@2.5.0 are compatible, so we can tell yarn to use the same button@2.5.0 for both of them, i.e. “deduplicate” them. In the yarn.lock file we’ll then see this:

modal-dialog@^3.0.0:
  version "3.0.0"
  resolved "exact-link-to-where-download-modal-dialog-3.0.0-from"
  dependencies:
    button@^2.4.1

button@^2.4.1, button@^2.5.0:
  version "2.5.0"
  resolved "exact-link-to-where-download-2.5.0-version-from"
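The compatibility check behind this step can be sketched as a simplified caret-range test: pick the highest available version that satisfies every requested range. satisfiesCaret and pickSharedVersion are hypothetical helpers, assuming ^ ranges with a major version of at least 1 and no pre-release tags (the full rules live in the semver package). The same check also shows why 1.x.x and 2.x.x buttons can never share a single version.

```javascript
// Simplified caret-range check (assumption): versions are "major.minor.patch"
// with major >= 1 and no pre-release tags.
function parse(version) {
  return version.split(".").map(Number);
}

function compare(a, b) {
  const [va, vb] = [parse(a), parse(b)];
  return va[0] - vb[0] || va[1] - vb[1] || va[2] - vb[2];
}

// "^x.y.z" accepts any version with the same major that is >= x.y.z.
function satisfiesCaret(version, range) {
  const base = range.replace(/^\^/, "");
  return parse(version)[0] === parse(base)[0] && compare(version, base) >= 0;
}

// Hypothetical helper: the highest available version satisfying every
// requested range, or undefined if no single version can.
function pickSharedVersion(ranges, availableVersions) {
  return [...availableVersions]
    .sort((a, b) => compare(b, a))
    .find((version) => ranges.every((range) => satisfiesCaret(version, range)));
}

console.log(pickSharedVersion(["^2.4.1", "^2.5.0"], ["2.4.1", "2.5.0"])); // 2.5.0
console.log(pickSharedVersion(["^1.0.0", "^1.3.0", "^2.5.0"], ["1.3.0", "2.5.0"])); // undefined
```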

Deduplication in yarn — not compatible version

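Now imagine that modal-dialog depends on button@^1.0.0 instead, and the project also depends on editor@^5000.0.0, which depends on button@^1.3.0, while the project itself still uses button@^2.5.0. Using the same technique, we can de-duplicate the buttons from the 1.x.x range, and in the yarn.lock file we will see this: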

modal-dialog@^3.0.0:
  version "3.0.0"
  resolved "exact-link-to-where-download-modal-dialog-3.0.0-from"
  dependencies:
    button@^1.0.0

editor@^5000.0.0:
  version "5000.0.0"
  resolved "exact-link-to-where-download-editor-5000.0.0-from"
  dependencies:
    button@^1.3.0

button@^2.5.0:
  version "2.5.0"
  resolved "exact-link-to-where-download-2.5.0-version-from"

button@^1.0.0, button@^1.3.0:
  version "1.3.0"
  resolved "exact-link-to-where-download-1.3.0-version-from"

Two versions of buttons are unavoidable here, and usually there is nothing we can do other than upgrading modal-dialog and editor to versions that both depend on button from the 2.x.x range, so that it can be de-duplicated properly. Typically, at this point we stop, say that our project has “2 versions of buttons” and move on with our lives.

But what if we dig a little bit further and check out how exactly those 2 buttons are installed on disk and bundled together?

Duplicated dependencies install

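When editor and modal-dialog depend on semver-compatible versions of a tooltip package, yarn installs a single deduplicated tooltip and hoists it to the root of node_modules, so we’ll see this structure: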

/node_modules
  /editor
  /modal-dialog
  /tooltip

And because of that, we can be sure that we only have one version of tooltip in the project, even if two completely different dependencies depend on slightly different versions of it.

Unless…

Unless those versions are not semver compatible and cannot be deduped that easily 😬 In the button example above, button@2.5.0 already sits at the root of node_modules, so there is no room to hoist button@1.3.0 next to it.

Even if dependencies are “deduped” at the yarn.lock level and we “officially” have only 2 versions of buttons in yarn.lock, every single package with button@1.3.0 as a dependency will install its own copy of it.
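To make the nesting concrete, here is a deliberately simplified, one-level model of hoisting: direct dependencies take the top-level slots, and a transitive dependency is hoisted only if its name is free there or already installed at the same version. This is an assumption for illustration, not yarn's actual hoisting algorithm, and hoist is a hypothetical helper; the package data mirrors the button example.

```javascript
const { join } = require("path").posix;

// Deliberately simplified, one-level model of hoisting (an assumption for
// illustration, not yarn's real algorithm). Versions are exact, i.e. already
// deduplicated in yarn.lock.
function hoist(directDeps, transitiveDeps) {
  const topLevel = new Map(Object.entries(directDeps)); // name -> version
  const result = [...topLevel].map(([name, version]) => ({
    path: join("/node_modules", name),
    version,
  }));

  for (const [parent, deps] of Object.entries(transitiveDeps)) {
    for (const [name, version] of Object.entries(deps)) {
      if (!topLevel.has(name)) {
        // Free slot at the root: hoist it there.
        topLevel.set(name, version);
        result.push({ path: join("/node_modules", name), version });
      } else if (topLevel.get(name) !== version) {
        // Root slot taken by another version: nest a private copy.
        result.push({ path: join("/node_modules", parent, "node_modules", name), version });
      }
      // Otherwise the same version is already at the root and gets reused.
    }
  }
  return result;
}

const installed = hoist(
  { editor: "5000.0.0", "modal-dialog": "3.0.0", button: "2.5.0" },
  { editor: { button: "1.3.0" }, "modal-dialog": { button: "1.3.0" } }
);
for (const entry of installed) console.log(entry.path, entry.version);
```

Both nested button@1.3.0 copies end up under their parents, one per dependent package.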

Duplicated dependencies and webpack

Behind the scenes, webpack just builds a graph of all your files and their dependencies, based on what’s installed in your node_modules and required from your code, using the normal Node resolution algorithm. TL;DR: every time a file in editor does “import Button from ‘button’;”, the resolver looks for button in the closest node_modules folder, starting from the folder of the file where the request appeared and walking up the directory tree. The same goes for modal-dialog. So from webpack’s perspective, the final resolved paths for button will be:

  • project/node_modules/editor/node_modules/button/index.js — when it’s requested from within editor
  • project/node_modules/modal-dialog/node_modules/button/index.js — when it’s requested from within modal-dialog

Webpack is not going to check whether these files are identical: it treats them as unique files and bundles both of them into the same bundle. Our “duplicated” button just got double duplicated.
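The lookup described above can be sketched as follows: starting from the importing file's directory, check node_modules/<name> and walk up one level at a time. This is a simplified stand-in, assuming a hypothetical in-memory set of installed directories and an index.js entry point instead of real file-system access and package.json fields. It shows how the same import resolves to two different files depending on where it comes from.

```javascript
const { dirname, join } = require("path").posix;

// Simplified Node-style lookup (assumption): `installedDirs` stands in for the
// file system, and every package's entry point is assumed to be index.js.
function resolvePackage(name, importerFile, installedDirs) {
  let dir = dirname(importerFile);
  for (;;) {
    const candidate = join(dir, "node_modules", name);
    if (installedDirs.has(candidate)) return join(candidate, "index.js");
    const parent = dirname(dir);
    if (parent === dir) return null; // reached the root without a match
    dir = parent;
  }
}

const installedDirs = new Set([
  "/project/node_modules/button",
  "/project/node_modules/editor/node_modules/button",
  "/project/node_modules/modal-dialog/node_modules/button",
]);

console.log(resolvePackage("button", "/project/node_modules/editor/index.js", installedDirs));
console.log(resolvePackage("button", "/project/node_modules/modal-dialog/index.js", installedDirs));
```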

Deduplication in webpack — first attempt

Webpack is incredibly flexible. It provides a rich plugin interface with access to almost everything you can imagine (and to some things that you cannot), most of its own core features are built as plugins as well, and it exports a lot of them for others to use.

One of those plugins is NormalModuleReplacementPlugin: it can replace one file with another at build time, based on a regular expression. Which is exactly what we need! The rest is just a matter of coding.

First, detect all “duplicated” dependencies: grab a list of all packages within node_modules, keep those that have node_modules in their install path more than once (basically all the “nested” packages from the yarn install chapter above), and group them by name and by the version from their package.json.

Example of the real duplicates in Jira
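The detection step can be sketched as a pure function over a list of installed packages. It assumes each entry already carries its install path plus the name and version read from its package.json (reading them from disk is left out), and findDuplicates is a hypothetical name, not the plugin's API.

```javascript
// Hypothetical input: every installed package with its install path and the
// name/version read from its package.json.
function findDuplicates(packages) {
  const groups = new Map(); // "name@version" -> install paths
  for (const { path, name, version } of packages) {
    // Nested packages have "node_modules" more than once in their path.
    const nesting = path.split("/").filter((segment) => segment === "node_modules").length;
    if (nesting < 2) continue;
    const key = `${name}@${version}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(path);
  }
  // Only groups with more than one nested copy are worth deduplicating.
  return new Map([...groups].filter(([, paths]) => paths.length > 1));
}

const packages = [
  { path: "/project/node_modules/button", name: "button", version: "2.5.0" },
  { path: "/project/node_modules/editor/node_modules/button", name: "button", version: "1.3.0" },
  { path: "/project/node_modules/modal-dialog/node_modules/button", name: "button", version: "1.3.0" },
];
console.log(findDuplicates(packages));
```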

Second, replace every occurrence of the “same” package with the very first one from its group.
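The replacement step can be sketched in the same spirit: the first path in each group becomes the canonical copy, and any resolved file path inside another copy is rewritten by prefix. buildReplacements and replacePath are hypothetical helpers operating on plain strings rather than on webpack's hooks; note that this is the startsWith-style prefix replacement whose limits are discussed in the “naive” replace section of this article.

```javascript
// Hypothetical helpers for the naive first attempt. Input: duplicate groups
// as produced by the detection step ("name@version" -> install paths).
function buildReplacements(duplicateGroups) {
  const replacements = new Map(); // duplicate copy root -> canonical copy root
  for (const paths of duplicateGroups.values()) {
    const [canonical, ...others] = paths; // the first copy wins
    for (const other of others) replacements.set(other, canonical);
  }
  return replacements;
}

// Plain prefix (startsWith) replacement of a resolved file path.
function replacePath(resolvedPath, replacements) {
  for (const [duplicate, canonical] of replacements) {
    if (resolvedPath.startsWith(duplicate + "/")) {
      return canonical + resolvedPath.slice(duplicate.length);
    }
  }
  return resolvedPath;
}

const replacements = buildReplacements(
  new Map([
    ["button@1.3.0", [
      "/project/node_modules/editor/node_modules/button",
      "/project/node_modules/modal-dialog/node_modules/button",
    ]],
  ])
);
// /project/node_modules/editor/node_modules/button/index.js
console.log(replacePath("/project/node_modules/modal-dialog/node_modules/button/index.js", replacements));
```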

And 💥, there is no “third”: the solution works, it’s safe, and it reduces bundle sizes in Jira by ~10%.

Minimal implementation of the deduplication logic

The full implementation was literally just 100 lines. Be careful about celebrating and copying the approach, though: this is not the end of the article 😉

Deduplication in webpack — actual solution

Debugging what the hell was going on, understanding why, and releasing a solution that fixes it for good took a week and a deep dive into the internals of NormalModuleReplacementPlugin and webpack itself; describing it properly would take another huge article. Here I will list, in no particular order, the most interesting things about yarn and webpack that we discovered along the way and that could be useful for others to know.

Findings and other curiosities

The non-deterministic behaviour is reproducible non-deterministically

Interestingly, the order in which webpack calls the hook you need to listen to in order to override file requests is different on every run, yet the final assets on small examples (even without deduplication) are deterministic.

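NormalModuleReplacementPlugin was not built for this purpose

Every request that webpack resolves has two parts: the request string itself and the context, i.e. the directory the request was made from. For example: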

{
  request: "./styled",
  context: "/project/node_modules/editor/node_modules/button"
}

The final path to the file is resolved from both of them, so to detect “duplicates” properly we need to watch both and replace every field that carries the “duplicated” information (including context), which NormalModuleReplacementPlugin does not do.
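A simplified model of why both fields matter, assuming the final path is just path.resolve(context, request) (webpack's real resolver also handles extensions, aliases, package.json fields and more). finalPath and dedupeRequest are hypothetical helpers working on plain objects: a relative request like "./styled" contains no package path at all, so the duplicated information can only be removed by rewriting context as well.

```javascript
const { resolve } = require("path").posix;

// Simplified model (assumption): the file a request points to is
// resolve(context, request).
function finalPath({ request, context }) {
  return resolve(context, request);
}

// Hypothetical helper: move a request from a duplicated package into the
// canonical copy by rewriting every field that contains the duplicate root.
function dedupeRequest(data, duplicateRoot, canonicalRoot) {
  const rewrite = (value) =>
    value === duplicateRoot || value.startsWith(duplicateRoot + "/")
      ? canonicalRoot + value.slice(duplicateRoot.length)
      : value;
  return { ...data, request: rewrite(data.request), context: rewrite(data.context) };
}

const data = {
  request: "./styled",
  context: "/project/node_modules/editor/node_modules/button",
};
const deduped = dedupeRequest(
  data,
  "/project/node_modules/editor/node_modules/button",
  "/project/node_modules/modal-dialog/node_modules/button"
);

// Rewriting `request` alone would change nothing: "./styled" has no root in it.
console.log(finalPath(data)); // resolves inside editor's copy of button
console.log(finalPath(deduped)); // resolves inside modal-dialog's copy of button
```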

Nevertheless, NormalModuleReplacementPlugin can actually be used if there is a need

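The trick is to turn the relative request into an absolute path that points into the deduplicated copy, leaving context as is. From this: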

{
  request: "./styled",
  context: "/project/node_modules/editor/node_modules/button"
}

it becomes this:

{
  request: "/project/node_modules/modal-dialog/node_modules/button/styled",
  context: "/project/node_modules/editor/node_modules/button"
}

and webpack is okay with it and bundles it correctly 😲, because an absolute request resolves to the same file no matter what the context is.
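The same trick as a sketch: resolve the request against its context and, if the target sits inside a duplicated package, replace the request with an absolute path into the canonical copy while leaving context alone. toAbsoluteRequest is a hypothetical helper working on plain objects, not a webpack API, and it deliberately ignores bare package names such as button, which go through the node_modules lookup instead.

```javascript
const { isAbsolute, join, relative, resolve } = require("path").posix;

// Hypothetical helper: point a file request into the canonical copy of a
// duplicated package by making it absolute. `context` is left untouched,
// because an absolute request resolves to itself whatever the context is.
function toAbsoluteRequest(data, duplicateRoot, canonicalRoot) {
  // Bare package names ("button") go through the node_modules lookup
  // instead, so this sketch only handles relative and absolute requests.
  if (!data.request.startsWith(".") && !isAbsolute(data.request)) return data;

  const target = resolve(data.context, data.request);
  const insideDuplicate =
    target === duplicateRoot || target.startsWith(duplicateRoot + "/");
  if (!insideDuplicate) return data;

  return { ...data, request: join(canonicalRoot, relative(duplicateRoot, target)) };
}

const rewritten = toAbsoluteRequest(
  { request: "./styled", context: "/project/node_modules/editor/node_modules/button" },
  "/project/node_modules/editor/node_modules/button",
  "/project/node_modules/modal-dialog/node_modules/button"
);
console.log(rewritten);
```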

“Naive” replacement (string.startsWith) is not going to work

Consider the following setup:

  • button@2.x and icon@3.x at the root
  • editor has button@1.x as a transitive dependency, which has icon@1.x as a transitive dependency
  • modal-dialog has button@1.x as a transitive dependency, which has icon@1.x as a transitive dependency, PLUS it has icon@2.x as its own transitive dependency (kudos to those who survived and read this far, you’re heroes)

this will be represented as the following folder structure:

/node_modules
  /editor
    /node_modules
      /button-1.3.0
      /icon-1.0.0 // on the same level as button above
  /modal-dialog
    /node_modules
      /button-1.3.0
        /node_modules
          /icon-1.0.0 // nested within button, since the level above already has another icon
      /icon-2.0.0
  /button-2.5.0

and the final requests to icon@1.x will be:

/project/node_modules/editor/node_modules/icon-1.0.0
/project/node_modules/modal-dialog/node_modules/button-1.3.0/node_modules/icon-1.0.0

Considering that button@1.x is a duplicate, we need to replace the button in modal-dialog with the button from editor. A “naive” startsWith will replace

 /project/node_modules/modal-dialog/node_modules/button-1.3.0

with

 /project/node_modules/editor/node_modules/button-1.3.0

and the path to the last icon will be transformed into

/project/node_modules/editor/node_modules/button-1.3.0/node_modules/icon-1.0.0

but there is no icon at this path: in editor, icon-1.0.0 is installed next to button-1.3.0, not inside it.
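The failure, and one possible fix, can be reproduced with the illustrative paths above. The safer variant only rewrites a path when the remainder after a duplicate's root does not enter another node_modules folder, so a nested package is handled by its own entry (icon@1.0.0 here) instead of by its parent's. naiveReplace and saferReplace are hypothetical helpers, and this is an assumed approach for illustration, not necessarily how webpack-deduplication-plugin solves it.

```javascript
// Illustrative duplicate -> canonical mapping for the example above.
const replacements = new Map([
  // button@1.3.0 in modal-dialog -> canonical copy in editor
  ["/project/node_modules/modal-dialog/node_modules/button-1.3.0",
    "/project/node_modules/editor/node_modules/button-1.3.0"],
  // icon@1.0.0 nested in modal-dialog's button -> canonical copy in editor
  ["/project/node_modules/modal-dialog/node_modules/button-1.3.0/node_modules/icon-1.0.0",
    "/project/node_modules/editor/node_modules/icon-1.0.0"],
]);

// Naive: the first matching prefix wins, even if the file actually lives in
// a package nested deeper inside that prefix.
function naiveReplace(file) {
  for (const [duplicate, canonical] of replacements) {
    if (file.startsWith(duplicate + "/")) return canonical + file.slice(duplicate.length);
  }
  return file;
}

// Safer: only rewrite when the duplicate root is the package that really owns
// the file, i.e. the rest of the path does not enter another node_modules.
function saferReplace(file) {
  for (const [duplicate, canonical] of replacements) {
    if (!file.startsWith(duplicate + "/")) continue;
    const rest = file.slice(duplicate.length);
    if (rest.split("/").includes("node_modules")) continue;
    return canonical + rest;
  }
  return file;
}

const iconFile =
  "/project/node_modules/modal-dialog/node_modules/button-1.3.0/node_modules/icon-1.0.0/index.js";
console.log(naiveReplace(iconFile)); // points into editor's button, where no icon exists
console.log(saferReplace(iconFile)); // points to editor's own icon-1.0.0
```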

The end

To sum up, the problems we had to solve along the way were:

  • non-deterministic order of hooks on the webpack side
  • “naive” initial strings replacement
  • listening and replacing only “request” in the initial implementation
  • a few other edge cases that are not mentioned here and that caused some modules not to be resolved

Now that it’s solved and the plugin has been battle-tested in Jira, everyone else can use it too and shrink their bundles a bit. In Jira it gave us an overall ~10% reduction in bundle size and a ~300ms TTI improvement on the Issue View page. The plugin is available here: https://github.com/atlassian-labs/webpack-deduplication-plugin

Frontend architect & aficionado, CI/CD and automations enthusiast. Love solving problems and fixing things.