Webpack and yarn magic against duplicates in bundles

This page describes the theory and some technical details behind the webpack-deduplication-plugin plugin, which helped us reduce javascript size in Jira by ~10%.

Setting up the scene

In order to not to turn this article into a book on modern dependencies and bundling tools, it is assumed that the reader has some understanding of what tools like npm, yarn and webpack are used for in modern frontend projects and want to dig a bit deeper in the magic behind the scenes.

The problem of duplicated dependencies

“Duplicated” dependencies in any of the middle- to large scale projects that rely on npm packages are inevitable. When a project has dozens “direct” dependencies, and every one of those has their own dependencies, the final number of all packages (direct and transitive) installed in a project can be close to hundreds. In this situation, it is more likely than not that some of the dependencies will be duplicated.

Deduplication in yarn

Consider, for example, a project, that among its direct dependencies has modal-dialog@3.0.0 and button@2.5.0, and modal-dialog brings button@2.4.1 as a transitive dependency. If left unduplicated, both buttons will exist in the project

modal-dialog@^3.0.0:
version "3.0.0"
resolved "exact-link-to-where-download-modal-dialog-3.0.0-from"
dependencies:
button@^2.4.1

button@^2.5.0:
version "2.5.0"
resolved "exact-link-to-where-download-2.5.0-version-from"

button@^2.4.1:
version "2.4.1"
resolved "exact-link-to-where-download-2.4.1-version-from"
modal-dialog@^3.0.0:
version "3.0.0"
resolved "exact-link-to-where-download-modal-dialog-3.0.0-from"
dependencies:
button@^2.4.1

button@^2.4.1, button@^2.5.0:
version "2.5.0"
resolved "exact-link-to-where-download-2.5.0-version-from"

Deduplication in yarn — not compatible version

The above deduplication technic is the only thing that we usually have in the fight against duplicates, and usually, it works quite well. But what will happen if a project has not-semver-dedupable transitive dependencies? If, for example, our project has modal-dialog@3.0.0, button@2.5.0 and editor@5000.0.0 as direct dependencies, and those bring button@1.3.0 and button@1.0.0 as transitive dependencies?

modal-dialog@^3.0.0:
version "3.0.0"
resolved "exact-link-to-where-download-modal-dialog-3.0.0-from"
dependencies:
button@^1.0.0

editor@^5000.0.0:
version "5000.0.0"
resolved "exact-link-to-where-download-editor-5000.0.0-from"
dependencies:
button@^1.3.0

button@^2.5.0:
version "2.5.0"
resolved "exact-link-to-where-download-2.5.0-version-from"

button@^1.0.0, button@^1.3.0:
version "1.3.0"
resolved "exact-link-to-where-download-1.3.0-version-from"

Duplicated dependencies install

When we install our dependencies via classic yarn or npm (pnpm or yarn 2.0 change the situation and are not considered here), npm hoists everything that is possible up to the root node_modules. If, for example, in our project above both editor and modal-dialog have a dependency on the same “deduped” version of tooltip, but our project does not, npm will install it at the root of the project.

/node_modules
/editor
/modal-dialog
/tooltip

Duplicated dependencies and webpack

So, if a project is bundled with webpack, how exactly it handles the situation from above? It doesn’t actually (there was webpack dedup plugin in the long past, but it was removed after webpack 2.0).

  • project/node_modules/modal-dialog/node_modules/button/index.js — when it’s requested from within modal-dialog

Deduplication in webpack — first attempt

Since those buttons are exactly the same, the very first question that comes into mind: is it possible to take advantage of that and “trick” webpack into recognising it? And indeed, it is possible and it is ridiculously simple.

Example of the real duplicates in Jira
Minimal implementation of the deduplication logic

Deduplication in webpack — actual solution

While the solution above worked good and safe (we did vigorous testing of it before releasing it to production), it had an unfortunate side-effect: webpack started to generate assets in non-deterministic way 🤦‍♀ On every single re-build it was either moving some pieces of those “duplicated” modules around or was just generating new internal ids. Any possible reason within the code from our side was eliminated quite fast. Something weird was happening within webpack internals themselves.

Findings and other curiosities

The non-deterministic behaviour is reproducible non-deterministically

If you try to reproduce the non-deterministic part on a small synthetic example, most likely you won’t be able to. I was only able to do so it in my toy repo when I imported the entire @atlaskit/editor to it, just a combination of a few Atlaskit components didn’t do it. Reducing the chunk size to a minimum to make webpack split code, manual async imports also didn’t help.

NormalModuleReplacementPlugin was not built for the purpose.

First of all, it executes the RegExp only on request property of the result and replaces only request as well (check out the source). However, there are many more properties in the result object that contain information about the origin of the module (and in theory need to be replaced as well), one of which is context — where the request actually originated from. And if a file is requested relatively, it will have a relative path in the request property as well.

{
request: "./styled",
context: "/project/node_modules/editor/node_modules/button"
}

Nevertheless, NormalModuleReplacementPlugin can actually be used if there is a need

Because everything that is said in the “finding” above is not entirely correct, and what turned out to be enough in the end is to replace just request with the resolved absolute path from both of those.

{
request: "./styled",
context: "/project/node_modules/editor/node_modules/button"
}
{
request: "/project/node_modules/modal-dialog/node_modules/button/styled",
context: "/project/node_modules/editor/node_modules/button"
}

“naive” replace (string.startsWith) is not going to work

Even if in theory if packages are “the same”, in reality, they are not, and the difference is called “transitive dependencies of transitive dependencies”. The “simplest” example of the use case would be:

  • editor has button@1.x as a transitive dependency, which has icon@1.x as a transitive dependency
  • modal-dialog has button@1.x as a transitive dependency, which has icon@1.x as a transitive dependency PLUS it has icon@2.x as its own transitive dependency (those who survived and read til this moment — kudos to you, you’re heroes)
/node_modules
/editor
/node_modules
/button-1.3.0
/icon-1.0.0 // on the same level as button above
/modal-dialog
/node_modules
/button-1.3.0
/node_modules
/icons-1.0.0 // nested within button since on the lvl above there is another icon
/icon-2.0.0
/button-2.5.0
/project/node_modules/editor/node_modules/icon-1.0.0
/project/node_modules/modal_dialog/node_modules/button-1.3.0/node_modules/icon-1.0.0
 /project/node_modules/modal_dialog/node_modules/button-1.3.0
 /project/node_modules/editor/node_modules/button-1.3.0
/project/node_modules/editor/node_modules/button-1.3.0/node_modules/icon-1.0.0

The end

So, what was the final reason for non-deterministic behaviour? It is actually a combination of the:

  • “naive” initial strings replacement
  • listening and replacing only “request” in the initial implementation
  • a few other edge cases that are not mentioned here and that caused not all modules to be resolved

Frontend architect & aficionado, CI/CD and automations enthusiast. Love solving problems and fixing things.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store