What Most Java Developers Miss About HashMap - And Why It Might Be Slowing You Down

TL;DR

HashMap's initial capacity is often misunderstood, leading to performance issues. Java 19 introduced HashMap::newHashMap to standardize sizing. Use it, or enforce correct usage with tools like ArchUnit.

Key Takeaways

  • Misusing HashMap's initial capacity can cause unnecessary resizing and slow performance.
  • Java 19's HashMap::newHashMap method provides a standardized way to set capacity based on expected entries.
  • Before Java 19, use helper methods or ArchUnit to enforce correct HashMap initialization.

Tags

java, programming, beginners, HashMap, Java Performance, Initial Capacity, Java 19, ArchUnit

The Hidden Pitfall of HashMap’s Initial Capacity

If you’ve been working with Java for a while, chances are you’ve used HashMap without giving much thought to its inner workings. After all, it’s a staple collection class, one of those things you just expect to work.

But here’s the catch: misusing the initial capacity setting can lead to unexpected performance issues, even in well-optimized codebases. And the worst part? Most developers don’t even realize they’re making the mistake.

So, what’s going on here? And how did Java 19 finally fix the problem?

Let’s break it down.

Understanding HashMap's Parameters

At its core, a HashMap has two parameters that dictate how efficiently it stores data: initial capacity and load factor.

  • Initial capacity is simply the number of buckets available to store entries.
  • Load factor controls when the hash table should grow: whenever the number of entries exceeds the product of the load factor and the current capacity, Java rehashes the table, increasing the number of buckets (usually doubling them).

This automatic resizing is great in theory but can introduce unnecessary overhead if developers don’t set the initial capacity correctly, as the sketch below shows.
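To see the resizing rule in action, here’s a minimal sketch (the class name and keys are just illustrative):

import java.util.HashMap;
import java.util.Map;

public class ResizeDemo {
    public static void main(String[] args) {
        // A default HashMap starts with 16 buckets and a load factor of
        // 0.75, so its resize threshold is 16 * 0.75 = 12 entries.
        Map<String, Integer> map = new HashMap<>();
        for (int i = 1; i <= 13; i++) {
            // The 13th put pushes the size past the threshold and
            // doubles the table to 32 buckets behind the scenes.
            map.put("key-" + i, i);
        }
    }
}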

Why Default Initialization is Misleading

Here’s where the problem comes in: most developers assume that setting the initial capacity means they’re defining the number of key-value pairs their HashMap will hold. But that’s not the case.
Instead, the constructor argument sets the initial bucket count (rounded up to the next power of two), which doesn’t directly map to the expected number of entries. Because of this, developers have spent years manually adjusting their calculations with various formulas, each yielding slightly different results.
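To make the pitfall concrete: new HashMap<>(100) does not hold 100 entries without resizing. The requested capacity is rounded up to the next power of two (128 buckets), the resize threshold is 128 * 0.75 = 96, and so the 97th entry already triggers a rehash. A short illustration (my own example, with placeholder values):

Map<Integer, String> map = new HashMap<>(100); // 100 is a bucket count, rounded up to 128 buckets
for (int i = 0; i < 100; i++) {
    map.put(i, "value-" + i); // rehashes once the size exceeds the 96-entry threshold
}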

Some common ways Java developers have estimated the right capacity include:

  • (int) (numMappings / 0.75f) + 1
  • (int) ((float) numMappings / 0.75f + 1.0f)
  • (numMappings * 4 + 2) / 3
  • (int) ((numMappings * 4L + 2L) / 3L)
  • (int) Math.ceil(numMappings / 0.75f)
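These formulas don’t even agree with one another. A quick throwaway check (my own snippet, not from the JDK) shows that for numMappings = 3 the first two overshoot by one, because the cast to int truncates before the + 1 is added, even when the division is already exact:

int numMappings = 3;
System.out.println((int) (numMappings / 0.75f) + 1);            // 5
System.out.println((int) ((float) numMappings / 0.75f + 1.0f)); // 5
System.out.println((numMappings * 4 + 2) / 3);                  // 4
System.out.println((int) ((numMappings * 4L + 2L) / 3L));       // 4
System.out.println((int) Math.ceil(numMappings / 0.75f));       // 4

The long-based variant exists because numMappings * 4 + 2 overflows int for very large values.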

This inconsistency has popped up even in official Java code: take java.lang.module.Resolver::makeGraph, which also contains a flawed assumption about capacity.

How Java 19 Finally Fixed It

After years of developers reinventing the wheel, Java 19 introduced HashMap.newHashMap(int numMappings), which finally offers a standardized way to create a properly sized HashMap.
Here’s how it works:

public static <K, V> HashMap<K, V> newHashMap(int numMappings) {
    if (numMappings < 0) {
        throw new IllegalArgumentException("Negative number of mappings: " + numMappings);
    }
    return new HashMap<>(calculateHashMapCapacity(numMappings));
}

static final float DEFAULT_LOAD_FACTOR = 0.75f;

static int calculateHashMapCapacity(int numMappings) {
    return (int) Math.ceil(numMappings / (double) DEFAULT_LOAD_FACTOR);
}

Instead of manually tweaking capacity values, developers can now call this method and let Java handle the calculation. This ensures that the map is correctly sized and avoids unnecessary resizing operations.
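Usage is a one-liner; the map contents below are just an illustrative example:

Map<String, Integer> scores = HashMap.newHashMap(3); // sized so three entries fit without a resize
scores.put("alice", 1);
scores.put("bob", 2);
scores.put("carol", 3);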

The update was part of JDK-8186958, and the full implementation can be found in this commit.

Best Practices Before Java 19

If you’re working with a Java version before 19, you have a couple of options:

  • Implement your own helper method similar to newHashMap (a sketch follows after this list) to ensure correctly sized maps.
  • Prevent incorrect constructor usage across your team using code quality tools.
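Here’s a minimal backport sketch of the first option, mirroring the Java 19 implementation shown above (the Maps class name is just a placeholder):

import java.util.HashMap;

public final class Maps {
    private Maps() {
    }

    // Mirrors Java 19's HashMap.newHashMap: sizes the backing table so
    // that numMappings entries fit without triggering a resize, using
    // the default load factor of 0.75.
    public static <K, V> HashMap<K, V> newHashMap(int numMappings) {
        if (numMappings < 0) {
            throw new IllegalArgumentException("Negative number of mappings: " + numMappings);
        }
        return new HashMap<>((int) Math.ceil(numMappings / 0.75d));
    }
}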

Using ArchUnit for Code Quality

One way to enforce best practices is with ArchUnit, which allows you to restrict incorrect constructor calls across your codebase.
The following rule ensures that developers don’t use new HashMap<>(capacity), forcing them to call HashMap.newHashMap(numMappings) instead:

import com.tngtech.archunit.base.DescribedPredicate;
import com.tngtech.archunit.core.domain.JavaConstructorCall;
import com.tngtech.archunit.junit.AnalyzeClasses;
import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;

import java.util.HashMap;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

@AnalyzeClasses(packages = "org.example")
public class HashMapCapacityConstructorRulesTest {
    @ArchTest
    static final ArchRule no_class_should_call_hashmap_capacity_constructor =
        noClasses().should().callConstructorWhere(new DescribedPredicate<>("Should not call HashMap<>(capacity). Use HashMap.newHashMap instead") {
            @Override
            public boolean test(JavaConstructorCall constructorCall) {
                return constructorCall.getTarget().getOwner().isEquivalentTo(HashMap.class)
                    && constructorCall.getTarget().getRawParameterTypes().size() == 1
                    && constructorCall.getTarget().getRawParameterTypes().get(0).isEquivalentTo(int.class); // get(0) instead of getFirst() keeps this compilable before JDK 21
            }
        });
}

This helps teams prevent misleading API usage and keeps map sizing consistent across the codebase, which also makes an eventual migration to Java 19’s newHashMap straightforward.

Conclusion: Why Clear API Design Matters

Sometimes, even fundamental Java classes contain misleading API designs. While using the wrong HashMap constructor may not always cause obvious performance bottlenecks, it’s still better to adopt modern best practices to avoid hidden inefficiencies.

With Java 19 introducing newHashMap, developers can finally move away from unclear capacity calculations. And for those still using older versions, enforcing proper usage with tools like ArchUnit is an easy win for code quality.

At the end of the day, clear API design isn’t just about performance - it’s about making code easier to understand, maintain, and evolve.
