Some notes about the right usage of memoization in Python

Memoization

In computing, memoization is an optimization method that speeds up programs by caching the results of costly function calls and reusing those stored results when the same inputs appear again.
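To make the definition concrete, here is a minimal hand-rolled sketch of the idea. The names `memoize` and `slow_square` are made up for illustration, and it only handles hashable positional arguments; in practice the `functools` helpers described next are the better choice.

```python
import functools

def memoize(func):
    """Cache the results of `func`, keyed by its positional arguments."""
    results = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in results:
            results[args] = func(*args)  # first call: compute and store
        return results[args]             # later calls: plain dict lookup

    return wrapper

calls = []

@memoize
def slow_square(n):
    calls.append(n)  # record every real computation
    return n * n

first = slow_square(4)
second = slow_square(4)  # served from the dictionary, no new computation
```

After both calls, `calls` holds a single entry: the function body ran only once.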

Python’s `functools.lru_cache` and `functools.cache`

functools — Higher-order functions and operations on callable objects

Simple lightweight unbounded function cache. Sometimes called “memoize”.

Returns the same as lru_cache(maxsize=None), creating a thin wrapper around a dictionary lookup for the function arguments. Because it never needs to evict old values, this is smaller and faster than lru_cache() with a size limit.

For example:

@cache
def factorial(n):
    return n * factorial(n-1) if n else 1
>>> factorial(10)      # no previously cached result, makes 11 recursive calls
3628800
>>> factorial(5)       # just looks up cached value result
120
>>> factorial(12)      # makes two new recursive calls, the other 10 are cached
479001600

The cache is threadsafe so that the wrapped function can be used in multiple threads. This means that the underlying data structure will remain coherent during concurrent updates.
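The wrapper returned by `cache` also exposes `cache_info()` and `cache_clear()`, which are handy for checking what the cache is actually doing. A small sketch reusing the `factorial` example from the documentation:

```python
from functools import cache

@cache
def factorial(n):
    return n * factorial(n - 1) if n else 1

factorial(10)
info_after_first = factorial.cache_info()   # 11 misses: n = 10 down to 0

factorial(5)
info_after_second = factorial.cache_info()  # one more hit, no new miss

factorial.cache_clear()
info_after_clear = factorial.cache_info()   # counters and entries reset
```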

Where have I seen it used in recent years?

Avoid reloading costly YAML files over and over

@cache
def get_glossary_data():
    return load_yaml_file("nomenclature/glossary.yml")

Implement a multiton for things like localized i18n catalogs

@cache
def get(locale: str, *, domain: str) -> Translations:
    ... # a lot of heavy work
    return Translations(...)

Compile slow JMESPath expressions only once

@cache
def compile_query(expression: str) -> ParsedResult:
    return jmespath.compile(expression)

Extend an object from another domain

@cache
def load_settings(entity_id: str) -> Settings:
    return SettingsFactory(entity_id).load_settings()

There may be other uses of it in the source code.

Things to be aware of, things to avoid, and things to think about

🌞 When it is a pure function

from functools import cache
@cache
def get_heavy_stuff():
    for _ in range(0, 1_000**2):
        ...
    return "funny"

Nothing to say. Just use it 😎

🚨 When the implementation uses contextual data under the hood

For example

from lib.i18n import get_locale
@cache
def get_that_thing(text):
    locale = get_locale()  # ⛔️🚨🚔👮🏽📢
    return TheHeavyThing(text, locale=locale)

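In this example, locale is a hidden parameter: it is not part of the cache key, so a result computed for one locale is returned for every other locale. Refactor: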

from lib.i18n import DEFAULT_LOCALE
@cache
def get_that_thing(text, locale: str = DEFAULT_LOCALE): # 👼🌴🌞
    return TheHeavyThing(text, locale=locale)
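The difference can be reproduced with a small self-contained sketch. Everything here is a stand-in: `_context` and `get_locale` imitate `lib.i18n`, and the two `greeting_*` functions replace `TheHeavyThing`.

```python
from functools import cache

# Hypothetical stand-in for lib.i18n: a mutable "current locale" context.
_context = {"locale": "en"}

def get_locale():
    return _context["locale"]

@cache
def greeting_hidden(name):
    # The locale is read from the context, so it is NOT part of the cache key.
    return f"{name}@{get_locale()}"

@cache
def greeting_explicit(name, locale):
    # The locale is an argument, so each locale gets its own cache entry.
    return f"{name}@{locale}"

hidden_en = greeting_hidden("bob")
explicit_en = greeting_explicit("bob", get_locale())

_context["locale"] = "fr"  # the context changes between two calls

hidden_fr = greeting_hidden("bob")                    # stale: still the "en" result
explicit_fr = greeting_explicit("bob", get_locale())  # fresh entry for "fr"
```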

🐒 Monkey patches that change constants

🐒🌴🥥

# lib.some.thing
MAGIC_VALUE = 42
@cache
def get_that_power():
    return MAGIC_VALUE ** 2
def am_i_strong_enough(me):
    return get_that_power() < me

# tests.lib.some.thing_test
from lib.some.thing import am_i_strong_enough, get_that_power
def test_get_that_number_1(monkeypatch):
    monkeypatch.setattr("lib.some.thing.MAGIC_VALUE", 1)  # 🌴🥥
    assert get_that_power() == 1
...
def test_am_i_strong_enough():
    assert am_i_strong_enough(10) is False  # 🫨😨🫨😨

Avoid patching constants whenever possible: the value cached by one test leaks into the next, and it will cause you trouble. 🫨

By refactoring the code you should be able to get by, and so should your colleagues.
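One possible refactoring, given only as a sketch: pass the constant as an argument so that it becomes part of the cache key. The `magic_value` parameter is an addition for illustration and is not in the original module.

```python
from functools import cache

MAGIC_VALUE = 42

@cache
def get_that_power(magic_value: int = MAGIC_VALUE) -> int:
    # The value is an argument, so each value has its own cache entry.
    return magic_value ** 2

def am_i_strong_enough(me, magic_value: int = MAGIC_VALUE) -> bool:
    return get_that_power(magic_value) < me

# A test can now pass another value instead of patching the constant...
power_with_override = get_that_power(1)

# ...and the result for the real constant is not polluted.
still_weak = am_i_strong_enough(10)
```

If patching really cannot be avoided, calling `get_that_power.cache_clear()` in the test teardown is another way out.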

😰 Using @cache to decorate methods

class DumbestCalculatorEver:
    @cache
    def double(self, x): return x * 2
DumbestCalculatorEver().double(1)  # self is part of the cache key: never garbage collected
DumbestCalculatorEver().double(1)  # never garbage collected
DumbestCalculatorEver().double(1)  # never garbage collected

Don’t, or do it wisely on a singleton. The cache keeps a strong reference to every instance the method was called on, so those objects become immortal, which may lead to memory leaks.
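The leak can be observed with a weak reference. This is a sketch that relies on CPython's reference counting, and the `Calculator` class is made up for the demonstration.

```python
import gc
import weakref
from functools import cache

class Calculator:
    @cache
    def double(self, x):
        return x * 2

calc = Calculator()
calc.double(2)            # the cache key now holds a strong reference to calc
ref = weakref.ref(calc)   # a weak reference lets us watch the object's lifetime

del calc
gc.collect()
alive_after_del = ref() is not None    # still alive: the cache holds it

Calculator.double.cache_clear()
gc.collect()
alive_after_clear = ref() is not None  # gone once the cache lets go
```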

🤔 Caching entity properties naively

@cache
def is_settings_metasyntactical(settings: Settings) -> bool:
    return settings.search("foo || bar || baz || `false`")
settings1 = Settings({"id": "xxx", "foo": True})
settings2 = Settings({"id": "xxx", "bar": False})
settings3 = Settings({"id": "xxx", "baz": True})
assert is_settings_metasyntactical(settings1) is True  # ???
assert is_settings_metasyntactical(settings2) is False  # ???
assert is_settings_metasyntactical(settings3) is True  # ???
assert hash(settings1) == hash(settings2) == hash(settings3)  # the truth

What can we do about it? I don’t think there is an existing mechanism that handles this case properly, nor is there a single way to handle it.

This has caused tests to be flaky, but might also be causing bugs in production.
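Here is a runnable sketch of the collision. The `Settings` class below is hypothetical: it assumes that both `__hash__` and `__eq__` are based on the `id` field only, and `has_foo` replaces the JMESPath search with a plain dictionary lookup.

```python
from functools import cache

class Settings:
    """Hypothetical entity whose identity is its `id` field only."""

    def __init__(self, data):
        self.data = data

    def __hash__(self):
        return hash(self.data["id"])

    def __eq__(self, other):
        return isinstance(other, Settings) and self.data["id"] == other.data["id"]

@cache
def has_foo(settings):
    return bool(settings.data.get("foo"))

settings1 = Settings({"id": "xxx", "foo": True})
settings2 = Settings({"id": "xxx", "foo": False})

result1 = has_foo(settings1)  # computed from settings1
result2 = has_foo(settings2)  # cache hit on settings1's entry: wrong answer
```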

Some ideas for handling the entity properties case

Given this function

@cache
def is_settings_metasyntactical(settings: Settings) -> bool:
    return settings.search("foo || bar || baz || `false`")

We expect that this function behaves like a Settings property, which would be equivalent to:

class Settings:
    @property
    def is_metasyntactical(self):
        return self.search("foo || bar || baz || `false`")

Luckily, the Settings config is unlikely to change during a script resolution.

Moreover, cached settings are always cleared at the end of a script / call.

There are several ways to handle this case:

Idea 1: always clear this cache at the end of a script / query

try:
    return resolve_the_query()
finally:
    is_settings_metasyntactical.cache_clear()
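The same idea can be wrapped in a small context manager so that several caches are cleared together. This is only a sketch: `scoped_caches` and `expensive_lookup` are hypothetical names, not part of the code base.

```python
from contextlib import contextmanager
from functools import cache

@cache
def expensive_lookup(key):
    return key.upper()

@contextmanager
def scoped_caches(*cached_functions):
    """Clear the given caches when the block exits, even on error."""
    try:
        yield
    finally:
        for func in cached_functions:
            func.cache_clear()

with scoped_caches(expensive_lookup):
    expensive_lookup("a")
    size_inside = expensive_lookup.cache_info().currsize  # one entry while running

size_after = expensive_lookup.cache_info().currsize       # emptied on exit
```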

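Idea 2: let the entity register external attributes, for example:

class Settings:
    def __init__(self, data):  # constructor signature assumed for this sketch
        self.data = data
        self.cache = {}  # one dict per instance; a class-level dict would be shared by all Settings
def is_settings_metasyntactical(settings: Settings) -> bool:
    cache_key = is_settings_metasyntactical  # the function itself is the key
    try:
        result = settings.cache[cache_key]
    except KeyError:
        result = settings.cache[cache_key] = settings.search("foo || bar || baz || `false`")
    return result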

Because we always clear settings at the end of a script / call, the cached data will be destroyed with entity instances.

Idea 3: use an external cache using weakref

The weakref module allows the Python programmer to create weak references to objects.

A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.

from weakref import WeakKeyDictionary
_cache = WeakKeyDictionary()
def is_settings_metasyntactical(settings: Settings) -> bool:
    try:
        result = _cache[settings]
    except KeyError:
        result = _cache[settings] = settings.search("foo || bar || baz || `false`")
    return result
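A quick way to check that entries vanish with their entity, shown as a sketch that relies on CPython's reference counting. The `Settings` class here is a hypothetical stand-in with default identity-based hashing, and `has_foo` replaces the JMESPath search.

```python
import gc
from weakref import WeakKeyDictionary

class Settings:
    """Hypothetical entity with default identity-based hashing."""

    def __init__(self, data):
        self.data = data

_cache = WeakKeyDictionary()

def has_foo(settings):
    try:
        return _cache[settings]
    except KeyError:
        result = _cache[settings] = bool(settings.data.get("foo"))
        return result

settings = Settings({"id": "xxx", "foo": True})
first = has_foo(settings)
entries_while_alive = len(_cache)  # one entry while the entity exists

del settings
gc.collect()
entries_after_del = len(_cache)    # the entry went away with the entity
```

One caveat: `WeakKeyDictionary` still looks keys up through `__hash__` and `__eq__`, so two live entities that compare equal would share one entry.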