Some notes about the right usage of memoization in Python




Memoization


In computing, memoization is an optimization method that speeds up programs by caching the results of costly function calls and reusing those stored results when the same inputs appear again.
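To make the definition concrete, here is a minimal hand-rolled sketch of the idea, assuming a function that only takes hashable positional arguments. The names `memoize`, `slow_square` and `calls` are made up for illustration; the standard library decorators discussed in the rest of this article do the same job more robustly.

```python
from functools import wraps

def memoize(func):
    """Cache the results of func, keyed by its positional arguments."""
    results = {}

    @wraps(func)
    def wrapper(*args):
        if args not in results:
            results[args] = func(*args)  # computed once per distinct input
        return results[args]

    return wrapper

calls = []  # records every real execution of the function body

@memoize
def slow_square(n):
    calls.append(n)
    return n * n

slow_square(4)
slow_square(4)  # served from the dictionary, the body does not run again
slow_square(5)
print(calls)
```

The body only runs for inputs that were never seen before; repeated inputs are answered by a dictionary lookup.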



Python’s functools.lru_cache and functools.cache

functools — Higher-order functions and operations on callable objects

Simple lightweight unbounded function cache. Sometimes called “memoize”.

Returns the same as lru_cache(maxsize=None), creating a thin wrapper around a dictionary lookup for the function arguments. Because it never needs to evict old values, this is smaller and faster than lru_cache() with a size limit.

For example:

@cache
def factorial(n):
    return n * factorial(n-1) if n else 1
>>> factorial(10)      # no previously cached result, makes 11 recursive calls
3628800
>>> factorial(5)       # just looks up cached value result
120
>>> factorial(12)      # makes two new recursive calls, the other 10 are cached
479001600

The cache is threadsafe so that the wrapped function can be used in multiple threads. This means that the underlying data structure will remain coherent during concurrent updates.
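The call counts claimed in the factorial example above can be checked with `cache_info()`, which every function wrapped by `functools.cache` or `functools.lru_cache` exposes. This is a small sketch, assuming Python 3.9 or later (where `functools.cache` exists).

```python
from functools import cache

@cache
def factorial(n):
    return n * factorial(n - 1) if n else 1

factorial(10)  # 11 misses: n = 10 down to 0
factorial(5)   # 1 hit
factorial(12)  # 2 misses (12 and 11), then 1 hit on factorial(10)

info = factorial.cache_info()
print(info)

factorial.cache_clear()  # empties the cache and resets the counters
cleared = factorial.cache_info()
print(cleared)
```

`cache_info()` is handy in tests and debugging sessions to confirm that a cache is actually being hit.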



Where have I seen it used in recent years?

Avoid re-re-re-loading costly YAML files

@cache
def get_glossary_data():
    return load_yaml_file("nomenclature/glossary.yml")

Implement a Multiton for things like localized i18n catalogs

@cache
def get(locale: str, *, domain: str) -> Translations:
    ... # a lot of heavy work
    return Translations(...)
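What makes this a Multiton is that identical arguments always return the very same object. The sketch below illustrates that with a stand-in `Translations` class; its constructor and attributes are assumptions, not the real catalog class.

```python
from functools import cache

class Translations:
    """Stand-in for a real catalog class (hypothetical)."""

    def __init__(self, locale: str, domain: str):
        self.locale = locale
        self.domain = domain

@cache
def get(locale: str, *, domain: str) -> Translations:
    return Translations(locale, domain)

fr_a = get("fr", domain="messages")
fr_b = get("fr", domain="messages")  # same arguments: same instance
en = get("en", domain="messages")    # different arguments: another instance

print(fr_a is fr_b, en is fr_a)
```

One instance exists per distinct `(locale, domain)` combination, and it lives as long as the cache does.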

Compile slow jmespath expressions only once

@cache
def compile_query(expression: str) -> ParsedResult:
    return jmespath.compile(expression)

Extend an object from another domain

@cache
def load_settings(entity_id: str) -> Settings:
    return SettingsFactory(entity_id).load_settings()

Maybe there are other uses for it in the source code?



Things to be aware of, things to avoid, and things to think about



🌞 When it is a pure function

from functools import cache
@cache
def get_heavy_stuff():
    for _ in range(1_000**2):
        ...
    return "funny"

Nothing to say. Just use it 😎



🚨 When the implementation uses contextual content under the hood

For example:

from lib.i18n import get_locale
@cache
def get_that_thing(text):
    locale = get_locale()  # ⛔️🚨🚔👮🏽📢
    return TheHeavyThing(text, locale=locale)

In this example, locale is a hidden parameter: it influences the result but is not part of the cache key. Refactor:

from lib.i18n import DEFAULT_LOCALE
@cache
def get_that_thing(text, locale: str = DEFAULT_LOCALE): # 👼🌴🌞
    return TheHeavyThing(text, locale=locale)
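The difference between the two versions can be observed directly. In this sketch, `_context`, `get_locale`, `greet_hidden` and `greet_explicit` are hypothetical stand-ins for the real i18n context and the heavy function.

```python
from functools import cache

_context = {"locale": "en"}  # hypothetical stand-in for request or thread context

def get_locale() -> str:
    return _context["locale"]

@cache
def greet_hidden(name: str) -> str:
    # The locale is read from context, so it is not part of the cache key.
    return f"{name}@{get_locale()}"

@cache
def greet_explicit(name: str, locale: str) -> str:
    # The locale is an argument, so each locale gets its own cache entry.
    return f"{name}@{locale}"

greet_hidden("bob")
greet_explicit("bob", get_locale())

_context["locale"] = "fr"  # the context changes

stale = greet_hidden("bob")                  # still the result computed for "en"
fresh = greet_explicit("bob", get_locale())  # recomputed for "fr"
print(stale, fresh)
```

The caller of the explicit version decides where the locale comes from, and the cache can tell the two situations apart.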



🐒 Monkey patches that change constants

🐒🌴🥥

# lib.some.thing
MAGIC_VALUE = 42
@cache
def get_that_power():
    return MAGIC_VALUE ** 2
def am_i_strong_enough(me):
    return get_that_power() < me

# tests.lib.some.thing_test
from lib.some.thing import am_i_strong_enough, get_that_power
def test_get_that_number_1(monkeypatch):
    monkeypatch.setattr("lib.some.thing.MAGIC_VALUE", 1)  # 🌴🥥
    assert get_that_power() == 1
...
def test_am_i_strong_enough():
    assert am_i_strong_enough(10) is False  # 🫨😨🫨😨 fails if the test above ran first: the cached 1 is reused

Avoid patching constants whenever possible; it will cause you trouble. 🫨

By refactoring the code you should be able to get by, and so should your colleagues.
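One possible refactoring, sketched here under the assumption that the constant can be turned into an optional argument: the value becomes part of the cache key, so tests pass their own value instead of patching the module. The `magic_value` parameter is an addition for illustration, not part of the original code.

```python
from functools import cache

MAGIC_VALUE = 42

@cache
def get_that_power(magic_value: int = MAGIC_VALUE) -> int:
    return magic_value ** 2

def am_i_strong_enough(me: int, magic_value: int = MAGIC_VALUE) -> bool:
    return get_that_power(magic_value) < me

def test_get_that_power_with_1():
    assert get_that_power(1) == 1  # no patching, separate cache entry

def test_am_i_strong_enough():
    assert am_i_strong_enough(10) is False  # 42 ** 2 = 1764 is not below 10

# The order of the tests no longer matters.
test_get_that_power_with_1()
test_am_i_strong_enough()
```

When a constant really has to be patched, calling `get_that_power.cache_clear()` before and after the test is another option.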



😰 Using @cache to decorate methods

class DumbestCalculatorEver:
    @cache
    def double(self, x): return x * 2
DumbestCalculatorEver().double(1)  # self is part of the cache key: never garbage collected
DumbestCalculatorEver().double(1)  # never garbage collected
DumbestCalculatorEver().double(1)  # never garbage collected

Don’t do it, or do it wisely on a singleton. Once a cached method has been called, the cache holds a reference to the instance, so the object becomes immortal, which may lead to memory leaks.
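The leak can be observed with a weak reference, which watches an object without keeping it alive. This is a sketch of the behavior on CPython; the variable names are illustrative.

```python
import gc
import weakref
from functools import cache

class DumbestCalculatorEver:
    @cache
    def double(self, x):
        return x * 2

calc = DumbestCalculatorEver()
calc.double(21)            # the cache key now contains the instance itself
probe = weakref.ref(calc)  # observes the instance without keeping it alive

del calc
gc.collect()
alive_after_del = probe() is not None    # the cache still references the instance

DumbestCalculatorEver.double.cache_clear()
gc.collect()
alive_after_clear = probe() is not None  # nothing references it any more
print(alive_after_del, alive_after_clear)
```

The cache belongs to the function on the class, not to the instance, which is why it outlives every object it has seen.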



🤔 Caching entity properties naively

@cache
def is_settings_metasyntactical(settings: Settings) -> bool:
    return settings.search("foo || bar || baz || `false`")
settings1 = Settings({"id": "xxx", "foo": True})
settings2 = Settings({"id": "xxx", "bar": False})
settings3 = Settings({"id": "xxx", "baz": True})
assert is_settings_metasyntactical(settings1) is True  # ???
assert is_settings_metasyntactical(settings2) is False  # ???
assert is_settings_metasyntactical(settings3) is True  # ???
assert hash(settings1) == hash(settings2) == hash(settings3)  # the truth

What can we do about it? I don’t think there is an existing mechanism that handles this case properly, nor is there a single way to handle it.

This has caused tests to be flaky, but might also be causing bugs in production.
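The problem can be reproduced in a few lines. The `Settings` class below is a minimal stand-in that assumes equality and hashing are based on the `id` field only, which is what the hash assertion above suggests; `has_foo` is a simplified replacement for the real query.

```python
from functools import cache

class Settings:
    """Minimal stand-in: identity is the "id" field only (an assumption)."""

    def __init__(self, data: dict):
        self.data = data

    def __eq__(self, other):
        return isinstance(other, Settings) and self.data["id"] == other.data["id"]

    def __hash__(self):
        return hash(self.data["id"])

@cache
def has_foo(settings: Settings) -> bool:
    return bool(settings.data.get("foo", False))

settings1 = Settings({"id": "xxx", "foo": True})
settings2 = Settings({"id": "xxx", "bar": False})

first = has_foo(settings1)   # computed from settings1
second = has_foo(settings2)  # cache hit: settings2 equals settings1 as a key
print(first, second, has_foo.cache_info())
```

The second call never looks at `settings2`: the cache finds an equal key and returns the value computed for `settings1`.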



Some ideas for handling the entity properties case

Given this function

@cache
def is_settings_metasyntactical(settings: Settings) -> bool:
    return settings.search("foo || bar || baz || `false`")

We expect that this function behaves like a Settings property, which would be equivalent to:

class Settings:
    @property
    def is_metasyntactical(self):
        return self.search("foo || bar || baz || `false`")

Fortunately, it is unlikely that the Settings config changes while a script is being resolved.

Moreover, cached settings are always cleared at the end of a script / call.

There are several ways to handle this case:



Idea 1: always clear this cache at the end of a script / query

try:
    return resolve_the_query()
finally:
    is_settings_metasyntactical.cache_clear()  # empty the cache once the query is done
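A runnable sketch of this pattern, with hypothetical names (`expensive_lookup`, `handle_query`) standing in for the real cached function and the real entry point. The method exposed by functools-wrapped functions is `cache_clear()`.

```python
from functools import cache

@cache
def expensive_lookup(key: str) -> str:
    return key.upper()

def resolve_the_query() -> str:
    return expensive_lookup("foo")

def handle_query() -> str:
    try:
        return resolve_the_query()
    finally:
        # Runs on success and on error alike, so no entry survives the query.
        expensive_lookup.cache_clear()

result = handle_query()
size_after = expensive_lookup.cache_info().currsize
print(result, size_after)
```

The cache still helps within a single query, while nothing stale leaks into the next one.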



Idea 2: let the entity register external attributes, for example:

class Settings:
    def __init__(self, data):
        ...  # existing initialization
        self.cache = {}  # one dict per instance; a class-level dict would be shared by all instances
def is_settings_metasyntactical(settings: Settings) -> bool:
    cache_key = is_settings_metasyntactical
    try:
        result = settings.cache[cache_key]
    except KeyError:
        result = settings.cache[cache_key] = settings.search("foo || bar || baz || `false`")
    return result

Because we always clear settings at the end of a script / call, the cached data will be destroyed with entity instances.
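If the Settings class itself can be modified, a related standard-library option is `functools.cached_property`, which stores the computed value in the instance's own `__dict__`. This is a different technique from the external function above, sketched with a simplified stand-in class; the `data` attribute, the `computations` counter and the body of the property are assumptions.

```python
from functools import cached_property

class Settings:
    """Minimal stand-in for the real class (hypothetical)."""

    computations = 0  # class-wide counter, only here to show how often the body runs

    def __init__(self, data: dict):
        self.data = data

    @cached_property
    def is_metasyntactical(self) -> bool:
        Settings.computations += 1
        return bool(self.data.get("foo") or self.data.get("bar") or self.data.get("baz"))

settings1 = Settings({"id": "xxx", "foo": True})
settings2 = Settings({"id": "xxx", "bar": False})

a = settings1.is_metasyntactical
b = settings1.is_metasyntactical  # read from settings1.__dict__, not recomputed
c = settings2.is_metasyntactical  # computed separately for this instance
print(a, b, c, Settings.computations)
```

As with the per-instance dictionary, the cached value disappears together with the instance.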



Idea 3: use an external cache using weakref

The weakref module allows the Python programmer to create weak references to objects.

A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.

from weakref import WeakKeyDictionary
_cache = WeakKeyDictionary()
def is_settings_metasyntactical(settings: Settings) -> bool:
    try:
        result = _cache[settings]
    except KeyError:
        result = _cache[settings] = settings.search("foo || bar || baz || `false`")
    return result

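We rely on Python to clean up the cache during garbage collection. Keep in mind that WeakKeyDictionary still looks keys up by hash and equality, so two live Settings instances that hash and compare equal would share a single entry.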
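The cleanup can be observed on CPython with a small experiment. The `Settings` class here is a stand-in that assumes default, identity-based hashing, and `has_foo` is a simplified replacement for the real query.

```python
import gc
from weakref import WeakKeyDictionary

class Settings:
    """Minimal stand-in with default identity-based hashing (an assumption)."""

    def __init__(self, data: dict):
        self.data = data

_cache = WeakKeyDictionary()

def has_foo(settings: Settings) -> bool:
    try:
        return _cache[settings]
    except KeyError:
        result = _cache[settings] = bool(settings.data.get("foo", False))
        return result

settings = Settings({"id": "xxx", "foo": True})
has_foo(settings)
size_while_alive = len(_cache)  # one entry while the instance exists

del settings
gc.collect()
size_after_gc = len(_cache)     # the entry vanished with the instance
print(size_while_alive, size_after_gc)
```

The keys must be hashable and support weak references, which is the case for instances of ordinary classes.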


