Memoization
In computing, memoization is an optimization technique that speeds up programs by caching the results of costly function calls and returning the stored result when the same inputs occur again.
Python’s functools.lru_cache and functools.cache

From functools — Higher-order functions and operations on callable objects:

> Simple lightweight unbounded function cache. Sometimes called “memoize”.
>
> Returns the same as lru_cache(maxsize=None), creating a thin wrapper around a dictionary lookup for the function arguments. Because it never needs to evict old values, this is smaller and faster than lru_cache() with a size limit.
For example:
```python
@cache
def factorial(n):
    return n * factorial(n-1) if n else 1

>>> factorial(10)      # no previously cached result, makes 11 recursive calls
3628800
>>> factorial(5)       # just looks up cached value result
120
>>> factorial(12)      # makes two new recursive calls, the other 10 are cached
479001600
```
The cache is threadsafe so that the wrapped function can be used in multiple threads. This means that the underlying data structure will remain coherent during concurrent updates.
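Since @cache is lru_cache(maxsize=None) under the hood, the wrapper also exposes cache_info() and cache_clear(), which are handy for checking whether the cache is actually earning its keep. A minimal sketch:

```python
from functools import cache

@cache
def square(n: int) -> int:
    # stand-in for an expensive computation
    return n * n

square(4)
square(4)                            # second call is served from the cache
print(square.cache_info())           # hits=1, misses=1, currsize=1
square.cache_clear()                 # wipe all entries
print(square.cache_info().currsize)  # 0
```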
Where have I seen it used in recent years?
Avoid re-re-re-loading costly YAML files

```python
@cache
def get_glossary_data():
    return load_yaml_file("nomenclature/glossary.yml")
```
Implement a multiton for things like localized i18n catalogs

```python
@cache
def get(locale: str, *, domain: str) -> Translations:
    ...  # a lot of heavy work
    return Translations(...)
```
Compile slow jmespath expressions only once

```python
@cache
def compile_query(expression: str) -> ParsedResult:
    return jmespath.compile(expression)
```
Extend an object from another domain

```python
@cache
def load_settings(entity_id: str) -> Settings:
    return SettingsFactory(entity_id).load_settings()
```
Maybe there are other uses elsewhere in the codebase?
Things to be aware of, things to avoid, and things to think about
🌞 When it is a pure function

```python
@cache
def get_heavy_stuff():
    for _ in range(0, 1_000**2):
        ...
    return "funny"
```

Nothing to say. Just use it 😎
🚨 When the implementation uses contextual state under the hood

For example:

```python
from lib.i18n import get_locale

@cache
def get_that_thing(text):
    locale = get_locale()  # ⛔️🚨🚔👮🏽📢
    return TheHeavyThing(text, locale=locale)
```
In this example, locale is a hidden parameter: the cached result depends on a value that is not part of the cache key. Refactor:

```python
from lib.i18n import DEFAULT_LOCALE

@cache
def get_that_thing(text, locale: str = DEFAULT_LOCALE):  # 👼🌴🌞
    return TheHeavyThing(text, locale=locale)
```
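With locale explicit, each locale gets its own cache entry instead of a stale one. A self-contained sketch of the same shape (a tuple stands in for TheHeavyThing, which is not shown here):

```python
from functools import cache

DEFAULT_LOCALE = "en"

@cache
def get_that_thing(text: str, locale: str = DEFAULT_LOCALE) -> tuple:
    # the tuple stands in for TheHeavyThing(text, locale=locale)
    return (text, locale)

get_that_thing("hello")               # cached under ("hello",)
get_that_thing("hello", locale="fr")  # a separate entry, no stale locale
assert get_that_thing.cache_info().currsize == 2
```

One gotcha worth knowing: lru_cache keys on the exact call pattern, so get_that_thing("hello") and get_that_thing("hello", locale="en") create two distinct entries even though they compute the same value.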
🐒 Monkey patches that change constants
🐒🌴🥥
```python
# lib.some.thing
MAGIC_VALUE = 42

@cache
def get_that_power():
    return MAGIC_VALUE ** 2

def am_i_strong_enough(me):
    return get_that_power() < me
```
```python
# tests.lib.some.thing_test
from lib.some.thing import get_that_power, am_i_strong_enough

def test_get_that_number_1(monkeypatch):
    monkeypatch.setattr("lib.some.thing.MAGIC_VALUE", 1)  # 🌴🥥
    assert get_that_power() == 1
    ...

def test_am_i_strong_enough():
    assert am_i_strong_enough(10) is False  # 🫨😨🫨😨
```
Avoid patching constants whenever possible; it will get you into trouble. 🫨
By refactoring the code you should be able to get by, and so will your colleagues.
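If a constant really must be patched, clear the cache before and after the test so no value leaks across tests. A self-contained sketch mirroring the example above (a plain function stands in for a pytest autouse fixture, which is where this cleanup would normally live):

```python
from functools import cache

MAGIC_VALUE = 42

@cache
def get_that_power():
    return MAGIC_VALUE ** 2

def test_with_patched_constant():
    global MAGIC_VALUE
    old, MAGIC_VALUE = MAGIC_VALUE, 1
    get_that_power.cache_clear()       # drop any stale cached value first
    try:
        assert get_that_power() == 1
    finally:
        MAGIC_VALUE = old
        get_that_power.cache_clear()   # don't leak the patched value either

test_with_patched_constant()
assert get_that_power() == 42 ** 2     # later callers see the real constant
```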
😰 Using @cache to decorate methods
```python
class DumbestCalculatorEver:
    @cache
    def double(self, x): return x * 2

DumbestCalculatorEver()  # never garbage collected
DumbestCalculatorEver()  # never garbage collected
DumbestCalculatorEver()  # never garbage collected
```
Don’t, or do it wisely on a singleton: the cache keeps a strong reference to self, so every instance becomes immortal, which may lead to memory leaks.
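For per-instance values, functools.cached_property is a safer option: the result is stored in the instance’s own __dict__, so it disappears when the instance is collected. A sketch (note that cached_property only works for argument-free computations):

```python
from functools import cached_property

class Calculator:
    def __init__(self, x: int):
        self.x = x

    @cached_property
    def double(self) -> int:
        # computed once, then stored in the instance __dict__
        return self.x * 2

c = Calculator(21)
assert c.double == 42
assert "double" in c.__dict__   # the cache lives on the instance itself
# when c is garbage collected, its cached value goes with it
```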
🤔 Caching entity properties naively

```python
@cache
def is_settings_metasyntactical(settings: Settings) -> bool:
    return settings.search("foo || bar || baz || `false`")

settings1 = Settings({"id": "xxx", "foo": True})
settings2 = Settings({"id": "xxx", "bar": False})
settings3 = Settings({"id": "xxx", "baz": True})

assert is_settings_metasyntactical(settings1) is True   # ???
assert is_settings_metasyntactical(settings2) is False  # ???
assert is_settings_metasyntactical(settings3) is True   # ???
assert hash(settings1) == hash(settings2) == hash(settings3)  # the truth
```
What can we do about it? I don’t think there is an existing mechanism that handles this case properly, nor is there a single way to handle it.
This has caused tests to be flaky, but might also be causing bugs in production.
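A self-contained reproduction makes the failure explicit. The toy Settings below is hypothetical: like many entities, it hashes and compares on its id only, which is exactly what poisons the cache:

```python
from functools import cache

class Settings:
    # toy stand-in: hashes and compares on "id" only, like many entities
    def __init__(self, data: dict):
        self.data = data

    def __hash__(self):
        return hash(self.data["id"])

    def __eq__(self, other):
        return self.data["id"] == other.data["id"]

@cache
def is_metasyntactical(settings: Settings) -> bool:
    # stands in for settings.search("foo || bar || baz || `false`")
    return bool(settings.data.get("foo") or settings.data.get("bar")
                or settings.data.get("baz"))

s1 = Settings({"id": "xxx", "foo": True})
s2 = Settings({"id": "xxx", "bar": False})

assert is_metasyntactical(s1) is True
# s2 == s1 and hash(s2) == hash(s1), so the cache returns the stale True:
assert is_metasyntactical(s2) is True   # we wanted False
```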
Some ideas to handle the entity-properties case

Given this function:

```python
@cache
def is_settings_metasyntactical(settings: Settings) -> bool:
    return settings.search("foo || bar || baz || `false`")
```
We expect this function to behave like a Settings property, which would be equivalent to:

```python
class Settings:
    @property
    def is_metasyntactical(self):
        return self.search("foo || bar || baz || `false`")
```
Luckily, it is unlikely that a Settings config changes during a single script resolution. Moreover, cached settings are always cleared at the end of a script / call.
There are several ways to handle this case:
Idea 1: always clear this cache at the end of a script / query

```python
try:
    return resolve_the_query()
finally:
    is_settings_metasyntactical.cache_clear()
```
Idea 2: let the entity register external attributes? For example:

```python
class Settings:
    def __init__(self, data):
        self.data = data
        self.cache = {}  # per-instance, so entries die with the instance

def is_settings_metasyntactical(settings: Settings) -> bool:
    cache_key = is_settings_metasyntactical
    try:
        result = settings.cache[cache_key]
    except KeyError:
        result = settings.cache[cache_key] = settings.search("foo || bar || baz || `false`")
    return result
```

Note that the cache dict must be created per instance in __init__; a class-level cache = {} would be shared by every Settings instance and collide just like @cache does.
Because we always clear settings at the end of a script / call, the cached data will be destroyed along with the entity instances.
Idea 3: use an external cache using weakref

From the weakref documentation:

> The weakref module allows the Python programmer to create weak references to objects.
>
> A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.
```python
from weakref import WeakKeyDictionary

_cache = WeakKeyDictionary()

def is_settings_metasyntactical(settings: Settings) -> bool:
    try:
        result = _cache[settings]
    except KeyError:
        result = _cache[settings] = settings.search("foo || bar || baz || `false`")
    return result
```
We rely on Python to clean up the cache during garbage collection. One caveat: WeakKeyDictionary still looks keys up via __eq__ and __hash__, so distinct instances that compare equal will share an entry.