A Daily Pseudo-Hash Function

2022-08-28

I've found Wait But Why's posts on procrastination extremely useful and a great read. If you haven't read them, I highly recommend them: 1, 2, 3.

In the third post, "The Procrastination Matrix", Tim Urban describes the Eisenhower Matrix as a productivity tool. I made a Google Sheet to sort my tasks according to this and place each of them into one of the quadrants. In particular, I want to make sure I am giving Q2 the time it deserves. But I have quite a few different things that I want to spend time on at some point (bass clarinet, worldbuilding simulations, learning more Turkish, reading one of several books that I'm in the middle of, writing things on this website, etc.). I get stuck with choice paralysis when I have free time and I'm looking at the long list of things my past self wants me to devote time to.

So what to do? Besides triaging the list so that Q2 only contains things that are truly important for my goals, I wanted to randomize the order of things within each quadrant so that I get a different priority list each day. I needed a function to assign a random number to each task. I also wanted to be able to view it in the Google Sheets app on my phone, so it had to run without Apps Script. I wanted the order to stay the same within each day but change from day to day, so just using RAND() wouldn't work, since it recalculates whenever the sheet changes. The function needed to be deterministic, depend on the day, be different for each task in the list (or at least look i.i.d.), and not change when I sort the sheet to reflect the new order (so using ROW() wouldn't work).

I designed a function that depends on the string describing the task (e.g. "read Ministry for the Future" or "clarinet") and the day. For the day I just used the number of days since the Unix epoch (January 1, 1970).

For the stuff depending on the string, designing a legit hash function is over my head (I've read the SHA-256 specification, which is a fun read, but yeah that is overkill for this). I couldn't find an existing one in Excel or Google Sheets. So I hacked up something that looks random enough! I needed a bunch of different numbers, all deterministic from the string, that I could throw into a blender of math to generate the random value. Here's what I chose.

In the formulas below (Google Sheets syntax), S1 denotes the cell containing the string.

DayNum = The number of days since January 1, 1970.
=TODAY() - DATE(1970,1,1)

Len = The length of the string.
=LEN(S1)

LenNoVowels = The length of the string without lowercase vowels.
=LEN(REGEXREPLACE(S1,"[aoeui]",""))

LenNoLower = The length of the string without lowercase letters.
=LEN(REGEXREPLACE(S1,"[a-z]",""))

LenNoUpper = The length of the string without uppercase letters.
=LEN(REGEXREPLACE(S1,"[A-Z]",""))

LenNoAM = The length of the string without the letters A through M (upper or lower).
=LEN(REGEXREPLACE(S1,"[A-Ma-m]",""))

LenNoNZ = The length of the string without the letters N through Z (upper or lower).
=LEN(REGEXREPLACE(S1,"[N-Zn-z]",""))

FirstCode = The ASCII value of the first character in the string (assuming you're restricting the strings to ASCII).
=CODE(LEFT(S1,1))

LastCode = The ASCII value of the last character in the string (assuming you're restricting the strings to ASCII).
=CODE(RIGHT(S1,1))
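
For anyone who wants to play with these inputs outside a spreadsheet, here is a minimal Python sketch of the same quantities. The function names (`day_num`, `string_features`) are made up for illustration, and it assumes non-empty ASCII strings, since `CODE` on an empty cell would also fail in the sheet.

```python
import re
from datetime import date


def day_num(today: date) -> int:
    """Days since 1970-01-01, mirroring TODAY() - DATE(1970,1,1)."""
    return (today - date(1970, 1, 1)).days


def _len_without(text: str, pattern: str) -> int:
    """Length of text after deleting every character that matches pattern."""
    return len(re.sub(pattern, "", text))


def string_features(text: str) -> dict:
    """The eight string-derived inputs, keyed by the names used above.

    Assumes text is non-empty (text[0] raises IndexError otherwise).
    """
    return {
        "Len": len(text),
        "LenNoVowels": _len_without(text, "[aoeui]"),
        "LenNoLower": _len_without(text, "[a-z]"),
        "LenNoUpper": _len_without(text, "[A-Z]"),
        "LenNoAM": _len_without(text, "[A-Ma-m]"),
        "LenNoNZ": _len_without(text, "[N-Zn-z]"),
        "FirstCode": ord(text[0]),
        "LastCode": ord(text[-1]),
    }


print(day_num(date(2022, 8, 28)))
print(string_features("research visa"))
```

The regular expressions are copied straight from the `REGEXREPLACE` calls, so each feature should match what the sheet computes for plain ASCII task names.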

Now it's time to combine these in some way that will generate random-looking stuff! I decided to do a bunch of operations that would create decimals, like involving pi and weird powers and stuff, and then just take the decimal part to some precision.

DayMod
(LaTeX) $\mathrm{DayMod} = \mathrm{DayNum} \bmod 17.17\pi$
(Excel) =MOD(DayNum, 17.17 * PI())

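AltSumProd
(LaTeX) $\mathrm{AltSumProd} = \mathrm{Len} \cdot \mathrm{LenNoVowels} + \mathrm{LenNoLower} \cdot \mathrm{LenNoUpper} + \mathrm{LenNoAM} \cdot \mathrm{LenNoNZ} + \mathrm{FirstCode} \cdot \mathrm{LastCode}$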
(Excel) =Len * LenNoVowels + LenNoLower * LenNoUpper + LenNoAM * LenNoNZ + FirstCode * LastCode

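AltSumProd2
(LaTeX) $\mathrm{AltSumProd2} = (\mathrm{Len}^{2.1} \bmod 11)(\mathrm{LenNoVowels}^{2.3} \bmod 13) + (\mathrm{LenNoLower}^{2.5} \bmod 17)(\mathrm{LenNoUpper}^{2.7} \bmod 19) + (\mathrm{LenNoAM}^{2.9} \bmod 23)(\mathrm{LenNoNZ}^{3.1} \bmod 29) + (\mathrm{FirstCode}^{3.3} \bmod 31)(\mathrm{LastCode}^{3.5} \bmod 37)$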
(Excel) =MOD(Len ^ 2.1, 11) * MOD(LenNoVowels ^ 2.3, 13) + MOD(LenNoLower ^ 2.5, 17) * MOD(LenNoUpper ^ 2.7, 19)
+ MOD(LenNoAM ^ 2.9, 23) * MOD(LenNoNZ ^ 3.1, 29) + MOD(FirstCode ^ 3.3, 31) * MOD(LastCode ^ 3.5, 37)

And finally, the result:

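RandVal
(LaTeX) $\mathrm{RandVal} = \left\lfloor 10^{6} \cdot \left( \left( \left( \mathrm{DayMod}^{\,1 + (\mathrm{AltSumProd2} \bmod 1)} \bmod \pi \right) \cdot \mathrm{AltSumProd} \right) \bmod 1 \right) \right\rfloor$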
(Excel) =FLOOR(1000000 * MOD(MOD(DayMod ^ (1 + MOD(AltSumProd2, 1)), PI()) * AltSumProd, 1))

This creates integers from 0 to 999999, inclusive.

The day I am writing this is 2022-08-28, so DayNum is 19232. For the string "research visa", we have the following variable values:

Len = 13
LenNoVowels = 8
LenNoLower = 1 (the space is the only character left)
LenNoUpper = 13
LenNoAM = 6
LenNoNZ = 8
FirstCode = 114
LastCode = 97
DayMod = 28.95207308
AltSumProd = 11223
AltSumProd2 = 494.8233498
RandVal = 894912
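
The mixing steps can also be sketched in Python to follow the worked example by hand. This is not the linked script, just a direct transcription of the spreadsheet formulas; the function name `mix` and the dictionary layout are assumptions made for this example. Floating-point details may differ slightly between Sheets and Python, so the final digits of RandVal are not guaranteed to match the sheet.

```python
import math


def mix(day_num: int, f: dict) -> dict:
    """Combine the day number and string features as in the formulas above.

    f maps the feature names (Len, LenNoVowels, ...) to integers.
    """
    day_mod = day_num % (17.17 * math.pi)

    alt_sum_prod = (
        f["Len"] * f["LenNoVowels"]
        + f["LenNoLower"] * f["LenNoUpper"]
        + f["LenNoAM"] * f["LenNoNZ"]
        + f["FirstCode"] * f["LastCode"]
    )

    alt_sum_prod2 = (
        ((f["Len"] ** 2.1) % 11) * ((f["LenNoVowels"] ** 2.3) % 13)
        + ((f["LenNoLower"] ** 2.5) % 17) * ((f["LenNoUpper"] ** 2.7) % 19)
        + ((f["LenNoAM"] ** 2.9) % 23) * ((f["LenNoNZ"] ** 3.1) % 29)
        + ((f["FirstCode"] ** 3.3) % 31) * ((f["LastCode"] ** 3.5) % 37)
    )

    # Exponent lies in [1, 2): one plus the fractional part of AltSumProd2.
    exponent = 1 + (alt_sum_prod2 % 1)
    mixed = ((day_mod ** exponent) % math.pi) * alt_sum_prod

    # Keep only the fractional part and scale it to an integer in 0..999999.
    rand_val = math.floor(1_000_000 * (mixed % 1))

    return {
        "DayMod": day_mod,
        "AltSumProd": alt_sum_prod,
        "AltSumProd2": alt_sum_prod2,
        "RandVal": rand_val,
    }


# Feature values for "research visa" on 2022-08-28, taken from the list above.
example = {
    "Len": 13, "LenNoVowels": 8, "LenNoLower": 1, "LenNoUpper": 13,
    "LenNoAM": 6, "LenNoNZ": 8, "FirstCode": 114, "LastCode": 97,
}
for name, value in mix(19232, example).items():
    print(name, value)
```

Printing the intermediates makes it easy to compare each step against the list above and see where, if anywhere, the two environments diverge.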

I tested this in Python (link to script) with a variety of strings and day numbers. Strings were randomly chosen substrings of several books I have as text files for testing stuff like this, such as The Count of Monte Cristo. The resulting numbers look pretty uniform. I haven't run any actual statistical tests on it because it looked good enough for my purposes, so there could be some non-uniformity in the distribution somewhere.
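
One simple statistical test that could be run here is a chi-square check of bucket counts against a uniform distribution. The sketch below is only an illustration of that idea: `chi_square_uniform` is a made-up helper, and the sample comes from Python's `random` module as a stand-in, not from actual RandVal outputs.

```python
import random


def chi_square_uniform(values, n_buckets=10, upper=1_000_000):
    """Chi-square statistic for integers in [0, upper) against a uniform fit.

    Splits the range into n_buckets equal buckets and compares the observed
    counts with the count expected if every bucket were equally likely.
    """
    counts = [0] * n_buckets
    for v in values:
        counts[v * n_buckets // upper] += 1
    expected = len(values) / n_buckets
    return sum((c - expected) ** 2 / expected for c in counts)


# Stand-in data: replace with RandVal outputs for many strings and days.
rng = random.Random(0)
sample = [rng.randrange(1_000_000) for _ in range(10_000)]
print(chi_square_uniform(sample))
```

With 10 buckets there are 9 degrees of freedom, so a statistic well above roughly 16.9 (the 5% critical value) would hint at non-uniformity. A check like this only looks at the overall distribution; it says nothing about correlations between similar strings or consecutive days.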

This was fun to make! I find myself wondering how functions like this work. Where is the dividing line between something like just DayNum + Len that is highly predictable, and something like this RandVal function which is highly unpredictable? Is there a gradual or sudden transition between ordered and hash-like behavior as you add more steps? Where can I read about this? Let me know! :)