How AI matching works in a Jewish wisdom app: a transparent breakdown
There is a reasonable suspicion users have about AI-driven religious or spiritual products. The suspicion is that the product is mostly a generic LLM in a costume — that the "Jewish wisdom" is whatever ChatGPT happened to confabulate when asked for Jewish wisdom, dressed up with a serif font and a Hebrew letter as a logo.
That suspicion is correct about a depressing share of the products in the category. So we want to be specific about what Shalem actually does, because the answer is structurally different from "we asked an LLM to be Jewish."
This is a transparent breakdown of the matching engine. It is technical in places. It is also editorial — most of the consequential choices are not algorithmic. They are decisions about what counts as a real source text, what counts as a defensible translation, and where the engine is allowed to generate versus where it is forbidden to invent.
What the engine does
A user types a sentence or two — "I'm anxious about a job interview tomorrow" or "my grandmother died last week and I don't know what to do with myself" or "I haven't been to shul in three years and I miss it but I don't know how to go back."
Three things happen:
- Embedding. The user's input is converted to a numerical representation (an embedding) using a standard sentence-embedding model. This is effectively a one-way operation; the pipeline keeps only the vector, and the original wording is not recoverable from it.
- Retrieval. The embedding is compared against a pre-computed embedding for each of the 1,278 source texts in Shalem's curated database. The top candidates are ranked by semantic similarity.
- Selection and framing. A small generation model selects from the top candidates (with constraints — see below) and produces a short framing reflection in the user's language. The cited source text is real; the surrounding reflection is generated.
That is the whole pipeline. It is not "ask an LLM to give Jewish advice." It is retrieval against a vetted corpus, with constrained framing.
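The three steps above can be sketched in a few lines. This is a toy illustration, not Shalem's implementation: `embed` is a deterministic stand-in for a real sentence-embedding model, and `CORPUS` is a two-entry stand-in for the curated database.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a sentence-embedding model: a deterministic
    pseudo-random unit vector per input string."""
    seed = int(hashlib.sha256(text.encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# Toy stand-in for the curated database (the real one holds 1,278 texts).
CORPUS = [
    {"ref": "Psalm 121", "text": "I lift my eyes to the mountains; from where will my help come?"},
    {"ref": "Psalm 23", "text": "The Lord is my shepherd; I shall not want."},
]

# Corpus embeddings are pre-computed once, not per request.
CORPUS_VECS = np.stack([embed(s["text"]) for s in CORPUS])

def top_candidates(user_input: str, k: int = 2):
    q = embed(user_input)            # 1. embed the input (one-way)
    sims = CORPUS_VECS @ q           # 2. dot product = cosine sim on unit vectors
    order = np.argsort(-sims)[:k]    # rank, keep the top k
    return [(CORPUS[i]["ref"], float(sims[i])) for i in order]
```

The selection-and-framing step (step 3) sits on top of this ranked list; it is the only generative part of the pipeline.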
What the engine is forbidden to do
This is the part that matters more than the technical pipeline:
- The engine does not invent source citations. Every cited text is from the curated database. If a passage looks like Talmud, it is Talmud, with the tractate and folio reference available.
- The engine does not invent rabbinic authorities. The "Reb X said" pattern that LLMs love to confabulate is forbidden in the generation step. If a teaching is attributed to a named rabbi, the attribution is in the underlying source text, not generated.
- The engine does not translate texts on the fly. Translations come from established editions (Koren, JPS, Steinsaltz where applicable, public-domain editions where appropriate). On-the-fly translation is reserved for short phrases with established renderings.
- The engine does not pasken. It does not issue halachic rulings. If a user asks a halachic question, the engine routes to a general response that points the user to a competent rabbi.
- The engine does not diagnose. It does not name mental-health conditions. It does not assess severity. If a user's input shows signs of crisis, the engine routes to a different response that points the user to crisis resources.
- The engine does not retain user input. The reflection input is processed and immediately discarded. It is not persisted, aggregated, or used for training.
These are not soft preferences. They are hard constraints in the application layer, enforced before the response is returned. They were chosen because each of them is something LLMs do badly and confidently.
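A minimal sketch of what "hard constraints in the application layer" means in practice. The pattern lists, routing labels, and function names here are illustrative assumptions, not Shalem's actual code; the point is that routing and citation checks run as plain deterministic logic outside the model.

```python
# Hypothetical application-layer guards, checked before any response
# is returned. All names and pattern lists are illustrative.

ALLOWED_REFS = {"Psalm 121", "Psalm 23", "Berakhot 5a"}  # curated-DB refs only
HALACHIC_MARKERS = ["is it permitted", "am i allowed", "is it kosher to"]
CRISIS_MARKERS = ["want to die", "hurt myself"]

def route(user_input: str) -> str:
    """Decide which response path handles this input."""
    low = user_input.lower()
    if any(m in low for m in CRISIS_MARKERS):
        return "crisis_resources"    # never a generated reflection
    if any(m in low for m in HALACHIC_MARKERS):
        return "refer_to_rabbi"      # the engine does not pasken
    return "match"                   # normal retrieval pipeline

def citation_ok(cited_ref: str) -> bool:
    # Every cited text must come from the curated database; anything
    # else is treated as a confabulation and rejected outright.
    return cited_ref in ALLOWED_REFS
```

Because these checks are ordinary code rather than model behavior, they fail closed: an unrecognized citation is rejected even if the generation step produced it confidently.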
Why these constraints exist
The reason for the constraints is that the alternative is worse than nothing. An app that confabulates a Talmudic citation is worse than an app that has no citations at all, because the confident wrongness teaches users to trust a specific kind of fiction. An app that paskens halacha by averaging the LLM's training data is worse than an app that says "ask a rabbi," because the average answer is rarely the right answer for a specific person's specific question. An app that diagnoses a user's mental state is worse than an app that doesn't, because the diagnosis is uninformed and the user might believe it.
The constraint set exists because Shalem is trying to do one specific thing well, and the way to do that is to refuse to do the things adjacent to it badly.
The 1,278 source texts: how they were chosen
The curated database is composed roughly as follows:
- All 150 Tehillim. Indexed by emotional register (lament, gratitude, supplication, trust, royal, wisdom). Traditional associations preserved (Psalm 23 for grief; Psalm 121 for travel; Psalm 130 for the depths; Psalm 27 for the Elul/Tishrei season).
- Selected aggadic Talmud. Tractates Berakhot, Sanhedrin, Sotah, Bava Metzia, Avot (Pirkei Avot), and Ta'anit are most heavily represented. Halachic material is largely excluded — Shalem is not a posek and the engine should not pretend to be.
- Midrash. Bereshit Rabbah, Shemot Rabbah, Vayikra Rabbah, Eicha Rabbah (especially for grief work), Tanchuma, and selected later collections. Narrative and parabolic material favored over derivational midrash.
- Chassidic. Material from Breslov (especially Likutei Moharan and Sippurei Ma'asiyot), Chabad (Tanya selections, accessible Likkutei Sichot), and Pshischa-school teachers. The selection criterion is interior-life material — what Chassidic tradition adds to the older corpus is largely psychological, and that is what Shalem draws on.
Selection criteria across the database:
- Emotional applicability. Texts that have something to say about a lived situation, not just a doctrinal one.
- Accessible without prerequisite. Texts that work for a user who has not previously studied Talmud, not just for someone who has.
- Translation defensibility. Texts where a defensible English translation exists.
- Cross-denominational viability. Texts that work across the Jewish denominational spectrum — Orthodox, Conservative, Reform, Reconstructionist, Renewal, secular-cultural. Shalem's editorial neutrality on denomination is reflected in source selection.
The database is not static. Texts are added periodically. Texts are rarely removed, and only when a translation turns out to be indefensible. The current count of 1,278 is the working baseline as of May 2026.
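A record in a database organized this way might look roughly like the sketch below. The field names and schema are assumptions for illustration (the actual schema is not published); the structure just makes the register index and the fixed-translation policy concrete.

```python
from dataclasses import dataclass

# Hypothetical shape of one curated source record. Field names are
# illustrative, not Shalem's actual schema.

@dataclass
class SourceText:
    ref: str                  # e.g. "Psalm 27" or "Berakhot 5b"
    corpus: str               # "tehillim" | "talmud" | "midrash" | "chassidic"
    registers: list           # emotional registers, e.g. ["lament", "trust"]
    translation_edition: str  # an established edition, never generated

def by_register(db, register):
    """Look up texts indexed under a given emotional register."""
    return [s.ref for s in db if register in s.registers]

DB = [
    SourceText("Psalm 23", "tehillim", ["trust", "grief"], "JPS 1917"),
    SourceText("Psalm 130", "tehillim", ["lament", "depth"], "Koren"),
]
```

Carrying the translation edition on the record itself is what makes "the engine does not translate on the fly" enforceable: the text shown to the user is read from the record, not produced at request time.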
What the engine generates, and what it doesn't
The framing reflection — the short piece of writing that introduces or contextualizes the matched source text — is generated. It is not retrieved from a fixed library of pre-written reflections. This is where the engine has the most latitude.
The generation step is constrained by:
- Tone. Contemplative rather than instructional. Short rather than long. Specific rather than general.
- Editorial policy. No invented citations, no invented authorities, no diagnostic language, no halachic claims, no proselytization.
- Length. Reflections are typically 60–180 words. Longer reflections are usually a sign that the engine is over-explaining and are flagged in editorial review.
- First-person framing. Reflections do not say "you should" or "Judaism teaches" as if they are a teacher. They name the moment and bring the text to it.
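The constraints above lend themselves to a simple post-generation check. This is a hedged sketch, assuming the policy as described (roughly 60–180 words, no instructional openers); the thresholds and pattern list are taken from the text, but the function and its flag names are invented for illustration.

```python
# Hypothetical editorial check run on a generated reflection before it
# is shown. Names and patterns are illustrative.

FORBIDDEN_PATTERNS = ["you should", "judaism teaches"]
MAX_WORDS = 180  # reflections are typically 60-180 words

def reflection_flags(text: str) -> list:
    """Return a list of policy flags; an empty list means the
    reflection passes this check."""
    flags = []
    if len(text.split()) > MAX_WORDS:
        flags.append("over_length")  # usually a sign of over-explaining
    low = text.lower()
    for pat in FORBIDDEN_PATTERNS:
        if pat in low:
            flags.append(f"instructional_tone:{pat}")
    return flags
```

A flagged reflection would go to editorial review rather than to the user; the check is cheap enough to run on every response.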
What this looks like in practice: a user types "I'm anxious about a job interview tomorrow." The engine retrieves Psalm 121 ("I lift my eyes to the mountains; from where will my help come?"). The reflection might be a paragraph noting that this psalm is traditionally recited before journeys and uncertain transitions, and pointing to its specific framing of help as accompaniment rather than rescue. The user gets the psalm, the reflection, and an optional 3-day practice. They do not get a sermon.
A note on cadence: 3 days is the default, but it's not the only option. Subscribers can activate a faster mode called "7 Days of Light" — once a month, the practice loop runs for 7 days, with reflections arriving every 24 hours instead of every 3 days. Same sources, same matching, same practice structure. Just compressed for weeks when waiting three days feels longer than the moment will hold. We built it because some weeks are like that, and the tradition has always understood the difference between zman (regular time) and moed (the kind of time a moment makes urgent).
What the matching does well
After roughly six months of the engine running in production, here is what it does well:
- Tehillim selection is consistently strong. The 150 Psalms are well-indexed and the matching is reliable.
- Grief and bereavement contexts match well to traditional sources — Eicha Rabbah, the seven psalms of consolation, Hilkhot Aveilut framing, specific Chassidic teachings on loss.
- Pre-Shabbat and pre-holiday transitions match well to seasonal material.
- Anxiety in specific shapes (anticipatory anxiety before an event; long-arc existential anxiety; depth-anxiety that has gone on a while) maps to recognizably appropriate texts.
What the matching does less well
This is the harder section. It belongs here because the alternative is pretending the engine is more capable than it is.
- Theological doubt is a hard category. The Jewish tradition has rich material on doubt (Eicha, Iyov, Kohelet, Chassidic teaching on hester panim), but the matching engine sometimes surfaces texts that read as too confident-in-belief for someone whose anxiety is precisely about their belief.
- Trauma-related input is handled cautiously by design — the engine routes toward less interpretively dense material — but the routing is conservative and sometimes misses texts that would have served better.
- Joy and gratitude are categories where the matching is technically correct but emotionally flat more often than we would like. Joy is harder to write a contemplative reflection about than grief, and the engine reflects that asymmetry.
- Inputs in Hebrew, Italian, or any language other than English are matched approximately. Multilingual input handling is on the roadmap; it is not yet at the level we would like it.
We say this here because users who hit these failure modes deserve to know they are known failure modes, not personal failures of how they used the product. If the engine surfaces something that does not land, it is the engine, not you.
What we are not building
A handful of capabilities that would be technically possible and that we have decided not to build:
- A user model. Shalem does not learn what a specific user prefers and tune to it. Each session is independent. This is partly a privacy choice and partly an editorial choice — the engine should not be optimizing for engagement against a specific user's revealed taste.
- Streaks and gamification. The 3-day practice has no streak counter for the same reason.
- Generated psak. The engine will never issue halachic rulings.
- Generated translation of long passages. Translation of more than a short established phrase goes through the editorial process, not the engine.
What this means for users
When you type something into Shalem and get a reflection back, the cited text is real. You can verify it. You can look up the same passage on Sefaria. You can read the surrounding context. The engine is not making it up.
When the surrounding reflection lands oddly, that is the generation step doing its weakest work. Reading just the cited text — and ignoring the reflection — is a perfectly valid way to use the product. Sometimes the cited text is enough.
When you ask Shalem something it shouldn't answer (a halachic question, a clinical question, a question about whether to convert or which kind of community to join), the engine will say so and route you elsewhere. That is also working as intended.
A request
If you have used Shalem and the matching has either worked unusually well or unusually poorly, we want to hear from you. The system improves through real feedback, not through internal review alone.
Email support@shalemapp.com with the input you provided (or paraphrase it if you would rather not share the original) and the response you got, and tell us what landed or didn't. That is how the editorial layer learns.