Good question! Let me clarify - the MALM and Qwen models don't share tokens directly.
MALM has its own tokenizer in which each function name is a single token. It encodes the query, searches its memory bank, and returns the top-matching functions as plain text (function name, signature, docstring, and code).
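To make the retrieval side concrete, here's a minimal sketch. `FunctionEntry`, `MemoryBank`, and the word-overlap scoring are all hypothetical stand-ins, not MALM's actual API - the real system matches the encoded query against its single-token keys, which is what makes retrieval exact.

```python
# Hypothetical sketch of the retrieval side: each function name acts as a
# single-token key into a memory bank, and retrieval returns stored text.
from dataclasses import dataclass

@dataclass
class FunctionEntry:
    name: str        # the single-token key in MALM's vocabulary
    signature: str
    docstring: str
    code: str

    def as_text(self) -> str:
        """Render the entry as the plain text MALM would return."""
        return f"{self.signature}\n\"\"\"{self.docstring}\"\"\"\n{self.code}"

class MemoryBank:
    """Toy memory bank keyed by function name (one token per key)."""

    def __init__(self, entries: list[FunctionEntry]):
        self.entries = {e.name: e for e in entries}

    def retrieve(self, query: str, top_k: int = 3) -> list[FunctionEntry]:
        # Stand-in scoring by word overlap; real MALM encodes the query and
        # matches it against its single-token keys, so a hit is exact.
        words = set(query.lower().split())
        def score(e: FunctionEntry) -> int:
            return len(words & set(e.docstring.lower().split()))
        return sorted(self.entries.values(), key=score, reverse=True)[:top_k]
```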
This retrieved text is then simply concatenated with the user query to form the prompt for Qwen. Qwen tokenizes this combined prompt with its own tokenizer and generates code.
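Here's a minimal sketch of that hand-off, assuming the HuggingFace `transformers` API; the exact Qwen checkpoint and the `build_prompt` format are assumptions, since the source just says "Qwen".

```python
# Sketch of the hand-off: retrieved text is prepended to the query as plain
# text, then tokenized by Qwen's own tokenizer (not MALM's).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"  # assumed Qwen variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def build_prompt(query: str, retrieved: list[str]) -> str:
    """Plain-text concatenation: retrieved function text, then the query."""
    context = "\n\n".join(retrieved)
    return f"# Relevant functions:\n{context}\n\n# Task:\n{query}\n"

retrieved = ["def parse_csv(path):\n    '''Parse a CSV file into dicts.'''\n    ..."]
prompt = build_prompt("Read sales.csv and sum the revenue column", retrieved)

inputs = tokenizer(prompt, return_tensors="pt")    # Qwen's tokenizer
output_ids = model.generate(**inputs, max_new_tokens=256)
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]  # drop the prompt
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```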
So the flow is (sketched end to end after the list):
- User query goes to MALM
- MALM retrieves relevant functions as text
- Text prompt = query + retrieved code
- Qwen tokenizes this prompt with its own tokenizer
- Qwen generates output
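Tying those five steps together, a single orchestration function might look like the sketch below. It reuses the toy `MemoryBank` and `build_prompt` from the earlier snippets; all of these names are hypothetical, not MALM's actual interface.

```python
# End-to-end sketch of the flow above (hypothetical names throughout).
def answer(query: str, bank: MemoryBank, tokenizer, model) -> str:
    entries = bank.retrieve(query, top_k=3)                       # steps 1-2
    prompt = build_prompt(query, [e.as_text() for e in entries])  # step 3
    inputs = tokenizer(prompt, return_tensors="pt")               # step 4
    output_ids = model.generate(**inputs, max_new_tokens=256)     # step 5
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```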
There's no token-level integration - just text passing between the two models. MALM acts as a retrieval layer that provides relevant context, and Qwen does the generation with that context in its prompt.
The single-token-per-key insight matters only inside MALM, where it enables perfect retrieval. Qwen just sees regular text.