In a Training Loop 🔄 · 126.8 TFLOPS
Karsten Kuhnke (mindchain)
8 followers · 69 following
https://www.linkedin.com/in/jankarstenkuhnke/
KarstenKuh16443
haddock-development
karsten-kuhnke-0b23023a3
AI & ML interests
Mechanistic Interpretability, Sparse Autoencoders, JumpReLU, Reward Modeling, RLHF, AI Alignment, Function Calling, Gemma, Nemotron
Recent Activity
upvoted a paper about 1 hour ago
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
reacted to their post with 🧠 about 1 hour ago
The Architecture of 2026: Beyond the Token Trap 🚀

We are witnessing a tectonic shift in Transformer architecture. It’s no longer just about "predicting the next token"; it’s about executing latent plans on a high-speed data highway. What happens when we combine DeepSeek’s stability with Google’s strategic intelligence?

1️⃣ The Infrastructure: DeepSeek’s mHC
Moving from a single-lane residual stream to a multi-lane highway. Using the Birkhoff Polytope, mHC ensures mathematical stability (Identity Mapping) while routing specialized data through dedicated lanes.

2️⃣ The Intelligence: Google’s Meta-Controller
An internal AI unit that lives inside the Transformer. It escapes the "Token Trap" by extracting data to create a latent plan, steering the model via Temporal Abstraction.

The Synergy: In a Topological Transformer, the Meta-Controller finally has the "dedicated lanes" it needs to steer complex reasoning without causing gradient explosions. We aren't just making models bigger; we are making them architecturally smarter. 🧠

#MachineLearning #DeepSeek #GoogleAI #Transformer #AIArchitecture
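The Birkhoff polytope mentioned in the post is the set of doubly stochastic matrices: non-negative entries, every row and every column summing to 1. The identity matrix is one of them. The sketch below is only an illustration of why such a constraint can keep a multi-lane residual stream stable. It assumes the lanes are mixed by a single square matrix and uses Sinkhorn-Knopp normalization as one common way to get an approximately doubly stochastic matrix. The names `sinkhorn` and `mix_lanes`, the lane count, and the feature width are all hypothetical; this is not DeepSeek's implementation.

```python
import random


def sinkhorn(matrix, iterations=50):
    """Alternately normalize rows and columns of a strictly positive square matrix.

    The result is approximately doubly stochastic (a point in the Birkhoff polytope).
    """
    n = len(matrix)
    m = [row[:] for row in matrix]
    for _ in range(iterations):
        # Make every row sum to 1.
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        # Make every column sum to 1.
        for j in range(n):
            col_sum = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= col_sum
    return m


def mix_lanes(mixing, lanes):
    """Output lane i is the weighted sum over j of mixing[i][j] * lanes[j]."""
    n, width = len(lanes), len(lanes[0])
    return [
        [sum(mixing[i][j] * lanes[j][k] for j in range(n)) for k in range(width)]
        for i in range(n)
    ]


rng = random.Random(0)
n_lanes, width = 4, 8  # hypothetical sizes, chosen only for the demo

raw = [[rng.uniform(0.1, 1.0) for _ in range(n_lanes)] for _ in range(n_lanes)]
mixing = sinkhorn(raw)

row_sums = [sum(row) for row in mixing]
col_sums = [sum(mixing[i][j] for i in range(n_lanes)) for j in range(n_lanes)]

lanes = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(n_lanes)]
mixed = mix_lanes(mixing, lanes)

# Columns summing to 1 means the per-feature total across lanes is preserved.
total_before = [sum(lane[k] for lane in lanes) for k in range(width)]
total_after = [sum(lane[k] for lane in mixed) for k in range(width)]

# Rows summing to 1 (with non-negative weights) means each output lane is a
# weighted average of input lanes, so the largest magnitude cannot grow.
max_before = max(abs(v) for lane in lanes for v in lane)
max_after = max(abs(v) for lane in mixed for v in lane)
```

Because each output lane is a weighted average of the input lanes, repeated mixing cannot amplify activations, which is one plausible reading of the post's "mathematical stability" claim.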
reacted to their post with 🤝 about 1 hour ago
mindchain's datasets (37)
mindchain/instruct · Viewer · Updated Aug 30, 2023 · 100 · 7
mindchain/Text_Classification_Deutsch_Beispiel · Updated Jul 14, 2023 · 9
mindchain/synth1 · Viewer · Updated Jul 13, 2023 · 1k · 20
mindchain/text1 · Viewer · Updated Jul 13, 2023 · 5 · 6
mindchain/rolo · Viewer · Updated Jun 1, 2023 · 8 · 11
mindchain/info · Updated Jun 1, 2023 · 4
mindchain/afwaef · Viewer · Updated Jun 1, 2023 · 3 · 6