What internal structures do transformer models learn that help them generalize so well?
1 min read · September 21, 2023
2023 · interpretability compositionality