Attention Variant Experiment


* Each sample below consists of a 4-bar prompt given to the model, followed by the model's generated continuation. A few hand-picked high-quality samples are shown first, followed by samples grouped by model size and attention mechanism.
Highest-Quality Samples

Sample 1

Sample 2

Sample 3

Sample 4

Sample 5

Model-Generated Samples

Parameter Size: 300M
Attention Mechanism | Sample 1 | Sample 2 | Sample 3

MHA

MHA_Flash

GQA

MQA

Parameter Size: 100M
Attention Mechanism | Sample 1 | Sample 2 | Sample 3

MHA

MHA_Flash

GQA

MQA

Parameter Size: 50M
Attention Mechanism | Sample 1 | Sample 2 | Sample 3

MHA

MHA_Flash

GQA

MQA

Parameter Size: 10M
Attention Mechanism | Sample 1 | Sample 2 | Sample 3

MHA

MHA_Flash

GQA

MQA
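The rows in each table compare attention variants that differ mainly in how key/value heads are shared among query heads: multi-head attention (MHA) gives every query head its own key/value head, multi-query attention (MQA) shares a single key/value head across all query heads, and grouped-query attention (GQA) sits in between. The NumPy sketch below illustrates that difference with one function parameterized by a hypothetical `n_kv_heads` argument. It is not the experiment's actual code: the weight layout, the causal mask (assumed because the models continue a prompt autoregressively), and all names are assumptions for illustration. MHA_Flash is presumably the same MHA computation run through a fused FlashAttention kernel, which changes speed and memory use rather than the math, so it is not modeled separately here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def grouped_query_attention(x, w_q, w_k, w_v, w_o, n_heads, n_kv_heads):
    """Causal self-attention (illustrative sketch, not the experiment's code).

    n_kv_heads == n_heads -> MHA, n_kv_heads == 1 -> MQA, otherwise GQA.
    x: (seq_len, d_model); w_q, w_o: (d_model, d_model);
    w_k, w_v: (d_model, n_kv_heads * head_dim).
    """
    seq_len, d_model = x.shape
    assert d_model % n_heads == 0 and n_heads % n_kv_heads == 0
    head_dim = d_model // n_heads
    group = n_heads // n_kv_heads  # query heads served by each K/V head

    # Project and split into heads: (heads, seq_len, head_dim).
    q = (x @ w_q).reshape(seq_len, n_heads, head_dim).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, n_kv_heads, head_dim).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, n_kv_heads, head_dim).transpose(1, 0, 2)

    # Share each K/V head across `group` consecutive query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)

    # Scaled dot-product scores with a causal mask (no attending to future tokens).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    out = softmax(scores) @ v  # (n_heads, seq_len, head_dim)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, n_heads, seq_len = 64, 8, 16
    head_dim = d_model // n_heads
    x = rng.standard_normal((seq_len, d_model))
    for name, n_kv in [("MHA", 8), ("GQA", 2), ("MQA", 1)]:
        w_q = rng.standard_normal((d_model, d_model)) * 0.1
        w_k = rng.standard_normal((d_model, n_kv * head_dim)) * 0.1
        w_v = rng.standard_normal((d_model, n_kv * head_dim)) * 0.1
        w_o = rng.standard_normal((d_model, d_model)) * 0.1
        y = grouped_query_attention(x, w_q, w_k, w_v, w_o, n_heads, n_kv)
        print(name, y.shape, "K/V projection params:", w_k.size + w_v.size)
```

With these toy sizes, the output shape is the same for all three variants, while the key/value projection shrinks from 8192 parameters (MHA) to 2048 (GQA, 2 K/V heads) and 1024 (MQA). The key/value cache kept during generation shrinks by the same factor, which is the usual motivation for GQA and MQA.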