Context
Chinese Hamster Ovary (CHO) cells are the workhorse of biopharmaceutical production. Codon optimization, which swaps codons for synonymous ones that match the host's codon usage preferences, can significantly improve protein expression yields.
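As a point of reference, the simplest rule-based form of codon optimization can be sketched as follows: estimate codon usage from a set of reference coding sequences, then replace every codon with the most frequent synonymous one. This is a minimal illustration, not any specific published tool. The function names are hypothetical, and the reference sequences are made-up toy data, not real CHO usage frequencies.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))


def split_codons(seq):
    """Split a coding sequence into codons; the length must be a multiple of 3."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TO_AA[c] for c in split_codons(seq))


def preferred_codons(reference_seqs):
    """For each amino acid, pick the codon seen most often in the references.

    Ties, and amino acids never seen, fall back to the first codon in table order.
    """
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    best = {}
    for codon, aa in CODON_TO_AA.items():
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def optimize(seq, best):
    """Replace every codon with the preferred synonymous codon."""
    return "".join(best[CODON_TO_AA[c]] for c in split_codons(seq))


# Toy reference set (ASSUMPTION: invented for illustration, not CHO data).
toy_references = ["ATGCTGCTGAAGGCCTGA", "ATGCTGAAGAAGGCCTAA"]
gene = "ATGTTAAAAGCATAA"

best = preferred_codons(toy_references)
optimized = optimize(gene, best)
print(gene, "->", optimized)
print(translate(gene), "==", translate(optimized))
```

Because each codon is only ever replaced by a synonym, the translated protein is unchanged by construction. The rule ignores sequence context entirely, which is the kind of limitation a learned model might address.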
Question
Can a transformer architecture learn host-specific codon optimization patterns from expression data, outperforming rule-based methods?
Method
Trained a transformer encoder on paired gene sequences from published CHO expression datasets to learn synonymous codon substitutions that maximize predicted expression while preserving the amino acid sequence.
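One common way to guarantee that a model's output preserves the amino acid sequence is to restrict its choice at each position to codons synonymous with the original one. The sketch below assumes a hypothetical model that emits one score per codon (64 values) per position. The summary above does not specify how the constraint is enforced in this project, so treat the decoding scheme and all names here as assumptions.

```python
# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))


def split_codons(seq):
    """Split a coding sequence into codons; the length must be a multiple of 3."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TO_AA[c] for c in split_codons(seq))


def constrained_decode(seq, scores):
    """Pick the highest-scoring codon per position among synonymous codons only.

    scores[i][j] is the (hypothetical) model score for CODONS[j] at codon
    position i. Non-synonymous codons are never candidates, so the protein
    is preserved no matter what the model outputs.
    """
    original = split_codons(seq)
    if len(scores) != len(original):
        raise ValueError("need exactly one score row per codon")
    out = []
    for codon, row in zip(original, scores):
        aa = CODON_TO_AA[codon]
        candidates = [j for j, c in enumerate(CODONS) if CODON_TO_AA[c] == aa]
        out.append(CODONS[max(candidates, key=lambda j: row[j])])
    return "".join(out)


# Stand-in for model output (ASSUMPTION): every position strongly prefers CTG.
gene = "ATGTTAAAA"  # M, L, K
fake_scores = [[1.0 if c == "CTG" else 0.0 for c in CODONS] for _ in range(3)]

decoded = constrained_decode(gene, fake_scores)
print(gene, "->", decoded)
print(translate(gene), "==", translate(decoded))
```

In this toy run the stand-in scores favor CTG everywhere, but CTG is only selected at the leucine position. The methionine and lysine positions keep codons for their own amino acids, which shows the mask overriding the raw scores.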
Result
Early results show the model captures known CHO codon preferences and discovers non-obvious patterns in rare codon contexts. Benchmarking against established tools is ongoing.
Reflection
Codon optimization sits at an interesting intersection of biology and engineering. A learned model's value here may lie in augmenting expert judgment rather than replacing it.