CHOFormer

Context

Chinese Hamster Ovary (CHO) cells are the workhorse of biopharmaceutical production. Codon optimization, which adapts a gene's codons to the host's usage preferences without changing the encoded protein, can significantly improve protein expression yields.
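To make the idea concrete, here is a minimal sketch of the simplest rule-based approach: count codon usage in a set of host reference sequences, then replace every codon in a gene with the most frequent synonymous codon. The reference sequences and the gene below are made-up toy strings, not real CHO data, and the function names are illustrative only.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def split_codons(seq):
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq):
    """Translate a coding sequence into a one-letter amino acid string."""
    return "".join(CODON_TABLE[c] for c in split_codons(seq))


def codon_usage(reference_seqs):
    """Count how often each codon appears across the reference sequences."""
    return Counter(c for s in reference_seqs for c in split_codons(s))


def preferred_codons(counts):
    """For each amino acid, pick the synonymous codon with the highest count."""
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def most_frequent_optimize(seq, preferred):
    """Rule-based baseline: swap each codon for the host's preferred synonym."""
    return "".join(preferred[CODON_TABLE[c]] for c in split_codons(seq))


# Toy inputs (hypothetical, for illustration only).
reference = ["ATGCTGCTGGCCAAGTAA", "ATGCTGGCCGCCAAGTGA"]
gene = "ATGTTAGCAAAATAA"

preferred = preferred_codons(codon_usage(reference))
optimized = most_frequent_optimize(gene, preferred)
print(gene, "->", optimized)
print(translate(gene), "==", translate(optimized))
```

This baseline ignores codon context entirely, which is the gap a learned model would aim to close.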

Question

Can a transformer learn host-specific codon optimization patterns from expression data and outperform rule-based methods?

Method

Trained a transformer encoder on paired gene sequences from published CHO expression datasets. The model learns synonymous codon substitutions that maximize predicted expression while preserving the amino acid sequence.
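One way the amino-acid-preserving constraint could be enforced at decoding time is to restrict each position's choice to codons synonymous with the original. The sketch below assumes a hypothetical model that emits a score per codon at each position. The `scores` structure, the function names, and the toy numbers are all assumptions for illustration, not the project's actual implementation.

```python
from collections import defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Map each amino acid to the list of codons that encode it.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def split_codons(seq):
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq):
    """Translate a coding sequence into a one-letter amino acid string."""
    return "".join(CODON_TABLE[c] for c in split_codons(seq))


def constrained_decode(seq, scores):
    """Pick the highest-scoring synonymous codon at each position.

    `scores` is a list with one dict per codon position, mapping codon -> score
    (a stand-in for model output). Non-synonymous codons are never selected,
    so the amino acid sequence is preserved by construction.
    """
    codons = split_codons(seq)
    if len(codons) != len(scores):
        raise ValueError("need exactly one score dict per codon position")
    out = []
    for codon, pos_scores in zip(codons, scores):
        candidates = SYNONYMS[CODON_TABLE[codon]]
        best = max(candidates, key=lambda c: pos_scores.get(c, float("-inf")))
        # If no synonymous codon was scored, keep the original codon.
        out.append(best if best in pos_scores else codon)
    return "".join(out)


# Toy example: made-up scores, not real model output.
gene = "ATGTTAAAA"  # encodes M, L, K
scores = [
    {"ATG": 0.1, "CTG": 5.0},              # CTG scores higher but is not synonymous with ATG
    {"CTG": 2.0, "TTA": 0.5, "GCC": 9.0},  # GCC is excluded: it encodes A, not L
    {"AAG": 1.0, "AAA": 0.2},
]
decoded = constrained_decode(gene, scores)
print(gene, "->", decoded)
```

Masking at decode time guarantees the protein is unchanged regardless of what the model predicts; a trained model could instead learn the constraint implicitly, at the cost of needing a validity check afterwards.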

Result

Early results show the model captures known CHO codon preferences and discovers non-obvious patterns in rare codon contexts. Benchmarking against established tools is ongoing.

Reflection

Codon optimization sits at an interesting intersection of biology and engineering. The value may be in augmenting expert judgment rather than replacing it.
