I don't believe that amino acids (AAs) are distributed perfectly at random, or that every genome contains equal amounts of each AA. I believe randomization should be replaced by normalization. The normalization should also account for poly-A tails, GC content, and any known repeating pattern in non-coding regions. But what if those non-coding regions also play some sort of role? Should a machine learning technique be used? If so, what features should it value? Phosphorylation often involves proteins of different sizes and charges; might that play a role in this experiment? Should steric hindrance and spatial orientation play a role? These are good questions, and I think that at a minimum the individual genome should be taken into consideration. A and L may play a large role in these sample sequences, but do they play a large role overall? Although this project was completed, the question has not been resolved.
Another project came up recently involving a binding-site problem. A small clip of DNA was analyzed to see whether it contained common patterns. The frequency of each AA was counted and converted to a percentage, and these percentages were compared against control samples of equal length, all taken from the same genome. I believe these specific examples should be compared not against control sequences but against the genome as a whole, with at minimum the known GC-content patterns removed. That is what I mean by a normalization process. I do not know the best way to compare these samples. Another question is how likely a given AA is to be replaced by another with the same binding properties. Should all of these differences be weighted equally, or should similarly charged residues be weighted more heavily? While this idea is only a small piece of a puzzle involving machine learning, the accuracy of the learning is at best as good as the book it reads. As I start probability and statistics next semester, I hope to better understand how I can use mathematics to show the relationships among data.
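To make the comparison concrete, here is a minimal Python sketch of the kind of calculation I have in mind. It is not the code used in the project. The function names, the pseudocount, and the coarse charge grouping (K, R, H as positive; D, E as negative) are my own assumptions for illustration. It computes per-AA percentages for a sample, then scores each AA as a log2 ratio of its sample frequency to its frequency in a background sequence, which is one simple way to normalize against a genome-wide composition instead of against equal-length controls.

```python
from collections import Counter
from math import log2

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed coarse charge grouping (illustrative only; histidine is
# only partly protonated near neutral pH).
CHARGE_GROUPS = {"positive": "KRH", "negative": "DE"}


def _counts(seq):
    """Count standard amino acids in seq, ignoring any other characters."""
    return Counter(c for c in seq.upper() if c in AMINO_ACIDS)


def aa_percentages(seq):
    """Return the percentage of each standard amino acid in seq."""
    counts = _counts(seq)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def log2_enrichment(sample, background, pseudocount=0.5):
    """Return log2(sample frequency / background frequency) per amino acid.

    The pseudocount (an assumed value) avoids division by zero when an
    amino acid is absent from one of the sequences.
    """
    s, b = _counts(sample), _counts(background)
    s_total = sum(s.values()) + pseudocount * len(AMINO_ACIDS)
    b_total = sum(b.values()) + pseudocount * len(AMINO_ACIDS)
    return {
        aa: log2(((s[aa] + pseudocount) / s_total)
                 / ((b[aa] + pseudocount) / b_total))
        for aa in AMINO_ACIDS
    }


def charge_group_percentages(seq):
    """Sum the per-AA percentages into positive, negative, and neutral groups."""
    pct = aa_percentages(seq)
    grouped = {name: sum(pct[aa] for aa in members)
               for name, members in CHARGE_GROUPS.items()}
    grouped["neutral"] = 100.0 - sum(grouped.values())
    return grouped
```

A positive score means the AA is more common in the sample than in the background, and a negative score means it is less common. Grouping by charge is only one possible way to treat similar residues alike; whether it is the right weighting is exactly the open question above.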