Coding a Thesis
Recently I used SQL to organize and format the data for my water quality thesis. Learning multiple coding languages has proved difficult in ways I had not anticipated. While the differences in indexing and syntax between languages are interesting, they have become a major obstacle to recalling what I've learned previously. For example, while learning Python I have mixed up when to use ( versus [ in R, and after using Atom I have grown used to the luxury of autocomplete, something not available in every coding environment. Additionally, when making plots in R, I have slipped into Python's indexing system, where 0 is the start of a list, instead of R's, where the first element is 1. This has given me much more respect for ports or adaptations of original code. While I have been endlessly frustrated by the different capabilities of the Mac and Windows operating systems, I can now see how two operating systems, each built from the ground up, could be extremely difficult to work across.
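To make the indexing mix-up concrete, here is a minimal Python sketch. The `ph_readings` values and the `r_style_get` helper are hypothetical, made up only for illustration. In both Python and R, parentheses call a function and square brackets index into a collection; the trap is where counting starts, and what a negative index means.

```python
# Hypothetical pH readings, used only to illustrate indexing.
ph_readings = [7.1, 7.4, 6.9]

# Python counts from 0, so the first reading is at index 0.
first = ph_readings[0]

# A negative index counts back from the end in Python.
# (In R, x[-1] instead drops the first element.)
last = ph_readings[-1]

# Parentheses call a function; brackets index. R follows the same rule,
# e.g. length(x) versus x[1].
count = len(ph_readings)


def r_style_get(seq, position):
    """Hypothetical helper: look up an element by R's 1-based position."""
    if position < 1 or position > len(seq):
        raise IndexError("R-style positions run from 1 to len(seq)")
    return seq[position - 1]


print(first, last, count)           # 7.1 6.9 3
print(r_style_get(ph_readings, 1))  # 7.1
```

A helper like this is mostly a reminder rather than something to use in practice; the point is that "position 1" in R and "index 0" in Python refer to the same element.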
My choice of coding language for my thesis was almost arbitrary. I only understood that R had more, or easier-to-use, statistical capabilities, an assumption I am only now testing. What I didn't know was that the code for formatting data in R would be so different that it was faster for me to write all the commands in SQL than to learn how to manipulate data in R, a goal I am still pursuing.
For my thesis I am committed to providing reproducible results. To do this, I am keeping a copy of my code and a record of the software and packages I used, and I am following Professor Holler's git format for reproducible results. Much of science relies on the assumption that results are replicable; that is one more assumption to keep in mind.
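One way to keep the "software and packages I used" part of that record accurate is to generate it from the environment itself rather than writing it down by hand. In R, `sessionInfo()` prints this kind of summary for the current session. The sketch below is a rough Python analogue using only the standard library; the function name `environment_record` and the choice of JSON output are my own assumptions, not part of any particular reproducibility template.

```python
import json
import platform
from importlib import metadata


def environment_record(packages):
    """Collect interpreter, OS, and package versions for a reproducibility log."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Replace the list with the packages the analysis actually imports.
    record = environment_record(["pip"])
    print(json.dumps(record, indent=2))
```

Saving this output alongside the code in the repository means the versions are captured at the moment the analysis runs, which is what someone trying to replicate the results would need.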