Reflection on the process of writing a PhD Thesis

It’s the beginning of my second year, doing this PhD, and I’ve been reflecting on what it is to write a PhD Thesis, on what things I’ve learned in the past year about writing a PhD Thesis, and on what to tell someone just coming to the process.

Beginning with the process of writing the thesis: Essentially, the problems I’ve run into so far have been how to organize and visualize such a large document.

Initially, I approached this problem by taking the content out of MS Word and putting it into plain HTML, so that I could use Notepad++ for the editing. I did this because Word formats things as wide as the printed page, and with the end-result font, allowing me to only see a paragraph or two at a time; this was not useful at all, in terms of the editing process. Taking it out of Word also gave me the ability to wrap sections in <div></div>, and to collapse those <div> sections down, hiding them from view. It also gave me the ability to display the text in whatever font I found to be comfortable, wrapping at the edges of the application window, rather than at the edges of the page.

It let me hide the sections which were “solid” – i.e., the main body of each section of literature review – so that I could think through how each section related to the preceding and following sections, and to view and edit the connecting paragraphs. This let me get the bulk of the writing to a state where it all articulated properly, and was cohesive as a whole, rather than each section being independent.

What Notepad++ and writing in HTML did not give me was the ability to easily tie in my citations.

I had been using Zotero, primarily because it would let me easily (automatically) import citations from JSTOR. Zotero integrates with Word, but not much else, and its integration with Word is quite clunky and manual. I had over 300 sources in Zotero, and would only be using about 25 as cited sources. Weeding those 25 sources out of Zotero would be possible, using tags, but citing each would prove difficult: the citations would essentially be typed by hand throughout the document (as I had been doing), and such citations would not automatically alert Zotero that I had cited a source and, thus, that Zotero ought to include that source in the bibliography. But I’m getting ahead of myself, as I only really realized these shortcomings after I had made the next progression: I needed to be able to print the thing as a draft, to turn in for review and discussion with my supervisors, and it was in HTML. I could either switch back to MS Word, or I could go ahead and find a tool which would allow for the type of editing I was already using, in which writing is a separate exercise from spell-checking and formatting.

I ended up switching to LaTeX, and incorporating the bibliography into BibTeX.

Converting HTML to LaTeX is a bit tricky, primarily because LaTeX uses `` and '' for quotation marks, rather than a pair of straight "" marks, but this is easily accomplished by opening up the HTML document in Word, doing a find/replace of quotes with “smart quotes” (which distinguishes opening quotes from closing ones), and then saving it back again. After that, it was mostly just find/replace of things like <em> and </em> pairs with \emph{ and } pairs. All told, the conversion took me about 3 hours, including going through to link in my BibTeX bibliography file. That does not include the learning curve for LaTeX, nor the process of getting LaTeX installed on a PC (neither of which is trivial).
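For anyone who would rather script the find/replace than do it by hand, here is a minimal sketch in Python of the same idea. It is not what I actually did (I used Word and an editor), and it assumes the HTML is simple: only <em> or <i> tags for emphasis, and no quotation marks inside tag attributes. The function name html_to_latex is my own invention for the example.

```python
import re

def html_to_latex(text: str) -> str:
    """Convert a small, assumed subset of HTML inline markup to LaTeX."""
    # <em>...</em> and <i>...</i> pairs become \emph{...}.
    text = re.sub(r"<(em|i)>(.*?)</\1>", r"\\emph{\2}", text, flags=re.DOTALL)
    # Curly ("smart") double quotes map directly onto LaTeX's `` and ''.
    text = text.replace("\u201c", "``").replace("\u201d", "''")
    # A straight quote at the start of the text, or after whitespace or an
    # opening bracket, is treated as an opening quote (a heuristic).
    text = re.sub(r'(?<![^\s(\[])"', "``", text)
    # Any straight quote still left is treated as a closing quote.
    return text.replace('"', "''")

print(html_to_latex('He called it "a <em>premiss</em>" of the argument.'))
```

The quote handling is only a heuristic, so the output would still need a read-through afterwards.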

To get the BibTeX just right meant that I ended up going back into JSTOR and exporting my sources again. Why? Because Zotero doesn’t keep all of the information provided by JSTOR. For example, Zotero doesn’t keep the elements jstor_formatteddate, pages, ISSN, ISBN, reviewedwork_1, reviewedauthor_1, language, and copyright, to name a few. These elements do not properly belong in a printed bibliography, but they are certainly useful, particularly for knowing which work and author was being reviewed (in the case of a book or article review). Had these elements been lost, I would not likely have remembered them, should I later need to locate a source by reviewed author or work.

So, what did this give me, in the end? I have: a bibliography file which contains (most) everything I’ve actually read, along with any abstract provided by the publisher. I also have the ability to add my own annotations, and to cite these sources in such a way as to generate a bibliography automatically, along with whatever paper is citing sources from the BibTeX database (whether it’s just the PhD Thesis, or another paper sometime later in my academic career).
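To illustrate why citing by key makes the bibliography automatic, here is a rough Python sketch that pulls the cited keys out of a LaTeX source. BibTeX does this job itself when the document is built, so this is only an illustration of the idea, not a replacement for it. The function name cited_keys and the sample keys are hypothetical, and the pattern assumes the common \cite-style commands with optional bracketed notes.

```python
import re

# Matches \cite{...}, \citep{...}, \citet[p.~4]{...} and similar commands,
# capturing the comma-separated keys between the braces.
CITE_PATTERN = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")

def cited_keys(latex_source: str) -> list[str]:
    """Return each cited key once, in order of first appearance."""
    keys: list[str] = []
    for group in CITE_PATTERN.findall(latex_source):
        for key in group.split(","):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys

sample = r"As \cite{smith2001} and \citep[p.~4]{jones1999, smith2001} argue."
print(cited_keys(sample))
```

Only the keys that appear in the text end up in the printed bibliography, which is how a file of 300 sources can yield a list of 25.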

As to working with the BibTeX file, it’s just a plain text “database,” with only a few rules about working with it. What it gives me, aside from being able to cite sources in a meaningful manner, is the ability to record everything I read, at the time I’m reading it, and to include any notes or commentary I feel would be meaningful to me later. In this manner, keeping a BibTeX file is something akin to creating an annotated bibliography of one’s own, and is probably very good research practice. I can’t tell you how many items I’ve read over the past year which ought to have been included in this BibTeX file, but which I have only included in a collection of bookmarks, or on the Links posts on this blog.
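As a rough picture of what such an annotated entry looks like, here is a small Python sketch that writes one out in BibTeX's plain-text form. The helper name bibtex_entry, the key doe2009, and the field values are all made up for the example. I am assuming the annote field for personal notes: it is a standard BibTeX field, which the usual bibliography styles simply ignore.

```python
def bibtex_entry(entry_type: str, key: str, fields: dict[str, str]) -> str:
    """Render one BibTeX entry as plain text, one field per line."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        # Braces around the value keep BibTeX from altering its contents.
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)

print(bibtex_entry("article", "doe2009", {
    "author": "Doe, Jane",
    "title": "An Example Title",
    "year": "2009",
    "annote": "Read in first year; relevant to the literature review.",
}))
```

Because the file is plain text, notes like this can be searched with any editor.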

These bookmarks are nowhere near as useful as having the content in a BibTeX file, because they are separated from the process of writing: I’d have to go out and go through extra work to determine that I needed to bring them in, to read them all over again, and to dig for their content. If I had included them in my BibTeX file, they’d be at my fingertips, and I could review them easily to determine whether they have something to add to the writing (or whether they are adding something, on a subconscious level, and ought to be cited). I would know what I know, and from where that knowledge came, rather than just knowing that I know something, but being unable to provide the source.

All of this, really, is to say that this whole process has been about determining that I need to establish a group of useful tools, which work for me, and which facilitate the process of research.

To some measure, finding these tools has been easier for me, because I was already aware of things like Notepad++, already knew many programming languages, and was aware of the process of writing. I was already aware that I needed to separate the process of spell-checking from that of writing, because spell-checking distracts from the act of creating. I knew that I needed to not have “autocorrect” going in and second-guessing me, because I use vocabulary which is nonstandard (e.g., the word ‘premiss’ to indicate an element of logical argument, as opposed to ‘premise’ used to indicate something less stringent than the other form).

I do wish, though, that someone had emphasized the need to establish practice, early on. The practice of documenting what is read, for example. To me, this is an important part of being a researcher / academic. It is part of the rigor needed to contribute to the field.

I also wish that someone had pointed out that there were better alternatives to writing things in MS Word, and had presented those alternatives. Had I started out using LaTeX and BibTeX, it would have benefited me immensely.

Lastly, I wish that someone had pointed out that writing a technical document, like a thesis, involves being able to change perspective easily: to be able to view the framework of the document at the highest level, then to drill down into the linking sections, then to be able to work with the detailed content. Each is needed, and working within any word processing application necessarily limits this ability: word processors just do not let you manipulate text in this manner, because they are tools for the writing of single, simple documents. They focus on being general tools, able to turn out a “finished” document in one step. This simply is not what is involved in constructing such a document – and I would like to emphasize the term constructing, as that’s what writing a document of this type and on this scale truly is: construction.

As I begin this next year, I will probably search out new and different editors for LaTeX, as the LaTeX Editor is just a wee bit clunky. But I will continue to do most of my writing using Notepad++ or one of those editors, and will spell-check only as a submission approaches, or when I’m tired. I’ll keep maintaining my list of readings in BibTeX, so that they will be there when I need them. And I’ll share this knowledge with others, so that they don’t have to go through so much evaluation of alternatives, or at least so that they’re aware that there are alternatives, and why they might want to consider them.
