Companion Websites

As we set about re-structuring the NVivo project, Sara and I learned a lot about coding, especially about the two tendencies that Pat Bazeley calls 'lumping' and 'splitting' and how they can be useful at different stages of a project.

Sara started the analysis using NVivo2. She worked very much from the ground up, deriving categories for coding from the data. Sometimes the categories stood alone as 'free nodes'; for example, there is a node called 'legal issues' containing eight references to various aspects of law. Sometimes they were linked in 'tree node' hierarchies like the one called 'C[complainant] at Fault', which has 'child' nodes under it like 'complainant is not a victim', 'complainant participated' and so on. Click here to see an image of her node structure. Before she ran out of time to finish fully coding all the documents, her project had six free nodes and eight parent tree nodes, most of which had children and grandchildren. All her nodes had little content coded at them, and when Sara came back to the project, she felt a bit lost in the trees! She also had a new direction for her research question. Rather than asking broadly 'what are the discourses of discrimination?' she was now interested in the extent to which the parties in any dispute saw it as an individual matter or as something systemic. The new question grew out of the work she had been doing while the 'Complaints' project lay untouched. For these reasons Sara brought me in to help restructure the NVivo project.

Software like NVivo is very useful when a research question changes. Re-structuring a filing cabinet feels more daunting than re-structuring a set of nodes in an electronic project. We decided to use NVivo7 since I had been working with that version, though what we did would have been possible with version 2. (Note, though, that the illustrations to this account use NVivo8.)

Lumping and coding down vs. splitting and coding up
We agreed that the way to go for the new research question was not to use Sara's fine-grained, ground-up and 'splitting' approach to coding, but to 'lump' data into broad categories. We would look for material that signalled an understanding of discrimination as in some way built into an organisation's practices (a 'systemic' conception of discrimination). This material could be about many things, such as timetables, industrial agreements or what the company has always done. For the moment, we would not worry about distinguishing types of systemic conception. We would just code for 'systemic' conceptions and their direct opposite: conceptions of discrimination as something done by individuals (an 'individual' conception of discrimination). We could then 'code on' from these two nodes to get a more detailed picture, working within a node rather than a source, but always able to click back to the source document if we needed to.

Lumping and splitting are about the fineness of distinctions between your categories. Imagine you are sorting fabric scraps. Will you lump everything red together, do you need to split the reds into 'vermillion', 'scarlet' and so on, or do you split along the lines of weight or texture? It's clearly a matter of purpose and preference: is the pile of fabric intended for some artwork or for the cleaning rag bag? There is nothing that makes one approach inherently superior, nor does NVivo make one approach easier than another. (See Bazeley 2007, p. 67.)

Coding 'up' or 'down' (or 'inductive' vs. 'deductive', or 'data driven' vs. 'theory driven') is about your more general approach to coding. Are you looking at the data and asking 'what concepts are in this?', or are you asking 'does this data contain instances of this particular concept?' Again, one is not superior to the other, but of course one may work better for some purposes than the other. For the complaints project, Sara's initial thinking meant that data driven coding 'up' was sensible, but by the time of re-analysis, it made sense to look at the data for instances of 'understanding discrimination as caused by an individual' and 'understanding discrimination as systemic'.

The 'lumping' approach was not how Sara usually worked (she is a meticulous coder who likes to work in detail), but it suited me, as I usually begin with broad categories and gradually create detailed trees from them. So I got the job of checking all the sources and coding material reflecting the individual and the systemic conceptions of discrimination. It soon became clear that I also needed to code who was speaking at various times in the project: was the 'voice' that of a complainant (direct or through an agent), a respondent, or an employee of the commission? The next section discusses how I did this.

Reworking the existing project
An advantage of working with NVivo compared to working on paper was that Sara's existing work could be kept for later reference alongside the new work. A paper filing system would need to be dismantled and restructured. In NVivo I simply created two tree nodes to distinguish the old project from the new. (The free nodes stayed as they were.) In 'Sara's original project' sit all of Sara's tree nodes. In 'modified project' I created a node 'conceptions of discrimination' with its two children 'individual' and 'systemic'. In about three days I very lumpishly coded all the documents, so that there were over 200 references coded at each coding category ('node' in NVivo). Because I was reading with a very clear purpose I found the coding fairly easy, but of course there were times when a sentence or paragraph seemed ambiguous. In these cases, I made annotations, and when Sara read through the text coded at the node we would talk about the ambiguities. In this way, the coding was refined.

Sara's careful delineation in her notes of exactly who was speaking at any time meant that, when we realised we needed to take into account the 'voice' uttering the conception, the 'who's speaking' nodes could be created by auto coding and merging nodes. Sara had set up her notes with headings showing who was speaking, and used them to autocode some of the sources. She called the parent node thus created 'sections'. 'Sections' had many children, bearing odd but easily recognisable names like 'Com1' (first complainant), 'Res 2' (second respondent) or 'RresCres' (Respondents' Response to Complainant's response). I checked the documents, then auto coded all sources except the diary to create a node called 'who's speaking' in the 'modified project' tree. This node had all the same children as Sara's 'sections' node, but more coding, because it covered more sources. Then I created three nodes below 'who's speaking': 'complainant', 'respondent' and 'EOC'. I then merged each of the auto coded child nodes into the appropriate one of the three; all the nodes where the voice was that of a respondent, for example, went into 'respondent'. This took about half an hour. All this meant that three days' work was plenty of time to restructure the original project.
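For readers who find it easier to think in code, the merging step can be pictured roughly as below. This is only a sketch in Python, not something NVivo itself exposes: the references are invented, and the mapping from section node to voice (`voice_of`) is an assumption made up for the illustration. The node names 'Com1', 'Res 2' and 'RresCres' are the only parts taken from the project.

```python
from collections import defaultdict

# Hypothetical coded references: (source document, first paragraph, last paragraph).
# The node names follow the project; the references themselves are invented.
sections = {
    "Com1": {("case_A", 1, 4), ("case_B", 2, 3)},
    "Res 2": {("case_A", 5, 9)},
    "RresCres": {("case_A", 10, 12), ("case_B", 4, 6)},
}

# Assumed mapping from each auto coded section node to the voice it carries.
voice_of = {
    "Com1": "complainant",
    "Res 2": "respondent",
    "RresCres": "respondent",
}

def merge_into_voices(sections, voice_of):
    """Merge every section node's references into its voice node (set union)."""
    merged = defaultdict(set)
    for node, refs in sections.items():
        merged[voice_of[node]] |= refs
    return dict(merged)

whos_speaking = merge_into_voices(sections, voice_of)
```

Merging is treated here as a set union, so a passage that sat in two of the original nodes would appear only once in the merged voice node.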

Once the new project was established, it was fairly easy to move ahead. We knew that we wanted to ask whether one 'voice' was more prone than the others to talk about discrimination in individual terms. Asking that question using matrix queries opened up a fascinating world of more questions. Sometimes these questions could be answered using the software's tools for setting up and conducting a query. But sometimes they meant that we needed to go on and 'code on' from the original two lumpy nodes, the individual and the systemic conceptions of discrimination, to give varieties of each. At times I was able to draw on Sara's original coding to do this.
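The idea behind the matrix query, crossing 'who's speaking' with 'conceptions of discrimination', can be sketched in a few lines of Python. This is a simplified picture under stated assumptions rather than a description of how NVivo works internally: each node is treated as a set of (source, paragraph) units, a cell simply counts the units coded at both nodes, and all the data is invented.

```python
# Each node holds the set of (source, paragraph) units coded at it.
# All data below is invented for illustration.
voices = {
    "complainant": {("case_A", 1), ("case_A", 2), ("case_B", 1)},
    "respondent": {("case_A", 3), ("case_A", 4), ("case_B", 2)},
    "EOC": {("case_B", 3)},
}
conceptions = {
    "individual": {("case_A", 1), ("case_A", 3), ("case_A", 4)},
    "systemic": {("case_A", 2), ("case_B", 1), ("case_B", 3)},
}

def matrix_query(rows, cols):
    """Count the units coded at both the row node and the column node."""
    return {
        row: {col: len(row_units & col_units) for col, col_units in cols.items()}
        for row, row_units in rows.items()
    }

table = matrix_query(voices, conceptions)
for voice, counts in table.items():
    print(voice, counts)
```

A lopsided row in a table like this is what prompts the next question, and often the next round of 'coding on'.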

Working with a tidy-looking structure derived from the lumpy coding gave us a sense of security. But keeping the complexity of the original coding meant that, as we realised the complexity of the material, we were able to dive back into the fine-grained original. You can see images of the two projects here.
