Minimal Computing and Ontologies

This essay is derived from a presentation given at the 2020 LD4 Conference on Linked Data in Libraries.

I’d like to bring together two areas of thought that at first glance may not seem closely related, but I hope that by the end of this essay their connection will be clear. Specifically, I’d like to give a brief introduction to the concept of minimal computing, explaining its origins, some of its implications, and how it can reorient us towards more sustainable approaches to digital projects. I’d then like to draw a line to linked data and ask how minimal computing concepts apply to our shared interest in this field. My argument is that the way we model our world, that is, our ontologies, has direct and sometimes unforeseen effects on the infrastructure needed to realize these models, and that minimal computing can help us reduce complexity as we concretize our understandings into ontologies.

Minimal computing as an area of concern emerged in 2015 through a piece by Alex Gil at Columbia University. At its core are two ideas he explores: that minimal computing is ‘the application of minimalist principles to computing’, and that it is a reflection on what is actually needed to realize a given project. The framing of ‘what do we need’ comes from Ernesto Oroza’s idea of Architectures of Necessity:

“On the one hand it can be read as a descriptive term, austere in its rhetorical value, almost obvious. On the other hand, the term enunciates an architecture that is its self-diagram, and this image becomes structural and programmatic. I believe that architecture should be that. The home must be a structure of agreements. A factual connection between needs, materials, technology, urban regulations and social conditions.”

As Gil notes, if we orient ourselves by the question of what we need, it becomes easier to understand how “ease of use, ease of creation, increased access and reductions in computing” can be achieved. And while this concept emerged in the digital humanities, I believe it applies to nearly any field that uses a computer.

What is ‘needed’ is, of course, subjective, and being prescriptive in this regard would not be helpful, since needs differ by situation. However, I have found the minimal computing principles of Jentery Sayers to be a valuable contribution to the exploration of minimal computing. I’d like to explore just a few of his principles that I believe bear directly on linked data.

The first is minimal design, or ‘reducing the need to update/modify the structure or layout of a project in order to focus on content production and to increase the likelihood of project persistence’. For linked data, I think we can see this especially in the tools we build to create and disseminate linked data. In other words, at what point does the complexity of those systems endanger their persistence into the future?

The second principle is minimal maintenance. For linked data, I think this can readily be seen in the technology stacks that make linked data available, as well as in the ontologies we use. For example, how flexible are our ontologies in handling variation in the world they model? Or, for how many different platform APIs do we need to write separate code just to ingest data?

Sayers’s third principle is minimal obsolescence, or ‘reducing turnover of technologies, standards, and formats to increase reuse and decrease waste/discards’. This is largely self-explanatory, but consider how many ontologies were developed a decade ago and how many of them are still in use. We can think of dead ontologies not only as a type of waste but also as a type of noise that confuses the landscape for those trying to figure out what is still being maintained.

Finally, the last of Sayers’s principles I’d like to examine is maximum justice, which is to ‘reduce the use of technological, cultural, social, and economic barriers to increase entry, access, participation, and self-representation in computing and to also build systems/projects premised on social justice and difference, not white supremacy and settler colonialism’. I think this is important for understanding which barriers of complexity have already cut off parts of the library and information science profession from using linked data, let alone the population at large, and for identifying where our work enables others to join these efforts and where it actively pushes them out.

I think it’s especially interesting to view these concepts in light of some of the challenges linked data presents: ontology mapping (when in human history hasn’t this been a problem?), URI reconciliation, stack maintenance, ongoing training, and the economic frictions that at times seem to push linked data in directions at odds with its original principles, that is, towards increasing centralization despite its decentralizing nature. That list is in no way exhaustive (I wish those were the only challenges linked data faced), but it illustrates areas where, in my opinion, complexity, or more accurately, potentially needless complexity, has made difficult problems even more challenging. In all of these areas a minimal computing lens could help clarify what is really needed to make linked data work for everyone. And I think this quote from a study on challenges in linked data points to an important overarching concern:

“Aside from concrete issues of noise, oftentimes data are modelled in a manner that is not facilitative to generic consumption: for example, common properties for labels are not re-used, properties and classes are invented and not defined, insufficient links are given to enable data discovery, etc.” (Hogan et al., 2012)

From this observation I’d like to go a step further and ask, specifically in the context of linked data, not only ‘what is needed’ for it to work but also ‘to what end?’ In my opinion, if this work doesn’t benefit our users in some tangible, material way, then we have more difficult questions to ask about the direction we want to take linked data. While we could spend a long time investigating all of linked data’s challenges through a minimal computing lens, I’d like to explore just one area for which I think that lens is uniquely valuable: ontology design.

So many decisions cascade from how the real world is defined in ontologies, and minimal computing has much to offer towards simplifying the models and the infrastructures and conditions they require. The number of classes and properties one defines directly affects the computing power required to process that data, let alone reconcile it with other data sources, all of which eventually translates into carbon dioxide in the atmosphere. In a very real sense, our models of the world affect the state of the world beyond how we represent it. One of the concrete realities of all that carbon in the atmosphere is, of course, the rapid acceleration of glacial melt. It is to this example that I would now like to turn, to explore how we used minimal computing at CU Boulder to develop an ontology for glaciers.

The National Snow and Ice Data Center (NSIDC) is housed at CU Boulder, and a few years ago, through a joint CLIR grant, the NSIDC and CU Libraries digitized and made available over 25,000 glacier images, some dating as far back as 1883, that document glaciers through repeat photography. One of the unique challenges we encountered while digitizing and cataloging these images was how to tie the glaciers together. Climate change has laid bare the ephemeral nature of these glaciers, and through repeat photography we can see them melting decade by decade. In the context of this collection, how is one to trace a line between the different iterations of a glacier to see how it has changed over time? Here the rigidity of the collection technology doesn’t allow for such a clear delineation. Naturally, an ontology would be a good way to tie together not only the different iterations of a glacier but also its associated information, such as representations and relevant datasets.

But the question then is how to model a glacier. One critical feature of glaciers is the way they fragment. For example, Andrews Glacier in Colorado could split into two smaller fragments. The fragments won’t always get a name, but they always get a glacier ID. What, then, is the best way to model this? How do we show the relation between fragments?
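To make the question concrete, here is a minimal sketch of one way to express fragmentation as subject-predicate-object triples in plain Python. It is an illustration under stated assumptions, not the ontology we actually built: the `ex:` namespace, the predicates `ex:glacierID`, `ex:name`, and `ex:fragmentedInto`, and the IDs `G-0001`, `G-0001-A`, and `G-0001-B` are all hypothetical placeholders. The point it shows is that every glacier, named or not, is anchored by its ID, and the parent is linked directly to its fragments.

```python
from typing import List, Optional, Tuple

# Hypothetical vocabulary: "ex:" stands in for whatever namespace a real
# glacier ontology would define; none of these terms come from the project.
Triple = Tuple[str, str, str]


def glacier(glacier_id: str, name: Optional[str] = None) -> List[Triple]:
    """Describe one physically existing glacier: it always has an ID,
    and a name only when one has actually been given."""
    node = f"ex:{glacier_id}"
    triples = [(node, "rdf:type", "ex:Glacier"),
               (node, "ex:glacierID", glacier_id)]
    if name is not None:
        triples.append((node, "ex:name", name))
    return triples


def fragmentation(parent_id: str, fragment_ids: List[str]) -> List[Triple]:
    """Link a glacier directly to the fragments it split into.
    No abstract 'super glacier' node is created."""
    return [(f"ex:{parent_id}", "ex:fragmentedInto", f"ex:{fid}")
            for fid in fragment_ids]


# Hypothetical IDs: the original glacier carries its name,
# while the fragments are identified only by ID.
graph = glacier("G-0001", "Andrews Glacier")
graph += glacier("G-0001-A") + glacier("G-0001-B")
graph += fragmentation("G-0001", ["G-0001-A", "G-0001-B"])

for s, p, o in graph:
    print(s, p, o)
```

A production system would more likely use an RDF library and a published namespace; plain tuples are used here only to keep the sketch dependency-free.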

Conceptual relations are not uncommon in linked data. To take just one example, the IFLA Library Reference Model (LRM) uses the familiar Work, Expression, Manifestation, and Item categories to express the relations between bibliographic entities. Obviously, articles and glaciers are not the same thing, but they share a similar problem: how to relate iterations of an entity. And this is the critical point. In LRM, all four categories coexist in time, but for glaciers there can be no moment when the original Andrews Glacier exists at the same time as its fragments: you have either the original glacier or the fragmented version. The LRM categories can coexist because the model relies heavily on abstraction; to point out the obvious, of its four categories only one, the item, exists in a physical way. Likewise, for glaciers there is no such thing as a ‘super glacier’; there is only Andrews Glacier and the fragments it produced. And while we could certainly model a super glacier, what would be the tangible benefit of doing so? Super glaciers do not exist, and you can’t visit one, so it’s not quite right to say that something that doesn’t exist ‘has’ something that does.

With this understanding, we decided to orient our ontology in the most material, concrete ways we could. We avoided abstraction in favor of foregrounding elements that have a concrete presence in the world (even if only as a name inscribed in computer memory). This had several advantages for us: as we had learned through other projects, ontology mapping is not easy, and in our opinion much of that difficulty stems from abstract classes needing to be mapped to other abstract classes in a different model. Since neither exists physically, what either can ‘be’ can always be changed by non-material conditions. There is no real anchor for them in the world.

While we could have used a super glacier, we opted not to use classes without real-world correlates. It was easier to omit abstract representations when the same relationships could just as easily be modeled with real-world correlates, just as in the bibliographic realm we could give ontological primacy to the item class. From a minimal computing perspective, we approached the problem by asking ‘what is needed?’ Do we need super glaciers, or is it enough to show the fragment relationship? In this sense, it’s not so much a question of what we did, but of what we didn’t do. So while this ontology may seem pared down compared to LRM, our goal was to preserve necessary complexity without adding to it.
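One way to check that the fragment relationship alone is enough is to see whether a glacier's lineage can be recovered from it without any abstract parent class. The sketch below assumes the same hypothetical `ex:fragmentedInto` predicate and placeholder IDs as above, plus an invented second split (`G-0001-A` into `A1` and `A2`) purely to show multi-generation tracing. It also assumes fragmentation is acyclic and that each fragment has exactly one parent, i.e. glaciers do not merge; neither assumption comes from the project.

```python
from collections import deque
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: G-0001 split into A and B; A later split again.
graph: List[Triple] = [
    ("ex:G-0001", "ex:fragmentedInto", "ex:G-0001-A"),
    ("ex:G-0001", "ex:fragmentedInto", "ex:G-0001-B"),
    ("ex:G-0001-A", "ex:fragmentedInto", "ex:G-0001-A1"),
    ("ex:G-0001-A", "ex:fragmentedInto", "ex:G-0001-A2"),
]


def descendants(graph: List[Triple], start: str) -> Set[str]:
    """Breadth-first walk along ex:fragmentedInto from one glacier,
    collecting every fragment it produced, directly or indirectly."""
    found: Set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for s, p, o in graph:
            if s == current and p == "ex:fragmentedInto" and o not in found:
                found.add(o)
                queue.append(o)
    return found


def origin(graph: List[Triple], node: str) -> str:
    """Walk back up the fragmentedInto links to the earliest recorded
    glacier. Assumes one parent per fragment and no cycles."""
    parents = {o: s for s, p, o in graph if p == "ex:fragmentedInto"}
    while node in parents:
        node = parents[node]
    return node


print(sorted(descendants(graph, "ex:G-0001")))
print(origin(graph, "ex:G-0001-A2"))
```

In a triplestore the same traversal would typically be a SPARQL 1.1 property path such as `ex:G-0001 ex:fragmentedInto+ ?fragment`; either way, the lineage comes from links between glaciers that actually existed rather than from a node standing in for one that never did.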

To this end, I’d like to build on the work of Gil and Sayers by proposing two new principles for minimal computing. The first is maximum materiality. By this I mean that any model or recreation of some aspect of the world should start by documenting the material, physical aspects of its entities. To return to a bibliographic example, subjects change, but the author (usually) does not. So it would be preferable to prioritize concrete fields like author over interpretive ones like subject.

The second is maximum transparency. By this I mean sharpening the demarcation between elements or properties that are grounded in concrete reality and those that are interpretive. A glance at most ontologies will show a mix of concrete and abstract properties, and my contention is that we should focus primarily on the former. By outlining more clearly what is physically descriptive and what is interpretive, we can avoid reifying the interpretive aspects of a project.
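As a rough illustration of what that demarcation could look like in a data pipeline, the sketch below assumes a hypothetical registry in which every property must be declared either "material" or "interpretive" before it can be used. The property names and the sample record are invented for the example, and refusing undeclared properties is one possible design choice, not a rule from the project.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical property registry: every property must be declared as either
# "material" (physically descriptive) or "interpretive" before use.
PROPERTY_KIND: Dict[str, str] = {
    "ex:glacierID": "material",
    "ex:name": "material",
    "ex:fragmentedInto": "material",
    "ex:subject": "interpretive",
}


def partition(triples: List[Triple]) -> Dict[str, List[Triple]]:
    """Split statements by the declared kind of their predicate,
    refusing any predicate that has not been declared."""
    parts: Dict[str, List[Triple]] = {"material": [], "interpretive": []}
    for triple in triples:
        predicate = triple[1]
        if predicate not in PROPERTY_KIND:
            raise ValueError(f"undeclared property: {predicate}")
        parts[PROPERTY_KIND[predicate]].append(triple)
    return parts


# Hypothetical record for a single glacier.
record = [
    ("ex:G-0001", "ex:glacierID", "G-0001"),
    ("ex:G-0001", "ex:name", "Andrews Glacier"),
    ("ex:G-0001", "ex:subject", "Climate change"),
]
parts = partition(record)
print(len(parts["material"]), len(parts["interpretive"]))
```

Forcing each property to be classified up front keeps the interpretive layer visible and separable, so it can be reviewed, historicized, or revised without disturbing the material description underneath.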

So what do these principles look like in practice? Let me first say that abstractions are necessary for making sense of the world, and there will be instances in which abstract properties necessarily become part of an ontology. But my point is to first identify the material or physical properties we are trying to model and to emphasize those as the basis from which other properties can potentially be developed. From there, an abstract property might emerge, at which point we need to ask what purpose that property serves. I believe this must be tied to serving the researcher or information seeker in some way. If we can say that it does serve an information seeker, does it also serve society as a whole? This is difficult but crucial. Because abstractions are not real-world entities, we run the risk of ‘making real’, or concretizing into existence, something that may not have the effects we intend. To use a very tame example, the red panda has been on a classificatory roller coaster, first placed with the raccoons, then with the bears, and finally in its own highly specific family. It’s important to remember that categories can bring environmental protection with them, and it’s not difficult to see how categories concretized in the real world have direct implications for social categories like gender, race, and class. Transparency therefore requires historical framing, so that the interpretive aspects can be better seen and understood.

Social networks are another example of this reifying process. Social relations have always existed, but it was in the early 20th century that the concept of the social network became an object of academic study. Then, starting in the 1960s, corporate strategists took an interest in the network as an organizing principle, no longer an observation of social relations but a method of production. This led to the reification of the social network not only in organizational practice but later on the internet, through concrete materializations of networks (think of Twitter, Facebook, Instagram). These concrete networks produce data that continuously feed back into the concept, not only producing capital but also allowing the definition of a network to change in ways that further benefit capital.

There are many directions in which to take this materially oriented approach, and this has been only my first exploration into a synthesis of these two fields. But I hope this discussion has illuminated at least one way in which minimal computing principles can directly affect how we model the world, and how that cascades into very real decisions about the technologies and infrastructures we use to build our networked projects.