On the Implications and Functionalities of Digital Libraries

Introduction

Many times over many millennia, we have lost bountiful stores of knowledge. Countless stories exist of the declines of Alexandria, Timbuktu, Pergamum, and the House of Wisdom. As a species we have gone through two identifiable dark ages, but far more information has been lost during times of light than we can track. Examples include governments that deem it fit to destroy, restrict, and remove books that go against their political agenda: the Nazis, the Communists, and even the modern-day United States. Therefore, we arrive at a thought experiment in which we wish to thwart these attempts at forced ignorance.

Immediately, this blog might draw parallels in your mind to works like Fahrenheit 451 and 1984. This isn’t without purpose, as seeing the path we are a part of can encourage you to take a stand and perhaps to engage in our universal think tank. Just as I push you towards the above-mentioned books, I will also point you towards three resources from which I draw parts of this idea, namely Project Gutenberg, the Perseus Digital Library, and Melville’s Marginalia. These resources were created for various reasons, but all exist as digital libraries of texts. The idea that I aim to propose here is most similar to that of Project Gutenberg. Simply put, it aims to create a shared library of eBooks that are accessible by all. In fact, the first eBook ever created was made by the founder of Project Gutenberg and was of the United States Declaration of Independence.

Implications

Now that I have planted the idea of a digital library in your consciousness, we can begin to dissect the uses of such libraries and how they might be of assistance in our shared dilemma. The benefits to be gained from digital libraries are simple. Once hosted on the web, a resource such as these is accessible to anyone with a connection to the internet. While not quite universal, this is already a large step towards allowing swaths of knowledge to be replicated the world over and made accessible to all its denizens.

However, a few problems arise with the creation of such a tool. Firstly, we must reel in or accept the large scope of text that we would have to save. With quick research, we arrive at two pieces of information: the average size of a book is liberally estimated to be 1 MB (megabyte), and the largest library in the world is the Library of Congress, which contains 164 million items (only 25 million of which are cataloged books and 74.5 million of which are manuscripts). If we take a more conservative approach and estimate that each item is instead 5 MB, then storing the entire Library of Congress would take approximately 780 TB (terabytes) of space, counting in binary units. While to any layman this seems like an incredibly large amount of data, it would account for only 0.000078% of Google’s estimated storage capacity, meaning that Google could store the human collective of literature 1.28 million times over. So, while the task of storage is a problem for a single person, if we take into account the abilities of larger entities, then the storage of our library becomes a cheap trick. If, however, we would like to be completely independent in our storage, we could still build our own storage solution with 4 TB drives from Seagate for about $19,500ⁱ.
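For readers who would like to check the arithmetic, the figures above can be reproduced in a few lines of Python. This is only a sketch under this essay's own assumptions: the names below are mine, the 5 MB per item and $100 per drive figures are the estimates quoted in this essay, and the conversion treats 1 TB as 1,024 × 1,024 MB, which is what yields roughly 780 TB (counting in decimal units, the same inputs would give 820 TB instead).

```python
import math

# Figures quoted in this essay; all of them are estimates, not measurements.
ITEMS = 164_000_000        # items held by the Library of Congress
MB_PER_ITEM = 5            # assumed size of one item
DRIVE_TB = 4               # capacity of one drive
DRIVE_PRICE_USD = 100      # approximate price of one drive


def total_terabytes(items: int, mb_per_item: float) -> float:
    """Total size in TB, using binary steps (1 TB = 1024 * 1024 MB)."""
    return items * mb_per_item / (1024 * 1024)


def drives_needed(total_tb: float, drive_tb: float) -> int:
    """Number of whole drives required to hold total_tb."""
    return math.ceil(total_tb / drive_tb)


raw_tb = total_terabytes(ITEMS, MB_PER_ITEM)    # a little over 782 TB
rounded_tb = 780                                # the rounded figure used in the text
drives = drives_needed(rounded_tb, DRIVE_TB)    # 195 drives
cost_usd = drives * DRIVE_PRICE_USD             # 19,500 dollars before tax

# If the library fits into the larger store 1.28 million times,
# it occupies this percentage of that store.
google_share_percent = 100 / 1_280_000

print(f"{raw_tb:.1f} TB, {drives} drives, ${cost_usd:,}, {google_share_percent:.6f}%")
```

Rounding the total down to 780 TB gives the 195 drives quoted above; the unrounded total would need one drive more.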

The second problem that we run into is far harder to surmount: collecting all our works and turning them into eBooks. All the aforementioned resources have scanned their texts so that they are simple PDF files that can be searched for strings of textⁱⁱ. The sheer number of people you would need dedicated to this task would be daunting on its own, but you also must ensure that each scanned work is allowed to exist on your server. While it’s nice to assume every author and publisher would wish to contribute their work to the collective human consciousness, many are greedy and would need compensation of some form. For the Perseus Digital Library and Melville’s Marginalia, this problem is nonexistent, as all the works they contain are either out of copyright or transformative enough. Gutenberg relies on the works being out of copyright, or on authors generously donating their works or publishing them directly on Gutenberg. There isn’t an easy way around this problem, and the most effective approach would be to take the same action as any digital library with borrowing features: buy licenses from the publishers.

A third problem arises in two ways; thus, we will break it into two smaller problems. Firstly, how should we allow for the growth of our library? It is simple to say that we will add any new literature that is created. Setting aside the first two problems that we have already enumerated, we ask: what counts as a piece of literature? Is every email, sentence, and comment on a YouTube video literature? You might not think so, but perhaps some emails contain information that would be helpful to a large portion of people. They might contain in-depth explanations of important topics, news from around the world, and interesting approaches to problems. These pieces of information would often be nice to keep, but who decides their worth, and how are they added to our library? So, to summarize, the first part of our third problem is this: while our library can grow, what do we categorize as a worthy piece of literature? The second part of our problem then follows: who decides the answer to this question? The only option that we might have is to hire a team of moderators for our library, librarians in keeping with the name, and allow them to be the discerners. You could distinguish further problems from this final one, but for the sake of this essay these are the three that we choose to deal with.

Functionalities

Now that we have gone over the problems that are going to stand in the way of our digital library, we can begin to address the functionalities that we wish for it to have. There are three objectives that we aim to meet in functionality. First, we want to ensure that our library is accessible anywhere. Second, we want it to be accessible anytime. Finally, we want it to be always available.

The immediate answer to allowing access anywhere is a technology that we already have: the internet. To address one concern, I do realize that not everyone, possibly not even a majority of the world, has access to a reliable, semi-responsive internet connectionⁱⁱⁱ. This is not the concern of this paper, however, and we approach this topic from the idea that the world has already achieved perfect communication and we must now protect it from reverting. If, however, this piece encourages you to advocate for better general access to information across the globe, then I will consider my goal thoroughly accomplished, as that was the main purpose of my original notion. Perhaps, with the creation of Elon Musk’s Starlink, there will be widespread access to the internet across the globe quite soon.

Of course, with our growing ability to store ever vaster amounts of information, we could instead propose a sneakernetⁱᵛ of sorts. In theory, we could use either small portable USB devices like thumb drives or continue with the idea of hard drives or solid-state drives. If we would prefer our sneakernet to move fast and without much suspicion, which could be useful in the case of an attempted overthrow of human knowledge, then we would use the small thumb drives. The largest thumb drive that I have seen is 1 TB, so we would need 780 of them, perhaps more for redundancy in our data. These smaller drives are far more expensive, however, and could easily double the price of our library, even before considering redundancy. Therefore, I propose that we continue with our initial idea of a central serverᵛ.

With the web, there is the obvious opponent of outages. These come in the form of power outages, warzones, and other such circumstances. Allowing patrons to download certain works from the central server(s) would let them access their material anytime, so long as they can provide their own powerᵛⁱ. Depending on which form of power the user is able to supply, this does come at the cost of what I will explain next as the main functionality of our library. The functionality that I intend to be implemented is this: the creation of virtual and augmented reality delivery systems for the works in our library.

Immediately, I expect some pushback on this idea. Many criticisms can be made of both systems. Virtual reality is flawed in that it often does not provide us with the experience we want and, under the wrong circumstances, can leave the user feeling nauseous and turn them away from our library. Augmented reality, on the other hand, is in very early stages of development, with even the most basic of setups costing thousands of dollars. Many points pro et contra can be raised for both technologies but, while they are in essence part of the focus of this paper, we will concern ourselves with their implementation rather than their validity as such. This is not to say that we won’t allow for casual browsing in the normal way via a web browser or on a local machine after said files have been downloadedᵛⁱⁱ ᵛⁱⁱⁱ, but rather that our library will endeavor to provide an immersive experience to its patrons.

Firstly, a virtual reality library shouldn’t be all that hard to implement. Since we are assuming that we have already created the internet-accessible server full of our texts, we only need to create the virtual space and have the ability to turn any work located on the server into a pseudo-physical representation. Creating the virtual world is a large, yet simple task. Any of a million modelers, designers, and programmers throughout the world could complete this task to a good enough degree within a few months, if it even took them that long. We could adopt an approach similar to many creative sandbox games of today and generate the library from a few models that we createⁱˣ. This could probably be done within a few weeks, if not days, by someone experienced working hard and efficiently. Implementing virtual reality controls would also be quite a simple task in the right hands. The most difficult task would be the creation of an algorithm that can take any text from our server and create the virtual “book” for it. Since, however, it is not our purpose to discuss such algorithms, we will leave them to the discretion of implementors. If, perhaps, I had more time or motivation, I could describe at length the creation and implementation of such an algorithm, but in order to keep this paper theoretical, I will refrain.
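For the curious, the sandbox-style generation mentioned above can be hinted at with a small Python sketch. It is only an illustration under my own assumptions: the Shelf fields, the shelf heights, and the slot counts are hypothetical stand-ins for the "few models" we would create, and a real virtual library would place 3D assets rather than plain records. The one idea it does show faithfully is that a seed makes the generated layout repeatable.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Shelf:
    """One generated shelf; every field here is a hypothetical placeholder."""
    aisle: int
    position: int
    height_m: float
    slots: int


def generate_library(seed: int, aisles: int, shelves_per_aisle: int) -> List[Shelf]:
    """Build a shelf layout that is fully determined by the seed."""
    rng = random.Random(seed)  # private generator: same seed, same library
    shelves = []
    for aisle in range(aisles):
        for position in range(shelves_per_aisle):
            shelves.append(
                Shelf(
                    aisle=aisle,
                    position=position,
                    height_m=rng.choice([1.8, 2.1, 2.4]),  # pick one of a few shelf models
                    slots=rng.randint(20, 40),             # how many books the shelf holds
                )
            )
    return shelves


first_visit = generate_library(seed=451, aisles=3, shelves_per_aisle=4)
second_visit = generate_library(seed=451, aisles=3, shelves_per_aisle=4)
print(first_visit == second_visit)  # both visits see the same layout
```

Because only the seed and the handful of models need to be stored, every patron can rebuild the same library locally instead of downloading a hand-built world.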

Secondly, the creation of our augmented reality is much harder. Gone are the tasks of creating virtual spaces and controls, and arrived are the tasks of not only generating a book layout of our texts but also mapping that layout to some physical real-world objectˣ. To do such a thing would require our algorithm not only to generate a book preview, but to do it in real time as the user looks at, hopefully, a blank page. This task is significantly harder than simply creating an object of defined length and width, as we must now take the length and width of what the user is looking at and then generate the view from that. If the reader ever chooses to look away, we can either freeze our view once it is generated or regenerate our view for whatever new, possibly blank, area they are looking at. To me, and perhaps to other inexperienced readers, this task seems quite difficult. However, similar technologies have already been implemented. With certain books, you can download an app that will grow trees, animals, and other scenery from the pages. These apps do this by having their algorithm look for certain anchors that help it place the content. By delivering such an anchored, static blank book to patrons, we would find ourselves with a much easier job. Therefore, we can assume that this is almost certainly the route that we would take. Perhaps, resources permitting, we could implement the ability to use any blank paper, so that patrons who cannot have our canvas delivered can still access our library via augmented reality. With such an implementation complete, so are our functionalities, and thus our library.
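To make the step of "taking the length and width of what the user is looking at" a little more concrete, here is a minimal Python sketch. It assumes the hard part is already done: some augmented reality toolkit, not shown here, has detected the blank page and reported its size in millimetres. The function name and the default character and line sizes are hypothetical choices of mine, and real text rendering would use proper font metrics rather than a fixed character width.

```python
import textwrap
from typing import List


def layout_pages(
    text: str,
    page_width_mm: float,
    page_height_mm: float,
    char_width_mm: float = 2.0,   # assumed width of one rendered character
    line_height_mm: float = 5.0,  # assumed height of one rendered line
) -> List[List[str]]:
    """Split text into pages of lines that fit the detected surface."""
    columns = int(page_width_mm // char_width_mm)   # characters per line
    rows = int(page_height_mm // line_height_mm)    # lines per page
    if columns < 1 or rows < 1:
        raise ValueError("surface is too small to show any text")
    lines = textwrap.wrap(text, width=columns)
    # Group the wrapped lines into pages of at most `rows` lines each.
    return [lines[i:i + rows] for i in range(0, len(lines), rows)]


# A small detected surface: 40 mm wide and 20 mm tall.
pages = layout_pages("word " * 100, page_width_mm=40, page_height_mm=20)
for line in pages[0]:
    print(line)
```

If the reader looks away to a new surface, regenerating the view amounts to calling the same function again with the new dimensions, while freezing the view simply means keeping the pages already computed.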

Conclusion

Now, dear reader, we have reached the conclusion of our thought experiment. I leave you with the hope that such a creation will come into existence, and the hope that through almost infinite and widely accessible knowledge, our world might grow into a new golden age. Perhaps these hopes are in vain, yet they still yield to the undying mammalian emotion that is hope. If reading this has ignited such a spark in you that you now wish to endeavor in this great library’s creation, then I urge you to do so, whether alone or with a group of disciples, as long as this library is open source and open to the public upon completionˣⁱ.

Footnotes

ⁱ Amazon.com currently has a 4 TB Seagate drive priced at ~$100. With 780 TB needed and 4 TB per drive, we arrive at 195 drives; at $100 each, that gives our $19,500 price tag, or $20,670 after PA sales tax.
ⁱⁱ A string is simply a sequence of characters, such as letters, digits, punctuation, and the space character.
ⁱⁱⁱ After some further research, the best I could find was that an estimated 63% of the world has internet access as of 2021. The site doesn’t cover whether that access is good or reliable, just that it is there.
ⁱᵛ After writing more of this paper, I’ve come to realize that within our current world, this idea could never work. Since we mentioned earlier that we might have to resort to licensing books from publishers or just buying them, we can’t make infinite copies and distribute them as such. This would violate either the terms of the license or some form of copyright, I’m sure, although I’m not a lawyer, so who really knows (other than copyright lawyers, of course).
ᵛ Which, as those of you versed in networking will understand, could eventually lead to multiple server instances/databases across the globe, thereby creating our redundancy for us. Although this would cost more money, once we have created our initial library there should be, or we can hope there should be, plenty of donors and investors across our wondrous planet willing to help put such a wealth of information at everyone’s fingertips.
ᵛⁱ Luckily, in the modern day there are plenty of ways for one person to have constant, or near constant, access to their own form of power. First, there is the opportunity to use any type of generator, which have been around for decades now, or the increasingly prevalent solar panel, at least in the U.S. Batteries in laptops and phones also allow users to peruse files that they have stored locally without needing constant access to power.
ᵛⁱⁱ Upon further reflection, I realize that, given how we decided above we might obtain all human works, not everything contained in our library could be downloaded. This would, I am sure, violate the licensing terms of certain works, and we might be allowed only so many downloads per license, and for only so long. Of course, nothing can stop a user from screenshotting a work and recreating it themselves for local storage. Perhaps a deal could be struck to allow publishers to input information about where to buy a work in exchange for us “holding” it.
ᵛⁱⁱⁱ What I mean by allowing publishers to input information is that, wherever and however we allow users to access the works, perhaps on a digital cover page of the work or in the browsing list, there could be a link or reference to the place where the publisher would prefer the user to buy the book if they choose to own their own copy.
ⁱˣ This technique, simply called procedural generation, is used throughout the gaming industry to create random maps, quests, and items with the use of a seed. Those who have taken time to learn about random number generation will understand this idea better than those who have not. Since it is not the purpose of this paper to explain this idea, I will instead point you to the Wikipedia article on procedural generation.
ˣ You could make the argument that it would be easier, and therefore perhaps better, to just use the generation we have from the virtual reality function and leave the user with a still-virtual object that they can see in their home space, but I counter that we are trying for the most immersive experience possible, stopping not at what makes our lives easier, but at what makes the patrons happier.
ˣⁱ Sadly, however, there is no way for me to enforce that you follow this advice, but out of the goodness of my own heart, and trusting in yours, dear reader, I do hope you heed it.