Concept of Operations

Identification
This is the concept of operations (ConOps) document for the ISFDB2. This is an updated version of the original ConOps published in 2001.

Document Overview
This document identifies the high-level needs and expectations for the ISFDB2; in particular:
 * The major types of data which need to be represented in order to construct bibliographic content lists, as well as detailed author bibliographies.
 * The data organization required to support an open data project.
 * The methodology needed for data change requests to be implemented.
 * The relationship between the ISFDB2 and ISFDB1.

The primary document audience is ISFDB2 users and developers. Users of ISFDB2 will be interested in verifying that this document accurately describes their needs. Developers of ISFDB2 will use this document to drive system-level requirements for the data, support tools, and the site layout.

An overview of the situation describing ISFDB1 and justifications for ISFDB2 are provided as background information. Following this is a high-level description of the ISFDB. The various user classes of the ISFDB are then listed, providing a perspective for each class and scenarios detailing how each class uses the system. Finally, the impacts, advantages, and disadvantages of the ISFDB are summarized.

System Overview
The overall purpose of the ISFDB is to serve as a repository for speculative fiction bibliographic data, and to provide a user interface to respond to user queries. Located worldwide, ISFDB users are interested in viewing the following types of information:

Author Bibliographies - An author bibliography is an annotated listing of the works produced by an author. Annotations may include the publication history of the work, notes and synopsis, a list of awards won by the work, and the location of reviews concerning a particular work. In addition to bibliographic data, the bibliography may also contain personal details about the author such as birthdate and birthplace.

Content listings - A content listing displays publication metadata (publisher, ISBN, page count, etc.) and a detailed list of the works contained within the publication. The works may be annotated with details such as original copyright date, location of first publication, translators, variant titles, or pseudonym resolution.

Yearly Awards Listings - While individual award annotations can be found in the author bibliographies, it is often desirable to view all of the nominations and winners in all categories of a particular award for a particular year.

Magazine Checklists - While book publications are organized around the authors and editors that create them, it is desirable to organize magazine publications with checklists. A checklist shows all issues for a particular magazine, structured by year in the order of publication.

Forthcoming Books - Listing books which will be published in the coming months allows users to track the publication status of books in which they are interested.

In support of the above listed view types, the ISFDB supports the ability to search the database by author, title, or year, and provides an interface to submit or correct data.

Background, Objectives, and Scope
While the discipline of creating bibliographies and indices for the speculative fiction genre has always been rather mature, there was little such material available on-line by the mid 1990s. The best on-line bibliographic resource at the time was the series of bibliographies created by John Wenn [2], which were posted periodically to the SF-Lovers digest [3] in the late 80’s and early 90’s.

In early 1994, Al von Ruff created a private searchable database of select genre-related awards information. By September of that year, the database had been expanded to include most genre-related winners and nominees with data supplied by David G. Grubbs. By the end of 1994, a web version of these private tools was created. These tools were the precursor to the ISFDB.

In 1995, one of the most popular SF portals was The Speculative Fiction Clearing House, edited by John R. R. Leavitt [4]. The site had sections devoted to theme bibliographies, conventions, bookstores, writing advice, and awards. In the spring of 1995 the Clearing House was looking for a set of awards database tools, and the pre-ISFDB tools were offered for use there. The site, however, was looking for a more tightly integrated database - the Clearing House was beginning to put magazine content listings online (via manually constructed webpages), and wanted to link those in some way with the awards information. The pre-ISFDB tools were never utilized by the Clearing House.

In the summer of 1995, Al von Ruff and Ahasuerus [5], armed with lessons learned from the Clearing House, incorporating John Wenn’s bibliography layout, and inspired by the look-and-feel of the Internet Movie Database (IMDB) [6], began to design and build the ISFDB (Internet Speculative Fiction Database). The ISFDB 0.1 went online 8 Sept 1995, and went through numerous revisions prior to its public launch in Jan 1996.

Starting as a home page at a local ISP in Champaign, Illinois, the design of the ISFDB was impacted by various deficiencies commonly found in ISPs at the time, including: limited memory, lack of real database support, disk space limitations, and limited processor resources. These limitations fundamentally shaped the architecture and operation of the ISFDB.

Operational Policies and Constraints
The database is maintained and updated by a single person; this person has editorial control of the ISFDB. Disk space, monthly throughput rates, and processing resources are constrained as per agreement with the ISP hosting the ISFDB.

Description of the Current System or Situation
Section 1.3 described the different types of bibliographic data that users of the system are concerned with. The data is displayed to the user in the form of HTML-based web pages. These web pages are dynamically constructed upon request by the user from data stored in the ISFDB database.

Database File Structure
While commercial websites were utilizing Oracle or Informix database servers (MySQL was in its infancy at the time the ISFDB was created), these types of databases were not available at the hosting ISP. Data was instead stored in flat ASCII files (one for each table), using pipe-delimited fields. This technique was a common method of storing data on UNIX systems, and felt natural for use in this setting as well.
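
As an illustration only, a pipe-delimited record of this kind might be read and written as in the following Python sketch. The field layout (AUTHOR_FIELDS), the helper names, and the sample row are assumptions made for the example; this document does not specify the actual column order of any ISFDB table.

```python
# Hypothetical field layout; the real ISFDB column order is not given here.
AUTHOR_FIELDS = ["canonical_name", "birthdate", "birthplace"]


def parse_record(line, fields):
    """Split one pipe-delimited line into a dict keyed by field name."""
    values = line.rstrip("\n").split("|")
    # Pad short records so that every expected field is present.
    values += [""] * (len(fields) - len(values))
    return dict(zip(fields, values))


def format_record(record, fields):
    """Serialize a dict back into one pipe-delimited line."""
    return "|".join(record.get(name, "") for name in fields)


# Invented sample row with an empty birthdate field.
sample = "Jane Example||Champaign, Illinois"
author = parse_record(sample, AUTHOR_FIELDS)
```

Note that the empty middle field is represented only by two adjacent pipes, so a field's meaning depends entirely on its position in the line.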

Indices
To speed lookups in the flat database files, a series of indices are utilized. Each index uses a particular table field as its key, and associates it with a series of table offsets to the appropriate records. Executables charged with creating the dynamic webpages search for the key in one or more of these indices, then quickly seek to the needed records in the target tables.
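
The index technique described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ISFDB implementation: the function names, the position of the key field, and the sample rows are all invented, and an in-memory buffer stands in for a flat table file on disk.

```python
import io
from collections import defaultdict


def build_index(table, key_field):
    """Map each key value to the byte offsets of the records that carry it."""
    index = defaultdict(list)
    table.seek(0)
    while True:
        offset = table.tell()          # offset of the record about to be read
        line = table.readline()
        if not line:
            break
        key = line.rstrip(b"\n").split(b"|")[key_field]
        index[key].append(offset)
    return dict(index)


def fetch(table, index, key):
    """Seek directly to every record stored under key and return the raw lines."""
    records = []
    for offset in index.get(key, []):
        table.seek(offset)
        records.append(table.readline().rstrip(b"\n"))
    return records


# In-memory stand-in for a flat titles file (invented sample rows).
titles = io.BytesIO(
    b"First Story|Jane Example|1990\n"
    b"Another Tale|John Sample|1991\n"
    b"Second Story|Jane Example|1992\n"
)
author_index = build_index(titles, key_field=1)
jane = fetch(titles, author_index, b"Jane Example")
```

A lookup then costs one dictionary access plus one seek per matching record, rather than a scan of the whole table.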

Database Tables
The ISFDB uses six main database tables to respond to queries: Authors, Titles, Publications, Awards, Reviews, and Interviews. Figure 1 shows how different web pages can be constructed by selecting the required records from the various tables.

Table Descriptions and Rationale
A description of the six tables follows, along with the rationale for each:
 * Authors - The Authors table contains information about the author, such as birthdate. This information is persistent across all of the publications associated with the author. The primary key for this table is the author’s canonical name. This is the name that the author most commonly publishes under, not necessarily the author’s legal name. The canonical name may be a pseudonym.
 * Publications - The Publications table contains all of the information about a particular publication, aside from content. This includes information such as publisher, page count, cover artists, ISBN, and so on.
 * Titles - The Titles table contains a normalized list of works. For instance, a particular story may have been published in numerous locations. In order to save space, instead of listing the title and author under each publication it was published under, it instead takes up a single record in the Titles table. In order to link the title with a particular publication, each title record contains a field which points to the publications it was published under. Every publication has a unique identifier which the title can refer to.
 * Reviews - The Reviews table is a specialized Titles table. While a record in the Titles table has title and author fields, a review has an additional reviewer field. An extra field could have been added to the Titles table, but since there are very few reviews in comparison to the thousands of titles, it saved more space to break out the reviews into their own table.
 * Interviews - Like the Reviews table, this is also a specialized Titles table. An interview not only has a title, it also has a subject and an interviewer to track. Same rationale as that for reviews.
 * Awards - The Awards table contains a list of award records. Each record has an associated type (Hugo, Nebula, etc), as well as the awarded title, author, and what level of award was won.
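
The relationship between the Titles and Publications tables described above can be sketched in Python as follows. The class names, field names, identifiers, and sample data are hypothetical and serve only to show how one title record can point to several publications, and how both a content listing and an author bibliography can be derived from the same records.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    pub_id: str       # unique identifier that title records refer to
    publisher: str


@dataclass
class Title:
    title: str
    author: str
    pub_ids: list = field(default_factory=list)  # publications containing the work


# Invented sample data.
publications = {
    "PUB1": Publication("PUB1", "Example Press"),
    "PUB2": Publication("PUB2", "Sample Books"),
}
titles = [
    Title("First Story", "Jane Example", ["PUB1", "PUB2"]),
    Title("Another Tale", "John Sample", ["PUB1"]),
]


def content_listing(pub_id):
    """Return the titles of all works that appeared in the given publication."""
    return [t.title for t in titles if pub_id in t.pub_ids]


def author_bibliography(author):
    """Map each of the author's works to the publishers it appeared under."""
    return {
        t.title: [publications[p].publisher for p in t.pub_ids]
        for t in titles
        if t.author == author
    }
```

In this sketch each work is stored once, however many publications it appears in, which is the space saving the Titles table is meant to provide.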

Although the format of the six database files performs well in the online environment, the files aren’t very good documents for humans to edit. Many of the pipe-delimited fields can be empty - if a person wants to add data to a particular field of a particular record, they may have to count pipes to find the field. This tends to be error-prone, so another version of the files exists in an off-line format, which is suitable for editing. Furthermore, the larger database tables are broken down by types into smaller files, making them easier to edit on systems with small memories. The very first revision of the ISFDB had only the pipe-delimited files, and it was due to the great difficulties encountered while editing those files that the off-line format was created.

The six online database tables are then created from 55 offline text files by “compiling” them with a set of tools constructed specifically for this purpose. The relationship between the offline text files and the online database files is shown in Figure 2. Note that although the offline files are much smaller than the aggregate online files, they can still be rather large. The SHORTFICTION file is nearly 5 Megabytes in size, with about 53,000 records. This large size can make the data awkward to pass about in email, and can bog down (or even crash) certain text editors.

Another down side to slicing up the tables in this manner is the number of files which need to be touched in order to enter the data associated with a publication. For instance:
 * Novel - Entering data for a novel could require touching NOVELS and BOOKS.
 * Anthologies/Collections - Entering data for a short story collection or anthology could require touching BOOKS, COLLECTIONS, SHORTFICTION, ESSAYS, and POEMS.
 * Magazines - Entering data for a magazine could require touching ZINES, SHORTFICTION, ESSAYS, REVIEWS, INTERVIEWS, INTERIORS, POEMS, and SERIALS.

Thus, adding certain types of publication data may require access to numerous database files simultaneously.

Modes of Operation for the Current System
The ISFDB is split into two distinct systems:
 * Off-line Database - The offline database occupies about 22 Megabytes of space (which gzips down to about 5 Mbytes). The offline database is currently located in the residence of the ISFDB editor. There are 3 main reasons for this:
    * There isn’t enough space at the supporting ISP for both the on-line and off-line versions of the database.
    * The original ISP didn’t have enough system memory to actually perform the building of the on-line database.
    * The merging work is conducted by the ISFDB editor, who lived in a rural area with no access to broadband services. As such, connectivity was limited to short sessions of low-bandwidth dial-up service.
 * On-line Database - The online database currently occupies about 20 Megabytes of space (which gzips down to about 7 Mbytes. Note that the compression is less efficient due to the presence of the indices).

The on-line version of the ISFDB has two distinct modes of operation:
 * Search and Display - Users have the ability to search by title, author, year, or series. Awards for a specific year can be observed. An alphabetized author directory exists which the user can browse through. Select magazines have checklist tables which the user can browse through.
 * Data Submissions - In data submission mode, forms are available to add/correct authors, publications, and title information.

User Classes and Other Involved Personnel
There are seven broad classes of users who need access to the ISFDB filesets, source code, or the web site. These are:
 * ISFDB maintainers - There are three primary maintenance roles for the ISFDB (note that multiple people may fulfill a particular role, or that a single person may take on multiple roles):
    * The website editor, who performs database merges, integrates user-submitted data, adds new features, and performs periodic builds.
    * The awards editor, who maintains an awards database separate from the ISFDB. Awards information is transformed into ISFDB format by David, who sends the data to Al, who integrates the data into the ISFDB.
    * The forthcoming books editor, who maintains forthcoming book information in a separate location (at present in an Access database). The data is exported and translated into ISFDB format, and integrated by the website editor.
 * Authors - Numerous authors desire to utilize the ISFDB as a permanent bibliography resource which represents their professional output. This often includes works which are not of a speculative nature, nor associative.
 * Fen - Members of fandom have an interest in promoting particular authors or books, and take the time to update those bibliographic items.
 * Bibliographers - Numerous single-author bibliographers create detailed bibliographies, utilizing a number of online resources, including the ISFDB.
 * Readers - Casual readers of speculative fiction refer to the ISFDB to locate books and stories by favorite authors, to locate additional books in a particular series, to locate where a particular short story may be found, and to access the various reading lists which are generated from ISFDB data.
 * Booksellers - Book sellers, particularly those who sell used books, utilize the ISFDB to locate titles, and to attempt to determine the work’s resale value.
 * Bibliographic Linkers - Numerous websites link to the ISFDB bibliographies. For instance, when con coordinators discuss writers in attendance at a particular con, they often provide a link to the author’s ISFDB bibliography. Similarly, SFsite utilizes ISFDB bibliographies in their book reviews.

Organizational Structure
There is no organizational structure to the users of the ISFDB.

Profiles of User Classes
ISFDB maintainers - Although the ISFDB maintainers spend most of their time in the offline version of the ISFDB, they also utilize the online version in the course of their work. When resolving pseudonyms, or determining an author’s canonical name, author searches are often conducted. When pulling together variant titles for a particular work, title searches may be conducted.

Authors/Bibliographers - Authors are interested in finding, or establishing, their presence in the ISFDB. Once located, authors begin to enhance the accuracy of data already present in the ISFDB, and then expand it to be more inclusive. The degree of cooperation amongst authors varies greatly - some are interested in entering complete publications that their works appear in, while others only wish to enter data that pertains to them. Authors require the ability to search for their bibliography, and the ability to input data into the ISFDB. Bibliographers need the ability to display the summary bibliography for a particular author, and from there display detailed publication information. Some bibliographers will place missing information into the ISFDB. As such, the online requirements for bibliographers are very similar to those of an individual author.

Fen/Readers - The interests of readers tend to be broader than those of an author or bibliographer. Aside from interest in multiple authors, readers also have interests in print series (which may span across multiple authors), and subject bibliographies (such as books about nanotechnology, cyberspace, or giant robots). Some readers, acting as oracles for forums (such as the USENET newsgroup rec.arts.sf.written), utilize the ISFDB to locate a title from vague clues given by a less knowledgeable reader. These readers need the ability to list the stories found in a particular magazine, collection, or anthology; utilize story synopses; and would like to access, or search for, subject or keyword information.

Bibliographic Linkers - Users who wish to link to ISFDB bibliographies require the ability to search for a particular author’s bibliography. Once found, this user class requires the ability to link to a persistent web page. Other bibliographic services construct static web pages from database records, meaning that a particular author’s works may migrate from one static page to another across builds, making it difficult to link to that author’s works.

Booksellers - Booksellers and collectors desire the ability to download descriptive information from the ISFDB via ISBN, catalog number, bar code reader, or by simple author/title search. The data would then reside in an offline (and perhaps smaller) version of the database. Given a specific book, the application would have the ability to automatically look up and summarize current internet offerings, information from price guides, bibliographic information, or library holdings. The application should also be able to upload new data to the online database. Note that none of this functionality is currently available in the ISFDB; booksellers are required to perform all of the above steps themselves, manually.

Interactions Among User Classes
As no forums for communication are present at the ISFDB, the only conduit for interaction among user classes is in the data deposited and integrated. Interaction may occur in other venues (such as USENET newsgroups), but these interactions are not controlled or monitored by ISFDB personnel.

Other Involved Personnel
None.

Support Environment
Support issues, such as maintaining backups, are left solely to the hosting ISP. Between releases, backups of the ISFDB source code and data are made to Zip disks, which are stored at different physical locations (which assures at least one good copy in the event of catastrophic property loss). Upon release to the web site, an archive of the ISFDB source code and data is made to CD-ROM.

Justification of Changes
Over the course of the last six years, numerous changes have been made to the ISFDB to improve its utility. There are, nonetheless, fundamental problems with the current implementation which call out for changes:

Need For a True Open Data Project
As an open data project, the ISFDB has not been entirely successful. While users have the ability to input data, that data is filtered through an editor before integration into the database. More importantly, the integrated ISFDB data is not readily available to the public at large. One reason for this is the lack of online space, but the real gating problem is the fact that the data lives in the off-line database, and that off-line database is not accessible from the Net. Access to the system which hosts the offline database is limited to removable media, and its bandwidth to the Internet is limited to modem speeds (broadband is unavailable in the system’s area). This means that uploading the data to an accessible location could take several hours of dial-up time.

Another potential problem area is complete loss of the ISFDB data. Since there is a single data custodian, and the data is generally not available online, the data custodian becomes a single point of failure. Armed with access to the online database, a knowledgeable person could reconstruct the offline database. However, since the data custodian is also the web site editor, and controls access to the website, such knowledgeable people may not be able to gain access to the online database.

Need For an Open Source Project
Since the ISFDB is a closed system, there has been no opportunity for individuals to contribute new and interesting applications to the project. This inhibits innovation, as ideas for new features must be filtered through a single individual for implementation. A need exists to open both the online and offline toolsets to those that wish to contribute.

Need For an Off-Line Version of the Database
The original intent (4 years ago) behind this justification was the universal lack of broadband access. Cable modems and DSL are now widely available in urban locations, and many rural locations have access to wireless broadband (which will become more widespread in the next 2 to 3 years).

Nonetheless, some ISFDB users are without broadband. Entering bibliographic data is a time-intensive process, and extended periods of data entry can lead to large dial-up costs. More importantly, some users wish to enter data from locations that have the primary sources, but do not necessarily have direct (or easily permissible) access to the online ISFDB (say from a library). These users would prefer to install the ISFDB in some form onto a laptop, make changes to the database while in the field, and then upload those changes for integration into the ISFDB. Booksellers are looking for an extensible ISFDB - a version in which additional fields can be added to support data (such as current or estimated value of a book) relevant to booksellers.

Need For Instantaneous Data Submission Feedback
Since modifications can’t be made directly to the online database, and must be filtered through the editor, it can take weeks (and in some cases months - okay, in some cases years) for data to appear online. This not only leads to frustration on the part of the user, but can also result in multiple entries from different users for the same work. It makes it difficult for users who are trying to complete some aspect of a bibliography - as many weeks later they are unsure if they forgot to enter that data, or whether the data was lost, or if it simply hasn’t been integrated yet.

Need to Reduce Submission Errors
The current data entry pages of the ISFDB allow the users far too much latitude when entering data, resulting in a tremendous amount of erroneous data being submitted. Each of these errors requires additional bibliographic research on the part of the editor, resulting in additional delays in integrating the data. The data and its associated toolset need to be structured in a way that minimizes errors, performs detailed error analysis, and provides error feedback to the submitter.
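
One possible shape for such error analysis and feedback is sketched below in Python. The required fields, the four-digit year rule, and the function name are assumptions chosen for illustration; this document does not define the actual validation rules for ISFDB2.

```python
import re

# Hypothetical set of fields a title submission must carry.
REQUIRED_FIELDS = ("title", "author", "year", "publication")


def validate_submission(submission):
    """Return a list of human-readable error messages; empty means acceptable."""
    errors = []
    for name in REQUIRED_FIELDS:
        if not submission.get(name, "").strip():
            errors.append(f"{name}: field is required")
    year = submission.get("year", "").strip()
    # Assumed rule: a year, when given, must be exactly four digits.
    if year and not re.fullmatch(r"\d{4}", year):
        errors.append("year: expected a four-digit year")
    return errors


# A submission with no publication details is rejected with specific feedback.
feedback = validate_submission(
    {"title": "First Story", "author": "Jane Example", "year": "1995"}
)
```

Returning every problem at once, rather than stopping at the first, lets the submitter correct the entry in a single pass.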

Need to Reduce Author Centricity
The current ISFDB focus on author bibliographies gives the erroneous impression that data is also structured around author bibliographies. Many users assume that data can simply be added to an author’s bibliography, when in fact, the author bibliographies are dynamically constructed at runtime from publication data. These users don’t understand that the way in which to extend an author’s bibliography is to enter the publications in which the author’s works appeared.

This attitude has three undesirable side effects. First, it results in numerous author biographic submissions which merely consist of instructions to visit an existing web page and enter the bibliographic data found there into the ISFDB. As no staff exists to take these bibliographic joyrides, these requests are typically not granted.

The second side effect is that a great percentage of data submissions come with no publication information at all. That is, a submission for a short story is given with the title, author, and date of publication - but no details on where the story was published. This results in weak bibliographies that are of little use to readers.

The third side effect of this attitude is that it doesn’t help fill out the bibliographies of any other authors. Authors typically receive a contributor’s copy of a publication (in some cases, receipt of the publication is the only payment the author will receive). If an author has published a story in a little-known publication, and that publication has not been cataloged due to its scarcity, then one of the few avenues available for cataloging that publication is for the author to enter the complete publication data - not just the data that pertains to the author.

Organization
An organization should be established. This organization would have legal ownership of the ISFDB data and source, would control access and licenses to the data and source, and would maintain and operate the website which presents the data and source to the users. The organization, if legally incorporated, would be a not-for-profit entity.