ISFDB:Data Consistency

The Data Consistency project is a place to coordinate efforts to identify and repair data consistencies, including Stray Publications, malformed ISBNs, etc.

Invalid characters in Publication titles

 * ISFDB:Invalid characters in Publication titles 14 as of 2007-11-27. Ahasuerus 19:36, 9 Dec 2007 (CST)

Malformed ISBNs

 * ISFDB:Bad ISBN List - Need a script to find ISBNs that fail ISBN consistency validation.

As of 2006-12-18, 162 ISBNs failed consistency validation and 73 failed length validation. As of 2007-11-27, the data (excluding magazines, which are handled separately) looked very different:

Valid ISBN - 86910 (83.95%) Catalog ID - 6608 (6.38%) No ISBN - 7955 (7.68%) B000 (from Amazon.com) - 1019 (0.98%) ISBN is not 10 digits - 836 (0.81%) Fails checksum validation - 200 (0.19%)

The difference is presumably due to a more granular validation script. The actual Wikified lists of bad records will be posted shortly. Ahasuerus 22:21, 9 Dec 2007 (CST)


 * ISBN fields that don't start with '#', and aren't 10 digits long" This section of the malformed ISBN field is done, but it should probably be checked again to see what was missed or whats new.Kraang 20:47, 3 Jan 2008 (CST)


 * Thanks, I'll run the validation script again early next week. I will have to modify it first to allow for 13 digit ISBNs. Ahasuerus 20:56, 3 Jan 2008 (CST)


 * When you do that can you do the Stray Pubs. again?Kraang 21:04, 3 Jan 2008 (CST)


 * I think that was Bill's SQL script, the one that he linked below, wasn't it? If so, he can run it any time while I won't be able to run any scripts until next week. Ahasuerus 21:09, 3 Jan 2008 (CST)

Duplicate ISBNs

 * Need a script to find duplicate ISBNs.


 * This query: select pub_id, pub_isbn, count(pub_isbn), pub_title from pubs group by pub_isbn having count(pub_isbn) > 1; returns 3710 rows, although some of them are non-ISBN duplicates such as "None" or "no ISBN".
 * But since the ISFDB captures multiple printings in multiple publication records, not all duplicate ISBNs are errors. This query finds publications with the same ISBN but different titles: select pub_id, pub_isbn, count(pub_isbn), pub_title from pubs group by pub_isbn having count(distinct pub_title) > 1; and returns 657 rows at the moment. Looking at the results of this query, more of them seem to be true duplicates.

Improved SQL excluding magazines, "ISBNs" flagged as not being true iSBNs by a '#' prefix, and trying to remove variants where one version is just a subset of another (e.g. "Title" as opposed to "Title, Series Name" or "Series Name: Title") gives 315 duplicate titles currently. select pub_isbn, MIN(pub_title), MAX(pub_title), count(*) from pubs where pub_isbn IS NOT NULL and  pub_isbn NOT LIKE '%Unknown%' and  pub_isbn NOT LIKE '%No ISBN%' and  pub_isbn NOT LIKE '%N/A%' and  pub_isbn NOT LIKE '%None%' and  pub_isbn NOT LIKE '#%' and  pub_isbn  != '' and  pub_ctype != 'MAGAZINE' group by pub_isbn having count(distinct pub_title) > 1 AND INSTR(MIN(pub_title), MAX(pub_title)) = 0 AND INSTR(MAX(pub_title), MIN(pub_title)) = 0;

Many are just minor punctuation differences. The following may be worth a closer look though: 2253143	The Ancient	Trash 6470238	The Gap into Ruin: This Day All Gods Die	The Runes of Sorcery 26151707	Prisoners of Power	Roadside Picnic / Tale of the Troika 30624363	Labyrinth	The Dark Crystal 60083611	The Giant Surprise	The Giants, the Mice and the Marsh-Wiggles (Step into Narnia) 60133821	Nebula Winners Fourteen	The Great Science Fiction Series 99115913	The Antelope Company at Large	The Secret Visitors Fight Back 140056513	Heroes and Villains	The Infernal Desire Machines of Doctor Hoffman 246137762	Starborne	The Wild 312198930	Greetings, Carbon-Based Bipeds: Collected Works 1934-1998	Greetings, Carbon-Based Bipeds: Collected Works     1944-1998 312857586	Ender's Shadow	Shadow of the Giant 330250337	Decade the 1940s	Decade the 1950s 330375660	Silver Screen	Storming the Bastille 345300378	A Fine and Private Place	The Last Unicorn 345439031	The Witch Queen	The Witch's Honour 370328280	The Spook's Apprentice	The Spook's Secret 373625375	Deathlands: Demons of Eden	Nightmare Passage 373720483	Brandyjack	Rebels of Merka 380007568	Bring the Jubilee	Strange Relations 380761602	Cold Chills	Cold Shocks 038513679X	In Joy Still Felt: The Autobiography of Isaac Asimov, 1954-1978	In Memory Yet Green: The Autobiography of Isaac Asimov, 1920-1954 399124098	The Dancers of Arun	The Northern Girl 425032116	Callahan's Crosstime Saloon	Farewell to Yesterday's Tomorrow 425098982	Playmates	The Maddening 426064453	The Masks of Time	Vornan-19 439176824	Such a Pretty Face	The Fall 441006973	Forever Free	The Forever War 441289037	The Girl from Farris's	The Girl From Hollywood 441777678	The Multiple Man	The Space Swimmers 441873464	Galactic Bounty	War World 445206187	Hot Sky at Midnight	Star of Gypsies 446607924	Courage of Falcons (The Secret Texts - Book 3)	The End of Magic 451145747	Of Kings and Quests	Of Quests and Kings 517551845	Men, Martians and Machines	The Joy Makers 553229281	Lord Valentine's Castle	Majipoor Chronicles 553292463	Foundation and Empire	The Trinity Paradox 553562428	Illusion	Reluctant Voyagers 583119999	The King of the Swords	The Knight of the Swords 586043136	The Grain Kings	The Passing of the Dragons 586049681	Homeworld	Wheelworld 059303564X	Star Wars: The Truce at Bakura	The Courtship of Princess Leia 593045580	The Kissed	The Standing Dead 671015443	Vulcan's Forge	Vulcan's Heart 671461494	Hitchhiker's Guide to Europe: The Galaxy	The Hitchhiker's Guide to the Galaxy 671603833	My Friend, the Vampire	The Ordinary Princess 671653938	The Bug Life Chronicles	Tower to the Sky 067169801X	The Bug Life Chronicles	Tower to the Sky 701167181	Angel Times	Merrick: A Novel 708882633	Madbond	The Burning Realm 712650504	Firebird	The Venging 712650512	Firebird	The Venging 722159048	Dragonflight	Dragonquest 739407929	Journeys of the Catechist	The Sarantine Mosaic 747253145	Mirage	Tides 749710152	After the Plague	Come Lucky April 812551257	Chaos in Lagrangia	The Lagrangists 821732684	Stones	Voodoo Fury 825695716	The Drowned World	When the Sky Burned 861402030	The Light Fantastic	The Wizards & the Warriors 884043797	L. Ron Hubbard Presents Writers of the Future, Volume V	Prosthetic Lady 886773520	Magic's Pawn	Magic's Promise 886778611	First Rider's Call	The Survivors 088733010X	Lyonesse II: the Green Pearl	Stitch in Snow 088733055X	Bitter Ends	Last Rites 909106118	An Unusual Angle	Dreamworks: Strange New Stories 937986585	The Sea of the Ravens	Yellow Men Sleep 940841320	Seeker's Mask	Wizard of the Pigeons 963094440	Now We Are Sick: An Anthology of Nasty Verse	Temporary Walls 1557420262	George Alec Effinger	Stanislaw Lem 1565041585	The Road to Science Fiction 5: The British Way	The Road to Science Fiction 6: Around the World 158846864X	Arcana: Unearthed Children of the Ruin	Children of the Rune: Tales From the Land of the Diamond Throne 1593150296	Everquest II (Everquest)	Ocean of Tears 1596871032	Green Lantern: Sleepers: Book 2	Green Lantern: Sleepers: Book 3 1841190802	The Mammoth Book of Awesome Comic Fantasy	The Mammoth Book of Historical Whodunnits: Brand New Collection 184149108X	Maul	Y: A Chromosome 1844351084	A Life in Pieces	The Purpura Pawn (Professor Bernice Summerfield Series) 1857231082	The Black Unicorn	Wizard at Large 1857231848	Treaty at Doona	Treaty Planet 1857982371	The Black Gryphon	White Gryphon 1857984382	Diaspora	Prince of Demons 1880325071	From Hell	The Beast That Shouted Love at the Heart of the World 1885865082	Doing Time	Erotica Vampirica 1887666001	Blood Wedding	Vampire Dreams BLongley 13:23, 10 Dec 2007 (CST)
 * Y: A Chromosome could be a working title of Maul. And tp of Maul was published 2003. No info about Y: A Chromosome in Maul? --Roglo 13:48, 12 Jan 2008 (CST)


 * Not really, although Y Chromosomes are relevant to the plot. This supports the "working title" hypothesis though. BLongley 15:15, 12 Jan 2008 (CST)

Stray Publications
Note: "Stray publications" can happen when: (i) The author(s) of a publication do not match the author(s) of the title to which it is connected; (ii) The Title Type of the publication does not match the Title Type of the title to which it is connected (e.g. publication is NONGENRE while title is NONFICTION); (iii) A title is constructed as a variant title, and the parent title disagrees with the variant in one of these respects.


 * Need a script that finds "novel" Publications whose Author(s) do not match the Author(s) in the associated Title record.
 * ISFDB:Missing Titles - Need a script that finds Publications without an associated Title record.
 * Titles A-Z Fixed(only stray magazines left) (30 Dec 2007) Kraang 18:15, 30 Dec 2007 (CST)
 * See also here for related SQL scriptiness. BLongley 15:01, 10 Dec 2007 (CST)

Type mismatches

 * Need a script that finds book length Publications whose type doesn't match the type of the associated Title records, e.g. "collection" vs. "novel", "novel" vs. "omnibus", etc.

Missing data

 * Need a script that finds missing data, e.g. missing page counts, missing pb/tp/hc data, 0000-00-00 dates, etc.

Exact
A script that finds tags in use by more than one publication:

select pub_tag, count(*) from pubs group by pub_tag having count(pub_tag) > 1;


 * Now appear to be 100% fixed - yay! BLongley


 * Broken again - would a magazine editor please look at "The Most Thrilling Science Fiction Ever Told, Summer 1969"? We appear to have 3 editor records for two pubs sharing the same tag. BLongley 09:30, 12 Jan 2008 (CST)


 * Good point, thanks! Michael and I are working on The Most Thrilling Science Fiction Ever Told and hope to have the whole run cleaned up by the end of January. Ahasuerus 11:27, 14 Jan 2008 (CST)

Results from backup January 13, 2008: +--+--+ +--+--+ +--+--+
 * pub_tag         | count(*) |
 * THRLLNGSFSUM1969 |       2 |

The 3 editor records for this tag have been merged. Not sure how that happened, except that when I was creating stubs for later entry of contents I may have duplicated the tag that I created for this particular issue in two other issues. I check to see if all of the other issues have editor records and that they all have individual tags. Mhhutchins 12:21, 14 Jan 2008 (CST)

Results from backup May 6, 2008: ++--+ ++--+ ++--+ (Go to Advanced Search, ISFDB Publication Search Form; it is not the Tag in Search the database on Home page). --Roglo 20:29, 8 May 2008 (UTC)
 * pub_tag       | count(*) |
 * FLNGSCRSFR1954 |       2 |
 * MSSNTML1979   |        2 |
 * SNGSFRMTHS1981 |       2 |
 * XLWTNG1975    |        2 |

Duplicate title-publication records
According to this script, the recent database dump has 136 cases of a title being in the same publication more than once. In most cases the page numbers of both entries are NULL but in a few cases the page numbers are distinct. select a.pubc_id, b.pubc_id, a.pub_id, a.title_id, a.pubc_page, b.pubc_page from pub_content a, pub_content b   where a.pub_id = b.pub_id and a.pubc_id < b.pubc_id and a.title_id = b.title_id order by a.title_id; WimLewis 16:59, 25 Mar 2007 (CDT)


 * I tried correcting one of these through the front-end: unfortunately TitleRemove takes out BOTH copies and you have to manually re-add the other. I think this is best resolved in SQL, especially as it doesn't seem to destroy the presentation much - you can lose a few page numbers, e.g. try viewing and editing this pub. BLongley 09:26, 9 Apr 2007 (CDT)


 * I was fighting recently with such publication Road to Science Fiction 5 where two essays are listed under the same title (I added '[#2]' to the second title. So I started to look for similar cases and later found this section.

select distinct p.pub_id, left(p.pub_title, 36) pub, left(t.title_title, 36) title, t.title_ttype ttype, pc1.pubc_page page1, pc2.pubc_page page2 from pub_content pc1, pub_content pc2, titles t, pubs p    where pc1.title_id = pc2.title_id and pc1.pub_id = pc2.pub_id and pc1.pubc_id < pc2.pubc_id and t.title_id = pc1.title_id and p.pub_id = pc1.pub_id order by 2; Note that if page1 = page2, you will see the title on publication listing as only one. To see the two instances, you have to open it for editing. If page1 != page2, you will two identical titles on pub's listing, but when you open this pub for editing, the page numbers will be reset to the same page number for both titles (and so one instance will disappear from the listing after changes are submitted and accepted). --Roglo 11:12, 14 Jan 2008 (CST)
 * If you open such 'repeated title's' bibliography, you'll see the publication listed twice.
 * You can't remove a single instance of such title from a pub; when you enter 'Remove titles from pub' and check only one for removal, they will be removed both. You have to remember to re-add one instance of the title again.
 * We can't do much about duplicated EDITOR records (e.g. Pulphouse: The Hardback Magazine), as they are not shown in pubs editor.


 * You can fix/delete Editor records, but it's a little tricky. First you have to change the Title Type from EDITOR to something else -- I usually change them to POEM to make it obvious -- and then you can Remove Title and Delete Title. Admittedly, it's rather painful if you can't approve your own submissions. Ahasuerus 11:30, 14 Jan 2008 (CST)


 * I've found the easier way to remove a duplicate content, is to just delete it and create a new entry. Just have to remember to merge the new record with any pre-existing records. As for duplicate pubs, some of them I'm cloning and then deleting the original record. Mhhutchins 15:15, 23 Jan 2008 (CST)

++--+--+-+---+---+ ++--+--+-+---+---+ ++--+--+-+---+---+
 * pub_id | pub                                 | title                                | ttype   | page1 | page2 |
 * 56490 | Amazing Stories, March 1934         | Editorial: Progress in Material Econ | ESSAY   | NULL  | NULL  |
 * 124121 | Count Brass                         | Count Brass                          | OMNIBUS | NULL  | NULL  |
 * 60092 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR | NULL  | NULL  |
 * 60096 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR | NULL  | NULL  |
 * 60099 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR | NULL  | NULL  |
 * 60094 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR | NULL  | NULL  |
 * 60093 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR | NULL  | NULL  |
 * 60097 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR | NULL  | NULL  |
 * 60100 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR | NULL  | NULL  |
 * 60095 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR | NULL  | NULL  |
 * 60098 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR | NULL  | NULL  |
 * 60102 | Pulphouse: The Hardback Magazine: Is | Pulphouse: The Hardback Magazine - 1 | EDITOR | NULL  | NULL  |
 * 60101 | Pulphouse: The Hardback Magazine: Is | Pulphouse: The Hardback Magazine - 1 | EDITOR | NULL  | NULL  |

Dangling title-publication records
Likewise, there are many entries in the pub_content table whose pub_id or title_id is either NULL or refers to a nonexistent record:

select * from pub_content pc where not exists (select * from pubs pu where pc.pub_id = pu.pub_id); select * from pub_content pc where not exists (select * from titles ti where pc.title_id = title_id); WimLewis 16:59, 25 Mar 2007 (CDT)

Results from backup January 13, 2008: 747 rows; 90 rows

Disallowed URLs
See ISFDB:Data Consistency/Disallowed URLs.

Date mismatches

 * Need a script that finds Titles whose date doesn't match the date of the first published Publication record associated with the Title.


 * Too many, so I broke it down. Even this list of titles where there's a publication of a title in a year prior to the title year (so that we don't get problems with title year of XXXX-00-00) yields over 600 results.

select t.title_id, t.title_title, MIN(pub_year), title_copyright from pubs p, pub_content pc, titles t where p.pub_id = pc.pub_id and  pc.title_id = t.title_id and pub_year != '0000-00-00' group by t.title_id, t.title_title HAVING YEAR(MIN(pub_year)) < YEAR(title_copyright) order by 2;
 * Unfortunately they'll have to be manually corrected: e.g. this looks like a duff pub date rather than a duff title date. BLongley 13:29, 9 Apr 2007 (CDT)

Results from backup January 27, 2008: 949 rows in set (14.42 sec) Using slightly more liberal approach: HAVING YEAR(MIN(pub_year)) < YEAR(title_copyright)-1 467 rows in set (13.48 sec)

But I think we should start with this one: select pub_id, pub_title, pub_year from pubs where pub_year != '0000-00-00' and YEAR(pub_year) < 1700 ;

++-++ ++-++ ++-++ 1 row in set (0.20 sec) Link: Timeless Stories for Today and Tomorrow (at least the title is relevant...) --Roglo 13:43, 28 Jan 2008 (CST) (Timeless Stories...)FIXED And the same for titles (but now we have quite a few valid XV-XVII century titles) mysql> select title_id, title_title, title_copyright, title_ttype from titles where title_copyright != '0000-00-00' and YEAR(title_copyright) < 1400 ; +--++-+--+ +--++-+--+ +--++-+--+ 6 rows in set (0.27 sec) Probably pub dates were fixed and these covers were left behind. Links:
 * pub_id | pub_title                              | pub_year   |
 * 251425 | Timeless Stories for Today and Tomorrow | 0197-08-00 |
 * title_id | title_title                       | title_copyright | title_ttype  |
 * 97038 | Timaios (Excerpt)                 | 0001-00-00      | SHORTFICTION |
 * 122930 | The Amazons                       | 0500-00-00      | ESSAY        |
 * 346941 | Cover: The Heidelberg Cylinder    | 0200-10-00      | COVERART     |
 * 358191 | Cover: A Nomad of the Time Streams | 0199-00-00     | COVERART     |
 * 361501 | Cover: The Dispossessed           | 0071-00-00      | COVERART     |
 * 535853 | Cover: Strange Wine               | 0197-06-00      | COVERART     |
 * Cover: The Heidelberg Cylinder FIXED
 * Cover: A Nomad of the Time Streams FIXED
 * Cover: The Dispossessed FIXED
 * Cover: Strange Wine FIXED

Titles that point at themselves
I just looked up a Gardner Dozois collection and most of the stories said "by Gardner Dozois as by [Gardner Dozois]." Looking into this found many titles where the parent field was pointing at itself. It was easy enough to fix but I suspect there are others lurking out there. While "make variant" will no longer let you point a title to itself I suspect you can still get this if you merge two titles and let the parent title pointer stay. Marc Kupper 22:25, 22 Dec 2006 (CST)


 * Like these? There don't seem too many. What is the "easy enough" fix?

select title_id, title_title from titles where title_parent = title_id; +--++ | title_id | title_title                   | +--++ |   63117 | Command Performance            | |   63296 | Earth, Farewell! | |   92579 | Werewolves in Sheep's Clothing | +--++ BLongley 13:41, 9 Apr 2007 (CDT)


 * The second and third were easy to fix. Just removed the variant using "0" as the parent title. The first has another variant involved, so I have no idea how to fix it. Mhhutchins 21:22, 11 Jan 2008 (CST)

Results in the 2008-01-13, 2008-01-27, 2008-02-03 and 2008-02-10 backups: Empty set

Titles that point at variants
A variation on the Titles that point at themselves theme: variants where the parent is also variant (these include variants that point at themselves). select t.title_id as id, left(t.title_title, 40) as title, if(t.title_id=tp.title_id, 'SAME AS', ' ') same, tp.title_id as pid, left(tp.title_title, 40) as parent from titles t, titles tp     where tp.title_id = t.title_parent and tp.title_parent != 0; Results in the January 6, 2008 backup: 33 rows in set (Earth, Farewell! and Werewolves in Sheep's Clothing were fixed by Mhhutchins after the backup's date) --Roglo 03:47, 12 Jan 2008 (CST)
 * Some of these variants exist so that series with only short stories will appear on the authors summary page such as this one created for me "| 629713 | Kéthani Stories                         | SAME AS | 629713 | Kéthani ". At the moment series without a novel will not display.Kraang 07:16, 12 Jan 2008 (CST)
 * Amazing! So we have to careful with stories belonging to series. BTW perhaps it's time to add Kéthani book to ISFDB. --Roglo 10:24, 12 Jan 2008 (CST)
 * The author's web page mentioned this book as a future publication. Once it's in the data base the variant novel of itself can be deleted.Kraang 20:16, 12 Jan 2008 (CST)
 * Most of these are variants created by the use of pseudonyms, so it's natural that the title would be the same. I fixed some of them that were variants of variants (the title changed as well as the authorship, i.e. I at the Keyhole and The Dawning Shadow.) Mhhutchins 23:25, 12 Jan 2008 (CST)
 * Problem with such variants is that bibliography of the top-level title won't show publications of a variant of a variant. --Roglo 11:44, 14 Jan 2008 (CST)

Results in the January 13, 2008 backup: 27 rows in set


 * I believe I've fixed most of these variants of variants, but please do another check with the next backup. Thanks. Mhhutchins 17:54, 14 Jan 2008 (CST)

Results from backup January 20, 2008: 3 rows in set


 * Fixed (fingers crossed!) Mhhutchins 10:16, 23 Jan 2008 (CST)

Results from backup January 27, 2008: Empty set (1.09 sec) Results from backup February 03, 2008: Empty set

Results from backup May 6, 2008: ++--+--++--+ ++--+--++--+ ++--+--++--+ 5 rows in set (0.65 sec) Links:
 * id    | title                                    | same | pid    | parent                                   |
 * 192963 | Snip, Snip                              |      | 106560 | Snip, Snip                               |
 * 857836 | The Hitchhiker's Guide to the Galaxy: Te |     | 172262 | The Hitchhikers Guide to the Galaxy: Th |
 * 858911 | First and Only                          |      | 846435 | First and Only                           |
 * 860248 | Who Goes There?                         |      |  48111 | Who Goes There?                          |
 * 865113 | David Starr, Space Ranger               |      |  28026 | David Starr, Space Ranger                |
 * 192963 - 106560
 * 857836 - 172262
 * 858911 - 846435
 * 860248 - 48111
 * 865113 - 28026

Titles which are variants of non-existent titles
I'm not sure if this should be here or on the ISFDB:Repair_Variants page (or somewhere else) but we have such amazing titles:

select title.title_id, title.title_title, title.title_ttype, title.title_parent from titles title left outer join titles parent on title.title_parent = parent.title_id where title.title_parent <> 0 and isnull(parent.title_id) ;

+--+--+--+--+ | title_id | title_title                                 | title_ttype  | title_parent | +--+--+--+--+ |  188868 | Psi-Man                                      | NOVEL        |        15726 | |  188911 | Demon Sword                                  | NOVEL        |        17547 | |  188973 | Ship of Madness                              | NOVEL        |        19511 | |  189427 | The Aquilliad                                | NOVEL        |        25556 | |  189430 | The Throne of Madness: The Inquestor Trilogy | NOVEL        |        25650 | |  187205 | From Outer Space                             | NOVEL        |        33294 | |  189496 | There Will Be War                            | ANTHOLOGY    |        34841 | |  189512 | Aftermath: Thieves World                    | ANTHOLOGY    |        36679 | |  189560 | The Aquiliad                                 | COLLECTION   |        39607 | |  187369 | The Beach Where Time Began                   | SHORTFICTION |        57067 | |  187550 | The Best of Times                            | SHORTFICTION |        64884 | |  190728 | The Ultimate Wish                            | SHORTFICTION |        67642 | |  187612 | The Immortals                                | SHORTFICTION |        68738 | |  187839 | Dead on Sunday                               | NONGENRE     |       110723 | |  193085 | Deservedly Dead                              | NONGENRE     |       110728 | |  193086 | Death and the Delinquent                     | NONGENRE     |       110729 | |  193087 | Death Served Up Cold                         | NONGENRE     |       110730 | |  193088 | A Ceremonial Death                           | NONGENRE     |       110731 | |  193089 | Here's to the Newly Dead                     | NONGENRE     |       110732 | |  187850 | The Book of Andre Norton                     | COVERART     |       144456 | |  187851 | Definitely Maybe                             | COVERART     |       146385 | |   99435 | The Vacant Lot                               | SHORTFICTION |       208521 | +--+--+--+--+ 22 rows in set (0.17 sec)

Links:

Looks like they are never displayed on their author's page, because: 1) they are variants, 2) they don't have a parent to be displayed next to... So chances are they were added again. --Roglo 16:37, 10 Jan 2008 (CST)
 * All have been fixed. Most were variants because of pseudonyms (Somtow=Sucharitkul, Lindholm=Hobb, Tepper=Oliphant, etc.), but I'm not sure how the variant links between them were broken. So I removed them as variants (used "0" as the parent record), then started all over again by creating a new variant, then merging them with the title record under the author's canonical name. Thanks to Roglo for the script that uncovered them. Mhhutchins 17:14, 10 Jan 2008 (CST)

Results in the January 13, 2008 backup: Empty set

Results in the January 27, 2008 backup: Empty set

Results in the February 03, 2008 backup: Empty set

Results in the May 06, 2008 backup: Empty set

Authorless titles
Two mini-quests for today: titles with authors missing. First, titles that don't match anything in the canonical_author table. I check if they have any pubs (it doesn't depend on having an author):

select tid, title_title, title_ttype, pubc_id, p.pub_id, pub_tag from (select t.title_id tid, t.title_title, t.title_ttype              from titles t left outer join canonical_author ca on t.title_id = ca.title_id                   where ca.title_id is null) tmp left outer join pub_content pc on pc.title_id = tmp.tid left outer join pubs p on p.pub_id = pc.pub_id;

Results in the May 6, 2008 backup (three more titles than in the February 10, 2008 backup): ++-+--+-+++ ++-+--+-+++ ++-+--+-+++ 7 rows in set (4.64 sec)
 * tid   | title_title                                             | title_ttype  | pubc_id | pub_id | pub_tag    |
 * 281671 | The Magazine of Fantasy & Science Fiction, January 2004 | EDITOR      |  350911 | 112701 | FSFJAN2004 |
 * 406811 | Cover: Planets of Wonder                               | COVERART     |    NULL |   NULL | NULL       |
 * 406821 | Cover: A World Out of Time                             | COVERART     |    NULL |   NULL | NULL       |
 * 837920 | Quotation                                              | ESSAY        |    NULL |   NULL | NULL       |
 * 863836 | Waiting                                                | NULL         |    NULL |   NULL | NULL       |
 * 876893 | Farmer Griggs's Boggart                                | SHORTFICTION |    NULL |   NULL | NULL       |
 * 876895 | The Last of The Huggermuggers                          | SHORTFICTION |    NULL |   NULL | NULL       |

I have no idea what the first record is; the other authorless and publess titles should be safe to delete. All generate python errors while trying to match reviews by author and title. Anyway, links:


 * 281671 EDITOR Pub
 * 406811 COVERART
 * 406821 COVERART
 * 837920 ESSAY
 * 863836
 * 876893 SHORTFICTION
 * 876895 SHORTFICTION

And now titles which have their id in the canonical_author table, but the author_id from the canonical_author record is not an id of any existing author. This is more complicated, because you can have multiple canonical_author records for any title, and some of them may point to nonexistent authors. So I try to find titles where one of the authors doesn't exist, but display all the authors.

select t.title_id, t.title_title, t.title_ttype, ca.author_id as nonexistent, ca2.author_id, a2.author_canonical from titles t,          (canonical_author ca left outer join authors a on ca.author_id = a.author_id), (canonical_author ca2 left outer join authors a2 on ca2.author_id = a2.author_id) where t.title_id = ca.title_id and t.title_id = ca2.title_id and a.author_id is null ;

Results in the January 6, 2008 backup: +--+-+--+-+---++ | title_id | title_title                | title_ttype  | nonexistent | author_id | author_canonical   | +--+-+--+-+---++ |  190423 | Hunter, Come Home           | SHORTFICTION |       11231 |     11231 | NULL               | |  190423 | Hunter, Come Home           | SHORTFICTION |       11231 |     53491 | Richard M. McKenna | |  192212 | The Man the Worlds Rejected | SHORTFICTION |       19035 |     19035 | NULL               | |  192212 | The Man the Worlds Rejected | SHORTFICTION |       19035 |     68421 | Gordon Dickson     | |  198701 | Strictly Confidential       | SHORTFICTION |       19035 |     19035 | NULL               | |  431031 | Spacepaw                    | NOVEL        |       19035 |     68421 | Gordon Dickson     | |  431031 | Spacepaw                    | NOVEL        |       19035 |     19035 | NULL               | +--+-+--+-+---++ 7 rows in set (5.02 sec)

We can't really clean the canonical_author table (adding and removing authors in the title editor doesn't help) but Strictly Confidential generates python error and we can fix that by adding at least one existing author (looks like it is Gordon Dickson again?). Links to all 4 titles: --Roglo 10:46, 11 Jan 2008 (CST)
 * Hunter, Come Home SHORTFICTION by Richard M. McKenna and NULL
 * The Man the Worlds Rejected SHORTFICTION by Gordon Dickson and NULL
 * Strictly Confidential SHORTFICTION by NULL (the one interesting)
 * Spacepaw NOVEL by Gordon Dickson and NULL


 * "Strictly Confidential" looks interesting: you can narrow that down to the pub:

select * from pub_content pc where pc.title_id = 198701

pubc_id	title_id	pub_id	pubc_page 251922	198701		89769	112


 * And it's here, verified, and so theoretically the verifier could tell us if it's Gordon Dickson or Gordon R. Dickson. BLongley 12:47, 11 Jan 2008 (CST)


 * "Strictly Confidential" fixed. Attributed to "Gordon Dickson", so the pub record was edited, then a variant was created for "Gordon R. Dickson", and then merged with the existing title record. Mhhutchins 23:33, 12 Jan 2008 (CST)

Safe to auto-merge identical titles?
Related to the Gardner Dozois collection was that for nearly all of the titles I found had two title records that were identical other than one was a parent of the other. If it seems safe it seems it would safe some work to to do a sweep for title records that are identical and to auto-merge them. Marc Kupper 22:25, 22 Dec 2006 (CST)

EDITOR Titles
There are many Magazine Publications that have no EDITOR Title associated with them. This makes them invisible from their Editor's biblio page. Missing EDITOR Titles need to be identified and added throughout. Ahasuerus 20:20, 27 Dec 2006 (CST)

Also need a script to find all non-MAGAZINE Publications that contain an EDITOR Title. Ahasuerus 17:22, 29 Mar 2007 (CDT)

Hmm, this SQL script: select * from pubs pu   where pub_ctype not in ( 'MAGAZINE', 'FANZINE' ) and exists (select * from titles t, pub_content pc                     where pu.pub_id = pc.pub_id and                            pc.title_id = t.title_id and                            t.title_ttype = 'EDITOR'); turns up only one publication, The Discworld Companion, which appears to be fixed already. The inverse query turns up a couple thousand rows, but the ones I checked are also already fixed --- my db dump must be getting out of date... --WimLewis 18:26, 29 Mar 2007 (CDT)

Publess REVIEWs and INTERVIEWs
Reviews and interviews without content records. Those that have variants (e.g. pseudonymous reviews) with are not included: select t.title_id, t.title_title, t.title_ttype from titles t        left outer join pub_content pc on t.title_id = pc.title_id left outer join titles vt on t.title_id = vt.title_parent where (t.title_ttype = 'REVIEW' or t.title_ttype = 'INTERVIEW') and pc.title_id is null and vt.title_id is null; (if a variant didn't have any pub_content record, it would be included on this list; query including titles having variants without pub_content records is select t.title_id, t.title_title, t.title_ttype    from titles t         left outer join pub_content pc on t.title_id = pc.title_id         left outer join titles vt on t.title_id = vt.title_parent         left outer join pub_content vpc on vt.title_id = vpc.title_id     where (t.title_ttype = 'REVIEW' or t.title_ttype = 'INTERVIEW')         and pc.title_id is null and vpc.title_id is null; with the same result for the latest backup). --Roglo 15:02, 10 Feb 2008 (CST)

Results from backup February 03, 2008: 31 rows in set


 * Fixed all but the two tests by Bill Drussaï. Mhhutchins 17:02, 10 Feb 2008 (CST)

Results from backup May 06, 2008: +--+---+-+ +--+---+-+ +--+---+-+ Links (all are REVIEWs): (These reviews are shown on the reviewers' pages but not on reviewed titles' pages) --Roglo 21:25, 8 May 2008 (UTC)
 * title_id | title_title                      | title_ttype |
 * 741225 | Test (ï)                         | INTERVIEW   |
 * 163349 | The Glass Salamander             | REVIEW      |
 * 163350 | The Book of the Keepers          | REVIEW      |
 * 598541 | Books: Chasing Shadows           | REVIEW      |
 * 629573 | X, Y                             | REVIEW      |
 * 629581 | A College of Magics              | REVIEW      |
 * 741221 | Test (ï)                         | REVIEW      |
 * 765009 | The Fabulous World of Jules Verne | REVIEW     |
 * 792937 | The Drought                      | REVIEW      |
 * 862285 | Another Orphan                   | REVIEW      |
 * 864249 | The World of Fanzines            | REVIEW      |
 * 163349
 * 163350
 * 598541
 * 629573
 * 629581
 * 765009
 * 792937
 * 862285
 * 864249

Awards

 * ISFDB:Repair_Awards Need a script that locates bad/missing mapping records between the awards and titles tables.

Series

 * ISFDB:Data Consistency/Duplicate Series Names keeps track of duplicate series names
 * All fixed as of 2008-03-23. Ahasuerus 22:29, 24 Mar 2008 (CDT)


 * There are new as of 2008-05-19:
 * The Next Wave
 * Cutler Family
 * The Empires Trilogy
 * R. L. Stine's Ghosts of Fear Street
 * The Sangreal Trilogy
 * Black Widowers
 * Wizards, Warriors and You
 * Forgotten Realms: Nobles
 * The Dissecting Table (Book Reviews - Science Fiction Adventures
 * If
 * Battletech
 * The Avengers
 * R. L. Stine's Ghosts of Fear Street
 * As Timeless As Infinity: The Twilight Zone Scripts of Rod Serlin    (all empty?)


 * ISFDB:Data Consistency/Series Numbering Issues lists suspicious, missing and duplicate series numbers

Series with variant titles only
These series won't be displayed on author's Summary page.

select s.series_id, s.series_title, min(t.title_parent) p from series s, titles t where t.series_id=s.series_id group by s.series_id having p != 0;

Sometimes there is some reason for this, and some series are shortfiction only, nonetheless a few of these could be fixed. E.g. has parent titles in different series.
 * (there are really 2 series with this name, the second is and they get mixed on the Series Bibliography page)
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (there are really 2 series with this name, the second is and they get mixed on the Series Bibliography page)
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (there are really 2 series with this name, the second is and they get mixed on the Series Bibliography page)
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))
 * (Well, this one is mine --Roglo 19:03, 19 May 2008 (UTC))

The reason most of these are not displayed on the canonical author's page is because the title records that have been "serialized" are of the author as published and not the variant title attributed to the canonical author. They could be fixed by editing these variants as part of a series and removing the series from the first set of title records. MHHutchins 20:31, 19 May 2008 (UTC)

Authors in the Author Directory that don't appear to have written anything
ISFDB:Authors that only exist due to reviews I've cleared up most of the "A"s, either by correcting/regularising the name in the review, or adding the missing publication or variant. The above page I created has links to each review causing a problem - many authors can be cleared up with ONE review fix, so have away at them! BLongley 13:54, 29 Dec 2007 (CST)

Authors who are (probably) doubled
I don't know how we can browse authors summary pages by ids. Advanced search gives access only to editing authors.

These have the same canonical names and their legal names are either equal or at least one legal name is NULL (so they have to be checked before merging):

select a1.author_canonical canonical, a1.author_legalname legal, count(1) from authors a1, authors a2    where a1.author_canonical = a2.author_canonical and a1.author_id <> a2.author_id and (a1.author_legalname = a2.author_legalname or IsNull(a1.author_legalname) or IsNull(a2.author_legalname)) group by 1, 2;

Results from backup January 27, 2008: +-+--+--+ +-+--+--+ +-+--+--+ 3 rows in set (1.70 sec) Looks like one Simon Clark is new. How do these get created? --Roglo 13:15, 28 Jan 2008 (CST)
 * canonical  | legal        | count(1) |
 * Philip Kent | NULL        |        2 |
 * Simon Clark | NULL        |        1 |
 * Simon Clark | Clark, Simon |       1 |


 * You can't do it by changing author names in Titles and Publications since the software will assume that you want to use the existing author record instead of creating a new one. However, when you change an Author's name in "Author Data", the software doesn't check whether there is another author record with the same name. Thus, if we have "Simon Clark" and "Simon Clarke" on file and the latter is changed to "Simon Clark", we will end up with two "Simon Clark" records. When reviewing submissions that attempt to modify Author records, moderators have no way of telling whether another Author record with the same name already exists, so they often slip through. Ahasuerus 15:00, 28 Jan 2008 (CST)


 * P.S. I have deleted the results from the last two passes, but they are available via this page's History if anybody needs them. Ahasuerus 15:00, 28 Jan 2008 (CST)

2008-02-10 results:

+-+--+--+ +-+--+--+ +-+--+--+
 * canonical  | legal        | count(1) |
 * Simon Clark | NULL        |        1 |
 * Simon Clark | Clark, Simon |       1 |


 * Fixed on 2008-02-29. Ahasuerus 23:57, 29 Feb 2008 (CST)

Pseudonym consistency
There are 22 authors in the latest dump who are self-pseudonyms (they have a pseudonym record pointing back to themselves): select ps.pseudo_id, ps.author_id, au.author_canonical from pseudonyms ps, authors au   where ps.author_id = ps.pseudonym and ps.author_id = au.author_id;

There are two pairs of authors who are pseudonyms of each other:
 * Avram Davidson (author_id 501) and Ellery Queen (author_id 4921), twice
 * Douglas Stapleton (author_id 12921) and Doug Stapleton (author_id 12922)

There may also be longer cycles... WimLewis 22:28, 25 Mar 2007 (CDT)

Pseudonyms That Have non-Variant Titles
An Author record that is flagged as a pseudonym should have all of its Title records turned into Variant Titles. We have a script that finds all Author records that have non-Variant Titles. See ISFDB:Data Consistency/Pseudonyms With Titles for the current list. Ahasuerus 03:31, 28 April 2008 (UTC)

Database integrity constraints
It looks liky MySQL does support the standard SQL database-integrity features like ADD CONSTRAINT foo FOREIGN KEY bar. Some of the inconsistencies in the db could be made impossible by adding the appropriate constraints (after the existing inconsistencies are cleaned up). I don't know the schema well enough to write the constraints myself or I'd suggest some here. --WimLewis 15:36, 29 Mar 2007 (CDT)

Title vs. Publication Type Consistency
This section documents mismatches between Title types and associated Publication types:


 * Collections:
 * Collection-Novel mismatches
 * Collection-Anthology mismatches
 * Collection-Omnibus mismatches
 * Collection-Other mismatches


 * Novels:
 * Novel-Magazine mismatches
 * Novel-Anthology mismatches
 * Novel-Collection mismatches
 * Novel-Other mismatches


 * Short fiction:
 * Short fiction-Non-fiction mismatches
 * Short fiction-Novel mismatches


 * Non-fiction:
 * Non-fiction-Novel mismatches
 * Non-fiction-Other mismatches


 * Other:
 * Non-genre mismatches
 * Serial mismatches
 * Omnibus mismatches
 * Anthology mismatches
 * Editor mismatches

Pseudonyms in Collections
The following page lists all known Collection Publications that include pseudonymous Titles as of the August 11, 2007 backup. Although this is not always indicative of an error, we estimate that a significant percentage of these occurrences need to be fixed.


 * Pseudonyms in Collections NEW as of 2008-05-06

Serial Dates
The following page lists all known Serial records whose Title dates do not match the dates of the Publications that they appeared in as of the 2008-02-10 backup.


 * Serial Dates