{"id":100,"date":"2011-09-20T10:00:00","date_gmt":"2011-09-19T22:00:00","guid":{"rendered":"http:\/\/deborahfitchett.com\/blog\/?p=100"},"modified":"2011-09-20T10:00:00","modified_gmt":"2011-09-19T22:00:00","slug":"the-death-of-organised-data","status":"publish","type":"post","link":"https:\/\/deborahfitchett.com\/blog\/2011\/09\/the-death-of-organised-data\/","title":{"rendered":"The death of organised data"},"content":{"rendered":"<p>I&#8217;ve been hearing rumours that the big IT companies may be giving up on organised data.  Which is kind of a big thing for the same reason that it makes perfect sense:  there are terabytes upon terabytes of data pouring onto computers and servers all the time, and organising all of that into a useful format takes a heck of a lot of time.<\/p>\n<p>Especially because data organised to suit one need isn&#8217;t necessarily going to suit most actual needs.  If you&#8217;re a reference librarian (either academic or, I suspect, public) you&#8217;ll have had the student coming to your desk who can&#8217;t quite understand why typing their assignment topic into a database doesn&#8217;t return the single perfect article that explicitly answers all their questions.<\/p>\n<p>So I think there are two ways of organising data:<\/p>\n<ul>\n<li>&#8220;pre-organising&#8221; it &#8211; eg a dictionary, which is organised alphabetically, assuming you want to find out about a given word. 
It has information about which words are nouns and what dates they derive from (to a best guess, obviously) but there&#8217;s no way to search for nouns that were used in the 16th century because the dictionary creator never imagined someone might want to know such a thing.<\/li>\n<li>organising it at point of need &#8211; eg a database which has all this same information but allows you to tell it you want only nouns deriving from the 16th century or earlier; or only pronunciations that end in a certain phonetic pattern; or only words that include a certain other word in the definition.<\/li>\n<\/ul>\n<p>Organising data at point of need solves one problem (it&#8217;s much more flexible) but it doesn&#8217;t actually save time on the organising end. In fact, it&#8217;s likely to take quite a lot more time.<\/p>\n<p>So is humanity doomed to be swimming in yottabytes of undifferentiated, unorganised, and thus useless data?  I frowned over this for a while, and after some time I remembered the alternative to organising data:  parsing it.  (This is just what humans do when we skim a text looking for the information we want.) So, for example, a computer could take an existing dictionary as input, look for the pattern of a line which includes &#8220;n.&#8221; (or s.b. or however the dictionary indicates a noun) and a date matching certain criteria, and return to the user all the lines that match what was asked for.<\/p>\n<p>Parsing is hard, and computers have historically been bad at it.  (Bear in mind though that for a long time humans beat computers at chess.)  This is not because computers aren&#8217;t good at pattern-matching; it&#8217;s because humans are so good at making typos, or rephrasing things in ways that don&#8217;t fit the criteria.  
(One dictionary says &#8220;noun&#8221;, one says &#8220;n.&#8221;, one says &#8220;s.b.&#8221;, one uses &#8220;n.&#8221; but it refers to something else entirely&#8230;) A computer parsing data has to account for all the myriad ways something might be said, <em>and<\/em> all the myriad things a given text might mean.<\/p>\n<p>But if you look around, you&#8217;ll see parsing is already emerging.  One of the things the <a href=\"http:\/\/www.libx.org\/\">LibX plugin<\/a> does is look for the pattern of an ISBN and provide a link to your library&#8217;s catalogue search.  You may have an email program that, when your friend writes &#8220;Want to meet at 12:30 tomorrow at the Honeypot Cafe?&#8221;, gives you a one-click option to put this appointment into your calendar.  Machine transcription from videos, recognition of subjects in images, machine translation &#8211; none of it&#8217;s anywhere near perfect, but it&#8217;s all improving, and all these are important steps in the emergence of parsing as a major player in the field of managing data.<\/p>\n<p>So yes, if I were a big IT company I might want to get out of the dead end that is organising data, too &#8211; and get into the potentially much more productive field of parsing it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been hearing rumours that the big IT companies may be giving up on organised data. 
Which is kind of a big thing for the same reason that it makes perfect sense: there are terabytes upon terabytes of data pouring onto computers and servers all the time, and organising all of that into a useful [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[],"tags":[97,107,40],"_links":{"self":[{"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/posts\/100"}],"collection":[{"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/comments?post=100"}],"version-history":[{"count":0,"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/posts\/100\/revisions"}],"wp:attachment":[{"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/media?parent=100"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/categories?post=100"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/tags?post=100"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}