{"id":116,"date":"2011-02-13T20:08:00","date_gmt":"2011-02-13T07:08:00","guid":{"rendered":"http:\/\/deborahfitchett.com\/blog\/?p=116"},"modified":"2011-02-13T20:08:00","modified_gmt":"2011-02-13T07:08:00","slug":"converting-a-plaintext-bibliography-to-endnoteris-format-with-help-from-linuxterminal","status":"publish","type":"post","link":"https:\/\/deborahfitchett.com\/blog\/2011\/02\/converting-a-plaintext-bibliography-to-endnoteris-format-with-help-from-linuxterminal\/","title":{"rendered":"Converting a plaintext bibliography to Endnote\/RIS format with help from Linux\/Terminal"},"content":{"rendered":"<p>[<strong>Update 16\/7\/2011:<\/strong> See my more recent post on the topic, <a href=\"http:\/\/deborahfitchett.blogspot.com\/2011\/07\/launching-ref2ris-convert-your-typed.html\">Launching Ref2RIS &#8211; convert your typed bibliography to Endnote format<\/a>, which makes things even easier.]<\/p>\n<p>You won&#8217;t want to do this unless you&#8217;ve got literally hundreds of references.  Any fewer, and <a href=\"http:\/\/www.library.uq.edu.au\/endnote\/convert_bibliography.html\">these suggestions<\/a> are way easier.<\/p>\n<p>1. Format your references so that each one is on its own line, with no blank lines in between.<\/p>\n<p>2. Use Word&#8217;s Find and Replace (with its &#8220;Format&#8221; and &#8220;Special&#8221; options) to replace <i>a phrase in italics<\/i> with {it}a phrase in italics{endit} and <b>a phrase in bold<\/b> with {b}a phrase in bold{endb}. For example, find italic text and replace it with {it}^&amp;{endit}, where ^&amp; stands for whatever was found.\u00a0\u00a0(Do the same for underlined text, if your citations have any.)<\/p>\n<p>3. Save the file as plain text, eg source.txt.\u00a0\u00a0Now the fun begins&#8230;\u00a0\u00a0My own source text contains 600-odd lines in ACS style, like this: <\/p>\n<pre>Bamford, C. H.; Tipper, C. F. H. {it}Comprehensive Chemical Kinetics{endit}; Elsevier: New York, {b}1977{endb}. <br>House, D. A. {it}Chem. Rev.{endit} {b}1962{endb}, {it}62{endit}, 185 <\/pre>\n<p>4. Open up Terminal or some other Linux or Unix command line.<\/p>\n<p>5. 
Each Endnote record has to end with a line reading <\/p>\n<pre>ER\u00a0\u00a0- <\/pre>\n<p>(that&#8217;s two spaces before the hyphen and one after).\u00a0\u00a0(All these details come from <a href=\"http:\/\/www.refman.com\/support\/risformat_intro.asp\">Endnote&#8217;s help pages<\/a>.) This is the easy part. To tack that line onto the end of every reference, type in <\/p>\n<pre>sed -e 's\/$\/@@ER\u00a0\u00a0- \/' source.txt > source1.txt<\/pre>\n<p>6. The first line of each Endnote record (the TY tag) tells you what kind of citation it is, eg a book or a journal article.\u00a0\u00a0To tag as a book every line that includes a colon (which in my ACS-style references separates the publisher from the city of publication), type in <\/p>\n<pre>sed -e 's\/^\\(.*:\\)\/TY\u00a0\u00a0- BOOK@@\\1\/' source1.txt > source2.txt<\/pre>\n<p> Note 1: The &#8220;@@&#8221; (here and in step 5) marks a spot where you&#8217;ll need a new line later. For now we want to keep each reference on one line.<br \/>Note 2: This is a good example of why this whole method is highly suspect. It&#8217;ll also catch citations that have a colon in the article title, or in a typo, or wherever.\u00a0\u00a0So if you can think of a better sign that a citation is a book, use that instead of the colon.<\/p>\n<p>Alternatively, you could type in <\/p>\n<pre>sed -e 's\/^\\(.*{it}[0-9][0-9]*{endit}\\)\/TY\u00a0\u00a0- JOUR@@\\1\/' source1.txt > source2.txt<\/pre>\n<p> to find every line that contains {it}[some number]{endit} (an italicised volume number). In my source, that is the best indicator that I&#8217;m dealing with a journal.\u00a0\u00a0The same caveats apply: you&#8217;ll get both false positives and false negatives.<\/p>\n<p>Anyway, keep doing whatever seems best given your source, and fix up the inevitable mistakes by hand until each line starts with TY\u00a0\u00a0- something.\u00a0\u00a0If you want to give up and just assume that every line not yet tagged must be a journal, try the command below. The <code>!<\/code> tells sed to make the change only on lines that don&#8217;t already start with a TY tag: <\/p>\n<pre>sed -e '\/^TY\u00a0\u00a0- \/!s\/^\/TY\u00a0\u00a0- JOUR@@\/' 
source2.txt > source3.txt<\/pre>\n<p>I now have source looking like:<\/p>\n<pre>TY\u00a0\u00a0- BOOK@@Bamford, C. H.; Tipper, C. F. H. {it}Comprehensive Chemical Kinetics{endit}; Elsevier: New York, {b}1977{endb}. @@ER\u00a0\u00a0- <br>TY\u00a0\u00a0- JOUR@@House, D. A. {it}Chem. Rev.{endit} {b}1962{endb}, {it}62{endit}, 185 @@ER\u00a0\u00a0- <\/pre>\n<p>7. Now we keep playing with patterns.\u00a0\u00a0(You may be able to do large chunks of this with an ordinary find\/replace, but for illustrative purposes I&#8217;ll keep using Terminal.)<\/p>\n<p>For example, in my source the authors are nicely set off. They come after &#8220;@@&#8221; and before the first &#8220;{it}&#8221; (or &#8220;in {it}&#8221;), and if there&#8217;s more than one of them they&#8217;re separated by &#8220;;&#8221;.\u00a0\u00a0So here are a few commands, each run on the output of the one before: <\/p>\n<pre>sed -e 's\/@@\\(.* in {it}\\)\/@@A1\u00a0\u00a0- \\1\/' source3.txt > source4.txt<br>sed -e '\/@@A1\u00a0\u00a0- \/!s\/@@\\(.* {it}\\)\/@@A1\u00a0\u00a0- \\1\/' source4.txt > source5.txt<br>sed -e 's\/; \\(.*;\\)\/@@A1\u00a0\u00a0- \\1\/' source5.txt > source6.txt (This one I had to repeat a few times, source6.txt > source7.txt and so on, depending on how many authors could be cited in one reference; there's supposed to be a way to do it globally but my unix fu is not strong.)<br>sed -e 's\/; \\(.*{it}\\)\/@@A1\u00a0\u00a0- \\1\/' source8.txt > source9.txt<\/pre>\n<p>Journal titles: <\/p>\n<pre>sed -e 's\/^\\(TY\u00a0\u00a0- JOUR.*\\)\\({it}.*{endit} {b}\\)\/\\1@@JO\u00a0\u00a0- \\2\/' source9.txt > source10.txt<\/pre>\n<p>Years: <\/p>\n<pre>sed -e 's\/\\({b}[0-9][0-9]*{endb}\\)\/@@Y1\u00a0\u00a0- \\1\/' source10.txt > source11.txt<\/pre>\n<p>And so forth.\u00a0\u00a0You pretty soon start to see why the first suggestion on most lists of ways to convert plaintext citations into RIS format is always &#8220;Just type it in \/ search for it again by hand&#8221;.\u00a0\u00a0The method above is really only suitable if you&#8217;ve got literally hundreds of citations. (I have 639, plus or minus.)<\/p>\n<p>8. 
Eventually you&#8217;ll reach a point where you can do a simple find\/replace to change each @@ to a new line and nuke all the {it}, {endit}, {b} and {endb} markers.\u00a0\u00a0This will be a great relief.<\/p>\n<p>9. Rename your final saved file (in my case source12.txt) to source12.ris and open it with Endnote.<\/p>\n<p>10. Bonus material:\u00a0\u00a0if this was the bibliography for a paper that uses <b>numbered<\/b> citations in order, eg [1], then in that paper you can find\/replace [ with { and ] with }, tell the Endnote plugin to format citations, and voila, the best magic ever.\u00a0\u00a0(If the paper uses author\/date citations then you&#8217;ll have to link them by hand, sorry.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>[Update 16\/7\/2011: See my more recent post on the topic, Launching Ref2RIS &#8211; convert your typed bibliography to Endnote format, which makes things even easier.] You won&#8217;t want to do this unless you&#8217;ve got literally hundreds of references. Any fewer, and these suggestions are way easier. 1. 
Format references so they&#8217;re each on their own [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[],"tags":[84,123,15],"_links":{"self":[{"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/posts\/116"}],"collection":[{"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/comments?post=116"}],"version-history":[{"count":0,"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/posts\/116\/revisions"}],"wp:attachment":[{"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/media?parent=116"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/categories?post=116"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/deborahfitchett.com\/blog\/wp-json\/wp\/v2\/tags?post=116"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}