Reform (Fan-Fiction Reformatting Tool)
[http://www.motoslave.net/reform/]

Copyright (c) 2002-2005 Thomas Michael EDWARDS, all rights reserved.

Thomas Michael EDWARDS

------------------------------------------------------------------------
WHAT IS REFORM?
------------------------------------------------------------------------
Reform is a command-line (i.e. non-GUI) program written to reformat
fan-fiction or any other type of periodical/book-ish text files. You
could probably use it for other things, but your mileage may vary
(i.e. caveat emptor). If you do use it for other things, I'd like to
hear about it.

Please submit all bug reports, feature requests, questions, and
comments to me via e-mail: tmedwards@motoslave.net

------------------------------------------------------------------------
WHAT DOES IT DO?
------------------------------------------------------------------------
Reform uses a one-pass (re)formatting engine. Regardless of any
optional processing that may be specified, Reform always performs the
following mandatory processing:

- Parses and removes control tags. The tags, and the entire line that
  contains them, are discarded (i.e. they cannot be safely embedded
  with other text) unless the keeptags option (-k, --keeptags) is
  specified, in which case the tags are kept and passed into the
  output. Processing state changes take place immediately.

- Removes all naked whitespace (i.e. whitespace at the end of a line
  or on a line that is otherwise blank).

- Identifies and unwraps paragraphs. Paragraphs are defined as any
  contiguous lines of text that are not separated by blank lines or
  indentation. During paragraph assembly, separator lines (i.e.
  lines of asterisks, hyphens, equal signs, and other characters that
  writers use to split up chapters and/or segments) are passed through
  unprocessed, except when they would extend past the line wrapping
  boundary, in which case they are compacted as much as possible and
  truncated if they are still too long. A separator line is defined
  as any line that contains only whitespace and one or more of the
  following characters:

      -_=+.~!@#$%^*()[]{}<>|:/\oO

- Removes all control (non-printable) characters.

- Converts the text, as necessary, to ASCII-7bit, ISO/IEC 8859-1
  (Latin-1), or XHTML entities.

The optional processing stages are applied to the text after the
mandatory processing. See the REFORM USAGE section below for more
details.

What Reform does not do (i.e. Caveats):

- It only handles plain text files. Do NOT feed it anything else. As
  a side effect of the way it reads and writes files, it can, and
  will, convert DOS/Mac/UNIX/VMS text files into the format required
  by the target system (i.e. the DOS/Windows versions convert to DOS
  text files, the UNIX versions convert to UNIX text files, etc).

- It only handles character sets that are supersets, or extensions,
  of ASCII. Other, non-ASCII-compatible, character sets (e.g. EBCDIC)
  are not supported.

- It only handles single-byte character sets (SBCS). However, I do
  plan to extend its functionality to support the UTF-8 "wide"
  character set at some point in the future.

- It doesn't fix grammar, spell check, or make the content of your
  story any good. Those tasks it leaves to you.

------------------------------------------------------------------------
REFORM USAGE
------------------------------------------------------------------------
Note that, except for those options which require an argument, the
order in which you enter the command line arguments does not matter.

USAGE: reform [OPTIONS] FILE

  FILE
      Name of the input file.
      You can use a hyphen ('-') to read from standard input.

General options:

  -h, --help
      Print an abbreviated form of this usage information, then exit.

  -k, --keeptags
      Do not remove control tags from the output. Only really useful
      when you're going to pass the output of reform back into itself
      for more processing, either by running reform on its previous
      output file or by using reform in a pipe chain.

  -o FILE, --outfile=FILE
      Name of output file; defaults to the input file name. You can
      use a hyphen ('-') to write to standard output. Only formatted
      text goes to standard output; any messages generated by Reform
      are sent to standard error. Note that when the output file is
      omitted and an input file is given, the input file is copied to
      a backup (with the extension ".orig") and the original input
      file is used as the output file. The exception is when the
      input file is a hyphen ('-'), denoting standard input, in which
      case the output is sent to standard output.

  -v, --version
      Print version information, then exit.

Output mode options:

  -ma, --ascii
      Output text as ASCII-7bit; this is the default. ASCII-7bit only
      supports the basic Roman letters, numerals, and basic
      punctuation. Use this if you want to make sure that your text
      will look exactly the same on all systems regardless of what
      character sets are supported.

  -m1, --8859-1, --latin1
      Output text as ISO/IEC 8859-1 (Latin-1). ISO/IEC 8859-1
      (Latin-1) supports everything that ASCII-7bit does as well as
      many, but not all, of the various accents and accented
      characters. It is widespread, and its characters coincide with
      the first 256 code points of Unicode (i.e. the single-byte
      portion), so it's acceptable on almost all systems.

  -mh, --html, --xhtml
      Output text as an XHTML 1.0 Strict document.

  --xhtml-bare
      Output text as bare XHTML, not a complete XHTML document (i.e.
      do not include the <html>, <head>, or <body> containers). Only
      really useful when you want to add your own custom XHTML
      wrapper.
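
The output-file rules given under -o above can be restated as a small
decision function. This is only an illustrative sketch in Python, not
Reform's own code: the names OutputPlan and plan_output are
hypothetical, and it assumes the ".orig" extension is appended to the
full input file name, which the text above does not spell out.

```python
from typing import NamedTuple, Optional


class OutputPlan(NamedTuple):
    output: str            # where formatted text goes ('-' = standard output)
    backup: Optional[str]  # where the input is copied first, if anywhere


def plan_output(infile: str, outfile: Optional[str] = None) -> OutputPlan:
    """Pick the output destination the way the -o description reads."""
    if outfile is not None:
        # An explicit -o/--outfile is used as given; '-' means standard output.
        return OutputPlan(outfile, None)
    if infile == "-":
        # Reading standard input with no -o: write to standard output.
        return OutputPlan("-", None)
    # No -o and a real input file: back it up, then write over the original.
    # Assumption: the backup name is the input name with ".orig" appended.
    return OutputPlan(infile, infile + ".orig")
```

Under these assumptions, plan_output("PRE.txt") names PRE.txt as the
output and PRE.txt.orig as the backup, while plan_output("-") sends
the output to standard output with no backup.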

Processing options:

  -a, --all
      Enables all optional processing options, except doublespace
      (same as: -b -d -i -s -t -w).

  -b, --blankline
      Separate paragraphs with blank lines. Does not add extra blank
      lines (i.e. it will not add a blank line if there was a
      directly preceding blank line).

  -d, --deindent
      Remove preexisting indentation from paragraphs.

  -i, --indent
      Indent paragraphs with 1 tab (implies -d).

  -s, --space
      Tries to ensure proper spacing in paragraphs; uses a single
      space between sentences.

  -sd, --doublespace
      Tries to ensure proper spacing in paragraphs; uses a double
      space between sentences.

  -t, --tab[=NUM]
      Convert tabs to 4 (or NUM) spaces.

  -w, --wrap[=LEN]
      Wrap lines after 72 (or LEN) characters. Lines are broken only
      at whitespace or by splitting hyphenated words. Words that
      begin with a hyphen are never split, and hyphenated words are
      never split in XHTML mode.

USAGE EXAMPLES:

  Just apply basic processing, which includes unwrapping paragraphs,
  then output as ASCII:

      reform PRE.txt

  Apply basic processing, wrap paragraphs at a line length of 72
  characters, then output as ASCII:

      reform -w PRE.txt

  Apply basic processing, wrap paragraphs at a line length of 60
  characters, then output as ASCII:

      reform --wrap=60 PRE.txt

  Apply basic processing, add blank lines between paragraphs if
  needed, remove any preexisting indentation, wrap paragraphs at a
  line length of 72 characters, then output as ASCII:

      reform -b -d -w PRE.txt

  Apply basic processing, remove any preexisting indentation, then
  output as an XHTML 1.0 Strict document:

      reform -d -mh PRE.txt

  Et cetera, et cetera....

COMMAND TAGS:

  Use Reform command tags ("<%reform=on|off|block|line%>") in the
  input file to control how it is processed. The arguments are
  switches which, when encountered, change the state of the
  processing engine. Once the engine has parsed the tags, they are
  discarded and do not show up in the final output, unless you use
  the keeptags option (-k, --keeptags).
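
  As a rough illustration of how such a tag might be recognized, here
  is a minimal Python sketch written from the rules in the FORMAT
  section below. It is not Reform's actual parser: the regular
  expression and the name parse_reform_tag are assumptions of this
  example, as is the requirement that an opening quote be closed by
  the same kind of quote.

```python
import re
from typing import Optional

# Hypothetical pattern: optional whitespace around the name and value,
# optional quotes (assumed to match each other), any letter case.
TAG_RE = re.compile(
    r"""<%\s*reform\s*=\s*(["']?)(on|off|block|line)\1\s*%>""",
    re.IGNORECASE,
)


def parse_reform_tag(line: str) -> Optional[str]:
    """Return the engine state named by a tag on this line, else None."""
    match = TAG_RE.search(line)
    return match.group(2).lower() if match else None
```

  With this sketch, both "<%reform=on%>" and "<% REFORM = 'On' %>"
  yield the state "on", and a line without a tag yields None.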

  TAG VALUES

      on [default]
          Turns the processing engine on.

      off
          Turns the processing engine off.

      block [default]
          Puts the processing engine into block mode. The engine will
          try to unwrap and assemble blocks (paragraphs) as normal.

      line
          Puts the processing engine into line mode. The engine will
          not unwrap and assemble blocks (paragraphs); all other
          processing takes place as normal.

  FORMAT

      <% reform = "ENGINE_STATE" %>

      ENGINE_STATE can be one of four values: on, off, block, or
      line. The tags are not case-sensitive, and whitespace between
      the tag delimiters, name, and value does not matter. Either
      style of quotes may be used (" or '), or the quotes may be
      omitted completely. The following examples are ALL correct.

      <%reform=on%>        <%reform="on"%>        <%reform='on'%>
      <% reform=on %>      <% reform="on" %>      <% reform='on' %>
      <% reform = on %>    <% reform = "on" %>    <% reform = 'on' %>

  EXAMPLE

      Processed text here, passes to the output file after processing.
      <%reform=OFF%>
      Non-processed text here, passes to the output file unchanged.
      Well, mostly unchanged: character set conversion and naked
      whitespace removal always happen.
      More non-processed text here.
      <%reform=ON%>
      More processed text here.

------------------------------------------------------------------------
CONTRIBUTIONS
------------------------------------------------------------------------
David Ross
    Portability changes and the Mac OS X ports.

Michael A Chase
    Changes to the old makefiles and the early Linux (x86 ELF) ports.

------------------------------------------------------------------------
PORTS
------------------------------------------------------------------------
Linux (x86 ELF)
    current : Thomas Michael EDWARDS
    previous: Michael A Chase

FreeBSD (x86 ELF)
    previous: Thomas Michael EDWARDS

Mac OS X
    previous: David Ross

Windows (x86 Win32)
    current : Thomas Michael EDWARDS