Release notes
	
	
	  ht://Dig Copyright © 1995-2002 The ht://Dig Group
	   Please see the file COPYING for
	  license information.
	
	
	
	  These are notes that go with each release of ht://Dig. There
	  is also a ChangeLog file which has
	  more details on the code changes.
	
	
	Release notes for htdig-3.1.6 1 Feb 2002
	As with previous releases, this version cleans up some
	remaining bugs and adds a few heavily-requested features. As
	the latest stable release, it is recommended for all
	production servers.
	
	
	- Fixed another nasty security hole in htsearch, which would
	allow a denial of service attack or forcing htsearch to read
	in config files outside of the configuration directory.
- Fixed some problems with htmerge, including problems with
	words beginning with special characters and merging multiple
	databases.
- Fixed a bug in handling hopcounts.
- Fixed problems in handling non-standard relative HTTP
	redirects.
- Fixed bugs in external parsers support including being confused by
	charset information in the Content-Type header and handling
	binary output from external converters.
- Fixed bugs in the default English endings database. (Under
	ispell, it wasn't quite intended for the accuracy needed for
	our usage.)
- Fixed additional bugs in the endings fuzzy algorithm.
- Fixed bugs with compiling with gcc-3.0 and later.
- Fixed bugs compiling and running on Mac OS X.
- Fixed problems with servers not returning a Last-Modified
	date; htdig now assumes the indexing time as the modification time.
- Fixed a variety of bugs in the HTML parser to more
	flexibly handle non-standard HTML.
- Fixed problems in the TCP connection code; htdig now times out
	more reliably when a connection hangs and retries bad
	connections several times before giving up.
- Added the -m "minimal" flag to htdig for only indexing a set list of
	URLs and made the -l (log) flag the default behavior so that
	htdig will stop and restart automatically.
- Added htdump and htload programs for dumping ASCII
	representations of the databases and reloading the same.
- Added support for htnotify to collect multiple URLs and
	allow easy customization of notification messages, including
	the new attributes
	htnotify_replyto,
	htnotify_webmaster,
	htnotify_prefix_file, and
	htnotify_suffix_file.
	
- Added a new "accents" fuzzy algorithm to morph accents,
	including the new accents_db attribute.
- Added a 'list all' feature to htsearch with a query of '*'
	or the current prefix_match_character.
- Added date restricted searching to htsearch including relative dates.
- Added documentation on running
	ht://Dig and the rundig script.
- Added METADESCRIPTION and NSTARS variables to the htsearch templates as well as
	support for $=(var) template variable references.
- Added new config attributes to htsearch for restrict and exclude which work like the
	normal htsearch form variables if the form variables are not
	set.
- Added many new attributes, including
	ignore_dead_servers,
	description_meta_tag_names,
	max_keywords,
	translate_latin1,
	url_rewrite_rules,
	search_rewrite_rules,
	anchor_target,
	ignore_alt_text,
	search_results_contenttype,
	boolean_keywords,
	boolean_syntax_errors,
	multimatch_method,
	maximum_page_buttons,
	max_excerpts,
	plural_suffix,
	any_keywords and
	use_doc_date.
- Extended the build_select_lists
	attribute to support select multiple, radio boxes and checkboxes.
- Revised the documentation to make it clearer in parts,
	including the url_part_aliases attribute.
- Updated various contributed utilities including doc2html,
	xmlsearch, rundig.sh, htparsedoc, acroconv.pl, multidig, etc.
- A variety of other bug fixes, and many documentation updates.
	    See the ChangeLog for details.
- Once again, thanks to everyone who reported bugs and bug
	    fixes.
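The Last-Modified fallback mentioned in the list above can be pictured with a minimal Python sketch. This is not ht://Dig's own code: the function name `modification_time` and the plain dictionary of response headers are assumptions made for illustration only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def modification_time(headers, indexed_at):
    """Return the document's modification time, falling back to indexing time.

    `headers` is a plain dict of HTTP response headers (hypothetical input).
    """
    raw = headers.get("Last-Modified")
    if raw:
        try:
            return parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            # Unparseable date: treat it like a missing header.
            pass
    return indexed_at


now = datetime(2002, 2, 1, tzinfo=timezone.utc)
print(modification_time({}, now))
print(modification_time({"Last-Modified": "Fri, 25 Feb 2000 10:00:00 GMT"}, now))
```

Treating an unparseable header the same way as a missing one is a design choice of this sketch, not something the notes state.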
Release notes for htdig-3.1.5 25 Feb 2000
	This version cleans up some remaining bugs in the 3.1.4
	release. As the latest stable release of ht://Dig, it is
	recommended for all production servers.
	
	
	- Fixed a nasty security hole in htsearch, which would allow
	    users to view any file on your site that had read permission.
- Fixed a bug that could cause problems with 8-bit
	    characters on some systems.
- Made some attempts to get htsearch's output to be more HTML 4.0
	    compliant. It quotes all HTML tag parameters, and uses ";"
	    instead of "&" as parameter separator in URLs for next
	    pages. Reserved characters in parameters are now encoded.
- Fixed handling of SGML entities: htdig will still decode
	    them to store as single characters in the database, but
	    htsearch now encodes some of them back for compliant results.
- Added two new formats for variables in htsearch templates,
	    $%(var), which escapes the variable for a URL, and $&(var),
	    which HTML-escapes the variable as necessary.
- Fixed htdig's handling of robots.txt, such that only the first
	    applicable User-agent field bearing its name will be used, rather
	    than only the last.
- Fixed htdig's handling of servers that return 2-digit years.
- Fixed handling of embedded quotes in quoted string lists.
- Fixed handling of relative URLs with trailing ".." or leading
	    "//".
- Fixed handling of the
	    valid_extensions
	    attribute, which sometimes failed in the previous version.
- Enhanced the handling of local filesystem indexing with the
	    local_urls,
	    local_user_urls or
	    local_default_doc
	    attributes, which now allow multiple directory or file names to
	    be tried.
- Added the build_select_lists
	    attribute to allow the config file to specify
	    <select> form elements in htsearch output as a
	    template variable, much like $(SORT) and $(METHOD).
- Added support for two additional configuration attributes:
	    max_keywords, and
	    nph.
- A variety of other bug fixes, and many documentation updates.
	    See the ChangeLog for details.
- Once again, thanks to everyone who reported bugs and bug
	    fixes.
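The two template variable formats introduced in this release, $%(var) and $&(var), can be illustrated with a small Python sketch. It only mimics the idea; the exact escaping rules htsearch applies may differ, and the `expand` function, the regular expression, and the choice to expand unknown variables to an empty string are all assumptions.

```python
import re
from html import escape
from urllib.parse import quote

# $(VAR) inserts the raw value, $%(VAR) URL-escapes it, $&(VAR) HTML-escapes it.
_VAR = re.compile(r"\$([%&]?)\((\w+)\)")


def expand(template, variables):
    """Expand template variables found in `template` from the `variables` dict."""
    def substitute(match):
        mode, name = match.groups()
        value = variables.get(name, "")  # assumption: unknown names expand to ""
        if mode == "%":
            return quote(value, safe="")
        if mode == "&":
            return escape(value, quote=True)
        return value

    return _VAR.sub(substitute, template)


values = {"WORDS": "fish & chips"}
print(expand('<a href="/search?words=$%(WORDS)">$&(WORDS)</a>', values))
# prints: <a href="/search?words=fish%20%26%20chips">fish &amp; chips</a>
```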
	Release notes for htdig-3.1.4 9 Dec 1999
	This version cleans up some remaining bugs in the 3.1.3
	release. As the latest stable release of ht://Dig, it is
	recommended for all production servers.
	
	
	- Fixed a nasty bug in URL parameter parsing, which was gobbling
	    up bare ampersands (&) and CGI parameter names.
- Fixed a bug where htdig would go into an infinite loop if an
	    entry in local_urls,
	    local_user_urls or
	    server_aliases was
	    missing the "=".
- Fixed a bug in htsearch, where it failed when reading long
	    queries via the POST method.
- Fixed a bug in htdig, where it failed to close the connection
	    after certain errors.
- Fixed a bug that clobbered the hop count of initial documents.
- Fixed bugs in HTML parser's handling of META tags. It no longer
	    continues indexing meta tags when indexing is turned off for the
	    document, and it no longer gets confused by punctuation in META
	    descriptions and keywords.
- Fixed a bug in the handling of the
	    case_sensitive
	    attribute, so that it's not limited to robots.txt
	    parsing. Now, if false, it causes URLs to be mapped to
	    lowercase, to avoid mixed case duplicates as expected.
- HTML parser now indexes text in alt parameter of img tags, and
	    calculates word locations more accurately than before.
- Digging via the local filesystem can now be done even without
	    an HTTP server running, and a few more file types can be indexed
	    locally, without having to rely on the server.
- Sender name in htnotify's e-mail messages is now quoted.
- The external_parsers
	    attribute is now extended to support external converters, to avoid
	    a lot of the complications of writing external parsers.
- Added support for several new configuration attributes:
	    authorization,
	    start_highlight,
	    end_highlight,
	    local_urls_only,
	    page_number_separator,
	    script_name,
	    template_patterns, and
	    valid_extensions.
- The keywords input parameter to htsearch is now propagated to
	    followup searches, as for other input parameters.
- The query string can now be passed to htsearch as a single
	    command line argument, for use in scripts.
- Added better examples and comments in sample htdig.conf, and
	    added boolean match type to sample search.html form.
- The HTML parser in htdig now turns off indexing between
	    <style> and </style> tags.
- A variety of other bug fixes, and many documentation updates.
	    See the ChangeLog for details.
- Once again, thanks to everyone who reported bugs and bug
	    fixes.
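The effect of setting case_sensitive to false, as described in the list above, can be sketched in a few lines of Python. The helper names are hypothetical and the sketch lowercases the whole URL, which is one reading of "mapped to lowercase"; it is not taken from the ht://Dig sources.

```python
def normalize_url(url, case_sensitive=True):
    """Map the URL to lowercase when the server is treated as case insensitive."""
    return url if case_sensitive else url.lower()


def unique_urls(urls, case_sensitive=True):
    """Return URLs in first-seen order, dropping duplicates after normalization."""
    seen = set()
    result = []
    for url in urls:
        key = normalize_url(url, case_sensitive)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


urls = ["http://example.com/Index.html", "http://example.com/index.html"]
print(unique_urls(urls, case_sensitive=False))
```

With case_sensitive left at true, both spellings are kept as separate documents.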
	Release notes for htdig-3.1.3 22 Sep 1999
	This version fixes a number of bugs in the 3.1.2 release and
	is the latest stable release of ht://Dig. It is the only version
	recommended for production servers, and users of all previous
	versions are encouraged to upgrade.
	
	
	- Fixed a long-standing bug where search queries containing
	    punctuation would not be highlighted in excerpts.
- Fixed a bug where SGML entities inside HTML tags were not
	    expanded.
- Fixed the server_aliases
	    attribute to default to port 80 if omitted.
	
- Fixed a bug in URL parsing, where documents ending in the
	    value used for remove_default_doc were ignored. For
	    example, a URL ending in /left_index.html would become /.
	
- Fixed META robot parsing to correctly parse multiple
	    directives.
- Fixed a coredump when generating the metaphone fuzzy
	    database on some systems.
- Fixed the behavior of the modification_time_is_now
	    attribute to work as documented.
- Fixed the behavior of htdig to block out the
	    username/password set on the command-line in process
	    listing.
- Fixed a bug with external parsers to prevent shell escapes
	    in filenames.
- Fixed a bug on some systems, where printing a date might
	    crash.
- Handles the ispell endings lists better so that suffixes
	    more closely match grammatical rules.
- Changed the maximum word length to a run-time option, set
	    with the new attribute maximum_word_length.
	
- Tests for the presence of alloca.h, which would cause
	    problems with compiling the regex code under non-GNU
	    compilers.
- Added support for <EMBED>, <OBJECT>, and
	    <LINK> HTML tags.
	
- A variety of other bugs were fixed; see the
	    ChangeLog for details.
- When indexing, htdig should now attempt to index compound
	    words as separate words in addition to a compound word. For
	    example, "pdf_parser" would also be indexed as "pdf" and "parser."
	
- Once again, thanks to everyone who reported bugs and bug
	    fixes.
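The compound-word indexing described above ("pdf_parser" also indexed as "pdf" and "parser") can be sketched as follows. The function name and the set of separator characters are assumptions for illustration; which characters ht://Dig actually splits on is not stated in these notes.

```python
import re


def index_terms(word, separators="_-"):
    """Return the word itself plus its parts when it is a compound word."""
    # `separators` is a hypothetical set of characters that join compound words.
    parts = [p for p in re.split("[" + re.escape(separators) + "]", word) if p]
    if len(parts) > 1:
        return [word] + parts
    return [word]


print(index_terms("pdf_parser"))  # ['pdf_parser', 'pdf', 'parser']
print(index_terms("htdig"))       # ['htdig']
```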
	Release notes for htdig-3.1.2 21 Apr 1999
	This version fixes a number of bugs in the 3.1.1 release and
	is the latest stable release of ht://Dig. It is highly
	recommended for production servers.	
	
	
	- Fixed a bug that ignored META description tags when they
	  were also added to the meta_keywords attribute.
- Fixed the HTML comment parsing to be more lenient about
	  non-standard comments.
- Fixed problems in the date-parsing code that made it Y2K
	  incompatible. In particular, it forgot that 2000 is a leap
	  year and wouldn't correctly parse dates after 29 Feb
	  2000.
- Fixed a variety of bugs in the HTML parser.
- Fixed an old bug that would exclude all URLs if
	  the exclude_urls attribute was left empty.
- Fixed display of META description tags. Now it always
	  shows the top of a description. If no description exists, it
	  looks for the search terms in the excerpt as usual.
- Fixed some small memory leaks.
- Changed the htfuzzy endings algorithm to use a more
	  efficient regex system. Generation for non-English
	  languages is much faster, now taking minutes where it
	  previously took days!
- Changed the noindex_start and noindex_end attributes to
	  allow case-insensitive matching.
- Added on-disk versions of the builtin templates to make it
	  more obvious how to change the results templates.
- Added date_format 
	attribute to change the format of dates output in search results.
- Added extra_word_characters
	   attribute that defines extra characters that should be
	  considered part of a word, rather than punctuation.
- Several other, relatively minor bugs were also
	  fixed. Many thanks to those who sent in bug reports and to
	  Gilles Detillieux for coordinating this release.
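For reference, the leap-year rule behind the Y2K fix above can be written out in Python. The notes do not say how the old date-parsing code went wrong, so this is only the standard Gregorian rule, with hypothetical function names, under which 2000 is a leap year and 1900 is not.

```python
def is_leap_year(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_february(year):
    return 29 if is_leap_year(year) else 28


print(is_leap_year(2000), days_in_february(2000))  # True 29
print(is_leap_year(1900), days_in_february(1900))  # False 28
```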
	 Release notes for htdig-3.1.1 17 Feb 1999
	 This version cleans up some remaining bugs in the 3.1.0
	 release. As the latest stable release of ht://Dig, it is
	 recommended for all production servers.
	
	
	  - Fixed a bug in the configure script under IRIX and Solaris 7.
	  
- Fixed a minor bug with the Berkeley database code under
	  AlphaLinux.
- Fixed a serious bug causing bus errors on several platforms,
	  notably Solaris SPARC, caused by unaligned access to database
	  structures.
- Fixed some bugs in the boolean search parser.
- Replaced the contributed parse_word_doc.pl script with a
	  more capable parse_doc.pl script.
- Fixed the htnotify program to parse dates as mentioned in the
	  documentation.
- Cleaned up some minor mistakes in the documentation and moved
	  to HTML 4.0 Transitional syntax.
- Fixed the documentation for the pdf_parser attribute that was
	  changed in version 3.1.0. This attribute must call the parser with
	  all command-line options.
	
	  Release notes for htdig-3.1.0 9 Feb 1999
	  This version marks the "full release" of version
	  3.1.0. Naturally, this version adds a few new features and fixes a
	  large number of remaining bugs. This version is the latest stable
	  release of ht://Dig and is recommended for all production servers
	  for current bug-fixes and oft-requested
	  features.
	
	
	  
	    NOTE: You must rebuild
	    your databases from scratch after updating to this
	    version. Several database-related bugs were fixed and will remain
	    unless you rebuild from scratch. We're sorry for any
	    inconvenience.
	  
	
	
	  - Fixed a variety of small memory leaks.
- Fixed a bug that could duplicate documents in the document
	  databases.
- Fixed a bug that would not remove documents marked as deleted.
- Fixed a bug that could dump core with incorrectly defined
	  template_map attributes.
- Fixed a bug that could dump core or produce bogus dates when
	  a server returns the date in an incorrect format.
- Fixed a variety of string-matching bugs that caused problems
	  with restricting indexing and searching.
- Fixed a bug that could dump core if logging searches and CGI
	  environment variables were not set.
- Fixed a bug that would not highlight searches properly if they
	  contained punctuation.
- Fixed PDF parsing to support programs beyond acroread.
- Fixed a bug that caused problems with large robots.txt files.
- Fixed a bug in the sample rundig script from a non-portable
	  test for the age of databases.
- Fixed bugs in the fuzzy matching code that could prevent
	  searches from completing if fuzzy databases were not present.
- Fixed bugs in the soundex and metaphone algorithms that
	  would only return the first word of several matching
	  words. Note that to completely fix this bug, you must
	  rebuild your soundex and metaphone databases.
- Fixed up many compilation warnings and errors.
- Fixed a performance slowdown in htsearch when
	  backlink_factor and
	  date_factor are zero and can
	  be ignored.
- Improved performance when a server ignores the
	  If-Modified-Since request during update digs.
- Added a warning message if the locale: option is set
	  to a locale that is not present.
- Some minor performance improvements.
- Allow "include" keyword in config
	  file to include other config files.
- Uses latest (2.6.4) version of the Berkeley database.
- Two databases may be merged together using
	  htmerge.
- The htdig program can be safely
	  stopped and restarted in the middle of a dig. The dig will write
	  the progress to the file specified by the new
	  url_log option.
- Added support for anchors in excerpts with the
	  add_anchors_to_excerpt
	  option and the ANCHOR template variable.
- Added support for sorting results in increasing or
	  decreasing order of document date, size, title and score using
	  the search form. Note that changing
	  sort from the default of score will result in a performance
	  decrease.
- Added config options sort and
	  sort_names to change the
	  default sort and names used in the SORT template variable.
	  
- Added the option compression_level to
	  compress the document database if the zlib library is
	  present.
- Added the options
	  noindex_start and
	  noindex_end to delimit
	  sections of HTML documents to be ignored.
- Added the option
	  allow_in_form to allow
	  specific config options to be set in the search form.
- Added the option
	  bad_querystr to ignore URLs
	  containing specified CGI queries.
- Added the option
	  search_results_wrapper
	  to replace separate header and footer files. For more
	  information, see the general
	  htsearch documentation.
- Added option
	  no_title_text to allow
	  configuration of the text used when no title is found.
- Added option
	  url_part_aliases to allow
	  rewriting portions of URLs.
- Added option
	  common_url_parts to
	  compress common portions of URLs. Requires rebuilding
	  databases when changed.
- Added option
	  remove_default_doc to
	  control whether ht://Dig strips off the default document in a
	  folder. Setting it to an empty value prevents problems with servers that
	  treat / and /index.html as different URLs.
- Of course there are many other bug-fixes and small
	  enhancements. Many thanks to everyone who reported a bug or
	  contributed code for this release!
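The behavior controlled by remove_default_doc can be sketched in Python. The function name, the tuple of default document names, and the choice of "index.html" as the default are assumptions for illustration. The sketch compares the whole last path segment, so a name such as left_index.html is left alone, and an empty list disables stripping.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_default_doc(url, default_docs=("index.html",)):
    """Drop a trailing default document name so /dir/index.html becomes /dir/."""
    parts = urlsplit(url)
    directory, _, last = parts.path.rpartition("/")
    # Compare the whole last path segment, not just a suffix of the path.
    if last in default_docs:
        parts = parts._replace(path=directory + "/")
    return urlunsplit(parts)


print(strip_default_doc("http://example.com/docs/index.html"))
print(strip_default_doc("http://example.com/left_index.html"))
print(strip_default_doc("http://example.com/index.html", default_docs=()))
```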
	  Release notes for htdig-3.1.0b4 22 Dec 1998
	  This version fixes a security hole in htnotify. The hole has been
	  present in previous versions but was inadvertently made worse in
	  the 3.1.0 beta releases. Malicious users could construct pages
	  that executed commands running under the shell of the user running
	  htnotify. It is highly recommended that users of previous
	  versions switch to this release.
	
	
	  - Fixed a memory leak in htnotify and htsearch.
- Updated the contributed parse_word_doc.pl script.
	  Release notes for htdig-3.1.0b3 15 Dec 1998
	  This version adds only a few features and a significant number of
	  bug fixes. This version has been pretty thoroughly tested. Though
	  there are a few remaining issues, it is hoped that this will be
	  near the end of the beta releases before version 3.1.0. Note that
	  it's recommended to update your databases to eliminate the
	  possibility of subtle changes in the database format.
	
	
	  - Fixed a bug which would ignore the proxy settings,
	  introduced in version 3.1.0b2.
- Fixed a bug where words would remain from deleted
	  documents.
- Fixed a bug where SGML < was considered part of a tag
	  in the HTML parser, introduced in version 3.1.0b2.
- Fixed a bug where empty boolean searches would dump
	  core.
- Fixed a bug where boolean "and," "or," and "not" would be
	  removed from a search string, causing a syntax error.
- Fixed a bug which wouldn't keep track of the hopcounts
	  correctly.
- Added support for META refresh tags, contributed by Aidas
	  Kasparas.
- Added support for using CGI
	  environment
	  variables in the search templates, contributed by Gilles
	  Detillieux.
- Improved memory requirements slightly through
	  fixing a memory leak in htdig and a general system-wide
	  adjustment.
- Improved support for multiple exclude and restrict items
	  through htsearch, contributed by William Rhee and Gilles.
- Improved support to compile under CygWinB20, contributed
	  by Klaus Mueller.
- Upgraded to the latest version (2.5.9) of the
	  Berkeley DB.
	  
- Added a new option
	  server_wait_time to
	  give a delay between connections to a server. Currently this
	  can also affect local filesystem digging if set.
- Added a new option
	  server_max_docs to limit
	  the number of documents pulled down from a server in one dig.
- Added a new option
	  http_proxy_exclude
	  to ignore the proxy setting on certain URLs.
- Added a new option
	  no_excerpt_show_top to
	  show the top of a document when there is no excerpt.
- Added new options
	  date_factor,
	  backlink_factor, and
	  description_factor to
	  improve search rankings. Respectively, they can give higher
	  rankings to more recent documents, documents with a high
	  number of links pointing to them, and documents with relevant
	  URL descriptions pointing to them. See the documentation for
	  more information.
- Added a set of contributed scripts called multidig to help
	  work with multiple sets of URLs and databases.
- Fixed many compilation problems under AIX, thanks to
	  Alexander Bergolth!
- 
	  Many other bugs were fixed, so a big thanks to everyone
	  who submitted a bug report, patch or gave other feedback! See the
	  ChangeLog for more details.
	  
	  Release notes for htdig-3.1.0b2 1 Nov 1998
	  This version adds a few minor features as well as many
	  bugfixes. It is still considered beta as some bug reports have not
	  been fully examined.
	
	
	  - 
	  Fixed a major database corruption
	  problem. Since this bug corrupted the document databases, to
	  completely fix it, you will need to rebuild your databases from
	  scratch.
	  
- 
	  Fixed many problems with the Makefiles and configure
	  scripts. Using ./configure --prefix= now works.
	  
- 
	  Added fixes for connection problems with Digital Alpha-based
	  systems contributed by Paul J. Meyer!
	  
- 
	  Added support for syslog-based htsearch logging. See the
	  config documentation for more
	  details. Thanks to Leo Bergolth for this!
	  
- 
	  Added fixes to work with DNS aliases (as opposed to virtual
	  hosts) through the
	  server_aliases and
	  limit_normalized options
	  as contributed by Leo Bergolth.
	  
- 
	  Added cleanups of the HTML parser and the connection timeout
	  code contributed by René Seindal.
	  
- 
	  Now supports case insensitive servers through the
	  case_sensitive option.
	  
- 
	  Now supports ISO 8601 date format, using the
	  iso_8601 option.
	  
- 
	  Added a wrapper to emulate Excite for Web Servers (EWS)
	  contributed by John Grohol.
	  
- 
	  Added fixes to the contrib whatsnew.pl script to work with DB2
	  contributed by Jacques Reynes.
	  
- 
	  Added a new contributed synonyms file from John Banbury
	  
- 
	  Added a new template variable: CURRENT, the number of the
	  current match, from a patch by René Seindal.
	  
- 
	  Many other minor bugs were fixed, so a big thanks to everyone
	  who submitted a bug report or a patch! See the
	  ChangeLog for more details.
	  
	
	  Release notes for htdig-3.1.0b1 8 Sep 1998
	  This version adds several major new features as well as some
	  bug-fixes. It is considered a beta release since it has only seen
	  limited testing.
	
	
	  
	    It is extremely important that you rebuild all your databases made
	    with previous versions. This version no longer uses the GDBM database
	    format and databases produced with it will be incompatible with other
	    versions. Do not blame me for anything if you didn't do this. You have
	    been warned...
	  
	
	
	  - 
	  Added patches made by Pasi Eronen to support local filesystem access
	  
- 
	  Added a PDF parser contributed by Sylvain Wallez
	  
- 
	  Added support for META description and robots tags
	  
- 
	  Converted the database code to use the BerkeleyDB format, contributed
	  by Esa Ahola and Jesse op den Brouw.
	  
- 
	  Added a prefix fuzzy algorithm, contributed by Esa and Jesse.
	  
- 
		Various other bugs were fixed. Thanks for all the patches
		that were sent to me and the mailing list!
	  
	
	  Release notes for htdig-3.0.8b2 15 Aug 1997
	  This new version contains most of the patches that Pasi Eronen
	  has posted to the list plus some other random fixes.
	
	
	  Release notes for htdig-3.0.8b1 27-Apr-1997
	  I consider this a beta release since I have not had time to
	  test everything. Use at your own risk...
	
	
	  - 
		Base tag problem fixed
	  
- 
		URL parser somewhat more robust
	  
- 
		Date parsing bug fixed
	  
- 
		Added Substring fuzzy algorithm.
	  
- 
		Various other bugs were fixed. Thanks for all the patches
		that were sent to me!
	  
	  Release notes for htdig-3.0.7 12-Jan-1997
	  More bug fixes and some minor new functionality. Hopefully,
	  I'll be able to finish up work on version 3.1 at some point in
	  the near future.
	  I have recently received some more patches for various things,
	  but I have not incorporated those, yet. Next version.
	
	
	  - 
		The problem with the missing words has been fixed. This was
		a problem in the Dictionary class.
	  
- 
		htsearch is a *lot* faster due to a patch by Esa Ahola.
	  
- 
		htfuzzy has some work done to it. With the addition of the
		new rx-1.4 library, the endings algorithm now actually
		works for languages other than English... It still takes an
		awfully long time to build the tables for languages with
		lots of rules.
	  
- 
		URLs now can be of the dubious form http:foo.html. I have
		never seen this used and think it is bogus, but alas, it
		works now.
	  
- 
		A search form can now manually add words to any search
		using the new keywords form attribute.
	  
- 
		A problem in the plaintext parser used to cause bogus HTML
		in search results. This has been fixed.
	  
- 
		New documentation format. Lots of new documentation, as
		well.
	  
- 
		New robotstxt_name attribute. Used to match the
		'user-agent' lines in robots.txt files.
	  
- 
		The <base> tag is now properly supported.
	  
- 
		Preliminary support for lots of new features, including:
		
		  - 
			External document parsers. You'll be able to write your
			own document parser for that special document type that
			ht://Dig doesn't know about.
		  
- 
			New fuzzy search algorithms: substring, regex,
			globbing, etc.
		  
 
	  Release notes for htdig-3.0.6 26-Oct-1996
	  Just a single bug fix and one additional feature in this
	  release.
	
	
	  - 
		Fixed the problem that caused frequent crashes with virtual
		memory exhausted.
	  
- 
		Added a new attribute, keywords_meta_tag_names, which
		should contain a list of meta tag names for which the
		content should be used as keywords. The default is set to
		"keywords htdig-keywords"
	  
	  Release notes for htdig-3.0.5 13-Oct-1996
	  This release consists of more bug fixes.
	  I want to thank Elliot Lee <sopwith@cuc.edu> for his
	  help with tracking down several bugs.
	
	
	  - 
		Fixed problem with accent characters. Words with SGML
		entities and iso-8859-1 characters will now be indexed
		correctly.
	  
- 
		Changed the auto configuration to detect the need for a
		prototype for the gethostname() function. (This was
		supposed to be fixed before, but wasn't)
	  
- 
		Reduced the memory requirements for all the programs by
		changing the rehash() method in the Dictionary class.
		Access to hashes may be a little slower, but the memory
		requirements were reduced by a factor 10 or so.
	  
- 
		Hopefully fixed a problem with the time related functions
		on certain platforms. More checks are done to make sure the
		functions that are used are actually available.
	  
	  Release notes for htdig-3.0.4 2-Sep-1996
	  The previous version failed to build under Linux. This should
	  be fixed now.
	
	
	  - 
		Fixed problem with the time stuff which caused the build of
		htdig to fail.
	  
- 
		Fixed a memory problem in htdig
	  
	  Release notes for htdig-3.0.3 2-Sep-1996
	  Bugs bugs bugs... Will they ever all be found?
	
	
	  NOTE: I made extensive changes to the htdig.conf file
	  that gets installed. I would advise you to remove or rename
	  your existing htdig.conf and let the installation process
	  create a new one for you that you can then modify.
	
	
	  Also, since the rundig script has changed, you should remove
	  the old one before installing ht://Dig. (The installation
	  will refuse to overwrite existing files...)
	
	
	  - 
		The problem with htsearch crashing on some machines has
		been fixed.
	  
- 
		A bug caused the <AREA> tag to be ignored. Fixed.
	  
- 
		A bug in SunOS caused dates to be all screwed up.
	  
- 
		Added lots of comments to the example htdig.conf file. Also
		added some additional example attributes.
	  
- 
		Fixed a bug in the installation process which caused rundig
		to be created incorrectly.
	  
- 
		Added a sample synonyms file. Also modified rundig to
		create a synonyms database for it.
	  
	  Release notes for htdig-3.0.2 22-Aug-1996
	  More bug fixes.
	
	
	  - 
		Multiple start URLs now actually work. Before they were
		just documented to work, but didn't actually work.
	  
- 
		htmerge now will refuse to remove database files if it
		detects that the call to /bin/sort failed.
	  
- 
		htmerge can now tell /bin/sort to use a specific temporary
		directory. This is done by setting the TMPDIR environment
		variable.
	  
- 
		htsearch can now search for words with non-ASCII characters
		in them.
	  
- 
		Added support for finding URLs in the <frame> and
		<area> tags.
	  
- 
		There is a problem with htsearch under Linux. It causes a
		segmentation violation after the first search result is
		displayed. Don't know what the problem is, yet.
	  
- 
		Fixed bug in the auto configuration which always set the
		value for NEED_PROTO_GETHOSTNAME to 1. For most systems
		this actually needs to be 0.
	  
	  Release notes for htdig-3.0.1 16-Aug-1996
	  This is a maintenance release in response to several bug
	  reports.
		  - 
			htdig now will display a list of errors when the
			statistics option (-s) is used. The list gives the URL
			that caused the error and a URL that referred to it.
			Hopefully this information is useful for site
			maintainers.
		  
- 
			Some problems with the SGML character entities were
			fixed. The major symptom was that the ';' that ends an
			entity used to be included as well.
		  
- 
			Major problems with htnotify were fixed. There were
			many hardcoded things in this program that made it very
			specific to SDSU and to me.
		  
- 
			malloc.h should not be included anymore. All references
			to it were replaced with stdlib.h instead. This should
			make compiles on some platforms work better.
		  
- 
			htsearch now will use the CONFIG_DIR environment
			variable to override the compiled in default. (set in
			the CONFIG file...) This was done so that htsearch can
			be called from a simple wrapper that sets that
			environment variable. Only the wrapper needs to be
			modified to get different CONFIG_DIR values.
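The CONFIG_DIR override described in the last item can be sketched in Python. The compiled-in path shown here and the function name are hypothetical; only the idea of an environment variable taking precedence over a built-in default comes from the notes.

```python
import os

# Hypothetical stand-in for the default that is compiled into htsearch.
COMPILED_IN_CONFIG_DIR = "/usr/local/htdig/conf"


def config_dir(environ=None):
    """Return CONFIG_DIR from the environment, else the compiled-in default."""
    env = os.environ if environ is None else environ
    return env.get("CONFIG_DIR") or COMPILED_IN_CONFIG_DIR


print(config_dir({}))
print(config_dir({"CONFIG_DIR": "/srv/site-a/conf"}))
```

A wrapper would set CONFIG_DIR in the environment and then run htsearch, so a different value needs a change to the wrapper only.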
		  
 
	  Release notes for htdig-3.0 17-Jul-1996
	  I decided to make this the official 3.0 release.
	
	
	  
		It is extremely important that you remove all traces
		of earlier beta versions of the software before
		installing this version or that you install in a
		completely different location. Do not blame me for
		anything if you didn't do this. You have been
		warned...
	  
	
	
	  - 
		htwrapper is no more. htsearch is now the CGI program
	  
- 
		htsearch now
		uses templates to display the results. A template is
		simply a piece of HTML code for a single match. The
		HTML code includes variables that will be expanded to
		the various items that are unique to each match, like
		URL, EXCERPT, TITLE, etc. The template can be selected
		at search time (through a menu). There are two builtin
		templates: builtin-short and 
		builtin-long. The builtin-short template
		just lists the stars and title while the 
		builtin-long template lists results in a similar
		fashion to the way Alta Vista displays results.
	  
- 
		Many runtime configuration options have been removed
		and many new ones have been added. Check the
		configuration file documentation for
		details. There are also some enhancements to the format
		of the configuration file.
		
		  - 
			Attribute values can now span multiple lines by
			ending each line that needs to be continued with a
			backslash ('\').
		  
- 
			Attribute values can now include the contents of
			files. Just put the filename in backquotes. The
			filename can use the normal variable expansion, so
			things like this work:
			
			  someattribute: `${common_dir}/somefile`
			
			Note that the backquote character is used, not the
			regular quote character. The file that is specified
			is read in, and all newlines and leading and trailing
			whitespace are reduced to a single space. If the file
			is not found, nothing is included and no error is
			flagged.
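
The two configuration file enhancements above can be pictured with the following rough Python sketch. It is not the ht://Dig parser: the function names are hypothetical, and the handling of whitespace around a continuation and of unknown variable names are assumptions. Only the backslash continuation, the backquote inclusion, the whitespace flattening, and the silent handling of a missing file come from these notes.

```python
import re
from pathlib import Path


def join_continuations(text):
    """Join a line ending in a backslash with the line that follows it.

    Assumption: whitespace around the join collapses to one space.
    """
    return re.sub(r"[ \t]*\\\n[ \t]*", " ", text)


def expand_variables(value, variables):
    """Replace ${name} with its value (assumption: unknown names vanish)."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: variables.get(m.group(1), ""), value)


def include_files(value, variables):
    """Replace each `filename` with the flattened contents of that file."""
    def load(match):
        path = Path(expand_variables(match.group(1), variables))
        try:
            content = path.read_text()
        except OSError:
            return ""  # missing file: include nothing, flag no error
        # Newlines and leading/trailing whitespace become single spaces.
        stripped = (line.strip() for line in content.splitlines())
        return " ".join(line for line in stripped if line)

    return re.sub(r"`([^`]*)`", load, value)
```

With a variables dictionary containing common_dir, calling include_files on the value `${common_dir}/somefile` would yield the lines of that file joined by single spaces, which is what makes a file of URLs usable as an attribute value.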
			 
 Notable attribute changes:
		  - 
			All the attributes that set the heading text have
			been removed. These attributes include:
			
			  - 
				accessed_heading_text
			  
- 
				datesize_heading_text
			  
- 
				descriptions_heading_text
			  
- 
				excerpt_heading_text
			  
- 
				modified_heading_text
			  
- 
				score_heading_text
			  
- 
				size_heading_text
			  
- 
				url_heading_text
			  
- 
				wordlist_heading_text
			  
- 
				field_order
			  
 
- 
			New attributes added:
			
			  - 
				http_proxy: Added to support the use of an HTTP
				proxy server to index documents.
			  
- 
				locale: Added to support international character
				sets.
			  
- 
				match_method: New way of specifying whether a
				search is an 'or', 'and', or 'boolean' search.
			  
- 
				matches_per_page: Used by the new paged results.
			  
- 
				max_doc_size: Limits the size of documents
				retrieved.
			  
- 
				next_page_text: Used in the navigation between
				pages.
			  
- 
				no_excerpt_text: Text displayed if no excerpt was
				available (this used to be hard-coded).
			  
- 
				no_next_page_text: Used in the navigation between
				pages.
			  
- 
				no_prev_page_text: Used in the navigation between
				pages.
			  
- 
				prev_page_text: Used in the navigation between
				pages.
			  
- 
				star_patterns: Allows different star images to be
				used depending on the match URL.
			  
- 
				synonym_dictionary: Support for the new synonyms
				fuzzy algorithm.
			  
- 
				synonym_db: Support for the new synonyms fuzzy
				algorithm.
			  
- 
				syntax_error_file: HTML file displayed if there
				was a boolean expression syntax error.
			  
- 
				template_map: Used in the support for the new
				result display templates.
			  
- 
				template_name: Sets the default template name.
			  
- 
				text_factor: Added to allow normal text to have a
				variable weight (0, for example...).
			  
 
 
		  - 
			Some form tag names have changed. The list of
			recognized form tags is in the
			htsearch
			documentation.
		  
- 
			Multiple start URLs can be specified as the value of the
			'start_url' attribute. This can be combined with file
			inclusion to read in a file of URLs to start with.
		  
- 
			htdig now sends the 'Referer:'
			header in HTTP requests so that any link errors will be
			logged in the server's log files.
		  
- 
			In addition to the "htdig-keywords" META tag name,
			htdig now also supports just
			"keywords". This is to make it more compatible with the
			Alta Vista search engine.
		  
- 
			The verbose display of htdig
			was enhanced to show '+' for a link that will be
			followed and '-' for a link that was discarded.
		  
- 
			htmerge was changed to use
			the Unix sort program instead of doing its own sorting.
			It no longer uses mmap() to map the words into memory,
			which was causing problems on systems with limited
			virtual memory available. (What??? You mean you DON'T
			have at least a 1GB disk dedicated to swap???)
		  
- 
			The endings algorithm has been fixed to work properly.
			There were several well-hidden bugs that made the
			algorithm come up with illegal words.
		  
- 
			The synonyms fuzzy algorithm was
			added. This is simply a mapping of words to other
			words. The input file is just lines of words: the
			first word on each line is mapped to the rest of the
			words on that line. (We use this to map course
			abbreviations to full course names.)
		  
- 
			SGML entities are now supported. They are translated to
			their equivalent ISO-8859-1 encoding.
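
The synonyms mapping described above (first word on a line mapped to the rest of that line) can be illustrated with a short Python sketch. This is not the htfuzzy implementation; the function names and the sample course abbreviation are hypothetical.

```python
def load_synonyms(lines):
    """Map the first word on each line to the remaining words on that line."""
    mapping = {}
    for line in lines:
        words = line.split()
        if len(words) < 2:
            continue  # blank line, or a word with nothing to map to
        mapping.setdefault(words[0], []).extend(words[1:])
    return mapping


def expand_query_word(word, mapping):
    """Return the word followed by any synonyms recorded for it."""
    return [word] + mapping.get(word, [])


# Hypothetical input: a course abbreviation mapped to its full name.
synonyms = load_synonyms(["cs101 introduction computing"])
expanded = expand_query_word("cs101", synonyms)
```

Note that the mapping is one-directional in this sketch: a search for one of the words later on the line does not pull in the first word.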
		  
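
The result templates introduced in this release (a piece of HTML for a single match, with variables such as URL, TITLE, and EXCERPT expanded per match) can be sketched as follows. The `$(NAME)` placeholder syntax, the sample template, and the function names are assumptions made for illustration; see the htsearch documentation for the real template format.

```python
import re

# Hypothetical template for one match; the $(NAME) syntax is an assumption.
SHORT_TEMPLATE = '<a href="$(URL)">$(TITLE)</a><br>$(EXCERPT)<br>\n'


def render_match(template, match):
    """Expand each $(NAME) placeholder from the fields of one match.

    Assumption: unknown names expand to nothing, and the field values
    are already safe to embed in HTML.
    """
    return re.sub(r"\$\((\w+)\)",
                  lambda m: str(match.get(m.group(1), "")), template)


def render_results(template, matches):
    """Render every match with the same template and join the pieces."""
    return "".join(render_match(template, m) for m in matches)
```

Because the template is applied once per match, switching from a short listing to a long one only means selecting a different template string; the search results themselves do not change.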
 
	  Release notes for htdig-3.0b5
	
	
	  - 
		The configuration has changed. There is now a CONFIG
		file which contains all the variables which control
		where things get installed. 'make install' will now
		actually attempt to set everything up with default or
		example files.
 Note that some default directories have changed. For
		example, the default configuration file location is not
		/usr/local/etc/htdig.conf anymore. Instead it is now
		defined in terms of CONFIG_DIR.
- 
		The htfuzzy/createDict.pl Perl program has been
		obsoleted. Creating the endings database is now done by
		htfuzzy itself. If you already have endings databases,
		you don't need to recreate them; they will still work.
	  
- 
		GNU rx-1.0 is now included with the distribution. This
		is used by htfuzzy to create the endings databases.
	  
- 
		The name of the whole search system has changed from
		HTDig to ht://Dig.
	  
- 
		The HTML documentation got a big facelift! This
		includes the new logo for ht://Dig. (Thanks go to
		Keith Parks for the images!)
	  
- 
		htsearch got a new option '-r' which will allow it to
		produce raw output. This output can easily be parsed by a
		wrapper program to produce custom HTML or other output
		for the search results.
	  
Last modified: $Date: 2002/01/28 05:14:23 $