aboutsummaryrefslogtreecommitdiffstats
path: root/doc/STORY.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/STORY.html')
-rw-r--r--doc/STORY.html209
1 files changed, 209 insertions, 0 deletions
diff --git a/doc/STORY.html b/doc/STORY.html
new file mode 100644
index 0000000..8e89f8c
--- /dev/null
+++ b/doc/STORY.html
@@ -0,0 +1,209 @@
+<html>
+<head>
+<title>History of w3m</title>
+</head>
+<body>
+<h1>History of w3m</h1>
+<div align=right>
+1999/2/18<br>
+1999/3/8 revised<br>
+1999/6/11 translated into English<br>
+Akinori Ito<br>
+aito@fw.ipsj.or.jp
+</div>
+<h2>Introduction</h2>
+W3m is a text-based pager and WWW browser.
+It is similar application to the famous text-based
+browser <a href="http://www.lynx.browser.org/">Lynx</a>.
+However, w3m has several advantages against Lynx. For example,
+<UL>
+<LI>W3m can render tables.
+<LI>W3m can render frame (by converting frame into table).
+<LI>As w3m is a pager, it can read document from standard input.
+(I heard Lynx also can display standard-input-given document, like this:
+<pre>
+ lynx /dev/fd/0 &gt; file
+</pre>
+Hmm, it works on Linux. )
+<LI>W3m is small. Its stripped binary for Sparc (compiled with
+gcc -O2, version beta-990217) is only 260kbyte, while binary size
+of Lynx is beyond 1.8Mbyte. (Actually, lynx it 800K on my i386 system, w3m is 200K + libgc.)
+</UL>
+It is true that Lynx is an excellent browser, who have many
+features w3m doesn't have. For example,
+<UL>
+<LI>Lynx can handle cookies.
+<LI>Lynx has many options.
+<LI>Lynx is multilingual. (W3m is Japanese-English bilingual)
+</UL>
+etc. It is also a great advantage that Lynx has a lot of
+documentation.
+<P>
+<b>I don't intend w3m to be a substitute of any other browsers,
+including Netscape and Lynx.</b> Why did I wrote w3m?
+Because I felt inconvenient with conventional browsers
+to `take a look' at web pages.
+I am browsing web pages in LAN environment. When I want to take
+a glance at a web page, I don't want to wait to start up Netscape.
+Lynx also takes a few seconds to start up (you can get lynx startup time to almost zero when you rm /etc/mailcap). On the other hand,
+w3m starts immediately with little load to the host machine.
+After looking at the information using w3m, I use other browser
+if I want to read the the page in detail. As for me, however,
+w3m is enough to read most of web pages.
+
+<h2>The birth of w3m</h2>
+<P>
+w3m was derived from a pager named `fm'. Fm was written before
+1991 (I don't remember the exact date) when WWW was not popular.
+At that time, the word `browser' meant a file browser like
+`more' or `less'.
+<P>
+I wrote fm to debug a program for my research. To trace the status
+of the program, it dumped megabytes of values of variables into a file,
+and I debugged it by checking the dumped file. The program dumped
+information at a certain time in one line, which made the dumped line
+several hundred characters long. When I looked the file using `more' or
+`less', one line was folded into several lines and it was very hard
+to read it. Therefore, I wrote fm, which didn't fold a line. Fm displayed
+one logical line as one physical line. When seeing the hidden
+part of a line, fm shifted entire screen. As I used 80x24 terminal at that
+time, fm was very useful for the debugging.
+<P>
+Several years later, I got to know WWW and began to use it.
+I used XMosaic and Chimera. I liked Chimera because it was light.
+As I was interested in the mechanism of WWW, I learned HTML and
+HTTP, and I felt it simpler than I expected. The earlier version
+of HTTP was very similar to Gopher protocol. HTML 2.0 was
+simple enough to render. All I have to do seemed to be line folding
+and itemized display. Then I made a little modification to fm
+and made a web browser. It was the first version of w3m.
+The name `w3m' was an abbreviation of Japanese phrase `WWW wo miru',
+which means `see WWW'. It was an inheritance from `fm', which
+was an abbreviation of `File wo miru'. The first version of w3m
+was released at the beginning of 1995.
+
+<h2>Death and rebirth of w3m</h2>
+<p>
+I had used w3m as a pager to read files, E-mails and online manuals.
+It was a substitute of less. Sometimes I used w3m as a web browser,
+but there were many pages w3m couldn't display correctly, most of
+which used table for page layout. Once I tried to implement table
+renderer, but I gave up because it seemed to be too difficult for me.
+<P>
+It was 1998 when I tried to modify w3m again. There were two reasons.
+The first is that I had some time to do it. I stayed Boston University
+as a visiting researcher at that time. The second reason is that
+I wanted to use table in my personal web page. I had written research
+log using HTML, and I wanted to write a table in it. At first I used
+&lt;pre&gt;..&lt;/pre&gt; to describe table, but it was not cool at all.
+One day I used &lt;table&gt; tag, which made me to use Netscape to
+read the research log. Then I decided to implement a table renderer
+into w3m.
+<P>
+I didn't intend to write a perfect table renderer because tables
+I used was not very complicated. However, incomplete table rendering
+made the display of table-layout pages horrible. I realized that
+it required almost-perfect table renderer
+to do well both in `rendering (real) table' and `fine display of
+table-layout page.' It was a thorn path.
+<P>
+After taking several months, I finished `fair' table renderer.
+Then I implemented form into w3m. Finally, w3m was reborn as a
+practical web browser.
+
+<h2>Table rendering algorithm in w3m</h2>
+
+HTML table rendering is difficult. Tabular environment
+of LaTeX is not very difficult, which makes the width of a column
+either a specified value or the maximum width to put items into it.
+On the other hand, HTML table renderer has to decide
+the width of a column so that the entire table can fit into the
+display appropriately, and fold the contents of the table according
+to the column width. Inappropriate column width decision makes
+the table ugly. Moreover, table can be nested, which makes the algorithm
+more complicated.
+
+<OL>
+<LI>First, calculate the maximum and minimum width of each column.
+The maximum width is the width required to display the column
+without folding the contents. Generally, it is the length of
+paragraph delimited by &lt;BR&gt; or &lt;P&gt;.
+The minimum width is the lower limit to display the contents.
+If the column contains the word `internationalization', the minimum
+width will be 20. If the column contains
+&lt;pre&gt;..&lt;/pre&gt;, the maximum width of the preformatted
+text will be the minimum width of the column.
+
+<LI>If the width of the column is specified by WIDTH attribute,
+fix the column width using that value. If the specified width is
+smaller than the minimum width of the column, fix the column width
+to the minimum width.
+
+<LI>Calculate the sum of the maximum width (or fixed width) of
+each column and check if the sum exceeds the screen width.
+If it is smaller than screen width, these values are used for
+width of each column.
+
+<LI>If the sum is larger than the screen width, determine the widths
+of each column according to the following steps.
+<OL>
+<LI>Let W be the screen width subtracted by the sum of widths of
+fixed-width columns.
+<LI>Distribute W into the columns whose width are not decided,
+in proportion to the logarithm of the maximum width of each column.
+<li>If the distributed width of a column is smaller than the minimum width,
+then fix the width of the column to the minimum width, and
+do the distribution again.
+</OL>
+</OL>
+
+In this process, distributed width is proportion to logarithm of
+maximum width, but I am not sure that this heuristic is the best.
+It can be, for example, square root of the maximum width.
+<P>
+The algorithm above assumes that the screen width is known.
+But it is not true for nested table. According the algorithm above,
+the column width of the outer table have to be known to render
+the inner table, while the total width of the inner table have to
+be known to determine the column width of the outer table.
+If WIDTH attribute exists there are no problems. Otherwise, w3m
+assumes that the inner table is 0.8 times as wide as the outer
+table. It works fine, but if there are two tables side by side in an outer
+table, the width of the outer table always exceeds the screen width.
+To render this kind of table correctly, one have to render the table once,
+check the width of outmost table, and then render the entire table again.
+Netscape might employ this kind of algorithm.
+
+<h2>Libraries</h2>
+
+w3m uses
+<a href="http://reality.sgi.com/boehm/gc.html">Boehm GC</a>
+library. This library was written by H. Boehm and A. Demers.
+I could distribute w3m without this library because one can
+get the library separately, but I decided to contain it in the
+w3m distribution for the convenience of an installer.
+W3m doesn't use libwww.
+<P>
+Boehm GC is a garbage collector for C and C++. I began to use this
+library when I implemented table, and it was great. I couldn't
+implement table and form without this library.
+<P>
+Older version than beta-990304 used
+<a href="http://home.cern.ch/~orel/libftp/libftp/libftp.html">LIBFTP</a>
+because I felt tired of writing codes to handle FTP protocol.
+But I rewrote the FTP code by myself to make w3m completely free.
+It made w3m slightly smaller.
+<P>
+By the way, w3m doesn't use UNIX standard regexp library and curses library.
+It is because I want to use Japanese. When I wrote fm, there were
+no free regexp/curses libraries that can treat Japanese. Now both libraries
+are available and they looks faster than w3m code.
+
+<h2>Future work</h2>
+
+...Nothing. As w3m's virtues are its small size and rendering speed,
+adding more features might lose these advantages. On the other hand,
+w3m is still known to have many bugs, and I will continue fixing them.
+
+</body>
+</html>