From 72f72d64a422d6628c4796f5c0bf2e508f134214 Mon Sep 17 00:00:00 2001 From: Tatsuya Kinoshita Date: Wed, 4 May 2011 16:05:14 +0900 Subject: Adding upstream version 0.5.1 --- doc/STORY.html | 209 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 209 insertions(+) create mode 100644 doc/STORY.html (limited to 'doc/STORY.html') diff --git a/doc/STORY.html b/doc/STORY.html new file mode 100644 index 0000000..8e89f8c --- /dev/null +++ b/doc/STORY.html @@ -0,0 +1,209 @@ + + +History of w3m + + +

History of w3m

+1999/2/18
+1999/3/8 revised
+1999/6/11 translated into English
+Akinori Ito
+aito@fw.ipsj.or.jp +

Introduction

+W3m is a text-based pager and WWW browser. +It is an application similar to the famous text-based +browser Lynx. +However, w3m has several advantages over Lynx. For example, +

W3m can render tables. +
W3m can render frames (by converting frames into tables). +
As w3m is a pager, it can read a document from standard input. +(I heard Lynx can also display a document given on standard input, like this: +
```
+   lynx /dev/fd/0 < file
+
```
+Hmm, it works on Linux.) +
W3m is small. Its stripped binary for Sparc (compiled with +gcc -O2, version beta-990217) is only 260 Kbytes, while the binary size +of Lynx is over 1.8 Mbytes. (Actually, Lynx is 800K on my i386 system; w3m is 200K + libgc.) +

+It is true that Lynx is an excellent browser, which has many +features w3m doesn't have. For example, +

Lynx can handle cookies. +
Lynx has many options. +
Lynx is multilingual. (W3m is Japanese-English bilingual.) +

+etc. It is also a great advantage that Lynx has a lot of +documentation. +

+I don't intend w3m to be a substitute for any other browser, +including Netscape and Lynx. Why did I write w3m? +Because I found conventional browsers inconvenient +for `taking a look' at web pages. +I browse web pages in a LAN environment. When I want to take +a glance at a web page, I don't want to wait for Netscape to start up. +Lynx also takes a few seconds to start up (you can get Lynx startup time to almost zero when you rm /etc/mailcap). On the other hand, +w3m starts immediately with little load on the host machine. +After looking at the information using w3m, I use another browser +if I want to read the page in detail. As for me, however, +w3m is enough to read most web pages. + +

The birth of w3m

+w3m was derived from a pager named `fm'. Fm was written before +1991 (I don't remember the exact date) when WWW was not popular. +At that time, the word `browser' meant a file browser like +`more' or `less'. +

+I wrote fm to debug a program for my research. To trace the status +of the program, it dumped megabytes of variable values into a file, +and I debugged it by checking the dumped file. The program dumped +the information for each point in time on one line, which made each dumped line +several hundred characters long. When I looked at the file using `more' or +`less', one line was folded into several lines and it was very hard +to read. Therefore, I wrote fm, which didn't fold lines. Fm displayed +one logical line as one physical line. To show the hidden +part of a line, fm shifted the entire screen. As I used an 80x24 terminal at that +time, fm was very useful for debugging. +

+Several years later, I got to know the WWW and began to use it. +I used XMosaic and Chimera. I liked Chimera because it was light. +As I was interested in the mechanism of the WWW, I learned HTML and +HTTP, and I found them simpler than I expected. The earlier version +of HTTP was very similar to the Gopher protocol. HTML 2.0 was +simple enough to render. All I had to do seemed to be line folding +and itemized display. So I made a little modification to fm +and made a web browser. It was the first version of w3m. +The name `w3m' is an abbreviation of the Japanese phrase `WWW wo miru', +which means `see WWW'. It was inherited from `fm', which +was an abbreviation of `File wo miru'. The first version of w3m +was released at the beginning of 1995. + +

Death and rebirth of w3m

+I had used w3m as a pager to read files, e-mail and online manuals. +It was a substitute for less. Sometimes I used w3m as a web browser, +but there were many pages w3m couldn't display correctly, most of +which used tables for page layout. Once I tried to implement a table +renderer, but I gave up because it seemed too difficult for me. +

+It was 1998 when I tried to modify w3m again. There were two reasons. +The first is that I had some time to do it. I was staying at Boston University +as a visiting researcher at that time. The second reason is that +I wanted to use tables in my personal web page. I had written a research +log in HTML, and I wanted to write a table in it. At first I used +<pre>..</pre> to describe the table, but it was not cool at all. +One day I used the <table> tag, which forced me to use Netscape to +read the research log. Then I decided to implement a table renderer +in w3m. +

+I didn't intend to write a perfect table renderer because the tables +I used were not very complicated. However, incomplete table rendering +made the display of table-layout pages horrible. I realized that +it required an almost-perfect table renderer +to do well both in `rendering (real) tables' and `fine display of +table-layout pages.' It was a thorny path. +

+After several months, I finished a `fair' table renderer. +Then I implemented forms in w3m. Finally, w3m was reborn as a +practical web browser. + +

Table rendering algorithm in w3m

+ +HTML table rendering is difficult. The tabular environment +of LaTeX is not very difficult, because it makes the width of a column +either a specified value or the maximum width needed to fit the items in it. +On the other hand, an HTML table renderer has to decide +the width of each column so that the entire table fits into the +display appropriately, and fold the contents of the table according +to the column width. An inappropriate column width decision makes +the table ugly. Moreover, tables can be nested, which makes the algorithm +more complicated. + +

First, calculate the maximum and minimum width of each column. +The maximum width is the width required to display the column +without folding the contents. Generally, it is the length of the longest +paragraph delimited by <BR> or <P>. +The minimum width is the lower limit needed to display the contents. +If the column contains the word `internationalization', the minimum +width will be 20. If the column contains +<pre>..</pre>, the maximum width of the preformatted +text will be the minimum width of the column. + +
If the width of the column is specified by the WIDTH attribute, +fix the column width using that value. If the specified width is +smaller than the minimum width of the column, fix the column width +to the minimum width. + +
Calculate the sum of the maximum width (or fixed width) of +each column and check if the sum exceeds the screen width. +If it is smaller than the screen width, these values are used as +the width of each column. + +
If the sum is larger than the screen width, determine the width +of each column according to the following steps. +
1. Let W be the screen width minus the sum of the widths of +the fixed-width columns. +
2. Distribute W among the columns whose widths are not yet decided, +in proportion to the logarithm of the maximum width of each column. +
3. If the distributed width of a column is smaller than its minimum width, +then fix the width of the column to the minimum width, and +do the distribution again. +
+
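The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not w3m's actual code: the names `cell_min_max` and `column_widths` are hypothetical, newlines stand in for <BR> and <P> breaks, widths are counted in characters, and `log1p` is used instead of a plain logarithm so that very narrow columns still get a positive weight.

```python
import math

def cell_min_max(text):
    """Return (minimum, maximum) width of a cell's text, in characters.

    Assumption: '\n' stands in for <BR>/<P>, and a 'word' is any run of
    non-whitespace characters.
    """
    max_w = max((len(line) for line in text.split("\n")), default=0)
    min_w = max((len(word) for word in text.split()), default=0)
    return min_w, max_w

def column_widths(max_w, min_w, fixed, screen_width):
    """Decide column widths following the steps described in the text.

    max_w, min_w: per-column maximum and minimum widths.
    fixed: per-column WIDTH value, or None when no WIDTH is given.
    """
    n = len(max_w)
    widths = [None] * n
    # Honour WIDTH, but never go below the column's minimum width.
    for i in range(n):
        if fixed[i] is not None:
            widths[i] = max(fixed[i], min_w[i])
    # If maximum (or fixed) widths fit on the screen, use them as they are.
    natural = [max_w[i] if widths[i] is None else widths[i] for i in range(n)]
    if sum(natural) <= screen_width:
        return natural
    # Otherwise distribute the remaining width W by log(maximum width).
    while True:
        free = [i for i in range(n) if widths[i] is None]
        if not free:
            break
        remaining = screen_width - sum(w for w in widths if w is not None)
        # log1p(max(.., 1)) keeps every weight positive (an assumption).
        weights = {i: math.log1p(max(max_w[i], 1)) for i in free}
        total = sum(weights.values())
        share = {i: int(remaining * weights[i] / total) for i in free}
        too_narrow = [i for i in free if share[i] < min_w[i]]
        if not too_narrow:
            for i in free:
                widths[i] = share[i]
            break
        # Fix the too-narrow columns to their minimum and distribute again.
        for i in too_narrow:
            widths[i] = min_w[i]
    return widths
```

For example, `column_widths([200, 30], [10, 35], [None, None], 80)` first offers the second column fewer than 35 characters, fixes it to 35, and then gives what is left to the first column. Note that the sketch does not cap a column at its maximum width, because the steps above do not mention such a cap.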

+ +In this process, the distributed width is proportional to the logarithm of +the maximum width, but I am not sure that this heuristic is the best. +It could be, for example, the square root of the maximum width. +
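To see how the two weightings differ, here is a small Python comparison. The function name `shares` and the sample maximum widths are made up for illustration, and the minimum-width correction is left out.

```python
import math

def shares(max_widths, total, weight):
    """Split `total` among columns in proportion to weight(maximum width)."""
    weights = [weight(w) for w in max_widths]
    s = sum(weights)
    return [total * w / s for w in weights]

# Hypothetical maximum widths of three columns, on an 80-column screen.
max_widths = [10, 40, 160]
log_shares = shares(max_widths, 80, math.log)
sqrt_shares = shares(max_widths, 80, math.sqrt)

for name, result in (("log", log_shares), ("sqrt", sqrt_shares)):
    print(name, [round(x, 1) for x in result])
```

With these sample values the logarithm flattens the differences more strongly: the narrowest column gets a larger share under the logarithm than under the square root, and the widest column gets a smaller one.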

+The algorithm above assumes that the screen width is known. +But this is not true for nested tables. According to the algorithm above, +the column width of the outer table has to be known to render +the inner table, while the total width of the inner table has to +be known to determine the column width of the outer table. +If the WIDTH attribute exists there are no problems. Otherwise, w3m +assumes that the inner table is 0.8 times as wide as the outer +table. It works fine, but if there are two tables side by side in an outer +table, the width of the outer table always exceeds the screen width. +To render this kind of table correctly, one has to render the table once, +check the width of the outermost table, and then render the entire table again. +Netscape might employ this kind of algorithm. + +
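The side-by-side problem can be shown with a little arithmetic in Python. The 0.8 factor comes from the text above. The function names, and the proportional shrinking used for the second pass, are assumptions made for this sketch only, since the text does not say how a second rendering would choose the widths.

```python
def assumed_inner_width(outer_width):
    """Width assumed for an inner table without WIDTH: 0.8 x the outer table."""
    return outer_width * 8 // 10  # integer arithmetic avoids float rounding

def first_pass_total(outer_width, n_inner):
    """Total width of n_inner tables placed side by side, first pass."""
    return n_inner * assumed_inner_width(outer_width)

def second_pass_inner_width(outer_width, n_inner):
    """Hypothetical second pass: shrink inner tables if the first pass overflowed."""
    first = assumed_inner_width(outer_width)
    total = n_inner * first
    if total <= outer_width:
        return first
    # Assumption: shrink every inner table by the same ratio.
    return first * outer_width // total

print(first_pass_total(80, 1))         # one inner table fits in 80 columns
print(first_pass_total(80, 2))         # two side by side overflow 80 columns
print(second_pass_inner_width(80, 2))  # width per inner table after re-rendering
```

One inner table takes 64 of 80 columns, but two of them side by side ask for 128, which is why a single pass cannot get this case right.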

Libraries

+ +w3m uses the +Boehm GC +library. This library was written by H. Boehm and A. Demers. +I could distribute w3m without this library because one can +get the library separately, but I decided to include it in the +w3m distribution for the convenience of people installing it. +W3m doesn't use libwww. +

+Boehm GC is a garbage collector for C and C++. I began to use this +library when I implemented tables, and it was great. I couldn't +have implemented tables and forms without this library. +

+Versions older than beta-990304 used +LIBFTP +because I was tired of writing code to handle the FTP protocol. +But I rewrote the FTP code myself to make w3m completely free. +It made w3m slightly smaller. +

+By the way, w3m doesn't use the standard UNIX regexp and curses libraries. +This is because I wanted to use Japanese. When I wrote fm, there were +no free regexp/curses libraries that could handle Japanese. Now both libraries +are available and they look faster than the w3m code. + +

Future work

+ +...Nothing. As w3m's virtues are its small size and rendering speed, +adding more features might lose these advantages. On the other hand, +w3m is still known to have many bugs, and I will continue fixing them. + + + -- cgit v1.2.3