

Page for information about making WWW pages

The World Wide Web (WWW) is the most commonly used Internet service nowadays, and it generates a very large fraction of all Internet traffic. WWW is an example of the client-server model of communication over TCP/IP. When a user wants to read a web page, the web browser talks to the web server using the HTTP protocol, which runs on top of TCP/IP: the browser opens a TCP connection to the server, sends an HTTP request, receives the data from the server, closes the connection and displays the received document on the screen.
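As a rough illustration of those steps, here is a minimal Python sketch using only the standard library. It opens a TCP connection, sends a bare HTTP/1.0 GET request, reads until the server closes the connection and returns the document body. The fetch() and demo() names and the tiny local test server are hypothetical, added only so the example runs without network access; a real browser does much more (redirects, caching, character sets, error handling).

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch(host: str, port: int, path: str = "/") -> bytes:
    """Fetch one document: connect, send a request, read the reply, close."""
    # 1. Open a TCP connection to the web server.
    with socket.create_connection((host, port), timeout=5) as sock:
        # 2. Send a minimal HTTP/1.0 GET request; an empty line ends the headers.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 3. Read until the server closes the connection (HTTP/1.0 behaviour).
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # 4. Leaving the `with` block closes our end of the connection.
    _head, _sep, body = b"".join(chunks).partition(b"\r\n\r\n")
    return body


class _HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical one-page server used only to make the demo self-contained."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def demo() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), _HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_address[1], "/")
    finally:
        server.shutdown()
        server.server_close()
```

The status line and headers are simply split off at the first empty line; everything after it is the document the browser would display.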

If the web page consists of multiple objects (pictures, sounds, video, Java applets), the web browser makes one HTTP request to the web server for each object it needs. Depending on the web server and browser configuration, the system can either open a new TCP connection for each object or use one TCP connection to receive more than one object. Typically a web browser has 1-6 TCP connections active in parallel while receiving the document objects from the server. When the objects have been received, those TCP connections are closed either immediately or after a specified timeout.

The Hypertext Markup Language (HTML) is a simple markup language used to create web pages. A typical web page consists of one HTML document, which contains most of the information on the page and links to the other parts needed to complete it (pictures, JavaScript code in separate files, Java applets, etc.).
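The HTML document itself tells the browser which further objects to request. The following Python sketch (an assumption about how such a scan could be done, not how any particular browser works) uses the standard html.parser module to list the embedded objects a page refers to, resolving relative URLs against the page address. The class and function names and the selection of tags are illustrative; ordinary <a href> links are left out because they are followed only when the user clicks them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ObjectCollector(HTMLParser):
    """Collect URLs of objects a browser must fetch separately to show a page."""

    # Tag -> attribute naming an embedded object (an illustrative, incomplete list).
    EMBEDDED = {"img": "src", "script": "src", "embed": "src", "object": "data"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.objects: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            # Style sheets are fetched for rendering; other <link>s are not.
            if (attrs.get("rel") or "").lower() == "stylesheet" and attrs.get("href"):
                self.objects.append(urljoin(self.base_url, attrs["href"]))
            return
        attr = self.EMBEDDED.get(tag)
        if attr and attrs.get(attr):
            # Resolve relative references against the page's own URL.
            self.objects.append(urljoin(self.base_url, attrs[attr]))


def embedded_objects(html_text: str, base_url: str) -> list[str]:
    parser = ObjectCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.objects
```

For a page with one style sheet, one external script and one image, the list holds three URLs, that is, three requests on top of the one for the HTML document itself.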

    Foreword on web publishing

    The World Wide Web is a multimedia hypertext information retrieval system that sits on top of the Internet. Publishing on the Web is very different from older methods of publication. A Web publication is inherently a general, device-independent and program-independent document with structural markup. The presentation of a document may vary greatly so that the same document can be viewed on a wide variety of devices, ranging from small mobile phone screens to full-size movie screens.

    The HTML language was designed to promote worldwide distribution of documents in a device-independent form. An HTML file contains the content and its structure, stored together in one file in a standardized form. HTML is far from perfect for the purpose, but it has served well and is suitable for a wide range of documents. It is easy to learn and easy to use: practically any computer-literate person can put documents on the Web within a few hours, after a day or two of initial training.

    One reason for putting up this web page is to give people a single place to find information on web publishing and to learn the most necessary skills in a few days.

    Some people (mostly from traditional publishing industries) think that Web authors should decide the physical appearance of documents: font, color, layout and other presentation features. For this reason HTML is often implemented with nonstandard extensions (some of them later standardized) that control document features such as colors, fonts and font sizes. This way of making web pages breaks the basic idea that HTML should be viewable on practically any kind of device. Many advertised "HTML programming skills" are largely bluff and go against the whole idea of HTML.

    Journalists may say that presentation cannot be separated from structure and content, so presentation must be designed separately for each concrete publication and issue. Layout will not lose its importance, but it will increasingly be decided on the users' systems, and users have their own preferences for layout and style (colors, margins, fonts and so on). In that case the layout chosen by the author or publisher will not get through as intended, and an attempt to enforce it may fail miserably. When the presentation fails, the document looks like a mess and the user will discard it. If an exact, always identical presentation is essential to a document, it is usually better to publish it in some format other than HTML (for example, Adobe Acrobat PDF is well suited for publishing such documents on the web).

    Linking turns text into hypertext, and hypertextuality is among the key factors that make the World Wide Web a web of interconnected things. Using links on your Web pages, you can conveniently let your readers find background information, technical details, definitions of terms, etc., on your pages or elsewhere. A link is just a pointer or reference; it does not do anything by itself. A URL is rather like a telephone number or a street address: it just tells how to reach a document on the web.

    The generally proposed rule for linking, that one may freely set up non-framed HREF links to the web sites of others, is reassuring because it agrees with both common practice and common sense. Webmasters should be prepared for members of the public to bookmark subpages and for other HTML authors to link to subpages. Since this sort of bookmarking and linking can and will happen, the webmaster should be courteous to those visitors and authors: when moving a page, the webmaster should supply a "forwarding" page that tells the visitor the page's new URL.
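    A "forwarding" page is often combined with an HTTP redirect, so browsers jump to the new address automatically while humans still see a readable message. The Python sketch below assumes a hypothetical table MOVED of old and new paths and answers with a 301 (Moved Permanently) status, a Location header and a short HTML body; check() starts a throwaway local server only to show the result. On a production server the same effect is normally configured in the server itself rather than written as code.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical table of moved pages: old path -> new path.
MOVED = {"/old/page.html": "/new/page.html"}


class ForwardingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        new_path = MOVED.get(self.path)
        if new_path is None:
            self.send_error(404, "Not Found")
            return
        # 301 tells browsers (and bookmark keepers) that the move is permanent;
        # the small HTML body is the human-readable "forwarding page".
        body = ('<html><body><p>This page has moved to '
                f'<a href="{new_path}">{new_path}</a>.</p></body></html>').encode("ascii")
        self.send_response(301)
        self.send_header("Location", new_path)  # relative URI, allowed by RFC 7231
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def check(path: str):
    """Request `path` from a throwaway local server; return (status, Location)."""
    server = HTTPServer(("127.0.0.1", 0), ForwardingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        conn.close()
        return response.status, response.getheader("Location")
    finally:
        server.shutdown()
        server.server_close()
```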

    It is a very good idea to make the links on your pages easy to recognize, so that users have no difficulty noticing which words are links. Recognizing a link as displayed by one's browser and following it are among the most basic skills of Web browsing. Any attempt to hide links or to change the appearance users are used to seeing in their browser usually harms usability more than it helps the presentation. When you do not show clearly what is a link and what is not, browsing becomes guesswork, trial and error.

    Speed and usability are also necessary factors to consider. Most of the web is too slow nowadays. The slowness comes from the bandwidth available for accessing the site, the amount of data to be transferred and the way the material is put on the pages. The available bandwidth is determined by the network connection your web server has to the Internet, the speed of your client's connection and the load on the core network in between. The webmaster can hardly control this speed other than by choosing a good connection from a good networking company or by using a good web hosting service.

    The amount of transferred data and the way it is presented can be controlled by proper web design. You can reduce the amount of transferred data by writing good HTML code (some HTML tools produce HTML files several times bigger than needed to present the document), by selecting the right file formats and conversion parameters for pictures and other documents, and by avoiding unnecessarily large pictures. The loading speed seen by the user is sometimes also affected by the browser's rendering speed and method (for example, some versions of Netscape draw a table only after they have received all of its cells, which can take a long time). With a good network connection, a powerful enough web server and good web design, web pages can be made quick to access. When designing a public web site, always keep in mind the users who have slower connections than you. It is a good idea to try the pages over a slow "average user" connection before publishing them, to check that they are usable for those users as well. A fancy web page full of graphics and animation might look nice when its maker demonstrates it on a corporate LAN, but it may turn out so slow to load that the intended users do not want to wait for it (which causes your intended service or business to fail).
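    A quick way to see why page size matters is to compute the best-case transfer time, bytes * 8 / line rate. The sketch below does this in Python for the connection types named in the checklist later on this page; the line rates are nominal figures assumed for the example (real throughput is lower because of protocol overhead, latency and network load), and the 200,000-byte page size in the usage note is hypothetical.

```python
# Nominal line rates in bits per second (assumed round figures).
LINE_RATES = {
    "56K modem": 56_000,
    "ISDN (128 kbit/s)": 128_000,
    "T1": 1_544_000,
}


def transfer_seconds(page_bytes: int, bits_per_second: int) -> float:
    """Best-case transfer time: every byte is 8 bits on the wire, no overhead."""
    return page_bytes * 8 / bits_per_second


def report(page_bytes: int) -> dict[str, float]:
    """Transfer time in seconds, rounded to 0.1 s, for each line type."""
    return {name: round(transfer_seconds(page_bytes, rate), 1)
            for name, rate in LINE_RATES.items()}
```

    For a 200,000-byte page (HTML plus pictures) this gives about 28.6 seconds on a 56K modem, 12.5 seconds on 128 kbit/s ISDN and about 1 second on a T1. Cutting the page to 50,000 bytes brings the modem time down to about 7 seconds.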

    When making a new large web site, it is a good idea to design the site structure well so that you don't have to redesign it over and over. Usually this means designing the menus and directory structures for the whole site. When you have this ready, don't hurry to put it online. Publishing just a site skeleton, with menus and subpages but no real content, only frustrates the users who come to your site. It is better to first make a small working site with a little material and enhance it later than to publish a large page structure without any content. It is also better to mention on the main page or a subpage what is coming later than to make links to pages which just say "under construction", "coming soon" or "page not found". Users of such a site get frustrated because by making the link visible you seemed to promise that your site has some interesting material, but you failed to keep that promise. Users who run into many things like this are disappointed, go to some other site and probably never come back.

    Also be careful to fulfill the promises you have made, and do not make too many promises. Users usually do not believe promises that nice things are coming unless you make them believable. The fact of life is that most things promised to "come soon" don't come online any time soon, if ever. A web site with lots of pages saying just "coming soon" only makes many disappointed users, who don't come back later to check whether your promised new material has appeared (because most such promises are not kept).

    If you are running a site that you expect users to visit again and again, keep the site running and do not close it temporarily. For a company to which online presence is important, it makes no sense to close the web pages for weeks because the pages are being updated! When a web site needs more updating than can reasonably be done "on the fly", the right procedure is this: start designing the new site while keeping the old site running. During development, put the new version on a separate server (or in a different directory on the same server). When you are completely sure that the new pages are ready to go online, replace the old pages with the new ones (change the server, change the directory from which the pages are served, or upload the new version to the server to replace the old one). In this way your service keeps running all the time and the users are happy. There is no reason to close a web site for a long time because its pages are being updated. With just a little thought, the move from the old pages to the new ones can be done so smoothly that the service interruption lasts from a few seconds to at most a few hours.
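    One common way to "change the directory from which the pages are served" in a single step is sketched here in Python under stated assumptions: the web server's document root is a symbolic link (hypothetically named current) pointing to one version directory, and the system is POSIX-like, where renaming a file over another is atomic. The new version is built in its own directory while the old one stays online; publish() then swaps the link. The directory names and the demo() helper are illustrative only.

```python
import os
import tempfile


def publish(site_root: str, version_dir: str, link_name: str = "current") -> None:
    """Make `site_root/link_name` point at `version_dir` in one atomic step.

    Assumes the web server's document root is that symbolic link (a
    hypothetical setup) and a POSIX system, where rename() is atomic.
    """
    link_path = os.path.join(site_root, link_name)
    tmp_link = link_path + ".new"
    # Build the new link under a temporary name first...
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(version_dir, tmp_link)
    # ...then rename it over the old link: visitors see either the old
    # site or the new one, never a half-updated mixture.
    os.replace(tmp_link, link_path)


def demo() -> str:
    """Put v1 online, prepare v2 beside it, switch, and read the served page."""
    with tempfile.TemporaryDirectory() as root:
        for version in ("v1", "v2"):
            os.makedirs(os.path.join(root, version))
            with open(os.path.join(root, version, "index.html"), "w") as f:
                f.write(f"<p>site {version}</p>")
        publish(root, os.path.join(root, "v1"))  # old site is being served
        publish(root, os.path.join(root, "v2"))  # switch when v2 is ready
        with open(os.path.join(root, "current", "index.html")) as f:
            return f.read()
```

    Keeping the previous version directory around also makes rollback trivial: call publish() again with the old directory.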

    Here are some general rules on web publishing:

    • 1. The pages you publish must serve a purpose. There's no point in having a page if it has nothing useful to offer.
    • 2. Be economical with graphics. Many users have limited bandwidth and don't want to wait too long for pages to load.
    • 3. Always use the alt attribute with images so that people who can't or don't want to see the graphics can still use your page.
    • 4. Use the GIF and JPEG file formats for what each is good at: GIF for graphics with few colors such as logos, icons and line art, JPEG for photographs.
    • 5. Keep in mind that readership is international. Always write the date in words because numerical formats are interpreted differently in different countries. Be aware of words which belong to a local dialect.
    • 6. Keep your page browser-independent. The last thing you want to do is turn away potential readers merely because they have the "wrong" software.
    • 7. Support graceful degradation. Graceful degradation, as a means of providing backward compatibility, is built into the W3C's recommendations; do not try to break it. If you make heavy use of technologies like JavaScript menus, Java menus or DHTML, try to make the pages usable (possibly with reduced functionality or usability) without those techniques as well. Some browsers do not support those scripting techniques, and many users disable scripting support in their web browsers for various reasons (security threats, slow operation on older computers).
    • 8. Write correct HTML. It works better with any browser than HTML with errors.
    • 9. It is a good idea to use a footer which gives the document URL, and an email contact address. This practice leaves the reader with no doubt as to the authorship of the document.
    • 10. Publish the web site only when it has content worth showing. If you do not have much ready now, make a small working site and enhance it when you get more material.
    • 11. Do not use too small fonts. Very small fonts range from hard to impossible to read, depending on the browser and font set used.
    • 12. Avoid very long URLs. If you plan to publish your document or site address in a paper magazine or similar, keep the URLs very short so that people can remember them and are willing to type them into the browser. People also send URLs through electronic media (e-mail, Usenet news), where very long URLs, or URLs with many special characters, easily get broken on the way from sender to receiver. With those users in mind, it is a good idea to keep URLs shorter than about 70 characters so that e-mail software does not split them across two lines.
    • 13. Test your site with many different browsers so that you know that it works.
    • 14. Keep the site up to date. Replace outdated information with new information. Putting a publication date on each page is a good idea.
    • 15. Consistency is one of the most powerful usability principles: when things always behave the same, users don't have to worry about what will happen. Instead, they know what will happen based on earlier experience.
    • 16. Cool URLs do not change. If people tend to link to your site, it is a good idea to keep old URLs working even if you move the content. In many cases, sensible server redirects can lead the user to the new place easily and prevent the user from seeing a "page not found" error. In most cases defining the redirects on the server end is not hard (it is worth learning how to do it).
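    Some of the rules above can be checked mechanically. As an illustrative Python sketch (the class and function names are hypothetical), this scanner built on the standard html.parser module reports <img> tags that have no alt attribute at all (rule 3) and link URLs longer than the roughly 70-character limit suggested in rule 12. An empty alt="" is accepted on purpose, since it is the usual way to mark a purely decorative image.

```python
from html.parser import HTMLParser

MAX_URL_LENGTH = 70  # the rough limit suggested in rule 12


class RuleChecker(HTMLParser):
    """Report <img> tags without alt (rule 3) and over-long link URLs (rule 12)."""

    def __init__(self):
        super().__init__()
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            # alt="" is allowed: it marks a purely decorative image.
            self.problems.append(f"img without alt: {attrs.get('src')}")
        href = attrs.get("href") or ""
        if tag == "a" and len(href) > MAX_URL_LENGTH:
            self.problems.append(f"long URL ({len(href)} chars)")


def check_page(html_text: str) -> list[str]:
    checker = RuleChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```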
    Web site performance review checklist:
    • Dead link free Web site
    • Compatible with all major browsers and platforms (e.g. Netscape, Internet Explorer, browsers on the Macintosh)
    • HTML error free Web site
    • Proper usage of Meta Tags throughout HTML and content of Web site
    • Theme of site is accurately recognized using top keywords
    • Spelling error free throughout site
    • Fast load time on various connection speeds (e.g. 56K modem, ISDN, T1)
    • Search Engine acceptance ready
    • Easy to navigate and fun to use
    • Nice look and feel
    • Check that it is usable with many resolutions (Need for horizontal scrolling should be avoided if possible. It's awkward to do and users hate it.)
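    For the first checklist item, a dead-link check can use the HTTP HEAD method described in the protocol section below, which asks only for the headers and so avoids downloading each document. The Python sketch below is a minimal, assumed approach: it treats any 2xx or 3xx status as alive, checks paths on a single host, and spins up a hypothetical local test site so it runs offline. Some servers answer HEAD incorrectly, so a more careful checker would retry with GET before reporting a link as dead.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def link_status(host: str, port: int, path: str) -> int:
    """HEAD request: ask for the headers only, so no document body is sent."""
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def dead_links(host: str, port: int, paths: list[str]) -> list[str]:
    """Paths that do not answer with a 2xx (OK) or 3xx (redirect) status."""
    return [p for p in paths if not 200 <= link_status(host, port, p) < 400]


class _TestSite(BaseHTTPRequestHandler):
    PAGES = {"/", "/index.html"}  # hypothetical pages that exist

    def do_HEAD(self):
        if self.path in self.PAGES:
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
        else:
            self.send_error(404)  # for HEAD, only the headers are sent

    def log_message(self, *args):  # keep the demo quiet
        pass


def demo(paths: list[str]) -> list[str]:
    server = HTTPServer(("127.0.0.1", 0), _TestSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return dead_links("127.0.0.1", server.server_address[1], paths)
    finally:
        server.shutdown()
        server.server_close()
```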

    Web protocol information

    A protocol is the predefined way in which someone who wants to use a service talks with that service. Every web server on the Internet conforms to the HTTP protocol, which is the protocol used for communication between web browsers and web servers. The most basic form of the protocol understood by an HTTP server involves just one command: GET. If you tell the server "GET filename", the server responds by sending you the contents of the named file and then disconnecting. In the original HTTP protocol, all you sent was the actual file name, such as "/" or "/web-server.htm". The protocol was later extended so that the client also tells the server which host it wants (in HTTP/1.1 with the Host header, or by sending the complete URL). This allows companies that host virtual domains, where many domains live on a single machine, to use one IP address for all of the domains they host.

    Originally an HTTP client opened one TCP connection, requested one file or object through it and closed the connection after the data was received. HTTP was later extended to support transferring many files through one connection and uploading data to the web server (used, for example, with forms). The two most commonly used methods for requesting data from a web server are GET and POST. HTTP GET is designed so that all information necessary for the interaction is part of the URI, which promotes URI addressability. With HTTP POST, some of the information intended to effect a change to the resource state may be sent in the body of the request rather than in the URI. Whether and how one chooses between GET and POST depends on the format specification and the application context. When HTTP URIs are used for hyperlinks in HTML, SMIL and SVG, for example, the application determines which method is used (generally GET). However, for both HTML forms and XForms, the author can choose between GET and POST. Here is an overview of the most commonly used methods:

    • GET method: Information from a form using the GET method is appended to the end of the action URI being requested (after a "?"). Your CGI program receives the encoded form input in the environment variable QUERY_STRING. The GET method is used to ask for a specific document; when you click on a hyperlink, GET is being used. GET should be used when accessing the URL does not change any server-side state, such as the contents of a database.
    • HEAD method: The HEAD method is used to ask only for information about a document, not for the document itself. HEAD is much faster than GET, as a much smaller amount of data is transferred. It's often used by clients who use caching, to see if the document has changed since it was last accessed. If it was not, then the local copy can be reused, otherwise the updated version must be retrieved with a GET.
    • POST method: This method sends the form input in the body of the request, after the request headers. Your CGI program receives the encoded form input on stdin.
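    To make the GET/POST difference concrete, here is a minimal Python sketch of what a CGI program has to do to read form input: take it from QUERY_STRING for GET, or read CONTENT_LENGTH bytes from standard input for POST, then decode the application/x-www-form-urlencoded data. It assumes that default form encoding (file uploads use multipart/form-data, which is not handled), and the read_form() and example() names are hypothetical.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def read_form(environ=None, stdin=None) -> dict[str, list[str]]:
    """Decode form input the way a CGI program receives it.

    GET:  encoded input is in the QUERY_STRING environment variable.
    POST: encoded input arrives on stdin; CONTENT_LENGTH gives its size.
    Assumes application/x-www-form-urlencoded data (the HTML form default).
    """
    environ = os.environ if environ is None else environ
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        stream = sys.stdin.buffer if stdin is None else stdin
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stream.read(length).decode("ascii")  # urlencoded data is ASCII
    else:
        raw = environ.get("QUERY_STRING", "")
    # parse_qs turns "a=1&b=x+y" into {"a": ["1"], "b": ["x y"]}.
    return parse_qs(raw)


def example():
    """The same form data sent with GET and with POST decodes identically."""
    via_get = read_form({"REQUEST_METHOD": "GET",
                         "QUERY_STRING": "name=Ann+Lee&lang=fi"})
    body = b"name=Ann+Lee&lang=fi"
    via_post = read_form({"REQUEST_METHOD": "POST",
                          "CONTENT_LENGTH": str(len(body))}, io.BytesIO(body))
    return via_get, via_post
```

    Passing the environment and input stream as parameters keeps the function testable outside a web server; under a real CGI server the defaults (os.environ and standard input) are used.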

    There are two versions of the HTTP protocol in use nowadays: the simpler version 1.0 and the more advanced version 1.1. HTTP/1.0 has enough features for web traffic, but its limitation is that a new connection is created for every file transferred and closed when the file has been loaded. A significant difference between HTTP/1.1 and earlier versions of HTTP is that persistent connections are the default behavior of any HTTP/1.1 connection. A persistent connection means that more than one file is transferred through the connection, and the connection is not closed immediately after the files for one page have been transferred but can be kept open so that new requests can be sent quickly to the server (the connection is closed after a timeout if no more data is transferred for some time).
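    The effect of persistent connections can be observed with Python's standard http.client and http.server modules. In this sketch (the handler, its seen_clients bookkeeping and the object paths are hypothetical), a local server speaks HTTP/1.1 and sends a Content-Length with every response, so the client can send three requests over one TCP connection; the server records the client address and port of each request to count how many connections were actually used.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class _KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # persistent connections by default
    seen_clients: set = set()      # (address, port) of each request's connection

    def do_GET(self):
        self.seen_clients.add(self.client_address)
        body = f"object {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length lets the client find the end of this response
        # without the server having to close the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch_over_one_connection(paths: list[str]):
    """Fetch several objects through one HTTP/1.1 connection.

    Returns the bodies and the number of distinct TCP connections the
    server saw."""
    _KeepAliveHandler.seen_clients = set()
    server = HTTPServer(("127.0.0.1", 0), _KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        bodies = []
        for path in paths:
            conn.request("GET", path)  # reuses the open connection
            bodies.append(conn.getresponse().read())
        return bodies, len(_KeepAliveHandler.seen_clients)
    finally:
        conn.close()  # lets the server's loop for this connection finish
        server.shutdown()
        server.server_close()
```

    With protocol_version left at its default of "HTTP/1.0", the same loop would still work, but http.client would have to open a new connection for each request.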

    HTML details

    The Hypertext Markup Language (HTML) is a simple markup language used to create hypertext documents that are platform independent. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of domains. HTML markup can represent hypertext news, mail, documentation and hypermedia; menus of options; database query results; simple structured documents with inlined graphics; and hypertext views of existing bodies of information. HTML has been used by the World Wide Web (WWW) global information initiative since 1990, and the WWW is the most common use for HTML. HTML is one of the most widely used computer languages in the world. Its popularity is due to the fact that it is the coding technology used to publish content on the World Wide Web (also referred to simply as the Web). Programmers quickly discovered that HTML is a user-friendly language and very easy to learn, and this ease of coding significantly aided the proliferation of Web sites. The latest version of HTML is 4.01, defined by the standard published on 24 December 1999 by the World Wide Web Consortium (W3C), the organization behind the development of HTML.

    HTML is not a complete programming language. First, it lacks conditional tests (IF) and flow control (GOTO, DO and FOR) statements. Some implementations may offer extensions to the HTML language to accomplish these functions, but they are not part of the HTML standards. The most common way to add the power of a real programming language to an HTML page is to embed code in a suitable programming language inside the HTML, so that the web server or the browser can run it. The most commonly used client-side solutions for this are JavaScript and Java. On web servers, solutions such as ASP (Active Server Pages), PHP, Java Server Pages and many other techniques are used.

    The newest and only official ISO standard for HTML is ISO/IEC 15445 (known as ISO HTML). The newest HTML specification by the W3C itself is HTML 4.01. Nowadays the W3C recommends using the XML-based HTML specification XHTML 1.0 (there is also a newer specification, XHTML 1.1).

      Browser specific extensions

      Browser-specific extensions allow you to do more than standard HTML allows. The downside of these extensions is that they work only in some specific browsers. Be careful if you use them, and be prepared for your page to be viewed with browsers that do not have those extensions. Generally it is a bad idea to use browser-specific extensions in web documents meant for wide audiences because of potential compatibility problems with other browsers.

    XML

    The Extensible Markup Language (XML) is a subset of SGML. Its goal is to enable generic SGML to be served, received and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML. An XML page looks something like an HTML page, but there the similarity ends. XML uses HTML-style tags not just to format documents but also to identify the kinds of information used in documents, so that the information can be reformatted for use in other documents and can also be used for information processing. To put it another way, you can use XML to represent data portably. XML is clearly the way of the future, as it establishes a common base from which content can be expanded into other formats. If you peek under the hood of high-profile open-source projects such as Mozilla, Apache, Perl and Python, you'll find a little program called "expat" handling the XML parsing.

    XML is not a programming language. XML files don't run or execute; by itself, an XML file doesn't do anything. It is a data format that sits in a disk file until you run a program which reads it and does something with it. While XML may give a very precise description of the text or data, it says nothing about what you should do with it (clever XML authors choose element names that hint at what the elements are used for). Since XML is text-based, a file can be written or fixed with a simple text editor; you don't necessarily need a complex programming tool for that (although such tools exist if you want to take that approach).

    Before two applications which use XML can work together, their users or developers must agree on the basics of what they are doing. They have to agree on which tags are allowed, how elements may be nested within each other, and the XML vocabulary and structure, which are written down in a DTD (Document Type Definition). Applications that validate XML data use a validating parser, which reads the DTD before it reads the XML document so that it can identify the element types and how they relate to one another. It is possible to write XML without a DTD, but using a DTD makes it easier for everyone involved to understand the markup and to write software to process it. An alternative to a DTD is a Schema, which provides means of specifying element contents in terms of data types (including data ranges and checks). Schemas are written as XML files. XSL is a technology related to XML: it provides a mechanism for formatting and transforming XML, either in the browser or on the server. For example, XSL can be used to transform XML data into HTML/CSS documents on the web server.
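    To show that an XML file is only data until a program reads it, the Python sketch below parses a document with xml.parsers.expat, Python's binding to the same expat library mentioned above, and returns an indented outline of its elements. The order/customer/item vocabulary is a made-up example. Note that expat checks that the document is well-formed but does not validate it against a DTD; validation needs a validating parser.

```python
from xml.parsers import expat


def element_outline(xml_text: str) -> list[str]:
    """Return element names indented by nesting depth, parsed with expat.

    expat checks well-formedness only; it does not validate against a DTD.
    Malformed input raises expat.ExpatError.
    """
    outline: list[str] = []
    depth = 0

    def start_element(name, attrs):
        nonlocal depth
        outline.append("  " * depth + name)
        depth += 1

    def end_element(name):
        nonlocal depth
        depth -= 1

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.Parse(xml_text, True)  # True: this is the whole document
    return outline
```

    The program, not the file, decides what the elements mean: here it only lists them, but the same callbacks could just as well fill a database or produce HTML.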

      Applications for XML

      • MathML - Mathematical Markup Language
      • SEML - Semi-Extensible Markup Language is a new language similar to XHTML and WML that allows serving either WML or HTML from a single source document
      • SMIL - Synchronized Multimedia Integration Language
      • W3C Scalable Vector Graphics (SVG) - SVG is a language for describing two-dimensional graphics in XML. SVG allows for three types of graphic objects: vector graphic shapes (e.g., paths consisting of straight lines and curves), images and text.
      • WML - Wireless Markup Language, used in the WAP system
      • XForms 1.0 - XForms is an XML application that represents the next generation of forms for the Web. By splitting traditional XHTML forms into three parts (data model, instance data and user interface), it separates presentation from content, allows reuse and gives strong typing, reducing the number of round-trips to the server, as well as offering device independence and a reduced need for scripting.
      • XHTML - HTML 4.01 reformulated in XML syntax
      • XHTML 1.1 - Module-based XHTML - XHTML 1.1 is a reformulation of XHTML 1.0 Strict using Modularization. The purpose of XHTML 1.1 is to serve as the basis for future extended XHTML 'family members', and to provide a consistent, forward-looking document type cleanly separated from the deprecated, legacy functionality of HTML 4 and XHTML 1.0.
      • XHTML Basic - XHTML Basic is designed to provide a common subset across various Web clients that do not support the full set of XHTML features, and is built from basic XHTML modules, such as Structure, Text, Hypertext, List, Basic Forms, Basic Tables and Image. While the document type is simple, it is rich enough for content authoring. This profile of XHTML has been adopted for use in the WAP2 standard for mobile telephony.

      XHTML information

      XHTML 1.0 is the first step toward a modular and extensible web based on XML (Extensible Markup Language). It is a page description language in XML format that maintains compatibility with today's HTML 4 browsers. XHTML stands for "eXtensible HyperText Markup Language" and is a reformulation of HTML 4 as an XML 1.0 application. It provides the framework for future extensions of HTML and aims to replace HTML in the future. XHTML looks very much like HTML 4, with a few notable exceptions, so if you're familiar with HTML 4, XHTML will be easy to learn and use. XHTML 1.0 Transitional allows everything HTML 4.01 allows, apart from some syntactic restrictions; XHTML 1.0 Strict is more limited, and XHTML 1.1 is a modular version of XHTML 1.0 Strict. The history of XHTML is very simple: it is derived directly from HTML 4.01 and is designed to be used with XML. Indeed, XHTML is part of a whole new suite of "X" technologies, with acronyms such as XML, XPath, XSL and XSLT, that are destined to have a profound effect on the Internet. XHTML is a new technology: on 26 January 2000 the W3C issued the recommendation for XHTML 1.0. It is also evolving rapidly; the recommendation for version 1.1, a module-based concept for XHTML, has already been published.
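      The most visible of those "notable exceptions" is that XHTML must be well-formed XML: every element closed, start and end tags matching exactly (including case), and attribute values quoted and never minimized. A quick Python sketch of such a check, assuming it is enough to try parsing the fragment with the standard xml.etree.ElementTree module. It checks well-formedness only, not validity against the XHTML DTD, and named entities such as &nbsp; would be rejected because no DTD is loaded.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """True if `markup` parses as XML, which every XHTML document must.

    Checks well-formedness only, not validity against an XHTML DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True
```

      In HTML 4, <br> without a closing slash and a minimized attribute such as checked are perfectly valid; in XHTML both are errors and must be written as <br /> and checked="checked".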

      SMIL information

      SMIL is an XML-based language for making multimedia presentations. It enables simple authoring of TV-like multimedia presentations, such as training courses, on the Web. SMIL is an easy-to-learn HTML-like language, so SMIL presentations can be written with a simple text editor. A SMIL presentation can be composed of streaming audio, streaming video, images, text or any other media type. W3C's Synchronized Multimedia Activity focuses on the design of a language for scheduling multimedia presentations in which audio, video, text and graphics are combined in real time. The language, the Synchronized Multimedia Integration Language (SMIL), is written as an XML application and is currently a W3C Recommendation. Simply put, it enables authors to specify what should be presented when.
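      A small example makes "what should be presented when" concrete. The SMIL fragment in the sketch below (file names hypothetical, written in SMIL 1.0 style without a namespace) shows a title image, then a video and its narration played in parallel, then a summary text. The Python function computes the presentation length under a deliberately simplified timing model: <seq> children play one after another, <par> children play together and the group ends when the longest child ends, and every media element carries an explicit dur in seconds. Real SMIL timing has many more rules (begin and end offsets, endsync, repeats).

```python
import xml.etree.ElementTree as ET

# Hypothetical presentation: title image, then video and narration together,
# then a summary text.
SMIL_DOC = """<smil>
  <body>
    <seq>
      <img src="title.gif" dur="5s"/>
      <par>
        <video src="lesson1.mpg" dur="60s"/>
        <audio src="narration.wav" dur="55s"/>
      </par>
      <text src="summary.txt" dur="10s"/>
    </seq>
  </body>
</smil>"""


def duration(elem: ET.Element) -> float:
    """Presentation length in seconds under a simplified SMIL timing model."""
    if elem.tag in ("seq", "body"):
        # Children play one after another (<body> acts like a <seq>).
        return sum(duration(child) for child in elem)
    if elem.tag == "par":
        # Children play together; the group ends when the longest one ends.
        return max((duration(child) for child in elem), default=0.0)
    # Media element: assume an explicit dur="<seconds>s" attribute.
    return float(elem.get("dur", "0s").rstrip("s"))
```

      For this fragment the result is 5 + max(60, 55) + 10 = 75 seconds.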

      RSS

      Rich Site Summary (RSS) is a lightweight XML format designed for sharing headlines and other Web content. Depending on the version, RSS is short for RDF Site Summary, Rich Site Summary or Really Simple Syndication. A Web site that wants to allow other sites to publish some of its content creates an RSS document and registers it with an RSS publisher; a user who can read RSS-distributed content can then use the content on a different site. RSS was first developed by Netscape in the late 1990s to let newsreaders and aggregators scrape links and article summaries for syndication. Netscape wanted an XML format (RSS 0.90) that would make it easy to get news stories and information from other sites and have them automatically added to its own site. It then came out with RSS 0.91 and dropped it when it decided to get out of the portal business. UserLand Software picked up RSS 0.91 and continued to develop it. At the same time a non-commercial group picked up RSS and developed RSS 1.0 based on their interpretation of the original principles of RSS; they based RSS 1.0 on RDF and renamed it RDF Site Summary. UserLand was not happy with RSS 1.0 and continued developing its own version of RSS (Really Simple Syndication), eventually releasing RSS 2.0. RSS 1.0 is a bit more verbose than 0.9x, mostly because it needs to stay compatible with other versions of RSS while containing the markup that RDF processors need. RSS, which is built into most blog publishing tools, is used primarily to syndicate news content. It has evolved into a popular means of sharing content between sites (including the BBC, CNET, CNN, Disney, Forbes, Motley Fool, Wired, Red Herring, Salon, Slashdot, ZDNet and more), and its use is expanding to other purposes as well. Syndicated content already includes news feeds, event listings, news stories, headlines, project updates, excerpts from discussion forums and even corporate information.
      Some people think that RSS could replace e-mail newsletters. In many ways RSS is similar to the subscription newsletters that many sites offer to keep viewers up to date; the big difference is that readers don't have to supply an e-mail address. The Rich Site Summary (RSS) format, previously known as RDF Site Summary, has quietly become the dominant format for distributing news headlines on the Web. RSS defines an XML grammar (a set of HTML-like tags) for sharing news. Each RSS file contains both static information about your site and dynamic information about your new stories, all surrounded by matching start and end tags. Each story is defined by an <item> element, which contains a headline (<title>), a URL (<link>) and a <description>. Later versions of RSS added popular fields such as news category and time stamps (<category> and <pubDate> in RSS 2.0). Early versions (RSS 0.91) allowed at most 15 items per channel, and feeds are easily parsed using Perl or other open-source software. RSS, really a mini database containing headlines and descriptions of what's new on your site, is a natural base for layering additional services. RSS encourages multiple in-context points of entry to one primary article, rather than multiple copies of the same article (which introduce their own maintenance problems). Why should you make an RSS feed available? Your viewers will thank you, and there will be more of them, because RSS allows them to follow your site without going out of their way to visit it. While this seems bad at first glance, it actually improves your site's visibility: by making it easier for your users to keep up with your site and letting them see it the way they want, it becomes more likely that they'll know when something that interests them is available on your site. Without a feed, your viewers have to remember to come to your site to see if there is anything new, if they have time.
If you provide a feed, they can point their aggregator or other software at it, and it will give them a link and a description of new developments on your site soon after you publish them (as soon as the aggregator next checks the feed). With an RSS feed you are constantly in front of them, which improves the chances that they'll click through to an article that catches their eye. Syndicating web content via RSS is unlikely to make you rich. However, it can be an easy way to draw attention to your material, bringing you some traffic and perhaps a little net fame, depending on how good your information is. When you supply an RSS feed, you control what information is syndicated; normally only the links and metadata are distributed. If you like, you can also protect the feed itself with SSL encryption and HTTP username/password authentication.

    Character sets

    At the time of writing, the HTTP/HTML standards (HTTP/1.0, HTML 2.0 and 3.2) define only one representation of an 8-bit character code on the net: the ISO-8859-1 code. No other encodings are required by these standards. If your system uses another character code, the native storage code must be mapped (translated) into the ISO-8859-1 code that is mandated for network transmission. Usually this mapping is done when the document is converted to HTML format. The entity names for the accented letters (such as &eacute; for é) have been clearly defined and used in HTML from the early days, and need no special treatment here. The same goes for the low-half characters (< > & and ") that have to be "entified" (written as &lt; &gt; &amp; and &quot;) because they play a role in the syntax of HTML. W3C has successfully stressed the role of Unicode as the basis for identifying characters in documents, and work is continuing on providing markup and style components for international needs.
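As a minimal sketch, assuming the page is to be transmitted as ISO-8859-1, Python's html.escape can "entify" the HTML syntax characters, and encoding with errors="xmlcharrefreplace" turns any character that ISO-8859-1 lacks, such as the euro sign, into a numeric character reference instead of failing. The function name to_latin1_html is hypothetical.

```python
import html


def to_latin1_html(text):
    """Escape HTML syntax characters, then encode the text as ISO-8859-1 bytes.

    Characters outside ISO-8859-1 (for example the euro sign) are written as
    numeric character references such as &#8364; instead of raising an error.
    """
    escaped = html.escape(text, quote=True)   # & < > " and ' become references
    return escaped.encode("iso-8859-1", errors="xmlcharrefreplace")


print(to_latin1_html('Café <b> & "quotes"'))
print(to_latin1_html("Price: 10 €"))
```

Accented Latin-1 letters such as é are simply sent as single bytes here; writing them as named entities (&eacute;) would be an equally valid choice.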

    • A tutorial on character code issues - concepts of character repertoire, character code, and character encoding especially in the Internet context
    • Character Entity Set(s)
    • Character code coverage - browser report
    • ISO-8859 briefing and resources - This document started out as a brief introduction to the ISO-8859-1 character code, with pointers to a number of sources of additional information about iso-8859-1 specifically and about iso-8859 codes in general.
    • ISO 8859-1 Table - gzipped postscript file
    • Special Characters in HTML - iso8859-1 table
    • The euro sign in HTML and in some other contexts - The euro currency unit has an official symbol, the euro sign. In principle, there is nothing particularly specific about it as a character in data processing, except that it was introduced relatively recently. Consequently, the problems of presenting it on Web pages written in HTML include the general problems of presenting special characters, but in addition to them, there can be special problems since old fonts often lack this character due to its recent introduction. This document tries to summarize the problems and solutions of presenting characters in HTML as applied to this important special case.
    • Unicode and Multilingual Support in HTML, Fonts, Web Browsers and Other Applications - Have you ever tried to include a passage in a different alphabet in one of your documents, for example a quotation in Russian in an English document, only to find that you have no Cyrillic characters available? Or produced a Web page that includes technical symbols and found that it works with Windows but not with Mac OS or Unix? Problems like these arise with non-Latin alphabets and Symbol fonts because until recently most computers used fonts that contain a maximum of 256 characters. The solution is to leave behind the assortment of 8-bit fonts with their limit of 256 characters, where the same character number can represent a different character in different alphabets, and move to a system that assigns a unique number to each character in each of the major languages of the world.

    Forms

    Fill-out forms are used for user actions such as registration, ordering, or queries. HTML supports a fairly rich set of form features: forms can contain a wide range of HTML markup, including several kinds of form fields such as single- and multi-line text fields, radio button groups, checkboxes, and menus. Forms are usually processed by CGI scripts. An HTML user agent (web browser) begins processing a form by presenting the document with the fields in their initial state (as the server sent it). The user can modify the fields, within the constraints of each field type. When the user indicates that the form should be submitted (using a submit button), the form data set is processed according to the form's method, action URI and enctype. NOTE: When there is only one single-line text input field in a form, the user agent should accept Enter in that field as a request to submit the form. Most forms you create will send their data using the POST method. POST keeps the data out of sight better than GET, since the data isn't sent as part of the URL (so it does not end up in bookmarks, browser history or server logs), and you can send more data with POST; note, however, that POST data is still sent unencrypted unless the page uses HTTPS. The advantage of GET in some applications is that you can bookmark the generated result page, for example to create a link to a search engine query for a given keyword. The general recommendation is that if processing a form is idempotent (i.e. it has no lasting observable effect on the state of the world), the form method should be GET. Many database searches have no visible side effects and make ideal applications of query forms. If the service associated with processing the form has side effects (for example, modifying a database or subscribing to a service), the method should be POST. When the user sends form data to your CGI, the browser encodes the data being sent. Alphanumeric characters are sent as themselves; spaces are converted to plus signs (+); other characters, like tabs and quotes, are converted to "%HH", a percent sign and two hexadecimal digits representing the ASCII code of the character. This is called URL encoding (MIME type application/x-www-form-urlencoded). In order to do anything useful with the data, your CGI must decode it.
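The URL encoding described above can be tried with Python's urllib.parse module: urlencode produces the application/x-www-form-urlencoded string a browser would send, and parse_qs decodes it the way a CGI script must. The field names, values and the search URL are made-up examples.

```python
from urllib.parse import urlencode, parse_qs

# Form fields as the user filled them in (hypothetical field names).
fields = {"name": "John Smith", "comment": 'Say "hi"\tnow', "topic": "a&b"}

# Browser side: application/x-www-form-urlencoded encoding.
# Spaces become "+", other special characters become %HH.
encoded = urlencode(fields)
print(encoded)

# CGI side: decode the string back into field values.
# parse_qs returns a list per name because a field may occur more than once.
decoded = parse_qs(encoded)
print(decoded)

# For an idempotent query, the same encoding gives a bookmarkable GET URL.
search_url = "https://www.example.com/search?" + urlencode({"q": "web caching"})
print(search_url)
```

Non-ASCII characters are percent-encoded as well; Python's urlencode encodes them as UTF-8 bytes by default.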

      From form to mail

      Sending the contents of a form to a specified e-mail address is a very convenient way to collect, for example, feedback information.

      • FormMail - FormMail is a generic WWW form to e-mail gateway, which will parse the results of any form and send them to the specified user. This script has many formatting and operational options, most of which can be specified through the form, meaning you don't need any programming knowledge or multiple scripts for multiple forms.
      • Mail-in web forms with "yamform" - Yamform, which stands for "Yet Another Mail Form", is a forms-handling program for use with World Wide Web forms. The difference between yamform and other common mail-based forms-handling programs is that yamform allows the designer of the form to control the format of the resulting e-mailed report -- not just the input format but the output format as well.
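The core of a form-to-mail gateway like the ones listed above can be sketched in a few lines of Python: decode the submitted form data and format it as an e-mail message. This is not the actual code of FormMail or yamform; the function name, subject and addresses are hypothetical, and the message is only built here, not sent. The recipient is fixed by the script rather than read from the form, since a gateway that mails to any address given in the form can be abused to send spam.

```python
from email.message import EmailMessage
from urllib.parse import parse_qsl


def form_to_mail(form_body, recipient, sender="webform@example.com"):
    """Turn a URL-encoded form submission into an e-mail message (not sent here)."""
    msg = EmailMessage()
    msg["To"] = recipient              # fixed by the script, never taken from the form
    msg["From"] = sender
    msg["Subject"] = "Web form submission"
    # One "field: value" line per submitted field, in submission order.
    lines = [f"{name}: {value}" for name, value in parse_qsl(form_body)]
    msg.set_content("\n".join(lines) + "\n")
    return msg


msg = form_to_mail("name=John+Smith&feedback=Nice+site%21", "webmaster@example.com")
print(msg)

# To actually send it, one could hand it to an SMTP server, e.g.:
#   import smtplib
#   with smtplib.SMTP("localhost") as smtp:   # assumes a local mail server
#       smtp.send_message(msg)
```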

    Frames

    Frames are a way to divide the browser window to allow easier navigation in some circumstances. They are frequently used to add a side menu bar to a web site, where constant back-and-forth clicking would become tedious with single pages. The frameset tag is used to declare multiple frames. Frames allow you to divide the page into several rectangular areas and to display a separate document in each rectangle. Each of those rectangles is called a "frame". Frames are very popular because they are one of the few ways to keep part of the page stationary while other parts change. Frames are also one of the most controversial uses of HTML, both because of the way the frames concept was designed and because many framed web sites are poorly implemented. Normal frames divide the entire browser window (or a frame) into subwindows. Inline frames (the iframe element) appear inside the presentation of a document and allow embedding relatively small documents in pages.

    Web advertising

    Online advertising is used by many web sites to raise money to run the site, or just to earn some extra money from a hobby site. Banners, buttons, interstitials and keywords are all examples of online advertisements. A digital advertisement can be text, a static graphic, an animated graphic, video, audio or something else. A banner is a commonly used interactive online advertisement in the form of a graphic image that typically runs across the top or bottom of a web page, or is positioned in a margin or other space reserved for ads. Different ads and different ad sources are often rotated in the same space on a web page. This is usually done automatically by software on the web site or in a separate ad server. Among agencies, web publishers and the companies that technically distribute and manage web advertisements, there is little agreement over whether to charge per impression, per click, per customer registration or per sale. Some even disagree on what such definitions mean. Caching has some effect on online advertising. Web caches (both caches in the network and in web browsers) store pages, images and other items on a local server or the user's computer to speed up the loading of web pages. Ads, like other images, are cached unless some sort of cache-busting technique is used. When ads are cached, they will still be shown but will not be counted by the ad server. Cache busting is the process of preventing certain files from being cached, to guarantee a fresh delivery from the external server on each page view (there are many techniques for this). Cache busting is necessary for online advertising to work as intended, since every view must be counted, but it can slow down the loading of web pages considerably.
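One common cache-busting technique is to make each ad request URL unique, for example by appending a random value to the query string, so that neither browser nor network caches can answer the request from a stored copy. The Python sketch below assumes the ad server simply ignores the extra parameter; the parameter name cb, the function name and the example ad URL are hypothetical. Setting Cache-Control response headers on the ad images is another approach.

```python
import secrets
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def cache_bust(ad_url, param="cb"):
    """Append a random query parameter so caches see a new URL on every page view.

    The parameter name "cb" is a hypothetical choice; the ad server is assumed
    to ignore it, while caches treat each resulting URL as a different object.
    """
    parts = urlsplit(ad_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, secrets.token_hex(8)))   # 16 random hex digits
    return urlunsplit(parts._replace(query=urlencode(query)))


print(cache_bust("https://ads.example.com/banner.gif?campaign=spring"))
```

The price is exactly the one mentioned above: every page view now downloads the ad again instead of reusing a cached copy.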

    Cookies

    Cookies are a general mechanism which server side connections (such as CGI scripts) can use to both store and retrieve information on the client side of the connection. The addition of a simple, persistent, client-side state extends the capabilities of Web-based client/server applications. A server, when returning an HTTP object to a client, may also send a piece of state information which the client will store. Included in that state object is a description of the range of URLs for which that state is valid. Any future HTTP requests made by the client which fall in that range will include a transmittal of the current value of the state object from the client back to the server. The state object is called a cookie, for no compelling reason.
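In HTTP, the server sends a cookie in a Set-Cookie response header and the browser returns it in a Cookie request header; the Path (and Domain) attributes describe the range of URLs for which the cookie is valid. A short sketch with Python's http.cookies module follows; the cookie name, value, path and lifetime are made-up examples.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header to send along with the response.
outgoing = SimpleCookie()
outgoing["session"] = "abc123"          # hypothetical cookie name and value
outgoing["session"]["path"] = "/shop"   # URL range the cookie is valid for
outgoing["session"]["max-age"] = 3600   # keep it for one hour
print(outgoing.output())

# Later request: for URLs under /shop the browser sends "Cookie: session=abc123",
# which the server-side script parses back into a value.
incoming = SimpleCookie()
incoming.load("session=abc123")
print(incoming["session"].value)
```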

    Multilingual web publishing

    The main language of the Web is English, but many pages are also published in other languages, and some web sites support several languages. Multilingual authoring on the Web is still very limited. English dominates overall, and as a result most people simply write their Web pages in English. The difficulty of authoring in a language other than English varies greatly; after all, there are thousands of languages, with different writing systems and different people. For Western European languages, the difficulties are relatively small. For languages like Japanese or Chinese, the character encoding issues are certainly much bigger, but generally solvable with suitable guidance. There are many things to consider, such as how different writing systems and encodings are handled on the WWW. Quite often, a page in an "exotic" language works in a particular cultural environment where that language is widely spoken but fails in the wider WWW context. This is mostly not a knowledge gap among authors; rather, the problems lie in servers, browsers and authoring software. The technology is not completely ready yet, and today the goal of making characters show up in absolutely everyone's browser is fundamentally unrealistic. The Web is moving towards that goal, but there is still work to do. Character problems are of course important in practice for many languages. One thing that will help the situation is character coding systems like Unicode, which allow much larger character sets than the ordinary ASCII character set. Some very small languages have particular problems with character codes: they may contain characters that are not even included in Unicode. What needs to be done on the authoring side is the integration of existing technologies into mainstream software; real multilingualism on the Web requires adequate tools. The problem is not so much producing pages as maintaining them. You can always pay someone to translate your pages. Then what? Tomorrow you need to change a factual statement somewhere. How do you make sure it gets changed correctly in all versions? Dealing with just two languages can be really hard, even if you have people who know the languages.
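To illustrate why Unicode helps, this Python sketch checks whether a page mixing English, Russian and Japanese text can be represented in ASCII, in ISO-8859-1 and in UTF-8 (a common encoding of Unicode). The helper can_encode and the sample text are hypothetical examples.

```python
def can_encode(text, encoding):
    """Return True if every character of text can be represented in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# English, Russian and Japanese greetings mixed on one page.
text = "Hello, Привет, こんにちは"

for encoding in ("ascii", "iso-8859-1", "utf-8"):
    print(encoding, can_encode(text, encoding))

# UTF-8 round-trips the mixed-language text without loss.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text
```

Only UTF-8 can represent the whole text. Whatever encoding is used must also be declared, for example with the HTTP header Content-Type: text/html; charset=utf-8, so that browsers decode the page correctly.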

    Web caching information

    A Web cache sits between Web servers (origin servers) and a client and watches requests for HTML pages, images and files go by, saving a copy of each for itself. Then, if there is another request for the same object, it uses its own copy instead of asking the origin server for it again. This reduces both latency and network traffic.
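The basic idea of a cache can be sketched in Python as a dictionary keyed by URL. This is a simplified illustration, not how a real proxy cache works: the fixed time-to-live ttl is an assumption (real caches decide freshness from HTTP headers such as Cache-Control and Expires), and WebCache and fake_origin are hypothetical names.

```python
import time


class WebCache:
    """Tiny in-memory web cache keyed by URL (a sketch, not a real proxy).

    fetch_from_origin is any function url -> bytes; ttl is how long, in
    seconds, a stored copy is considered fresh (an assumed fixed value).
    """

    def __init__(self, fetch_from_origin, ttl=60.0, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl
        self._clock = clock
        self._store = {}             # url -> (time stored, body)

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]           # cache hit: origin server not contacted
        body = self._fetch(url)       # miss or stale copy: ask the origin server
        self._store[url] = (now, body)
        return body


# Usage with a fake origin server that records how often it is asked.
origin_requests = []


def fake_origin(url):
    origin_requests.append(url)
    return b"<html>...</html>"


cache = WebCache(fake_origin, ttl=60.0)
cache.get("http://www.example.com/")
cache.get("http://www.example.com/")
print(len(origin_requests))   # prints 1: the second request came from the cache
```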

    Security issues

    Web security is a complex topic, encompassing computer system security, network security, authentication services, message validation, personal privacy issues and cryptography. The Web is a complicated networked environment in which lots of computers, technology components and users interwork, and this creates many potential security risks. The risks relate to (but are not limited to) information security, user authentication, web server security and the security of the computer used for web browsing. One important feature for web sites is the ability to restrict access to part or all of the site. This is often used on subscription sites, such as online webzines or other members-only services. It's also used on the administrative parts of web sites. There are two kinds of user authentication methods in use. One is HTTP authentication, which is done by the web server itself; with HTTP authentication, you can password-protect a directory and the files within it. The other kind of authentication is done by forms and CGIs, and often uses cookies to track user sessions. When you use HTTPS to connect to a web server, the client (browser) and the server negotiate a common protocol to secure the channel. The secure channel protocols in typical use at the time of writing were PCT 1.0, SSL 3.0 and SSL 2.0; these have since been superseded by TLS, and SSL 2.0 and 3.0 are now considered insecure. When the server and client have several supported protocols in common, the web server chooses one of them to secure the channel (the order of preference depends on the web server configuration). The SSL protocol uses security keys to protect the communication: it makes sure that the communicating parties are who they say they are, encrypts the data, and so on. Right now, the only widely used secure key distribution mechanism on the Internet is the SSL key mechanism, whereby a group of companies (around 5) whose keys got into the original Netscape release (and other web browsers) essentially rule the roost, because Joe Average has no idea how to install a new root key in his browser. There are also security issues related to the web server itself. Most servers are launched as root so that they can open the low-numbered port 80 (the standard HTTP port) and write to the log files. They then wait for an incoming connection on port 80. As soon as they receive a connection, they fork a child process to handle the request and go back to listening. The child process, meanwhile, changes its effective user ID to the user "nobody" and then proceeds to process the remote request. All actions taken in response to the request, such as executing CGI scripts or parsing server-side includes, are done as the unprivileged "nobody" user. There are always software bugs and other problems that cause security risks; the idea is to minimize those risks. Even though you can't make your server completely safe, you can increase its security significantly in a Unix environment by running it in a chroot environment. The chroot system command places the server in a "silver bubble" so that it can't see any part of the file system beyond a directory tree that you have set aside for it. The directory you designate becomes the server's new root "/" directory. Anything above this directory is inaccessible.
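Two of the mechanisms above can be sketched in Python. basic_auth_header shows what HTTP (Basic) authentication sends: the user name and password joined by a colon and Base64-encoded. That is encoding, not encryption, so it should only be relied on over HTTPS. drop_privileges shows the chroot and "nobody" steps in the order a server would take them; it must run as root on Unix and is deliberately not called here. All function names are hypothetical, and a real web server does these things in its own configuration and code.

```python
import base64
import hmac
import os


def basic_auth_header(user, password):
    """Build the Authorization header value a browser sends for HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def check_basic_auth(header, user, password):
    """Server-side check; hmac.compare_digest avoids leaking timing information."""
    expected = basic_auth_header(user, password)
    return hmac.compare_digest(header.encode("utf-8"), expected.encode("utf-8"))


def drop_privileges(jail_dir, username="nobody"):
    """Confine the process to jail_dir and switch to an unprivileged user.

    Must be called as root, after the listening socket on port 80 is opened.
    Unix only; not called in this example.
    """
    import pwd                     # Unix-only module, imported only when needed
    pw = pwd.getpwnam(username)    # look the user up before /etc leaves view
    os.chroot(jail_dir)            # jail_dir becomes the process's new "/"
    os.chdir("/")                  # do not keep a working directory outside the jail
    os.setgroups([])               # drop root's supplementary groups
    os.setgid(pw.pw_gid)           # change group first: after setuid this is no longer allowed
    os.setuid(pw.pw_uid)           # finally give up root for good


print(basic_auth_header("Aladdin", "open sesame"))
```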


<webmaster@epanorama.net>
