The Proper Formation of Big-8 Newsgroup Names

Created and maintained by Martin X. Moleski, SJ.
First draft: 2006-03-20. Updated: 2008-10-07.

Note well: I did my level best to assimilate all the input I was given about naming conventions.
This document is a product of a personal good-faith effort to figure them out.
Although I think it is accurate, it has no official standing whatsoever.

Table of Contents / Short Form of the Rules

Prologue

1. A group name is made up of name components separated by '.' (period or dot).

2. Rules governing the formation of name components:

2.1 The first component in the name of a Big-8 newsgroup will be the name of one of the eight hierarchies.

2.2 A component should not contain characters other than [a-z0-9+-].

2.3 A component must contain at least one non-digit.

2.4 A component must not contain ANY UPPERCASE LETTERS.

2.5 A component must begin with a letter or digit.

2.6 Reserved words must not be used as components.

2.7 As a general rule, components (and names generally) should be as short as is consistent
with comprehensibility.

3. The group name as a whole, plus at least one tab stop, plus the short description of the group should not exceed 79 or 80 columns.

4. The name should be logical, informative, self-explanatory, consistent with other relevant newsgroup names, and beyond reproach.

Appendix A: A Few Examples of Existing Big-8 Group Names and Their Descriptions

Appendix B: The Longest Names in the Big-8 (2006-03-19)

References: Various Documents Describing Usenet Protocols and Naming Standards

Prologue

This is an informal and unapproved set of observations on how to properly form names for newsgroup proposals. It represents what I have recently learned about Usenet naming conventions. I would be happy to revise the document to keep pace with revisions in Usenet policy when and if those revisions become generally accepted. Folks may not use this document to berate the custodians of the Big-8 because the board did not write this document. I did.

Those who wish to read the authoritative documents may skip straight to the References. Those who know how to read such documents may use them as a stick with which to beat the custodians of the Big-8 because these documents do describe the standards, if any, to which all Usenet groups should conform, if the administrators of a news server want them to. (Some news administrators just ignore the authoritative documents and do as they please. Someone should probably visit them and tell them to canonicalize their system!)

After this document was written, PJ Ross wrote a script to check the validity of the name and newsgroup description line.

Any flaws in this document are my fault. Any correspondence to actual Usenet practice is due to the patient answers others have made to my questions. To all of them, my heartfelt thanks.

1. A group name is made up of name components separated by '.' (period or dot).

Examples:

news.groups has two components.

news.announce.newgroups has three components.

rec.aviation.flying has three components.

rec.autos.sport.nascar.moderated has five components.
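
The component count can be checked mechanically by splitting the name on the dot. This is only a minimal Python sketch; the helper name split_components is hypothetical and does not come from any news software.

```python
def split_components(group_name: str) -> list[str]:
    """Split a newsgroup name into its dot-separated components."""
    return group_name.split(".")


for name in ("news.groups", "news.announce.newgroups",
             "rec.autos.sport.nascar.moderated"):
    parts = split_components(name)
    print(f"{name} has {len(parts)} components: {parts}")
```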

2. Rules governing the formation of name components:

2.1 The first component in the name of a Big-8 newsgroup will be the name of one of the eight hierarchies:

comp
humanities
misc
news
rec
sci
soc
talk

When we talk about "the Big-8 newsgroups," this is what we mean: all of the newsgroups that are found in these eight hierarchies.

2.2 A component should not contain characters other than [a-z0-9+-].

That is to say, the characters allowed are:

-- lowercase letters (a to z)
-- digits (0 to 9)
-- the plus sign: +
-- the minus sign: -

A few quick examples. More below.

comp.lang.c++.moderated
misc.immigration.australia+nz
misc.industry.pulp-and-paper
rec.arts.sf.tv.babylon5.info

Note well: The underscore character ("_") is allowed in components by NNTP but is traditionally not used in the Big-8.
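
The character class in rule 2.2 can be used almost verbatim as a regular expression. The sketch below assumes Python's re module and a hypothetical helper name, has_only_allowed_chars; it checks rule 2.2 only and says nothing about the other rules.

```python
import re

# Rule 2.2 as a pattern: one or more of a-z, 0-9, '+' or '-'.
# The '-' comes last inside the brackets so it is read literally.
ALLOWED = re.compile(r"[a-z0-9+-]+")


def has_only_allowed_chars(component: str) -> bool:
    """True if every character of the component is permitted by rule 2.2."""
    return ALLOWED.fullmatch(component) is not None


for component in ("c++", "australia+nz", "pulp-and-paper", "babylon5",
                  "my_computer"):
    print(component, has_only_allowed_chars(component))
```

The underscore fails this check on purpose, following Big-8 tradition rather than what NNTP itself allows.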

2.3 A component must contain at least one non-digit.

In other words, "rec.autos.makers.ford.500" is invalid because the component "500" is made up only of digits.
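
A small Python sketch of this rule, using a hypothetical helper named has_non_digit. It compares against the ten ASCII digits explicitly, because Python's str.isdigit also accepts digits from other scripts.

```python
ASCII_DIGITS = "0123456789"


def has_non_digit(component: str) -> bool:
    """True if the component contains at least one character that is not 0-9."""
    return any(ch not in ASCII_DIGITS for ch in component)


for component in ("500", "babylon5", "ford"):
    print(component, has_non_digit(component))
```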

2.4 A component must not contain ANY UPPERCASE LETTERS.

This is just a repetition of one part of rule 2.2. But sometimes it takes a little while for the meaning of the rules to sink in. "Knowledge maketh a bloody entrance."

Some servers pay attention to the differences between REC, REc, ReC, Rec, rEC, rEc, reC, and rec. These systems are "case sensitive." To the human reader, all these variations are exactly the same syllable. To a computer, each one of these variants is a unique string of bytes; using base-16 representation of the ASCII values:

	REC = 52 45 43
	Rec = 52 65 63
	rEc = 72 45 63
	rec = 72 65 63

To make sure no ambiguity arises from typos or random capitalization, the rule for newsgroup names is lowercase only. (In some of the earliest operating systems for personal computers, the rule was that all filenames WOULD BE STORED IN UPPERCASE ONLY.)
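
The byte values in the table above can be reproduced with a few lines of Python. This is a sketch only; ascii_hex is a hypothetical helper name, and the space-separated form of bytes.hex needs Python 3.8 or later.

```python
def ascii_hex(text: str) -> str:
    """Return the ASCII bytes of text as space-separated base-16 pairs."""
    return text.encode("ascii").hex(" ").upper()


variants = ("REC", "Rec", "rEc", "rec")
for variant in variants:
    print(variant, "=", ascii_hex(variant))

# Folding to lowercase maps every variant onto the same string.
print({variant.lower() for variant in variants})
```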

2.5 A component must begin with a letter or digit.

Components beginning with the underscore character are reserved for future development in Usenet. So, if all goes well, you won't see a newsgroup name such as "comp._my_computer_is_better_than_yours_".
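
Checking the first character is a one-line test. The sketch below uses a hypothetical helper, starts_properly, and treats "letter" as a lowercase ASCII letter in keeping with rule 2.4.

```python
import string

# Characters allowed in first position: a-z and 0-9.
FIRST_CHARS = set(string.ascii_lowercase + string.digits)


def starts_properly(component: str) -> bool:
    """True if the component is non-empty and begins with a letter or digit."""
    return bool(component) and component[0] in FIRST_CHARS


for component in ("c++", "babylon5", "_my_computer_is_better_than_yours_"):
    print(component, starts_properly(component))
```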

2.6 Reserved words must not be used as components.

These words have a special meaning for some of the servers that are used to host newsgroups: 'all' and 'ctl'. There may be other reserved words as well ("example", "to", "poster", "control", "junk"--cf. USEFOR).
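
A sketch of a reserved-word check in Python. The two sets simply copy the words listed above; whether the second set is really reserved is uncertain, as the text says, so it is kept separate. The function name reserved_components and the sample names comp.all.things and misc.junk are made up for illustration.

```python
# Words named above as having special meaning to some servers.
RESERVED = {"all", "ctl"}
# Words that may also be reserved (cf. USEFOR); treated as suspect here.
POSSIBLY_RESERVED = {"example", "to", "poster", "control", "junk"}


def reserved_components(group_name: str) -> list[str]:
    """Return the components of a name that match a listed reserved word."""
    suspect = RESERVED | POSSIBLY_RESERVED
    return [c for c in group_name.split(".") if c in suspect]


for name in ("news.groups", "comp.all.things", "rec.sport.baseball"):
    print(name, reserved_components(name))
```

Only whole components are compared, so "baseball" is not flagged just because it ends in "all".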

2.7 As a general rule, components (and names generally) should be as short as is consistent
with comprehensibility.

Or one might say that the right length strikes a balance between brevity and intelligibility.

In the early years of Usenet, components could only be 14 characters long because of restrictions on the length of file names. Those limits have been lifted in practice. Wise, kindly, and knowledgeable people recommend keeping the components of names under 20 characters (one proposal) or 30 characters (another proposal).

I see a 497 UTF-8 octet limit on newsgroup names, created by a 497 UTF-8   
octet limit on command parameters in RFC 3977. I don't see much else   
limiting newsgroup names there. Yes, it does mention a prohibition on   
wildcards in the newsgroup name, but not much else. - Mark Kramer

The UKnet and some of the articles in the References section strongly discourage wasted verbiage in the description such as "Discussion of ". The vast majority of newsgroups are dedicated to discussion of the topic to which the group is devoted.

Some samples of the longest components in the Big-8 (2006-03-19):

 14 characters: 
           nlang-know-rep
           administration
           net-management
           paint-shop-pro
           net-happenings
           visual-objects
           laser-printers
  19 characters: 
           extreme-programming
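
Counts like these are easy to reproduce. The Python sketch below uses a hypothetical helper, longest_component; the full name comp.software.extreme-programming is assumed from the discussion of the comp.software.* sub-hierarchy later in this section.

```python
def longest_component(group_name: str) -> tuple[str, int]:
    """Return the longest component of a name and its length in characters."""
    longest = max(group_name.split("."), key=len)
    return longest, len(longest)


for name in ("comp.ai.nlang-know-rep",
             "comp.software.extreme-programming",
             "rec.arts.books.hist-fiction"):
    component, length = longest_component(name)
    print(f"{length:2d}  {component}")
```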

Observations from Jim Riley:

There may be cases where a name was shortened to less than 14. 

For example: 
     rec.arts.books.hist-fiction (12)
     rec.arts.books.historical-fiction (18)
     rec.arts.tv.uk.coronation-st (13)
     rec.arts.tv.uk.coronation-street (17)
     rec.pets.cats.health+behav (12)
     rec.pets.cats.health+behavior (15)
     rec.scouting.guide+girl (10)
     rec.scouting.??? (?)
There are some other instances where two components have been substituted 
for a longer name, as in 
     soc.culture.african.american
     soc.culture.asian.american
     soc.culture.mexican.american
In many cases, multiple components can provide enough context that the
latter components can be shorter (I'm not referring to the above 3 groups)...


In the case of extreme-programming (19), the long name helps clarify the meaning of the
sub-hierarchy comp.software.*, which is not about software per se, but more
about software development, programming, distribution, etc. It is also a
different usage than in news.software.* which is about software used for
Netnews and Usenet.
Here is a curiosity for you. Originally, the entire newsgroup name had to
fit in 14 significant characters. We now have rec.sport.* rather than
rec.sports.* because when a baseball newsgroup was established, it was
realized that net.sport.base(ball) would leave room in the namespace for
net.sport.bask(etball). Never mind that when the basketball group was
eventually created it was called ".hoops".

>Longest names (46 to 50 characters):
>
> misc.forsale.computers.mac-specific.cards.misc
> misc.forsale.computers.mac-specific.cards.video
> misc.forsale.computers.pc-specific.cards.video
> misc.forsale.computers.pc-specific.motherboards
Somewhat a result of incremental development. Originally there were
misc.forsale and misc.forsale.computers. The first was later renamed
misc.forsale.non-computers.
The first split of the mf.computers, produced several groups including
mf.computers.mac and mf.computers.pc-clone. Several other groups failed,
including groups for "printers", "software", "storage", and "comm". This
is one instance where the 2/3 rule had an effect. Overall, the votes were
on the order of 300-odd to 150-odd, with most voters casting
straight-ticket Yes or No votes. But there was enough ticket splitting
such that the Yes percentage ranged from 63.7% to 71.4%.
A later re-organization started out with the intent to add the failed
proposed groups which were primarily for peripherals that were
_not-specific_ to a particular computer. Then the proposal was changed to
incorporate renaming mf.computers.mac and mf.computers.pc-clone to
mf.computers.mac-specific and mf.computers.pc-specific to emphasize that
the other groups were generic. And finally, the split of the mac and PC
groups was proposed (.cards.misc and .cards.video were proposed at the same
time). There were 29 separate items on the ballot, including separate
items to create each of the mf.computers.pc-specific.* and
mf.computers.mac-specific.* groups, as well as removing the existing
mf.computers.mac and mf.computers.pc-clone groups.
> rec.games.trading-cards.marketplace.magic.auctions
> rec.games.trading-cards.marketplace.magic.sales
> rec.games.trading-cards.marketplace.magic.trades
This is somewhat odd, in that the rec.games.trading-cards.* hierarchy had a
.magic, .marketplace, and .misc group when the .marketplace group was split
into .magic.*, and .misc groups. Had the groups evolved at an earlier
time, they probably would have been rec.games.magic, which split off
rec.games.magic.marketplace that was then split into rec.games.magic.auctions, 
rec.games.magic.sales, and rec.games.magic.trades.

3. The group name as a whole, plus at least one tab stop, plus the short description of the group should not exceed 79 or 80 columns.

The purpose of the short description is to add information not found in the group name, not simply repeat it. There is no need to use expressions like "discussion of" to introduce the description, since all newsgroups are, by definition, intended to be for discussion.

For most of its history, the Big-8 allowed the group name, tab(s), and description to use all 80 columns. 79 columns is now recommended if one wishes to guarantee the fewest number of unnecessary wraparounds on 80-column terminals; in UKnet, they have dropped the period at the end of the description to free up an extra character. Some people in alt.config also recommend a combined length of no more than 79 characters as does Russ Allbery in his description of the newsgroups file.

When calculating the length of the newsgroups line, "tabs" traditionally fill up to 8 columns in a line per tab, depending on where the cursor is when the Tab key is struck. In the rulers below, I have inserted "|" after the 24th character because the rule requires inserting as many tabs as needed if a group name is shorter than 24 characters. I use "^" to show where the Tab stops are located. When all newsgroup names were short, all of the descriptions would then line up, beginning in the 25th column.

In the examples below, spaces between the group name and the description have been replaced with a middot (·) to show how the tab rule works. I've also changed the font to fixed-pitch to show how the name, spaces-to-the-tab-stop, and description should fit into 80 or fewer columns:

	         1         2         3         4         5         6         7         8   
	123456789012345678901234|6789012345678901234567890123456789012345678901234567890
	^·······^·······^·······|·······^·······^·······^·······^·······^·······^·······
	comp.ai·^·······^·······Artificial Intelligence.
	comp.ai.jair.announce···Announcements & abstracts of the Journal of AI Research.
	rec.arts.comics.info····Reviews, convention information and other comics news.
	rec.arts.startrek.reviews·······Reviews of Star Trek books, episodes, films, &c.
	rec.arts.comics.marvel.universe·Marvel Comics' shared universe and characters.
	rec.collecting.villages·Collectible houses, cottages, villages, and accessories.
	rec.outdoors.fishing.fly.tying··Issues relating to tying flies for flyfishing.
	sci.op-research·^·······Research, teaching & application of operations research.
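
The tab arithmetic shown in the rulers can be sketched in Python. The function names here are hypothetical, and the code assumes the traditional 8-column tab stops and the column-25 target described above; str.expandtabs does the same expansion a terminal would.

```python
TAB_WIDTH = 8            # traditional tab stops every 8 columns
DESCRIPTION_START = 24   # characters that precede column 25


def tabs_after_name(name: str) -> int:
    """How many tabs to put between a group name and its description."""
    if len(name) >= DESCRIPTION_START:
        return 1  # long names get a single tab and lose the alignment
    remaining = DESCRIPTION_START - len(name)
    return -(-remaining // TAB_WIDTH)  # ceiling division


def newsgroups_line(name: str, description: str) -> str:
    """Join name and description with the number of tabs the rule calls for."""
    return name + "\t" * tabs_after_name(name) + description


def rendered_width(line: str) -> int:
    """Columns the line occupies once tabs are expanded to 8-column stops."""
    return len(line.expandtabs(TAB_WIDTH))


for name, description in (
        ("comp.ai", "Artificial Intelligence."),
        ("sci.op-research",
         "Research, teaching & application of operations research."),
        ("rec.arts.startrek.reviews",
         "Reviews of Star Trek books, episodes, films, &c.")):
    line = newsgroups_line(name, description)
    print(tabs_after_name(name), rendered_width(line), repr(line))
```

A name of 24 or more characters pushes its description to a later tab stop, which is why the longer names in the examples above do not line up at column 25.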

Once upon a time, on a faraway server, the tab rule produced a very neat list of names and descriptions, according to Jim Riley:

Yes.  I have seen a checkgroups from just after the Great Renaming where
every description except two begins in column 25 (3rd tab position). The
exceptions are for:
		comp.protocols.appletalk (24)
		news.announce.conferences (25)
At that time the practical effect would have been that all descriptions had
a common left alignment, regardless whether they had short names (7 or
fewer), ordinary names (8 to 15), or long names (16 to 23), and there was a
work-around for exceptionally long names.
If you look at the Big 8 checkgroups, there are sections where this
alignment still largely works, and sections where deeper hierarchies
completely obliterate any alignment.
 

All moderated newsgroups must add the tag "(Moderated)" one space after the period that ends the newsgroup description. This moderation flag does not count against the 79/80-character limit for the newsgroup name, tab, and description.
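
One way to apply the limit while leaving the moderation flag out of the count is sketched below in Python. The helper name counted_width is hypothetical, and the sample line assumes a single tab between name and description.

```python
MODERATED_FLAG = " (Moderated)"


def counted_width(line: str, tab_width: int = 8) -> int:
    """Width of a newsgroups line, ignoring a trailing moderation flag."""
    if line.endswith(MODERATED_FLAG):
        line = line[: -len(MODERATED_FLAG)]
    return len(line.expandtabs(tab_width))


line = ("comp.ai.nlang-know-rep\t"
        "Natural Language and Knowledge Representation. (Moderated)")
print(counted_width(line), "columns counted;",
      len(line.expandtabs(8)), "columns on screen")
```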

The period at the end of the newsgroup description may now be optional. Those who have enough space to do so should probably preserve the period for backward-compatibility.

4. The name should be logical, informative, self-explanatory, consistent with other relevant newsgroup names, and beyond reproach.

The principles discussed so far are at the lowest level of grammar, spelling, and punctuation. There are higher-level issues discussed in the brilliant and irreplaceable discussion by David Wright, "Guidelines on Usenet Newsgroup Names" (1999). There is an art to locating a new newsgroup in its proper place in the existing lists and there are no rules by which disagreements over the higher-level issues of taxonomy can be resolved. Someone makes a decision, a group gets created, and Usenet goes on.

Miscification: An attentive reader says, "But looking again at those examples, I wonder if it's a good idea to include examples of *.misc groups without explaining the special circumstances in which they're created, i.e. when rec.foo exists and rec.foo.bar is proposed. (Disclaimer: I'm not entirely sure I understand the rules for adding components myself.)"

Russ Allbery: nlang-know-rep "is a great example of a group that got a completely incomprehensible name because we were enforcing the 14-character limit. It's also a great example of a group that's hard to name in even 30 characters, as I believe that was supposed to be natural-language-knowledge-representation. Which I think we'd all agree is too long, but shortening it is... hard.

"But what we've got abbreviates *every* keyword that someone would search for, making the group far harder to find."


Appendix A: A Few Examples of Existing Big-8 Group Names and Their Descriptions

I have just cut these lines from the Checkgroups list posted in news.announce.newgroups. If the formatting looks funny, that's because the 'rules' (traditions) for formatting produce funny results when there are longer newsgroup names.

" (Moderated)" is an essential flag for identifying which groups are moderated; the letters in this flag do not count against the 80-character rule.

         1         2         3         4         5         6         7         8   
123456789012345678901234|6789012345678901234567890123456789012345678901234567890
comp.ai.nat-lang        Natural language processing by computers.
comp.ai.neural-nets     All aspects of neural networks.
comp.ai.nlang-know-rep  Natural Language and Knowledge Representation. (Moderated)
         1         2         3         4         5         6         7         8
123456789012345678901234|6789012345678901234567890123456789012345678901234567890
rec.games.trading-cards.marketplace.magic.auctions      Auctions of Magic cards.
rec.games.trading-cards.marketplace.magic.sales Selling Magic cards.
rec.games.trading-cards.marketplace.magic.trades        Trading Magic cards.
rec.games.trading-cards.marketplace.misc        Trading trading card stuff.
         1         2         3         4         5         6         7         8
123456789012345678901234|6789012345678901234567890123456789012345678901234567890
misc.metric-system      The International System of Units.
misc.misc               Various discussions not fitting in any other group.
misc.news.bosnia        News, articles, reports & information on Bosnia. (Moderated)
misc.news.east-europe.rferl     Radio Free Europe/Radio Liberty Daily Report. (Moderated)
misc.news.internet.announce     News bulletins from the Internet. (Moderated)
misc.news.internet.discuss      Discussion of news bulletins from the Net.
         1         2         3         4         5         6         7         8
123456789012345678901234|6789012345678901234567890123456789012345678901234567890
news.announce.important General announcements of interest to all. (Moderated)
news.announce.newgroups Calls for newgroups & announcements of same. (Moderated)
news.announce.newusers  Explanatory postings for new users. (Moderated)
news.answers            Repository for periodic USENET articles. (Moderated)
news.groups             Discussions and lists of newsgroups.
         1         2         3         4         5         6         7         8
123456789012345678901234|6789012345678901234567890123456789012345678901234567890
rec.arts.books.childrens        All aspects of children's literature.
rec.arts.books.hist-fiction     Historical fictions (novels) in general.
rec.arts.books.marketplace      Buying and selling of books.
rec.arts.books.reviews  Book reviews. (Moderated)
rec.arts.books.tolkien  The works of J.R.R. Tolkien.
         1         2         3         4         5         6         7         8
123456789012345678901234|6789012345678901234567890123456789012345678901234567890
sci.crypt               Different methods of data en/decryption.
sci.crypt.random-numbers        Generating cryptographic strength randomness.
sci.crypt.research      Cryptography, cryptanalysis, and related issues. (Moderated)
sci.data.formats        Modelling, storage and retrieval of scientific data.
         1         2         3         4         5         6         7         8
123456789012345678901234|6789012345678901234567890123456789012345678901234567890
soc.genealogy.surnames.global   Surnames queries central database. (Moderated)
soc.genealogy.surnames.ireland  Surnames queries - Ireland & Northern Ireland. (Moderated)
soc.genealogy.surnames.misc     Surnames - regions not covered elsewhere. (Moderated)
soc.genealogy.surnames.usa      Surnames queries - USA. (Moderated)
soc.genealogy.west-indies       Genealogy of the West Indies.
         1         2         3         4         5         6         7         8
123456789012345678901234|6789012345678901234567890123456789012345678901234567890
talk.politics.drugs     The politics of drug issues.
talk.politics.european-union    The EU and political integration in Europe.
talk.politics.guns      The politics of firearm ownership and (mis)use.
talk.politics.libertarian       Libertarian politics & political philosophy.
talk.politics.medicine  The politics and ethics involved with health care.

Appendix B: The Longest Names in the Big-8 (2006-03-19)

         1         2         3         4         5         6         7         8   
123456789012345678901234|6789012345678901234567890123456789012345678901234567890
misc.forsale.computers.mac-specific.cards.misc
misc.forsale.computers.mac-specific.cards.video
misc.forsale.computers.pc-specific.cards.video
misc.forsale.computers.pc-specific.motherboards
rec.games.trading-cards.marketplace.magic.auctions
rec.games.trading-cards.marketplace.magic.sales
rec.games.trading-cards.marketplace.magic.trades

References: Various Documents Describing Usenet Protocols and Naming Standards

1987: RFC 1036

A historic document that shows how Usenet grew out of the e-mail protocols. No help in forming names.

1994: Son of RFC 1036

A document that is in limbo. Some of the ideas have been put into practice; some have not.

1999: David Wright, Guidelines on Usenet Newsgroup Names

A brilliant and irreplaceable essay on the syntax of newsgroup names. One small section is slightly out of date because of the direction in which Usenet standards seem to be moving.

2006: I-Ds List Working Group, Usenet Article Standard Update (usefor):

The link above goes to the table of contents for the next three documents. The table of contents should be checked if one is interested in finding the very latest drafts for USEAGE, USEFOR, and USEPRO. One never knows when these three documents may be replaced by a subsequent draft.

2005: USEFOR-USEAGE

2006: USEFOR-USEFOR

2006: USEFOR-USEPRO

2006: Russ Allbery, ISC README

This, in my opinion, is the gold-standard by which Big-8 names should be judged. The more I read other documents, the more I appreciated what tale, Russ, and others have done to keep Usenet from coming apart at the seams. We owe them a great debt of gratitude.

Newsgroup name/description validator
The news.groups thread that led to this article.