home *** CD-ROM | disk | FTP | other *** search
Text File | 1988-05-03 | 44.5 KB | 1,073 lines |
- --::::::::::
- --strcomp.inc
- --::::::::::
- -- Documentation (including this Include File)
- strcomp.inc
- read.me
-
- -- Source Files in Compilation Order
- latin1.a
- natascii.a
- strcomps.a
- strcompb.a
- define.a
- comline.a
- main.a
- --::::::::::
- --read.me
- --::::::::::
- The intention of this posting is not to provide a facility, but
- rather to demonstrate a technique to do string comparisons
- in a more sophisticated way than simply using ASCII values.
-
- Comments, questions etc are very welcome to:
- Erland Sommarskog
- ENEA Data, Stockholm
- sommar@enea.UUCP
-
- The posting contains seven files that can be divided into three
- groups:
- I: strcompS.a and strcompB.a
- The core of the posting. They contain a package for string
- comparisons. It has a character-transscription table to be
- loaded by the user and comparison operators for trans-
- scripted string. The exported routines are described below.
- StrcompS is the specification, whereas strcompB contains
- the package body.
- II: latin1.a and natascii.a
- They declare names for characters, to be used, for example,
- when defining a collating sequence for the package above.
- Latin1 declares names for the ISO standard 8859/1. Natascii
- declares names for national replacements of the ordinary
- ASCII set.
- III: define.a, comline.a and main.a
- An demonstration application that uses the string-comparison
- package. Define.a loads the character collating sequence.
- Comline.a reads the command line. Note that this file is
- bound to Verdix Ada for Unix and must be rewritten for another
- system.
- Main.a is the main program. It reads lines from standard
- input or a named file and writes the sorted lines to standard
- output when end-of-file is detected.
- You find a description of the options last in this file.
-
- You should compile the files in the order: latin1, natascii,
- strcompS, strcompB, define, comline, main.
-
- Four-dimensional sorting
- ------------------------
-
- The string-comparison package compares strings at four levels:
- 1) Alphabetic
- 2) Accents
- 3) Non-letters
- 4) Difference in case
- What is an alphabetic etc is up to the user. He may define "$"
- being a letter with "(" as its lowercase variant if he likes.
-
- One level is only regarded if the level above have no difference.
- As an example I take
- T^ete-`a-t^ete
- (I assume a "normal" loading of the character table here.)
- For the first level we use TETEATETE, thus we remove the accents
- and the hyphens. On the next we re-insert the accents so we get
- T^ETE`AT^ETE
- On level three we only take the hyphens in regard. When comparing
- non-letters the package uses the simple ASCII values. The earlier
- a character comes, the lower is the sort value. Thus, "trans-scription"
- will precede "transscrip-tion". (Actually, as the implementation
- is done, the position is more important than the ASCII value.)
- On the last level we use
- T^ete`at^ete
- thus, the original writing with the hyphens removed. Note that the
- user can specify case to be insigificant.
- (This isn't a description on how the package is implemented, just
- a way of illustrating the result. In practice it's done a little
- more effective.)
-
- When defining accented variants it is possible to let a character
- be a variant of a string, in this way the AE ligature can be sorted
- as "AE". The opposite is not possible, and what worse is, a string
- can't have an alphabetic value. Thus the package is not able to sort
- languages as Spanish (CH and LL) correctly.
-
- The number characters are handled in a special way if you define them
- as alphabetics. A sequence of figures will read as one number and sort
- after all other alphabetics. (Even if they were defined as the first
- characters.) So you will get
- File1 File2 File10 File11
- instead of the usual
- File1 File10 File11 File2
- If you like to sort them as they are read, this is also possible.
- E.g. load "0" as a variant of "zero".
-
- The package contains the following routines:
-
- Load Operations
- ---------------
- PROCEDURE Load_alphabetic(ch : IN character);
- Loads ch as the next alphabetic character. The order of loading
- determines the sorting values.
-
- PROCEDURE Load_variant(ch : IN character;
- Equ_ch : IN character;
- Equ_kind : IN Equivalence_kind);
- TYPE Equivalence_kind IS (Exact, Case_diff, Accented);
- PROCEDURE Load_variant(ch : IN character;
- Equ_str : IN string);
- Load_variant loads ch as a variant of Equ_ch or Equ_str. The interpretation
- of Equ_kind is:
- Exact: Exactly the same. There is no difference. What you use when you
- don't want case to be significant.
- Case_diff: Load ch as a lowercase variant of Equ_ch. There will be
- difference at level 4.
- Accented: Load ch as variant of Equ_ch at level 2.
- The latter version of Load_variant always loads ch at level 2.
-
- For simplify loading, the package also provides routines for loading
- a character and its ASCII lowercase equivalent simultaneously:
- PROCEDURE Set_case_significance(Flag : boolean);
- PROCEDURE Alpha_both_cases(ch : IN character);
- PROCEDURE Variant_both_cases(ch : IN character;
- Equ_ch : IN character);
- PROCEDURE Variant_both_cases(ch : IN character;
- Equ_str : IN string);
- With Set_case_significant you determine whether case should be
- significant when loading the pairs. Variant_both_cases loads ch
- at level 2.
-
- The loading operations raise Already_defined if an attempt is
- made to load a character twice. If Equ_ch or part of Equ_str is
- undefined, this gives the exception Undefined_equivalent.
-
- Transscription operations
- -------------------------
- These routines translates a string to the internal coding.
- TYPE Transscripted_string(Max_length : natural) IS PRIVATE;
- PROCEDURE Transscribe(ch : IN character;
- Trans_str : OUT Transscripted_string);
- PROCEDURE Transscribe(Str : IN string;
- Trans_str : OUT Transscripted_string);
- If the transscription is too long, the routines will raise
- Transscription_error.
-
- Comparison operators:
- ---------------------
- FUNCTION "<=" (Left, Right : Transscripted_string) RETURN boolean;
- FUNCTION "<" (Left, Right : Transscripted_string) RETURN boolean;
- FUNCTION ">=" (Left, Right : Transscripted_string) RETURN boolean;
- FUNCTION ">" (Left, Right : Transscripted_string) RETURN boolean;
-
- I have only included operations for comparing transscripted
- strings. Of course there could be a set for uncoded strings too.
-
- Other function
- --------------
- FUNCTION Is_letter(ch : character) RETURN boolean;
-
- The demonstration program
- -------------------------
- The program takes the options:
- -8 Use ISO/Latin-1. If not present, use 7-bit ASCII with national
- replacements.
- -e Case is significant. When omitted, case is not significant.
- -LX Selects language. X should be one of the following:
- s or S: Swedish. (Default)
- d or D: Danish
- g: German1: "A, "O and "U sorts as A, O and U.
- G: German2: "A, "O and "U sorts as AE, OE and UE.
- f or F French
-
- In the definition routine I load space as the first alphabetic
- letter. This gives the result that "Smith, Tony" will sort
- before "Smithson, Alan".
- --::::::::::
- --latin1.a
- --::::::::::
- ----------------------------------------------------------------------
- -- PACKAGE ISO_Latin_1 --
- ----------------------------------------------------------------------
- -- This package defines names for the characters in the standard
- -- ISO 8859/1, known as Latin-1, that are not in the ASCII set,
- -- i.e. characters with codes >= 160. (Control characters 128-159
- -- are excluded.
-
- WITH Unchecked_conversion;
- PACKAGE ISO_Latin_1 IS
- -- Implementation note: To define the constants within the existing
- -- character type I use Unchecked_conversion. Note that this is not
- -- legal Ada. Ada defines the character type as covering codes from
- -- 0 to 127. Thus, all these declarations should raise Constraint_error,
- -- however neither DEC Ada, nor Verdix for Unix do so.
- -- Note also that the Ada definition permits an implementation to
- -- restrict Unchecked_conversion.
- -- The proper way would be define a new enumeration type, however this
- -- requires more work, including a new Text_io.
-
- TYPE Byte IS NEW integer RANGE 0..255;
- FUNCTION Eight_bit IS NEW Unchecked_conversion(Byte, Character);
-
- No_break_space : CONSTANT character := Eight_bit(160);
- Exclaim_up_down : CONSTANT character := Eight_bit(161);
- Cent : CONSTANT character := Eight_bit(162);
- Pound : CONSTANT character := Eight_bit(163);
- Gen_currency : CONSTANT character := Eight_bit(164);
- Yen : CONSTANT character := Eight_bit(165);
- Broken_bar : CONSTANT character := Eight_bit(166);
- Paragraph : CONSTANT character := Eight_bit(167);
- Diaraesis : CONSTANT character := Eight_bit(168);
- Copyright : CONSTANT character := Eight_bit(169);
- Fem_ordinal : CONSTANT character := Eight_bit(170);
- L_angle_quote : CONSTANT character := Eight_bit(171);
- Not_sign : CONSTANT character := Eight_bit(172);
- Soft_hyphen : CONSTANT character := Eight_bit(173);
- Reg_trade : CONSTANT character := Eight_bit(174);
- Macron : CONSTANT character := Eight_bit(175);
- Degree : CONSTANT character := Eight_bit(176);
- Plus_minus : CONSTANT character := Eight_bit(177);
- Super_2 : CONSTANT character := Eight_bit(178);
- Super_3 : CONSTANT character := Eight_bit(179);
- Acute : CONSTANT character := Eight_bit(180);
- Mu : CONSTANT character := Eight_bit(181);
- Pilcrow : CONSTANT character := Eight_bit(182);
- Middle_dot : CONSTANT character := Eight_bit(183);
- Cedilla : CONSTANT character := Eight_bit(184);
- Super_1 : CONSTANT character := Eight_bit(185);
- Mask_ord : CONSTANT character := Eight_bit(186);
- R_angle_quote : CONSTANT character := Eight_bit(187);
- Quarter : CONSTANT character := Eight_bit(188);
- Half : CONSTANT character := Eight_bit(189);
- Three_quarter : CONSTANT character := Eight_bit(190);
- Query_up_down : CONSTANT character := Eight_bit(191);
- UC_A_grave : CONSTANT character := Eight_bit(192);
- UC_A_acute : CONSTANT character := Eight_bit(193);
- UC_A_circum : CONSTANT character := Eight_bit(194);
- UC_A_tilde : CONSTANT character := Eight_bit(195);
- UC_A_dots : CONSTANT character := Eight_bit(196);
- UC_A_ring : CONSTANT character := Eight_bit(197);
- UC_AE : CONSTANT character := Eight_bit(198);
- UC_C_cedilla : CONSTANT character := Eight_bit(199);
- UC_E_grave : CONSTANT character := Eight_bit(200);
- UC_E_acute : CONSTANT character := Eight_bit(201);
- UC_E_circum : CONSTANT character := Eight_bit(202);
- UC_E_dots : CONSTANT character := Eight_bit(203);
- UC_I_grave : CONSTANT character := Eight_bit(204);
- UC_I_acute : CONSTANT character := Eight_bit(205);
- UC_I_circum : CONSTANT character := Eight_bit(206);
- UC_I_dots : CONSTANT character := Eight_bit(207);
- UC_edh : CONSTANT character := Eight_bit(208);
- UC_N_tilde : CONSTANT character := Eight_bit(209);
- UC_O_grave : CONSTANT character := Eight_bit(210);
- UC_O_acute : CONSTANT character := Eight_bit(211);
- UC_O_circum : CONSTANT character := Eight_bit(212);
- UC_O_tilde : CONSTANT character := Eight_bit(213);
- UC_O_dots : CONSTANT character := Eight_bit(214);
- Mult_sign : CONSTANT character := Eight_bit(215);
- UC_O_oblique : CONSTANT character := Eight_bit(216);
- UC_U_grave : CONSTANT character := Eight_bit(217);
- UC_U_acute : CONSTANT character := Eight_bit(218);
- UC_U_circum : CONSTANT character := Eight_bit(219);
- UC_U_dots : CONSTANT character := Eight_bit(220);
- UC_Y_acute : CONSTANT character := Eight_bit(221);
- UC_thorn : CONSTANT character := Eight_bit(222);
- LC_s_sharp : CONSTANT character := Eight_bit(223);
- LC_a_grave : CONSTANT character := Eight_bit(224);
- LC_a_acute : CONSTANT character := Eight_bit(225);
- LC_a_circum : CONSTANT character := Eight_bit(226);
- LC_a_tilde : CONSTANT character := Eight_bit(227);
- LC_a_dots : CONSTANT character := Eight_bit(228);
- LC_a_ring : CONSTANT character := Eight_bit(229);
- LC_ae : CONSTANT character := Eight_bit(230);
- LC_c_cedilla : CONSTANT character := Eight_bit(231);
- LC_e_grave : CONSTANT character := Eight_bit(232);
- LC_e_acute : CONSTANT character := Eight_bit(233);
- LC_e_circum : CONSTANT character := Eight_bit(234);
- LC_e_dots : CONSTANT character := Eight_bit(235);
- LC_i_grave : CONSTANT character := Eight_bit(236);
- LC_i_acute : CONSTANT character := Eight_bit(237);
- LC_i_circum : CONSTANT character := Eight_bit(238);
- LC_i_dots : CONSTANT character := Eight_bit(239);
- LC_edh : CONSTANT character := Eight_bit(240);
- LC_n_tilde : CONSTANT character := Eight_bit(241);
- LC_o_grave : CONSTANT character := Eight_bit(242);
- LC_o_acute : CONSTANT character := Eight_bit(243);
- LC_o_circum : CONSTANT character := Eight_bit(244);
- LC_o_tilde : CONSTANT character := Eight_bit(245);
- LC_o_dots : CONSTANT character := Eight_bit(246);
- Div_sign : CONSTANT character := Eight_bit(247);
- LC_o_oblique : CONSTANT character := Eight_bit(248);
- LC_u_grave : CONSTANT character := Eight_bit(249);
- LC_u_acute : CONSTANT character := Eight_bit(250);
- LC_u_circum : CONSTANT character := Eight_bit(251);
- LC_u_dots : CONSTANT character := Eight_bit(252);
- LC_y_acute : CONSTANT character := Eight_bit(253);
- LC_thorn : CONSTANT character := Eight_bit(254);
- LC_y_dots : CONSTANT character := Eight_bit(255);
- END ISO_latin_1;
- --::::::::::
- --natascii.a
- --::::::::::
- ----------------------------------------------------------------------
- -- PACKAGE National ASCII --
- ----------------------------------------------------------------------
- -- This package declares alternate names for the ASCII codes
- -- 64, 91-94, 96 and 123-126 to be used when when these codes refers
- -- to national characters. The names are restricted to letters.
- -- Languages covered: Swedish/Finnish, Danish/Norwegian, German,
- -- French and Italian.
-
- PACKAGE National_ASCII IS
-
- -- Swedish and Finnish
- SW_UC_E_acute : CONSTANT character := '@';
- SW_UC_A_ring : CONSTANT character := ']';
- SW_UC_A_dots : CONSTANT character := '[';
- SW_UC_O_dots : CONSTANT character := '\';
- SW_UC_U_dots : CONSTANT character := '^';
- SW_LC_e_acute : CONSTANT character := '`';
- SW_LC_a_ring : CONSTANT character := '}';
- SW_LC_a_dots : CONSTANT character := '{';
- SW_LC_o_dots : CONSTANT character := '|';
- SW_LC_u_dots : CONSTANT character := '~';
-
- -- Danish and Norwegian
- DA_UC_AE : CONSTANT character := '[';
- DA_UC_O_oblique : CONSTANT character := '\';
- DA_UC_A_ring : CONSTANT character := ']';
- DA_UC_U_dots : CONSTANT character := '^';
- DA_LC_ae : CONSTANT character := '{';
- DA_LC_o_oblique : CONSTANT character := '|';
- DA_LC_a_ring : CONSTANT character := '}';
- DA_LC_u_dots : CONSTANT character := '~';
-
- -- German
- GER_UC_A_dots : CONSTANT character := '[';
- GER_UC_O_dots : CONSTANT character := '\';
- GER_UC_U_dots : CONSTANT character := ']';
- GER_LC_a_dots : CONSTANT character := '{';
- GER_LC_o_dots : CONSTANT character := '|';
- GER_LC_u_dots : CONSTANT character := '}';
- GER_LC_s_sharp : CONSTANT character := '~';
-
- -- French
- FR_LC_a_grave : CONSTANT character := '@';
- FR_LC_c_cedilla : CONSTANT character := '\';
- FR_LC_e_acute : CONSTANT character := '{';
- FR_LC_u_grave : CONSTANT character := '|';
- FR_LC_e_grave : CONSTANT character := '}';
-
- -- Italian
- IT_LC_A_ring : CONSTANT character := ']';
- IT_LC_u_grave : CONSTANT character := '`';
- IT_LC_a_grave : CONSTANT character := '}';
- IT_LC_o_grave : CONSTANT character := '{';
- IT_LC_e_grave : CONSTANT character := '|';
- IT_LC_i_grave : CONSTANT character := '~';
-
- END National_ASCII;
- --::::::::::
- --strcomps.a
- --::::::::::
- ----------------------------------------------------------------------
- -- SPECIFCATION String_comparison --
- ----------------------------------------------------------------------
- -- This package provides operations for comparing strings according to
- -- a user-defined scheme.
- -- The package contains operations for load an internal coding table,
- -- routines for coding strings and for comparing coded strings.
- PACKAGE String_comparison IS
-
- -- Load a character as the next in the primary colltating sequence
- PROCEDURE Load_alphabetic(ch : IN character);
- PROCEDURE Alpha_both_cases(ch : IN character);
-
- -- Load a variant of a character in the main sequence, on accent
- -- level, on case level or as exactly the same.
- TYPE Equivalence_kind IS (Exact, Case_diff, Accented);
- PROCEDURE Load_variant(ch : IN character;
- Equ_ch : IN character;
- Equ_kind : IN Equivalence_kind);
- -- The three below always load on accent level.
- PROCEDURE Load_variant(ch : IN character;
- Equ_str : IN string);
- PROCEDURE Variant_both_cases(ch : IN character;
- Equ_ch : IN character);
- PROCEDURE Variant_both_cases(ch : IN character;
- Equ_str : IN string);
-
- -- Exceptions that can be raised by the load operations
- Undefined_equivalent : EXCEPTION;
- Already_defined : EXCEPTION;
-
- -- Change case significance when loading both cases. Default is off.
- PROCEDURE Set_case_significance(Flag : boolean);
-
- -- Transscript type and coding operations
- TYPE Transscripted_string(Max_length : natural) IS PRIVATE;
- PROCEDURE Transscribe(ch : IN character;
- Trans_str : OUT Transscripted_string);
- PROCEDURE Transscribe(Str : IN string;
- Trans_str : OUT Transscripted_string);
- Transscription_error : EXCEPTION;
-
- -- Comparison operators
- FUNCTION "<=" (Left, Right : Transscripted_string) RETURN boolean;
- FUNCTION "<" (Left, Right : Transscripted_string) RETURN boolean;
- FUNCTION ">=" (Left, Right : Transscripted_string) RETURN boolean;
- FUNCTION ">" (Left, Right : Transscripted_string) RETURN boolean;
-
- -- Others
- FUNCTION Is_letter(ch : character) RETURN boolean;
-
- PRIVATE
- TYPE Natural_string IS ARRAY(integer RANGE <>) OF natural;
- TYPE Boolean_string IS ARRAY(integer RANGE <>) OF boolean;
- TYPE Transscripted_string(Max_length : natural) IS
- RECORD
- Length : natural := 0;
- Alphabetic : Natural_string(1..Max_length) := (OTHERS => 0);
- Accents : Natural_string(1..Max_length) := (OTHERS => 0);
- Case_part : Boolean_string(1..Max_length) := (OTHERS => false);
- Non_letter_length : natural := 0;
- Non_letters : Natural_string(1..Max_length) := (OTHERS => 256);
- END RECORD;
- END String_comparison;
- --::::::::::
- --strcompb.a
- --::::::::::
- ----------------------------------------------------------------------
- -- BODY string_comparison --
- ----------------------------------------------------------------------
- -- This file contains the implementation part of the string comparison
- -- package.
-
- PACKAGE BODY string_comparison IS
-
- -- CONTENTS
- -- --------
- -- Type declarations and simple functions
- -- Internal Load operations
- -- Exported load operations
- -- Internal routines for transscribing numbers
- -- Exported transscription operations
- -- Internal comparison procedures
- -- Exportered string comparators
-
- -- The transscription table
- -- The translation of a character is a string. This is for characters
- -- like the AE ligature. Also useful is you want "0" = "zero".
- TYPE Transscript_entry(Length : positive) IS
- RECORD
- Alphabetic : Natural_string(1..Length) := (OTHERS => 0);
- Accent : Natural_string(1..Length) := (OTHERS => 0);
- Case_variant : boolean := false;
- END RECORD;
- TYPE Entry_ptr IS ACCESS Transscript_entry;
- -- Pointer to allow different sizes
-
- -- The index in the table is the ordinal number. Ada's character type is
- -- limited to 127.
- Char_table : ARRAY (0..255) OF Entry_ptr := (OTHERS => NULL);
-
-
- -- Other types
- -- This type is for internal comparison functions
- TYPE Relation_type IS (Less_than, Equal, Greater_than);
-
- -- Range for the number characters
- SUBTYPE Numbers IS integer RANGE character'pos('0')..character'pos('9');
-
- -- Variables
- -- Case significance
- Case_significant : boolean := true;
-
- -- Last used codes
- Last_alpha_code : integer := 0;
- Last_accent_code : integer := 0;
- -- When storing an alphabetic we increment Last_alpha_code, when loading
- -- a accent variant we increment Last_accent_code.
-
- -- Simple functions
- FUNCTION Is_letter(ch : character) RETURN boolean IS
- BEGIN
- RETURN Char_table(character'pos(ch)).Length > 0;
- END;
-
- -- Set case significance for the double-case load operations
- PROCEDURE Set_case_significance(Flag : boolean) IS
- BEGIN
- Case_significant := Flag;
- END;
-
- -- Internal Load operations
- -- These take integer parametes. The exported routines call these.
- -- We're having integer to avoid problems with characters over 127.
-
- PROCEDURE Load_alphabetic(ch : integer) IS
- -- Load ch in the table as a one without any Accent part. If ch is already
- -- defined, raise Already defined.
- BEGIN
- IF Char_table(ch) /= NULL THEN
- RAISE Already_defined;
- END IF;
- Char_table(ch) := NEW Transscript_entry(1);
- Last_alpha_code := Last_alpha_code + 1;
- Char_table(ch).Alphabetic(1) := Last_alpha_code;
- END Load_alphabetic;
-
- PROCEDURE Load_variant(ch : IN integer;
- Equ_ch : IN integer;
- Equ_kind : IN Equivalence_kind) IS
- -- Load ch as an variant of Equ_ch. Equ_ch must be defined or else
- -- we raise Undefined_equivalent.
- BEGIN
- IF Char_table(ch) /= NULL THEN
- RAISE Already_defined;
- END IF;
- IF Char_table(Equ_ch) = NULL THEN
- RAISE Undefined_equivalent;
- END IF;
- Char_table(ch) := NEW Transscript_entry(Char_table(Equ_ch).Length);
- Char_table(ch).Alphabetic := Char_table(Equ_ch).Alphabetic;
- Char_table(ch).Accent := Char_table(Equ_ch).Accent;
- Char_table(ch).Case_variant := Char_table(Equ_ch).Case_variant;
- -- Actually: Char_table(ch).all := Char_table(Equ_ch).all;
- -- Alas, Verdix Ada can't handle this properly
- CASE Equ_kind IS
- WHEN Exact => NULL;
- WHEN Case_diff => Char_table(ch).Case_variant := true;
- WHEN Accented => Last_accent_code := Last_accent_code + 1;
- Char_table(ch).Accent(1) := Last_accent_code;
- END CASE;
- END Load_variant;
-
- PROCEDURE Load_variant(ch : IN integer;
- Equ_str : IN Natural_string) IS
- -- Load ch as an accented letter (digraph) of Equ_str. If not all
- -- characters in Equ_str are deifined, raise Undefined_equivalent.
- BEGIN
- IF Char_table(ch) /= NULL THEN
- RAISE Already_defined;
- END IF;
- FOR i IN Equ_str'range LOOP
- IF Char_table(Equ_str(i)) = NULL THEN
- RAISE Undefined_equivalent;
- END IF;
- END LOOP;
- Char_table(ch) := NEW Transscript_entry(Equ_str'length);
- FOR i IN Equ_str'range LOOP
- Char_table(ch).Alphabetic(i) := Char_table(Equ_str(i)).Alphabetic(1);
- Last_accent_code := Last_accent_code + 1;
- Char_table(ch).Accent(i) := Last_accent_code;
- END LOOP;
- END Load_variant;
-
- -- The exported load operations
- PROCEDURE Load_alphabetic(ch : IN character) IS
- BEGIN
- Load_alphabetic(character'pos(ch));
- END Load_alphabetic;
-
- PROCEDURE Load_variant(ch : IN character;
- Equ_ch : IN character;
- Equ_kind : IN Equivalence_kind) IS
- BEGIN
- Load_variant(character'pos(ch), character'pos(Equ_ch), Equ_kind);
- END Load_variant;
-
- PROCEDURE Load_variant(ch : IN character;
- Equ_str : IN string) IS
- Equ_int : Natural_string(Equ_str'range);
- BEGIN
- FOR i IN Equ_str'range LOOP
- Equ_int(i) := character'pos(Equ_str(i));
- END LOOP;
- Load_variant(character'pos(ch), Equ_int);
- END Load_variant;
-
-
- -- Exported double-case load operations.
- PROCEDURE Alpha_both_cases(ch : IN character) IS
- Int_ch : integer := character'pos(ch);
- BEGIN
- Load_alphabetic(Int_ch);
- IF Case_significant THEN
- Load_variant(Int_ch + 32, Int_ch, Case_diff);
- ELSE
- Load_variant(Int_ch + 32, Int_ch, Exact);
- END IF;
- END Alpha_both_cases;
-
- PROCEDURE Variant_both_cases(ch : IN character;
- Equ_ch : IN character) IS
- Int_ch : integer := character'pos(ch);
- BEGIN
- Load_variant(Int_ch, character'pos(Equ_ch), Accented);
- IF Case_significant THEN
- Load_variant(Int_ch + 32, Int_ch, Case_diff);
- ELSE
- Load_variant(Int_ch + 32, Int_ch, Exact);
- END IF;
- END Variant_both_cases;
-
- PROCEDURE Variant_both_cases(ch : IN character;
- Equ_str : IN string) IS
- Int_ch : integer := character'pos(ch);
- BEGIN
- Load_variant(ch, Equ_str);
- IF Case_significant THEN
- Load_variant(Int_ch + 32, Int_ch, Case_diff);
- ELSE
- Load_variant(Int_ch + 32, Int_ch, Exact);
- END IF;
- END Variant_both_cases;
-
- -- Internal procedure for transscribing numbers
- PROCEDURE Get_number(Str : IN string;
- Str_ix : IN OUT integer;
- Number : OUT integer) IS
- -- Assume Str(Str_ix) is a number. Read as long there are numbers.
- -- Leave Str_ix at the last number character.
- No_in_str : natural := 0;
- ch : integer := character'pos(Str(Str_ix));
- BEGIN
- WHILE ch IN Numbers LOOP
- No_in_str := 10 * No_in_str + ch - Numbers'first;
- IF Str_ix + 1 IN Str'range THEN
- Str_ix := Str_ix + 1;
- ch := character'pos(Str(Str_ix));
- ELSE
- ch := 0;
- END IF;
- END LOOP;
- Number := No_in_str;
- EXCEPTION
- WHEN Numeric_error => RAISE Transscription_error;
- END;
-
- -- Exported transscription operations
- PROCEDURE Transscribe(ch : IN character;
- Trans_str : OUT Transscripted_string) IS
- BEGIN
- Transscribe( (1 => ch), Trans_str);
- END Transscribe;
-
-
- PROCEDURE Transscribe(Str : IN string;
- Trans_str : OUT Transscripted_string) IS
- -- Transscribe Str using the table. If the transscription does
- -- not fit into the out parameter, raise Transscription_error.
- -- Characters in Str that are not defined are regarded as non-letters.
- -- Non-letters are always stored at the their index in Str.
- -- Numbers are stored specially.
- ch : natural; -- Current character;
- Tr_ix : natural := 0; -- Index in Trans_str except the non-letter part.
- Str_ix : integer := Str'first; -- Index in Str and non-letter part.
- No_in_str : natural;
- BEGIN
- WHILE Str_ix IN Str'range LOOP
- ch := character'pos(Str(Str_ix));
- IF Char_table(ch) /= NULL THEN
- IF Tr_ix + Char_table(ch).Length > Trans_str.Max_length THEN
- RAISE Transscription_error;
- END IF;
- IF ch NOT IN Numbers OR Char_table(ch).Accent(1) /= 0 THEN
- FOR i IN 1..Char_table(ch).Length LOOP
- Tr_ix := Tr_ix + 1;
- Trans_str.Alphabetic(Tr_ix) := Char_table(ch).Alphabetic(i);
- Trans_str.Case_part(Tr_ix) := Char_table(ch).Case_variant;
- Trans_str.Accents(Tr_ix) := Char_table(ch).Accent(i);
- END LOOP;
- ELSE
- Get_number(Str, Str_ix, No_in_str);
- Tr_ix := Tr_ix + 1;
- Trans_str.Alphabetic(Tr_ix) := 1000 + No_in_str;
- END IF;
- ELSE
- IF Str_ix > Trans_str.Max_length THEN
- RAISE Transscription_error;
- END IF;
- Trans_str.Non_letters(Str_ix) := ch;
- Trans_str.Non_letter_length := Str_ix;
- END IF;
- Str_ix := Str_ix + 1;
- END LOOP;
- Trans_str.Length := Tr_ix;
- END Transscribe;
-
- -- Internal comparison routines
-
- FUNCTION Relation(Left, Right : Natural_string) RETURN Relation_type IS
- -- This function is more os less obsolete. "<" etc should do the job.
- -- Verdix Ada can't this on integer arrays, unfortunately.
- i : positive := 1;
- Bug : EXCEPTION; -- Should not occur
- BEGIN
- WHILE (i <= Left'last AND i <= Right'last) AND THEN
- Left(i) = Right(i) LOOP
- i := i + 1;
- END LOOP;
- IF i > Left'last AND i > Right'last THEN
- RETURN Equal;
- ELSIF i > Left'last THEN
- RETURN Less_than;
- ELSIF i > Right'last THEN
- RETURN Greater_than;
- ELSIF Left(i) < Right(i) THEN
- RETURN Less_than;
- ELSIF Left(i) > Right(i) THEN
- RETURN Greater_than;
- ELSE
- RAISE Bug; -- This should not occur.
- END IF;
- END Relation;
-
-
- FUNCTION Relation(Left, Right : Transscripted_string) RETURN Relation_type IS
- -- Compare the parts in order. Continue as long as there is unequallity.
- Rel : Relation_type;
- BEGIN
- Rel := Relation(Left.Alphabetic(1..Left.Length),
- Right.Alphabetic(1..Right.Length));
- IF Rel /= Equal THEN
- RETURN Rel;
- END IF;
- Rel := Relation(Left.Accents(1..Left.Length),
- Right.Accents(1..Right.Length));
- IF Rel /= Equal THEN
- RETURN Rel;
- END IF;
- Rel := Relation(Left.Non_letters(1..Left.Non_letter_length),
- Right.Non_letters(1..Right.Non_letter_length));
- IF Rel /= Equal THEN
- RETURN Rel;
- END IF;
- IF Left.Case_part(1..Left.Length) <
- Right.Case_part(1..Right.Length) THEN
- RETURN Less_than;
- ELSIF Left.Case_part(1..Left.Length) >
- Right.Case_part(1..Right.Length) THEN
- RETURN Greater_than;
- ELSE
- RETURN Equal;
- END IF;
- END Relation;
-
- -- Exported comparison operators
- FUNCTION "<=" (Left, Right : Transscripted_string) RETURN boolean IS
- BEGIN
- RETURN Relation(Left, Right) /= Greater_than;
- END;
-
- FUNCTION "<" (Left, Right : Transscripted_string) RETURN boolean IS
- BEGIN
- RETURN Relation(Left, Right) = Less_than;
- END;
-
- FUNCTION ">=" (Left, Right : Transscripted_string) RETURN boolean IS
- BEGIN
- RETURN Relation(Left, Right) /= Less_than;
- END;
-
- FUNCTION ">" (Left, Right : Transscripted_string) RETURN boolean IS
- BEGIN
- RETURN Relation(Left, Right) = Greater_than;
- END;
-
- END string_comparison;
- --::::::::::
- --define.a
- --::::::::::
- ----------------------------------------------------------------------
- -- Define collating sequence --
- ----------------------------------------------------------------------
- -- This package contains a procedure with the same name that demon-
- -- strates the use of the load operations in the String_comparison
- -- package.
-
-
- PACKAGE Define IS
- TYPE Languages IS (Swedish, Danish, German1, German2, French);
- -- German1 sort "A, "O and "U as A, O and U. German2 as AE, OE and UE.
- PROCEDURE Collatting_sequence(Language : IN Languages;
- Case_significant : IN boolean;
- Eightbit : IN boolean);
- END Define;
-
-
- WITH String_comparison; USE String_comparison;
- WITH ISO_Latin_1; USE ISO_Latin_1;
- WITH National_ASCII; USE National_ASCII;
- PACKAGE BODY Define IS
- PROCEDURE Collatting_sequence(Language : IN Languages;
- Case_significant : IN boolean;
- Eightbit : IN boolean) IS
- BEGIN
- -- Set the significane of case
- Set_case_significance(Case_significant);
-
- -- Load space as the first letter and the A to Z
- Load_alphabetic(' ');
-
- -- Load the letters from A to Z to begin with
- FOR ch IN 'A'..'V' LOOP
- Alpha_both_cases(ch);
- END LOOP;
- IF Language = Swedish THEN
- Variant_both_cases('W', 'V');
- ELSE
- Alpha_both_cases('W');
- END IF;
- FOR ch IN 'X'..'Z' LOOP
- Alpha_both_cases(ch);
- END LOOP;
-
- -- And so for the specific letters. Begin with the seven-bits
- IF NOT Eightbit THEN
- CASE Language IS
- WHEN Swedish => Alpha_both_cases(SW_UC_A_ring);
- Alpha_both_cases(SW_UC_A_dots);
- Alpha_both_cases(SW_UC_O_dots);
- Variant_both_cases(SW_UC_E_acute, 'E');
- Variant_both_cases(SW_UC_U_dots, 'Y');
- WHEN Danish => Alpha_both_cases(DA_UC_AE);
- Alpha_both_cases(DA_UC_O_oblique);
- Alpha_both_cases(DA_UC_A_ring);
- WHEN German1 => Variant_both_cases(GER_UC_A_dots, 'A');
- Variant_both_cases(GER_UC_O_dots, 'O');
- Variant_both_cases(GER_UC_U_dots, 'U');
- Load_variant(GER_LC_s_sharp, "ss");
- WHEN German2 => Variant_both_cases(GER_UC_A_dots, "AE");
- Variant_both_cases(GER_UC_O_dots, "OE");
- Variant_both_cases(GER_UC_U_dots, "UE");
- Load_variant(GER_LC_s_sharp, "ss");
- WHEN French => Load_variant(FR_LC_a_grave, 'a', Accented);
- Load_variant(FR_LC_c_cedilla, 'c', Accented);
- Load_variant(FR_LC_e_acute, 'e', Accented);
- Load_variant(FR_LC_u_grave, 'u', Accented);
- Load_variant(FR_LC_e_grave, 'e', Accented);
- END CASE;
- -- Now lets take the eightbit case, ISO-Latin/1.
- ELSE
- -- First we take characters that differs from langauge to language
- -- They are oA, "A, AE, "O, /O, and "U.
- CASE Language IS
- WHEN Swedish => Alpha_both_cases(UC_A_ring);
- Alpha_both_cases(UC_A_dots);
- Variant_both_cases(UC_AE, UC_A_dots);
- Alpha_both_cases(UC_O_dots);
- Variant_both_cases(UC_O_oblique, UC_O_dots);
- Variant_both_cases(UC_U_dots, 'Y');
- WHEN Danish => Alpha_both_cases(UC_AE);
- Variant_both_cases(UC_A_dots, UC_AE);
- Alpha_both_cases(UC_O_oblique);
- Variant_both_cases(UC_O_dots, UC_O_oblique);
- Alpha_both_cases(UC_A_ring);
- Variant_both_cases(UC_U_dots, 'Y');
- WHEN German1 !
- French => Variant_both_cases(UC_A_dots, 'A');
- Variant_both_cases(UC_O_dots, 'O');
- Variant_both_cases(UC_U_dots, 'U');
- Variant_both_cases(UC_A_ring, 'A');
- Variant_both_cases(UC_O_oblique, 'O');
- Variant_both_cases(UC_AE, "AE");
- WHEN German2 => Variant_both_cases(UC_A_dots, "AE");
- Variant_both_cases(UC_O_dots, "OE");
- Variant_both_cases(UC_U_dots, "UE");
- Variant_both_cases(UC_A_ring, 'A');
- Variant_both_cases(UC_O_oblique, 'O');
- Variant_both_cases(UC_AE, "AE");
- END CASE;
-
- -- All other variants
- Variant_both_cases(UC_A_grave, 'A');
- Variant_both_cases(UC_A_acute, 'A');
- Variant_both_cases(UC_A_circum, 'A');
- Variant_both_cases(UC_A_tilde, 'A');
-
- Variant_both_cases(UC_C_cedilla, 'C');
-
- Variant_both_cases(UC_E_grave, 'E');
- Variant_both_cases(UC_E_acute, 'E');
- Variant_both_cases(UC_E_circum, 'E');
- Variant_both_cases(UC_E_dots, 'E');
-
- Variant_both_cases(UC_Edh, 'D');
-
- Variant_both_cases(UC_I_grave, 'I');
- Variant_both_cases(UC_I_acute, 'I');
- Variant_both_cases(UC_I_circum, 'I');
- Variant_both_cases(UC_I_dots, 'I');
-
- Variant_both_cases(UC_N_tilde, 'N');
-
- Variant_both_cases(UC_O_grave, 'O');
- Variant_both_cases(UC_O_acute, 'O');
- Variant_both_cases(UC_O_circum, 'O');
- Variant_both_cases(UC_O_tilde, 'O');
-
- Load_variant(LC_s_sharp, "ss");
-
- Variant_both_cases(UC_U_grave, 'U');
- Variant_both_cases(UC_U_acute, 'U');
- Variant_both_cases(UC_U_circum, 'U');
-
- Variant_both_cases(UC_Y_acute, 'Y');
- Load_variant(LC_y_dots, 'y', Accented);
- END IF;
-
- -- Finally the numbers
- FOR ch IN '0'..'9' LOOP
- Load_alphabetic(ch);
- END LOOP;
- END Collatting_sequence;
- END Define;
- --::::::::::
- --comline.a
- --::::::::::
- ----------------------------------------------------------------------
- -- PROCEDURE Read_command_line --
- ----------------------------------------------------------------------
- -- This procedure reads the command line to get the options and the
- -- input file. You will probably have to replace it, unless you also
- -- use Verdix Ada system for Unix.
- WITH Define; Use Define;
- WITH Command_line; USE Command_line; -- Verdix package
- WITH Text_io;
- WITH IO_exceptions;
- PROCEDURE Read_command_line(Language : OUT Define.Languages;
- Exact : OUT boolean;
- Eightbit : OUT boolean) IS
- BEGIN
- FOR i IN 1..argc - 1 LOOP
- IF argv(i).s(1) = '-' THEN
- CASE argv(i).s(2) IS
- WHEN '8' => Eightbit := true;
- WHEN 'E' ! 'e' => Exact := true;
- WHEN 'L' ! 'l' => CASE argv(i).s(3) IS
- WHEN 's' ! 'S' => Language := Swedish;
- WHEN 'd' ! 'D' => Language := Danish;
- WHEN 'g' => Language := German1;
- WHEN 'G' => Language := German2;
- WHEN 'f' ! 'F' => Language := French;
- WHEN OTHERS => NULL;
- END CASE;
- WHEN OTHERS => Text_io.Put_line("Unknown option: " & argv(i).s);
- END CASE;
- ELSE
- DECLARE
- USE Text_io;
- Infile : File_type;
- BEGIN
- Open(Infile, In_file, argv(i).s);
- Set_input(Infile);
- EXCEPTION
- WHEN IO_exceptions.Name_error =>
- Put_line(argv(i).s & " does not exsist");
- END;
- END IF;
- END LOOP;
- END Read_command_line;
- --::::::::::
- --main.a
- --::::::::::
- ----------------------------------------------------------------------
- -- Sort package and main program --
- ----------------------------------------------------------------------
- -- This file contains a sort package that uses the string-comparison
- -- package when sorting and the main program. The sort package is very
- -- simple, it contains just one routine for inserting into the tree
- -- and for writing the tree to standard output.
- PACKAGE Sort_package IS
- PROCEDURE Insert(Str : IN string);
- PROCEDURE Write_tree;
- END Sort_package;
-
- -- The main program. Reads line from standard input and insert them
- -- into the sort package. When end-of-fils is detected, write the
- -- tree.
- WITH Text_io;
- WITH IO_exceptions;
- WITH Sort_package;
- WITH Define; USE Define;
- WITH Read_command_line;
- PROCEDURE Main IS
- Language : Define.Languages := Swedish;
- Eightbit : boolean := false;
- Exact : boolean := false;
- Line : string(1..80);
- Len : natural;
- BEGIN
- Read_command_line(Language, Exact, Eightbit);
- Define.collatting_sequence(Language, Exact, Eightbit);
- LOOP
- Text_io.Get_line(Line, Len);
- Sort_package.Insert(Line(1..Len));
- END LOOP;
- EXCEPTION
- WHEN IO_exceptions.End_error => Sort_package.Write_tree;
- END Main;
-
- -- Below the body of the sort package
- WITH Text_io;
- WITH String_comparison; USE String_comparison;
- PACKAGE BODY Sort_package IS
- TYPE Tree_entry(Key_size : positive; Str_len : natural);
- TYPE Tree_type IS ACCESS Tree_entry;
- TYPE Tree_entry(Key_size : positive; Str_len : natural) IS
- RECORD
- Left : Tree_type := NULL;
- Right : Tree_type := NULL;
- Key : Transscripted_string(Key_size);
- Str : string(1..Str_len);
- END RECORD;
- Tree : Tree_type := NULL;
-
- -- Internal recursive insertion procedure. Called by the exported
- PROCEDURE Insert(Tree : IN OUT Tree_type;
- Key : IN Transscripted_string;
- Str : IN string) IS
- BEGIN
- IF Tree /= NULL THEN
- IF Key < Tree.Key THEN
- Insert(Tree.left, Key, Str);
- ELSIF Key > Tree.Key THEN
- Insert(Tree.right, Key, Str);
- END IF;
- ELSE
- Tree := NEW Tree_entry(Key.Max_length, Str'length);
- Tree.Key := Key;
- Tree.Str := Str;
- END IF;
- END Insert;
-
- -- Exported Insert
- PROCEDURE Insert(Str : IN string) IS
- Transscript : Transscripted_string(Str'length + 20);
- BEGIN
- Transscribe(Str, Transscript);
- Insert(Tree, Transscript, Str);
- EXCEPTION
- WHEN Transscription_error =>
- Text_io.Put_line(Str);
- Text_io.Put_line("This line has too long transscription. Skipped.");
- END Insert;
-
- -- This procedure travserse the tree and writes all entries on standard output
- PROCEDURE Write_tree(Tree : IN Tree_type) IS
- BEGIN
- IF Tree /= NULL THEN
- Write_tree(Tree.Left);
- Text_io.Put_line(Tree.Str);
- Write_tree(Tree.Right);
- END IF;
- END Write_tree;
-
- -- Exported Write_tree;
- PROCEDURE Write_tree IS
- BEGIN
- Write_tree(Tree);
- END;
-
- END Sort_package;
-