- Path: sparky!uunet!spool.mu.edu!enterpoop.mit.edu!eru.mt.luth.se!lunic!sunic!news.funet.fi!funic!nokia.fi!ntc02.tele.nokia.fi!larsson
- From: larsson@ntc02.tele.nokia.fi
- Newsgroups: comp.lang.prolog
- Subject: Reading text files (continued)
- Message-ID: <1993Jan21.211044.1@ntc02.tele.nokia.fi>
- Date: 21 Jan 93 19:10:44 GMT
- Sender: usenet@noknic.nokia.fi (USENET at noknic)
- Organization: Nokia Telecommunications.
- Lines: 76
- Nntp-Posting-Host: ntc02.tele.nokia.fi
-
- This is a follow-up to the previous question on
- reading text files in Prolog:
-
- Dear Jamie,
-
- Thanks for your reply, where you said:
-
- >... can
- >you read the text file sentence by sentence, and store each
- >sentence in a data structure before parsing?
-
- Yes, what I thought of first was to read a sentence and assert
- it (or perhaps assertz it) as a list: something like
-
- sent(n,[this,is,a,sentence],additional_predicates_describing_sent).
-
- where n might be the number of the sentence in the actual text,
- the list would be the actual sentence as found in the text, and
- additional predicates would describe that very sentence on the
- sentential and, preferably also, on the supersentential level.
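
To make the idea concrete, here is a minimal sketch of what I have in mind. Only assertz/1 is standard; the helper names store_sentence/3 and sentence_words/2, and the description term type(declarative), are made up for illustration.

```prolog
:- dynamic sent/3.

% store_sentence(+N, +Words, +Info)
% Append one sentence fact to the end of the database.
% Info is a list of description terms (hypothetical, e.g. [type(declarative)]).
store_sentence(N, Words, Info) :-
    assertz(sent(N, Words, Info)).

% sentence_words(?N, ?Words)
% Look a stored sentence up again, by number or by content.
sentence_words(N, Words) :-
    sent(N, Words, _Info).
```

A query such as `store_sentence(1, [this,is,a,sentence], [type(declarative)]), sentence_words(1, W).` would then bind W to the stored word list.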
-
- Here I have two silly problems:
-
- 1) How would I be able to keep track
- of my position in the file all the time? After I've read the
- first sentence, I should go on 'seeing' the same file, but reading
- the following sentence, etc, etc... (This, I thought, was a typical
- uninformed problem; that's why I suspected the whole thing would
- be a FAQ.)
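
My guess is that an open stream simply remembers its own position, so that each read continues where the previous one stopped. A minimal sketch of that idea follows, assuming ISO-style stream predicates (open/3, get_char/2, at_end_of_stream/1, close/1), which my Arity Prolog may not provide under these names. The predicates load_text/1, read_all/2 and read_sentence/2 are my own names, and treating '.', '!' and '?' as sentence ends is a deliberate oversimplification.

```prolog
:- dynamic sent/3.

% load_text(+File): store every sentence of File as sent(N, Words, []).
load_text(File) :-
    open(File, read, Stream),
    read_all(Stream, 1),
    close(Stream).

% read_all(+Stream, +N): the stream keeps its own position, so each
% call to read_sentence/2 continues where the previous one stopped.
read_all(Stream, N) :-
    (   at_end_of_stream(Stream) -> true
    ;   read_sentence(Stream, Words),
        (   Words == [] -> N1 = N            % skip empty sentences
        ;   assertz(sent(N, Words, [])),
            N1 is N + 1
        ),
        read_all(Stream, N1)
    ).

% read_sentence(+Stream, -Words): read up to the next sentence end
% (or end of file) and return the words as a list of atoms.
read_sentence(Stream, Words) :-
    read_chars(Stream, Chars),
    chars_words(Chars, Words).

read_chars(Stream, Chars) :-
    get_char(Stream, C),
    (   C == end_of_file -> Chars = []
    ;   sentence_end(C)  -> Chars = []
    ;   Chars = [C|Rest],
        read_chars(Stream, Rest)
    ).

sentence_end('.').
sentence_end('!').
sentence_end('?').

% chars_words(+Chars, -Words): split on blanks, one atom per word.
chars_words(Chars, Words) :-
    skip_blanks(Chars, Rest),
    (   Rest == [] -> Words = []
    ;   take_word(Rest, WordChars, Rest1),
        atom_chars(Word, WordChars),
        Words = [Word|Words1],
        chars_words(Rest1, Words1)
    ).

skip_blanks([C|Cs], Rest) :- blank(C), !, skip_blanks(Cs, Rest).
skip_blanks(Cs, Cs).

take_word([C|Cs], [C|Ws], Rest) :- \+ blank(C), !, take_word(Cs, Ws, Rest).
take_word(Cs, [], Cs).

blank(' ').
blank('\n').
blank('\t').
blank('\r').
```

Only one sentence is held as a character list at any time; everything already read sits in the database as sent/3 facts.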
-
- 2) No big texts can be stored as lists, because the storage space
- allotted to atoms is restricted in most implementations of Prolog.
- This leaves me with the solution of storing the text in some 'string' format
- (which also seems to be non-standard among different implementations),
- and then converting those strings to lists when the parser needs them.
- What string format would be a) portable and b) well-behaved in the
- sense of NOT causing a stack overflow error message at some stage
- in the process?
-
- >Unless your parser
- >has to backtrack to a previous sentence, a possible solution
- >might be to have a top-level loop which reads in a sentence and
- >then does a standard parse of the sentence.
-
- This is probably a good way of getting started. Later on, it
- should be possible to backtrack over several sentences and,
- eventually, over any stretch of text of arbitrary length, which
- can be stored in the database. My work is strongly oriented
- towards translation (and writing), so I must learn to deal with
- entire texts.
-
- > Are you using DCGs to do the parsing? That might create
- >problems, if the DCG package expects to read a file itself.
-
- I would like to stick to DCGs as much as possible, because it's
- a clean formalism, which other linguists also seem to understand
- with VERY LITTLE additional study. However, DCGs may NOT be
- the final solution; they certainly do confine the programmer to
- a specific way of parsing (top down, left to right). But I think
- this is an issue which can only be addressed after I've gotten
- started and gained some experience.
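
To show what I mean by a clean formalism, here is a toy grammar for the example sentence above. It is a sketch only: the rules, the lexicon and the tree labels are made up for illustration, and I assume the usual `-->` notation and phrase/2 are available.

```prolog
% A toy DCG; each rule builds a parse tree in its argument.
sentence(s(NP, VP))    --> noun_phrase(NP), verb_phrase(VP).

noun_phrase(np(P))     --> pronoun(P).
noun_phrase(np(D, N))  --> det(D), noun(N).

verb_phrase(vp(V, NP)) --> verb(V), noun_phrase(NP).

% Illustrative lexicon, just enough for one sentence.
pronoun(pron(this))    --> [this].
det(det(a))            --> [a].
noun(n(sentence))      --> [sentence].
verb(v(be))            --> [is].

% parse(+Words, -Tree): succeeds if Words is one complete sentence.
parse(Words, Tree) :-
    phrase(sentence(Tree), Words).
```

The query `parse([this,is,a,sentence], T).` gives `T = s(np(pron(this)), vp(v(be), np(det(a), n(sentence))))`. Since the parser works on an ordinary word list, it does not care whether that list came straight from a file or from a stored sent/3 fact.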
-
- By the way, I'm using Arity Prolog on a PC, so the size of the database
- per se would not be a problem. The vendor promises 2 Gbyte. The
- use of dedicated Arity methods would render the programs non-portable
- to other platforms; there is a need for portability at least to
- various shades of UNIX.
-
- --Arne.
-
- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
- Arne Larsson Nokia Telecommunications
- Translator Transmission Systems, Customer Services
- larsson@ntc02.tele.nokia.fi P.O. Box 12, SF-02611 Espoo, Finland
- Phone +358 0 5117476, Fax +358 0 51044287
- *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
-