NetNews Usenet Archive 1993 #3




Internet Message Format | 1993-01-21 | 3.6 KB

Path: sparky!uunet!spool.mu.edu!enterpoop.mit.edu!eru.mt.luth.se!lunic!sunic!news.funet.fi!funic!nokia.fi!ntc02.tele.nokia.fi!larsson
From: larsson@ntc02.tele.nokia.fi
Newsgroups: comp.lang.prolog
Subject: Reading text files (continued)
Message-ID: <1993Jan21.211044.1@ntc02.tele.nokia.fi>
Date: 21 Jan 93 19:10:44 GMT
Sender: usenet@noknic.nokia.fi (USENET at noknic)
Organization: Nokia Telecommunications.
Lines: 76
Nntp-Posting-Host: ntc02.tele.nokia.fi

This is a follow-up to the previous question on reading text files in Prolog:

Dear Jamie,

Thanks for your reply, where you said:

>... can
>you read the text file sentence by sentence, and store each
>sentence in a data structure before parsing?

Yes, what I thought of first was to read a sentence and assert it (or perhaps assertz it) as a list, something like

  sent(n,[this,is,a,sentence],additional_predicates_describing_sent).

where n might be the number of the sentence in the actual text, the list would be the actual sentence as found in the text, and the additional predicates would describe that very sentence at the sentential and, preferably, also at the supersentential level.

Here I have two silly problems:

1) How would I be able to keep track of my position in the file all the time? After I've read the first sentence, I should go on 'seeing' the same file, but reading the following sentence, etc., etc. (This, I thought, was a typical uninformed problem; that's why I suspected the whole thing would be a FAQ.)

2) No big texts can be stored as lists, because the storage space allotted to atoms is restricted in most implementations of Prolog. This leaves me with the solution of storing them in some 'string' format (which also seems to be non-standard among different implementations), and then converting those strings to lists when the parser needs them. What string format would be a) portable and b) well-behaved, in the sense of NOT causing a stack overflow error message at some stage in the process?
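A minimal sketch of problems 1) and 2) in ISO-style Prolog, deliberately avoiding Arity-specific built-ins. The predicate names (load_sentences/1, read_sentence/3 and the helpers) are hypothetical, and the tokenizing is crude by assumption: a sentence ends at '.', '?' or '!', a word is a run of letters, digits, apostrophes or non-ASCII characters, and everything else is skipped. On problem 1), the stream returned by open/3 remembers its own read position, so each call to read_sentence/3 resumes where the previous one stopped, with no need to re-'see' the file. On problem 2), each word is stored as an atom, and in the usual implementations an atom is entered in the atom table only once, however often the word recurs. The third argument of sent/3 starts out as [] and is left for later annotations.

```prolog
% Sketch (hypothetical names): read a text file one sentence at a time
% and store each sentence as a fact sent(N, Words, Info).
% Assumptions: a sentence ends at '.', '?' or '!'; words are runs of
% letters, digits, apostrophes or non-ASCII characters; Info starts as [].

:- dynamic(sent/3).

% load_sentences(+File): replace any stored sentences with those in File.
load_sentences(File) :-
    retractall(sent(_, _, _)),
    open(File, read, Stream),
    load_loop(Stream, 1),
    close(Stream).

% The open stream keeps its own position, so each read_sentence/3 call
% continues right after the sentence read by the previous call.
load_loop(Stream, N) :-
    read_sentence(Stream, Words, Status),
    (   Words == []                  % e.g. ".." or trailing blanks: store nothing
    ->  N1 = N
    ;   assertz(sent(N, Words, [])),
        N1 is N + 1
    ),
    (   Status == eof
    ->  true
    ;   load_loop(Stream, N1)
    ).

% read_sentence(+Stream, -Words, -Status): collect words, spelled as in
% the text, up to a sentence terminator (Status = stop) or the end of
% the file (Status = eof).
read_sentence(Stream, Words, Status) :-
    get_char(Stream, C),
    (   C == end_of_file
    ->  Words = [], Status = eof
    ;   sentence_end(C)
    ->  Words = [], Status = stop
    ;   word_char(C)
    ->  read_word_chars(Stream, Cs),
        atom_chars(Word, [C|Cs]),
        Words = [Word|Rest],
        read_sentence(Stream, Rest, Status)
    ;   read_sentence(Stream, Words, Status)   % skip blanks, commas, etc.
    ).

% read_word_chars(+Stream, -Cs): the remaining characters of a word.
% peek_char/2 leaves a terminator in the stream for read_sentence/3.
read_word_chars(Stream, Cs) :-
    peek_char(Stream, C),
    (   C \== end_of_file, word_char(C)
    ->  get_char(Stream, C),
        Cs = [C|Rest],
        read_word_chars(Stream, Rest)
    ;   Cs = []
    ).

sentence_end('.').
sentence_end('?').
sentence_end('!').

% word_char(+C): crude assumption; any non-ASCII character counts as a
% letter so that words such as "käännös" stay whole.
word_char(C) :-
    char_code(C, Code),
    (   Code >= 0'a, Code =< 0'z
    ;   Code >= 0'A, Code =< 0'Z
    ;   Code >= 0'0, Code =< 0'9
    ;   C == '\''
    ;   Code > 127
    ),
    !.
```

After load_sentences('text.txt'), a query such as sent(N, Words, Info) enumerates the stored sentences in text order on backtracking, since assertz/1 appends each new clause at the end.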
>Unless your parser
>has to backtrack to a previous sentence, a possible solution
>might be to have a top-level loop which reads in a sentence and
>then does a standard parse of the sentence.

This is probably a good way to get started. Later on, it should be possible to backtrack over several sentences and, eventually, over any stretch of text of arbitrary length, which can be stored in the database. My work is strongly oriented towards translation (and writing), so I must learn to deal with entire texts.

> Are you using DCGs to do the parsing? That might create
>problems, if the DCG package expects to read a file itself.

I would like to stick to DCGs as much as possible, because they are a clean formalism which other linguists also seem to understand with VERY LITTLE additional study. However, DCGs may NOT be the final solution; they certainly do confine the programmer to a specific way of parsing (top-down, left-to-right). But I think this is an issue which can only be addressed after I've gotten started and gained some experience.

By the way, I'm using Arity Prolog on a PC, so the size of the database per se would not be a problem: the vendor promises 2 Gbyte. The use of dedicated Arity methods would render the programs non-portable to other platforms, and there is a need for portability at least to various shades of UNIX.

--Arne.

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Arne Larsson                     Nokia Telecommunications
Translator                       Transmission Systems, Customer Services
larsson@ntc02.tele.nokia.fi      P.O. Box 12, SF-02611 Espoo, Finland
Phone +358 0 5117476, Fax +358 0 51044287
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
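Jamie's top-level loop fits DCGs well, because phrase/2 takes the word list as an argument: the grammar parses whatever list it is handed and never reads the file itself. A minimal sketch, assuming the sentences are already stored as sent/3 facts of the kind proposed in the post; the grammar, the lexicon, the parse-tree shapes and the predicate parse_all/0 are hypothetical toys for illustration, not a suggestion for a real grammar.

```prolog
% Sketch: a top-level loop that hands each stored sentence to a DCG
% through phrase/2, so the grammar never reads the file itself.
% The grammar, the lexicon and the sent/3 facts are hypothetical toys.

:- dynamic(sent/3).
:- dynamic(parsed/2).

% Sample sentences, as a file reader might have asserted them.
sent(1, [this, is, a, sentence], []).
sent(2, [the, parser, reads, a, text], []).
sent(3, [sentence, a, is, this], []).        % deliberately ungrammatical

% Toy grammar: each nonterminal builds a parse tree in its argument.
s(s(NP, VP))  --> np(NP), vp(VP).
np(np(D, N))  --> det(D), noun(N).
np(np(P))     --> pron(P).
vp(vp(V, NP)) --> verb(V), np(NP).

det(det(W))   --> [W], { lex(det, W) }.
noun(n(W))    --> [W], { lex(noun, W) }.
pron(pron(W)) --> [W], { lex(pron, W) }.
verb(v(W))    --> [W], { lex(verb, W) }.

% Toy lexicon (assumption).
lex(det, a).
lex(det, the).
lex(noun, sentence).
lex(noun, parser).
lex(noun, text).
lex(pron, this).
lex(pron, it).
lex(verb, is).
lex(verb, reads).

% parse_all: failure-driven loop over the stored sentences, in order.
% The first parse of each is recorded as parsed(N, Tree), or as
% parsed(N, no_parse) when the grammar rejects the sentence.
parse_all :-
    retractall(parsed(_, _)),
    (   sent(N, Words, _),
        (   phrase(s(Tree), Words)
        ->  Result = Tree
        ;   Result = no_parse
        ),
        assertz(parsed(N, Result)),
        fail
    ;   true
    ).
```

In a real program the sent/3 facts would be asserted while the file is read rather than written in by hand. Because they stay in the database, a grammar that later needs to look across sentence boundaries can still retrieve neighbouring sentences by number.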