home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!ogicse!das-news.harvard.edu!spdcc!iecc!compilers-sender
- From: pab@cs.arizona.edu (Peter A. Bigot)
- Newsgroups: comp.compilers
- Subject: Re: Theoretical CFG question
- Keywords: parse
- Message-ID: <92-11-085@comp.compilers>
- Date: 16 Nov 92 20:00:27 GMT
- Article-I.D.: comp.92-11-085
- References: <92-11-067@comp.compilers>
- Sender: compilers-sender@iecc.cambridge.ma.us
- Reply-To: pab@cs.arizona.edu (Peter A. Bigot)
- Organization: U of Arizona CS Dept, Tucson
- Lines: 167
- Approved: compilers@iecc.cambridge.ma.us
-
- norlin@essex.ecn.uoknor.edu (Norman Lin) writes:
- >The problem was:
- > *
- >For i>=1, let b(i) denote the string in 1(0+1) that is the
- >binary representation of i.
- >
- >Construct a CFG generating
- > +
- > {0,1,#} - {b(1)#b(2)# ... #b(n) | n>=1}
-
- I'll refrain from specific comments about a professor who isn't interested
- in addressing questions about problems nobody in a class can solve (did
- you try asking hir in office hours?).
-
- This is problem 4.3 in Hopcroft and Ullman. Note that although the
- complement of the language is not a CFL (pump on it), CFL's aren't closed
- under complement; they are closed under union, though, so you can simply
- define CFGs for each way a string could fail to be in the form
- b_1#b_2#...#b_n.
-
- I've attached my own solution to this from a couple years back (this is
- ugly TeX before I switched to cleaner LaTeX; the idea should be present).
- The conclusion of the class (and the professor), however, was that the
- problem was way too hard; and if the even numbered b_i were given in
- reverse, the resulting grammar would be feasible (easier to generate
- incorrect mirror images). As noted, the actual grammar below is not
- correct.
-
- \def\re#1{
- \ifmmode{\rm\bf #1}
- \else{\bf #1}
- \fi}
-
- \proofsect {Method}: The idea is to split $L$ into several languages, each
- of which can be constructed using a context free grammar, then merging the
- grammars to form one for $L$.
-
- {\advance\leftskip by 0.25in
- \item{1} Strings which do not begin with $b_1$ are not in the language.\par
- \item{2} Strings in which one of the purported $b_i$ does not begin with a
- $\re1$ are not in the language.\par
- \item{3} Strings in which some sequence purported to be $b_i\#b_{i+1}$ has an
- error are not in the language. I assume that $b_i$ is well formed, and the
- error is in $b_{i+1}$; without this assumption, correctness is harder to
- ensure (variations in $b_i$ may make the supposedly ill-formed $b_{i+1}$
- correct). This set can be further subdivided:\par
- {\advance\leftskip by 0.25in
- \item{3a} If we assume that $b_i = \re1^n$, then $b_{i+1}$ should be
- $\re1\re0^n$. The error is either in the leading \re1 or in the
- sequence of $n$ \re0s of $b_{i+1}$.\par
- \item{3b}. Otherwise, $b_i$ is of the form $\rho01^n$ where $n \ge 0$.
- $b_{i+1}$ should then be $\rho10^n$. The error can occur in four places:
- extra characters before the prefix, or an incorrect character match in
- the prefix, the \re1 immediately following the prefix, or the trailing
- \re0s.\par
- }}
-
- \hang A grammar which was derived from this partitioning is attached.
- Nonterminals are enclosed in angle brackets, and productions can be combined
- using a vertical bar when the nonterminal being rewritten is the same.
- (Although this grammar does not seem to generate strings which are not in $L$,
- there are strings of $L$ which it will not generate. Right idea; wrong
- details---this really is a horrible problem.)
-
- // Solution to problem 4.3, Hopcroft & Ullman
- //
- 01# // symbols in CFG
- <S> // External nonterminals
- <S> // Start symbol
-
- <S> -> <L1> // $w where w != (1 or 1#)
- | <L2> // #w where w [1] != 1
- | <H><L3><T> // there is a bi#bj where i+1 <> j in the mix
-
- // Odds and ends--sequences of ones, digits, numbers, anything
- <O> -> 1<O>
- | //epsilon
- <D> -> 1
- | 0
- <N> -> <D><N>
- | //epsilon
- <Q> -> 0
- | 1
- | #
- <W> -> <Q><W>
- | //epsilon
-
- // Sequence starts with something other than b1
- <L1> -> 0<W>
- | #<W>
- | 1<D><W>
-
- // There is an ill-formed integer somewhere in the mix
- <L2> -> <W>#0<W>
- | <W>##<W>
-
- // Head and tail of string: ensure that <H>w<T> makes w a bi#bj sequence.
- <H> -> <W>#
- | //epsilon
- <T> -> #<W>
- | //epsilon
-
- <L3> -> <La> // Inverse of 1^n#10^n, intersect 1*#<W>
- | <Lb> // Inverse of p01^n#p10^n
-
- <La> -> 1<La>0 // 1s and 0s still balance
- | 1<La_F>1 // Bad 1 on right
- | #0<N> // Error--0 starts right
- | #1<D><N> // Too much on right
- | <O># // Too many 1s on left
- <La_F> -> 1<La_F><Q> // Maintain balance of 1s/0s
- | #<N> // Tack on whatever you want on right
-
- // Inverse of p01^n#p10^n
- // Assume error in prefix: 0 on left, 1 on right
- <P-L-A0> -> <D><P-L-A0><D> | 0 // Error is center 0; generate prefix around it
- <P-L-B0> -> <D><P-L-A0>0 // Generate 0 following prefix
- <P-L-C0> -> <D><P-L-C0>1 | <P-L-B0> // Generate trailing 1s in left side
- <P-L-D0> -> <D><P-L-C0>#<N> // Generate separating hash
- <P-L-E0> -> <D><P-L-E0><D> | <P-L-D0> // Generate upper end of right prefix
- <P-L-A1> -> <D><P-L-A1><D> | 1 // Error is center 1; generate prefix around it
- <P-L-B1> -> <D><P-L-A1>0 // Generate 0 following prefix
- <P-L-C1> -> <D><P-L-C1>1 | <P-L-B1> // Generate trailing 1s in left side
- <P-L-D1> -> <D><P-L-C1>#<N> // Generate separating hash
- <P-L-E1> -> <D><P-L-E1><D> | <P-L-D1> // Generate upper end of right prefix
- <P-R-A1> -> <D><P-R-A1><D> | 1 // Error is center 1; generate prefix around it
- <P-R-B1> -> <N>#<P-R-A1><D> // Separating hash
- <P-R-C1> -> 1<P-R-C1><D> | <P-R-B1> // Tail 1s of left side
- <P-R-D1> -> 0<P-R-C1><D> // After prefix 0 on left side
- <P-R-E1> -> <D><P-R-E1><D> | <P-R-D1> // Right edge of left prefix
- <P-R-A0> -> <D><P-R-A0><D> | 0 // Error is center 0; generate prefix around it
- <P-R-B0> -> <N>#<P-R-A0><D> // Separating hash
- <P-R-C0> -> 1<P-R-C0><D> | <P-R-B0> // Tail 1s of left side
- <P-R-D0> -> 0<P-R-C0><D> // After prefix 0 on left side
- <P-R-E0> -> <D><P-R-E0><D> | <P-R-D0> // Right edge of left prefix
-
- // Assume error at first zero after prefix: pairs with 0 on right
- <C-L-A0> -> <D><C-L-A0>1 | 0 // 0 after prefix
- <C-L-B0> -> <D><C-L-A0># // Trailing 1s plus hash
- <C-L-C0> -> <D><C-L-C0><D> | <C-L-B0> // Part of right prefix
- <C-R-Z0> -> <D><C-R-Z0>0 | 0 // Make "1" after prefix a 0
- <C-R-Y0> -> #<C-R-Z0>0 // Separating hash
- <C-R-X0> -> 1<C-R-X0>0 | <C-R-Y0> // Last 1s of left side
-
- // Assume error in trailing 1s on left side: pairs with 1 on right
-
- <T-A> -> 1<T-A><D> // Last 1s of left tail, last Xs of right tail
- | #<N>1 // Sep, prefix, 0, 1s, and error
-
- <Lb> -> <P-L-A0><P-R-E1> // Break before end of left prefix
- | <P-L-C0><P-R-C1> // Break somewhere in trailing 1s on left
- | <P-L-E0><P-R-A1> // Break somewhere in left of right prefix
- | <P-L-A1><P-R-E0> // Break before end of left prefix
- | <P-L-C1><P-R-C0> // Break somewhere in trailing 1s on left
- | <P-L-E1><P-R-A0> // Break somewhere in left of right prefix
- // Error in first zero after left side prefix
- | <C-L-A0><C-R-X0> // Break on left side
- | <C-L-C0><C-R-Z0> // Break on right side
- // Error in trailing 1s of left side after prefix
- | <N>1<T-A>
-
- --
- Peter A. Bigot -- pab@cs.arizona.edu
- Dept. of Computer Science, University of Arizona, Tucson AZ
- --
- Send compilers articles to compilers@iecc.cambridge.ma.us or
- {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
-