CHAPTER 19: LALR(1) Java Grammar |
![]() Previous |
![]() Java Language |
![]() Index |
![]() Next |
This chapter presents a grammar for Java. The grammar has been mechanically checked to insure that it is LALR(1).
The grammar for Java presented piecemeal in the preceding chapters is much better for exposition, but it cannot be parsed left-to-right with one token of lookahead because of certain syntactic peculiarities, some of them inherited from C and C++. These problems and the solutions adopted for the LALR(1) grammar are presented below, followed by the grammar itself.
There are five problems with the grammar presented in preceding chapters.
Consider the two groups of productions:
PackageName: Identifier PackageName . Identifier TypeName: Identifier PackageName . Identifier
and:
MethodName: Identifier AmbiguousName . Identifier AmbiguousName: Identifier AmbiguousName . Identifier
Now consider the partial input:
class Problem1 { int m() { hayden.
When the parser is considering the token hayden , with one-token lookahead to symbol ". ", it cannot yet tell whether hayden should be a PackageName that qualifies a type name, as in:
hayden.Dinosaur rex = new Hayden.Dinosaur(2);
or an AmbiguousName that qualifies a method name, as in:
hayden.print("Dinosaur Rex!");
Therefore, the productions shown above result in a grammar that is not LALR(1). There are also other problems with drawing distinctions among different kinds of names in the grammar.
The solution is to eliminate the nonterminals PackageName, TypeName, ExpressionName, MethodName, and AmbiguousName, replacing them all with a single nonterminal Name:
Name: SimpleName QualifiedName SimpleName: Identifier QualifiedName: Name . Identifier
A later stage of compiler analysis then sorts out the precise role of each name or name qualifier.
For related reasons, these productions in S4.3:
ClassOrInterfaceType: ClassType InterfaceType ClassType: TypeName InterfaceType: TypeName
were changed to:
ClassOrInterfaceType: Name ClassType: ClassOrInterfaceType InterfaceType: ClassOrInterfaceType
Consider the two groups of productions:
FieldDeclaration: FieldModifiersopt Type VariableDeclarators ; FieldModifiers: FieldModifier FieldModifiers FieldModifier FieldModifier: one of public protected private final static transient volatile
and:
MethodHeader: MethodModifiersopt ResultType MethodDeclarator Throwsopt MethodModifiers: MethodModifier MethodModifiers MethodModifier MethodModifier: one of public protected private static abstract final native synchronized
Now consider the partial input:
class Problem2 { public static int
When the parser is considering the token static , with one-token lookahead to symbol int -or, worse yet, considering the token public with lookahead to static -it cannot yet tell whether this will be a field declaration such as:
public static int maddie = 0;
or a method declaration such as:
public static int maddie(String art) { return art.length(); }
Therefore, the parser cannot tell with only one-token lookahead whether static (or, similarly, public ) should be reduced to FieldModifier or MethodModifier. Therefore, the productions shown above result in a grammar that is not LALR(1). There are also other problems with drawing distinctions among different kinds of modifiers in the grammar.
While not all contexts provoke the problem, the simplest solution is to combine all contexts in which such modifiers are used, eliminating all six of the nonterminals ClassModifiers (S8.1.2), FieldModifiers (S8.3.1), MethodModifiers (S8.4.3), ConstructorModifiers (S8.6.3), InterfaceModifiers (S9.1.2), and ConstantModifiers (S9.3) from the grammar, replacing them all with a single nonterminal Modifiers:
Modifiers: Modifier Modifiers Modifier Modifier: one of public protected private static abstract final native synchronized transient volatile
A later stage of compiler analysis then sorts out the precise role of each modifier and whether it is permitted in a given context.
Consider the two productions (shown after problem #2 has been corrected):
FieldDeclaration: Modifiersopt Type VariableDeclarators ;
and:
MethodHeader: Modifiersopt ResultType MethodDeclarator Throwsopt
where ResultType is defined as:
ResultType: Type void
Now consider the partial input:
class Problem3 { int julie
Note that, in this simple example, no Modifiers are present. When the parser is considering the token int , with one-token lookahead to symbol julie , it cannot yet tell whether this will be a field declaration such as:
int julie = 14;
or a method declaration such as:
int julie(String art) { return art.length(); }
Therefore, after the parser reduces int to the nonterminal Type, it cannot tell with only one-token lookahead whether Type should be further reduced to ResultType (for a method declaration) or left alone (for a field declaration). Therefore, the productions shown above result in a grammar that is not LALR(1).
The solution is to eliminate the ResultType production and to have separate alternatives for MethodHeader:
MethodHeader: Modifiersopt Type MethodDeclarator Throwsopt Modifiersopt void MethodDeclarator Throwsopt
This allows the parser to reduce int
to Type and then leave it as is, delaying the decision as to whether a field declaration or method declaration is in progress.
Consider the productions (shown after problem #1 has been corrected):
ArrayType: Type [ ]
and:
ArrayAccess: Name [ Expression ] PrimaryNoNewArray [ Expression ]
Now consider the partial input:
class Problem4 { Problem4() { peter[
When the parser is considering the token peter , with one-token lookahead to symbol [ , it cannot yet tell whether peter will be part of a type name, as in:
peter[] team;
or part of an array access, as in:
peter[3] = 12;
Therefore, after the parser reduces peter to the nonterminal Name, it cannot tell with only one-token lookahead whether Name should be reduced ultimately to Type (for an array type) or left alone (for an array access). Therefore, the productions shown above result in a grammar that is not LALR(1).
The solution is to have separate alternatives for ArrayType:
ArrayType: PrimitiveType [ ] Name [ ] ArrayType [ ]
This allows the parser to reduce peter
to Name and then leave it as is, delaying the decision as to whether an array type or array access is in progress.
Consider the production:
CastExpression: ( PrimitiveType ) UnaryExpression ( ReferenceType ) UnaryExpressionNotPlusMinus
Now consider the partial input:
class Problem5 { Problem5() { super((matthew)
When the parser is considering the token matthew , with one-token lookahead to symbol ) , it cannot yet tell whether (matthew) will be a parenthesized expression, as in:
super((matthew), 9);
or a cast, as in:
super((matthew)baz, 9);
Therefore, after the parser reduces matthew to the nonterminal Name, it cannot tell with only one-token lookahead whether Name should be further reduced to PostfixExpression and ultimately to Expression (for a parenthesized expression) or to ClassOrInterfaceType and then to ReferenceType (for a cast). Therefore, the productions shown above result in a grammar that is not LALR(1).
The solution is to eliminate the use of the nonterminal ReferenceType in the definition of CastExpression, which requires some reworking of both alternatives to avoid other ambiguities:
CastExpression: ( PrimitiveType Dimsopt ) UnaryExpression ( Expression ) UnaryExpressionNotPlusMinus ( Name Dims ) UnaryExpressionNotPlusMinus
This allows the parser to reduce matthew to Expression and then leave it there, delaying the decision as to whether a parenthesized expression or a cast is in progress. Inappropriate variants such as:
(int[])+3
and:
(matthew+1)baz
must then be weeded out and rejected by a later stage of compiler analysis.
The remaining sections of this chapter constitute a LALR(1) grammar for Java syntax, in which the five problems described above have been solved.
Goal: CompilationUnit
Literal: IntegerLiteral FloatingPointLiteral BooleanLiteral CharacterLiteral StringLiteral NullLiteral
Type: PrimitiveType ReferenceType PrimitiveType: NumericType boolean NumericType: IntegralType FloatingPointType IntegralType: one of byte short int long char FloatingPointType: one of float double ReferenceType: ClassOrInterfaceType ArrayType ClassOrInterfaceType: Name ClassType: ClassOrInterfaceType InterfaceType: ClassOrInterfaceType ArrayType: PrimitiveType [ ] Name [ ] ArrayType [ ]
Name: SimpleName QualifiedName SimpleName: Identifier QualifiedName: Name . Identifier
CompilationUnit: PackageDeclarationopt ImportDeclarationsopt TypeDeclarationsopt ImportDeclarations: ImportDeclaration ImportDeclarations ImportDeclaration TypeDeclarations: TypeDeclaration TypeDeclarations TypeDeclaration PackageDeclaration: package Name ; ImportDeclaration: SingleTypeImportDeclaration TypeImportOnDemandDeclaration SingleTypeImportDeclaration: import Name ; TypeImportOnDemandDeclaration: import Name . * ; TypeDeclaration: ClassDeclaration InterfaceDeclaration ;
Modifiers: Modifier Modifiers Modifier Modifier: one of public protected private static abstract final native synchronized transient volatile
ClassDeclaration: Modifiersopt class Identifier Superopt Interfacesopt ClassBody Super: extends ClassType Interfaces: implements InterfaceTypeList InterfaceTypeList: InterfaceType InterfaceTypeList , InterfaceType ClassBody: { ClassBodyDeclarationsopt } ClassBodyDeclarations: ClassBodyDeclaration ClassBodyDeclarations ClassBodyDeclaration ClassBodyDeclaration: ClassMemberDeclaration StaticInitializer ConstructorDeclaration ClassMemberDeclaration: FieldDeclaration MethodDeclaration
FieldDeclaration: Modifiersopt Type VariableDeclarators ; VariableDeclarators: VariableDeclarator VariableDeclarators , VariableDeclarator VariableDeclarator: VariableDeclaratorId VariableDeclaratorId = VariableInitializer VariableDeclaratorId: Identifier VariableDeclaratorId [ ] VariableInitializer: Expression ArrayInitializer
MethodDeclaration: MethodHeader MethodBody MethodHeader: Modifiersopt Type MethodDeclarator Throwsopt Modifiersopt void MethodDeclarator Throwsopt MethodDeclarator: Identifier ( FormalParameterListopt ) MethodDeclarator [ ] FormalParameterList: FormalParameter FormalParameterList , FormalParameter FormalParameter: Type VariableDeclaratorId Throws: throws ClassTypeList ClassTypeList: ClassType ClassTypeList , ClassType MethodBody: Block ;
StaticInitializer: static Block
ConstructorDeclaration: Modifiersopt ConstructorDeclarator Throwsopt ConstructorBody ConstructorDeclarator: SimpleName ( FormalParameterListopt ) ConstructorBody: { ExplicitConstructorInvocationopt BlockStatementsopt } ExplicitConstructorInvocation: this ( ArgumentListopt ) ; super ( ArgumentListopt ) ;
InterfaceDeclaration: Modifiersopt interface Identifier ExtendsInterfacesopt InterfaceBody ExtendsInterfaces: extends InterfaceType ExtendsInterfaces , InterfaceType InterfaceBody: { InterfaceMemberDeclarationsopt } InterfaceMemberDeclarations: InterfaceMemberDeclaration InterfaceMemberDeclarations InterfaceMemberDeclaration InterfaceMemberDeclaration: ConstantDeclaration AbstractMethodDeclaration ConstantDeclaration: FieldDeclaration AbstractMethodDeclaration: MethodHeader ;
ArrayInitializer: { VariableInitializersopt ,opt } VariableInitializers: VariableInitializer VariableInitializers , VariableInitializer
Block: { BlockStatementsopt } BlockStatements: BlockStatement BlockStatements BlockStatement BlockStatement: LocalVariableDeclarationStatement Statement LocalVariableDeclarationStatement: LocalVariableDeclaration ; LocalVariableDeclaration: Type VariableDeclarators Statement: StatementWithoutTrailingSubstatement LabeledStatement IfThenStatement IfThenElseStatement WhileStatement ForStatement StatementNoShortIf: StatementWithoutTrailingSubstatement LabeledStatementNoShortIf IfThenElseStatementNoShortIf WhileStatementNoShortIf ForStatementNoShortIf StatementWithoutTrailingSubstatement: Block EmptyStatement ExpressionStatement SwitchStatement DoStatement BreakStatement ContinueStatement ReturnStatement SynchronizedStatement ThrowStatement TryStatement EmptyStatement: ; LabeledStatement: Identifier : Statement LabeledStatementNoShortIf: Identifier : StatementNoShortIf ExpressionStatement: StatementExpression ; StatementExpression: Assignment PreIncrementExpression PreDecrementExpression PostIncrementExpression PostDecrementExpression MethodInvocation ClassInstanceCreationExpression
IfThenStatement: if ( Expression ) Statement IfThenElseStatement: if ( Expression ) StatementNoShortIf else Statement IfThenElseStatementNoShortIf: if ( Expression ) StatementNoShortIf else StatementNoShortIf SwitchStatement: switch ( Expression ) SwitchBlock SwitchBlock: { SwitchBlockStatementGroupsopt SwitchLabelsopt } SwitchBlockStatementGroups: SwitchBlockStatementGroup SwitchBlockStatementGroups SwitchBlockStatementGroup SwitchBlockStatementGroup: SwitchLabels BlockStatements SwitchLabels: SwitchLabel SwitchLabels SwitchLabel SwitchLabel: case ConstantExpression : default : WhileStatement: while ( Expression ) Statement WhileStatementNoShortIf: while ( Expression ) StatementNoShortIf DoStatement: do Statement while ( Expression ) ;
ForStatement: for ( ForInitopt ; Expressionopt ; ForUpdateopt ) Statement ForStatementNoShortIf: for ( ForInitopt ; Expressionopt ; ForUpdateopt ) StatementNoShortIf ForInit: StatementExpressionList LocalVariableDeclaration ForUpdate: StatementExpressionList StatementExpressionList: StatementExpression StatementExpressionList , StatementExpression BreakStatement: break Identifieropt ; ContinueStatement: continue Identifieropt ; ReturnStatement: return Expressionopt ; ThrowStatement: throw Expression ; SynchronizedStatement: synchronized ( Expression ) Block TryStatement: try Block Catches try Block Catchesopt Finally Catches: CatchClause Catches CatchClause CatchClause: catch ( FormalParameter ) Block Finally: finally Block
Primary: PrimaryNoNewArray ArrayCreationExpression PrimaryNoNewArray: Literal this ( Expression ) ClassInstanceCreationExpression FieldAccess MethodInvocation ArrayAccess ClassInstanceCreationExpression: new ClassType ( ArgumentListopt ) ArgumentList: Expression ArgumentList , Expression ArrayCreationExpression: new PrimitiveType DimExprs Dimsopt new ClassOrInterfaceType DimExprs Dimsopt DimExprs: DimExpr DimExprs DimExpr DimExpr: [ Expression ] Dims: [ ] Dims [ ] FieldAccess: Primary . Identifier super . Identifier MethodInvocation: Name ( ArgumentListopt ) Primary . Identifier ( ArgumentListopt ) super . Identifier ( ArgumentListopt ) ArrayAccess: Name [ Expression ] PrimaryNoNewArray [ Expression ] PostfixExpression: Primary Name PostIncrementExpression PostDecrementExpression PostIncrementExpression: PostfixExpression ++ PostDecrementExpression: PostfixExpression -- UnaryExpression: PreIncrementExpression PreDecrementExpression + UnaryExpression - UnaryExpression UnaryExpressionNotPlusMinus PreIncrementExpression: ++ UnaryExpression PreDecrementExpression: -- UnaryExpression UnaryExpressionNotPlusMinus: PostfixExpression ~ UnaryExpression ! UnaryExpression CastExpression CastExpression: ( PrimitiveType Dimsopt ) UnaryExpression ( Expression ) UnaryExpressionNotPlusMinus ( Name Dims ) UnaryExpressionNotPlusMinus MultiplicativeExpression: UnaryExpression MultiplicativeExpression * UnaryExpression MultiplicativeExpression / UnaryExpression MultiplicativeExpression % UnaryExpression AdditiveExpression: MultiplicativeExpression AdditiveExpression + MultiplicativeExpression AdditiveExpression - MultiplicativeExpression ShiftExpression: AdditiveExpression ShiftExpression << AdditiveExpression ShiftExpression >> AdditiveExpression ShiftExpression >>> AdditiveExpression RelationalExpression: ShiftExpression RelationalExpression < ShiftExpression RelationalExpression > ShiftExpression RelationalExpression <= ShiftExpression RelationalExpression >= ShiftExpression RelationalExpression instanceof ReferenceType EqualityExpression: RelationalExpression EqualityExpression == RelationalExpression EqualityExpression != RelationalExpression AndExpression: EqualityExpression AndExpression & EqualityExpression ExclusiveOrExpression: AndExpression ExclusiveOrExpression ^ AndExpression InclusiveOrExpression: ExclusiveOrExpression InclusiveOrExpression | ExclusiveOrExpression ConditionalAndExpression: InclusiveOrExpression ConditionalAndExpression && InclusiveOrExpression ConditionalOrExpression: ConditionalAndExpression ConditionalOrExpression || ConditionalAndExpression ConditionalExpression: ConditionalOrExpression ConditionalOrExpression ? Expression : ConditionalExpression AssignmentExpression: ConditionalExpression Assignment Assignment: LeftHandSide AssignmentOperator AssignmentExpression LeftHandSide: Name FieldAccess ArrayAccess AssignmentOperator: one of = *= /= %= += -= <<= >>= >>>= &= ^= |= Expression: AssignmentExpression ConstantExpression: Expression