Opened 3 years ago

Closed 16 months ago

#12514 closed bug (wontfix)

Can't write unboxed sum type constructors in prefix form

Reported by: RyanGlScott
Owned by:
Priority: normal
Milestone:
Component: Compiler (Parser)
Version: 8.1
Keywords: UnboxedSums
Cc: osa1
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

You can write (# Int | Char #), but not (# | #) Int Char. This is annoying since it prevents you from partially applying unboxed sum type constructors, and it also precludes you from doing cool things like reify ''(#||#) (as I woefully noted here).

Luckily, I don't think fixing this would be too hard. The special case of parsing unboxed tuple type constructors as prefix is handled here, so I think we'd just need to add a similar case for unboxed sums.

Change History (9)

comment:1 Changed 3 years ago by RyanGlScott

Well, it's not quite that simple, unfortunately. Vertical bars are a bit more finicky to parse than commas, so simply adding a new case to ntgtycon in Parser.y like so:

  • compiler/parser/Parser.y

    diff --git a/compiler/parser/Parser.y b/compiler/parser/Parser.y
    index b9479d9..fa0d0af 100644
    --- a/compiler/parser/Parser.y
    +++ b/compiler/parser/Parser.y
    @@ -75,7 +75,8 @@ import TcEvidence ( emptyTcEvBinds )
     import ForeignCall
     import TysPrim          ( eqPrimTyCon )
     import PrelNames        ( eqTyCon_RDR )
    -import TysWiredIn       ( unitTyCon, unitDataCon, tupleTyCon, tupleDataCon, nilDataCon,
    +import TysWiredIn       ( unitTyCon, unitDataCon, tupleTyCon, sumTyCon,
    +                          tupleDataCon, nilDataCon,
                               unboxedUnitTyCon, unboxedUnitDataCon,
                               listTyCon_RDR, parrTyCon_RDR, consDataCon_RDR )
     
    @@ -2861,6 +2862,9 @@ ntgtycon :: { Located RdrName } -- A "general" qualified tycon, exc
             | '(#' commas '#)'      {% ams (sLL $1 $> $ getRdrName (tupleTyCon Unboxed
                                                             (snd $2 + 1)))
                                            (mo $1:mc $3:(mcommas (fst $2))) }
    +        | '(#' bars '#)'        {% ams (sLL $1 $> $ getRdrName (sumTyCon
    +                                                        (snd $2 + 1)))
    +                                       (mo $1:mc $3:(mbars (fst $2))) }
             | '(' '->' ')'          {% ams (sLL $1 $> $ getRdrName funTyCon)
                                            [mop $1,mu AnnRarrow $2,mcp $3] }
             | '[' ']'               {% ams (sLL $1 $> $ listTyCon_RDR) [mos $1,mcs $2] }
    @@ -3468,6 +3472,11 @@ mcs ll = mj AnnCloseS ll
     mcommas :: [SrcSpan] -> [AddAnn]
     mcommas ss = map (\s -> mj AnnCommaTuple (L s ())) ss
     
    +-- | Given a list of the locations of vertical bars, provide a [AddAnn] with an
    +-- AnnVbar entry for each SrcSpan
    +mbars :: [SrcSpan] -> [AddAnn]
    +mbars ss = map (\s -> mj AnnVbar (L s ())) ss
    +
     -- |Get the location of the last element of a OrdList, or noSrcSpan
     oll :: OrdList (Located a) -> SrcSpan
     oll l =

doesn't quite make the cut:

Things that will parse successfully:

  • (#| #)
  • (# | #)
  • (#| | #)
  • (# | | #)

That is, all sum type constructors such that (1) there's a space between the last bar and the #), and (2) all bars are separated with at least one character of whitespace.

Things that fail to parse:

  • (#|#)
  • (# |#)
  • (#||#)
  • (#| |#)
  • (# | |#)
  • (# ||#)
  • (# || #) (interestingly, GHC will parse this as the type operator || surrounded by hash-parens)

Perhaps we should require that the bars in a prefix type constructor be separated by spaces? Or perhaps we can finagle the parser some more to fix the issues above?
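One plausible reading of the two lists above is maximal munch in the lexer: a run of symbol characters such as || or |# is consumed as a single operator token before the parser ever sees a lone bar. The sketch below is a toy tokenizer written only to illustrate that reading. It is not GHC's Lexer.x, and the Token type, isSymbolChar, and tokenize are all hypothetical names; the real lexer may behave differently in cases not listed in this ticket.

```haskell
import Data.Char (isSpace)

-- Toy token type (hypothetical; GHC's real tokens are different).
data Token
  = OpenHash    -- "(#"
  | CloseHash   -- "#)"
  | Bar         -- a lone "|"
  | Op String   -- any other maximal run of symbol characters
  | Other Char  -- anything else
  deriving (Eq, Show)

-- Characters that may appear in an operator symbol (assumed set).
isSymbolChar :: Char -> Bool
isSymbolChar c = c `elem` "!#$%&*+./<=>?@\\^|-~:"

-- Tokenize greedily: a run of symbol characters is taken whole,
-- so "||" and "|#" never split into separate bars.
tokenize :: String -> [Token]
tokenize [] = []
tokenize ('(' : '#' : rest) = OpenHash : tokenize rest
tokenize ('#' : ')' : rest) = CloseHash : tokenize rest
tokenize s@(c : rest)
  | isSpace c = tokenize rest
  | isSymbolChar c =
      let (run, rest') = span isSymbolChar s
      in (if run == "|" then Bar else Op run) : tokenize rest'
  | otherwise = Other c : tokenize rest
```

Under this toy model, tokenize "(#| #)" yields an opening hash-paren, one bar, and a closing hash-paren, whereas tokenize "(#|#)" yields an operator |# followed by a stray closing parenthesis, and tokenize "(# || #)" yields the operator || between hash-parens, which matches the behaviour noted in the last bullet.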

comment:2 Changed 3 years ago by simonpj

There's a debate to be had about what concrete syntax to use for sums, both unboxed and (not yet implemented) boxed.

But if we stick to the current unary notation, I rather think we should not allow spaces anywhere. Ditto for tuples. Maybe we should do it in the lexer, not the parser?

Also, for data constructors, what is the prefix form? E.g. instead of (#| True ||#), do we write

  • (#| ||#) True, or
  • (#_||#) True?

I prefer the latter. We should not have spaces in the middle of names? Simon

comment:3 in reply to:  2 Changed 3 years ago by RyanGlScott

Replying to simonpj:

But if we stick to the current unary notation, I rather think we should not allow spaces anywhere. Ditto for tuples. Maybe we should do it in the lexer, not the parser?

Just to be clear on the is/ought distinction being discussed, GHC currently accepts spaces in prefix tuple types/expressions/patterns:

$ /opt/ghc/head/bin/ghci -fobject-code -XUnboxedTuples
GHCi, version 8.1.20160819: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/rgscott/.ghci
λ> :t ( , , )
( , , ) :: a -> b -> c -> (a, b, c)
λ> :t (# , , #)
(# , , #) :: a -> b -> c -> (# a, b, c #)
λ> :k ( , , )
( , , ) :: * -> * -> * -> *
λ> :k (# , , #)
(# , , #) :: * -> * -> * -> TYPE 'GHC.Types.UnboxedTupleRep
λ> :t \a -> case a of ( , , ) x y z -> (# , , #) x y z
\a -> case a of ( , , ) x y z -> (# , , #) x y z
  :: (a, b, c) -> (# a, b, c #)

Also, there is a current restriction for unboxed sums (as noted in osa1's blog) that fully saturated applications of unboxed sum expressions must separate their bars by whitespace:

Data constructors use the same syntax, except we have to put spaces between bars. For example, if you have a type with 10 alternatives, you do something like (# | | | | value | | | | | #). Space between bars is optional in the type syntax, but not optional in the term syntax. The reason is because otherwise we’d have to steal some existing syntax. For example, (# ||| a #) can be parsed as singleton unboxed tuple of Control.Arrow.||| applied to an argument, or an unboxed sum with 4 alternatives.
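For reference, here is a minimal sketch of the term-level syntax that the quoted passage describes, with every bar separated by whitespace. The function names describe and mkSecond are made up for illustration, and the example assumes a GHC with UnboxedSums available.

```haskell
{-# LANGUAGE UnboxedSums #-}

-- Pattern matching: one clause per alternative, bars separated by spaces.
describe :: (# Int | Bool | Char #) -> String
describe (# n | | #) = "Int " ++ show n
describe (# | b | #) = "Bool " ++ show b
describe (# | | c #) = "Char " ++ show c

-- Construction: the value sits in the second of three slots.
mkSecond :: Bool -> (# Int | Bool | Char #)
mkSecond b = (# | b | #)
```

Note that both functions must name the sum type in full; there is no prefix constructor to apply partially, which is the limitation this ticket is about.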

So if we require that the prefix counterparts must not have whitespace, then we ought to consider what effects that would/should have on the above. Food for thought.

Also, for data constructors, what is the prefix form? E.g. instead of (#| True ||#), do we write

  • (#| ||#) True, or
  • (#_||#) True?

I prefer the latter. We should not have spaces in the middle of names?

Indeed, the latter notation is what GHC is using internally, I believe. But I'll be honest in that I'm not a huge fan of that notation. For one thing, the underscore in the expression (#_||#) True feels like it could represent a typed hole. Also, if (#_||#) True is allowed to appear in pattern syntax, is the underscore a wildcard pattern? Perhaps we could rule out these possibilities by carefully designing the lexer/parser, but it's worth thinking over.
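To spell out how the underscore notation encodes an alternative, here is a small sketch that builds such names from an alternative number and an arity. The helpers sumTypeOcc and sumDataOcc are hypothetical and are not GHC's internal functions; the sketch assumes 1-based alternatives and that the underscore marks the occupied slot.

```haskell
-- Name of the type constructor with the given arity, e.g. 3 gives "(#||#)".
sumTypeOcc :: Int -> Maybe String
sumTypeOcc arity
  | arity < 2 = Nothing
  | otherwise = Just ("(#" ++ replicate (arity - 1) '|' ++ "#)")

-- Name of the data constructor for alternative `alt` (1-based) of `arity`:
-- bars before the slot, an underscore for the slot, bars after it.
sumDataOcc :: Int -> Int -> Maybe String
sumDataOcc alt arity
  | arity < 2 || alt < 1 || alt > arity = Nothing
  | otherwise =
      Just ("(#" ++ replicate (alt - 1) '|' ++ "_"
                 ++ replicate (arity - alt) '|' ++ "#)")
```

With these definitions, sumDataOcc 1 3 produces (#_||#), the first alternative of a three-way sum. The arity is always the number of bars plus one, regardless of where the underscore sits.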

One more thing worth bringing up: in the UnpackedSumTypes wiki page, Richard brings up an interesting alternative syntax for unboxed sum expressions, where (# 0 of 3 | x #) would mean (# x | | #). If we adopted that, we could have a much less ambiguous prefix form:

(# 0 of 3 |#) x
\x -> case x of (# 0 of 3|#) x -> x

But I don't know if redesigning the term-level syntax is on the agenda. osa1 mentions it in the conclusion of his blog post, so maybe he can chime in on this.


comment:4 Changed 3 years ago by simonpj

GHC currently accepts spaces in prefix tuple types/expressions/patterns:

Indeed. But I don't think it should. It is an accident of implementation, not a goal.

comment:5 Changed 3 years ago by Ben Gamari <ben@…>

In 613d7455/ghc:

Template Haskell support for unboxed sums

This adds new constructors `UnboxedSumE`, `UnboxedSumT`, and
`UnboxedSumP` to represent unboxed sums in Template Haskell.

One thing you can't currently do is, e.g., `reify ''(#||#)`, since I
don't believe unboxed sum type/data constructors can be written in
prefix form.  I will look at fixing that as part of #12514.

Fixes #12478.

Test Plan: make test TEST=T12478_{1,2,3}

Reviewers: osa1, goldfire, austin, bgamari

Reviewed By: goldfire, bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2448

GHC Trac Issues: #12478

comment:6 Changed 3 years ago by RyanGlScott

While we await a design for prefix unboxed sum type/data constructors in Haskell, a convenient workaround for this issue is to just use Template Haskell. (I've opened Phab:D2854 for this.)
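A rough sketch of what the workaround looks like, assuming a template-haskell version that exports unboxedSumTypeName and unboxedSumDataName (the functions that patch introduces) alongside the UnboxedSumE constructor. The bindings intOrChar, secondOfThree, and secondOfThreePrefix are illustrative names only.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH

-- The type (# Int | Char #), built by applying the 2-ary sum type
-- constructor in prefix form via its Name.
intOrChar :: Type
intOrChar = ConT (unboxedSumTypeName 2) `AppT` ConT ''Int `AppT` ConT ''Char

-- The expression (# | True | #), using the dedicated TH constructor:
-- alternative 2 of arity 3.
secondOfThree :: Exp
secondOfThree = UnboxedSumE (ConE 'True) 2 3

-- The same alternative expressed as a prefix data constructor applied
-- to its argument.
secondOfThreePrefix :: Exp
secondOfThreePrefix = ConE (unboxedSumDataName 2 3) `AppE` ConE 'True
```

These are ordinary TH syntax trees, so they can be spliced or inspected like any others. Nothing here makes the prefix form writable in source code; it only gives Template Haskell a way to refer to the constructors by Name.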

comment:7 Changed 3 years ago by Ryan Scott <ryan.gl.scott@…>

In b5d788aa/ghc:

Introduce unboxedSum{Data,Type}Name to template-haskell

Summary:
In D2448 (which introduced Template Haskell support for unboxed
sums), I neglected to add `unboxedSumDataName` and `unboxedSumTypeName`
functions, since there wasn't any way you could write unboxed sum data or type
constructors in prefix form to begin with (see #12514). But even if you can't
write these `Name`s directly in source code, it would still be nice to be able
to use these `Name`s in Template Haskell (for instance, to be able to treat
unboxed sum type constructors like any other type constructors).

Along the way, this uncovered a minor bug in `isBuiltInOcc_maybe` in
`TysWiredIn`, which was calculating the arity of unboxed sum data constructors
incorrectly.

Test Plan: make test TEST=T12478_5

Reviewers: osa1, goldfire, austin, bgamari

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2854

GHC Trac Issues: #12478, #12514

comment:8 Changed 2 years ago by RyanGlScott

Keywords: UnboxedSums added

comment:9 Changed 16 months ago by RyanGlScott

Resolution: wontfix
Status: new → closed

I've lost interest in this ticket.
