How to interpret stdout as utf8? #83

Open
hvr opened this issue Dec 22, 2014 · 5 comments

Comments

@hvr

hvr commented Dec 22, 2014

If I run something like

runGit d "cat-file" ["commit", ref ]

and the environment has a non-UTF-8 locale set, such as LANG=C, I get

Exception: fd:5: hGetLine: invalid argument (invalid byte sequence)

What's the recommended way to run a command that is known to always write UTF-8 to its stdout/stderr, regardless of the LANG setting?

@hvr
Author

hvr commented Dec 22, 2014

Fwiw, I'm currently using the following ugly non-reentrant hack as a workaround:

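import Control.Monad.IO.Class (liftIO)
import GHC.IO.Encoding (TextEncoding (..), getLocaleEncoding, setLocaleEncoding, utf8)
import Shelly

-- Temporarily switch the process-wide locale encoding to UTF-8 while running act.
-- Still not reentrant: the locale encoding is global state shared by all threads.
withUtf8 :: Sh a -> Sh a
withUtf8 act = do
    oldloc <- liftIO getLocaleEncoding
    if textEncodingName oldloc == textEncodingName utf8
    then act
    else do
        liftIO $ setLocaleEncoding utf8
        -- restore the previous encoding even if act throws
        act `finally_sh` liftIO (setLocaleEncoding oldloc)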

@gregwebs
Owner

I have never seen that predicament. I always change the system locale to something like C.UTF8.

At the lower level Shelly is reading from a Handle. It has a runHandles function that lets you operate on the handles directly, which you could try. If you can verify that calling hSetEncoding on the handle fixes this, I could add a convenience function to Shelly that sets the encoding on the handle for a command.
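To check the per-handle idea outside Shelly, here is a minimal sketch that uses System.Process from the process package directly (not Shelly's API). It sets the encoding of the child's stdout pipe to utf8 with hSetEncoding before reading, so decoding no longer depends on LANG. readProcessUtf8 is a hypothetical helper name, and the sketch assumes the command really does emit UTF-8.

```haskell
import System.IO (hGetContents, hSetEncoding, utf8)
import System.Process (CreateProcess (..), StdStream (..), createProcess, proc, waitForProcess)

-- Hypothetical helper: run a command and decode its stdout as UTF-8,
-- whatever LANG says. Only this one pipe Handle is changed, so the
-- process-wide locale encoding (and other threads) are unaffected.
readProcessUtf8 :: FilePath -> [String] -> IO String
readProcessUtf8 cmd args = do
  (_, Just out, _, ph) <- createProcess (proc cmd args) { std_out = CreatePipe }
  hSetEncoding out utf8
  s <- hGetContents out
  length s `seq` return ()   -- read all output before waiting on the child
  _ <- waitForProcess ph
  return s
```

For the command from the original report this would be readProcessUtf8 "git" ["cat-file", "commit", ref] with ref as a String. Because only one Handle is touched, this avoids the reentrancy problem of switching the global locale encoding.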

@gregwebs
Owner

You may be able to just use runHandle depending on where the error is popping up.
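A sketch of what that could look like with Shelly's runHandle, which passes the command's stdout Handle to a callback. catCommitUtf8 is a hypothetical name; this assumes runHandle :: FilePath -> [Text] -> (Handle -> Sh a) -> Sh a as in Shelly 1.x, and OverloadedStrings so that "git" works whether Shelly's FilePath is String or the older system-filepath type. It only covers stdout: as far as I can tell, stderr is still read by Shelly itself with the locale encoding, so non-ASCII output on stderr could still trigger the error, and runHandles would be needed to reach that handle too.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.IO.Class (liftIO)
import Data.Text (Text)
import qualified Data.Text.IO as T
import Shelly
import System.IO (hSetEncoding, utf8)

-- Hypothetical helper: read `git cat-file commit <ref>` as UTF-8.
-- The encoding is fixed on the stdout Handle before anything is read,
-- so LANG=C no longer matters for stdout.
catCommitUtf8 :: Text -> Sh Text
catCommitUtf8 ref =
  runHandle "git" ["cat-file", "commit", ref] $ \h -> liftIO $ do
    hSetEncoding h utf8
    T.hGetContents h   -- strict read of the whole stdout
```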

@hsenag
Contributor

hsenag commented Apr 10, 2015

I've got a similar problem with the Darcs test suite. It contains a couple of scripts with different encodings to test how Darcs handles that, and they fail when run via Shelly because the output isn't valid UTF-8 (I think).

Hacking Shelly to do either hSetBinaryMode True or hSetEncoding char8 on both outH and errH in runFoldLines fixed the problem.
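To illustrate why char8 makes the failure go away, here is a small base-only sketch. A file stands in for the process pipe so the bytes are deterministic, and writeBytes and readWith are hypothetical helper names. The same two bytes decode fine with char8 (one Char per byte) but raise an invalid byte sequence error with utf8. According to the base documentation, hSetBinaryMode h True has the same effect as char8 plus turning off newline translation.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

-- Hypothetical helper: write raw bytes (each Char must be < 256).
writeBytes :: FilePath -> [Char] -> IO ()
writeBytes path bytes = withBinaryFile path WriteMode (\h -> hPutStr h bytes)

-- Hypothetical helper: read a file through the given TextEncoding.
-- The whole result is forced so any decoding error is caught by try.
readWith :: TextEncoding -> FilePath -> IO (Either IOException String)
readWith enc path = try $ withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  s <- hGetContents h
  _ <- evaluate (length s)   -- force the lazy read while the handle is open
  return s
```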

If you could add a hook to allow clients to do that properly that'd be great. Happy to write the code myself if you can provide some guidance on how you'd like the hook to look.

@gregwebs
Copy link
Owner

@hsenag I believe you are reporting a different issue. This issue is about setting the locale for UTF-8 data; yours is that you want to work with binary data. Let's make a separate GitHub issue for that.

3 participants