Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Platform agnostic string types #566

Open
Yangff opened this issue Jun 14, 2024 · 9 comments
Open

Platform agnostic string types #566

Yangff opened this issue Jun 14, 2024 · 9 comments
Labels
Collaborators Discussion intended for collaborators

Comments

@Yangff
Copy link
Contributor

Yangff commented Jun 14, 2024

The current implementation of UE4SS uses File::StringType = std::wstring and wchar_t as the primary character types, assuming compatibility with Unreal's main character type TCHAR, and uses UTF-16 encoding. Additionally, UE4SS uses this type as the working type for the platform file system. The following issues have been observed:

  1. On the Linux platform, the underlying type for std::wstring is wchar_t, whose sizeof is 4, using UTF-32 instead of UTF-16.
  2. Across all platforms, std::u16string has compatibility issues with existing C++ STL, such as not being accepted by std::format. On the Linux platform, neither wstring nor u16string is directly accepted by Linux or the C standard library. On Windows, although wchar_t and char16_t are essentially the same, they cannot be implicitly converted, and using u16string introduces potential issues.
  3. Similarly, when interacting with Rust, char16_t should be used.
  4. UE4SS files should use UTF-8 encoding.

Therefore, the following changes are planned (please make modifications):

  1. Remove the string type definition from File and redefine StringType as std::basic_string<TCHAR> as part of Unreal, no longer belonging to the File namespace.
  2. Use fmtlib that supports u16string to provide std::format.
  3. File will continue to provide file APIs (without breaking ABI) and might be replaced by fmt in the future (?).
  4. Unify the string type used inside UE4SS C++ code to StringType/TCHAR and only convert at the IO/Language edge.
  5. Helper will provide implementations for encoding conversion, which could be replaced by Unreal's implementation in the future.
  6. (?) Force the use of u16string on Windows. This approach has the benefit of triggering errors on Windows if they directly access the underlying type of u16string without appropriate conversion/encoding, thereby exposing potential issues.

Please comment/update here and make sure we agree on what changes go into the strings PR etc..

@UE4SS
Copy link
Collaborator

UE4SS commented Jun 15, 2024

  1. Remove the string type definition from File and redefine StringType as std::basic_string as part of Unreal, no longer belonging to the File namespace.

Just noting that doing this will make Unreal a dependency for every single one of our other small dependencies that currently depend on File for StringType, a bit awful to say the least (the ini parser would need Unreal for example) but if it solves this problem and unblocks the Linux port then maybe it's acceptable and we can figure out a better way later.

  1. File will continue to provide file APIs (without breaking ABI) and might be replaced by fmt in the future (?).

I don't think it's appropriate to replace File with fmt, does it even provide replacements for all the APIs we use ?
I think either we continue to use and maintain File, find another actual library for interacting with files, do whatever UE does, or use whatever the standard libraries provide.

These are the three original reasons why File exists:

  1. For me to learn, this also goes for most of our other first-party dependencies.
  2. I hate the C++ standard way to deal with files.
  3. I hate third-party dependencies, but this has gone out the door because the project has obviously expanded well beyond my original plan so my own homegrown solutions may not be suitable or good enough.
    I still don't like dependencies though and I hate that patternsleuth has an unbelievable amount of them.

@narknon
Copy link
Collaborator

narknon commented Jun 15, 2024

Regarding 4, CC and Arch seemed to have different thoughts on how often to use StringType vs FString than you and I did. I'd like them to chime in.

@narknon
Copy link
Collaborator

narknon commented Jun 15, 2024

@localcc @Archengius

@Yangff
Copy link
Contributor Author

Yangff commented Jun 15, 2024

  1. Remove the string type definition from File and redefine StringType as std::basic_string as part of Unreal, no longer belonging to the File namespace.

Just noting that doing this will make Unreal a dependency for every single one of our other small dependencies that currently depend on File for StringType, a bit awful to say the least (the ini parser would need Unreal for example) but if it solves this problem and unblocks the Linux port then maybe it's acceptable and we can figure out a better way later.

  1. File will continue to provide file APIs (without breaking ABI) and might be replaced by fmt in the future (?).

I don't think it's appropriate to replace File with fmt, does it even provide replacements for all the APIs we use ? I think either we continue to use and maintain File, find another actual library for interacting with files, do whatever UE does, or use whatever the standard libraries provide.

These are the three original reasons why File exists:

  1. For me to learn, this also goes for most of our other first-party dependencies.
  2. I hate the C++ standard way to deal with files.
  3. I hate third-party dependencies, but this has gone out the door because the project has obviously expanded well beyond my original plan so my own homegrown solutions may not be suitable or good enough.
    I still don't like dependencies though and I hate that patternsleuth has an unbelievable amount of them.
  1. I'm not sure where is the best place to put the StringType.. I saw this move being mentioned multiple times in the chat but personally I'd prefer leaving it outside of Unreal.. (as Unreal itself should almost always use FString in its own codebase. I see no reason to put it inside).. A more obvious apporach maybe leave them inside Helper or a dedicated header dep..

  2. fmt only gets things like read all and write all. That's why I think we will leave other parts of File as it for now, but idk other people's thought on that..

Regarding 4, CC and Arch seemed to have different thoughts on how often to use StringType vs FString than you and I did. I'd like them to chime in.

yeah.. although I'm not sure how to ... but another way is to time slice the program to use u16stirng at the beginning and switch to (somehow) fully FString later.. it might make the typing hard to manage somehow..

@narknon
Copy link
Collaborator

narknon commented Jun 15, 2024

I'm not sure where is the best place to put the StringType.. I saw this move being mentioned multiple times in the chat but personally I'd prefer leaving it outside of Unreal

I think ideally our setup would end up being very similar to Unreal's with Unreal's core source being the earliest dependency. So it makes sense to typedef it there so it propagates up.

@localcc
Copy link
Contributor

localcc commented Jun 15, 2024

So my idea was to use basic_string before we get unreal allocator, and FString after, for functions which need to work with both we just make them templated to support both. This presented an issue that FString is not as nice to work with and ofc no external library will support it.

So the end solution is we use both strings, FString at unreal boundary and basic_string everywhere else, we of course need to write conversion functions between the two, but it will mostly be no-ops at runtime so won't cause performance concerns.

@Yangff
Copy link
Contributor Author

Yangff commented Jun 17, 2024

So.. solve the circluar deps seem extra works and not related to this.. will create a string module and use macro to define which char type to use..

@Yangff
Copy link
Contributor Author

Yangff commented Jun 17, 2024

https://github.com/Yangff/RE-UE4SS/tree/port/strings-fmt

This branch should contain attempt to use fmt with StringType defined in its own module.

Currently I'm able to compile it without UVTD when setting CharType = char16_t, this should contain most typing work reuqired for Linux port.

Real windows build should use wstring which should compile without any problem because no type are changed.
Setting CharType = chat16_t is purely to ensure I processed all types.

UVTD is excluded from this because it's Windows only.

@Yangff
Copy link
Contributor Author

Yangff commented Jun 18, 2024

Okay, getting the PRs in.

Related PRs are:

FmtLib #565
StringType #574
String Port in UE4SS #575
String Port in UEPseudo https://github.com/Re-UE4SS/UEPseudo/pull/96

@UE4SS UE4SS added the Collaborators Discussion intended for collaborators label Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Collaborators Discussion intended for collaborators
Projects
None yet
Development

No branches or pull requests

4 participants