path: root/llvm/lib/Support/Locale.cpp
author: Reid Kleckner <rnk@google.com> 2018-09-05 00:08:56 +0000
committer: Reid Kleckner <rnk@google.com> 2018-09-05 00:08:56 +0000
commit: 2b8c69204b1d37d91cc775d51af50df7f478cecd
tree: 67a7bb12b6c40ec503027e5a4f735a53be435b7f /llvm/lib/Support/Locale.cpp
parent: 2768b52117cb6a1eb0d6a0c4bb01cf52d436adbb
[Windows] Convert from UTF-8 to UTF-16 when writing to a Windows console

Summary:
Calling WriteConsoleW is the most reliable way to print Unicode
characters to a Windows console.

If binary data gets printed to the console, attempting to re-encode it
shouldn't be a problem, since garbage in can produce garbage out.

This breaks printing strings in the local codepage, which WriteConsoleA
knows how to handle. For example, this can happen when user source code
is encoded with the local codepage, and an LLVM tool quotes it while
emitting a caret diagnostic. This is unfortunate, but well-behaved tools
should validate that their input is UTF-8 and escape non-UTF-8
characters before sending them to raw_fd_ostream. Clang already does
this, but not all LLVM tools do.

One drawback to the current implementation is that printing a string a
byte at a time doesn't work. Consider this LLVM code:
  for (char C : MyStr)
    outs() << C;

Because outs() is now unbuffered, we will try to convert each byte to
UTF-16, which will fail. However, this already didn't work, so I think
we may as well update callers that do that as we find them to print
complete portions of strings. You can see a real example of this in my
patch to SourceMgr.cpp.

Fixes PR38669 and PR36267.

Reviewers: zturner, efriedma

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D51558

llvm-svn: 341433

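To illustrate why the byte-at-a-time loop fails, here is a minimal, portable sketch of a UTF-8 to UTF-16 conversion. It is not the code from this patch: `utf8ToUtf16` and `convertsByteAtATime` are hypothetical helpers written for illustration, and the real conversion inside raw_fd_ostream may differ in detail. The point is only that a multi-byte UTF-8 sequence converts when passed whole, while each of its bytes is malformed on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not LLVM's converter): decodes UTF-8 from In and
// appends UTF-16 code units to Out. Returns false on malformed or
// truncated input.
bool utf8ToUtf16(const std::string &In, std::u16string &Out) {
  // Smallest code point allowed for each sequence length (rejects overlongs).
  static const std::uint32_t MinCP[] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t I = 0;
  while (I < In.size()) {
    unsigned char B0 = static_cast<unsigned char>(In[I]);
    std::uint32_t CP;
    std::size_t Len;
    if (B0 < 0x80) {
      CP = B0;
      Len = 1;
    } else if ((B0 & 0xE0) == 0xC0) {
      CP = B0 & 0x1F;
      Len = 2;
    } else if ((B0 & 0xF0) == 0xE0) {
      CP = B0 & 0x0F;
      Len = 3;
    } else if ((B0 & 0xF8) == 0xF0) {
      CP = B0 & 0x07;
      Len = 4;
    } else {
      return false; // stray continuation byte or invalid lead byte
    }
    if (I + Len > In.size())
      return false; // sequence is cut off, e.g. a lead byte on its own
    for (std::size_t J = 1; J < Len; ++J) {
      unsigned char B = static_cast<unsigned char>(In[I + J]);
      if ((B & 0xC0) != 0x80)
        return false; // expected a continuation byte (10xxxxxx)
      CP = (CP << 6) | (B & 0x3F);
    }
    if (CP < MinCP[Len] || CP > 0x10FFFF || (CP >= 0xD800 && CP <= 0xDFFF))
      return false; // overlong encoding, out of range, or surrogate
    if (CP < 0x10000) {
      Out.push_back(static_cast<char16_t>(CP));
    } else {
      // Encode as a surrogate pair.
      CP -= 0x10000;
      Out.push_back(static_cast<char16_t>(0xD800 + (CP >> 10)));
      Out.push_back(static_cast<char16_t>(0xDC00 + (CP & 0x3FF)));
    }
    I += Len;
  }
  return true;
}

// Mimics the byte-at-a-time loop from the commit message: every byte is
// converted as if it were a complete string.
bool convertsByteAtATime(const std::string &In) {
  for (char C : In) {
    std::u16string Tmp;
    if (!utf8ToUtf16(std::string(1, C), Tmp))
      return false;
  }
  return true;
}
```

With these helpers, converting `"caf\xC3\xA9"` in one call succeeds, while `convertsByteAtATime` fails on the same string because `0xC3` and `0xA9` are each incomplete on their own. Pure ASCII input passes either way, which is likely why such loops can go unnoticed.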
Diffstat (limited to 'llvm/lib/Support/Locale.cpp')
-rw-r--r-- llvm/lib/Support/Locale.cpp 13
1 files changed, 0 insertions, 13 deletions
diff --git a/llvm/lib/Support/Locale.cpp b/llvm/lib/Support/Locale.cpp
index c4cfc5e8de0..1b3300b90f2 100644
--- a/llvm/lib/Support/Locale.cpp
+++ b/llvm/lib/Support/Locale.cpp
@@ -7,24 +7,11 @@ namespace sys {
 namespace locale {
 
 int columnWidth(StringRef Text) {
-#ifdef _WIN32
-  return Text.size();
-#else
   return llvm::sys::unicode::columnWidthUTF8(Text);
-#endif
 }
 
 bool isPrint(int UCS) {
-#ifdef _WIN32
-  // Restrict characters that we'll try to print to the lower part of ASCII
-  // except for the control characters (0x20 - 0x7E). In general one can not
-  // reliably output code points U+0080 and higher using narrow character C/C++
-  // output functions in Windows, because the meaning of the upper 128 codes is
-  // determined by the active code page in the console.
-  return ' ' <= UCS && UCS <= '~';
-#else
   return llvm::sys::unicode::isPrintable(UCS);
-#endif
 }
 
 } // namespace locale
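
The columnWidth change matters because the removed Windows branch returned the byte count, which overstates the width of any non-ASCII UTF-8 text. The sketch below shows the gap with a hypothetical `countCodePoints` helper. It is a simplified stand-in, not columnWidthUTF8: it assumes valid UTF-8 and treats every code point as one column, ignoring wide and combining characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical simplification of a column-width computation: counts UTF-8
// code points by skipping continuation bytes (10xxxxxx). Assumes the input
// is valid UTF-8 and that every code point occupies exactly one column.
std::size_t countCodePoints(const std::string &Text) {
  std::size_t N = 0;
  for (unsigned char B : Text)
    if ((B & 0xC0) != 0x80)
      ++N;
  return N;
}
```

For the string `"na\xC3\xAF" "ve"`, `size()` reports 6 bytes while `countCodePoints` reports 5, so a width based on `size()` would be off by one column for that text.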