bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Make most clangd unittests pass on Windows	Nico Weber	2019-10-13	8	-7/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Windows triple currently turns on delayed template parsing, which confuses several unit tests that use templates. For now, just explicitly disable delayed template parsing. This isn't ideal, but: - the Windows triple will soon no longer use delayed template parsing by default - there's precedent for this in the clangd unit tests already - let's get the clangd tests pass on Windows first before making behavioral changes Part of PR43592. llvm-svn: 374718
*	BlockInCriticalSectionChecker - silence static analyzer dyn_cast null ↵	Simon Pilgrim	2019-10-13	1	-1/+1
\| \| \| \| \| \| \| \|	dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<> directly and if not assert will fire for us. llvm-svn: 374717
*	IRTranslator - silence static analyzer null dereference warnings. NFCI.	Simon Pilgrim	2019-10-13	1	-3/+4
\| \| \| \| \| \|	The CmpInst::getType() calls can be replaced by just using User::getType() that it was dyn_cast from, and we then need to assert that any default predicate cases came from the CmpInst. llvm-svn: 374716
*	[clang-tidy] bugprone-not-null-terminated-result: checker adjustments 4	Csaba Dabis	2019-10-13	1	-1/+1
\| \| \| \|	llvm-svn: 374715
*	[clang-tidy] bugprone-not-null-terminated-result: checker adjustments 3	Csaba Dabis	2019-10-13	2	-0/+10
\| \| \| \| \| \| \|	On Windows the signed/unsigned int conversions of APInt seems broken, so that two of the test files marked as unsupported on Windows, as a hotfix. llvm-svn: 374713
*	[clang-tidy] bugprone-not-null-terminated-result: checker adjustments 2	Csaba Dabis	2019-10-13	1	-7/+8
\| \| \| \|	llvm-svn: 374712
*	[clang-tidy] bugprone-not-null-terminated-result: checker adjustments	Csaba Dabis	2019-10-13	1	-6/+6
\| \| \| \|	llvm-svn: 374711
*	[clang-tidy] bugprone-not-null-terminated-result: Sphinx adjustments 2	Csaba Dabis	2019-10-13	1	-17/+21
\| \| \| \|	llvm-svn: 374710
*	[clang-tidy] bugprone-not-null-terminated-result: Sphinx adjustments	Csaba Dabis	2019-10-13	1	-15/+16
\| \| \| \|	llvm-svn: 374709
*	gn build: Merge r374707	GN Sync Bot	2019-10-13	1	-0/+1
\| \| \| \|	llvm-svn: 374708
*	[clang-tidy] New checker for not null-terminated result caused by strlen(), ↵	Csaba Dabis	2019-10-13	17	-0/+2168
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	size() or equal length Summary: New checker called bugprone-not-null-terminated-result. This checker finds function calls where it is possible to cause a not null-terminated result. Usually the proper length of a string is `strlen(src) + 1` or equal length of this expression, because the null terminator needs an extra space. Without the null terminator it can result in undefined behaviour when the string is read. The following and their respective `wchar_t` based functions are checked: `memcpy`, `memcpy_s`, `memchr`, `memmove`, `memmove_s`, `strerror_s`, `strncmp`, `strxfrm` The following is a real-world example where the programmer forgot to increase the passed third argument, which is `size_t length`. That is why the length of the allocated memory is not enough to hold the null terminator. ``` static char stringCpy(const std::string &str) { char result = reinterpret_cast<char >(malloc(str.size())); memcpy(result, str.data(), str.size()); return result; } ``` In addition to issuing warnings, fix-it rewrites all the necessary code. It also tries to adjust the capacity of the destination array: ``` static char stringCpy(const std::string &str) { char result = reinterpret_cast<char >(malloc(str.size() + 1)); strcpy(result, str.data()); return result; } ``` Note: It cannot guarantee to rewrite every of the path-sensitive memory allocations. Reviewed By: JonasToth, aaron.ballman, whisperity, alexfh Tags: #clang-tools-extra, #clang Differential Revision: https://reviews.llvm.org/D45050 llvm-svn: 374707
*	[X86] Add a one use check on the setcc to the min/max canonicalization code ↵	Craig Topper	2019-10-13	2	-341/+265
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in combineSelect. This seems to improve std::midpoint code where we have a min and a max with the same condition. If we split the setcc we can end up with two compares if the one of the operands is a constant. Since we aggressively canonicalize compares with constants. For non-constants it can interfere with our ability to share control flow if we need to expand cmovs into control flow. I'm also not sure I understand this min/max canonicalization code. The motivating case talks about comparing with 0. But we don't check for 0 explicitly. Removes one instruction from the codegen for PR43658. llvm-svn: 374706
*	[X86] Enable v4i32->v4i16 and v8i16->v8i8 saturating truncates to use pack ↵	Craig Topper	2019-10-13	3	-101/+17
\| \| \| \| \| \|	instructions with avx512. llvm-svn: 374705
*	[X86] Add v2i64->v2i32/v2i16/v2i8 test cases to the trunc packus/ssat/usat ↵	Craig Topper	2019-10-13	3	-0/+2721
\| \| \| \| \| \|	tests. NFC llvm-svn: 374704
*	[Attributor][FIX] Avoid splitting blocks if possible	Johannes Doerfert	2019-10-13	4	-16/+22
\| \| \| \| \| \| \|	Before, we eagerly split blocks even if it was not necessary, e.g., they had a single unreachable instruction and only a single predecessor. llvm-svn: 374703
*	[Attributor][FIX] Remove leftover, now unused, variable	Johannes Doerfert	2019-10-13	1	-2/+0
\| \| \| \|	llvm-svn: 374702
*	[Attributor] Remove unused verification flag	Johannes Doerfert	2019-10-13	1	-24/+0
\| \| \| \| \| \|	We use the verify max iteration now which is more reliable. llvm-svn: 374701
*	[Attributor][NFC] Expose call site traversal without QueryingAA	Johannes Doerfert	2019-10-13	2	-9/+26
\| \| \| \|	llvm-svn: 374700
*	[Attributor][FIX] Ensure h2s doesn't trigger on escaped pointers	Johannes Doerfert	2019-10-13	2	-53/+76
\| \| \| \| \| \| \| \|	We do not yet perform h2s because we know something is free'ed but we do it because we know the pointer does not escape. Storing the pointer allows it to escape so we have to prevent that. llvm-svn: 374699
*	[Attributor][FIX] Do not apply h2s for arbitrary mallocs	Johannes Doerfert	2019-10-13	2	-16/+27
\| \| \| \| \| \| \| \|	H2S did apply to mallocs of non-constant sizes if the uses were OK. This is now forbidden through reording of the "good" and "bad" cases in the conditional. llvm-svn: 374698
*	[Attributor][FIX] Add missing function declaration in test case	Johannes Doerfert	2019-10-13	1	-0/+2
\| \| \| \|	llvm-svn: 374696
*	[Attributor][FIX] Avoid modifying naked/optnone functions	Johannes Doerfert	2019-10-13	3	-9/+36
\| \| \| \| \| \| \| \|	The check for naked/optnone was insufficient for different reasons. We now check before we initialize an abstract attribute and we do it for all abstract attributes. llvm-svn: 374694
*	[SROA] Reuse existing lifetime markers if possible	Johannes Doerfert	2019-10-13	2	-2/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If the underlying alloca did not change, we do not necessarily need new lifetime markers. This patch adds a check and reuses the old ones if possible. Reviewers: reames, ssarda, t.p.northover, hfinkel Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68900 llvm-svn: 374692
*	Revert r374663 "[clang-format] Proposal for clang-format to give compiler ↵	Nico Weber	2019-10-12	5	-189/+61
\| \| \| \| \| \| \| \| \| \|	style warnings" The test fails on macOS and looks a bit wrong, see comments on the review. Also revert follow-up r374686. llvm-svn: 374688
*	gn build: (manually) merge r374663	Nico Weber	2019-10-12	1	-0/+1
\| \| \| \|	llvm-svn: 374686
*	[libc++][test] Silence MSVC warning in std::optional test	Casey Carter	2019-10-12	1	-1/+1
\| \| \| \| \| \|	`make_optional<string>(4, 'X')` passes `4` (an `int`) as the first argument to `string`'s `(size_t, charT)` constructor, triggering a signed/unsigned mismatch warning when compiling with MSVC at `/W4`. The incredibly simple fix is to instead use an unsigned literal (`4u`). llvm-svn: 374684
*	Revert r374648: "Reland r374388: [lit] Make internal diff work in pipelines"	Joel E. Denny	2019-10-12	5	-276/+241
\| \| \| \| \| \|	This series of patches still breaks a Windows bot. llvm-svn: 374683
*	Revert r374649: "Reland r374389: [lit] Clean up internal diff's encoding ↵	Joel E. Denny	2019-10-12	7	-86/+33
\| \| \| \| \| \| \| \|	handling" This series of patches still breaks a Windows bot. llvm-svn: 374682
*	Revert r374650: "Reland r374390: [lit] Extend internal diff to support `-` ↵	Joel E. Denny	2019-10-12	7	-146/+5
\| \| \| \| \| \| \| \|	argument" This series of patches still breaks a Windows bot. llvm-svn: 374681
*	Revert 374651: "Reland r374392: [lit] Extend internal diff to support -U"	Joel E. Denny	2019-10-12	4	-136/+6
\| \| \| \| \| \|	This series of patches still breaks a Windows bot. llvm-svn: 374680
*	Revert r374652: "[lit] Fix internal diff's --strip-trailing-cr and use it"	Joel E. Denny	2019-10-12	11	-82/+13
\| \| \| \| \| \|	This series of patches still breaks a Windows bot. llvm-svn: 374679
*	Revert r374653: "[lit] Fix a few oversights in r374651 that broke some bots"	Joel E. Denny	2019-10-12	2	-2/+2
\| \| \| \| \| \|	This series of patches still breaks a Windows bot. llvm-svn: 374678
*	Revert r374665: "[lit] Try yet again to fix new tests that fail on Windows bots"	Joel E. Denny	2019-10-12	1	-3/+0
\| \| \| \| \| \|	This series of patches still breaks a Windows bot. llvm-svn: 374677
*	Revert r374666: "[lit] Adjust error handling for decode introduced by r374665"	Joel E. Denny	2019-10-12	1	-1/+1
\| \| \| \| \| \|	This series of patches still breaks a Windows bot. llvm-svn: 374676
*	Revert r374671: "[lit] Try errors="ignore" for decode introduced by r374665"	Joel E. Denny	2019-10-12	1	-1/+1
\| \| \| \| \| \|	This series of patches still breaks a Windows bot. llvm-svn: 374675
*	[X86] scaleShuffleMask - use size_t Scale to avoid overflow warnings	Simon Pilgrim	2019-10-12	2	-7/+7
\| \| \| \|	llvm-svn: 374674
*	SymbolRecord - consistently use explicit for single operand constructors	Simon Pilgrim	2019-10-12	1	-15/+15
\| \| \| \|	llvm-svn: 374673
*	SymbolRecord - fix uninitialized variable warnings. NFCI.	Simon Pilgrim	2019-10-12	1	-111/+111
\| \| \| \|	llvm-svn: 374672
*	[lit] Try errors="ignore" for decode introduced by r374665	Joel E. Denny	2019-10-12	1	-1/+1
\| \| \| \| \| \|	Still trying to fix the same error as in r374666. llvm-svn: 374671
*	[NFC][LoopIdiom] Adjust FIXME to be self-explanatory	Roman Lebedev	2019-10-12	1	-1/+1
\| \| \| \|	llvm-svn: 374670
*	Replace for-loop of SmallVector::push_back with SmallVector::append. NFCI.	Simon Pilgrim	2019-10-12	1	-4/+2
\| \| \| \|	llvm-svn: 374669
*	Fix cppcheck shadow variable name warnings. NFCI.	Simon Pilgrim	2019-10-12	1	-6/+6
\| \| \| \|	llvm-svn: 374668
*	[X86] Use any_of/all_of patterns in shuffle mask pattern recognisers. NFCI.	Simon Pilgrim	2019-10-12	1	-24/+13
\| \| \| \|	llvm-svn: 374667
*	[lit] Adjust error handling for decode introduced by r374665	Joel E. Denny	2019-10-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	On that decode, Windows bots fail with: ``` UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128) ``` That's the same error as before r374665 except it's now at the decode before the write to stdout. llvm-svn: 374666
*	[lit] Try yet again to fix new tests that fail on Windows bots	Joel E. Denny	2019-10-12	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I seem to have misread the bot logs on my last attempt. When lit's internal diff runs on Windows under Python 2.7, it's text diffs not binary diffs that need decoding to avoid this error when writing the diff to stdout: ``` UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128) ``` There is no `decode` attribute in this case under Python 3.6.8 under Ubuntu, so this patch checks for the `decode` attribute before using it here. Hopefully nothing else is needed when `decode` isn't available. It might take a couple more attempts to figure out what error handling, if any, is needed for this decoding. llvm-svn: 374665
*	Revert r374657: "[lit] Try again to fix new tests that fail on Windows bots"	Joel E. Denny	2019-10-12	2	-7/+6
\| \| \| \|	llvm-svn: 374664
*	[clang-format] Proposal for clang-format to give compiler style warnings	Paul Hoad	2019-10-12	4	-61/+188
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Related somewhat to {D29039} On seeing a quote on twitter by @invalidop > If it's not formatted with clang-format it's a build error. This made me want to change the way I use clang-format into a tool that could optionally show me where my source code violates clang-format syle. When I'm making a change to clang-format itself, one thing I like to do to test the change is to ensure I didn't cause a huge wave of changes, what I want to do is simply run this on a known formatted directory and see if any new differences arrive in a manner I'm used to. This started me thinking that we should allow build systems to run clang-format on a whole tree and emit compiler style warnings about files that fail clang-format in a form that would make them as a warning in most build systems and because those build systems range in their construction I don't think its unreasonable to NOT expect them to have to do the directory searching or parsing the output replacements themselves, but simply transform that into an error code when there are changes required. I am starting this by suggesing adding a -n or -dry-run command line argument which would emit a warning/error of the form Support for various common compiler command line argumuments like '-Werror' and '-ferror-limit' could make this very flexible to be integrated into build systems and CI systems. ``` > $ /usr/bin/clang-format --dry-run ClangFormat.cpp -ferror-limit=3 -fcolor-diagnostics > ClangFormat.cpp:54:29: warning: code should be clang-formatted [-Wclang-format-violations] > static cl::list<std::string> > ^ > ClangFormat.cpp:55:20: warning: code should be clang-formatted [-Wclang-format-violations] > LineRanges("lines", cl::desc("<start line>:<end line> - format a range of\n" > ^ > ClangFormat.cpp:55:77: warning: code should be clang-formatted [-Wclang-format-violations] > LineRanges("lines", cl::desc("<start line>:<end line> - format a range of\n" > ^ ``` Reviewers: mitchell-stellar, klimek, owenpan Reviewed By: klimek Subscribers: mgorny, cfe-commits Tags: #clang-format, #clang-tools-extra, #clang Differential Revision: https://reviews.llvm.org/D68554 llvm-svn: 374663
*	[LoopIdiomRecognize] Recommit: BCmp loop idiom recognition	Roman Lebedev	2019-10-12	5	-603/+1280
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is a recommit, this originally landed in rL370454 but was subsequently reverted in rL370788 due to https://bugs.llvm.org/show_bug.cgi?id=43206 The reduced testcase was added to bcmp-negative-tests.ll as @pr43206_different_loops - we must ensure that the SCEV's we got are both for the same loop we are currently investigating. Original commit message: @mclow.lists brought up this issue up in IRC. It is a reasonably common problem to compare some two values for equality. Those may be just some integers, strings or arrays of integers. In C, there is `memcmp()`, `bcmp()` functions. In C++, there exists `std::equal()` algorithm. One can also write that function manually. libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ libc++ does not do anything like that, it simply relies on simple C++'s `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!) So likely, there exists a certain performance opportunities. Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that is using `memcmp()` (in this case, compiled with modified compiler). {F8768213} ``` #include <algorithm> #include <cmath> #include <cstdint> #include <iterator> #include <limits> #include <random> #include <type_traits> #include <utility> #include <vector> #include "benchmark/benchmark.h" template <class T> bool equal(T* a, T* a_end, T* b) noexcept { for (; a != a_end; ++a, ++b) { if (a != b) return false; } return true; } template <typename T> std::vector<T> getVectorOfRandomNumbers(size_t count) { std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(), std::numeric_limits<T>::max()); std::vector<T> v; v.reserve(count); std::generate_n(std::back_inserter(v), count, [&dis, &gen]() { return dis(gen); }); assert(v.size() == count); return v; } struct Identical { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto Tmp = getVectorOfRandomNumbers<T>(count); return std::make_pair(Tmp, std::move(Tmp)); } }; struct InequalHalfway { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto V0 = getVectorOfRandomNumbers<T>(count); auto V1 = V0; V1[V1.size() / size_t(2)]++; // just change the value. return std::make_pair(std::move(V0), std::move(V1)); } }; template <class T, class Gen> void BM_bcmp(benchmark::State& state) { const size_t Length = state.range(0); const std::pair<std::vector<T>, std::vector<T>> Data = Gen::template Gen<T>(Length); const std::vector<T>& a = Data.first; const std::vector<T>& b = Data.second; assert(a.size() == Length && b.size() == a.size()); benchmark::ClobberMemory(); benchmark::DoNotOptimize(a); benchmark::DoNotOptimize(a.data()); benchmark::DoNotOptimize(b); benchmark::DoNotOptimize(b.data()); for (auto _ : state) { const bool is_equal = equal(a.data(), a.data() + a.size(), b.data()); benchmark::DoNotOptimize(is_equal); } state.SetComplexityN(Length); state.counters["eltcnt"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant); state.counters["eltcnt/sec"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate); const size_t BytesRead = 2 * sizeof(T) * Length; state.counters["bytes_read/iteration"] = benchmark::Counter(BytesRead, benchmark::Counter::kDefaults, benchmark::Counter::OneK::kIs1024); state.counters["bytes_read/sec"] = benchmark::Counter( BytesRead, benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024); } template <typename T> static void CustomArguments(benchmark::internal::Benchmark* b) { const size_t L2SizeBytes = []() { for (const benchmark::CPUInfo::CacheInfo& I : benchmark::CPUInfo::Get().caches) { if (I.level == 2) return I.size; } return 0; }(); // What is the largest range we can check to always fit within given L2 cache? const size_t MaxLen = L2SizeBytes / /total bufs/ 2 / /maximal elt size/ sizeof(T) / /safety margin/ 2; b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN); } BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical) ->Apply(CustomArguments<uint64_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway) ->Apply(CustomArguments<uint64_t>); ``` {F8768210} ``` $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx 2019-04-25 21:17:11 Running build-old/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 0.65, 3.90, 4.14 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 432131 ns 432101 ns 1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s BM_bcmp<uint8_t, Identical>_BigO 0.86 N 0.86 N BM_bcmp<uint8_t, Identical>_RMS 8 % 8 % <...> BM_bcmp<uint16_t, Identical>/256000 161408 ns 161409 ns 4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s BM_bcmp<uint16_t, Identical>_BigO 0.67 N 0.67 N BM_bcmp<uint16_t, Identical>_RMS 25 % 25 % <...> BM_bcmp<uint32_t, Identical>/128000 81497 ns 81488 ns 8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s BM_bcmp<uint32_t, Identical>_BigO 0.71 N 0.71 N BM_bcmp<uint32_t, Identical>_RMS 42 % 42 % <...> BM_bcmp<uint64_t, Identical>/64000 50138 ns 50138 ns 10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s BM_bcmp<uint64_t, Identical>_BigO 0.84 N 0.84 N BM_bcmp<uint64_t, Identical>_RMS 27 % 27 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 192405 ns 192392 ns 3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.38 N 0.38 N BM_bcmp<uint8_t, InequalHalfway>_RMS 3 % 3 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 127858 ns 127860 ns 5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint16_t, InequalHalfway>_RMS 0 % 0 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 49140 ns 49140 ns 14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.40 N 0.40 N BM_bcmp<uint32_t, InequalHalfway>_RMS 18 % 18 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 32101 ns 32099 ns 21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint64_t, InequalHalfway>_RMS 1 % 1 % RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0 2019-04-25 21:19:29 Running build-new/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 1.01, 2.85, 3.71 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 18593 ns 18590 ns 37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s BM_bcmp<uint8_t, Identical>_BigO 0.04 N 0.04 N BM_bcmp<uint8_t, Identical>_RMS 37 % 37 % <...> BM_bcmp<uint16_t, Identical>/256000 18950 ns 18948 ns 37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s BM_bcmp<uint16_t, Identical>_BigO 0.08 N 0.08 N BM_bcmp<uint16_t, Identical>_RMS 34 % 34 % <...> BM_bcmp<uint32_t, Identical>/128000 18627 ns 18627 ns 37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s BM_bcmp<uint32_t, Identical>_BigO 0.16 N 0.16 N BM_bcmp<uint32_t, Identical>_RMS 35 % 35 % <...> BM_bcmp<uint64_t, Identical>/64000 18855 ns 18855 ns 37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s BM_bcmp<uint64_t, Identical>_BigO 0.32 N 0.32 N BM_bcmp<uint64_t, Identical>_RMS 33 % 33 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 9570 ns 9569 ns 73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.02 N 0.02 N BM_bcmp<uint8_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 9547 ns 9547 ns 74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.04 N 0.04 N BM_bcmp<uint16_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 9396 ns 9394 ns 73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.08 N 0.08 N BM_bcmp<uint32_t, InequalHalfway>_RMS 30 % 30 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 9499 ns 9498 ns 73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.16 N 0.16 N BM_bcmp<uint64_t, InequalHalfway>_RMS 28 % 28 % Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench Benchmark Time CPU Time Old Time New CPU Old CPU New --------------------------------------------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 -0.9570 -0.9570 432131 18593 432101 18590 <...> BM_bcmp<uint16_t, Identical>/256000 -0.8826 -0.8826 161408 18950 161409 18948 <...> BM_bcmp<uint32_t, Identical>/128000 -0.7714 -0.7714 81497 18627 81488 18627 <...> BM_bcmp<uint64_t, Identical>/64000 -0.6239 -0.6239 50138 18855 50138 18855 <...> BM_bcmp<uint8_t, InequalHalfway>/512000 -0.9503 -0.9503 192405 9570 192392 9569 <...> BM_bcmp<uint16_t, InequalHalfway>/256000 -0.9253 -0.9253 127858 9547 127860 9547 <...> BM_bcmp<uint32_t, InequalHalfway>/128000 -0.8088 -0.8088 49140 9396 49140 9394 <...> BM_bcmp<uint64_t, InequalHalfway>/64000 -0.7041 -0.7041 32101 9499 32099 9498 ``` What can we tell from the benchmark? * Performance of naive equality check somewhat improves with element size, maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s for uint64_t. I think, that instability implies performance problems. * Performance of `memcmp()`-aware benchmark always maxes out at around bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the naive variant! * eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and linearly decreases with element size. For uint64_t, it's ~4x+ the elements/second. * The call obvious is more pricey than the loop, with small element count. As it can be seen from the full output {F8768210}, the `memcmp()` is almost universally worse, independent of the element size (and thus buffer size) when element count is less than 8. So all in all, bcmp idiom does indeed pose untapped performance headroom. This diff does implement said idiom recognition. I think a reasonable test coverage is present, but do tell if there is anything obvious missing. Now, quality. This does succeed to build and pass the test-suite, at least without any non-bundled elements. {F8768216} {F8768217} This transform fires 91 times: ``` $ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json Tests: 1149 Metric: loop-idiom.NumBCmp Program result-new MultiSourc...Benchmarks/7zip/7zip-benchmark 79.00 MultiSource/Applications/d/make_dparser 3.00 SingleSource/UnitTests/vla 2.00 MultiSource/Applications/Burg/burg 1.00 MultiSourc.../Applications/JM/lencod/lencod 1.00 MultiSource/Applications/lemon/lemon 1.00 MultiSource/Benchmarks/Bullet/bullet 1.00 MultiSourc...e/Benchmarks/MallocBench/gs/gs 1.00 MultiSourc...gs-C/TimberWolfMC/timberwolfmc 1.00 MultiSourc...Prolangs-C/simulator/simulator 1.00 ``` The size changes are: I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look. ``` $ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash Tests: 1149 Same hash: 907 (filtered out) Remaining: 242 Metric: size..text Program result-old result-new diff test-suite...ingleSource/UnitTests/vla.test 753.00 833.00 10.6% test-suite...marks/7zip/7zip-benchmark.test 1001697.00 966657.00 -3.5% test-suite...ngs-C/simulator/simulator.test 32369.00 32321.00 -0.1% test-suite...plications/d/make_dparser.test 89585.00 89505.00 -0.1% test-suite...ce/Applications/Burg/burg.test 40817.00 40785.00 -0.1% test-suite.../Applications/lemon/lemon.test 47281.00 47249.00 -0.1% test-suite...TimberWolfMC/timberwolfmc.test 250065.00 250113.00 0.0% test-suite...chmarks/MallocBench/gs/gs.test 149889.00 149873.00 -0.0% test-suite...ications/JM/lencod/lencod.test 769585.00 769569.00 -0.0% test-suite.../Benchmarks/Bullet/bullet.test 770049.00 770049.00 0.0% test-suite...HMARK_ANISTROPIC_DIFFUSION/128 NaN NaN nan% test-suite...HMARK_ANISTROPIC_DIFFUSION/256 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/64 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/32 NaN NaN nan% test-suite...ENCHMARK_BILATERAL_FILTER/64/4 NaN NaN nan% Geomean difference nan% result-old result-new diff count 1.000000e+01 10.00000 10.000000 mean 3.152090e+05 311695.40000 0.006749 std 3.790398e+05 372091.42232 0.036605 min 7.530000e+02 833.00000 -0.034981 25% 4.243300e+04 42401.00000 -0.000866 50% 1.197370e+05 119689.00000 -0.000392 75% 6.397050e+05 639705.00000 -0.000005 max 1.001697e+06 966657.00000 0.106242 ``` I don't have timings though. And now to the code. The basic idea is to completely replace the whole loop. If we can't fully kill it, don't transform. I have left one or two comments in the code, so hopefully it can be understood. Also, there is a few TODO's that i have left for follow-ups: * widening of `memcmp()`/`bcmp()` * step smaller than the comparison size * Metadata propagation * more than two blocks as long as there is still a single backedge? * ??? Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet Reviewed By: courbet Subscribers: miyuki, hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists Tags: #llvm Differential Revision: https://reviews.llvm.org/D61144 llvm-svn: 374662
*	[NFC][LoopIdiom] Add bcmp loop idiom miscompile test from PR43206.	Roman Lebedev	2019-10-12	1	-3/+54
\| \| \| \| \| \| \| \|	The transform forgot to check SCEV loop scopes. https://bugs.llvm.org/show_bug.cgi?id=43206 llvm-svn: 374661
*	[NFC][LoopIdiom] Move one bcmp test into the proper place	Roman Lebedev	2019-10-12	2	-23/+47
\| \| \| \|	llvm-svn: 374660