bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Update documentation.	Adrian Prantl	2019-09-11	1	-6/+6
\| \| \| \|	llvm-svn: 371648
*	[LangRef] add link for fma intrinsic	Sanjay Patel	2019-09-11	1	-1/+3
\| \| \| \|	llvm-svn: 371615
*	[LangRef] fix punctuation; NFC	Sanjay Patel	2019-09-11	1	-1/+1
\| \| \| \|	llvm-svn: 371612
*	Update ReleaseNotes: add enabling of MemorySSA.	Alina Sbirlea	2019-09-10	1	-0/+1
\| \| \| \|	llvm-svn: 371569
*	Revert "[utils] Implement the llvm-locstats tool"	Djordje Todorovic	2019-09-10	2	-80/+0
\| \| \| \| \| \|	This reverts commit rL371520. llvm-svn: 371527
*	[utils] Implement the llvm-locstats tool	Djordje Todorovic	2019-09-10	2	-0/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The tool reports verbose output for the DWARF debug location coverage. The llvm-locstats for each variable or formal parameter DIE computes what percentage from the code section bytes, where it is in scope, it has location description. The line 0 shows the number (and the percentage) of DIEs with no location information, but the line 100 shows the number (and the percentage) of DIEs where there is location information in all code section bytes (where the variable or parameter is in the scope). The line 50..59 shows the number (and the percentage) of DIEs where the location information is in between 50 and 59 percentage of its scope covered. The tool will be very useful for tracking improvements regarding the "debugging optimized code" support with LLVM ecosystem. Differential Revision: https://reviews.llvm.org/D66526 llvm-svn: 371520
*	LangRef: mention MSan's problem with speculative conditional branches.	Evgeniy Stepanov	2019-09-09	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This short blurb aims to disallow optimizations like we had to revert (under MSan) in https://reviews.llvm.org/D21165 https://bugs.llvm.org/show_bug.cgi?id=28054 https://reviews.llvm.org/D67205 Reviewers: vitalybuka, efriedma Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67244 llvm-svn: 371461
*	[SelectionDAG] Remove ISD::FP_ROUND_INREG	Craig Topper	2019-09-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	I don't think anything in tree creates this node. So all of this code appears to be dead. Code coverage agrees http://lab.llvm.org:8080/coverage/coverage-reports/llvm/coverage/Users/buildslave/jenkins/workspace/clang-stage2-coverage-R/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html Differential Revision: https://reviews.llvm.org/D67312 llvm-svn: 371431
*	[Intrinsic] Add the llvm.umul.fix.sat intrinsic	Bjorn Pettersson	2019-09-07	1	-0/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add an intrinsic that takes 2 unsigned integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Patch by: leonardchan, bjope Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel Reviewed By: leonardchan Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57836 llvm-svn: 371308
*	Docs: Update Community section on homepage	DeForest Richards	2019-09-05	2	-32/+41
\| \| \| \| \| \|	This commit includes the following changes: Adds a Getting Involved section under Community. Moves the Development Process section under Community. Moves Sphinx Quickstart Template and How to submit an LLVM bug report from User Guides section to Getting Involved. llvm-svn: 371127
*	doc update: explain that Z3 is only for clang SA - thanks to LebedevRI for ↵	Sylvestre Ledru	2019-09-05	1	-2/+2
\| \| \| \| \| \|	the suggestion llvm-svn: 371110
*	document the LLVM_ENABLE_Z3_SOLVER option	Sylvestre Ledru	2019-09-05	1	-0/+4
\| \| \| \|	llvm-svn: 371109
*	Docs: Move Documentation sections to separate pages.	DeForest Richards	2019-09-05	4	-415/+433
\| \| \| \| \| \|	Updates the links on the homepage by moving the User Guides, Programming Documentation, and Subsystem Documentation sections to separate pages. Also changes "Overview" to "About" at the top of the LLVM Docs homepage. This work is part of the Google Season of Docs project. llvm-svn: 371096
*	[LLVM][Alignment] Make functions using log of alignment explicit	Guillaume Chatelet	2019-09-05	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation, - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation, Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045
*	[docs] Add some comments to the inline LLJIT example.	Lang Hames	2019-09-04	1	-0/+2
\| \| \| \|	llvm-svn: 370950
*	[llvm-profdata] Add mode to recover from profile read failures	Vedant Kumar	2019-09-03	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	Add a mode in which profile read errors are not immediately treated as fatal. In this mode, merging makes forward progress and reports failure only if no inputs can be read. Differential Revision: https://reviews.llvm.org/D66985 llvm-svn: 370827
*	Revert r370454 "[LoopIdiomRecognize] BCmp loop idiom recognition"	Roman Lebedev	2019-09-03	1	-4/+0
\| \| \| \| \| \| \| \| \| \|	https://bugs.llvm.org/show_bug.cgi?id=43206 was filed, claiming that there is a miscompilation. Reverting until i investigate. This reverts commit r370454 llvm-svn: 370788
*	[SVE][Inline-Asm] Support for SVE asm operands	Kerry McLaughlin	2019-09-02	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Adds the following inline asm constraints for SVE: - w: SVE vector register with full range, Z0 to Z31 - x: Restricted to registers Z0 to Z15 inclusive. - y: Restricted to registers Z0 to Z7 inclusive. This change also adds the "z" modifier to interpret a register as an SVE register. Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness. Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened Reviewed By: sdesmalen Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66302 llvm-svn: 370673
*	[FileCheck] Forbid using var defined on same line	Thomas Preud'homme	2019-09-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Commit r366897 introduced the possibility to set a variable from an expression, such as [[#VAR2:VAR1+3]]. While introducing this feature, it introduced extra logic to allow using such a variable on the same line later on. Unfortunately that extra logic is flawed as it relies on a mapping from variable to expression defining it when the mapping is from variable definition to expression. This flaw causes among other issues PR42896. This commit avoids the problem by forbidding all use of a variable defined on the same line, and removes the now useless logic. Redesign will be done in a later commit because it will require some amount of refactoring first for the solution to be clean. One example is the need for some sort of transaction mechanism to set a variable temporarily and from an expression and rollback if the CHECK pattern does not match so that diagnostics show the right variable values. Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk Subscribers: JonChesterfield, rogfer01, hfinkel, kristina, rnk, tra, arichardson, grimar, dblaikie, probinson, llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D66141 llvm-svn: 370663
*	[LangRef] Update saturating examples for llvm.smul.fix.sat. NFC	Bjorn Pettersson	2019-08-31	1	-3/+3
\| \| \| \| \| \| \| \|	Some saturation examples for llvm.smul.fix.sat were not showing the correct result. I've adjusted the operands to make sure that we actually trigger overflow in those examples. llvm-svn: 370566
*	[X86] Pass v32i16/v64i8 in zmm registers on KNL target.	Craig Topper	2019-08-30	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gcc and icc pass these types in zmm registers in zmm registers. This patch implements a quick hack to override the register type before calling convention handling to one that is legal. Longer term we might want to do something similar to 256-bit integer registers on AVX1 where we just split all the operations. Fixes PR42957 Differential Revision: https://reviews.llvm.org/D66708 llvm-svn: 370495
*	[llvm-objcopy] Allow the visibility of symbols created by --binary and	Chris Jackson	2019-08-30	1	-0/+12
\| \| \| \| \| \|	--add-symbol to be specified with --new-symbol-visibility llvm-svn: 370458
*	[LoopIdiomRecognize] BCmp loop idiom recognition	Roman Lebedev	2019-08-30	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: @mclow.lists brought up this issue up in IRC. It is a reasonably common problem to compare some two values for equality. Those may be just some integers, strings or arrays of integers. In C, there is `memcmp()`, `bcmp()` functions. In C++, there exists `std::equal()` algorithm. One can also write that function manually. libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ libc++ does not do anything like that, it simply relies on simple C++'s `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!) So likely, there exists a certain performance opportunities. Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that is using `memcmp()` (in this case, compiled with modified compiler). {F8768213} ``` #include <algorithm> #include <cmath> #include <cstdint> #include <iterator> #include <limits> #include <random> #include <type_traits> #include <utility> #include <vector> #include "benchmark/benchmark.h" template <class T> bool equal(T* a, T* a_end, T* b) noexcept { for (; a != a_end; ++a, ++b) { if (a != b) return false; } return true; } template <typename T> std::vector<T> getVectorOfRandomNumbers(size_t count) { std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(), std::numeric_limits<T>::max()); std::vector<T> v; v.reserve(count); std::generate_n(std::back_inserter(v), count, [&dis, &gen]() { return dis(gen); }); assert(v.size() == count); return v; } struct Identical { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto Tmp = getVectorOfRandomNumbers<T>(count); return std::make_pair(Tmp, std::move(Tmp)); } }; struct InequalHalfway { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto V0 = getVectorOfRandomNumbers<T>(count); auto V1 = V0; V1[V1.size() / size_t(2)]++; // just change the value. return std::make_pair(std::move(V0), std::move(V1)); } }; template <class T, class Gen> void BM_bcmp(benchmark::State& state) { const size_t Length = state.range(0); const std::pair<std::vector<T>, std::vector<T>> Data = Gen::template Gen<T>(Length); const std::vector<T>& a = Data.first; const std::vector<T>& b = Data.second; assert(a.size() == Length && b.size() == a.size()); benchmark::ClobberMemory(); benchmark::DoNotOptimize(a); benchmark::DoNotOptimize(a.data()); benchmark::DoNotOptimize(b); benchmark::DoNotOptimize(b.data()); for (auto _ : state) { const bool is_equal = equal(a.data(), a.data() + a.size(), b.data()); benchmark::DoNotOptimize(is_equal); } state.SetComplexityN(Length); state.counters["eltcnt"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant); state.counters["eltcnt/sec"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate); const size_t BytesRead = 2 * sizeof(T) * Length; state.counters["bytes_read/iteration"] = benchmark::Counter(BytesRead, benchmark::Counter::kDefaults, benchmark::Counter::OneK::kIs1024); state.counters["bytes_read/sec"] = benchmark::Counter( BytesRead, benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024); } template <typename T> static void CustomArguments(benchmark::internal::Benchmark* b) { const size_t L2SizeBytes = []() { for (const benchmark::CPUInfo::CacheInfo& I : benchmark::CPUInfo::Get().caches) { if (I.level == 2) return I.size; } return 0; }(); // What is the largest range we can check to always fit within given L2 cache? const size_t MaxLen = L2SizeBytes / /total bufs/ 2 / /maximal elt size/ sizeof(T) / /safety margin/ 2; b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN); } BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical) ->Apply(CustomArguments<uint64_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway) ->Apply(CustomArguments<uint64_t>); ``` {F8768210} ``` $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx 2019-04-25 21:17:11 Running build-old/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 0.65, 3.90, 4.14 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 432131 ns 432101 ns 1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s BM_bcmp<uint8_t, Identical>_BigO 0.86 N 0.86 N BM_bcmp<uint8_t, Identical>_RMS 8 % 8 % <...> BM_bcmp<uint16_t, Identical>/256000 161408 ns 161409 ns 4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s BM_bcmp<uint16_t, Identical>_BigO 0.67 N 0.67 N BM_bcmp<uint16_t, Identical>_RMS 25 % 25 % <...> BM_bcmp<uint32_t, Identical>/128000 81497 ns 81488 ns 8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s BM_bcmp<uint32_t, Identical>_BigO 0.71 N 0.71 N BM_bcmp<uint32_t, Identical>_RMS 42 % 42 % <...> BM_bcmp<uint64_t, Identical>/64000 50138 ns 50138 ns 10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s BM_bcmp<uint64_t, Identical>_BigO 0.84 N 0.84 N BM_bcmp<uint64_t, Identical>_RMS 27 % 27 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 192405 ns 192392 ns 3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.38 N 0.38 N BM_bcmp<uint8_t, InequalHalfway>_RMS 3 % 3 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 127858 ns 127860 ns 5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint16_t, InequalHalfway>_RMS 0 % 0 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 49140 ns 49140 ns 14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.40 N 0.40 N BM_bcmp<uint32_t, InequalHalfway>_RMS 18 % 18 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 32101 ns 32099 ns 21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint64_t, InequalHalfway>_RMS 1 % 1 % RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0 2019-04-25 21:19:29 Running build-new/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 1.01, 2.85, 3.71 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 18593 ns 18590 ns 37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s BM_bcmp<uint8_t, Identical>_BigO 0.04 N 0.04 N BM_bcmp<uint8_t, Identical>_RMS 37 % 37 % <...> BM_bcmp<uint16_t, Identical>/256000 18950 ns 18948 ns 37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s BM_bcmp<uint16_t, Identical>_BigO 0.08 N 0.08 N BM_bcmp<uint16_t, Identical>_RMS 34 % 34 % <...> BM_bcmp<uint32_t, Identical>/128000 18627 ns 18627 ns 37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s BM_bcmp<uint32_t, Identical>_BigO 0.16 N 0.16 N BM_bcmp<uint32_t, Identical>_RMS 35 % 35 % <...> BM_bcmp<uint64_t, Identical>/64000 18855 ns 18855 ns 37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s BM_bcmp<uint64_t, Identical>_BigO 0.32 N 0.32 N BM_bcmp<uint64_t, Identical>_RMS 33 % 33 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 9570 ns 9569 ns 73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.02 N 0.02 N BM_bcmp<uint8_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 9547 ns 9547 ns 74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.04 N 0.04 N BM_bcmp<uint16_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 9396 ns 9394 ns 73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.08 N 0.08 N BM_bcmp<uint32_t, InequalHalfway>_RMS 30 % 30 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 9499 ns 9498 ns 73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.16 N 0.16 N BM_bcmp<uint64_t, InequalHalfway>_RMS 28 % 28 % Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench Benchmark Time CPU Time Old Time New CPU Old CPU New --------------------------------------------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 -0.9570 -0.9570 432131 18593 432101 18590 <...> BM_bcmp<uint16_t, Identical>/256000 -0.8826 -0.8826 161408 18950 161409 18948 <...> BM_bcmp<uint32_t, Identical>/128000 -0.7714 -0.7714 81497 18627 81488 18627 <...> BM_bcmp<uint64_t, Identical>/64000 -0.6239 -0.6239 50138 18855 50138 18855 <...> BM_bcmp<uint8_t, InequalHalfway>/512000 -0.9503 -0.9503 192405 9570 192392 9569 <...> BM_bcmp<uint16_t, InequalHalfway>/256000 -0.9253 -0.9253 127858 9547 127860 9547 <...> BM_bcmp<uint32_t, InequalHalfway>/128000 -0.8088 -0.8088 49140 9396 49140 9394 <...> BM_bcmp<uint64_t, InequalHalfway>/64000 -0.7041 -0.7041 32101 9499 32099 9498 ``` What can we tell from the benchmark? * Performance of naive equality check somewhat improves with element size, maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s for uint64_t. I think, that instability implies performance problems. * Performance of `memcmp()`-aware benchmark always maxes out at around bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the naive variant! * eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and linearly decreases with element size. For uint64_t, it's ~4x+ the elements/second. * The call obvious is more pricey than the loop, with small element count. As it can be seen from the full output {F8768210}, the `memcmp()` is almost universally worse, independent of the element size (and thus buffer size) when element count is less than 8. So all in all, bcmp idiom does indeed pose untapped performance headroom. This diff does implement said idiom recognition. I think a reasonable test coverage is present, but do tell if there is anything obvious missing. Now, quality. This does succeed to build and pass the test-suite, at least without any non-bundled elements. {F8768216} {F8768217} This transform fires 91 times: ``` $ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json Tests: 1149 Metric: loop-idiom.NumBCmp Program result-new MultiSourc...Benchmarks/7zip/7zip-benchmark 79.00 MultiSource/Applications/d/make_dparser 3.00 SingleSource/UnitTests/vla 2.00 MultiSource/Applications/Burg/burg 1.00 MultiSourc.../Applications/JM/lencod/lencod 1.00 MultiSource/Applications/lemon/lemon 1.00 MultiSource/Benchmarks/Bullet/bullet 1.00 MultiSourc...e/Benchmarks/MallocBench/gs/gs 1.00 MultiSourc...gs-C/TimberWolfMC/timberwolfmc 1.00 MultiSourc...Prolangs-C/simulator/simulator 1.00 ``` The size changes are: I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look. ``` $ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash Tests: 1149 Same hash: 907 (filtered out) Remaining: 242 Metric: size..text Program result-old result-new diff test-suite...ingleSource/UnitTests/vla.test 753.00 833.00 10.6% test-suite...marks/7zip/7zip-benchmark.test 1001697.00 966657.00 -3.5% test-suite...ngs-C/simulator/simulator.test 32369.00 32321.00 -0.1% test-suite...plications/d/make_dparser.test 89585.00 89505.00 -0.1% test-suite...ce/Applications/Burg/burg.test 40817.00 40785.00 -0.1% test-suite.../Applications/lemon/lemon.test 47281.00 47249.00 -0.1% test-suite...TimberWolfMC/timberwolfmc.test 250065.00 250113.00 0.0% test-suite...chmarks/MallocBench/gs/gs.test 149889.00 149873.00 -0.0% test-suite...ications/JM/lencod/lencod.test 769585.00 769569.00 -0.0% test-suite.../Benchmarks/Bullet/bullet.test 770049.00 770049.00 0.0% test-suite...HMARK_ANISTROPIC_DIFFUSION/128 NaN NaN nan% test-suite...HMARK_ANISTROPIC_DIFFUSION/256 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/64 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/32 NaN NaN nan% test-suite...ENCHMARK_BILATERAL_FILTER/64/4 NaN NaN nan% Geomean difference nan% result-old result-new diff count 1.000000e+01 10.00000 10.000000 mean 3.152090e+05 311695.40000 0.006749 std 3.790398e+05 372091.42232 0.036605 min 7.530000e+02 833.00000 -0.034981 25% 4.243300e+04 42401.00000 -0.000866 50% 1.197370e+05 119689.00000 -0.000392 75% 6.397050e+05 639705.00000 -0.000005 max 1.001697e+06 966657.00000 0.106242 ``` I don't have timings though. And now to the code. The basic idea is to completely replace the whole loop. If we can't fully kill it, don't transform. I have left one or two comments in the code, so hopefully it can be understood. Also, there is a few TODO's that i have left for follow-ups: * widening of `memcmp()`/`bcmp()` * step smaller than the comparison size * Metadata propagation * more than two blocks as long as there is still a single backedge? * ??? Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet Reviewed By: courbet Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists Tags: #llvm Differential Revision: https://reviews.llvm.org/D61144 llvm-svn: 370454
*	[X86] Remove what little support we had for MPX	Craig Topper	2019-08-29	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	-Deprecate -mmpx and -mno-mpx command line options -Remove CPUID detection of mpx for -march=native -Remove MPX from all CPUs -Remove MPX preprocessor define I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything. gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX Differential Revision: https://reviews.llvm.org/D66669 llvm-svn: 370393
*	[X86][ReleaseNotes] Add a note about the switch to widening legalization for ↵	Craig Topper	2019-08-28	1	-0/+5
\| \| \| \| \| \|	narrow vectors. llvm-svn: 370233
*	[FPEnv] Add fptosi and fptoui constrained intrinsics.	Kevin P. Neal	2019-08-28	1	-0/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements constrained floating point intrinsics for FP to signed and unsigned integers. Quoting from D32319: The purpose of the constrained intrinsics is to force the optimizer to respect the restrictions that will be necessary to support things like the STDC FENV_ACCESS ON pragma without interfering with optimizations when these restrictions are not needed. Reviewed by: Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton Approved by: Craig Topper Differential Revision: http://reviews.llvm.org/D63782 llvm-svn: 370228
*	Debug Info: Support for DW_AT_export_symbols for anonymous structs	Shafik Yaghmour	2019-08-23	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements the DWARF 5 feature described in: http://dwarfstd.org/ShowIssue.php?issue=141212.1 To support recognizing anonymous structs: struct A { struct { // Anonymous struct int y; }; } a; This patch adds a new (DI)flag to LLVM metadata: ExportSymbols Differential Revision: https://reviews.llvm.org/D66352 llvm-svn: 369781
*	Fix some regressions caused by r369553 on old versions of Debian and Ubuntu	Sylvestre Ledru	2019-08-22	3	-3/+3
\| \| \| \| \| \| \| \| \| \|	It was causing some errors like: Encoding error: 'ascii' codec can't decode byte 0xe2 in position 341: ordinal not in range(128) The full traceback has been saved in /tmp/sphinx-err-y2fq4dtb.log, if you want to report the issue to the developers. llvm-svn: 369644
*	[docs] Add GwpAsan to toctree.	Mitch Phillips	2019-08-21	2	-2/+4
\| \| \| \| \| \|	Reverts rL369556 in the process, as it's no longer needed. llvm-svn: 369560
*	[docs] Fix GwpAsan.rst	Jordan Rupprecht	2019-08-21	1	-0/+2
\| \| \| \|	llvm-svn: 369556
*	Add newline to GWP-ASan sphinx document.	Mitch Phillips	2019-08-21	1	-0/+1
\| \| \| \| \| \|	Should fix the document builder. llvm-svn: 369554
*	[docs] Convert remaining command guide entries from md to rst.	Jordan Rupprecht	2019-08-21	8	-65/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Linking between markdown and rst files is currently not supported very well, e.g. the current llvm-addr2line docs [1] link to "llvm-symbolizer" instead of "llvm-symbolizer.html". This is weirdly broken in different ways depending on which versions of sphinx and recommonmark are being used, so workaround the bug by using rst everywhere. [1] http://llvm.org/docs/CommandGuide/llvm-addr2line.html Reviewers: jhenderson Reviewed By: jhenderson Subscribers: lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66305 llvm-svn: 369553
*	[GWP-ASan] Add public-facing documentation [6].	Mitch Phillips	2019-08-21	1	-0/+279
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Note: Do not submit this documentation until Scudo support is reviewed and submitted (should be #[5]). See D60593 for further information. This patch introduces the public-facing documentation for GWP-ASan, as well as updating the definition of one of the options, which wasn't properly merged. The document describes the design and features of GWP-ASan, as well as how to use GWP-ASan from both a user's standpoint, and development documentation for supporting allocators. Reviewers: jfb, morehouse, vlad.tsyrklevich Reviewed By: morehouse, vlad.tsyrklevich Subscribers: kcc, dexonsmith, kubamracek, cryptoad, jfb, #sanitizers, llvm-commits, vlad.tsyrklevich, morehouse Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D62875 llvm-svn: 369552
*	[Docs] Test commit	DeForest Richards	2019-08-18	1	-1/+1
\| \| \| \| \| \|	Fixes typo - Removes extra space between last word of sentence and period. llvm-svn: 369216
*	Add LLVMLibC proposal to docs/index.rst.	Siva Chandra	2019-08-15	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: rupprecht Subscribers: arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66307 llvm-svn: 369030
*	[llvm] Migrate llvm::make_unique to std::make_unique	Jonas Devlieghere	2019-08-15	9	-39/+39
\| \| \| \| \| \| \| \|	Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013
*	Add a proposal for a libc project under the LLVM umbrella.	Siva Chandra	2019-08-15	1	-0/+125
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: chandlerc, dlj, echristo, hfinkel, jfb, zturner Subscribers: dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64939 llvm-svn: 369012
*	Add ptrmask intrinsic	Florian Hahn	2019-08-15	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a ptrmask intrinsic which allows masking out bits of a pointer that must be zero when accessing it, because of ABI alignment requirements or a restriction of the meaningful bits of a pointer through the data layout. This avoids doing a ptrtoint/inttoptr round trip in some cases (e.g. tagged pointers) and allows us to not lose information about the underlying object. Reviewers: nlopes, efriedma, hfinkel, sanjoy, jdoerfert, aqjune Reviewed by: sanjoy, jdoerfert Differential Revision: https://reviews.llvm.org/D59065 llvm-svn: 368986
*	[llvm-objcopy] Allow 'protected' visibility to be set when using	Chris Jackson	2019-08-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	add-symbol Reviewers: Maskray, rupprecht Differential Revision: https://reviews.llvm.org/D65891 llvm-svn: 368982
*	[docs] Fix sphinx doc generation errors	Jordan Rupprecht	2019-08-14	6	-21/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Errors fixed: - GettingStarted: Duplicate explicit target name: "cmake" - GlobalISel: Unexpected indentation - LoopTerminology: Explicit markup ends without a blank line; unexpected unindent - ORCv2: Definition list ends without a blank line; unexpected unindent - Misc: document isn't included in any toctree Verified that a clean docs build (`rm -rf docs/ && ninja docs-llvm-html`) passes with no errors. Spot checked the individual pages to make sure they look OK. Reviewers: thakis, dsanders Reviewed By: dsanders Subscribers: arphaman, llvm-commits, lhames, rovka, dsanders, reames Tags: #llvm Differential Revision: https://reviews.llvm.org/D66183 llvm-svn: 368932
*	Add support in CMake to statically link the C++ standard library.	Erich Keane	2019-08-14	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is sometimes useful to have the C++ standard library linked into the assembly when compiling clang, particularly when distributing a compiler onto systems that don't have a copy of stdlibc++ or libc++ installed. This functionality should work with either GCC or Clang as the host compiler, though statically linking libc++ (as may be required for licensing purposes) is only possible if the host compiler is Clang with a copy of libc++ available. Differential Revision: https://reviews.llvm.org/D65603 llvm-svn: 368907
*	Move to C++14	JF Bastien	2019-08-14	1	-96/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: I just bumped the minimum compiler versions to support C++14 in D66188. Following [our process](http://llvm.org/docs/DeveloperPolicy.html#toolchain) and [our previous agreement](http://lists.llvm.org/pipermail/llvm-dev/2019-January/129452.html), I'm now officially bumping the C++ version to 14 and updating the documentation. Subscribers: mgorny, jkorous, dexonsmith, llvm-commits, chandlerc, thakis, EricWF, jyknight, lhames, JDevlieghere Tags: #llvm Differential Revision: https://reviews.llvm.org/D66195 llvm-svn: 368887
*	[LangRef] Remove opening [ that was missing a closing ] from ↵	Craig Topper	2019-08-14	1	-3/+3
\| \| \| \| \| \| \| \| \|	call/callbr/invoke syntax. It looks like this bracket was added when the addrspace was added. before it. So I think it can jut be removed. llvm-svn: 368861
*	Remove minimum toolchain soft-error	JF Bastien	2019-08-14	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Back in January I changed the minimum toolchain version required to build clang and LLVM: D57264. Since then we've release LLVM 8, following [our process](http://llvm.org/docs/DeveloperPolicy.html#toolchain) it's therefore now a good time to remove the soft-error and officially deprecate older toolchains. I tried this out last Tursday night to see if any bots complained, and I saw no complaints. I also manually audited bots and didn't see any bot that should break, but their toolchain information is unreliable and some bots are offline. Once this patch stick we'll move to C++14 as we've [already agreed](http://lists.llvm.org/pipermail/llvm-dev/2019-January/129452.html). Subscribers: mgorny, jkorous, dexonsmith, llvm-commits, EricWF, thakis, chandlerc Tags: #llvm Differential Revision: https://reviews.llvm.org/D66188 llvm-svn: 368799
*	Extend coroutines to support a "returned continuation" lowering.	John McCall	2019-08-14	1	-31/+272
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A quick contrast of this ABI with the currently-implemented ABI: - Allocation is implicitly managed by the lowering passes, which is fine for frontends that are fine with assuming that allocation cannot fail. This assumption is necessary to implement dynamic allocas anyway. - The lowering attempts to fit the coroutine frame into an opaque, statically-sized buffer before falling back on allocation; the same buffer must be provided to every resume point. A buffer must be at least pointer-sized. - The resume and destroy functions have been combined; the continuation function takes a parameter indicating whether it has succeeded. - Conversely, every suspend point begins its own continuation function. - The continuation function pointer is directly returned to the caller instead of being stored in the frame. The continuation can therefore directly destroy the frame when exiting the coroutine instead of having to leave it in a defunct state. - Other values can be returned directly to the caller instead of going through a promise allocation. The frontend provides a "prototype" function declaration from which the type, calling convention, and attributes of the continuation functions are taken. - On the caller side, the frontend can generate natural IR that directly uses the continuation functions as long as it prevents IPO with the coroutine until lowering has happened. In combination with the point above, the frontend is almost totally in charge of the ABI of the coroutine. - Unique-yield coroutines are given some special treatment. llvm-svn: 368788
*	[Bugpoint redesign] Fix nonlocal URI link in doc	Diego Trevino Ferrer	2019-08-09	1	-8/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fixes documentation bot build http://lab.llvm.org:8011/builders/llvm-sphinx-docs Reviewers: JDevlieghere Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66022 llvm-svn: 368493
*	[Docs][llvm-strip] Fix an indentation issue.	Michael Pozulp	2019-08-09	1	-1/+1
\| \| \| \|	llvm-svn: 368473
*	[Docs][llvm-strip] Add help text to llvm-strip rst doc	Michael Pozulp	2019-08-09	2	-16/+167
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Addresses https://bugs.llvm.org/show_bug.cgi?id=42383 Reviewers: jhenderson, alexshap, rupprecht Reviewed By: jhenderson Subscribers: wolfgangp, jakehehrlich, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65384 llvm-svn: 368464
*	[MCA] Add flag -show-encoding to llvm-mca.	Andrea Di Biagio	2019-08-09	1	-4/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Flag -show-encoding enables the printing of instruction encodings as part of the the instruction info view. Example (with flags -mtriple=x86_64-- -mcpu=btver2): Instruction Info: [1]: #uOps [2]: Latency [3]: RThroughput [4]: MayLoad [5]: MayStore [6]: HasSideEffects (U) [7]: Encoding Size [1] [2] [3] [4] [5] [6] [7] Encodings: Instructions: 1 2 1.00 4 c5 f0 59 d0 vmulps %xmm0, %xmm1, %xmm2 1 4 1.00 4 c5 eb 7c da vhaddps %xmm2, %xmm2, %xmm3 1 4 1.00 4 c5 e3 7c e3 vhaddps %xmm3, %xmm3, %xmm4 In this example, column Encoding Size is the size in bytes of the instruction encoding. Column Encodings reports the actual instruction encodings as byte sequences in hex (objdump style). The computation of encodings is done by a utility class named mca::CodeEmitter. In future, I plan to expose the CodeEmitter to the instruction builder, so that information about instruction encoding sizes can be used by the simulator. That would be a first step towards simulating the throughput from the decoders in the hardware frontend. Differential Revision: https://reviews.llvm.org/D65948 llvm-svn: 368432
*	Added Delta IR Reduction Tool	Diego Trevino Ferrer	2019-08-08	1	-0/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Tool parses input IR file, and runs the delta debugging algorithm to reduce the functions inside the input file. Reviewers: alexshap, chandlerc Subscribers: mgorny, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63672 > llvm-svn: 368071 llvm-svn: 368358