summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* [clang-scan-deps] NFC, remove outdated implementation commentAlex Lorenz2019-08-301-1/+0
| | | | | | | | There's no need to purge symlinked entries in the FileManager, as the new FileEntryRef API allows us to compute dependencies more accurately when the FileManager is reused. llvm-svn: 370493
* gn build: Merge r370490Nico Weber2019-08-301-0/+1
| | | | llvm-svn: 370492
* [LLD] [COFF] Add a missing REQUIRES line to a recently added test. NFC.Martin Storsjo2019-08-301-0/+2
| | | | | | | This should fix failing buildbots like http://lab.llvm.org:8011/builders/clang-cmake-aarch64-lld/builds/7180. llvm-svn: 370491
* MemTag: unchecked load/store optimization.Evgeniy Stepanov2019-08-308-1/+389
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: MTE allows memory access to bypass tag check iff the address argument is [SP, #imm]. This change takes advantage of this to demote uses of tagged addresses to regular FrameIndex operands, reducing register pressure in large functions. MO_TAGGED target flag is used to signal that the FrameIndex operand refers to memory that might be tagged, and needs to be handled with care. Such operand must be lowered to [SP, #imm] directly, without a scratch register. The transformation pass attempts to predict when the offset will be out of range and disable the optimization. AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case this prediction has been wrong, but it is quite inefficient and should be avoided. Reviewers: pcc, vitalybuka, ostannard Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66457 llvm-svn: 370490
* [DAGCombine] visitVSELECT - remove equivalent getValueType() call. NFCI.Simon Pilgrim2019-08-301-1/+0
| | | | llvm-svn: 370489
* FileManager: Remove ShouldCloseOpenFile argument from getBufferForFile, NFCDuncan P. N. Exon Smith2019-08-303-11/+4
| | | | | | Remove this dead code. We always close it. llvm-svn: 370488
* [lld-link] implement -start-lib and -end-libBob Haarman2019-08-3015-70/+316
| | | | | | | | | | | | | | | | | | | | | | | Summary: This implements -start-lib and -end-lib flags for lld-link, analogous to the similarly named options in ld.lld. Object files after -start-lib are included in the link only when needed to resolve undefined symbols. The -end-lib flag goes back to the normal behavior of always including object files in the link. This mimics the semantics of static libraries, but without needing to actually create the archive file. Reviewers: ruiu, smeenai, MaskRay Reviewed By: ruiu, MaskRay Subscribers: akhuang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66848 llvm-svn: 370487
* [INSTRUCTIONS] Add support of const for getLoadStorePointerOperand() andWhitney Tsang2019-08-301-2/+9
| | | | | | | | | | | | getLoadStorePointerOperand(). Reviewer: hsaito, sebpop, reames, hfinkel, mkuper, bogner, haicheng, arsenm, lattner, chandlerc, grosser, rengolin Reviewed By: reames Subscribers: wdng, llvm-commits, bmahjour Tag: LLVM Differential Revision: https://reviews.llvm.org/D66595 llvm-svn: 370486
* [Attributor] Fix: do not pretend to preserve the CFGJohannes Doerfert2019-08-301-1/+0
| | | | llvm-svn: 370485
* [X86] Merge X86InstrInfo::loadRegFromAddr/storeRegToAddr into their only ↵Craig Topper2019-08-302-50/+21
| | | | | | | | | | | call site. I'm looking at unfolding broadcast loads on AVX512 which will require refactoring this code to select broadcast opcodes instead of regular load/stores in some cases. Merging them to avoid further complicating their interfaces. llvm-svn: 370484
* [lit] Fix my earlier bogus fix to not set DYLD_LIBRARY_PATH with Asan.Jonas Devlieghere2019-08-301-1/+1
| | | | | | | | | | | | | My follow-up commit to mess with DYLD_LIBRARY_PATH was bogus for two reasons: - The condition was inverted. - We were checking the OS's environment, instead of the config's. Two wrongs don't make a right, but the second mistake meant that the sanitizer bot passed. llvm-svn: 370483
* [clangd] Add highlighting for macro expansions.Johan Vikstrom2019-08-304-29/+42
| | | | | | | | | | | | | | Summary: https://github.com/clangd/clangd/issues/134 Reviewers: hokein, ilya-biryukov Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D66995 llvm-svn: 370482
* Revert [Clang Interpreter] Initial patch for the constexpr interpreterNandor Licker2019-08-3064-8945/+384
| | | | | | This reverts r370476 (git commit a5590950549719d0d9ea69ed164b0c8c0f4e02e6) llvm-svn: 370481
* [Attributor] Use existing function information for the call siteJohannes Doerfert2019-08-3016-30/+278
| | | | | | | | | | | | | | | | | | | | | | | Summary: Instead of recomputing information for call sites we now use the function information directly. This is always valid and once we have call site specific information we can improve here. This patch also bootstraps attributes that are created on-demand through an initial update call. Information that is known will then directly be available in the new attribute without causing an iteration delay. The tests show how this improves the iteration count. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66781 llvm-svn: 370480
* [Attributor] Manifest load/store alignment generallyJohannes Doerfert2019-08-302-25/+27
| | | | | | | | | | | | | | | | Summary: Any pointer could have load/store users not only floating ones so we move the manifest logic for alignment into the AAAlignImpl class. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66922 llvm-svn: 370479
* [DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI.Simon Pilgrim2019-08-301-4/+3
| | | | llvm-svn: 370478
* [Clang Interpreter] Initial patch for the constexpr interpreterNandor Licker2019-08-3064-384/+8945
| | | | | | | | | | | | | | | | | | Summary: This patch introduces the skeleton of the constexpr interpreter, capable of evaluating a simple constexpr functions consisting of if statements. The interpreter is described in more detail in the RFC. Further patches will add more features. Reviewers: Bigcheese, jfb, rsmith Subscribers: bruno, uenoku, ldionne, Tyker, thegameg, tschuett, dexonsmith, mgorny, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D64146 llvm-svn: 370476
* [InstCombine][AMDGPU] Simplify tbuffer loadsPiotr Sobczak2019-08-302-0/+661
| | | | | | | | | | | | | | | | Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66926 llvm-svn: 370475
* [llvm-nm] Small fix to Exected<StringRef>Sid Manning2019-08-301-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D66976 llvm-svn: 370474
* [clangd] Added highlighting for structured bindings.Johan Vikstrom2019-08-302-0/+19
| | | | | | | | | | | | | | Summary: Structured bindings are in a BindingDecl. The decl the declRefExpr points to are the BindingDecls. So this adds an additional if statement in the addToken function to highlight them. Reviewers: hokein, ilya-biryukov Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D66738 llvm-svn: 370473
* [yaml2obj][obj2yaml] - Use a single "Other" field instead of "Other", ↵George Rimar2019-08-3012-187/+181
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "Visibility" and "StOther". Currenly we can encode the 'st_other' field of symbol using 3 fields. 'Visibility' is used to encode STV_* values. 'Other' is used to encode everything except the visibility, but it can't handle arbitrary values. 'StOther' is used to encode arbitrary values when 'Visibility'/'Other' are not helpfull enough. 'st_other' field is used to encode symbol visibility and platform-dependent flags and values. Problem to encode it is that it consists of Visibility part (STV_* values) which are enumeration values and the Other part, which is different and inconsistent. For MIPS the Other part contains flags for all STO_MIPS_* values except STO_MIPS_MIPS16. (Like comment in ELFDumper says: "Someones in their infinite wisdom decided to make STO_MIPS_MIPS16 flag overlapped with other ST_MIPS_xxx flags."...) And for PPC64 the Other part might actually encode any value. This patch implements custom logic for handling the st_other and removes 'Visibility' and 'StOther' fields. Here is an example of a new YAML style this patch allows: - Name: foo Other: [ 0x4 ] - Name: bar Other: [ STV_PROTECTED, 4 ] - Name: zed Other: [ STV_PROTECTED, STO_MIPS_OPTIONAL, 0xf8 ] Differential revision: https://reviews.llvm.org/D66886 llvm-svn: 370472
* [DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts.Simon Pilgrim2019-08-301-3/+3
| | | | llvm-svn: 370471
* [DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for ↵Simon Pilgrim2019-08-301-1/+1
| | | | | | | | vector types. This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total. llvm-svn: 370468
* [mips] Merge common checkings under the same check prefix. NFCSimon Atanasyan2019-08-301-48/+34
| | | | llvm-svn: 370467
* [RISCV] Fix a couple of tests' CHECKsLuis Marques2019-08-302-84/+659
| | | | llvm-svn: 370466
* Remove an extra ";", NFC.Haojian Wu2019-08-301-1/+1
| | | | llvm-svn: 370465
* [X86] Add tests for rotate matching. NFCAmaury Sechet2019-08-302-0/+70
| | | | llvm-svn: 370464
* [CodeGen] Introduce MachineBasicBlock::replacePhiUsesWith helper and use it. NFCBjorn Pettersson2019-08-303-24/+20
| | | | | | | | | | | | | | | | | | | | | Summary: Found a couple of places in the code where all the PHI nodes of a MBB is updated, replacing references to one MBB by reference to another MBB instead. This patch simply refactors the code to use a common helper (MachineBasicBlock::replacePhiUsesWith) for such PHI node updates. Reviewers: t.p.northover, arsenm, uabelho Subscribers: wdng, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66750 llvm-svn: 370463
* [dotest] Finish removing -qPavel Labath2019-08-301-1/+1
| | | | | | | One usage of this option remained, and caused dotest to error out if one happened to pass the -v flag. llvm-svn: 370462
* [ASTImporter] Do not look up lambda classesGabor Marton2019-08-302-1/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Consider this code: ``` void f() { auto L0 = [](){}; auto L1 = [](){}; } ``` First we import `L0` then `L1`. Currently we end up having only one CXXRecordDecl for the two different lambdas. And that is a problem if the body of their op() is different. This happens because when we import `L1` then lookup finds the existing `L0` and since they are structurally equivalent we just map the imported L0 to be the counterpart of L1. We have the same problem in this case: ``` template <typename F0, typename F1> void f(F0 L0 = [](){}, F1 L1 = [](){}) {} ``` In StructuralEquivalenceContext we could distinquish lambdas only by their source location in these cases. But we the lambdas are actually structrually equivalent they differn only by the source location. Thus, the solution is to disable lookup completely if the decl in the "from" context is a lambda. However, that could have other problems: what if the lambda is defined in a header file and included in several TUs? I think we'd have as many duplicates as many includes we have. I think we could live with that, because the lambda classes are TU local anyway, we cannot just access them from another TU. Reviewers: a_sidorin, a.sidorin, shafik Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D66348 llvm-svn: 370461
* [DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node ↵Simon Pilgrim2019-08-301-8/+8
| | | | | | | | | | is all zeros Return a proper zero vector, just in case some elements are undef. Noticed by inspection after dealing with a similar issue in PR43159. llvm-svn: 370460
* Fix Wdocumentation warning. NFCI.Simon Pilgrim2019-08-301-2/+2
| | | | llvm-svn: 370459
* [llvm-objcopy] Allow the visibility of symbols created by --binary andChris Jackson2019-08-3011-23/+111
| | | | | | --add-symbol to be specified with --new-symbol-visibility llvm-svn: 370458
* [ASTImporter] Propagate errors during import of overridden methods.Balazs Keri2019-08-302-5/+55
| | | | | | | | | | | | | | | | | | | Summary: If importing overridden methods fails for a method it can be seen incorrectly as non-virtual. To avoid this inconsistency the method is marked with import error to avoid later use of it. Reviewers: martong, a.sidorin, shafik, a_sidorin Reviewed By: martong, shafik Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D66933 llvm-svn: 370457
* [Attributor] Implement AANoAliasCallSiteArgument initializationHideto Ueno2019-08-302-7/+14
| | | | | | | | | | | | | | | | Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66927 llvm-svn: 370456
* [Clangd] ExtractFunction Added checks for broken control flowShaurya Gupta2019-08-302-17/+75
| | | | | | | | | | | | | | | | Summary: - Added checks for broken control flow - Added unittests Reviewers: sammccall, kadircet Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D66732 llvm-svn: 370455
* [LoopIdiomRecognize] BCmp loop idiom recognitionRoman Lebedev2019-08-306-588/+1271
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: @mclow.lists brought up this issue up in IRC. It is a reasonably common problem to compare some two values for equality. Those may be just some integers, strings or arrays of integers. In C, there is `memcmp()`, `bcmp()` functions. In C++, there exists `std::equal()` algorithm. One can also write that function manually. libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ libc++ does not do anything like that, it simply relies on simple C++'s `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!) So likely, there exists a certain performance opportunities. Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that is using `memcmp()` (in this case, compiled with modified compiler). {F8768213} ``` #include <algorithm> #include <cmath> #include <cstdint> #include <iterator> #include <limits> #include <random> #include <type_traits> #include <utility> #include <vector> #include "benchmark/benchmark.h" template <class T> bool equal(T* a, T* a_end, T* b) noexcept { for (; a != a_end; ++a, ++b) { if (*a != *b) return false; } return true; } template <typename T> std::vector<T> getVectorOfRandomNumbers(size_t count) { std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(), std::numeric_limits<T>::max()); std::vector<T> v; v.reserve(count); std::generate_n(std::back_inserter(v), count, [&dis, &gen]() { return dis(gen); }); assert(v.size() == count); return v; } struct Identical { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto Tmp = getVectorOfRandomNumbers<T>(count); return std::make_pair(Tmp, std::move(Tmp)); } }; struct InequalHalfway { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { auto V0 = getVectorOfRandomNumbers<T>(count); auto V1 = V0; V1[V1.size() / size_t(2)]++; // just change the value. return std::make_pair(std::move(V0), std::move(V1)); } }; template <class T, class Gen> void BM_bcmp(benchmark::State& state) { const size_t Length = state.range(0); const std::pair<std::vector<T>, std::vector<T>> Data = Gen::template Gen<T>(Length); const std::vector<T>& a = Data.first; const std::vector<T>& b = Data.second; assert(a.size() == Length && b.size() == a.size()); benchmark::ClobberMemory(); benchmark::DoNotOptimize(a); benchmark::DoNotOptimize(a.data()); benchmark::DoNotOptimize(b); benchmark::DoNotOptimize(b.data()); for (auto _ : state) { const bool is_equal = equal(a.data(), a.data() + a.size(), b.data()); benchmark::DoNotOptimize(is_equal); } state.SetComplexityN(Length); state.counters["eltcnt"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant); state.counters["eltcnt/sec"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate); const size_t BytesRead = 2 * sizeof(T) * Length; state.counters["bytes_read/iteration"] = benchmark::Counter(BytesRead, benchmark::Counter::kDefaults, benchmark::Counter::OneK::kIs1024); state.counters["bytes_read/sec"] = benchmark::Counter( BytesRead, benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024); } template <typename T> static void CustomArguments(benchmark::internal::Benchmark* b) { const size_t L2SizeBytes = []() { for (const benchmark::CPUInfo::CacheInfo& I : benchmark::CPUInfo::Get().caches) { if (I.level == 2) return I.size; } return 0; }(); // What is the largest range we can check to always fit within given L2 cache? const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 / /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2; b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN); } BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical) ->Apply(CustomArguments<uint64_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway) ->Apply(CustomArguments<uint8_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway) ->Apply(CustomArguments<uint64_t>); ``` {F8768210} ``` $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx 2019-04-25 21:17:11 Running build-old/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 0.65, 3.90, 4.14 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 432131 ns 432101 ns 1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s BM_bcmp<uint8_t, Identical>_BigO 0.86 N 0.86 N BM_bcmp<uint8_t, Identical>_RMS 8 % 8 % <...> BM_bcmp<uint16_t, Identical>/256000 161408 ns 161409 ns 4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s BM_bcmp<uint16_t, Identical>_BigO 0.67 N 0.67 N BM_bcmp<uint16_t, Identical>_RMS 25 % 25 % <...> BM_bcmp<uint32_t, Identical>/128000 81497 ns 81488 ns 8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s BM_bcmp<uint32_t, Identical>_BigO 0.71 N 0.71 N BM_bcmp<uint32_t, Identical>_RMS 42 % 42 % <...> BM_bcmp<uint64_t, Identical>/64000 50138 ns 50138 ns 10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s BM_bcmp<uint64_t, Identical>_BigO 0.84 N 0.84 N BM_bcmp<uint64_t, Identical>_RMS 27 % 27 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 192405 ns 192392 ns 3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.38 N 0.38 N BM_bcmp<uint8_t, InequalHalfway>_RMS 3 % 3 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 127858 ns 127860 ns 5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint16_t, InequalHalfway>_RMS 0 % 0 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 49140 ns 49140 ns 14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.40 N 0.40 N BM_bcmp<uint32_t, InequalHalfway>_RMS 18 % 18 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 32101 ns 32099 ns 21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.50 N 0.50 N BM_bcmp<uint64_t, InequalHalfway>_RMS 1 % 1 % RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0 2019-04-25 21:19:29 Running build-new/test/llvm-bcmp-bench Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 1.01, 2.85, 3.71 --------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... --------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 18593 ns 18590 ns 37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s BM_bcmp<uint8_t, Identical>_BigO 0.04 N 0.04 N BM_bcmp<uint8_t, Identical>_RMS 37 % 37 % <...> BM_bcmp<uint16_t, Identical>/256000 18950 ns 18948 ns 37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s BM_bcmp<uint16_t, Identical>_BigO 0.08 N 0.08 N BM_bcmp<uint16_t, Identical>_RMS 34 % 34 % <...> BM_bcmp<uint32_t, Identical>/128000 18627 ns 18627 ns 37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s BM_bcmp<uint32_t, Identical>_BigO 0.16 N 0.16 N BM_bcmp<uint32_t, Identical>_RMS 35 % 35 % <...> BM_bcmp<uint64_t, Identical>/64000 18855 ns 18855 ns 37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s BM_bcmp<uint64_t, Identical>_BigO 0.32 N 0.32 N BM_bcmp<uint64_t, Identical>_RMS 33 % 33 % <...> BM_bcmp<uint8_t, InequalHalfway>/512000 9570 ns 9569 ns 73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s BM_bcmp<uint8_t, InequalHalfway>_BigO 0.02 N 0.02 N BM_bcmp<uint8_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint16_t, InequalHalfway>/256000 9547 ns 9547 ns 74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s BM_bcmp<uint16_t, InequalHalfway>_BigO 0.04 N 0.04 N BM_bcmp<uint16_t, InequalHalfway>_RMS 29 % 29 % <...> BM_bcmp<uint32_t, InequalHalfway>/128000 9396 ns 9394 ns 73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s BM_bcmp<uint32_t, InequalHalfway>_BigO 0.08 N 0.08 N BM_bcmp<uint32_t, InequalHalfway>_RMS 30 % 30 % <...> BM_bcmp<uint64_t, InequalHalfway>/64000 9499 ns 9498 ns 73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s BM_bcmp<uint64_t, InequalHalfway>_BigO 0.16 N 0.16 N BM_bcmp<uint64_t, InequalHalfway>_RMS 28 % 28 % Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench Benchmark Time CPU Time Old Time New CPU Old CPU New --------------------------------------------------------------------------------------------------------------------------------------- <...> BM_bcmp<uint8_t, Identical>/512000 -0.9570 -0.9570 432131 18593 432101 18590 <...> BM_bcmp<uint16_t, Identical>/256000 -0.8826 -0.8826 161408 18950 161409 18948 <...> BM_bcmp<uint32_t, Identical>/128000 -0.7714 -0.7714 81497 18627 81488 18627 <...> BM_bcmp<uint64_t, Identical>/64000 -0.6239 -0.6239 50138 18855 50138 18855 <...> BM_bcmp<uint8_t, InequalHalfway>/512000 -0.9503 -0.9503 192405 9570 192392 9569 <...> BM_bcmp<uint16_t, InequalHalfway>/256000 -0.9253 -0.9253 127858 9547 127860 9547 <...> BM_bcmp<uint32_t, InequalHalfway>/128000 -0.8088 -0.8088 49140 9396 49140 9394 <...> BM_bcmp<uint64_t, InequalHalfway>/64000 -0.7041 -0.7041 32101 9499 32099 9498 ``` What can we tell from the benchmark? * Performance of naive equality check somewhat improves with element size, maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s for uint64_t. I think, that instability implies performance problems. * Performance of `memcmp()`-aware benchmark always maxes out at around bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the naive variant! * eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and linearly decreases with element size. For uint64_t, it's ~4x+ the elements/second. * The call obvious is more pricey than the loop, with small element count. As it can be seen from the full output {F8768210}, the `memcmp()` is almost universally worse, independent of the element size (and thus buffer size) when element count is less than 8. So all in all, bcmp idiom does indeed pose untapped performance headroom. This diff does implement said idiom recognition. I think a reasonable test coverage is present, but do tell if there is anything obvious missing. Now, quality. This does succeed to build and pass the test-suite, at least without any non-bundled elements. {F8768216} {F8768217} This transform fires 91 times: ``` $ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json Tests: 1149 Metric: loop-idiom.NumBCmp Program result-new MultiSourc...Benchmarks/7zip/7zip-benchmark 79.00 MultiSource/Applications/d/make_dparser 3.00 SingleSource/UnitTests/vla 2.00 MultiSource/Applications/Burg/burg 1.00 MultiSourc.../Applications/JM/lencod/lencod 1.00 MultiSource/Applications/lemon/lemon 1.00 MultiSource/Benchmarks/Bullet/bullet 1.00 MultiSourc...e/Benchmarks/MallocBench/gs/gs 1.00 MultiSourc...gs-C/TimberWolfMC/timberwolfmc 1.00 MultiSourc...Prolangs-C/simulator/simulator 1.00 ``` The size changes are: I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look. ``` $ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash Tests: 1149 Same hash: 907 (filtered out) Remaining: 242 Metric: size..text Program result-old result-new diff test-suite...ingleSource/UnitTests/vla.test 753.00 833.00 10.6% test-suite...marks/7zip/7zip-benchmark.test 1001697.00 966657.00 -3.5% test-suite...ngs-C/simulator/simulator.test 32369.00 32321.00 -0.1% test-suite...plications/d/make_dparser.test 89585.00 89505.00 -0.1% test-suite...ce/Applications/Burg/burg.test 40817.00 40785.00 -0.1% test-suite.../Applications/lemon/lemon.test 47281.00 47249.00 -0.1% test-suite...TimberWolfMC/timberwolfmc.test 250065.00 250113.00 0.0% test-suite...chmarks/MallocBench/gs/gs.test 149889.00 149873.00 -0.0% test-suite...ications/JM/lencod/lencod.test 769585.00 769569.00 -0.0% test-suite.../Benchmarks/Bullet/bullet.test 770049.00 770049.00 0.0% test-suite...HMARK_ANISTROPIC_DIFFUSION/128 NaN NaN nan% test-suite...HMARK_ANISTROPIC_DIFFUSION/256 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/64 NaN NaN nan% test-suite...CHMARK_ANISTROPIC_DIFFUSION/32 NaN NaN nan% test-suite...ENCHMARK_BILATERAL_FILTER/64/4 NaN NaN nan% Geomean difference nan% result-old result-new diff count 1.000000e+01 10.00000 10.000000 mean 3.152090e+05 311695.40000 0.006749 std 3.790398e+05 372091.42232 0.036605 min 7.530000e+02 833.00000 -0.034981 25% 4.243300e+04 42401.00000 -0.000866 50% 1.197370e+05 119689.00000 -0.000392 75% 6.397050e+05 639705.00000 -0.000005 max 1.001697e+06 966657.00000 0.106242 ``` I don't have timings though. And now to the code. The basic idea is to completely replace the whole loop. If we can't fully kill it, don't transform. I have left one or two comments in the code, so hopefully it can be understood. Also, there is a few TODO's that i have left for follow-ups: * widening of `memcmp()`/`bcmp()` * step smaller than the comparison size * Metadata propagation * more than two blocks as long as there is still a single backedge? * ??? Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet Reviewed By: courbet Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists Tags: #llvm Differential Revision: https://reviews.llvm.org/D61144 llvm-svn: 370454
* [NFC] SCEVExpander: add SetCurrentDebugLocation() / ↵Roman Lebedev2019-08-301-2/+10
| | | | | | | | | | | | | | | | | | | | | getCurrentDebugLocation() wrappers Summary: The internal `Builder` is private, which means there is currently no way to set the debuginfo locations for `SCEVExpander`. This only adds the wrappers, but does not use them anywhere. Reviewers: mkazantsev, sanjoy, gberry, jyknight, dneilson Reviewed By: sanjoy Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61007 llvm-svn: 370453
* [clangd] Collecting main file macro expansion locations in ParsedAST.Johan Vikstrom2019-08-303-1/+94
| | | | | | | | | | | | | | Summary: TokenBuffer does not collect macro expansions inside macro arguments which is needed for semantic higlighting. Therefore collects macro expansions in the main file in a PPCallback when building the ParsedAST instead. Reviewers: hokein, ilya-biryukov Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D66928 llvm-svn: 370452
* [Tooling] Migrated APIs that take ownership of objects to unique_ptrDmitri Gribenko2019-08-3021-185/+154
| | | | | | | | | | Subscribers: jkorous, arphaman, kadircet, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D66960 llvm-svn: 370451
* dotest: improvements to the pexpect testsPavel Labath2019-08-306-192/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: While working on r370054, i've found it frustrating that the test output was compeletely unhelpful in case of failures. Therefore I've decided to improve that. In this I reuse the PExpectTest class, which was one of our mechanisms for running pexpect tests, but which has gotten orhpaned in the mean time. I've replaced the existing send methods with a "expect" method, which I've tried to design so that it has a similar interface to the expect method in regular non-pexpect dotest tests (as it essentially does something very similar). I've kept the ability to dump the transcript of the pexpect communication to stdout in the "trace" mode, as that is a very handy way to figure out what the test is doing. I've also removed the "expect_string" method used in the existing tests -- I've found this to be unhelpful because it hides the message that would be normally displayed by the EOF exception. Although vebose, this message includes some important information, like what strings we were searching for, what were the last bits of lldb output, etc. I've also beefed up the class to automatically disable the debug info test duplication, and auto-skip tests when the host platform does not support pexpect. This patch ports TestMultilineCompletion and TestIOHandlerCompletion to the new class. It also deletes TestFormats as it is not testing anything (definitely not formats) -- it was committed with the test code commented out (r228207), and then the testing code was deleted in r356000. Reviewers: teemperor, JDevlieghere, davide Subscribers: aprantl, lldb-commits Differential Revision: https://reviews.llvm.org/D66954 llvm-svn: 370449
* [LiveDebugValues] Insert entry values after bundlesDavid Stenberg2019-08-304-2/+154
| | | | | | | | | | | | | | | | | | | | Summary: Change LiveDebugValues so that it inserts entry values after the bundle which contains the clobbering instruction. Previously it would insert the debug value after the bundle head using insertAfter(), breaking the bundle. Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D66888 llvm-svn: 370448
* [clangd] Add .vscode-test to .gitignore.Haojian Wu2019-08-301-0/+1
| | | | | | | | | | | | Reviewers: jvikstrom Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D66949 llvm-svn: 370446
* [CodeGen]: fix error message for "=r" asm constraintAlexander Potapenko2019-08-302-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Nico Weber reported that the following code: char buf[9]; asm("" : "=r" (buf)); yields the "impossible constraint in asm: can't store struct into a register" error message, although |buf| is not a struct (see http://crbug.com/999160). Make the error message more generic and add a test for it. Also make sure other tests in x86_64-PR42672.c check for the full error message. Reviewers: eli.friedman, thakis Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D66948 llvm-svn: 370444
* vim: add `immarg` keywordSven van Haastregt2019-08-301-0/+1
| | | | | | The `immarg` attribute was added in r355981. llvm-svn: 370443
* gn build: Merge r370441Nico Weber2019-08-301-1/+0
| | | | llvm-svn: 370442
* [ADT] Removed VariadicFunctionDmitri Gribenko2019-08-303-440/+0
| | | | | | | | | | | | | | | Summary: It is not used. It uses macro-based unrolling instead of variadic templates, so it is not idiomatic anymore, and therefore it is a questionable API to keep "just in case". Subscribers: mgorny, dmgreen, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66961 llvm-svn: 370441
* [lldb][NFC] Move Clang-specific flags to ClangUserExpressionRaphael Isemann2019-08-303-19/+18
| | | | | | | | | LLVMUserExpression doesn't use these variables and they are all specific to Clang. Also removes m_const_object as this was actually never used by anyone (and Clang didn't report it as we assigned it in the constructor which seems to count as use). llvm-svn: 370440
* [ELF] Set `referenced` bit of Undefined created by BitcodeFileFangrui Song2019-08-303-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | D64136 and D65584, while fixing STB_WEAK issues and improving our compatibility with ld.bfd, can cause another STB_WEAK problem related to LTO: If %tundef.o has an undefined reference on f, and %tweakundef.o has a weak undefined reference on f, %tdef.o has a definition of f ``` ld.lld %tundef.o %tweakundef.o --start-lib %tdef.o --end-lib ``` 1) `%tundef.o` doesn't set the `referenced` bit. 2) `%weakundef.o` changes the binding from STB_GLOBAL to STB_WEAK 3) `%tdef.o` is not fetched because the binding is weak. Step (1) is incorrect. This patch sets the `referenced` bit of Undefined created by bitcode files. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D66992 llvm-svn: 370437
* [LLD] [COFF] Support merging resource object filesMartin Storsjo2019-08-3012-43/+477
| | | | | | | | | | | | | | | | | | | | | | | | Extend WindowsResourceParser to support using a ResourceSectionRef for loading resources from an object file. Only allow merging resource object files in mingw mode; keep the existing error on multiple resource objects in link mode. If there only is one resource object file and no .res resources, don't parse and recreate the .rsrc section, but just link it in without inspecting it. This allows users to produce any .rsrc section (outside of what the parser supports), just like before. (I don't have a specific need for this, but it reduces the risk of this new feature.) Separate out the .rsrc section chunks in InputFiles.cpp, and only include them in the list of section chunks to link if we've determined that there only was one single resource object. (We need to keep other chunks from those object files, as they can legitimately contain other sections as well, in addition to .rsrc section chunks.) Differential Revision: https://reviews.llvm.org/D66824 llvm-svn: 370436
OpenPOWER on IntegriCloud