Commit message | Author | Age | Files | Lines
* Revert "[LICM] Make LICM able to hoist phis" | Benjamin Kramer | 2018-11-19 | 3 | -1477/+24
| | | | | | This reverts commit r347190. llvm-svn: 347225
* [lit] On Windows, don't error if MSVC is not in PATH. | Zachary Turner | 2018-11-19 | 1 | -2/+4
| | | | | | | We had some logic backwards, and as a result if MSVC was not found in PATH we would throw a string concatenation exception. llvm-svn: 347224
* Remove non-ASCII characters at the beginning of file. | Zachary Turner | 2018-11-19 | 1 | -1/+1
| | | | | | It's not clear how these ended up in the file, but this fixes it. llvm-svn: 347223
* [AMDGPU] Derive GCNSubtarget from MF to get overridden target features | David Stuttard | 2018-11-19 | 1 | -2/+2
| | | | | | | | | | | | | | | | | | Summary: AMDGPUAsmPrinter has a getSTI function that derives a GCNSubtarget from the TM. However, this means that overridden target features are not detected and can result in incorrect behaviour. Switch to using STM which is a GCNSubtarget derived from the MF (used elsewhere in the same function). Change-Id: Ib6328ad667b7fcdc87e9c06344e59859207db9b0 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54301 llvm-svn: 347221
* [LV] Avoid vectorizing unsafe dependencies in uniform address | Anna Thomas | 2018-11-19 | 8 | -22/+71
| | | | | | | | | | | | | | | | | | | Summary: Currently, when vectorizing stores to uniform addresses, the only case in which we prevent vectorization is when multiple stores to the same uniform address cause an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 llvm-svn: 347220
* [libcxx] Add availability markup for bad_optional_access, bad_variant_access and bad_any_cast | Louis Dionne | 2018-11-19 | 50 | -245/+364
| | | | | | | | | | | | Reviewers: dexonsmith, EricWF Subscribers: christof, arphaman, libcxx-commits Differential Revision: https://reviews.llvm.org/D53256 llvm-svn: 347219
* [Hexagon] make test immune to improvements in undef simplification | Sanjay Patel | 2018-11-19 | 1 | -2/+2
| | | | llvm-svn: 347218
* [x86] add/make tests immune to improvements in undef simplification | Sanjay Patel | 2018-11-19 | 3 | -77/+161
| | | | llvm-svn: 347217
* Fix some issues with LLDB's lit configuration files. | Zachary Turner | 2018-11-19 | 74 | -230/+314
| Recently I tried to port LLDB's lit configuration files over to use a [...] on the surface, but broke some cases that weren't broken before and also exposed some additional problems with the old approach that we were just getting lucky with.
|
| When we set up a lit environment, the goal is to make it as hermetic as possible. We should not be relying on PATH and enabling the use of arbitrary shell commands. Instead, only whitelisted commands should be allowed. These are, generally speaking, the lit builtins such as echo, cd, etc, as well as anything for which substitutions have been explicitly set up. These substitutions should map to the build output directory, but in some cases it's useful to be able to override this (for example to point to an installed tools directory).
|
| This is, of course, how it's supposed to work. What was actually happening is that we were bringing in PATH and LD_LIBRARY_PATH and then just running the given run line as a shell command. This led to problems such as finding the wrong version of clang-cl on PATH since it wasn't even a substitution, and flakiness / non-determinism since the environment the tests were running in would change per-machine. On the other hand, it also made other things possible. For example, we had some tests that were explicitly running cl.exe and link.exe instead of clang-cl and lld-link, and the only reason it worked at all is because it was finding them on PATH. Unfortunately we can't entirely get rid of these tests, because they support a few things in debug info that clang-cl and lld-link don't (notably, the LF_UDT_MOD_SRC_LINE record), which makes some of the tests fail.
|
| The high level changes introduced in this patch are:
|
| 1. Removal of functionality - The lit test suite no longer respects LLDB_TEST_C_COMPILER and LLDB_TEST_CXX_COMPILER. This means there is no more support for gcc, but nobody was using this anyway (note: the functionality is still there for the dotest suite, just not the lit test suite). There is no longer a single substitution %cxx and %cc which maps to <arbitrary-compiler>; you now explicitly specify the compiler with a substitution like %clang or %clangxx or %clang_cl. We can revisit this in the future when someone needs gcc.
|
| 2. Introduction of the LLDB_LIT_TOOLS_DIR directory. This does in spirit what LLDB_TEST_C_COMPILER and LLDB_TEST_CXX_COMPILER used to do, but in a friendlier way. If this is not specified, all tools are expected to be the just-built tools. If it is specified, the tools which are not themselves being tested but are being used to construct and run checks (e.g. clang, FileCheck, llvm-mc, etc) will be searched for in this directory first, then the build output directory.
|
| 3. Changes to core llvm lit files. The use_lld() and use_clang() functions were introduced long ago in anticipation of using them in lldb, but since they were never actually used anywhere but their respective projects, there were some issues to be resolved regarding generality and ability to use them outside their project.
|
| 4. Changes to .test files - These are all just replacing things like clang-cl with %clang_cl and %cxx with %clangxx, etc.
|
| 5. Changes to lit.cfg.py - Previously we would load up some system environment variables and then add some new things to them, then do a bunch of work building out our own substitutions. First, we delete the system environment variable code, making the environment hermetic. Then, we refactor the substitution logic into two separate helper functions, one which sets up substitutions for the tools we want to test (which must come from the build output directory), and another which sets up substitutions for support tools (like compilers, etc).
|
| 6. New substitutions for MSVC - Previously we relied on location of MSVC by bringing in the entire parent's PATH and letting subprocess.Popen just run the command line. Now we set up real substitutions that should have the same effect. We use PATH to find them, and then look for INCLUDE and LIB to construct a substitution command line with appropriate /I and /LIBPATH: arguments. The nice thing about this is that it opens the door to having separate %msvc-cl32 and %msvc-cl64 substitutions, rather than requiring the user to run vcvars first, because we can deduce the path to 32-bit libraries from 64-bit library directories, and vice versa. Without these substitutions this would have been impossible.
|
| Differential Revision: https://reviews.llvm.org/D54567
| llvm-svn: 347216
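The lookup order from point 2 and the flag construction from point 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the patch: the helper names `find_support_tool` and `msvc_flags` are hypothetical, and the real lit.cfg.py logic is not shown in this log.

```python
import os


def find_support_tool(name, search_dirs):
    """Return the first existing path to `name` in `search_dirs`, or None.

    Hypothetical helper: pass the tools directory first and the build
    output directory second to get the order described in the message.
    """
    for directory in search_dirs:
        if not directory:
            continue  # e.g. no tools directory was specified
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def msvc_flags(include_var, lib_var, sep=os.pathsep):
    """Turn INCLUDE/LIB-style path lists into /I and /LIBPATH: arguments.

    Hypothetical helper: `include_var` and `lib_var` are the raw values of
    the INCLUDE and LIB environment variables.
    """
    includes = [p for p in include_var.split(sep) if p]
    libs = [p for p in lib_var.split(sep) if p]
    return ["/I" + p for p in includes], ["/LIBPATH:" + p for p in libs]
```

Keeping the search order as an explicit list makes the "tools directory first, build directory second" rule visible at the call site instead of hiding it in PATH.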
* [LoopPass] fixing 'Modification' messages in -debug-pass=Executions for loop passes | Fedor Sergeev | 2018-11-19 | 1 | -2/+4
| | | | | | | | | | | The legacy loop pass manager issues a "Made Modification" message after each loop pass run; however, the condition for issuing it is accumulated across all the runs. That leads to confusing 'modification' messages as soon as the first modification is made. Change the condition to "current pass made modifications", similar to how it is done in all other pass managers. llvm-svn: 347215
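The difference between the accumulated flag and the per-pass flag can be shown with a small Python model. It is a sketch only: `run_loop_passes` and the toy passes are hypothetical names, and the real pass manager is C++ code that is not reproduced in this log.

```python
def run_loop_passes(passes, loop, per_pass_flag=True):
    """Run each (name, callable) pass on `loop`.

    Returns (changed, reported), where `reported` lists the passes for
    which a "Made Modification" style message would be emitted.
    """
    reported = []
    changed = False  # accumulated over all passes; still the overall result
    for name, run in passes:
        local_changed = run(loop)
        changed |= local_changed
        # Old behaviour keyed the message on `changed`;
        # the fixed behaviour keys it on `local_changed`.
        if local_changed if per_pass_flag else changed:
            reported.append(name)
    return changed, reported


def noop(loop):
    """A pass that never modifies the loop."""
    return False


def bump(loop):
    """A pass that always modifies the loop."""
    loop["trip_count"] += 1
    return True


PASSES = [("A", noop), ("B", bump), ("C", noop)]
```

With the accumulated flag, pass C is reported even though it changed nothing; with the per-pass flag, only B is reported.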
* [OpenMP] Check target architecture supports unified shared memory for requires directive. | Patrick Lyster | 2018-11-19 | 5 | -50/+143
| | | | | | Differential Review: https://reviews.llvm.org/D54493 llvm-svn: 347214
* Don't use -O in lit tests. | Zachary Turner | 2018-11-19 | 7 | -17/+21
| | | | | | | | | | Because of different shell quoting rules, and the fact that LLDB commands often contain spaces, -O is not portable for writing command lines. Instead, we should use explicit lldbinit files. Differential Revision: https://reviews.llvm.org/D54680 llvm-svn: 347213
* [SelectionDAG] simplify select FP with undef condition | Sanjay Patel | 2018-11-19 | 2 | -1/+2
| | | | llvm-svn: 347212
* [x86] add test for select FP with undef condition; NFC | Sanjay Patel | 2018-11-19 | 1 | -0/+8
| | | | llvm-svn: 347211
* [SelectionDAG] add simplifySelect() to reduce code duplication; NFC | Sanjay Patel | 2018-11-19 | 3 | -36/+30
| | | | | | This should be extended to handle FP and vectors in follow-up patches. llvm-svn: 347210
* [llvm-exegesis][NFC] More tests for ExegesisTarget::fillMemoryOperands(). | Clement Courbet | 2018-11-19 | 3 | -23/+41
| | | | | | | | | | Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54304 llvm-svn: 347209
* Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads. | Martin Elshuber | 2018-11-19 | 9 | -1/+1789
| | | | | | | | | | | | This patch defines an interleaved-load-combine pass. The pass searches for ShuffleVector instructions that represent interleaved loads. Matches are converted such that they will be captured by the InterleavedAccessPass. The pass extends LLVM's capabilities to use target-specific instruction selection of interleaved load patterns (e.g. ld4 on AArch64 architectures). Differential Revision: https://reviews.llvm.org/D52653 llvm-svn: 347208
* [ThinLTO] Fix comment. NFC | Eugene Leviant | 2018-11-19 | 1 | -1/+1
| | | | llvm-svn: 347207
* [SelectionDAG] fix formatting; NFC | Sanjay Patel | 2018-11-19 | 1 | -15/+13
| | | | llvm-svn: 347206
* [FileManager] getFile(open=true) after getFile(open=false) should open the file. | Sam McCall | 2018-11-19 | 3 | -21/+62
| | | | | | | | | | | | | | | | | | | Summary: Old behavior is to just return the cached entry regardless of opened-ness. That feels buggy (though I guess nobody ever actually needed this). This came up in the context of clangd+clang-tidy integration: we're going to getFile(open=false) to replay preprocessor actions obscured by the preamble, but the compilation may subsequently getFile(open=true) for non-preamble includes. Reviewers: ilya-biryukov Subscribers: ioeric, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D54691 llvm-svn: 347205
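The behaviour change described above (a cached, unopened entry gets opened when a later request asks for an open file) can be modelled in Python. This is a minimal sketch under stated assumptions: `FileCache`, `FileEntry` and the injected `opener` are hypothetical names, not clang's FileManager API.

```python
class FileEntry:
    """A cached file record; `handle` stays None until the file is opened."""

    def __init__(self, path):
        self.path = path
        self.handle = None


class FileCache:
    """Toy cache keyed by path, with lazy opening on request."""

    def __init__(self, opener):
        self._opener = opener  # callable that opens a path and returns a handle
        self._entries = {}

    def get_file(self, path, open_file=False):
        entry = self._entries.get(path)
        if entry is None:
            entry = FileEntry(path)
            self._entries[path] = entry
        # The old behaviour returned the cached entry as-is. The fixed
        # behaviour upgrades an unopened entry when open_file is requested.
        if open_file and entry.handle is None:
            entry.handle = self._opener(path)
        return entry
```

The same entry object is returned in both cases, so callers that already hold the unopened entry observe the upgrade, and the file is opened at most once.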
* [llvm-exegesis] (+final perf overview) InstructionBenchmarkClustering::rangeQuery(): reserve for the upper bound of Neighbors | Roman Lebedev | 2018-11-19 | 2 | -3/+6
| Summary:
| As was pointed out in D54388+D54390, the maximal size of `Neighbors` is known: it will contain at most Points_.size() minus one (the center of the cluster).
| While that is the upper bound, meaning that in most cases the actual count will be much smaller, D54390 made the allocation persistent, so we no longer have to worry about overly optimistic `reserve()`ing.
|
| Old: (D54393)
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
| 6553.167456 task-clock (msec) # 1.000 CPUs utilized ( +- 0.21% )
| ...
| 6.5547 +- 0.0134 seconds time elapsed ( +- 0.20% )
| ```
| New:
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
| 6315.057872 task-clock (msec) # 0.999 CPUs utilized ( +- 0.24% )
| ...
| 6.3187 +- 0.0160 seconds time elapsed ( +- 0.25% )
| ```
| And that is another -~4%.
|
| Since this is the last (as of this moment) patch in this patch series, it is a good time to summarize:
| Old: (svn trunk, as stated in D54381)
| ```
| $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
| real 0m24.884s
| user 0m24.099s
| sys 0m0.785s
| ```
| So these patches, on a given benchmark, have decreased llvm-exegesis analysis time by 74.62%.
| There surely is room for further improvements. D54514 may improve things by -11.5% more (relative to this patch). Parallelization may improve things further significantly, too.
|
| Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
| Reviewed By: courbet, MaskRay
| Subscribers: tschuett, llvm-commits
| Differential Revision: https://reviews.llvm.org/D54415
| llvm-svn: 347204
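The overall figure can be re-derived from the numbers quoted in the message. The message does not say which "new" measurement was used for the summary; the sketch below is an assumption that compares the trunk wall-clock time against both the final task-clock and the final elapsed time.

```python
def percent_decrease(old, new):
    """Relative decrease from `old` to `new`, in percent."""
    return (old - new) / old * 100.0


# Figures copied from the commit messages above (seconds).
TRUNK_REAL_S = 24.884                      # "real 0m24.884s" on svn trunk
FINAL_TASK_CLOCK_S = 6315.057872 / 1000.0  # final task-clock, msec -> sec
FINAL_ELAPSED_S = 6.3187                   # final "seconds time elapsed"

# Assumption: the 74.62% in the message matches the task-clock figure.
vs_task_clock = percent_decrease(TRUNK_REAL_S, FINAL_TASK_CLOCK_S)  # about 74.62
vs_elapsed = percent_decrease(TRUNK_REAL_S, FINAL_ELAPSED_S)        # about 74.61
```

Both readings agree to within about 0.02 percentage points, so the choice of measurement does not change the conclusion.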
* [llvm-exegesis] Move InstructionBenchmarkClustering::isNeighbour() into header | Roman Lebedev | 2018-11-19 | 2 | -12/+8
| Summary:
| Old: (D54390)
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
| 7432.421721 task-clock (msec) # 1.000 CPUs utilized ( +- 0.15% )
| ...
| 7.4336 +- 0.0115 seconds time elapsed ( +- 0.15% )
| ```
| New:
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
| 6569.936144 task-clock (msec) # 1.000 CPUs utilized ( +- 0.22% )
| ...
| 6.5711 +- 0.0143 seconds time elapsed ( +- 0.22% )
| ```
| And another -12%. You'd think it would be `inline`d anyway, but no! :)
|
| Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
| Reviewed By: courbet, MaskRay
| Subscribers: tschuett, llvm-commits
| Differential Revision: https://reviews.llvm.org/D54393
| llvm-svn: 347203
* [llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): write into llvm::SmallVectorImpl& output parameter | Roman Lebedev | 2018-11-19 | 2 | -7/+7
| Summary:
| I do believe this is the correct fix. We call `rangeQuery()` *very* often, and many times its output vector is large (tens of thousands of entries), so small-size-opt won't help.
|
| Old: (D54389)
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
| 7934.528363 task-clock (msec) # 1.000 CPUs utilized ( +- 0.19% )
| ...
| 7.9354 +- 0.0148 seconds time elapsed ( +- 0.19% )
| ```
| New:
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
| 7383.793440 task-clock (msec) # 1.000 CPUs utilized ( +- 0.47% )
| ...
| 7.3868 +- 0.0340 seconds time elapsed ( +- 0.46% )
| ```
| And another -7%. And that isn't even the good bit yet.
|
| Old:
| * calls to allocation functions: 2081419
| * temporary allocations: 219658 (10.55%)
| * bytes allocated in total (ignoring deallocations): 4.31 GB
|
| New:
| * calls to allocation functions: 1880295 (-10%)
| * temporary allocations: 18758 (1%) (-91% *sic*)
| * bytes allocated in total (ignoring deallocations): 545.15 MB (-88% *sic*)
|
| Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
| Reviewed By: courbet, MaskRay
| Subscribers: tschuett, llvm-commits
| Differential Revision: https://reviews.llvm.org/D54390
| llvm-svn: 347202
* [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace std::vector<> with std::deque<> in llvm::SetVector<> | Roman Lebedev | 2018-11-19 | 1 | -1/+1
| Summary:
| Old: (D54388)
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
| 8606.323981 task-clock (msec) # 1.000 CPUs utilized ( +- 0.11% )
| ...
| 8.60773 +- 0.00978 seconds time elapsed ( +- 0.11% )
| ```
| New:
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
| 7971.403653 task-clock (msec) # 1.000 CPUs utilized ( +- 0.14% )
| ...
| 7.9728 +- 0.0113 seconds time elapsed ( +- 0.14% )
| ```
| Another -~7%.
|
| Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
| Reviewed By: courbet, RKSimon
| Subscribers: tschuett, llvm-commits
| Differential Revision: https://reviews.llvm.org/D54389
| llvm-svn: 347201
* [llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): use llvm::SmallVector<size_t, 0> for storage. | Roman Lebedev | 2018-11-19 | 2 | -3/+4
| Summary:
| Old: (D54383)
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
| 9098.781978 task-clock (msec) # 1.000 CPUs utilized ( +- 0.16% )
| ...
| 9.1015 +- 0.0148 seconds time elapsed ( +- 0.16% )
| ```
| New:
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
| 8553.352480 task-clock (msec) # 1.000 CPUs utilized ( +- 0.12% )
| ...
| 8.5539 +- 0.0105 seconds time elapsed ( +- 0.12% )
| ```
| So another -6%. That is because the `SmallVector` **doubles** its size when reallocating, which is great here, since we can't `reserve()` because we can't know how many `Neighbors` we will have.
|
| Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
| Subscribers: tschuett, llvm-commits
| Differential Revision: https://reviews.llvm.org/D54388
| llvm-svn: 347200
* [llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for double each time. | Roman Lebedev | 2018-11-19 | 1 | -1/+16
| Summary:
| Test data: 500kLOC of benchmark.yaml, 23Mb. (That is a subset of the actual uops benchmark I was trying to analyze!)
|
| Old time: (D54382)
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
| 9024.354355 task-clock (msec) # 1.000 CPUs utilized ( +- 0.18% )
| ...
| 9.0262 +- 0.0161 seconds time elapsed ( +- 0.18% )
| ```
| New time:
| ```
| Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
| 8996.541057 task-clock (msec) # 0.999 CPUs utilized ( +- 0.19% )
| ...
| 9.0045 +- 0.0172 seconds time elapsed ( +- 0.19% )
| ```
| -~0.3%, not that much. But this isn't the important part.
|
| Old:
| * calls to allocation functions: 2109712
| * temporary allocations: 33112
| * bytes allocated in total (ignoring deallocations): 4.43 GB
|
| New:
| * calls to allocation functions: 2095345 (-0.68%)
| * temporary allocations: 18745 (-43.39% !!!)
| * bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%)
|
| Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
| Reviewed By: courbet
| Subscribers: tschuett, llvm-commits
| Differential Revision: https://reviews.llvm.org/D54383
| llvm-svn: 347199
* [llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations. | Roman Lebedev | 2018-11-19 | 1 | -5/+3
| Summary:
| Test data: 500kLOC of benchmark.yaml, 23Mb. (That is a subset of the actual uops benchmark I was trying to analyze!)
|
| Old time: (D54381)
| ```
| $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
| real 0m10.487s
| user 0m9.745s
| sys 0m0.740s
| ```
| New time:
| ```
| $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
| real 0m9.599s
| user 0m8.824s
| sys 0m0.772s
| ```
| Not that much, around -9%. But that is not the good part yet, again.
|
| Old:
| * calls to allocation functions: 3347676
| * temporary allocations: 277818
| * bytes allocated in total (ignoring deallocations): 10.52 GB
|
| New:
| * calls to allocation functions: 2109712 (-36%)
| * temporary allocations: 33112 (-88%)
| * bytes allocated in total (ignoring deallocations): 4.43 GB (-58% *sic*)
|
| Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
| Reviewed By: courbet, MaskRay
| Subscribers: tschuett, llvm-commits
| Differential Revision: https://reviews.llvm.org/D54382
| llvm-svn: 347198
* [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use llvm::SetVector<> instead of ILLEGAL std::unordered_set<> | Roman Lebedev | 2018-11-19 | 1 | -3/+4
| Summary:
| Test data: 500kLOC of benchmark.yaml, 23Mb. (That is a subset of the actual uops benchmark I was trying to analyze!)
|
| Old time:
| ```
| $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
| real 0m24.884s
| user 0m24.099s
| sys 0m0.785s
| ```
| New time:
| ```
| $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
| real 0m10.469s
| user 0m9.797s
| sys 0m0.672s
| ```
| So -60%. And that isn't the good bit yet.
|
| Old:
| * calls to allocation functions: 106560180 (yes, 107 *million* allocations.)
| * bytes allocated in total (ignoring deallocations): 12.17 GB
|
| New:
| * calls to allocation functions: 3347676 (-96.86%) (just 3 mil)
| * bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less)
|
| ---
|
| A few points I want to raise:
| * `std::unordered_set<>` should not have been used there in the first place. It is banned by https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options
| * There are no tests, so I'm not fully sure this is correct. Since it was an unordered set, I guess there are zero restrictions on the order, and anything will be ok?
| * I tried other containers suggested in https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc; this `llvm::SetVector<>` seems to be best here.
|
| Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
| Reviewed By: courbet
| Subscribers: kristina, bobsayshilol, tschuett, llvm-commits
| Differential Revision: https://reviews.llvm.org/D54381
| llvm-svn: 347197
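For readers unfamiliar with the container, the idea behind a SetVector-style worklist (a sequence for iteration order plus a set for uniqueness) can be sketched in Python. This is not the dbScan() code from the patch: `OrderedWorkSet`, `expand` and `neighbors_of` are hypothetical names chosen for illustration.

```python
class OrderedWorkSet:
    """Insertion-ordered set: a list for order plus a set for fast membership."""

    def __init__(self):
        self._items = []
        self._seen = set()

    def insert(self, value):
        """Append `value` if it is new; return True if it was inserted."""
        if value in self._seen:
            return False
        self._seen.add(value)
        self._items.append(value)
        return True

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


def expand(seed, neighbors_of):
    """Visit everything reachable from `seed`, each point exactly once.

    Iterating by index is safe while the work set grows, which is the
    property a cluster-expansion loop relies on.
    """
    to_process = OrderedWorkSet()
    to_process.insert(seed)
    i = 0
    while i < len(to_process):
        for q in neighbors_of(to_process[i]):
            to_process.insert(q)
        i += 1
    return [to_process[k] for k in range(len(to_process))]
```

Unlike a hash set alone, the backing list gives a deterministic visiting order and allows growth during iteration.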
* Fixed uninitialized variable issue. | Anastasia Stulova | 2018-11-19 | 1 | -1/+1
| | | | | | This commit should fix failing bots. llvm-svn: 347196
* [X86] Add codegen tests for slow-shld scalar funnel shifts | Simon Pilgrim | 2018-11-19 | 2 | -198/+521
| | | | llvm-svn: 347195
* Test commit - delete trailing space. | Michael Platings | 2018-11-19 | 1 | -1/+1
| | | | llvm-svn: 347194
* Test commit - delete a trailing space. | Michael Platings | 2018-11-19 | 1 | -1/+1
| | | | llvm-svn: 347193
* AMDGPU/InsertWaitcnts: Some more const-correctness | Nicolai Haehnle | 2018-11-19 | 1 | -4/+4
| | | | | | | | | | Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54225 llvm-svn: 347192
* [ARM] Remove trunc sinks in ARM CGP | Sam Parker | 2018-11-19 | 5 | -180/+364
| Truncs are treated as sources if they produce a value of the same type as the one we are currently trying to promote. Truncs used to be considered as a sink if their operand was the same value type. We now allow smaller types in the search, so we should search through truncs that produce a smaller value. These truncs can then be converted to an AND mask.
|
| This leaves sinks as being:
| - points where the value in the register is being observed, such as an icmp, switch or store.
| - points where value types have to match, such as calls and returns.
| - zext are included to ease the transformation and are generally removed later on.
|
| During this change, it also became apparent that truncating sinks was broken: if a sink used a source, its type information had already been lost by the time the truncation happens. So I've changed the method of caching the type information.
|
| Differential Revision: https://reviews.llvm.org/D54515
| llvm-svn: 347191
* [LICM] Make LICM able to hoist phis | John Brawn | 2018-11-19 | 3 | -24/+1477
| | | | | | | | | | | | | | | The general approach taken is to make note of loop invariant branches, then when we see something conditional on that branch, such as a phi, we create a copy of the branch and (empty versions of) its successors and hoist using that. This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run, but once we start doing phi-to-select conversion later it will be important. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347190
* [OpenCL] Fix address space deduction in template args. | Anastasia Stulova | 2018-11-19 | 2 | -1/+34
| | | | | | | | | | | Don't deduce address spaces for non-pointer-like types in template args. Fixes PR38603! Differential Revision: https://reviews.llvm.org/D54634 llvm-svn: 347189
* Remove unused variable. NFC. | Benjamin Kramer | 2018-11-19 | 1 | -1/+0
| | | | llvm-svn: 347188
* [MSP430] Optimize srl/sra in case of A >> (8 + N) | Anton Korobeynikov | 2018-11-19 | 2 | -2/+37
| | | | | | | | | There are no variable-length shifts on MSP430. Therefore "eat" 8 bits of shift via bswap & ext. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54623 llvm-svn: 347187
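The identity behind "eating" 8 bits of shift can be checked with a small Python model of 16-bit arithmetic. This is an illustration of the arithmetic only, with hypothetical helper names; it does not reproduce the DAG lowering in the patch.

```python
def bswap16(x):
    """Swap the two bytes of a 16-bit value."""
    return ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)


def srl_via_bswap(a, n):
    """Logical shift right of 16-bit `a` by 8 + n, for 0 <= n <= 7.

    After the byte swap the old high byte sits in the low byte, so a
    zero-extension of that byte already equals a >> 8.
    """
    high_byte = bswap16(a) & 0xFF  # zero-extend the low byte
    return high_byte >> n


def sra_via_bswap(a, n):
    """Arithmetic shift right of 16-bit `a` by 8 + n, for 0 <= n <= 7.

    Same idea, but the byte is sign-extended instead of zero-extended.
    """
    low = bswap16(a) & 0xFF
    signed = low - 0x100 if low & 0x80 else low  # sign-extend the byte
    return (signed >> n) & 0xFFFF


def to_signed16(a):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    return a - 0x10000 if a & 0x8000 else a
```

Only the remaining N single-bit shifts have to be emitted after the swap and extension, instead of 8 + N of them.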
* Fix disturbing warning - NFCI | Serge Guelton | 2018-11-19 | 1 | -1/+1
| | | | llvm-svn: 347186
* [X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering. | Craig Topper | 2018-11-19 | 3 | -18/+18
| | | | | | The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate. llvm-svn: 347185
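That the two forms produce the same lane value can be checked with a scalar Python model of one 32-bit element. This is a sketch of the arithmetic only; the function names are hypothetical and nothing here models register allocation or instruction encoding.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def sign_fill_by_shift(x):
    """Model of an arithmetic shift right by 31 on one 32-bit lane."""
    return (to_signed32(x) >> 31) & MASK32


def sign_fill_by_compare(x):
    """Model of 'zero > x' on one lane: all ones if x is negative, else zero."""
    return MASK32 if 0 > to_signed32(x) else 0
```

Both functions return 0xFFFFFFFF for negative inputs and 0 otherwise, which is why the compare can stand in for the shift.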
* [LoopSimplifyCFG] Add requires: asserts after rL347183 | Fangrui Song | 2018-11-19 | 1 | -0/+1
| | | | llvm-svn: 347184
* [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches | Max Kazantsev | 2018-11-19 | 2 | -9/+361
| | | | | | | | | | | | | | This patch introduces infrastructure and the simplest case for constant-folding of branch and switch instructions within a loop into unconditional branches. It is useful as a cleanup for passes such as loop unswitching that sometimes produce such branches. Only the simplest case is supported in this patch: after the folding, no block should become dead or stop being part of the loop. Support for more sophisticated cases will go separately in follow-up patches. Differential Revision: https://reviews.llvm.org/D54021 Reviewed By: anna llvm-svn: 347183
* [ProfileSummary] Standardize methods and fix comment | Vedant Kumar | 2018-11-19 | 11 | -35/+34
| | | | | | | | | | | | | | | | | | | | | Every Analysis pass has a get method that returns a reference to the Result of the Analysis, for example, BlockFrequencyInfo &BlockFrequencyInfoWrapperPass::getBFI(). I believe that ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning a pointer. Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock, respectively. Most code uses BB in argument or variable names, while method names usually refer to Basic Blocks as Blocks instead of BB. For example, Function::getEntryBlock, Loop::getExitBlock, etc. I also fixed one of the comments. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D54669 llvm-svn: 347182
* [X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1 | Craig Topper | 2018-11-19 | 8 | -568/+573
| | | | | | Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter. llvm-svn: 347181
* [X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization. | Craig Topper | 2018-11-19 | 3 | -211/+118
| | | | | | Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts. llvm-svn: 347180
* [PowerPC] Set the default PLT mode on OpenBSD/powerpc to Secure PLT. | Brad Smith | 2018-11-19 | 3 | -4/+13
| | | | | | OpenBSD/powerpc only supports Secure PLT. llvm-svn: 347179
* Replace the UTF-8 characters in the error message. | Brad Smith | 2018-11-18 | 2 | -2/+2
| | | | llvm-svn: 347178
* [X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions. | Simon Pilgrim | 2018-11-18 | 4 | -75/+109
| | | | | | llvm-svn: 347177
* [X86] Add custom type legalization for extending v4i8/v4i16->v4i64. | Craig Topper | 2018-11-18 | 2 | -209/+148
| | | | | | Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing two sign extends but only using half of the v4i32 intermediate result. When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32, then split and interleave with the sign bits using a punpckldq and punpckhdq. llvm-svn: 347176
* [X86] Add a 32-bit command line with only sse2 to vector-sext.ll and vector-sext.ll to show some of the scalarized load sequences without 64-bit scalar support. | Craig Topper | 2018-11-18 | 2 | -2/+2073
| | | | | | Some of these sequences look pretty bad since we have to copy the sign bit from a 32 bit register to a 64 bit register to finish a sign extend. llvm-svn: 347175