summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [SLP] Fix PR36481: vectorize reassociated instructions.Alexey Bataev2018-04-032-92/+289
| | | | | | | | | | | | | | | | | | Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Patch does not support reordering of the repeated instruction, this must be handled in the separate patch. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 329085
* Recommit "[SLP] Fix issues with debug output in the SLP vectorizer."Alexey Bataev2018-04-031-3/+4
| | | | | | | | | | | | | The primary issue here is that using NDEBUG alone isn't enough to guard debug printing -- instead the DEBUG() macro needs to be used so that the specific pass debug logging check is employed. Without this, every asserts-enabled build was printing out information when it hit this. I also fixed another place where we had multiple statements in a DEBUG macro to use {}s to be a bit cleaner. And I fixed a place that used errs() rather than dbgs(). llvm-svn: 329082
* [Hexagon] Remove -mhvx-double and the corresponding subtarget featureKrzysztof Parzyszek2018-04-033-50/+32
| | | | | | | Specifying the HVX vector length should be done via the -mhvx-length option. llvm-svn: 329079
* Adding optional Name parameter to createVirtualRegister and ↵Puyan Lotfi2018-04-031-4/+5
| | | | | | createGenericVirtualRegister. llvm-svn: 329076
* Revert "[SLP] Fix PR36481: vectorize reassociated instructions."Benjamin Kramer2018-04-032-291/+95
| | | | | | This reverts commit r328980 and r329046. Makes the vectorizer crash. llvm-svn: 329071
* [MC] Fix -Wmissing-field-initializer warning after r329067.Andrea Di Biagio2018-04-031-0/+1
| | | | | | | | This should fix the problem reported by the lld buildbots: - Builder lld-x86_64-darwin13, Build #19782 - Builder lld-perf-testsuite, Build #1419 llvm-svn: 329068
* [MC][Tablegen] Allow the definition of processor register files in the ↵Andrea Di Biagio2018-04-031-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | scheduling model for llvm-mca This patch allows the description of register files in processor scheduling models. This addresses PR36662. A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows to specify: - The total number of physical registers. - Which target registers are accessible through the register file. - The cost of allocating a register at register renaming stage. Example (from this patch - see file X86/X86ScheduleBtVer2.td) def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]> Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register. The syntax allows to specify an empty set of register classes. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. By default, register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers". This patch is structured in two parts. * Part 1 - MC/Tablegen * A first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files. Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make easier for targets to get rid of the extra processor information if they don't want it. * Part 2 - llvm-mca related * The second part of this patch is related to changes to llvm-mca. The main differences are: 1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at register renaming stage. 2) Point 1. triggered a minor refactoring which lef to the removal of the "maximum 32 register files" restriction. 3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor. The effect of point 3. is also visible in tests register-files-[1..5].s. Differential Revision: https://reviews.llvm.org/D44980 llvm-svn: 329067
* [PowerPC] reorder entries in P9InstrResources.td in alphabetical order; NFCHiroshi Inoue2018-04-031-1/+1
| | | | | | Reorder entries added in my previous commit (rL328969) to keep alphabetical order. llvm-svn: 329064
* MSan: introduce the conservative assembly handling mode.Alexander Potapenko2018-04-031-1/+49
| | | | | | | | | | | | The default assembly handling mode may introduce false positives in the cases when MSan doesn't understand that the assembly call initializes the memory pointed to by one of its arguments. We introduce the conservative mode, which initializes the first |sizeof(type)| bytes for every |type*| pointer passed into the assembly statement. llvm-svn: 329054
* [SCEV] Fix PR36974.Serguei Katkov2018-04-031-5/+6
| | | | | | | | | | | | | The patch changes the usage of dominate to properlyDominate to satisfy the condition !(a < a) while using std::max. It is actually NFC due to set data structure is used to keep the Loops and no two identical loops can be in collection. So in reality there is no difference between usage of dominate and properlyDominate in this particular case. However it might be changed so it is better to fix it. llvm-svn: 329051
* [X86] Reduce number of OpPrefix bits in TSFlags to 2. NFCICraig Topper2018-04-033-33/+37
| | | | | | TSFlag doesn't need to disambiguate NoPrfx from PS. So shift the encodings so PS is NoPrfx|0x4. llvm-svn: 329049
* [SCEV] Make computeExitLimit more simple and more powerfulMax Kazantsev2018-04-031-58/+17
| | | | | | | | | | | | | | | | | | | | | | | Current implementation of `computeExitLimit` has a big piece of code the only purpose of which is to prove that after the execution of this block the latch will be executed. What it currently checks is actually a subset of situations where the exiting block dominates latch. This patch replaces all these checks for simple particular cases with domination check over loop's latch which is the only necessary condition of taking the exiting block into consideration. This change allows to calculate exact loop taken count for simple loops like for (int i = 0; i < 100; i++) { if (cond) {...} else {...} if (i > 50) break; . . . } Differential Revision: https://reviews.llvm.org/D44677 Reviewed By: efriedma llvm-svn: 329047
* [SLP] Fix issues with debug output in the SLP vectorizer.Chandler Carruth2018-04-031-10/+10
| | | | | | | | | | | | | The primary issue here is that using NDEBUG alone isn't enough to guard debug printing -- instead the DEBUG() macro needs to be used so that the specific pass debug logging check is employed. Without this, every asserts-enabled build was printing out information when it hit this. I also fixed another place where we had multiple statements in a DEBUG macro to use {}s to be a bit cleaner. And I fixed a place that used `errs()` rather than `dbgs()`. llvm-svn: 329046
* bpf: fix incorrect SELECT_CC loweringYonghong Song2018-04-031-6/+0
| | | | | | | | | | | | | | | | | | | | | | | Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC") intended to improve code quality for certain jmp conditions. The commit, however, has a couple of issues: (1). In code, just swap is not enough, ConditionalCode CC should also be swapped, otherwise incorrect code will be generated. (2). The ConditionalCode swap should be subject to getHasJmpExt(). If getHasJmpExt() is False, certain conditional codes will not be supported and swap may generate incorrect code. The original goal for this patch is to optimize jmp operations which does not have JmpExt turned on. If JmpExt is on, better code could be generated. For example, the test select_ri.ll is introduced to demonstrate the optimization. The same result can be achieved with -mcpu=v2 flag. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 329043
* peel loops with runtime small trip countsIkhlas Ajbar2018-04-033-2/+17
| | | | | | | | | | For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329042
* [SLP] Distinguish "demanded and shrinkable" from "demanded and not ↵Haicheng Wu2018-04-031-17/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | shrinkable" values when determining the minimum bitwidth We use two approaches for determining the minimum bitwidth. * Demanded bits * Value tracking If demanded bits doesn't result in a narrower type, we then try value tracking. We need this if we want to root SLP trees with the indices of getelementptr instructions since all the bits of the indices are demanded. But there is a missing piece though. We need to be able to distinguish "demanded and shrinkable" from "demanded and not shrinkable". For example, the bits of %i in %i = sext i32 %e1 to i64 %gep = getelementptr inbounds i64, i64* %p, i64 %i are demanded, but we can shrink %i's type to i32 because it won't change the result of the getelementptr. On the other hand, in %tmp15 = sext i32 %tmp14 to i64 %tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0 it doesn't make sense to shrink %tmp15 and we can skip the value tracking. Ideas are from Matthew Simpson! Differential Revision: https://reviews.llvm.org/D44868 llvm-svn: 329035
* [Coroutines] Avoid assert splitting hidden corosBrian Gesiak2018-04-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When attempting to split a coroutine with 'hidden' visibility (for example, a C++ coroutine that is inlined when compiled with the option '-fvisibility-inlines-hidden'), LLVM would hit an assertion in include/llvm/IR/GlobalValue.h:240: "local linkage requires default visibility". The issue is that the visibility is copied from the source of the function split in the `CloneFunctionInto` function, but the linkage is not. To fix, create the new function first with external linkage, then copy the linkage from the original function *after* `CloneFunctionInto` is called. Since `GlobalValue::setLinkage` in turn calls `maybeSetDsoLocal`, the explicit call to `setDSOLocal` can be removed in CoroSplit.cpp. Test Plan: check-llvm Reviewers: GorNishanov, lewissbaker, EricWF, majnemer, rnk Reviewed By: rnk Subscribers: llvm-commits, eric_niebler Differential Revision: https://reviews.llvm.org/D44185 llvm-svn: 329033
* Align stubs for external and common global variables to pointer size.Rafael Espindola2018-04-021-0/+1
| | | | | | | | | This patch fixes PR36885: clang++ generates unaligned stub symbol holding a pointer. Patch by Rahul Chaudhry! llvm-svn: 329030
* [InstCombine] Don't strip function type casts from musttail callsReid Kleckner2018-04-021-1/+10
| | | | | | | | | | | | | | | Summary: The cast simplifications that instcombine does here do not make any attempt to obey the verifier rules for musttail calls. Therefore we have to disable them. Reviewers: efriedma, majnemer, pcc Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45186 llvm-svn: 329027
* Treat inlining a notail call as a regular, non-tail callReid Kleckner2018-04-021-0/+6
| | | | | | | | | Otherwise, we end up inlining a musttail call into a non-tail position, which breaks verifier invariants. Fixes PR31014 llvm-svn: 329015
* [ORC] Create a new SymbolStringPool by default in ExecutionSession constructor.Lang Hames2018-04-023-12/+3
| | | | | | This makes the common case of constructing an ExecutionSession tidier. llvm-svn: 329013
* [InstCombine] add folds for icmp + sub (PR36969)Sanjay Patel2018-04-021-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | (A - B) >u A --> A <u B C <u (C - D) --> C <u D https://rise4fun.com/Alive/e7j Name: ugt %sub = sub i8 %x, %y %cmp = icmp ugt i8 %sub, %x => %cmp = icmp ult i8 %x, %y Name: ult %sub = sub i8 %x, %y %cmp = icmp ult i8 %x, %sub => %cmp = icmp ult i8 %x, %y This should fix: https://bugs.llvm.org/show_bug.cgi?id=36969 llvm-svn: 329011
* Fix header mismatch in DIBuilder Type APIsHarlan Haskins2018-04-021-5/+5
| | | | | | | Some of the headers changed slightly, and the accompanying implementation didn't change. This caused a silent failure. llvm-svn: 329003
* [llvm-pdbutil] Add an export subcommand.Zachary Turner2018-04-022-4/+13
| | | | | | | | | | | | | | | | | This command can dump the binary contents of a stream to a file. This is useful when you want to do side-by-side comparisons of a specific stream from two PDBs to examine the differences between them. You can export both of them to a file, then open them up side by side in a hex editor (for example), so as to eliminate any differences that might arise from the contents being on different blocks in the PDB. In subsequent patches I plan to improve the "explain" subcommand so that you can explain the contents of a binary file that isn't necessarily a full PDB, but one of these dumped streams, by telling the subcommand how to interpret the contents. llvm-svn: 329002
* Remove HAVE_LIBPSAPI, HAVE_SHELL32.Nico Weber2018-04-022-11/+1
| | | | | | | | | | These used to be set in the old autoconf build, but the cmake build has had a "TODO: actually check for these" comment since it was checked in, and they were set to 1 on mingw unconditionally. It seems safe to say that they always exist under mingw, so just remove them and assume they're set exactly when on mingw (with msvc, we use `pragma comment` instead of linking these via flags). llvm-svn: 328992
* [DeadArgumentElim] Clone function level metadatasRong Xu2018-04-021-5/+11
| | | | | | | | | | | | Some Function level metadatas, such as function entry count, are not cloned in DeadArgumentElim. This happens a lot in lto/thinlto because of DeadArgumentElim after internalization. This patch clones the metadatas in the original function to the new function. Differential Revision: https://reviews.llvm.org/D44127 llvm-svn: 328991
* Remove HAVE_DIRENT_H.Nico Weber2018-04-021-17/+2
| | | | | | | The autoconf manual: "This macro is obsolescent, as all current systems with directory libraries have <dirent.h>. New programs need not use this macro." llvm-svn: 328989
* [AMDGPU][MC][GFX9] Added instructions v_cvt_norm_*16_f16, v_sat_pk_u8_i16Dmitry Preobrazhensky2018-04-021-0/+8
| | | | | | | | | See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847 Differential Revision: https://reviews.llvm.org/D45097 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328988
* [coroutines] Add support for llvm.coro.noop intrinsicsGor Nishanov2018-04-022-7/+48
| | | | | | | | | | | | | | | | Summary: A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing. To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed. Reviewers: EricWF, modocache, rnk, lewissbaker Reviewed By: modocache Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45114 llvm-svn: 328986
* [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructionsDmitry Preobrazhensky2018-04-024-1/+204
| | | | | | | | | | | Fixed a bug which caused Tablegen crash. See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328983
* [Hexagon] Clean up some code in HexagonAsmPrinter, NFCKrzysztof Parzyszek2018-04-022-48/+53
| | | | llvm-svn: 328981
* [SLP] Fix PR36481: vectorize reassociated instructions.Alexey Bataev2018-04-022-92/+288
| | | | | | | | | | | | | | | Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 328980
* Revert r328975, it makes TableGen assert on the bots.Nico Weber2018-04-024-204/+1
| | | | llvm-svn: 328978
* Remove HAVE_WRITEV that's unused after r255837.Nico Weber2018-04-021-3/+0
| | | | llvm-svn: 328977
* [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructionsDmitry Preobrazhensky2018-04-024-1/+204
| | | | | | | | | See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328975
* Attempt to heal bots after r328970.Nico Weber2018-04-021-6/+1
| | | | llvm-svn: 328974
* [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346Lama Saba2018-04-024-0/+729
| | | | | | | | | | | | | | | | | If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles. This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store. The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies. breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. Differential revision: https://reviews.llvm.org/D41330 Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9 llvm-svn: 328973
* [PowerPC] fix assertion failure due to missing instruction in ↵Hiroshi Inoue2018-04-021-8/+4
| | | | | | | | P9InstrResources.td This patch adds L(D|W|H|B)XTLS instructions introduced by https://reviews.llvm.org/rL327635 in P9InstrResources.td. llvm-svn: 328969
* [dsymutil] Upstream emitting of papertrail warnings.Jonas Devlieghere2018-04-021-0/+1
| | | | | | | | | | When running dsymutil as part of your build system, it can be desirable for warnings to be part of the end product, rather than just being emitted to the output stream. This patch upstreams that functionality. Differential revision: https://reviews.llvm.org/D44639 llvm-svn: 328965
* [X86][Silvermont] Use correct latency and throughput information for divide ↵Craig Topper2018-04-021-0/+115
| | | | | | | | and square root in the scheduler model. Data taken from Table 16-17 in the Intel Optimization Manual. llvm-svn: 328962
* [X86][SkylakeServer] Correct throughput for 512-bit sqrt and divide.Craig Topper2018-04-021-29/+28
| | | | | | Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/ llvm-svn: 328961
* [X86] Correct the throughput for divide instructions in Sandy ↵Craig Topper2018-04-025-220/+318
| | | | | | | | Bridge/Haswell/Broadwell/Skylake scheduler models. Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those. llvm-svn: 328960
* [X86] Fix the SchedRW for AVX512 shift instructions.Craig Topper2018-04-022-8/+14
| | | | | | It was being inadvertently defaulted to an FADD scheduler class. llvm-svn: 328959
* [X86] Give the AVX512 VEXTRACT instructions the same SchedRWs as the SSE/AVX ↵Craig Topper2018-04-021-29/+19
| | | | | | versions. llvm-svn: 328958
* [X86] Add an itinerary to BTR64rr.Craig Topper2018-04-021-1/+2
| | | | llvm-svn: 328956
* [X86] Make sure all the classes declare in the Haswell scheduler model are ↵Craig Topper2018-04-021-66/+66
| | | | | | | | prefixed with HW. The tablegen files all share a namespace so we shouldn't use a generic names in a specific scheduler model. llvm-svn: 328955
* [X86] Give VINSERTPS the same intinerary as INSERTPS.Craig Topper2018-04-021-3/+4
| | | | llvm-svn: 328954
* Add C API bindings for DIBuilder 'Type' APIsHarlan Haskins2018-04-021-1/+190
| | | | | | | | | | This patch adds a set of unstable C API bindings to the DIBuilder interface for creating structure, function, and aggregate types. This patch also removes the existing implementations of these functions from the Go bindings and updates the Go API to fit the new C APIs. llvm-svn: 328953
* [X86] Cleanup ADCX/ADOX instruction definitions.Craig Topper2018-04-011-29/+45
| | | | | | Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW. llvm-svn: 328952
* [AArch64] Reserve x18 register on FuchsiaPetr Hosek2018-04-011-2/+2
| | | | | | | | This register is reserved as a platform register on Fuchsia. Differential Revision: https://reviews.llvm.org/D45105 llvm-svn: 328950
OpenPOWER on IntegriCloud