summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* Minor code cleanups. NFC.Junmo Park2016-03-111-3/+3
| | | | llvm-svn: 263200
* Minor code cleanup. NFC.Junmo Park2016-03-111-1/+1
| | | | llvm-svn: 263196
* Remove llvm::getDISubprogram in favor of Function::getSubprogramPete Cooper2016-03-116-30/+8
| | | | | | | | | | | | | | | | | llvm::getDISubprogram walks the instructions in a function, looking for one in the scope of the current function, so that it can find the !dbg entry for the subprogram itself. Now that !dbg is attached to functions, this should not be necessary. This patch changes all uses to just query the subprogram directly on the function. Ideally this should be NFC, but in reality its possible that a function: has no !dbg (in which case there's likely a bug somewhere in an opt pass), or that none of the instructions had a scope referencing the function, so we used to not find the !dbg on the function but now we will Reviewed by Duncan Exon Smith. Differential Revision: http://reviews.llvm.org/D18074 llvm-svn: 263184
* [LLE] Add missed LoopSimplify dependenceAdam Nemet2016-03-101-0/+3
| | | | | | | | | The code assumed that we always had a preheader without making the pass dependent on LoopSimplify. Thanks to Mattias Eriksson V for reporting this. llvm-svn: 263173
* AArch64: only try to use scaled fcvt ops on legal vector types.Tim Northover2016-03-101-1/+2
| | | | | | | Before we ended up calling getSimpleVectorType on a <3 x float>, which asserted. llvm-svn: 263169
* [x86] don't use a shuffle when a vselect will do; NFCISanjay Patel2016-03-101-16/+5
| | | | | | | | Looking at the IR definition of a masked load made me realize there was no reason to use a shuffle here, so we don't need to convert the format of the mask at all. llvm-svn: 263167
* Test commit accessMarianne Mailhot-Sarrasin2016-03-101-1/+1
| | | | llvm-svn: 263165
* Strip trailing whitespace.Simon Pilgrim2016-03-101-66/+66
| | | | llvm-svn: 263162
* [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ↵Simon Pilgrim2016-03-103-11/+43
| | | | | | | | | | | | ZERO_EXTEND_VECTOR_INREG Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion). Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 263159
* Support arbitrary addrspace pointers in masked load/store intrinsicsArtur Pilipenko2016-03-102-10/+49
| | | | | | | | | | | | This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 263158
* ARM: Support relative references using the PREL31 symbol variant.Peter Collingbourne2016-03-101-0/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D17937 llvm-svn: 263156
* Materialize metadata in IRLinker before value mappingTeresa Johnson2016-03-101-5/+6
| | | | | | | | | | | | | | | | | | | Summary: Unless we plan to do later postpass metadata linking (ThinLTO special mode), always invoke metadata materialization at the start of IRLinker::run(). This avoids the need for clients who use lazy metadata loading to explicitly invoke materializeMetadata before the IRMover, which in turn invokes IRLinker::run and needs materialized metadata for mapping. Came up in the context of an LLD issue (D17982). Reviewers: rafael Subscribers: silvas, llvm-commits Differential Revision: http://reviews.llvm.org/D17992 llvm-svn: 263143
* AArch64: remove pseudo-instructions used only for their patterns.Tim Northover2016-03-101-15/+42
| | | | | | | There's no real reason for these pseudos to exist, we should be writing real patterns even if it is slightly less convenient. NFC. llvm-svn: 263141
* AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsicsNicolai Haehnle2016-03-102-15/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa to implement the GL_ARB_shader_image_load_store extension. The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling and loads). However, this is not currently implemented. For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller" opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32 is actually implemented since GLSL also only has a vec4 variant of the store instructions, although it's conceivable that Mesa will want to be smarter about this in the future. BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which has a legacy name, pretends not to access memory, and does not capture the full flexibility of the instruction. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17277 llvm-svn: 263140
* [X86] Correctly select registers to pop into for x86_64Michael Kuperstein2016-03-101-1/+2
| | | | | | | | | | | | | | When trying to replace an add to esp with pops, we need to choose dead registers to pop into. Registers clobbered by the call and not imp-def'd by it should be safe. Except that it's not enough to check the register itself isn't defined, we also need to make sure no overlapping registers are defined either. This fixes PR26711. Differential Revision: http://reviews.llvm.org/D18029 llvm-svn: 263139
* [AArch64] Optimize compare and branch sequence when the compare's constant ↵Balaram Makam2016-03-101-25/+82
| | | | | | | | | | | | | | | | | | | | | | | operand is power of 2 Summary: Peephole optimization that generates a single TBZ/TBNZ instruction for test and branch sequences like in the example below. This handles the cases that miss folding of AND into TBZ/TBNZ during ISelLowering of BR_CC Examples: and w8, w8, #0x400 cbnz w8, L1 to tbnz w8, #10, L1 Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17942 llvm-svn: 263136
* [ARM] Cortex-R8 supportAlexandros Lamprineas2016-03-101-0/+12
| | | | | | | | | | | This patch adds Cortex-R8 to Target Parser and TableGen. It also adds CodeGen tests for the build attributes. Patch by Pablo Barrio. Differential Revision: http://reviews.llvm.org/D17925 llvm-svn: 263132
* Rename -discard-value-names into -lto-discard-value-names in libLLVMLTOMehdi Amini2016-03-101-2/+2
| | | | | | | | | | This is avoiding a naming conflict with opt and llc. While opt and llc don't link to LTO usually, users that are building a monolithic libLLVM.dylib and linking the tools to it would have a runtime error because of the duplicate cl::opt registration. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263127
* AMDGPU/SI: Define S_GETREG IntrinsicChangpeng Fang2016-03-101-0/+12
| | | | | | | | | | | | | | Summary: Define s_getreg intrinsic to generate s_getreg instruction to read hardware registers. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17892 llvm-svn: 263124
* ARM: follow up improvements for SVN r263118Saleem Abdulrasool2016-03-104-4/+14
| | | | | | | | | | | | | | The initial change was insufficiently complete for always getting the semantics of __builtin_longjmp correct. The builtin is translated into a `tInt_eh_sjlj_longjmp` DAG node. This node set R7 as clobbered. However, the code would then follow up with a clobber of R11. I had failed to notice the imp-def,kill on R7 in the isel. Unfortunately, it seems that it is not possible to conditionalise the Defs list via an !if. Instead, construct a new parallel WIN node and prefer that when targeting windows. This ensures that we now both correctly model the __builtin_longjmp as well as construct the frame in a more ABI conformant manner. llvm-svn: 263123
* [SROA] Fix PR25873, which Andrea Di Biagio analyzed the daylights outChandler Carruth2016-03-101-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of, and I misdiagnosed for months and months. Andrea has had a patch for this forever, but I just couldn't see how it was fixing the root cause of the problem. It didn't make sense to me, even though the patch was perfectly good and the analysis of the actual failure event was *fantastic*. Well, I came back to it today because the patch has sat for *far* too long and needs attention and decided I wouldn't let it go until I really understood what was going on. After quite some time in the debugger, I finally realized that in fact I had just missed an important case with my previous attempt to fix PR22093 in r225149. Not only do we need to handle loads that won't be split, but stores-of-loads that we won't split. We *do* actually have enough logic in the presplitting to form new slices for split stores.... *unless* we decided not to split them! I'm so sorry that it took me this long to come to the realization that this is the issue. It seems so obvious in hind sight (of course). Anyways, the fix becomes *much* smaller and more focused. The fact that we're left doing integer smashing is related to the FIXME in my original commit: fundamentally, we're not aggressive about pre-splitting for loads and stores to the same alloca. If we want to get aggressive about this, it'll need both what Andrea had put into the proposed fix, but also a *lot* more logic to essentially iteratively pre-split the alloca until we can't do any more. As I said in that commit log, its really unclear that this is the right call. Instead, the integer blending and letting targets lower this to narrower stores seems slightly better. But we definitely shouldn't really go down that path just to fix this bug. Again, tons of thanks are owed to Andrea and others at Sony for working on this bug. I really should have seen what was going on here and re-directed them sooner. =//// llvm-svn: 263121
* Unified the handling of returns in the X87 stackifier so that the stackifierDavid L Kreitzer2016-03-101-90/+93
| | | | | | | | runs successfully on routines containing IRETs. This fixes PR26410. Differential Revision: http://reviews.llvm.org/D17643 llvm-svn: 263120
* ARM: correct __builtin_longjmp on WoASaleem Abdulrasool2016-03-101-1/+3
| | | | | | | | WoA uses r11 as the FP even though it is a pure thumb-2 environment in contrast to AAPCS which states r7. This adjusts __builtin_longjmp to not clobber r7 and to properly restore the frame pointer on execution. llvm-svn: 263118
* [CG] Back out my pointless move ctor and add the explicit templateChandler Carruth2016-03-101-0/+3
| | | | | | instantiation needed for the mingw dll build bot. llvm-svn: 263114
* [SROA] Clean up some really weird code, no functionality changed.Chandler Carruth2016-03-101-3/+3
| | | | | | | | | | | We already have the instruction extracted into 'I', just cast that to a store the way we do for loads. Also, we don't enter the if unless SI is non-null, so don't test it again for null. I'm pretty sure the entire test there can be nuked, but this is just the trivial cleanup. llvm-svn: 263112
* AVX-512: Fixed a bug in i1 vector zero extending. (Skylake-avx512)Elena Demikhovsky2016-03-101-23/+27
| | | | | | | | (failed on instruction selection phase) Differential Revision: http://reviews.llvm.org/D17924 llvm-svn: 263111
* [AMDGPU] Fix SMEM instructions encoding/operand namingsValery Pykhtin2016-03-105-37/+91
| | | | | | Differential Revision: http://reviews.llvm.org/D17651 llvm-svn: 263108
* [X86][AVX] Improve target shuffle combining of BLEND+zeroSimon Pilgrim2016-03-101-1/+2
| | | | | | | | The BLEND+zero combine was failing to combine equivalent BLEND masks. Follow up to D17483 and D17858 llvm-svn: 263105
* [CG] Add a new pass manager printer pass for the old call graph andChandler Carruth2016-03-103-0/+9
| | | | | | | | | | | | | | | | | actually finish wiring up the old call graph. There were bugs in the old call graph that hadn't been caught because it wasn't being tested. It wasn't being tested because it wasn't in the pipeline system and we didn't have a printing pass to run in tests. This fixes all of that. As for why I'm still keeping the old call graph alive its so that I can port GlobalsAA to the new pass manager with out forking it to work with the lazy call graph. That's clearly the right eventual design, but it seems pragmatic to defer that until its necessary. The old call graph works just fine for GlobalsAA. llvm-svn: 263104
* [LCG] Spell the printing pass pipeline name for the lazy call graphChandler Carruth2016-03-101-1/+1
| | | | | | | | | | 'lcg' instead of just 'cg'. This makes it consistent with the analysis name of 'lcg'. No functionality changed. llvm-svn: 263103
* [X86][SSE] Basic combining of unary target shuffles of binary target shuffles.Simon Pilgrim2016-03-101-12/+18
| | | | | | | | | | | | This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask. This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining. There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for. Differential Revision: http://reviews.llvm.org/D17858 llvm-svn: 263102
* [CG] Actually hoist up the generic CallGraphPrinter pass from a weirdChandler Carruth2016-03-102-0/+27
| | | | | | | | | | location in the opt tool to live along side the analysis in LLVM's libraries. No functionality changed here, but this will allow me to port the printer to the new pass manager as well. llvm-svn: 263101
* [CG] Rename the DOT printing pass to actually reference "DOT".Chandler Carruth2016-03-102-8/+8
| | | | | | | | | | | | | There is another pass by the generic name 'CallGraphPrinter' which is actually just a call graph printer tucked away inside the opt tool. I'd like to bring it out and make it follow the same patterns as the rest of the CallGraph code, but doing so would end up conflicting with the name of the DOT printing pass. So this makes the DOT printing pass name be more precise. No functionality changed here. llvm-svn: 263100
* AVX-512: Fixed a bug in shuffle for v64i8 typeElena Demikhovsky2016-03-101-0/+2
| | | | | | | | Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on". Differential Revision: http://reviews.llvm.org/D17994 llvm-svn: 263097
* Add support for a preserve_most calling convention to the AArch64 backend.Roman Levenstein2016-03-105-1/+19
| | | | | | | | | | This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64. There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention. Differential Revision: http://reviews.llvm.org/D18016 llvm-svn: 263092
* [SLP] Add -slp-min-reg-size command line option.Michael Zolotukhin2016-03-101-9/+21
| | | | | | | | | | MinVecRegSize is currently hardcoded to 128; this patch adds a cl::opt to allow changing it. I tried not to change any existing behavior for the default case. Differential revision: http://reviews.llvm.org/D13278 llvm-svn: 263089
* Add a flag to the LLVMContext to disable name for Value other than GlobalValueMehdi Amini2016-03-106-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is intended to be a performance flag, on the same level as clang cc1 option "--disable-free". LLVM will never initialize it by default, it will be up to the client creating the LLVMContext to request this behavior. Clang will do it by default in Release build (just like --disable-free). "opt" and "llc" can opt-in using -disable-named-value command line option. When performing LTO on llvm-tblgen, the initial merging of IR peaks at 92MB without this patch, and 86MB after this patch,setNameImpl() drops from 6.5MB to 0.5MB. The total link time goes from ~29.5s to ~27.8s. Compared to a compile-time flag (like the IRBuilder one), it performs very close. I profiled on SROA and obtain these results: 420ms with IRBuilder that preserve name 372ms with IRBuilder that strip name 375ms with IRBuilder that preserve name, and a runtime flag to strip Reviewers: chandlerc, dexonsmith, bogner Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D17946 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263086
* [gvn] Fix more indenting and formatting in regions of code that willChandler Carruth2016-03-101-64/+62
| | | | | | | | | | | need to be changed for porting to the new pass manager. Also sink the comment on the ValueTable class back to that class instead of it dangling on an anonymous namespace. No functionality changed. llvm-svn: 263084
* [gvn] Reformat a chunk of the GVN code that is strangely indented priorChandler Carruth2016-03-101-241/+240
| | | | | | | | to restructuring it for porting to the new pass manager. No functionality changed. llvm-svn: 263083
* [PM] Port memdep to the new pass manager.Chandler Carruth2016-03-1011-113/+118
| | | | | | | | | | | | | | | | | | | | | | | This is a fairly straightforward port to the new pass manager with one exception. It removes a very questionable use of releaseMemory() in the old pass to invalidate its caches between runs on a function. I don't think this is really guaranteed to be safe. I've just used the more direct port to the new PM to address this by nuking the results object each time the pass runs. While this could cause some minor malloc traffic increase, I don't expect the compile time performance hit to be noticable, and it makes the correctness and other aspects of the pass much easier to reason about. In some cases, it may make things faster by making the sets and maps smaller with better locality. Indeed, the measurements collected by Bruno (thanks!!!) show mostly compile time improvements. There is sadly very limited testing at this point as there are only two tests of memdep, and both rely on GVN. I'll be porting GVN next and that will exercise this heavily though. Differential Revision: http://reviews.llvm.org/D17962 llvm-svn: 263082
* [BasicAA/MDA] Sink aliasing rules for malloc and calloc into BasicAAPhilip Reames2016-03-092-16/+17
| | | | | | | | | | MemoryDependenceAnalysis had a hard-coded exception to the general aliasing rules for malloc and calloc. The reasoning that applied there is equally valid in BasicAA and clarifies the remaining logic in MDA. In principal, this can expose slightly more optimization opportunities, but since essentially all of our aliasing aware memory optimization passes go through MDA, this will likely be NFC in practice. Differential Revision: http://reviews.llvm.org/D15912 llvm-svn: 263075
* [CGP] Duplicate addressing computation in cold paths if required to sink ↵Philip Reames2016-03-091-8/+45
| | | | | | | | | | | | | | addressing mode This patch teaches CGP to duplicate addressing mode computations into cold paths (detected via explicit cold attribute on calls) if required to let addressing mode be safely sunk into the basic block containing each load and store. In general, duplicating code into cold blocks may result in code growth, but should not effect performance. In this case, it's better to duplicate some code than to put extra pressure on the register allocator by making it keep the address through the entirely of the fast path. This patch only handles addressing computations, but in principal, we could implement a more general cold cold scheduling heuristic which tries to reduce register pressure in the fast path by duplicating code into the cold path. Getting the profitability of the general case right seemed likely to be challenging, so I stuck to the existing case (addressing computation) we already had. Differential Revision: http://reviews.llvm.org/D17652 llvm-svn: 263074
* Fix the buildPhilip Reames2016-03-091-0/+1
| | | | | | I screwed up rebasing 263072. This change fixes the build and passes all make check. llvm-svn: 263073
* [LICM] Store promotion when memory is thread localPhilip Reames2016-03-091-11/+56
| | | | | | | | | | | | This patch teaches LICM's implementation of store promotion to exploit the fact that the memory location being accessed might be provable thread local. The fact it's thread local weakens the requirements for where we can insert stores since no other thread can observe the write. This allows us perform store promotion even in cases where the store is not guaranteed to execute in the loop. Two key assumption worth drawing out is that this assumes a) no-capture is strong enough to imply no-escape, and b) standard allocation functions like malloc, calloc, and operator new return values which can be assumed not to have previously escaped. In future work, it would be nice to generalize this so that it works without directly seeing the allocation site. I believe that the nocapture return attribute should be suitable for this purpose, but haven't investigated carefully. It's also likely that we could support unescaped allocas with similar reasoning, but since SROA and Mem2Reg should destroy those, they're less interesting than they first might seem. Differential Revision: http://reviews.llvm.org/D16783 llvm-svn: 263072
* [x86] fix cost model inaccuracy for vector memory opsSanjay Patel2016-03-091-4/+4
| | | | | | | | | | | The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar has a completely double-pumped AVX implementation. But getting the cost model to reflect that is a much bigger problem. The small goal here is simply to improve on the lie that !AVX2 == SandyBridge. Differential Revision: http://reviews.llvm.org/D18000 llvm-svn: 263069
* [WebAssembly] Update known gcc test failuresDerek Schuff2016-03-091-3/+0
| | | | llvm-svn: 263068
* [x86, AVX] optimize masked loads with constant masksSanjay Patel2016-03-091-2/+44
| | | | | | | | Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper. Differential Revision: http://reviews.llvm.org/D17899 llvm-svn: 263067
* [ValueTracking] Extract isKnownPositive [NFCI]Philip Reames2016-03-092-2/+14
| | | | | | Extract out a generic interface from a recently landed patch and document a TODO in case compile time becomes a problem. llvm-svn: 263062
* [InstCombine] (icmp sgt smin(PosA, B) 0) -> (icmp sgt B 0)Philip Reames2016-03-091-0/+13
| | | | | | | | When checking whether an smin is positive, we can move the comparison to one of the inputs if the other is known positive. If the known positive one is the min, then the other can't be negative. If the other is the min, then we compute the min. Differential Revision: http://reviews.llvm.org/D17873 llvm-svn: 263059
* [LLE] Add missing check for unit strideAdam Nemet2016-03-091-5/+13
| | | | | | | | | | I somehow missed this. The case in GCC (global_alloc) was similar to the new testcase except it had an array of structs rather than a two dimensional array. Fixes RP26885. llvm-svn: 263058
OpenPOWER on IntegriCloud