summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* Verifier: Don't crash on null entries in debug info retained types listDavid Blaikie2015-08-221-1/+1
| | | | | | | | There was already a good error path for this. Added a test for it & made a minor code change to ensure the error path was actually reached, rather than crashing before we got that far. llvm-svn: 245795
* [NVPTX] Allow undef value as global initializerJingyue Wu2015-08-221-3/+5
| | | | | | | | | | | | | | | | | | Summary: __shared__ variable may now emit undef value as initializer, do not throw error on that. Test Plan: test/CodeGen/NVPTX/global-addrspace.ll Patch by Xuetian Weng Reviewers: jholewinski, tra, jingyue Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D12242 llvm-svn: 245785
* LTO: Maintain target triple, FeatureStr and CGOptLevel in the module or ↵Peter Collingbourne2015-08-221-18/+22
| | | | | | | | LTOCodeGenerator. This makes it easier to create new TargetMachines on demand. llvm-svn: 245781
* AMDGPU: Allow specifying different opcode on VI for SMRD/SMEMMatt Arsenault2015-08-222-15/+21
| | | | | | | | Although the basic s_load_* instructions happen to use the same opcode, some of the special case SMRD instructions have different opcodes. llvm-svn: 245775
* AMDGPU: Improve accuracy of instruction rates for some FP instructionsMatt Arsenault2015-08-222-7/+27
| | | | llvm-svn: 245774
* AMDGPU: Use DFS to avoid second loop over functionMatt Arsenault2015-08-221-15/+13
| | | | llvm-svn: 245772
* AMDGPU: Make sure to run verifier after SIFixSGPRLiveRangesMatt Arsenault2015-08-221-1/+1
| | | | llvm-svn: 245769
* AMDGPU: Improve debug printing in SIFixSGPRLiveRangesMatt Arsenault2015-08-221-6/+15
| | | | llvm-svn: 245768
* AMDGPU: Move CI instructions into CIInstructions.tdMatt Arsenault2015-08-222-70/+69
| | | | | | There are still a couple of CI patterns left in SIInstructions. llvm-svn: 245767
* AMDGPU: Minor cleanups to help with f16 supportMatt Arsenault2015-08-211-9/+11
| | | | | | | | The main change is inverting the condition for the operand class classes so that VT.Size == 16 uses VGPR_32 instead of 64. llvm-svn: 245764
* Improve the determinism of MergeFunctionsJF Bastien2015-08-211-45/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Merge functions previously relied on unsigned comparisons of pointer values to order functions. This caused observable non-determinism in the compiler for large bitcode programs. Basically, opt -mergefuncs program.bc | md5sum produces different hashes when run repeatedly on the same machine. Differing output was observed on three large bitcodes, but it was less frequent on the smallest file. It is possible that this only manifests on the large inputs, hence remaining undetected until now. This patch fixes this by removing (almost, see below) all places where comparisons between pointers are used to order functions. Most of these changes are local, but the comparison of global values requires assigning an identifier to each local in the order it is visited. This is very similar to the way the comparison function identifies Value*'s defined within a function. Because the order of visiting the functions and their subparts is deterministic, the identifiers assigned to the globals will be as well, and the order of functions will be deterministic. With these changes, there is no more observed non-determinism. There is also only minor slowdowns (negligible to 4%) compared to the baseline, which is likely a result of the fact that global comparisons involve hash lookups and not just pointer comparisons. The one caveat so far is that programs containing BlockAddress constants can still be non-deterministic. It is not clear what the right solution is here. In particular, even if the global numbers are used to order by function, we still need a way to order the BasicBlock*'s. Unfortunately, we cannot just bail out and fail to order the functions or consider them equal, because we require a total order over functions. Note that programs with BlockAddress constants are relatively rare, so the impact of leaving this in is minor as long as this pass is opt-in. Author: jrkoenig Reviewers: nlewycky, jfb, dschuff Subscribers: jevinskie, llvm-commits, chapuni Differential revision: http://reviews.llvm.org/D12168 llvm-svn: 245762
* [LAA] Hold bounds via ValueHandles during SCEV expansionAdam Nemet2015-08-211-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SCEV expansion can invalidate previously expanded values. For example in SCEVExpander::ReuseOrCreateCast, if we already have the requested cast value but it's not at the desired location, a new cast is inserted and the old cast will be invalidated. Therefore, when expanding the bounds for the pointers, a later entry can invalidate the IR value for an earlier one. The fix is to store a value handle rather than the value itself. The newly added test has a more detailed description of how the bug triggers. This bug can have a negative but potentially highly variable performance impact in Loop Distribution. Because one of the bound values was invalidated and is an undef expression now, InstCombine is free to transform the array overlap check: Start0 <= End1 && Start1 <= End0 into: Start0 <= End1 So depending on the runtime location of the arrays, we would detect a conflict and fall back on the original loop of the versioned loop. Also tested compile time with SPEC2006 LTO bc files. llvm-svn: 245760
* Standardized 'failed' to 'Failed' in LoopVectorizationRequirements.Tyler Nowicki2015-08-211-4/+4
| | | | llvm-svn: 245759
* LTO: Change signature of LTOCodeGenerator::setCodePICModel() to take a ↵Peter Collingbourne2015-08-211-30/+0
| | | | | | | | | Reloc::Model. This allows us to remove a bunch of code in LTOCodeGenerator and llvm-lto and has the side effect of improving error handling in the libLTO C API. llvm-svn: 245756
* AMDGPU/SI: Better handle s_wait insertionTom Stellard2015-08-211-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can wait on either VM, EXP or LGKM. The waits are independent. Without this patch, a wait inserted because of one of them would also wait for all the previous others. This patch makes s_wait only wait for the ones we need for the next instruction. Here's an example of subtle perf reduction this patch solves: This is without the patch: buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen s_load_dwordx4 s[44:47], s[8:9], 0xc s_waitcnt lgkmcnt(0) buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen s_load_dwordx4 s[48:51], s[8:9], 0x10 s_waitcnt vmcnt(1) buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen The s_waitcnt vmcnt(1) is useless. The reason it is added is because the last buffer_load_format_xyzw needs s[44:47], which was issued by the first s_load_dwordx4. It waits for all VM before that call to have finished. Internally after every instruction, 3 counters (for VM, EXP and LGTM) are updated after every instruction. For example buffer_load_format_xyzw will increase the VM counter, and s_load_dwordx4 the LGKM one. Without the patch, for every defined register, the current 3 counters are stored, and are used to know how long to wait when an instruction needs the register. Because of that, the s[44:47] counter includes that to use the register you need to wait for the previous buffer_load_format_xyzw. Instead this patch stores only the counters that matter for the register, and puts zero for the other ones, since we don't need any wait for them. Patch by: Axel Davy Differential Revision: http://reviews.llvm.org/D11883 llvm-svn: 245755
* Re-apply r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0"Sanjoy Das2015-08-211-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | The original checkin was buggy, this change has a fix. Original commit message: [InstCombine] Transform A & (L - 1) u< L --> L != 0 Summary: This transform is never a pessimization at the IR level (since it replaces an `icmp` with another), and has potentiall payoffs: 1. It may make the `icmp` fold away or become loop invariant. 2. It may make the `A & (L - 1)` computation dead. This shows up in Java, in range checks generated by array accesses of the form `a[i & (a.length - 1)]`. Reviewers: reames, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12210 llvm-svn: 245753
* Range-for-ify some things in GlobalMergeDavid Blaikie2015-08-211-19/+14
| | | | llvm-svn: 245752
* [opaque pointer types] Fix a few easy places in GlobalMerge that were ↵David Blaikie2015-08-211-10/+7
| | | | | | accessing value types through pointee types llvm-svn: 245746
* MIR Serialization: Serialize the pointer IR expression values in the machineAlex Lorenz2015-08-214-9/+46
| | | | | | memory operands. llvm-svn: 245745
* [ARM] Fix MachO CPU Subtype selectionVedant Kumar2015-08-212-13/+38
| | | | | | Differential Revision: http://reviews.llvm.org/D12040 llvm-svn: 245744
* MIRParser: Split the 'parseIRConstant' method into two methods. NFC.Alex Lorenz2015-08-211-3/+12
| | | | | | | One variant of this method can be reused when parsing the quoted IR pointer expressions in the machine memory operands. llvm-svn: 245743
* [opaque pointer types] Push the passing of value types up from ↵David Blaikie2015-08-212-5/+5
| | | | | | | | | Function/GlobalVariable to GlobalObject (coming next, pushing this up into GlobalValue, so it can store the value type directly) llvm-svn: 245742
* [PowerPC] PPCVSXFMAMutate should not segfault on undef input registersHal Finkel2015-08-211-0/+5
| | | | | | | | | | When PPCVSXFMAMutate would look at the input addend register, it would get its input value number. This would fail, however, if the register was undef, causing a segfault. Don't segfault (just skip such FMA instructions). Fixes the test case from PR24542 (although that may have been over-reduced). llvm-svn: 245741
* AsmParser: Save and restore the parsing state for types using SlotMapping.Alex Lorenz2015-08-213-4/+27
| | | | | | | | | | | | | | | | | | | | | This commit extends the 'SlotMapping' structure and includes mappings for named and numbered types in it. The LLParser is extended accordingly to fill out those mappings at the end of module parsing. This information is useful when we want to parse standalone constant values at a later stage using the 'parseConstantValue' method. The constant values can be constant expressions, which can contain references to types. In order to parse such constant values, we have to restore the internal named and numbered mappings for the types in LLParser, otherwise the parser will report a parsing error. Therefore, this commit also introduces a new method called 'restoreParsingState' to LLParser, which uses the slot mappings to restore some of its internal parsing state. This commit is required to serialize constant value pointers in the machine memory operands for the MIR format. Reviewers: Duncan P. N. Exon Smith llvm-svn: 245740
* [LVI] Use a SmallVector instead of SmallPtrSet. NFCBruno Cardoso Lopes2015-08-211-2/+2
| | | | llvm-svn: 245739
* MIR Serialization: Print MCSymbol operands.Alex Lorenz2015-08-212-4/+5
| | | | | | | | This commit allows the MIR printer to print the MCSymbol machine operands. Unfortunately they can't be parsed at this time. I will create a bug that will track the fact that the MCSymbol operands can't be parsed yet. llvm-svn: 245737
* [x86] enable machine combiner reassociations for 256-bit vector min/maxSanjay Patel2015-08-211-0/+4
| | | | llvm-svn: 245735
* remove 'FeatureSlowUAMem' from AMD CPUs based on 10H micro-arch or laterSanjay Patel2015-08-211-11/+7
| | | | | | | See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data. llvm-svn: 245733
* Add comment as follow up to r245712David Blaikie2015-08-211-0/+1
| | | | llvm-svn: 245730
* [x86] invert logic for attribute 'FeatureFastUAMem'Sanjay Patel2015-08-215-89/+98
| | | | | | | | | | | | | | | | This is a 'no functional change intended' patch. It removes one FIXME, but adds several more. Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any sized misaligned memory access under 32-bytes is 'fast'. From the added FIXME comments, however, you can see that we're not consistent about this. Changing the name of the attribute makes it clearer to see the logic holes. Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute to new chips; fast unaligned accesses have been standard for several generations of CPUs now. Differential Revision: http://reviews.llvm.org/D12154 llvm-svn: 245729
* [opaque pointer type]: Pass explicit pointee type when building a constant GEP.David Blaikie2015-08-212-7/+15
| | | | | | | | | | Gets a bit tricky in the ValueMapper, of course - not sure if we should just expose a list of explicit types for each Value so that the ValueMapper can be neutral to these special cases (it's OK for things like load, where the explicit type is the result type - but when that's not the case, it means plumbing through another "special" type... ) llvm-svn: 245728
* [x86] enable machine combiner reassociations for 128-bit vector min/maxSanjay Patel2015-08-211-0/+8
| | | | llvm-svn: 245715
* Remove an unnecessary use of pointee types introduced in r194220David Blaikie2015-08-211-3/+2
| | | | | | | | David Majnemer (the original author) believes this to be an impossible condition to reach anyway, and no test cases cover this so we'll go with that. llvm-svn: 245712
* Disable Visual C++ 2013 Debug mode assert on null pointer in some STL ↵Yaron Keren2015-08-211-1/+1
| | | | | | | | | | | | | | | algorithms, such as std::equal on the third argument. This reverts previous workarounds. Predefining _DEBUG_POINTER_IMPL disables Visual C++ 2013 headers from defining it to a function performing the null pointer check. In practice, it's not that bad since any function actually using the nullptr will seg fault. The other iterator sanity checks remain enabled in the headers. Reviewed by Aaron Ballmanþ and Duncan P. N. Exon Smith. llvm-svn: 245711
* [APFloat] Remove else after return and replace loop with std::equal. NFC.Benjamin Kramer2015-08-211-11/+5
| | | | llvm-svn: 245707
* Fix typo - symetric -> symmetric.Eric Christopher2015-08-211-1/+1
| | | | llvm-svn: 245705
* [DAGCombiner] Fold together mul and shl when both are by a constantJohn Brawn2015-08-211-0/+8
| | | | | | | | | | This is intended to improve code generation for GEPs, as the index value is shifted by the element size and in GEPs of multi-dimensional arrays the index of higher dimensions is multiplied by the lower dimension size. Differential Revision: http://reviews.llvm.org/D12197 llvm-svn: 245689
* Revert r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0"NAKAMURA Takumi2015-08-211-13/+0
| | | | | | It caused miscompilation in clang. llvm-svn: 245678
* Linker: Remove empty destructor.Peter Collingbourne2015-08-211-3/+0
| | | | llvm-svn: 245672
* LTO: Simplify ownership of LTOCodeGenerator::TargetMach.Peter Collingbourne2015-08-211-6/+3
| | | | llvm-svn: 245671
* LTO: Simplify ownership of LTOCodeGenerator::CodegenOptions.Peter Collingbourne2015-08-211-15/+9
| | | | llvm-svn: 245670
* [Sparc] Support user-specified stack object overalignment.James Y Knight2015-08-214-27/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note: I do not implement a base pointer, so it's still impossible to have dynamic realignment AND dynamic alloca in the same function. This also moves the code for determining the frame index reference into getFrameIndexReference, where it belongs, instead of inline in eliminateFrameIndex. [Begin long-winded screed] Now, stack realignment for Sparc is actually a silly thing to support, because the Sparc ABI has no need for it -- unlike the situation on x86, the stack is ALWAYS aligned to the required alignment for the CPU instructions: 8 bytes on sparcv8, and 16 bytes on sparcv9. However, LLVM unfortunately implements user-specified overalignment using stack realignment support, so for now, I'm going to go along with that tradition. GCC instead treats objects which have alignment specification greater than the maximum CPU-required alignment for the target as a separate block of stack memory, with their own virtual base pointer (which gets aligned). Doing it that way avoids needing to implement per-target support for stack realignment, except for the targets which *actually* have an ABI-specified stack alignment which is too small for the CPU's requirements. Further unfortunately in LLVM, the default canRealignStack for all targets effectively returns true, despite that implementing that is something a target needs to do specifically. So, the previous behavior on Sparc was to silently ignore the user's specified stack alignment. Ugh. Yet MORE unfortunate, if a target actually does return false from canRealignStack, that also causes the user-specified alignment to be *silently ignored*, rather than emitting an error. (I started looking into fixing that last, but it broke a bunch of tests, because LLVM actually *depends* on having it silently ignored: some architectures (e.g. non-linux i386) have smaller stack alignment than spilled-register alignment. But, the fact that a register needs spilling is not known until within the register allocator. And by that point, the decision to not reserve the frame pointer has been frozen in place. And without a frame pointer, stack realignment is not possible. So, canRealignStack() returns false, and needsStackRealignment() then returns false, assuming everyone can just go on their merry way assuming the alignment requirements were probably just suggestions after-all. Sigh...) Differential Revision: http://reviews.llvm.org/D12208 llvm-svn: 245668
* TransformUtils: Introduce module splitter.Peter Collingbourne2015-08-213-0/+125
| | | | | | | | | | | | | The module splitter splits a module into linkable partitions. It will be used to implement parallel LTO code generation. This initial version of the splitter does not attempt to deal with the somewhat subtle symbol visibility issues around module splitting. These will be dealt with in a future change. Differential Revision: http://reviews.llvm.org/D12132 llvm-svn: 245662
* SparcAsmParser.cpp: Appease msc x86.NAKAMURA Takumi2015-08-211-1/+1
| | | | llvm-svn: 245661
* AArch64: Fix cmp;ccmp orderingMatthias Braun2015-08-201-3/+10
| | | | | | | | | | | | When producing conditional compare sequences for or operations we need to negate the operands and the finally tested flags. The thing is if we negate the finally tested flags this equals a logical negation of all previously emitted expressions. There was a case missing where we have to order OR expressions so they get emitted first. This fixes http://llvm.org/PR24459 llvm-svn: 245641
* AArch64: Do not create CCMP on multiple users.Matthias Braun2015-08-201-1/+1
| | | | | | | | | Create CMP;CCMP sequences from and/or trees does not gain us anything if the and/or tree is materialized to a GP register anyway. While most of the code already checked for hasOneUse() there was one important case missing. llvm-svn: 245640
* [InstSimplify] add nuw %x, C2 must be at least C2David Majnemer2015-08-201-0/+3
| | | | | | | Use the fact that add nuw always creates a larger bit pattern when trying to simplify comparisons. llvm-svn: 245638
* [WebAssembly] Mark more operators as Expand.Dan Gohman2015-08-201-0/+26
| | | | llvm-svn: 245636
* [InstCombine] Transform A & (L - 1) u< L --> L != 0Sanjoy Das2015-08-201-0/+13
| | | | | | | | | | | | | | | | | | | | Summary: This transform is never a pessimization at the IR level (since it replaces an `icmp` with another), and has potentiall payoffs: 1. It may make the `icmp` fold away or become loop invariant. 2. It may make the `A & (L - 1)` computation dead. This shows up in Java, in range checks generated by array accesses of the form `a[i & (a.length - 1)]`. Reviewers: reames, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12210 llvm-svn: 245635
* [SLP] Propagate 'nontemporal' attribute into vectorized instructions.Michael Zolotukhin2015-08-201-0/+3
| | | | llvm-svn: 245633
OpenPOWER on IntegriCloud