path: root/llvm/lib
Commit message | Author | Age | Files | Lines
...
* Fix typos in ADCE comments | Tobias Grosser | 2017-03-14 | 1 | -7/+7
| | | | llvm-svn: 297726
* [ValueTracking] Out of range shifts might be undef | Oliver Stannard | 2017-03-14 | 1 | -0/+8
| | | | | | | | | | | | | | If it is possible for the RHS of a shift operation to be greater than or equal to the bit-width, then the result might be undef, and we can't report any known bits. In some cases, this was allowing a transformation in instcombine which widened an undef value from i1 to i32, increasing the range of values that a function could return. Differential revision: https://reviews.llvm.org/D30781 llvm-svn: 297724
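The rule described in this commit message can be illustrated with a small sketch. It is not the ValueTracking code: `KnownBits32`, `BitWidth`, and `knownBitsForShl` are hypothetical names, and the shift amount is modeled as a simple inclusive range, which is an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for known-bits tracking on a 32-bit value:
// a set bit in Zero means "known to be 0", a set bit in One means "known to be 1".
struct KnownBits32 {
  uint32_t Zero;
  uint32_t One;
};

constexpr unsigned BitWidth = 32;

// Known bits of (LHS << Amt) when Amt may be any value in [MinAmt, MaxAmt].
// Assumes MinAmt <= MaxAmt.
KnownBits32 knownBitsForShl(KnownBits32 LHS, unsigned MinAmt, unsigned MaxAmt) {
  // If the shift amount can reach the bit width, the result might be undef,
  // so no bit may be reported as known.
  if (MaxAmt >= BitWidth)
    return KnownBits32{0u, 0u};

  // Otherwise keep only the bits that agree for every possible shift amount.
  KnownBits32 Result{~0u, ~0u};
  for (unsigned Amt = MinAmt; Amt <= MaxAmt; ++Amt) {
    // The low Amt bits of a left shift are always zero.
    uint32_t LowZeros = (Amt == 0) ? 0u : ((1u << Amt) - 1u);
    Result.Zero &= (LHS.Zero << Amt) | LowZeros;
    Result.One &= (LHS.One << Amt);
  }
  return Result;
}
```

With a fully known input of 1 and a shift amount that may be 32 or more, the sketch reports nothing, which is the conservative answer the commit asks for.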
* [ARM] Move SMULW[B|T] isel to DAG Combine | Sam Parker | 2017-03-14 | 6 | -150/+147
| | | | | | | | | | | | Create nodes for smulwb and smulwt and move their selection from DAGToDAG to DAG combine. smlawb and smlawt can then be selected using tablegen. Added some helper functions to detect shift patterns as well as a wrapper around SimplifyDemandedBits. Added a couple of extra tests. Differential Revision: https://reviews.llvm.org/D30708 llvm-svn: 297716
* Disable Callee Saved Registers | Oren Ben Simhon | 2017-03-14 | 13 | -34/+143
| | | | | | | | | | | | | | Each calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller. Some CCs use an additional condition: if the register is used for passing/returning arguments – the caller needs to save it, even if it is part of the Callee Saved Registers (CSR) list. The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee. The solution is to dynamically allocate the CSR lists (only for these CCs). The lists will be updated with the actual registers that should be saved by the callee. Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI), which is a property of the Machine Function and managed by it (and has the same life span). The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments. The patch will also help implement the future no_caller_saved_registers attribute intended for the interrupt handler CC. Differential Revision: https://reviews.llvm.org/D28566 llvm-svn: 297715
* [AVX-512] Use iPTR instead of i64 in patterns for extract_subvector/insert_subvector index. | Craig Topper | 2017-03-14 | 1 | -6/+6
| | | | llvm-svn: 297707
* [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved | Jonas Paulsson | 2017-03-14 | 7 | -41/+49
| | | | | | | | | | | | | | | | | | | | | getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705
* [AVX-512] Pre-emptively fix more places in fastisel where we might copy a VK1 register into an AH/BH/CH/DH register. | Craig Topper | 2017-03-14 | 1 | -9/+28
| | | | llvm-svn: 297704
* Recommitting Craig Topper's patch now that r296476 has been recommitted. | Nirav Dave | 2017-03-14 | 1 | -1/+1
| | | | | | | | When checking if a chain node is foldable, make sure the intermediate nodes have a single use across all results, not just the result that was used to reach the chain node. This recovers a test case that was severely broken by r296476, by making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency. llvm-svn: 297698
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. | Nirav Dave | 2017-03-14 | 4 | -372/+397
| | | | Recommitting with compile time improvements. Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
| | | | * Simplify Consecutive Merge Store Candidate Search
| | | | Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear to require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below).
| | | | Additional Minor Changes:
| | | | 1. Finishes removing unused AliasLoad code
| | | | 2. Unifies the chain aggregation in the merged stores across code paths
| | | | 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
| | | | 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
| | | | 5. Remove Chain dependencies of Memory operations on CopyFromReg nodes as these are captured by data dependence
| | | | 6. Forward load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values.
| | | | 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
| | | | 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695
* [libFuzzer] Reorder includes in test | Vitaly Buka | 2017-03-13 | 1 | -2/+2
| | | | llvm-svn: 297692
* [libFuzzer] Fix compilation of CustomCrossOverAndMutateTest on Windows | Vitaly Buka | 2017-03-13 | 1 | -1/+2
| | | | llvm-svn: 297690
* Revert "Debug Info: Add basic support for external types references." | Adrian Prantl | 2017-03-13 | 4 | -28/+1
| | | | | | | | | | | | | | This reverts commit r242302. External type refs of this form were never used by any LLVM frontend so this is effectively dead code. (They were introduced to support clang module debug info, but in the end we came up with a better design that doesn't use this feature at all.) rdar://problem/25897929 Differential Revision: https://reviews.llvm.org/D30917 llvm-svn: 297684
* [Thumb1] combine ADDC/SUBC with a negative immediate | Artyom Skrobov | 2017-03-13 | 2 | -6/+20
| | | | | | | | | | | | Summary: This simple optimization has been split out of https://reviews.llvm.org/D30400 Reviewers: efriedma, jmolloy Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30829 llvm-svn: 297682
* Make FileOutputBuffer fail early if you pass a directory. | Rui Ueyama | 2017-03-13 | 1 | -0/+2
| | | | | | | | | | Previously, it created a temporary directory and then failed when FileOutputBuffer tried to rename that file to the destination file (which is actually a directory name). Differential Revision: https://reviews.llvm.org/D30912 llvm-svn: 297679
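The "fail early" idea in this commit can be sketched as a pre-flight check on the destination path. This is only an illustration under stated assumptions: `checkOutputPath` is a hypothetical helper, it uses POSIX `stat` rather than LLVM's `sys::fs` layer, and it is not the code the patch added.

```cpp
#include <string>
#include <sys/stat.h>
#include <system_error>

// Hypothetical pre-flight check, not the actual FileOutputBuffer code.
// Returns an error if Path names an existing directory, success otherwise.
std::error_code checkOutputPath(const std::string &Path) {
  struct stat Status;
  // If stat fails (for example, the path does not exist yet), this sketch
  // raises no objection: the output file would simply be created later.
  if (::stat(Path.c_str(), &Status) != 0)
    return std::error_code();
  // Reject directories up front instead of failing at rename time.
  if (S_ISDIR(Status.st_mode))
    return std::make_error_code(std::errc::is_a_directory);
  return std::error_code();
}
```

Checking before any temporary file is created means the caller sees the error immediately, with nothing to clean up.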
* [AVX-512] Fix another case where we are copying from a mask register using AH/BH/CH/DH with fastisel. | Craig Topper | 2017-03-13 | 1 | -1/+2
| | | | | | Fixes PR32256. Still planning to do an audit for other possible cases. llvm-svn: 297678
* Fix llvm-symbolizer to navigate both DW_AT_abstract_origin and DW_AT_specification in a single chain | David Blaikie | 2017-03-13 | 1 | -22/+10
| | | | | | | In a recent refactoring (r291959) this regressed to only following one or the other, not both, in a single chain. llvm-svn: 297676
* [IPRA] Change algorithm for RegUsageInfoCollector. | Marcello Maggioni | 2017-03-13 | 1 | -3/+21
| | | | | | | | | | | | | | | The previous algorithm for RegUsageInfoCollector had pretty bad performance on architectures with a lot of registers that heavily alias one another, because we potentially iterate for every register over all the aliasing registers. This costs even more if the function is small and doesn't define a lot of registers. This patch changes the algorithm so that, while iterating over all the registers, it iterates over the aliasing registers only if the register itself is defined. This should be faster based on the assumption that only a subset of the whole LLVM register set is actually defined in the function. Differential Revision: https://reviews.llvm.org/D30880 llvm-svn: 297673
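The loop shape described in this commit message might look roughly like the sketch below. The data model is an assumption made for illustration: registers are plain indices, `Aliases[R]` lists the registers overlapping `R`, and `collectClobbered` is a hypothetical name, not the RegUsageInfoCollector code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: Defined[R] says whether the function writes register R,
// and Aliases[R] lists the registers that overlap R.
std::vector<bool>
collectClobbered(const std::vector<std::vector<unsigned>> &Aliases,
                 const std::vector<bool> &Defined) {
  std::vector<bool> Clobbered(Defined.size(), false);
  for (std::size_t Reg = 0; Reg < Defined.size(); ++Reg) {
    // Skip undefined registers without touching their alias lists;
    // this is where the saving over the older approach comes from.
    if (!Defined[Reg])
      continue;
    Clobbered[Reg] = true;
    // Only a defined register pays for walking its aliases.
    for (unsigned Alias : Aliases[Reg])
      Clobbered[Alias] = true;
  }
  return Clobbered;
}
```

The alias walk now scales with the number of defined registers rather than with the size of the whole register file, which matches the assumption stated in the commit message.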
* GlobalISel: Translate ConstantDataVector | Volkan Keles | 2017-03-13 | 1 | -0/+7
| | | | | | | | | | | | Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, javed.absar, ab Reviewed By: qcolombet, dsanders, ab Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30216 llvm-svn: 297670
* [X86][MMX] Fix folding of shift value loads to cover whole 64-bits | Simon Pilgrim | 2017-03-13 | 2 | -21/+0
| | | | | | | | | | | | rL230225 made the assumption that only the lower 32 bits of an MMX register load are used as a shift value, when in fact the whole 64 bits are reloaded and treated as an i64 to determine the shift value. This patch reverts rL230225 to ensure that the whole 64 bits of memory are folded and ensures that the upper 32 bits are zeroed for cases where the shift value has come from a scalar source. Found during fuzz testing. Differential Revision: https://reviews.llvm.org/D30833 llvm-svn: 297667
* Revert r295004 (Add MXCSR) due to errors reported by MachineVerifier | Andrew Kaylor | 2017-03-13 | 3 | -37/+24
| | | | | | I am leaving the code in clang which filters mxcsr from the clobber list because that is still technically correct and will be useful again when the MXCSR register is reintroduced. llvm-svn: 297664
* AMDGPU: Re-use TM.getNullPointerValue | Matt Arsenault | 2017-03-13 | 1 | -10/+8
| | | | llvm-svn: 297662
* Bring back r297624. | Rafael Espindola | 2017-03-13 | 1 | -1/+1
| | | | | | The issue was just a missing REQUIRES in the test. llvm-svn: 297661
* AMDGPU: Treat 0 as private null pointer in addrspacecast lowering | Matt Arsenault | 2017-03-13 | 2 | -8/+14
| | | | llvm-svn: 297658
* Revert "Fix crash when multiple raw_fd_ostreams to stdout are created." | Rafael Espindola | 2017-03-13 | 1 | -1/+1
| | | | | | | This reverts commit r297624. It was failing on the bots. llvm-svn: 297657
* [Outliner] Add tail call support | Jessica Paquette | 2017-03-13 | 3 | -34/+108
| | | | | | | | | | | | | | This commit adds tail call support to the MachineOutliner pass. This allows the outliner to insert jumps rather than calls in areas where tail calling is possible. Outlined tail calls include the return or terminator of the basic block being outlined from. Tail call support allows the outliner to take returns and terminators into consideration while finding candidates to outline. It also allows the outliner to save more instructions. For example, in the X86-64 outliner, a tail called outlined function saves one instruction since no return has to be inserted. llvm-svn: 297653
* [X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input source optimizations to break execution dependencies. | Craig Topper | 2017-03-13 | 3 | -84/+50
| | | | | | For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2. llvm-svn: 297652
* [AVX-512] If gather mask is all ones, force the input to a zero vector. | Craig Topper | 2017-03-13 | 1 | -1/+4
| | | | | | | | We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too. Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today. llvm-svn: 297651
* AMDGPU: Fold icmp/fcmp into icmp intrinsic | Matt Arsenault | 2017-03-13 | 1 | -0/+87
| | | | | | | The typical use is a library vote function which compares to 0. Fold the user condition into the intrinsic. llvm-svn: 297650
* [Linker] Provide callback for internalization | Jonas Devlieghere | 2017-03-13 | 1 | -19/+27
| | | | | | Differential Revision: https://reviews.llvm.org/D30738 llvm-svn: 297649
* API gardening: Rename FindAllocaDbgValue to findDbgValue (NFC) | Adrian Prantl | 2017-03-13 | 1 | -8/+6
| | | | | | and have it use SmallVectorImpl. There is nothing specific about allocas in this function. llvm-svn: 297643
* [llvm-pdbdump] Add support for dumping symbols from Yaml -> PDB. | Zachary Turner | 2017-03-13 | 2 | -0/+53
| | | | | | | | Previously we could round-trip type records from PDB -> Yaml -> PDB, but for symbols we could only go from PDB -> Yaml. This completes the round-tripping for symbols as well. llvm-svn: 297625
* Fix crash when multiple raw_fd_ostreams to stdout are created. | Rafael Espindola | 2017-03-13 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | If raw_fd_ostream is constructed with the path of "-", it claims ownership of the stdout file descriptor. This means that it closes stdout when it is destroyed. If there are multiple users of raw_fd_ostream wrapped around stdout, then a crash can occur because of operations on a closed stream. An example of this would be running something like "clang -S -o - -MD -MF - test.cpp". Alternatively, using outs() (which creates a local version of raw_fd_ostream to stdout) anywhere combined with such a stream usage would cause the crash. The fix duplicates the stdout file descriptor when used within raw_fd_ostream, so that only that particular descriptor is closed when the stream is destroyed. Patch by James Henderson! llvm-svn: 297624
* [ARM] GlobalISel: Support SP in regbankselect | Diana Picus | 2017-03-13 | 1 | -0/+1
| | | | | | | We used to hit an unreachable in getRegBankFromRegClass when dealing with the stack pointer. This commit adds support for the GPRsp reg class. llvm-svn: 297621
* Reverting r297617 because it broke some bots: | Aaron Ballman | 2017-03-13 | 3 | -81/+31
| | | | | | http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/49970 llvm-svn: 297618
* Add support for getting file system permissions and implement sys::fs::permissions to set them. | Aaron Ballman | 2017-03-13 | 3 | -31/+81
| | | | | | Patch by James Henderson. llvm-svn: 297617
* [AArch64] Map Sched Read/Write resources for Falkor. | Balaram Makam | 2017-03-13 | 1 | -1/+183
| | | | llvm-svn: 297611
* [LV] Set memcheck metadata also for VF==1 | Gil Rapaport | 2017-03-13 | 1 | -5/+1
| | | | | | | | This commit is a follow-up on r297580. It fixes the FIXME added temporarily by that commit to keep the removal of Unroller's specialized version of scalarizeInstruction() an NFC. See https://reviews.llvm.org/D30715 for details. llvm-svn: 297610
* ARMDisassembler: loop over ARM decode tables | Sjoerd Meijer | 2017-03-13 | 1 | -57/+20
| | | | | | | | | Loop over the ARM decode tables; this is a clean-up to reduce some code duplication. Differential Revision: https://reviews.llvm.org/D30814 llvm-svn: 297608
* [AVX-512] Add VEX_WIG to VEX vcvtsd2ss/vcvtss2sd intrinsic instructions so they can be correctly matched by EVEX2VEX table generation. | Craig Topper | 2017-03-13 | 1 | -8/+8
| | | | llvm-svn: 297601
* [AVX-512] Use sse_loadf32/f64 for vcvtss2sd and vcvtsd2ss intrinsic patterns. | Craig Topper | 2017-03-13 | 1 | -3/+2
| | | | llvm-svn: 297600
* [AVX-512] Use sse_load_f64/f32 in VCVTSS2SI/VCVTSD2SI patterns. | Craig Topper | 2017-03-13 | 1 | -10/+10
| | | | llvm-svn: 297599
* [X86] Remove unused SDTypeProfile. NFC | Craig Topper | 2017-03-12 | 1 | -2/+0
| | | | llvm-svn: 297594
* [X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes. | Craig Topper | 2017-03-12 | 3 | -46/+17
| | | | | | This allows us to remove a duplicate set of patterns. llvm-svn: 297593
* [AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics. | Craig Topper | 2017-03-12 | 1 | -2/+3
| | | | | | The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly. llvm-svn: 297591
* [x86] don't blindly transform SETB into SBB | Sanjay Patel | 2017-03-12 | 1 | -39/+50
| | | | | | | | | | | | | | | | | | I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. This happens because we were transforming any 'setb' - even when we only wanted a single-bit result. This patch moves those transforms under visitAdd/visitSub, so we're only creating sbb/adc when it is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that existing behavior in this patch. Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files where this transform still fires. The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D30611 llvm-svn: 297586
* [LVI] Add Datalayout to the class LazyValueInfo since all its Impls require it. NFC | Anna Thomas | 2017-03-12 | 1 | -1/+1
| | | | llvm-svn: 297583
* Remove CRC32 instructions from AArch64InstrInfo::hasShiftedReg | Azharuddin Mohammed | 2017-03-12 | 1 | -8/+0
| | | | | | | | | | | | | | | | | | | | | | | Summary: A53 scheduler causes an assertion failure on all CRC instructions: include/llvm/CodeGen/MachineInstr.h:280: const llvm::MachineOperand &llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i < getNumOperands() && "getOperand() out of range!"' failed. The case statements corresponding to CRC instructions are incorrect and should be removed. Also adding a testcase while on this. Reviewers: t.p.northover, javed.absar, apazos, rengolin Reviewed By: rengolin Subscribers: evandro, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30274 llvm-svn: 297582
* [LV] A unified scalarizeInstruction() for Vectorizer and Unroller; NFC | Gil Rapaport | 2017-03-12 | 1 | -66/+8
| | | | | | | | | | | | | | | | Unroller's specialized scalarizeInstruction() is mostly duplicating Vectorizer's variant. OTOH Vectorizer's scalarizeInstruction() already supports the special case of VF==1 except for avoiding mask-bit extraction in that case. This patch removes Unroller's specialized version in favor of a unified method. The only functional difference between the two variants seems to be setting memcheck metadata for loads and stores only in Vectorizer's variant, which is a bug in Unroller. To keep this patch an NFC the unified method doesn't set memcheck metadata for VF==1. Differential Revision: https://reviews.llvm.org/D30715 llvm-svn: 297580
* Test commit. | Ayal Zaks | 2017-03-12 | 1 | -0/+1
| | | | llvm-svn: 297579
* Split NewGVN class into a legacy pass and an impl, instead of a merged class. | Daniel Berlin | 2017-03-12 | 2 | -83/+87
| | | | llvm-svn: 297576