summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] GlobalISel: Fixup r307365Diana Picus2017-07-071-11/+10
| | | | | | | Rename member DebugLoc -> DbgLoc (so it doesn't conflict with the class name). llvm-svn: 307366
* [ARM] GlobalISel: Select hard G_FCMP for s32Diana Picus2017-07-071-63/+234
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We lower to a sequence consisting of: - MOVi 0 into a register - VCMPS to do the actual comparison and set the VFP flags - FMSTAT to move the flags out of the VFP unit - MOVCCi to either use the "zero register" that we have previously set with the MOVi, or move 1 into the result register, based on the values of the flags As was the case with soft-float, for some predicates (one, ueq) we actually need two comparisons instead of just one. When that happens, we generate two VCMPS-FMSTAT-MOVCCi sequences and chain them by means of using the result of the first MOVCCi as the "zero register" for the second one. This is a bit overkill, since one comparison followed by two non-flag-setting conditional moves should be enough. In any case, the backend manages to CSE one of the comparisons away so it doesn't matter much. Note that unlike SelectionDAG and FastISel, we always use VCMPS, and not VCMPES. This makes the code a lot simpler, and it also seems correct since the LLVM Lang Ref defines simple true/false returns if the operands are QNaN's. For SNaN's, even VCMPS throws an Invalid Operand exception, so they won't be slipping through unnoticed. Implementation-wise, this introduces a template so we can share the same code that we use for handling integer comparisons, since the only differences are in the details (exact opcodes to be used etc). Hopefully this will be easy to extend to s64 G_FCMP. llvm-svn: 307365
* LiveRegUnits: Rename accumulateBackward()->accumulate()Matthias Braun2017-07-071-1/+1
| | | | | | | | | Contrary to the stepForward()/stepBackward() method accumulate() doesn't have a direction as defs, uses and clobbers all have the same effect. Also improve the documentation comment. llvm-svn: 307351
* Extend memcpy expansion in Transform/Utils to handle wider operand types.Sean Fertile2017-07-072-12/+36
| | | | | | | | | | | Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the target to provide the operand types through TTI callbacks. The default values for the TTI callbacks use int8 operand types and matches the existing behaviour if they aren't overridden by the target. Differential revision: https://reviews.llvm.org/D32536 llvm-svn: 307346
* Reverting r307326 because it breaks clang tests.Michael Kuperstein2017-07-063-32/+11
| | | | llvm-svn: 307334
* [NVPTX] Add lowering of i128 params.Michael Kuperstein2017-07-063-11/+32
| | | | | | | | | | | | | | | | | The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16]. Currently, NVPTX can't deal with the 128 bit integers: * in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition" * in other cases emitting PTX with .i128 or .u128 types (which are not valid [1]) [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types Differential Revision: https://reviews.llvm.org/D34555 Patch by: Denys Zariaiev (denys.zariaiev@gmail.com) llvm-svn: 307326
* [COFF, AArch64] Set the private label prefix to .LMartin Storsjo2017-07-061-0/+2
| | | | | | | | | | This fixes calls to external functions starting with a capital L, fixing errors like this: fatal error: error in backend: assembler label 'LocalFree' can not be undefined Differential Revision: https://reviews.llvm.org/D35079 llvm-svn: 307317
* AMDGPU: Add macro fusion schedule DAG mutationMatt Arsenault2017-07-064-0/+86
| | | | | | Try to increase opportunities to shrink vcc uses. llvm-svn: 307313
* AMDGPU: Minor cleanup of shrinking logicMatt Arsenault2017-07-061-8/+4
| | | | llvm-svn: 307312
* [AMDGPU] Always use rcp + mul with fast mathStanislav Mekhanoshin2017-07-062-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace fdiv with 2.5ulp metadata and does not handle denormals, thus believed to be fast. An fdiv instruction can also have fast math flag either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both metadata and fast flag: %div = fdiv fast float %v, %0, !fpmath !12 !12 = !{float 2.500000e+00} Current implementation ignores fast flag and favors metadata. An instruction with just fast flag would be lowered to a fastest rcp + mul, but that never happen on practice because of described mutual clang and BE behavior. This change allows an "fdiv fast" to be always lowered as rcp + mul. Differential Revision: https://reviews.llvm.org/D34844 llvm-svn: 307308
* [GISel]: Enhance the MachineIRBuilder APIAditya Nandakumar2017-07-061-3/+2
| | | | | | | | | | | | | | | | | | | | Allows the MachineIRBuilder APIs to directly create registers (based on LLT or TargetRegisterClass) as well as accept MachineInstrBuilders and implicitly converts to register(with getOperand(0).getReg()). Eg usage: LLT s32 = LLT::scalar(32); auto C32 = Builder.buildConstant(s32, 32); auto Tmp = Builder.buildInstr(TargetOpcode::G_SUB, s32, C32, OtherReg); auto Tmp2 = Builder.buildInstr(Opcode, DstReg, Builder.buildConstant(s32, 31)); .... Only a few methods added for now. Reviewed by Tim llvm-svn: 307302
* [Constants] If we already have a ConstantInt*, prefer to use ↵Craig Topper2017-07-061-1/+1
| | | | | | | | isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne. llvm-svn: 307292
* Fix spelling in comments. NFCI.Simon Pilgrim2017-07-061-3/+3
| | | | llvm-svn: 307288
* [X86][SSE4A] Add support for shuffle combining to INSERTQI.Simon Pilgrim2017-07-061-0/+16
| | | | llvm-svn: 307268
* Doxygen formatting. NFCIJoel Jones2017-07-061-2/+12
| | | | llvm-svn: 307263
* [X86][SSE] combineX86ShuffleChain - merge duplicate creations of integer ↵Simon Pilgrim2017-07-061-20/+12
| | | | | | mask types llvm-svn: 307257
* [X86][SSE] combineX86ShuffleChain - merge duplicate 'Zeroable' element masksSimon Pilgrim2017-07-061-20/+12
| | | | llvm-svn: 307255
* [X86][SSE4A] Add support for shuffle combining to EXTRQ.Simon Pilgrim2017-07-061-1/+28
| | | | llvm-svn: 307254
* [X86][SSE4A] Split EXTRQ/INSERTQ shuffle matching from lowering. NFCI.Simon Pilgrim2017-07-061-99/+112
| | | | | | First step toward supporting shuffle combining to EXTRQ/INSERTQ. llvm-svn: 307250
* [ARM] GlobalISel: Map s32 G_FCMP in reg bank selectDiana Picus2017-07-061-0/+14
| | | | | | Map hard G_FCMP operands to FPR and the result to GPR. llvm-svn: 307245
* [ARM] GlobalISel: Legalize G_FCMP for s32Diana Picus2017-07-062-0/+162
| | | | | | | | | | | | | | | | | | | | | This covers both hard and soft float. Hard float is easy, since it's just Legal. Soft float is more involved, because there are several different ways to handle it based on the predicate: one and ueq need not only one, but two libcalls to get a result. Furthermore, we have large differences between the values returned by the AEABI and GNU functions. AEABI functions return a nice 1 or 0 representing true and respectively false. GNU functions generally return a value that needs to be compared against 0 (e.g. for ogt, the value returned by the libcall is > 0 for true). We could introduce redundant comparisons for AEABI as well, but they don't seem easy to remove afterwards, so we do different processing based on whether or not the result really needs to be compared against something (and just truncate if it doesn't). llvm-svn: 307243
* [ARM] GlobalISel: Widen s1, s8, s16 G_CONSTANTDiana Picus2017-07-061-0/+2
| | | | | | Get the legalizer to widen small constants. llvm-svn: 307239
* [WebAssembly] Fix types for address taken functionsSam Clegg2017-07-054-17/+28
| | | | | | Differential Revision: https://reviews.llvm.org/D34966 llvm-svn: 307198
* [AMDGPU] Move GISel accessor initialization from TargetMachine to Subtarget.Quentin Colombet2017-07-052-48/+50
| | | | | | NFC llvm-svn: 307186
* [Power9] Disable removing extra swaps on P9.Sean Fertile2017-07-051-1/+3
| | | | | | | | | | | | On power 8 we sometimes insert swaps to deal with the difference between Little-Endian and Big-Endian. The swap removal pass is supposed to clean up these swaps. On power 9 we don't need this pass since we do not need to insert the swaps in the first place. Commiting on behalf of Stefan Pintilie. Differential Revision: https://reviews.llvm.org/D34627 llvm-svn: 307185
* [PowerPC] Make sure that we remove dead PHI nodes after the PPCCTRLoops pass.Sean Fertile2017-07-051-1/+4
| | | | | | | Commiting on behalf of Stefan Pintilie. Differential Revision: https://reviews.llvm.org/D34829 llvm-svn: 307180
* [Power9] Exploit vector extract with variable index.Tony Jiang2017-07-051-0/+92
| | | | | | | | | | | | | | | | This patch adds the exploitation for new power 9 instructions which extract variable elements from vectors: VEXTUBLX VEXTUBRX VEXTUHLX VEXTUHRX VEXTUWLX VEXTUWRX Differential Revision: https://reviews.llvm.org/D34032 Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com) llvm-svn: 307174
* [Power9] Exploit vector integer extend instructions when indices aren't correct.Tony Jiang2017-07-054-26/+216
| | | | | | | | | | | | | | | This patch adds on to the exploitation added by https://reviews.llvm.org/D33510. This now catches build vector nodes where the inputs are coming from sign extended vector extract elements where the indices used by the vector extract are not correct. We can still use the new hardware instructions by adding a shuffle to move the elements to the correct indices. I introduced a new PPCISD node here because adding a vector_shuffle and changing the elements of the vector_extracts was getting undone by another DAG combine. Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com) Differential Revision: https://reviews.llvm.org/D34009 llvm-svn: 307169
* [SystemZ] Simplify handling of 128-bit multiply/divide instructionUlrich Weigand2017-07-057-106/+106
| | | | | | | | | | | Several integer multiply/divide instructions require use of a register pair as input and output. This patch moves setting up the input register pair from C++ code to TableGen, simplifying the whole process and making it more easily extensible. No functional change. llvm-svn: 307155
* [SystemZ] Small cleanups to SystemZScheduleZ13.tdUlrich Weigand2017-07-051-25/+36
| | | | | | | | | | Fixes a couple of whitespace errors, re-sorts the vector floating-point instructions to make them more easily extensible, and adds a missing pseudo instruction. No functional change. llvm-svn: 307154
* [GlobalISel] Refactor Legalizer helpers for libcallsDiana Picus2017-07-051-4/+9
| | | | | | | | | | We used to have a helper that replaced an instruction with a libcall. That turns out to be too aggressive, since sometimes we need to replace the instruction with at least two libcalls. Therefore, change our existing helper to only create the libcall and leave the instruction removal as a separate step. Also rename the helper accordingly. llvm-svn: 307149
* [AsmParser] Mnemonic Spell CorrectorSjoerd Meijer2017-07-051-2/+8
| | | | | | | | | | | | | | | | | | This implements suggesting other mnemonics when an invalid one is specified, for example: $ echo "adXd r1,r2,#3" | llvm-mc -triple arm <stdin>:1:1: error: invalid instruction, did you mean: add, qadd? adXd r1,r2,#3 ^ The implementation is target agnostic, but as a first step I have added it only to the ARM backend; so the ARM backend is a good example if someone wants to enable this too for another target. Differential Revision: https://reviews.llvm.org/D33128 llvm-svn: 307148
* [ARM] GlobalISel: Extract tiny helper. NFCDiana Picus2017-07-051-2/+5
| | | | | | Extract functionality for determining if the target uses AEABI. llvm-svn: 307145
* [GlobalISel][X86] For now don't handle not trivial function arguments lowering.Igor Breger2017-07-051-1/+11
| | | | llvm-svn: 307142
* [GlobalISel][X86] Allow graceful fallback for struct/array argument/return ↵Igor Breger2017-07-052-11/+26
| | | | | | value lowering. Going to support it in follow patch. llvm-svn: 307125
* [PowerPC] Fix for PR33636Nemanja Ivanovic2017-07-051-2/+4
| | | | | | | | Remove casts to a constant when a node can be an undef. Differential Revision: https://reviews.llvm.org/D34808 llvm-svn: 307120
* Revert "[AVR] Add the branch selection pass from the GitHub repository"Dylan McKay2017-07-053-269/+0
| | | | | | This reverts commit 602ef067c1d58ecb425d061f35f2bc4c7e92f4f3. llvm-svn: 307111
* [AVR] Add the branch selection pass from the GitHub repositoryDylan McKay2017-07-053-0/+269
| | | | | | | We should rewrite this using the generic branch relaxation pass, but for the moment having this pass is better than hitting an assertion error. llvm-svn: 307109
* [X86][SSE4A] Add support for combining from non-v16i8 EXTRQI/INSERTQI shufflesSimon Pilgrim2017-07-041-3/+3
| | | | | | With the improved shuffle decoding we can now combine EXTRQI/INSERTQI shuffles from non-v16i8 vector types llvm-svn: 307099
* Fix signed/unsigned comparison warningsSimon Pilgrim2017-07-041-4/+4
| | | | llvm-svn: 307098
* [AMDGPU] Switch scalarize global loads ON by defaultAlexander Timofeev2017-07-041-1/+1
| | | | | | Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097
* [X86][SSE4A] Generalized EXTRQI/INSERTQI shuffle decodesSimon Pilgrim2017-07-044-31/+41
| | | | | | The existing decodes only worked for v16i8 vectors, this adds support for any 128-bit vector llvm-svn: 307095
* fix trivial typos in comments; NFCHiroshi Inoue2017-07-042-2/+2
| | | | llvm-svn: 307094
* [AMDGPU] Fix latency of MIMG instructionsMarek Olsak2017-07-041-0/+1
| | | | | | Patch by cwabbott (Connor Abbott). llvm-svn: 307081
* [globalisel][tablegen] Partially fix compile-time regressions by converting ↵Daniel Sanders2017-07-043-0/+6
| | | | | | | | | | | | | | | | | | | | | | matcher to state-machine(s) Summary: Replace the matcher if-statements for each rule with a state-machine. This significantly reduces compile time, memory allocations, and cumulative memory allocation when compiling AArch64InstructionSelector.cpp.o after r303259 is recommitted. The following patches will expand on this further to fully fix the regressions. Reviewers: rovka, ab, t.p.northover, qcolombet, aditya_nandakumar Reviewed By: ab Subscribers: vitalybuka, aemerson, javed.absar, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33758 llvm-svn: 307079
* [X86] Add comment string for broadcast loads from the constant pool.Craig Topper2017-07-041-37/+156
| | | | | | | | | | | | | | | | | Summary: When broadcasting from the constant pool its useful to print out the final vector similar to what we do for normal moves from the constant pool. I changed only a couple tests that were broadcast focused. One of them had been previously hand tweaked after running the script so that it could check the constant pool declaration. But I think this patch makes that unnecessary now since we can check the comment instead. Reviewers: spatel, RKSimon, zvi Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34923 llvm-svn: 307062
* [X86] Add RDRAND feature to GLM CPUCraig Topper2017-07-041-0/+1
| | | | | | | | | | | | | | Summary: I believe this should be supported on GLM since RDSEED is. Reviewers: m_zuckerman, zvi, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34828 llvm-svn: 307060
* [AVR] Fix bug which caused assertion errors for some FRMIDX instructionsDylan McKay2017-07-041-3/+8
| | | | | | | | | | | | | Previously, if a basic block ended with a FRMIDX instruction, we would end up doing something like this. *std::next(MBB.end()) Which would hit an error: "Assertion `!NodePtr->isKnownSentinel()' failed." llvm-svn: 307057
* [AVR] Add a missing clobber declaration to LPMWDylan McKay2017-07-041-6/+6
| | | | llvm-svn: 307056
* Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"NAKAMURA Takumi2017-07-041-1/+1
| | | | | | | | | It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054
OpenPOWER on IntegriCloud