path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [ARM] Promote small global constants to constant pools | James Molloy | 2016-09-23 | 8 | -6/+275
  If a constant is unnamed_addr and is only used within one function, we can save on the code size and runtime cost of an indirection by changing the global's storage to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

  We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

  This can cause significant code size savings when many small strings are used in one function (4 bytes per string).
  This recommit contains fixes for a nasty bug related to fast-isel fallback - because fast-isel doesn't know about this optimization, if it runs and emits references to a string that we inline (because fast-isel fell back to SDAG) we will end up with an inlined string and also an out-of-line string, and we won't emit the out-of-line string, causing backend failures.
  It also contains fixes for emitting .text relocations which made the sanitizer bots unhappy.
  llvm-svn: 282241
* [AMDGPU] Refactor VOP1 and VOP2 instruction TD definitions | Valery Pykhtin | 2016-09-23 | 11 | -1691/+1379
  Differential revision: https://reviews.llvm.org/D24738
  llvm-svn: 282234
* [AVX-512] Split X86ISD::VFPROUND and X86ISD::VFPEXT into separate opcodes for each type constraint. | Craig Topper | 2016-09-23 | 5 | -29/+24
  This revealed that scalar intrinsics could create nodes with a rounding mode of FROUND_CUR_DIRECTION, but the patterns didn't check for it. It just worked because isel doesn't check operand count and we had a pattern without the rounding mode argument at all.
  llvm-svn: 282231
* [AVX-512] Add separate ISD opcodes for each form of CVT instructions. | Craig Topper | 2016-09-23 | 5 | -100/+104
  Don't reuse non-X86 ISD opcodes with extra X86 specific arguments.
  llvm-svn: 282230
* [AVX-512] Use different ISD opcodes for some of the scalar intrinsic lowering. | Craig Topper | 2016-09-23 | 4 | -27/+34
  Isel is not very robust against using the same ISD opcode with a different number of operands, so it's better to separate them.
  llvm-svn: 282229
* AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size | Tom Stellard | 2016-09-23 | 3 | -1/+25
  Reviewers: arsenm
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
  Differential Revision: https://reviews.llvm.org/D24835
  llvm-svn: 282223
* [AArch64][RegisterBankInfo] Sanity check TableGen'ed like inputs. | Quentin Colombet | 2016-09-23 | 1 | -0/+47
  Make sure the entries written to mimic the behavior of TableGen are sane.
  llvm-svn: 282220
* [AArch64][RegisterBankInfo] Switch to TableGen'ed like PartialMapping. | Quentin Colombet | 2016-09-23 | 2 | -20/+70
  Statically instantiate the most common PartialMappings. This should be closer to what the code would look like when TableGen support is added for GlobalISel. As a side effect, this should improve compile time.
  llvm-svn: 282215
* [RegisterBankInfo] Use array instead of SmallVector for BreakDown. | Quentin Colombet | 2016-09-23 | 1 | -5/+12
  This is another step toward TableGen'ed like structures. The BreakDown of the mapping of the value will be statically computed by TableGen, thus we only have to point to the right entry in the table instead of dynamically allocating the mapping for each instruction.
  We still support the dynamic allocation through a factory of PartialMapping to ease the bring-up of the targets while the TableGen backend is not available.
  llvm-svn: 282213
* [RDF] Add initial support for lane masks in the DFG | Krzysztof Parzyszek | 2016-09-22 | 5 | -62/+163
  Use lane masks for calculating covering and aliasing of register references.
  llvm-svn: 282194
* [Hexagon] Remove USR_OVF from CtrRegs register class | Krzysztof Parzyszek | 2016-09-22 | 1 | -1/+4
  USR_OVF is a subregister of USR, which is a member of CtrRegs. Having both a register and its proper subregister in the same register class has bad consequences for lane mask calculation: based solely on the lane mask info, USR_OVF would not appear to be a subregister of USR.
  llvm-svn: 282192
* [RDF] Print the function name for calls in dumps | Krzysztof Parzyszek | 2016-09-22 | 1 | -3/+18
  llvm-svn: 282191
* [RDF] Use uint32_t for register numbers instead of unsigned | Krzysztof Parzyszek | 2016-09-22 | 2 | -8/+8
  llvm-svn: 282190
* i386 does not support optimized swifterror handling | Arnold Schwaighofer | 2016-09-22 | 2 | -3/+5
  rdar://28432565
  llvm-svn: 282186
* Win64: Don't emit unwind info for "leaf" functions (PR30337) | Hans Wennborg | 2016-09-22 | 2 | -3/+21
  According to MSDN (see the PR), functions which don't touch any callee-saved registers (including %rsp) don't need any unwind info.
  This patch makes LLVM not emit unwind info for such functions, to save binary size.
  Differential Revision: https://reviews.llvm.org/D24748
  llvm-svn: 282185
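The rule described in this commit can be sketched as a simple predicate. This is only an illustration, not the code in the patch: the function name `needs_unwind_info` and the idea of passing in a list of modified register names are assumptions made for the example. The register set is the non-volatile set of the Microsoft x64 calling convention.

```python
# Non-volatile (callee-saved) registers in the Microsoft x64 calling convention.
CALLEE_SAVED = (
    {"rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15"}
    | {f"xmm{i}" for i in range(6, 16)}
)


def needs_unwind_info(modified_registers):
    """Hypothetical helper: True if any callee-saved register is modified.

    Register names may be written with or without a leading '%'.
    """
    return any(
        reg.lower().lstrip("%") in CALLEE_SAVED for reg in modified_registers
    )


# A function that only touches volatile registers qualifies as a "leaf".
print(needs_unwind_info(["rax", "rcx", "xmm0"]))
# Adjusting the stack pointer disqualifies it.
print(needs_unwind_info(["rax", "%rsp"]))
```

Note that %rsp is part of the set, so any function that allocates stack space or makes a call still needs unwind info under this rule.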
* [PowerPC] Sign extend sub-word values for atomic comparisons | Nemanja Ivanovic | 2016-09-22 | 1 | -2/+11
  Atomic comparison instructions use the sub-word load instruction on Power8 and up but the value is not sign extended prior to the signed word compare instruction. This patch adds that sign extension.
  llvm-svn: 282182
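Why the missing sign extension matters can be shown with plain integer arithmetic. The sketch below is a toy model, assuming a byte load that zero-extends into a word followed by a signed 32-bit comparison; the helper names are made up for the example and do not correspond to anything in the patch.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value & mask) ^ sign) - sign


def signed_word_equal(loaded_byte, expected, extend_first):
    """Toy model: compare a zero-extended byte load with a signed word.

    With extend_first=False the byte is compared as loaded (zero-extended).
    With extend_first=True it is sign extended from 8 bits beforehand.
    """
    loaded = loaded_byte & 0xFF
    if extend_first:
        loaded = sign_extend(loaded, 8)
    return sign_extend(loaded, 32) == sign_extend(expected, 32)


# The byte 0xFF holds the signed char value -1.
print(signed_word_equal(0xFF, -1, extend_first=False))  # False
print(signed_word_equal(0xFF, -1, extend_first=True))   # True
```

Values with the top bit clear compare the same either way, which is why such a bug only shows up for negative sub-word values.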
* [PPC] Set SP after loading data from stack frame, if no red zone is present | Krzysztof Parzyszek | 2016-09-22 | 1 | -50/+195
  Follow-up to r280705: Make sure that the SP is only restored after all data is loaded from the stack frame, if there is no red zone.
  This completes the fix for https://llvm.org/bugs/show_bug.cgi?id=26519.
  Differential Revision: https://reviews.llvm.org/D24466
  llvm-svn: 282174
* GlobalISel: handle stack-based parameters on AArch64. | Tim Northover | 2016-09-22 | 2 | -73/+199
  llvm-svn: 282153
* [AMDGPU][mc] Add support for absolute expressions in DPP modifiers. | Artem Tamazov | 2016-09-22 | 1 | -35/+22
  Also added range checking for DPP attributes. Assembler tests added as well.
  Differential Revision: https://reviews.llvm.org/D24755
  llvm-svn: 282145
* [PowerPC] Remove LE patterns matching generic stores/loads to VSX permuting ops | Nemanja Ivanovic | 2016-09-22 | 1 | -5/+10
  This patch corresponds to: https://reviews.llvm.org/D21409
  The LXVD2X, LXVW4X, STXVD2X and STXVW4X instructions permute the two doublewords in the vector register when in little-endian mode. Custom code ensures that the necessary swaps are inserted for these. This patch simply removes the possibility that a load/store node will match one of these instructions in the SDAG, as that would not insert the necessary swaps.
  llvm-svn: 282144
* [Power9] Add exploitation of non-permuting memory ops | Nemanja Ivanovic | 2016-09-22 | 5 | -21/+68
  This patch corresponds to review: https://reviews.llvm.org/D19825
  The new lxvx/stxvx instructions do not require the swaps to line the elements up correctly. In order to select them over the lxvd2x/lxvw4x instructions which require swaps, the patterns for the old instruction have a predicate that ensures they won't be selected on Power9 and newer CPUs.
  llvm-svn: 282143
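The difference between the permuting and non-permuting loads can be pictured with a toy model in which a 128-bit vector is a pair of doublewords. All three function names below are hypothetical and only model the behaviour described in the commit messages (little-endian mode, doubleword granularity); they are not LLVM or ISA definitions.

```python
def load_permuting(memory):
    """Toy model of a permuting load in little-endian mode.

    The two doublewords arrive in the register in swapped order.
    """
    d0, d1 = memory
    return (d1, d0)


def swap_doublewords(register):
    """Toy model of the extra swap inserted after a permuting load."""
    first, second = register
    return (second, first)


def load_non_permuting(memory):
    """Toy model of the newer load: elements arrive already lined up."""
    d0, d1 = memory
    return (d0, d1)


memory = (0x1111111111111111, 0x2222222222222222)
# Old sequence: permuting load followed by a swap.
old_sequence = swap_doublewords(load_permuting(memory))
# New sequence: a single non-permuting load.
new_sequence = load_non_permuting(memory)
print(old_sequence == new_sequence)  # True
```

In this model the newer load saves one instruction per memory access, which is the motivation for preferring it when the CPU supports it.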
* [AVX-512] Add support for commuting VPTERNLOG instructions. | Craig Topper | 2016-09-22 | 4 | -39/+171
  VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors the bit from each source is concatenated together and the resulting 3-bit value is used to select a bit in the immediate. This bit value is written to the result vector.
  We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands' bits have opposite values and the uncommuted operand has the same value. Bits 0 and 7 will never be swapped since the relevant bits from all sources are the same value.
  This refactors and reuses parts of the FMA3 commuting code, which also handles a three-operand instruction.
  llvm-svn: 282132
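The immediate adjustment described here can be checked by brute force. The sketch below is not the code from the patch: it rebuilds the immediate from the truth table instead of swapping bit pairs, and it assumes the first source supplies the most significant bit of the 3-bit index. The names `ternlog_bit` and `commute_imm` are made up for the example.

```python
from itertools import product


def ternlog_bit(imm, a, b, c):
    """Result bit for source bits a, b, c: the value (a, b, c) indexes imm."""
    return (imm >> ((a << 2) | (b << 1) | c)) & 1


def commute_imm(imm, perm):
    """Immediate to use after reordering the three sources.

    perm[i] names the original operand that now sits in position i,
    e.g. perm=(0, 2, 1) swaps the second and third sources.
    """
    new_imm = 0
    for new_bits in product((0, 1), repeat=3):
        # Recover what the original operand order would have seen.
        old_bits = [0, 0, 0]
        for position, source in enumerate(perm):
            old_bits[source] = new_bits[position]
        bit = ternlog_bit(imm, *old_bits)
        new_index = (new_bits[0] << 2) | (new_bits[1] << 1) | new_bits[2]
        new_imm |= bit << new_index
    return new_imm


# 0xCA encodes "a ? b : c". Swapping the last two sources gives "a ? c : b".
print(hex(commute_imm(0xCA, (0, 2, 1))))  # 0xac
```

For the swap of the last two sources, 0xCA becomes 0xAC: bits 1 and 2 trade places, as do bits 5 and 6, while bits 0 and 7 stay put, matching the description in the commit message.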
* [RegisterBankInfo] Move to statically allocated RegisterBank. | Quentin Colombet | 2016-09-22 | 3 | -1/+51
  This commit is basically the first step toward what RegisterBankInfo will look like when it gets TableGen'ed.
  It introduces a XXXGenRegisterBankInfo.def file that is what TableGen will issue at some point. Moreover, the RegBanks field in RegisterBankInfo changed to reflect the static (compile time) aspect of the information.
  llvm-svn: 282131
* [AMDGPU][mc] Add support for ds_add_[rtn_]f32. | Artem Tamazov | 2016-09-21 | 1 | -0/+5
  Lit tests added. Resolves https://github.com/RadeonOpenCompute/hcc/issues/122.
  Differential Revision: https://reviews.llvm.org/D24765
  llvm-svn: 282086
* Revert r281715, it caused PR30475 | Nico Weber | 2016-09-21 | 5 | -214/+3
  llvm-svn: 282076
* GlobalISel: produce correct code for signext/zeroext ABI flags. | Tim Northover | 2016-09-21 | 3 | -92/+111
  We still don't really have an equivalent of "AssertXExt" in DAG, so we don't exploit the guarantees on the receiving side yet, but this should produce conservatively correct code on iOS ABIs.
  llvm-svn: 282069
* GlobalISel: pass Function to lowerFormalArguments directly (NFC). | Tim Northover | 2016-09-21 | 4 | -13/+10
  The only implementation that exists immediately looks it up anyway, and the information is needed to handle various parameter attributes (stored on the function itself).
  llvm-svn: 282068
* [AMDGPU] Assembler: remove unused AMDGPUMCObjectWriter. | Sam Kolton | 2016-09-21 | 1 | -25/+0
  Summary: It is replaced by AMDGPUELFObjectWriter
  Reviewers: tstellarAMD, vpykhtin, artem.tamazov
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl
  Differential Revision: https://reviews.llvm.org/D24654
  llvm-svn: 282065
* [mips] LLVM PR/30197 - Tail call incorrectly clobbers arguments for mips | Simon Dardis | 2016-09-21 | 1 | -1/+3
  The postRA scheduler performs alias analysis to determine if stores and loads can be moved past each other. When a function has more arguments than argument registers for the calling convention used, excess arguments are spilled onto the stack. LLVM by default assumes that argument slots are immutable, unless the function contains a tail call. Without the knowledge that a function contains a tail call site, stores and loads to fixed stack slots may be re-ordered, causing the out-going arguments to clobber the incoming arguments before the incoming arguments are supposed to be dead.
  Reviewers: vkalintiris
  Differential Review: https://reviews.llvm.org/D24077
  llvm-svn: 282063
* Revert "AArch64: Set shift bit of TLSLE HI12 add instruction" | Diana Picus | 2016-09-21 | 1 | -6/+0
  This reverts commit r282057 because it broke the buildbots - see e.g. http://lab.llvm.org:8011/builders/clang-cmake-aarch64-42vma/builds/12063
  llvm-svn: 282058
* AArch64: Set shift bit of TLSLE HI12 add instruction | Lei Liu | 2016-09-21 | 1 | -0/+6
  Summary: The AArch64 LLVM assembler emits the add instruction without the shift bit when calculating the higher 12 bits of the address of TLS variables in the local exec model. This generates a wrong code sequence for accessing TLS variables with a thread offset larger than 0x1000.
  Reviewers: t.p.northover, peter.smith, rovka
  Subscribers: salim.nasser, aemerson, llvm-commits, rengolin
  Differential Revision: https://reviews.llvm.org/D24702
  llvm-svn: 282057
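The effect of the missing shift bit can be reproduced with integer arithmetic. The following is a rough model under stated assumptions: the thread pointer value and offsets are invented, `add_imm12` models an AArch64 add with a 12-bit immediate and an optional left shift by 12, and `tls_address` models the two-instruction local exec sequence (high 12 bits, then low 12 bits). Neither function exists in LLVM.

```python
def add_imm12(base, imm12, shift12=False):
    """Model of an add with a 12-bit immediate, optionally shifted left by 12."""
    assert 0 <= imm12 < (1 << 12)
    return base + ((imm12 << 12) if shift12 else imm12)


def tls_address(thread_pointer, offset, set_shift_bit=True):
    """Model of the local exec sequence: add the high part, then the low part."""
    hi12 = (offset >> 12) & 0xFFF
    lo12 = offset & 0xFFF
    address = add_imm12(thread_pointer, hi12, shift12=set_shift_bit)
    return add_imm12(address, lo12)


tp = 0x10000  # made-up thread pointer value
# Small offsets have a zero high part, so the missing shift goes unnoticed.
print(tls_address(tp, 0x234, set_shift_bit=False) == tp + 0x234)    # True
# Larger offsets expose the bug: the high part is added unshifted.
print(tls_address(tp, 0x1234, set_shift_bit=False) == tp + 0x1234)  # False
print(tls_address(tp, 0x1234, set_shift_bit=True) == tp + 0x1234)   # True
```

This matches the commit's observation that only offsets of 0x1000 and above are affected, since below that the high 12 bits are zero.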
* [AVX-512] Split the 3 different usages of the X86ISD::FSETCC opcode into 3 different opcodes. | Craig Topper | 2016-09-21 | 4 | -12/+19
  It turns out isel is really not robust against having different type profiles for the same opcode.
  It turns out that if you put an illegal rounding mode (i.e. not CUR_DIRECTION or NO_EXC) on a comiss intrinsic we would generate the FSETCC form with the rounding mode added, but then pattern match to an instruction with ROUND_CUR_DIRECTION.
  We can probably get away with just one FSETCCM opcode that always contains the rounding mode and explicitly put ROUND_CUR_DIRECTION in the pattern, but I'll leave that for future work.
  With this change the clang tests for the comiss intrinsics that used an incorrect rounding mode of 3 properly fail isel instead of silently doing the wrong thing. Those clang tests will be fixed in a follow up commit and I also plan to add rounding mode checking to clang.
  llvm-svn: 282055
* [AVX-512] Don't add an additional rounding mode operand to the avx512 vcvtps2ph intrinsic lowering. | Craig Topper | 2016-09-21 | 3 | -14/+11
  There was no way to control its value so it was always FROUND_CURRENT making it unnecessary. The true rounding mode is encoded in the immediate operand of the instruction.
  This also removes the pattern from the rb form of the instructions since there is no way to specify the FROUND_NO_EXC rounding mode it required.
  llvm-svn: 282052
* [AVX-512] Simplify handling of INTR_TYPE_1OP_MASK_RM to remove support for the second opcode since it's never used. | Craig Topper | 2016-09-21 | 1 | -7/+1
  This makes it consistent with INTR_TYPE_2OP_MASK_RM and INTR_TYPE_3OP_MASK_RM. And even if it were used, we were passing the same operands to both, so it wouldn't make sense to have two opcodes.
  llvm-svn: 282051
* [AVX-512] Don't lower avx512 vcvtps2ph/vcvtph2ps nodes to ISD::FP16_TO_FP/ISD::FP_TO_FP16 with an extra x86 specific rounding mode operand. | Craig Topper | 2016-09-21 | 4 | -10/+15
  We should use a target specific ISD opcode.
  llvm-svn: 282046
* [NVPTX] Check if callsite is defined when computing argument alignment | Jacques Pienaar | 2016-09-21 | 2 | -13/+20
  Summary: In getArgumentAlignment, check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0, fall back to the ABI type alignment; otherwise compute the alignment as before.
  Reviewers: eliben, jpienaar
  Subscribers: jlebar, vchuravy, cfe-commits, jholewinski
  Differential Revision: https://reviews.llvm.org/D9168
  llvm-svn: 282045
* Remove the default subtarget from the x86 port as it isn't necessary (or correct) anymore. | Eric Christopher | 2016-09-20 | 2 | -4/+1
  llvm-svn: 282031
* Revert "Remove extra argument used once on | Eric Christopher | 2016-09-20 | 1 | -2/+10
  TargetMachine::getNameWithPrefix and inline the result into the singular caller." and "Remove more guts of TargetMachine::getNameWithPrefix and migrate one check to the TLOF mach-o version." temporarily until I can get the whole call migrated out of the TargetMachine as we could hit places where TLOF isn't valid.
  This reverts commits r281981 and r281983.
  llvm-svn: 282028
* Revert part of "AArch64: Do not test for CPUs, use SubtargetFeatures" | Evandro Menezes | 2016-09-20 | 2 | -6/+0
  This reverts part of commit 119e358d9635c8d1f3e7aee67e3ea3b8a62f8db6 by removing FeatureUseRSqrt et al per request by Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).
  llvm-svn: 282001
* Revert "[AArch64] Use the reciprocal estimation machinery" | Evandro Menezes | 2016-09-20 | 5 | -101/+3
  This reverts commit b7d42b0048f65346e9fa37fb65defeea7ce8c337 per request by Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).
  llvm-svn: 282000
* Revert "[AArch64] Properly validate the reciprocal estimation." | Evandro Menezes | 2016-09-20 | 1 | -6/+0
  This reverts commit ad8ca1528242e2a4cb363e3779309e70eb7a430e per request by Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).
  llvm-svn: 281999
* X86: loosen an overly aggressive MachO assertion | Saleem Abdulrasool | 2016-09-20 | 1 | -2/+6
  We would assert that the FP setup CFI used esp/rsp always. This held up in practice when the code was generated from IR. However, with the integrated assembler, it is possible to have the input be user specified assembly. In such a case, we cannot assume that the function implementation has a compact unwind representation. Loosen the assertion into a check and bail if we cannot represent the frame pointer in the compact unwinding.
  Addresses PR30453!
  llvm-svn: 281986
* Remove more guts of TargetMachine::getNameWithPrefix and migrate one check to the TLOF mach-o version. | Eric Christopher | 2016-09-20 | 1 | -8/+1
  NFC intended.
  llvm-svn: 281983
* Remove a use of subtarget initialization in the X86 backend so we can get rid of the default subtarget. | Eric Christopher | 2016-09-20 | 1 | -1/+4
  NFC intended.
  llvm-svn: 281982
* Remove extra argument used once on TargetMachine::getNameWithPrefix and inline the result into the singular caller. | Eric Christopher | 2016-09-20 | 1 | -3/+2
  llvm-svn: 281981
* GlobalISel: split aggregates for PCS lowering | Tim Northover | 2016-09-20 | 2 | -41/+136
  This should match the existing behaviour for passing complicated struct and array types; in particular, HFAs come through like that from Clang.
  For C & C++ we still need to somehow support all the weird ABI flags, or at least those that are present in the IR (signext, byval, ...), and stack-based parameter passing.
  llvm-svn: 281977
* AVX-512: Fixed a bug in lowering saturated operations on KNL. | Elena Demikhovsky | 2016-09-20 | 1 | -2/+8
  The generated code is still not optimal.
  Differential Revision: https://reviews.llvm.org/D24723
  llvm-svn: 281966
* [AMDGPU] Refactor VOP3 instruction TD definitions | Valery Pykhtin | 2016-09-20 | 6 | -373/+448
  Differential revision: https://reviews.llvm.org/D24664
  llvm-svn: 281965
* [AVX-512] Teach X86InstrInfo::copyPhysReg to use a 512-bit move if XMM16-XMM31 or YMM16-YMM31 are the source or dest of the copy and VLX is not supported. | Craig Topper | 2016-09-20 | 3 | -5/+38
  This can happen with SUBREG_TO_REG of ZMM16-ZMM31.
  Fixes PR30430.
  llvm-svn: 281959
* [AVX-512] Use 512-bit vcvtps2ph/vcvtph2ps to implement fp_to_f16/f16_to_fp when F16C and VLX are not supported. | Craig Topper | 2016-09-20 | 3 | -2/+32
  Fixes PR23941.
  llvm-svn: 281958