path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* bpf: check illegal usage of XADD insn return value | Yonghong Song | 2018-09-20 | 4 | -0/+100
  Currently, BPF has XADD (locked add) insn support and the asm looks like:

    lock *(u32 *)(r1 + 0) += r2
    lock *(u64 *)(r1 + 0) += r2

  The instruction itself does not have a return value.

  At the source code level, users often use __sync_fetch_and_add(), which eventually translates to XADD. The return value of __sync_fetch_and_add() is supposed to be the old value in the xadd memory location. Since the BPF::XADD insn does not support such a return value, this patch adds a PreEmit phase to check for such usage. If such an illegal usage pattern is detected, a fatal error is reported, like

    line 4: Invalid usage of the XADD return value

  if compiled with -g, or

    Invalid usage of the XADD return value

  if compiled without -g.

  Signed-off-by: Yonghong Song <yhs@fb.com>
  Acked-by: Alexei Starovoitov <ast@kernel.org>
  llvm-svn: 342692
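The difference between the accepted and the rejected usage can be sketched on a host compiler. This is only an illustration: the function names `bumpCounter` and `bumpAndReadOld` are hypothetical, and the sketch assumes the GCC/Clang `__sync_fetch_and_add` builtin. Both functions compile and run on a host target; per the commit message, only the second pattern would be expected to trigger the fatal error when targeting BPF.

```cpp
#include <cassert>
#include <cstdint>

// Pattern that maps onto XADD: the result of __sync_fetch_and_add() is
// discarded, so no "old value" has to be produced by the instruction.
void bumpCounter(uint64_t *Counter, uint64_t Delta) {
  __sync_fetch_and_add(Counter, Delta);
}

// Pattern the commit describes as illegal for BPF: the old value stored at
// the memory location is consumed by the caller.
uint64_t bumpAndReadOld(uint64_t *Counter, uint64_t Delta) {
  return __sync_fetch_and_add(Counter, Delta);
}
```

On a host target both forms behave identically apart from the returned value, which is why the misuse is easy to write by accident.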
* [WebAssembly] Add V128 value type to binary format | Thomas Lively | 2018-09-20 | 5 | -23/+15
  Summary: Adds the necessary support to lib/ObjectYAML and fixes SIMD calls to allow the tests to work. Also removes some dead code that would otherwise have to have been updated.

  Reviewers: aheejin, dschuff, sbc100
  Subscribers: jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52105
  llvm-svn: 342689
* [SelectionDAG] replace duplicated peekThroughBitcast helper functions; NFCI | Sanjay Patel | 2018-09-20 | 1 | -13/+0
  x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant.

  Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code. No functional change intended.

  I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the helpers in the ISD namespace (e.g., ISD::isConstantSplatVector()). But if there's no good reason for those to be there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps.

  Differential Revision: https://reviews.llvm.org/D52285
  llvm-svn: 342669
* [X86][SSE] Remove UNPCKL(SHUFFLE)->UNPCKH custom combine | Simon Pilgrim | 2018-09-20 | 1 | -34/+0
  This can be achieved more generally by combineX86ShufflesRecursively.

  llvm-svn: 342645
* [X86][SSE] Remove PSHUFLW/PSHUFHW combineRedundantHalfShuffle combine | Simon Pilgrim | 2018-09-20 | 1 | -71/+0
  This can be achieved more generally by combineX86ShufflesRecursively and was causing a fuzz test failure found by Mikael Holmén.

  llvm-svn: 342642
* [RISCV][MC] Modify evaluateConstantImm interface to allow reuse from addExpr | Alex Bradbury | 2018-09-20 | 1 | -35/+34
  This is a trivial refactoring that I'm committing now as it makes a patch I'm about to post for review easier to follow. There is some overlap between evaluateConstantImm and addExpr in RISCVAsmParser. This patch allows evaluateConstantImm to be reused from addExpr to remove this overlap. The benefit will be greater when a future patch adds extra code to allow immediates to be evaluated from constant symbols (e.g. `.equ CONST, 0x1234`).

  No functional change intended.

  llvm-svn: 342641
* [RISCV][MC] Improve parsing of jal/j operands | Alex Bradbury | 2018-09-20 | 2 | -9/+31
  Examples such as `jal a3`, `j a3` and `jal a3, a3` are accepted by gas but rejected by LLVM MC. This patch rectifies this. I introduce RISCVAsmParser::parseJALOffset to ensure that symbol names that coincide with register names can safely be parsed. This is made somewhat fiddly by the single-operand alias form (see the comment in parseJALOffset for more info).

  Differential Revision: https://reviews.llvm.org/D52029
  llvm-svn: 342629
* Fix for bug 34002 - label generated before it block is finalized. | Maya Madhavan | 2018-09-20 | 1 | -1/+6
  Differential Revision: https://reviews.llvm.org/D52258
  llvm-svn: 342615
* [PowerPC] Fix the assert of combineBVOfConsecutiveLoads when element num is 1 | QingShan Zhang | 2018-09-20 | 1 | -1/+2
  Building a vector out of multiple loads can be converted to a load of the vector type if the loads are consecutive. The special case is a vector whose element count is 1, such as <1 x i128>. Exit early in that case to fix the assert.

  Patch By: wuzish (Zixuan Wu)
  Differential Revision: https://reviews.llvm.org/D52072
  llvm-svn: 342611
* [WebAssembly] Renumber SIMD ops | Thomas Lively | 2018-09-20 | 1 | -35/+35
  Summary: This change leaves holes in the opcode space where missing instructions could logically be added later if they were found to be useful.

  Reviewers: aheejin, dschuff
  Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52282
  llvm-svn: 342610
* AArch64: Add FuseCryptoEOR fusion rules | Matthias Braun | 2018-09-19 | 3 | -0/+27
  There are some additional rules available on newer Apple CPUs.

  rdar://41235346
  llvm-svn: 342590
* [ARM] Adjust the feature set for Exynos | Evandro Menezes | 2018-09-19 | 1 | -0/+6
  Fine tune the cost model for all Exynos processors.

  llvm-svn: 342585
* [ARM] Refactor Exynos feature set (NFC) | Evandro Menezes | 2018-09-19 | 3 | -71/+23
  Since all Exynos processors share the same feature set, fold them into the implied features list for the subtarget.

  llvm-svn: 342583
* [X86] Handle COPYs of physregs better (regalloc hints) | Simon Pilgrim | 2018-09-19 | 1 | -0/+2
  Enable enableMultipleCopyHints() on X86.

  Original Patch by @jonpa:

  While enabling the mischeduler for SystemZ, it was discovered that for some reason a test needed one extra seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). Handling that resulted in this patch, which improves register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 fewer register moves on SPEC, as well as marginally less spilling.

  Instead of improving just the SystemZ backend, the improvement has been implemented in common code (calculateSpillWeightAndHint()). This gives a lot of test failures, but since this should be a general improvement I hope that the involved targets will help and review the test updates.

  Differential Revision: https://reviews.llvm.org/D38128
  llvm-svn: 342578
* [x86] change names of vector splitting helper functions; NFC | Sanjay Patel | 2018-09-19 | 1 | -16/+15
  As the code comments suggest, these are about splitting, and they are not necessarily limited to lowering, so that misled me.

  There's nothing that's actually x86-specific in these either, so they might be better placed in a common header so any target can use them.

  llvm-svn: 342575
* [mips][microMIPS] Extending size reduction pass with MOVEP | Simon Atanasyan | 2018-09-19 | 2 | -11/+109
  The patch extends the size reduction pass for MicroMIPS. Two MOVE instructions are transformed into one MOVEP instruction.

  Patch by Milena Vujosevic Janicic.
  Differential revision: https://reviews.llvm.org/D52037
  llvm-svn: 342572
* [mips][microMIPS] Fix the definition of MOVEP instruction | Simon Atanasyan | 2018-09-19 | 7 | -134/+117
  The patch fixes the definition of the MOVEP instruction. Two registers are used instead of register pairs. This is necessary as the machine verifier cannot handle register pairs.

  Patch by Milena Vujosevic Janicic.
  Differential revision: https://reviews.llvm.org/D52035
  llvm-svn: 342571
* [X86] Add initial SimplifyDemandedVectorEltsForTargetNode support | Simon Pilgrim | 2018-09-19 | 2 | -0/+100
  This patch adds an initial x86 SimplifyDemandedVectorEltsForTargetNode implementation to handle target shuffles.

  Currently the patch only decodes a target shuffle, calls SimplifyDemandedVectorElts on its input operands and removes any shuffle that reduces to undef/zero/identity.

  Future work will need to integrate this with combineX86ShufflesRecursively, add support for other x86 ops, etc.

  NOTE: There is a minor regression that appears to be affecting further (extractelement?) combines which I haven't been able to solve yet - possibly something to do with how nodes are added to the worklist after simplification.

  Differential Revision: https://reviews.llvm.org/D52140
  llvm-svn: 342564
* [AMDGPU] Add instruction selection for i1 to f16 conversion | Carl Ritson | 2018-09-19 | 1 | -0/+10
  Summary: This is required for GPUs with 16 bit instructions where f16 is a legal register type and hence int_to_fp i1 to f16 is not lowered by legalizing.

  Reviewers: arsenm, nhaehnle
  Reviewed By: nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52018
  Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851
  llvm-svn: 342558
* [bpf] Symbol sizes and types in object file | Yonghong Song | 2018-09-19 | 1 | -2/+2
  Clang-compiled object files currently don't include the symbol sizes and types. Some tools however need that information. For example, ctfconvert uses that information to generate FreeBSD's CTF representation from ELF files. With this patch, symbol sizes and types are included in object files.

  Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
  Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com>
  llvm-svn: 342556
* [TableGen][SubtargetEmitter] Add the ability for processor models to describe dependency breaking instructions. | Andrea Di Biagio | 2018-09-19 | 2 | -73/+67
  This patch adds the ability for processor models to describe dependency breaking instructions.

  Different processors may specify a different set of dependency-breaking instructions. That means we cannot assume that all processors of the same target would use the same rules to classify dependency breaking instructions.

  The main goal of this patch is to provide the means to describe dependency breaking instructions directly via tablegen, and have the following TargetSubtargetInfo hooks redefined in overrides by tablegen'd XXXGenSubtargetInfo classes (here, XXX is a Target name).

  ```
  virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
    return false;
  }

  virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const {
    return isZeroIdiom(MI, Mask);
  }
  ```

  An instruction MI is a dependency-breaking instruction if a call to method isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to true. Similarly, an instruction MI is a special case of zero-idiom dependency breaking instruction if a call to STI.isZeroIdiom(MI) returns true. The extra APInt is used for those targets that may want to select which machine operands have their dependency broken (see comments in code).

  Note that by default, subtargets don't know about the existence of dependency-breaking. In the absence of external information, those method calls would always return false.

  A new tablegen class named STIPredicate has been added by this patch to let processor models classify instructions that have properties in common. The idea is that a MCInstrPredicate definition can be used to "generate" an instruction equivalence class, where instructions of the same class all have a property in common.

  STIPredicate definitions are essentially a collection of instruction equivalence classes. Also, different processor models can specify a different variant of the same STIPredicate with different rules (i.e. predicates) to classify instructions. Tablegen backends (in this particular case, the SubtargetEmitter) will be able to process STIPredicate definitions, and automatically generate functions in XXXGenSubtargetInfo.

  This patch introduces two special kinds of STIPredicate classes named IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a definition for those in the BtVer2 scheduling model only.

  This patch supersedes the one committed at r338372 (phabricator review: D49310).

  The main advantages are:
  - We can describe subtarget predicates via tablegen using STIPredicates.
  - We can describe zero-idioms / dep-breaking instructions directly via tablegen in the scheduling models.

  In future, the STIPredicates framework can be used for solving other problems. Examples of future developments are:
  - Teach how to identify optimizable register-register moves.
  - Teach how to identify slow LEA instructions (each subtarget defining its own concept of "slow" LEA).
  - Teach how to identify instructions that have undocumented false dependencies on the output registers on some processors only.

  It is also (in my opinion) an elegant way to expose knowledge to both external tools like llvm-mca, and codegen passes. For example, machine schedulers in LLVM could reuse that information when internally constructing the data dependency graph for a code region.

  This new design feature is also an "opt-in" feature. Processor models don't have to use the new STIPredicates. It has all been designed to be as unintrusive as possible.

  Differential Revision: https://reviews.llvm.org/D52174
  llvm-svn: 342555
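To make the two predicate kinds from the entry above concrete, here is a toy model that does not use any LLVM types. Everything in it is hypothetical and chosen for illustration only: `ToyInst`, the `Opcode` enum, and the choice of which opcodes count as zero idioms or dependency breaking. Real processor models would express such rules in tablegen, and the classification differs per processor.

```cpp
#include <cassert>

// Hypothetical stand-ins for a machine instruction; not LLVM's MachineInstr.
enum class Opcode { XOR, SUB, CMPEQ, ADD };

struct ToyInst {
  Opcode Op;
  unsigned Dst;
  unsigned Src1;
  unsigned Src2;
};

// Zero idiom (assumed rule): XOR or SUB of a register with itself always
// yields zero, whatever the register currently holds.
bool isZeroIdiom(const ToyInst &I) {
  return (I.Op == Opcode::XOR || I.Op == Opcode::SUB) && I.Src1 == I.Src2;
}

// Dependency breaking (assumed rule): every zero idiom qualifies, and so does
// comparing a register with itself, whose result is also input-independent.
bool isDependencyBreaking(const ToyInst &I) {
  return isZeroIdiom(I) || (I.Op == Opcode::CMPEQ && I.Src1 == I.Src2);
}
```

The layering mirrors the default hooks quoted in the commit message, where `isDependencyBreaking` falls back to `isZeroIdiom`.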
* [DAGCombiner][x86] add transform/hook to decompose integer multiply into shift/add | Sanjay Patel | 2018-09-19 | 2 | -0/+19
  This is an alternative to D37896. I don't see a way to decompose multiplies generically without a target hook to tell us when it's profitable.

  ARM and AArch64 may be able to remove some duplicate code that overlaps with this transform.

  As a first step, we're only getting the most clear wins on the vector examples requested in PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474

  As noted in the code comment, it's likely that the x86 constraints are tighter than necessary, but it may not always be a win to replace a pmullw/pmulld.

  Differential Revision: https://reviews.llvm.org/D52195
  llvm-svn: 342554
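The arithmetic behind decomposing a multiply into shift/add can be sketched on scalars. This is an illustration of the general identity only, assuming the constant has the form 2^N + 1 or 2^N - 1; the helper names are hypothetical, and the sketch says nothing about when the real transform considers the rewrite profitable.

```cpp
#include <cassert>
#include <cstdint>

// True if V is a nonzero power of two.
static bool isPowerOf2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Floor of log2(V) for V != 0.
static unsigned log2Floor(uint32_t V) {
  unsigned N = 0;
  while (V >>= 1)
    ++N;
  return N;
}

// Multiplies X by the constant C using one shift plus one add or subtract
// when C is 2^N + 1 or 2^N - 1; otherwise falls back to a plain multiply.
// All arithmetic wraps modulo 2^32, matching unsigned multiplication.
uint32_t mulByConstant(uint32_t X, uint32_t C) {
  if (C > 1 && isPowerOf2(C - 1))
    return (X << log2Floor(C - 1)) + X; // X * (2^N + 1)
  if (isPowerOf2(C + 1))
    return (X << log2Floor(C + 1)) - X; // X * (2^N - 1)
  return X * C;
}
```

For example, a multiply by 17 becomes `(X << 4) + X` and a multiply by 15 becomes `(X << 4) - X`.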
* [AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR | Alex Bradbury | 2018-09-19 | 6 | -13/+25
  This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they will see no functional change. Previously this hook returned bool, but it now returns AtomicExpansionKind.

  This hook allows targets to select how a given cmpxchg is to be expanded. D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.

  See my associated RFC for more info on the motivation for this change <http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.

  Differential Revision: https://reviews.llvm.org/D48130
  llvm-svn: 342550
* [ARM] Fix unwind information for floating point registers | Oliver Stannard | 2018-09-19 | 1 | -3/+7
  Fixes the unwind information generated for floating-point registers. Previously, all padding registers were assumed to be four bytes wide. Now, the width of the register is used to specify the amount of padding.

  Patch by Jackson Woodruff!
  Differential revision: https://reviews.llvm.org/D51494
  llvm-svn: 342545
* Verify commit access in fixing typo | Calixte Denizet | 2018-09-19 | 1 | -1/+1
  llvm-svn: 342538
* [RISCV] Codegen for i8, i16, and i32 atomicrmw with RV32A | Alex Bradbury | 2018-09-19 | 8 | -3/+693
  Introduce a new RISCVExpandPseudoInsts pass to expand atomic pseudo-instructions after register allocation. This is necessary in order to ensure that register spills aren't introduced between LL and SC, thus breaking the forward progress guarantee for the operation. AArch64 does something similar for CmpXchg (though only at O0), and Mips is moving towards this approach (see D31287). See also [this mailing list post](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099490.html) from James Knight, which summarises the issues with lowering to ll/sc in IR or pre-RA.

  See the [accompanying RFC thread](http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html) for an overview of the lowering strategy.

  Differential Revision: https://reviews.llvm.org/D47882
  llvm-svn: 342534
* [COFF] Emit @feat.00 on 64-bit and set the CFG bit when emitting guardcf tables | Hans Wennborg | 2018-09-19 | 1 | -8/+15
  The 0x800 bit in @feat.00 needs to be set in order to make LLD pick up the .gfid$y table. I believe this is fine to set even if we don't emit the instrumentation.

  We haven't emitted @feat.00 on 64-bit before. I see that MSVC does emit it, but I'm not entirely sure what the default value should be. I went with zero since that seems as safe as not emitting the symbol in the first place.

  Differential Revision: https://reviews.llvm.org/D52235
  llvm-svn: 342532
* [WebAssembly][NFC] Remove extra space in WebAssemblyInstrSIMD.td | Thomas Lively | 2018-09-19 | 1 | -1/+1
  llvm-svn: 342522
* AArch64MacroFusion: Factor out some opcode handling code; NFC | Matthias Braun | 2018-09-19 | 1 | -121/+110
  llvm-svn: 342521
* ScheduleDAG: Cleanup dumping code; NFC | Matthias Braun | 2018-09-19 | 6 | -21/+17
  - Instead of having both `SUnit::dump(ScheduleDAG*)` and `ScheduleDAG::dumpNode(ScheduleDAG*)`, just keep the latter around.
  - Add `ScheduleDAG::dump()` and avoid code duplication in several places. Implement it for different ScheduleDAG variants.
  - Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()` functions. They were only ever used for debug dumping and putting the function into ScheduleDAG is consistent with the `dumpNode()` change.

  llvm-svn: 342520
* [WebAssembly] v4f32.abs and v2f64.abs | Thomas Lively | 2018-09-18 | 1 | -0/+8
  Summary: implement lowering of @llvm.fabs for vector types.

  Reviewers: aheejin, dschuff
  Subscribers:
  llvm-svn: 342513
* [AMDGPU] Match udot8 pattern | Farhana Aleen | 2018-09-18 | 1 | -22/+47
  Summary:

    D.u32 = S0.u4[0] * S1.u4[0] + S0.u4[1] * S1.u4[1] +
            S0.u4[2] * S1.u4[2] + S0.u4[3] * S1.u4[3] +
            S0.u4[4] * S1.u4[4] + S0.u4[5] * S1.u4[5] +
            S0.u4[6] * S1.u4[6] + S0.u4[7] * S1.u4[7] + S2.u32

  Author: FarhanaAleen
  Reviewed By: arsenm, nhaehnle
  Differential Revision: https://reviews.llvm.org/D51947
  llvm-svn: 342497
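The formula in the entry above can be written as a scalar reference function. This is a minimal sketch that follows the summary term by term; the function name `udot8` is borrowed from the commit subject, and the assumption that element `i` occupies bits `4*i` to `4*i + 3` of each source is mine, not stated in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Reference computation of D.u32: the dot product of eight unsigned 4-bit
// lanes of S0 and S1, accumulated onto the 32-bit value S2.
// Lane i is assumed to live in bits [4*i, 4*i + 3].
uint32_t udot8(uint32_t S0, uint32_t S1, uint32_t S2) {
  uint32_t D = S2;
  for (unsigned I = 0; I < 8; ++I) {
    uint32_t A = (S0 >> (4 * I)) & 0xFu;
    uint32_t B = (S1 >> (4 * I)) & 0xFu;
    D += A * B;
  }
  return D;
}
```

Each product is at most 15 * 15 = 225, so the eight products alone sum to at most 1800 before the accumulator is added.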
* [RISCV][MC] Use a custom ParserMethod for the bare_symbol operand type | Alex Bradbury | 2018-09-18 | 2 | -33/+36
  This allows the hard-coded shouldForceImmediate logic to be removed because the generated MatchOperandParserImpl makes use of the current context (i.e. the current mnemonic) to determine parsing behaviour, and so won't first try to parse a register before parsing a symbol name.

  No functional change is intended. gas accepts immediate arguments for call, tail and lla. This patch doesn't address this discrepancy.

  Differential Revision: https://reviews.llvm.org/D51733
  llvm-svn: 342488
* [RISCV][MC] Reject bare symbols for the simm12 operand type | Alex Bradbury | 2018-09-18 | 1 | -3/+5
  `addi a0, a0, foo`, `lw a0, foo(a0)` and similar are now rejected. An explicit %lo or %pcrel_lo modifier is required. This matches gas behaviour.

  llvm-svn: 342487
* [RISCV][MC] Tighten up checking of symbol operands to lui and auipc | Alex Bradbury | 2018-09-18 | 2 | -13/+42
  Reject bare symbols and accept only %pcrel_hi(sym) for auipc and %hi(sym) for lui. Also test valid operand modifiers in rv32i-valid.s.

  Note this is slightly stricter than gas, which will accept either %pcrel_hi or %hi for both lui and auipc.

  Differential Revision: https://reviews.llvm.org/D51731
  llvm-svn: 342486
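For readers unfamiliar with the modifiers named in the two RISC-V entries above, the following sketch shows the usual arithmetic behind a %hi/%lo split of a 32-bit absolute address. It is background, not something taken from the commits: the function names are hypothetical, and it models only the conventional rounding that compensates for the sign-extended 12-bit immediate.

```cpp
#include <cassert>
#include <cstdint>

// Upper 20 bits as a lui immediate. Adding 0x800 first rounds up whenever the
// low 12 bits will be interpreted as a negative number.
uint32_t hi20(uint32_t Addr) { return (Addr + 0x800u) >> 12; }

// Low 12 bits as a signed immediate in [-2048, 2047], the range an addi or
// load/store offset can encode.
int32_t lo12(uint32_t Addr) {
  int32_t Low = static_cast<int32_t>(Addr & 0xFFFu);
  return Low >= 0x800 ? Low - 0x1000 : Low;
}

// Recombines the two parts the way a lui followed by an addi would,
// with wraparound modulo 2^32.
uint32_t recombine(uint32_t Addr) {
  return (hi20(Addr) << 12) + static_cast<uint32_t>(lo12(Addr));
}
```

The split is why a bare symbol cannot simply be dropped into a 12-bit immediate field: the assembler needs to know which half of the address is meant.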
* [PowerPC] Do not emit record-form rotates when record-form andi/andis suffices | Nemanja Ivanovic | 2018-09-18 | 1 | -6/+28
  This is a follow-up to the previous patch that eliminated some of the rotates. With this addition, we will also emit the record-form andis. This patch increases the number of record-form rotates we eliminate by more than 70%.

  Differential revision: https://reviews.llvm.org/D44897
  llvm-svn: 342478
* [PowerPC] Optimize compares fed by ANDISo | Nemanja Ivanovic | 2018-09-18 | 1 | -1/+2
  Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions. When optimizing compares, we handle the former in order to eliminate the compare instruction but not the latter. This patch just adds the latter to the set of instructions we optimize.

  The reason these instructions need to be handled separately is that they are not part of the RecFormRel map (since they don't have a non-record-form). The missing "and-immediate-shifted" is just an oversight in the initial implementation.

  Differential revision: https://reviews.llvm.org/D51353
  llvm-svn: 342472
* [X86][SSE] LowerShift - pull out repeated getTargetVShiftUniformOpcode calls. NFCI. | Simon Pilgrim | 2018-09-18 | 1 | -25/+19
  llvm-svn: 342462
* [AArch64] Attempt to parse more operands as expressions | David Green | 2018-09-18 | 1 | -24/+11
  This tries to make use of evaluateAsRelocatable in AArch64AsmParser::classifySymbolRef to parse more complex expressions as relocatable operands. It is hopefully better than the existing code which only handles Symbol +- Constant.

  This allows us to parse more complex adr/adrp, mov, ldr/str and add operands. It also loosens the requirements on parsing addends in ld/st and mov's and adds a number of tests.

  Differential Revision: https://reviews.llvm.org/D51792
  llvm-svn: 342455
* AMDGPU: Don't form fmed3 if it will require materialization | Matt Arsenault | 2018-09-18 | 1 | -2/+9
  If there is a single use constant, it can be folded into the min/max, but not into med3.

  llvm-svn: 342443
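As background for the entry above, a clamp written as min/max computes the same value as a median-of-three when the bounds are ordered. The sketch below checks that equivalence on plain floats; the function names are hypothetical, NaN handling is ignored, and it does not model the constant-folding constraint the commit is about.

```cpp
#include <algorithm>
#include <cassert>

// Median of three values, ignoring NaNs.
float med3(float A, float B, float C) {
  return std::max(std::min(A, B), std::min(std::max(A, B), C));
}

// Clamp expressed as a max followed by a min; assumes Lo <= Hi.
float clampMinMax(float X, float Lo, float Hi) {
  assert(Lo <= Hi);
  return std::min(std::max(X, Lo), Hi);
}
```

With `Lo <= Hi`, `clampMinMax(X, Lo, Hi)` and `med3(X, Lo, Hi)` agree, so choosing between the two forms is a question of cost rather than of result.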
* [PowerPC] Add Itineraries of IIC_IntMulHD for P7/P8 | QingShan Zhang | 2018-09-18 | 2 | -0/+8
  When doing some instruction scheduling work, we noticed some missing itineraries.

  Before we switch to the machine scheduler, those missing itineraries might not affect actual scheduling, because we can still get the same latency due to default values.

  With the machine scheduler, however, itineraries do affect scheduling. For example, NumMicroOps defaults to 0 if there are no itineraries for a specific instruction class, while most instruction classes with itineraries have NumMicroOps default to 1.

  This affects the count of RetiredMOps and the Pending/Available queues, which in turn causes different or suboptimal scheduling.

  Patch By: jsji (Jinsong Ji)
  Differential Revision: https://reviews.llvm.org/D52040
  llvm-svn: 342441
* AMDGPU: Expand vector canonicalizes | Matt Arsenault | 2018-09-18 | 1 | -0/+1
  llvm-svn: 342439
* Revert "[ARM] Cleanup ARM CGP isSupportedValue" | Volodymyr Sapsai | 2018-09-18 | 1 | -19/+42
  This reverts r342395 as it caused error

  > Argument value type does not match pointer operand type!
  > %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25
  > i8in function atomic_flag_test_and_set
  > fatal error: error in backend: Broken function found, compilation aborted!

  on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/

  More details are available at https://reviews.llvm.org/D52080

  llvm-svn: 342431
* [mips] Fix MIPS N32 ABI triples support | Simon Atanasyan | 2018-09-17 | 3 | -2/+9
  Add support for mips64(el)-linux-gnuabin32 triples, and set them to N32. The Debian architecture names mipsn32/mipsn32el are also added. Set UseIntegratedAssembler for N32 if we can detect it.

  Patch by YunQiang Su.
  Differential revision: https://reviews.llvm.org/D51408
  llvm-svn: 342416
* [X86ISel] Implement byval lowering for Win64 calling convention | Keno Fischer | 2018-09-17 | 2 | -9/+27
  Summary: The IR reference for the `byval` attribute states:

  ```
  This indicates that the pointer parameter should really be passed by value to the function. The attribute implies that a hidden copy of the pointee is made between the caller and the callee, so the callee is unable to modify the value in the caller. This attribute is only valid on LLVM pointer arguments.
  ```

  However, on Win64, this attribute is unimplemented and the raw pointer is passed to the callee instead. This is problematic, because frontend authors relying on the implicit hidden copy (as happens for every other calling convention) will see the passed value silently (if mutable memory) or loudly (by means of a crash) modified because the callee treats the location as scratch memory space it is allowed to mutate.

  At this point, it's worth taking a step back to understand the context. In most calling conventions, aggregates that are too large to be passed in registers instead get *copied* to the stack at a fixed (computable from the signature) offset from the stack pointer. At the LLVM level, we hide this copy behind the byval attribute. The caller passes a pointer to the desired data and the callee receives a pointer, but these pointers are not the same. In particular, the pointer that the callee receives points to temporary stack memory allocated as part of the call lowering. In most calling conventions, this pointer is never realized in registers or memory. The temporary memory is simply defined by an implicit offset from the stack pointer at function entry.

  Win64, uniquely, works differently. The structure is still passed in memory, but instead of being stored at an implicit memory offset, the caller computes a pointer to the temporary memory and passes it to the callee as a regular pointer (taking up a register, or if all registers are taken up, an additional stack slot). Presumably, this was done to allow eliding the copy when passing aggregates through several functions on the stack.

  This explains why ignoring the `byval` attribute mostly works on Win64. The argument simply gets passed as a pointer and as long as we're ok with the callee trampling all over that memory, there are no ill effects. However, it does contradict the documentation of the `byval` attribute which specifies that there is to be an implicit copy.

  Frontends can of course work around this by never emitting the `byval` attribute for Win64 and creating `alloca`s for the requisite temporary stack slots (and that does appear to be what frontends are doing). However, the presence of the `byval` attribute is a trap for frontend authors, since it seems to work, but silently modifies the passed memory contrary to documentation.

  I see two solutions:
  - Disallow the `byval` attribute in the verifier if using the Win64 calling convention.
  - Make it work by simply emitting a temporary stack copy as we would with any other calling convention (frontends can of course always not use the attribute if they want to elide the copy).

  This patch implements the second option (make it work), though I would be fine with the first also.

  Ref: https://github.com/JuliaLang/julia/issues/28338
  Reviewers: rnk
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D51842
  llvm-svn: 342402
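The guarantee quoted from the IR reference, that the callee cannot modify the caller's value, is the same guarantee C++ gives for an aggregate passed by value. The sketch below shows that source-level contract only. The names `Big`, `scribble` and `callerSeesOriginal` are hypothetical, and whether a particular front end lowers such a call using `byval` is an assumption that depends on the front end and target.

```cpp
#include <cassert>

// An aggregate too large to be passed in registers on typical targets.
struct Big {
  int Data[16];
};

// The callee receives its own copy and is free to use it as scratch memory.
int scribble(Big B) {
  B.Data[0] = 42;
  return B.Data[0];
}

// The caller's object must be unchanged after the call, which is the
// hidden-copy behaviour the commit message describes for byval.
int callerSeesOriginal() {
  Big Local = {};
  Local.Data[0] = 7;
  scribble(Local);
  return Local.Data[0];
}
```

If the copy were skipped and the callee wrote through the caller's pointer, `callerSeesOriginal` would observe 42 instead of 7, which is the silent modification the commit message warns about.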
* [AMDGPU] Initialize instruction itinerary from GCNSubtarget | Stanislav Mekhanoshin | 2018-09-17 | 2 | -0/+6
  I need to use it in the GCN codegen.

  Differential Revision: https://reviews.llvm.org/D52123
  llvm-svn: 342400
* [ARM] Cleanup ARM CGP isSupportedValue | Sam Parker | 2018-09-17 | 1 | -42/+19
  isSupportedValue explicitly checked and accepted many types of value, primarily for debugging reasons. Remove most of these checks and do a bit of refactoring now that the pass is more stable. This also enables ZExts to be sources, but this has very little practical benefit at the moment, as extend instructions will still be introduced.

  Differential Revision: https://reviews.llvm.org/D52080
  llvm-svn: 342395
* [ARM] Disallow icmp with negative imm and overflow | Sam Parker | 2018-09-17 | 1 | -0/+11
  We allow overflowing instructions if they're decreasing and only used by an unsigned compare. Add the extra condition that the icmp cannot be using a negative immediate.

  Differential Revision: https://reviews.llvm.org/D52102
  llvm-svn: 342392
* [PowerPC] Fix label address calculation for ppc64 | Strahinja Petrovic | 2018-09-17 | 1 | -1/+2
  This patch fixes the calculation of label addresses for non-PIC ppc64.

  Differential Revision: https://reviews.llvm.org/D50965
  llvm-svn: 342368
* [X86][SSE] Always enable ISD::SRL -> ISD::MULHU for v8i16 | Simon Pilgrim | 2018-09-16 | 1 | -1/+0
  For constant non-uniform cases we'll never introduce more and/andn/or selects than already occur in generic pre-SSE41 ISD::SRL lowering.

  llvm-svn: 342352
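The identity behind turning a logical right shift into a high multiply can be checked on a single 16-bit lane. This is a scalar sketch with hypothetical helper names, not the vector lowering itself. It covers shift amounts 1 to 15 only, because a shift of 0 would need the multiplier 2^16, which does not fit in 16 bits and has to be handled separately.

```cpp
#include <cassert>
#include <cstdint>

// High 16 bits of the 32-bit product of two unsigned 16-bit values,
// i.e. what a per-lane unsigned high multiply computes.
uint16_t mulhu16(uint16_t A, uint16_t B) {
  return static_cast<uint16_t>((static_cast<uint32_t>(A) * B) >> 16);
}

// Logical shift right by Shift (1..15), expressed as a high multiply by
// 2^(16 - Shift): (X * 2^(16 - Shift)) >> 16 == X >> Shift.
uint16_t srlViaMulhu(uint16_t X, unsigned Shift) {
  assert(Shift >= 1 && Shift <= 15);
  return mulhu16(X, static_cast<uint16_t>(1u << (16 - Shift)));
}
```

Because each lane can use a different multiplier, one vector multiply can stand in for a shift whose amounts differ per lane.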