path: root/llvm/lib/Target
Commit message (author, date, files changed, lines removed/added)
...
* [X86] Don't use Lower512IntUnary to split bitcasts with v32i16/v64i8 types on targets without AVX512BW. (Craig Topper, 2018-04-09, 1 file, -7/+22)
  LowerIntUnary, as its name says, has an assert for integer types, but in the bitcast case one side might be an FP type. Rather than making sure the function really works for FP types and renaming it, just do really basic splitting directly. LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during lowering, but these calls are during legalization and will be followed by a DAG combine round.
  Revert some changes to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases.
  This was found purely by compiling the avx512f-builtins.c test from clang, so I've copied over the offending function from that.
  llvm-svn: 329616
* AArch64: Allow offsets to be folded into addresses with ELF. (Peter Collingbourne, 2018-04-09, 2 files, -17/+24)
  This is a code size win in code that frequently takes offset addresses, such as C++ constructors, which typically need to compute an offset address of a vtable. It reduces the size of Chromium for Android's .text section by 46KB, or 56KB with ThinLTO (which exposes more opportunities to use a direct access rather than a GOT access).
  Because the addend range is limited in COFF and Mach-O, this is enabled for ELF only.
  Differential Revision: https://reviews.llvm.org/D45199
  llvm-svn: 329611
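As a simplified illustration of why the fold is both valid and smaller, the C++ below models only the address arithmetic. Without folding, the sequence is `adrp x0, sym`, `add x0, x0, :lo12:sym`, `add x0, x0, #off`; with folding it becomes `adrp x0, sym+off`, `add x0, x0, :lo12:sym+off`. The model assumes plain 64-bit absolute addresses and ignores the PC-relative encoding of ADRP, relocation overflow, and GOT accesses; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model: ADRP yields the 4 KiB-aligned page of the target address.
constexpr std::uint64_t pageOf(std::uint64_t Addr) {
  return Addr & ~std::uint64_t(0xFFF);
}

// ":lo12:" supplies the low 12 bits that a following ADD puts back.
constexpr std::uint64_t lo12(std::uint64_t Addr) { return Addr & 0xFFF; }

// Without folding: materialize Sym (ADRP + ADD), then a third ADD applies Off.
constexpr std::uint64_t unfoldedAddr(std::uint64_t Sym, std::uint64_t Off) {
  return pageOf(Sym) + lo12(Sym) + Off;
}

// With folding: both relocations name Sym+Off, so ADRP + ADD suffice.
constexpr std::uint64_t foldedAddr(std::uint64_t Sym, std::uint64_t Off) {
  return pageOf(Sym + Off) + lo12(Sym + Off);
}

// The two sequences compute the same address, even when Sym+Off crosses
// into the next 4 KiB page; folding only saves an instruction.
static_assert(foldedAddr(0x400FF0, 0x20) == unfoldedAddr(0x400FF0, 0x20),
              "folded and unfolded addresses must agree");
```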
* Revert "AMDGPU: enable 128-bit for local addr space under an option" (Alex Shlyapnikov, 2018-04-09, 5 files, -17/+12)
  This reverts commit r329591. It breaks various bots:
  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/16516
  http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/17374
  http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/15992
  http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt
  http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/11251
  ...
  llvm-svn: 329610
* [WebAssembly] Change std::sort to llvm::sort in response to r327219 (Mandeep Singh Grang, 2018-04-09, 1 file, -10/+10)
  Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
  Note: This patch is one of a series of patches to replace *all* std::sort calls with llvm::sort. Refer to the comments section in D44363 for a list of all the required patches.
  Reviewers: sunfish, RKSimon
  Reviewed By: sunfish
  Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, llvm-commits
  Differential Revision: https://reviews.llvm.org/D44873
  llvm-svn: 329607
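A minimal sketch of the shuffle-before-sort idea, assuming a debug switch and an explicit seed; shuffleThenSort and byKey are hypothetical names, not llvm::sort's actual implementation. With a comparator that only looks at the key, elements with equal keys may come out in any relative order, so code that depends on that order will see it change from run to run. A comparator that defines a total order produces the same result regardless of the shuffle.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Hypothetical wrapper: optionally shuffle, then sort. Shuffling exposes any
// hidden dependence on the input order of elements that compare equal.
template <typename T, typename Compare>
void shuffleThenSort(std::vector<T> &V, Compare Cmp, bool ShuffleFirst,
                     unsigned Seed) {
  if (ShuffleFirst) {
    std::mt19937 Gen(Seed);
    std::shuffle(V.begin(), V.end(), Gen);
  }
  std::sort(V.begin(), V.end(), Cmp);
}

// Compares only the key (first), so pairs with equal keys are unordered
// relative to each other: their final order is unspecified.
inline bool byKey(const std::pair<int, char> &A,
                  const std::pair<int, char> &B) {
  return A.first < B.first;
}
```

For instance, sorting {2,'a'}, {1,'b'}, {2,'c'}, {1,'d'} with byKey always groups the keys, but whether 'a' precedes 'c' is not guaranteed; comparing whole pairs removes that ambiguity.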
* [X86] Revert the SLM part of r328914. (Craig Topper, 2018-04-09, 1 file, -1/+3)
  While it appears to be correct information based on Intel's optimization manual and Agner's data, it causes perf regressions on a couple of the benchmarks in our internal list.
  llvm-svn: 329593
* AMDGPU: enable 128-bit for local addr space under an option (Marek Olsak, 2018-04-09, 5 files, -12/+17)
  Author: Samuel Pitoiset
  ds_read_b128 and ds_write_b128 have recently been enabled under the amdgpu-ds128 option because the performance benefit is unclear. However, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. It is not yet clear what is broken, but since ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly).
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
  llvm-svn: 329591
* AMDGPU: Initialize GlobalISel passes (Tom Stellard, 2018-04-09, 1 file, -0/+1)
  Summary: This fixes AMDGPU GlobalISel test failures when enabling the AMDGPU target without any other targets that use GlobalISel.
  Reviewers: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D45353
  llvm-svn: 329588
* [X86][MMX] Fix missing itinerary for PALIGNR (Simon Pilgrim, 2018-04-09, 1 file, -4/+4)
  llvm-svn: 329568
* [X86][MMX] Fix missing itinerary for MOVQ2DQ instruction format (Simon Pilgrim, 2018-04-09, 1 file, -1/+1)
  llvm-svn: 329567
* [X86][MMX] Fix missing itinerary for CVTPI2PS (Simon Pilgrim, 2018-04-09, 1 file, -4/+4)
  llvm-svn: 329565
* [AMDGPU][MC][GFX9] Added instructions s_mul_hi_*32, s_lshl*_add_u32 (Dmitry Preobrazhensky, 2018-04-09, 1 file, -0/+21)
  See bugs:
  36841: https://bugs.llvm.org/show_bug.cgi?id=36841
  36842: https://bugs.llvm.org/show_bug.cgi?id=36842
  Differential Revision: https://reviews.llvm.org/D45251
  Reviewers: artem.tamazov, arsenm, timcorringham
  llvm-svn: 329562
* [X86][MMX] Fix flipped reg/mem typo in MMX_MISC_FUNC_ITINS (Simon Pilgrim, 2018-04-09, 1 file, -1/+1)
  The RR/RM itineraries were the wrong way around.
  llvm-svn: 329561
* [X86][SSE] Fix f32 mul/div itinerary groups typo (Simon Pilgrim, 2018-04-09, 1 file, -4/+4)
  The RM folded itineraries were incorrectly using the f64 version.
  llvm-svn: 329556
* [NFC] fix trivial typos in comments and error message (Hiroshi Inoue, 2018-04-09, 4 files, -4/+4)
  "is is" -> "is", "are are" -> "are"
  llvm-svn: 329546
* [TargetSchedule] shrink interface for init(); NFCI (Sanjay Patel, 2018-04-08, 4 files, -4/+4)
  The TargetSchedModel is always initialized using the TargetSubtargetInfo's MCSchedModel and TargetInstrInfo, so we don't need to extract those and pass 3 parameters to init().
  Differential Revision: https://reviews.llvm.org/D44789
  llvm-svn: 329540
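To show the shape of the interface change, here is a sketch with mock classes standing in for TargetSubtargetInfo, MCSchedModel and TargetInstrInfo. The Mock* types and SchedModelSketch are hypothetical and far simpler than the real LLVM classes; the point is only that a single subtarget pointer is enough for init() to derive the other two pieces.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM classes named in the commit message.
struct MockSchedModel {
  unsigned IssueWidth;
};
struct MockInstrInfo {};
struct MockSubtarget {
  MockSchedModel SchedModel;
  MockInstrInfo InstrInfo;
  const MockSchedModel &getSchedModel() const { return SchedModel; }
  const MockInstrInfo *getInstrInfo() const { return &InstrInfo; }
};

class SchedModelSketch {
  const MockSubtarget *STI = nullptr;
  const MockSchedModel *SM = nullptr;
  const MockInstrInfo *TII = nullptr;

public:
  // Single-parameter init(): the sched model and instruction info are
  // derived from the subtarget instead of being passed in separately.
  void init(const MockSubtarget *TSInfo) {
    assert(TSInfo && "subtarget required");
    STI = TSInfo;
    SM = &STI->getSchedModel();
    TII = STI->getInstrInfo();
  }
  unsigned getIssueWidth() const { return SM->IssueWidth; }
  const MockInstrInfo *getInstrInfo() const { return TII; }
};
```

Deriving both values inside init() also removes the chance of a caller passing a sched model from one subtarget and instruction info from another.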
* [X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs. (Craig Topper, 2018-04-08, 10 files, -76/+63)
  Summary: Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops. This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.
  The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.
  Reviewers: RKSimon, andreadb, GGanesh
  Reviewed By: RKSimon
  Subscribers: GGanesh, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45380
  llvm-svn: 329539
* [X86][Znver1] Remove InstRWs for BLENDVPS/PD (Craig Topper, 2018-04-08, 1 file, -12/+0)
  Summary: This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The latency listed was 3 cycles, but WriteFVarBlend is defined as 1 cycle latency. The 1 cycle latency matches Agner Fog's data.
  The patterns were missing the VEX forms, which is why there are no test changes. We don't test "-mcpu=znver1 -mattr=-avx".
  Reviewers: RKSimon, GGanesh
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D44841
  llvm-svn: 329538
* [PowerPC] Change std::sort to llvm::sort in response to r327219 (Mandeep Singh Grang, 2018-04-08, 1 file, -1/+1)
  Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
  Note: This patch is one of a series of patches to replace *all* std::sort calls with llvm::sort. Refer to the comments section in D44363 for a list of all the required patches.
  Reviewers: hfinkel, RKSimon
  Reviewed By: RKSimon
  Subscribers: nemanjai, kbarton, llvm-commits
  Differential Revision: https://reviews.llvm.org/D44870
  llvm-svn: 329535
* [X86] Change std::sort to llvm::sort in response to r327219 (Mandeep Singh Grang, 2018-04-08, 1 file, -4/+4)
  Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
  Note: This patch is one of a series of patches to replace *all* std::sort calls with llvm::sort. Refer to the comments section in D44363 for a list of all the required patches.
  Reviewers: chandlerc, craig.topper, RKSimon
  Reviewed By: chandlerc, craig.topper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D44874
  llvm-svn: 329534
* [X86][Btver2] Add vector extract costs (Simon Pilgrim, 2018-04-08, 1 file, -0/+19)
  llvm-svn: 329524
* [X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of lowering. (Craig Topper, 2018-04-07, 1 file, -15/+53)
  Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget.
  llvm-svn: 329510
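A per-lane C++ model may help show when this combine is legal. It assumes the usual PMULUDQ/PMULDQ semantics: multiply the low 32 bits of each 64-bit lane (unsigned or signed, respectively) into a full 64-bit product. The real DAG combine proves at compile time, via known-bits and sign-bits analysis, that the upper halves are zero or sign copies; the runtime checks below only stand in for that, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Per-lane model of PMULUDQ: unsigned 32x32 -> 64 multiply of the low halves.
inline std::uint64_t muludqLane(std::uint64_t A, std::uint64_t B) {
  return std::uint64_t(std::uint32_t(A)) * std::uint64_t(std::uint32_t(B));
}

// Per-lane model of PMULDQ: signed 32x32 -> 64 multiply of the low halves.
// The magnitude is at most 2^62, so the product cannot overflow int64_t.
inline std::int64_t muldqLane(std::int64_t A, std::int64_t B) {
  return std::int64_t(std::int32_t(A)) * std::int64_t(std::int32_t(B));
}

// A full 64-bit multiply equals MULUDQ when both upper halves are zero...
inline bool fitsMULUDQ(std::uint64_t A, std::uint64_t B) {
  return (A >> 32) == 0 && (B >> 32) == 0;
}

// ...and equals MULDQ when both operands are sign-extended 32-bit values.
inline bool fitsMULDQ(std::int64_t A, std::int64_t B) {
  return A == std::int64_t(std::int32_t(A)) &&
         B == std::int64_t(std::int32_t(B));
}
```

When neither condition holds, a vXi64 multiply needs the longer generic expansion, which is why recognizing these cases is worthwhile.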
* [CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets (Simon Pilgrim, 2018-04-07, 1 file, -0/+9)
  llvm-svn: 329498
* Reapply ARM: Do not spill CSR to stack on entry to noreturn functions (Tim Northover, 2018-04-07, 2 files, -0/+14)
  Should fix UBSan bot by also checking there's no "uwtable" attribute before skipping. Otherwise the unwind table will be useless since its moves expect CSRs to actually be preserved.
  A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space.
  Should fix PR9970.
  Patch mostly by myeisha (pmb).
  llvm-svn: 329494
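As a sketch only, the skip condition described in this commit can be written as a standalone C++ predicate over a hypothetical summary of the function's attributes. FnAttrs and canSkipCSRSpills are illustrative names, not the ARM backend's actual hook.

```cpp
#include <cassert>

// Hypothetical summary of the IR function attributes the decision uses.
struct FnAttrs {
  bool NoReturn = false;  // "noreturn": never returns normally
  bool NoUnwind = false;  // "nounwind": never unwinds to the caller
  bool UWTable = false;   // "uwtable": unwind tables requested anyway
};

// Callee-saved registers must be saved on entry if control can get back to
// the caller (normal return or unwinding), or if unwind tables will describe
// where they were saved. Only when none of that applies can spills be skipped.
inline bool canSkipCSRSpills(const FnAttrs &F) {
  return F.NoReturn && F.NoUnwind && !F.UWTable;
}
```

The UWTable check corresponds to the fix mentioned in the reapplied commit: without it, the emitted unwind moves would claim CSRs were preserved when they were not.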
* Revert "ARM: Do not spill CSR to stack on entry to noreturn functions" (Vitaly Buka, 2018-04-07, 2 files, -13/+0)
  Breaks ubsan test TestCases/Misc/missing_return.cpp on ARM.
  This reverts commit r329287.
  llvm-svn: 329486
* [NVPTX] add support for initializing fp16 arrays. (Artem Belevich, 2018-04-06, 1 file, -1/+7)
  Previously HalfTy was not handled, which would either trigger an assertion or result in an array initialized with garbage.
  Differential Revision: https://reviews.llvm.org/D45391
  llvm-svn: 329463
* [NVPTX] Fixed vectorized LDG for f16. (Artem Belevich, 2018-04-06, 1 file, -0/+6)
  v2f16 is a special case in NVPTX. v4f16 may be loaded as a pair of v2f16, and that was not previously handled correctly by tryLDGLDU().
  Differential Revision: https://reviews.llvm.org/D45339
  llvm-svn: 329456
* [RISCV] Tablegen-driven Instruction Compression. (Sameer AbuAsal, 2018-04-06, 8 files, -5/+332)
  Summary: This patch implements a tablegen-driven Instruction Compression mechanism for generating RISCV compressed instructions (C Extension) from the expanded instruction form.
  This tablegen backend processes CompressPat declarations in a td file and generates all the compile-time and runtime checks required to validate the declarations, validate the input operands and generate correct instructions. The checks include validating register operands, immediate operands, fixed register operands and fixed immediate operands.
  Example:
      class CompressPat<dag input, dag output> {
        dag Input = input;
        dag Output = output;
        list<Predicate> Predicates = [];
      }

      let Predicates = [HasStdExtC] in {
      def : CompressPat<(ADD GPRNoX0:$rs1, GPRNoX0:$rs1, GPRNoX0:$rs2),
                        (C_ADD GPRNoX0:$rs1, GPRNoX0:$rs2)>;
      }
  The result is an auto-generated header file 'RISCVGenCompressEmitter.inc' which exports two functions for compressing/uncompressing MCInst instructions, plus some helper functions:
      bool compressInst(MCInst& OutInst, const MCInst &MI,
                        const MCSubtargetInfo &STI, MCContext &Context);
      bool uncompressInst(MCInst& OutInst, const MCInst &MI,
                          const MCRegisterInfo &MRI, const MCSubtargetInfo &STI);
  Clients that include this auto-generated header file and invoke these functions can compress an instruction before emitting it in the target-specific ASM or ELF streamer, or can uncompress an instruction before printing it when the expanded instruction form is favored.
  The following clients were added to implement compression/uncompression for RISCV:
  1) RISCVAsmParser::MatchAndEmitInstruction: Inserted a call to compressInst() to compress instructions parsed by llvm-mc from an ASM input.
  2) RISCVAsmPrinter::EmitInstruction: Inserted a call to compressInst() to compress instructions that were lowered from Machine Instructions (MachineInstr).
  3) RVInstPrinter::printInst: Inserted a call to uncompressInst() to print the expanded version of the instruction instead of the compressed one (e.g., add s0, s0, a5 instead of c.add s0, a5) when -riscv-no-aliases is not passed.
  This patch squashes D45119, D42780 and D41932. It was reviewed in smaller patches by asb, efriedma, apazos and mgrang.
  Reviewers: asb, efriedma, apazos, llvm-commits, sabuasal
  Reviewed By: sabuasal
  Subscribers: mgorny, eraman, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng
  Differential Revision: https://reviews.llvm.org/D45385
  llvm-svn: 329455
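To make the ADD to C.ADD pattern concrete, here is a hand-written C++ sketch of the checks the generated code has to perform for that one CompressPat: the destination must equal the first source (the pattern uses $rs1 in both positions) and no operand may be x0 (GPRNoX0). Inst, compressAdd and uncompressAdd are hypothetical simplifications, not the MCInst-based compressInst()/uncompressInst() that the backend generates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, much-simplified instruction record (not llvm::MCInst).
enum class Opc { ADD, C_ADD };
struct Inst {
  Opc Op;
  std::uint8_t Rd, Rs1, Rs2;  // register numbers x0..x31
};

// Mirrors the ADD -> C_ADD CompressPat: Rd must be tied to Rs1, and neither
// Rd nor Rs2 may be x0 (the GPRNoX0 register class).
inline bool compressAdd(Inst &Out, const Inst &MI) {
  if (MI.Op != Opc::ADD || MI.Rd != MI.Rs1)
    return false;
  if (MI.Rd == 0 || MI.Rs2 == 0)
    return false;
  Out = Inst{Opc::C_ADD, MI.Rd, MI.Rd, MI.Rs2};
  return true;
}

// Reverse mapping, used when printing the expanded form is preferred.
inline bool uncompressAdd(Inst &Out, const Inst &MI) {
  if (MI.Op != Opc::C_ADD)
    return false;
  Out = Inst{Opc::ADD, MI.Rd, MI.Rd, MI.Rs2};
  return true;
}
```

For example, add s0, s0, a5 (x8, x8, x15) compresses to c.add s0, a5, while add s0, s1, a5 does not match because the destination is not tied to the first source.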
* [AMDGPU][MC][GFX9] Added s_call_b64 (Dmitry Preobrazhensky, 2018-04-06, 1 file, -0/+12)
  See bug 36843: https://bugs.llvm.org/show_bug.cgi?id=36843
  Differential Revision: https://reviews.llvm.org/D45268
  Reviewers: artem.tamazov, arsenm, timcorringham
  llvm-svn: 329440
* [Hexagon] Fix assert with packetizing IMPLICIT_DEF instructions (Krzysztof Parzyszek, 2018-04-06, 1 file, -1/+5)
  The compiler is generating a packet with the following instructions, which causes an undefined register assert in the verifier:
      $r0 = IMPLICIT_DEF
      $r1 = IMPLICIT_DEF
      S2_storerd_io killed $r29, 0, killed %d0
  The problem is that the packetizer is not saving the IMPLICIT_DEF instructions, which are needed when checking whether it is legal to add the store instruction. The fix is to add the IMPLICIT_DEF instructions to the CurrentPacketMIs structure.
  Patch by Brendon Cahoon.
  llvm-svn: 329439
* [Hexagon] Prevent a stall across zero-latency instructions in a packet (Krzysztof Parzyszek, 2018-04-06, 1 file, -15/+16)
  The packetizer keeps two zero-latency-bound instructions in the same packet, ignoring the stall on the later instruction. This should not be the case if there is no data dependence.
  Patch by Sumanth Gundapaneni.
  llvm-svn: 329437
* [Hexagon] Remove duplicated code, NFC (Krzysztof Parzyszek, 2018-04-06, 1 file, -9/+0)
  llvm-svn: 329436
* [Hexagon] Handle subregisters when calculating iteration count in HW loops (Krzysztof Parzyszek, 2018-04-06, 1 file, -0/+1)
  llvm-svn: 329434
* [AMDGPU][MC][GFX9] Added instruction s_endpgm_ordered_ps_done (Dmitry Preobrazhensky, 2018-04-06, 1 file, -0/+7)
  See bug 36844: https://bugs.llvm.org/show_bug.cgi?id=36844
  Differential Revision: https://reviews.llvm.org/D45313
  Reviewers: artem.tamazov, arsenm, timcorringham
  llvm-svn: 329430
* [X86] Add appropriate ReadAfterLd for the register input to memory forms of ADC/SBB. (Craig Topper, 2018-04-06, 5 files, -32/+32)
  llvm-svn: 329424
* [AMDGPU][MC][GFX9] Added instructions *saveexec*, *wrexec* and *bitreplicate* (Dmitry Preobrazhensky, 2018-04-06, 1 file, -0/+21)
  See bug 36840: https://bugs.llvm.org/show_bug.cgi?id=36840
  Differential Revision: https://reviews.llvm.org/D45250
  Reviewers: artem.tamazov, arsenm, timcorringham
  llvm-svn: 329419
* [X86] Remove InstRWs for basic arithmetic instructions from Sandy Bridge scheduler model. (Craig Topper, 2018-04-06, 1 file, -64/+4)
  We can get this right through WriteALU and friends now.
  llvm-svn: 329417
* [X86] Attempt to model basic arithmetic instructions in the Haswell/Broadwell/Skylake scheduler models without InstRWs (Craig Topper, 2018-04-06, 6 files, -257/+35)
  Summary: This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this, I've added the store address port to RMW and used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency.
  Apparently we were inconsistent about whether the store has latency or not, hence the test changes.
  I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5.
  Reviewers: RKSimon, andreadb
  Reviewed By: andreadb
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D45351
  llvm-svn: 329416
* [X86] Add an extra store address cycle to WriteRMW in the Sandy Bridge/Broadwell/Haswell/Skylake scheduler model. (Craig Topper, 2018-04-06, 5 files, -15/+15)
  Even though the address was calculated for the load, it's calculated again for the store.
  llvm-svn: 329415
* [X86] Merge itineraries for CLC, CMC, and STC. (Craig Topper, 2018-04-06, 3 files, -9/+5)
  These are very simple flag setting instructions that appear to only be a single uop. They're unlikely to need this separation.
  llvm-svn: 329414
* [AMDGPU][MC][VI][GFX9] Added s_atc_probe* instructions (Dmitry Preobrazhensky, 2018-04-06, 1 file, -0/+28)
  See bug 36839: https://bugs.llvm.org/show_bug.cgi?id=36839
  Differential Revision: https://reviews.llvm.org/D45249
  Reviewers: artem.tamazov, arsenm, timcorringham
  llvm-svn: 329408
* [ARC] Add <.f> suffix for F32_GEN4_{DOP|SOP}. (Pete Couperus, 2018-04-06, 1 file, -4/+32)
  Add disassembler support for instructions which write back STATUS32.
  https://reviews.llvm.org/D45148
  Patch by Yan Luo! (Yan.Luo2@synopsys.com)
  llvm-svn: 329404
* [AMDGPU][MC][GFX9] Added s_dcache_discard* instructions (Dmitry Preobrazhensky, 2018-04-06, 1 file, -0/+30)
  See bug 36838: https://bugs.llvm.org/show_bug.cgi?id=36838
  Differential Revision: https://reviews.llvm.org/D45247
  Reviewers: artem.tamazov, arsenm, timcorringham
  llvm-svn: 329397
* [X86][SandyBridge] Add (V)DPPS memory fold latencies (Simon Pilgrim, 2018-04-06, 1 file, -0/+14)
  Noticed this during D44654.
  llvm-svn: 329389
* [X86][SandyBridge] SBWriteResPair +5cy Memory Folds (Simon Pilgrim, 2018-04-06, 1 file, -6/+6)
  As mentioned on D44647, this patch increases the default memory latency to +5cy, which more closely matches what most custom cases are doing for reg-mem instructions.
  I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent.
  As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests...
  Differential Revision: https://reviews.llvm.org/D44654
  llvm-svn: 329388
* [X86][SkylakeServer] Merge 2 InstRW entries to the same sched group. NFCI. (Simon Pilgrim, 2018-04-06, 1 file, -2/+2)
  llvm-svn: 329386
* [PowerPC] allow D-form VSX load/store when accessing FrameIndex without offset (Hiroshi Inoue, 2018-04-06, 1 file, -8/+16)
  VSX D-form load/store instructions of POWER9 require the offset to be a multiple of 16, and a helper, `isOffsetMultipleOf`, is used to check this. So far, the helper has handled the FrameIndex + offset case but not the FrameIndex without offset case. Because of this, we miss opportunities to exploit D-form instructions when accessing an object or array allocated on the stack. For example, the X-form store (stxvx) is used for
      int a[4] = {0};
  instead of the D-form store (stxv). For larger arrays, the D-form instruction is not used when accessing the first 16 bytes. Using D-form instructions reduces register pressure as well as the instruction count.
  Differential Revision: https://reviews.llvm.org/D45079
  llvm-svn: 329377
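A minimal sketch of the check, assuming a simplified address model: a stack object with a known alignment, addressed either bare or with a constant offset. FrameAddr and this isOffsetMultipleOf are hypothetical simplifications of the PowerPC helper, not its real signature. The key point is that a bare FrameIndex behaves like offset 0, which is trivially a multiple of 16 as long as the object itself is suitably aligned (an assumption made explicit here).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified address: a stack object plus an optional offset.
struct FrameAddr {
  unsigned ObjectAlign;  // alignment of the stack object, in bytes
  bool HasOffset;        // false models a bare FrameIndex
  std::int64_t Offset;   // ignored when HasOffset is false
};

// D-form VSX memory ops (e.g. stxv) need a displacement that is a multiple
// of Val (16 on POWER9). The object must also start on a Val boundary.
inline bool isOffsetMultipleOf(const FrameAddr &A, std::int64_t Val) {
  if (A.ObjectAlign % Val != 0)
    return false;
  // A bare FrameIndex is treated as offset 0 instead of being rejected.
  std::int64_t Off = A.HasOffset ? A.Offset : 0;
  return Off % Val == 0;
}
```

Under this model, the store for `int a[4] = {0};` on a 16-byte-aligned slot corresponds to FrameAddr{16, false, 0}, which now qualifies for the D-form.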
* Fix lld-x86_64-darwin13 build fails. (Manoj Gupta, 2018-04-05, 1 file, -4/+4)
  Use double braces in std::array initialization to keep Darwin builders happy.
  llvm-svn: 329363
* Attempt to fix Mips breakages. (Manoj Gupta, 2018-04-05, 1 file, -7/+8)
  Summary: Replace ArrayRefs by actual std::array objects so that there are no dangling references.
  Reviewers: rsmith, gkistanova
  Subscribers: sdardis, arichardson, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45338
  llvm-svn: 329359
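The two Manoj Gupta commits above (r329359 and r329363) turn on a C++ lifetime and initialization detail. A non-owning view such as llvm::ArrayRef bound to a temporary braced list dangles once that temporary is destroyed, whereas a std::array member owns its elements. The sketch below (RegTable and makeTable are hypothetical names) stores the data in a std::array and uses the double-brace form; single braces rely on brace elision, which some older compilers and warning settings (for example -Wmissing-braces) handle less gracefully.

```cpp
#include <array>
#include <cassert>

// Owning storage: the elements live as long as the RegTable object, so
// nothing can dangle. A view type pointing at a temporary list could.
struct RegTable {
  std::array<unsigned, 4> Regs;
};

inline RegTable makeTable() {
  // Brace levels, outermost first: RegTable, std::array, and the C array
  // inside std::array. Spelling out the inner braces avoids brace elision.
  return RegTable{{{1, 2, 3, 4}}};
}
```

Returning the table by value copies the array along with the object, which is the property the Mips fix relies on.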
* [X86] Separate CDQ and CDQE in the scheduler model. (Craig Topper, 2018-04-05, 5 files, -20/+10)
  According to Agner's data, CDQE is closer to CWDE.
  llvm-svn: 329354
* [X86] Add MOVZPQILo2PQIrr to the Sandy Bridge scheduler model (Craig Topper, 2018-04-05, 1 file, -1/+1)
  llvm-svn: 329351
OpenPOWER on IntegriCloud