summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* use repmovsb when optimizing forminsizeClement Courbet2017-04-211-8/+31
| | | | llvm-svn: 300960
* Rename FastString flag.Clement Courbet2017-04-215-12/+15
| | | | llvm-svn: 300959
* X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copiesClement Courbet2017-04-215-1/+20
| | | | | | | | | | | | when the subtarget has fast strings. This has two advantages: - Speed is improved. For example, on Haswell thoughput improvements increase linearly with size from 256 to 512 bytes, after which they plateau: (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes). - Code is much smaller (no need to handle boundaries). llvm-svn: 300957
* Delete dead codeClement Courbet2017-04-211-15/+1
| | | | llvm-svn: 300952
* [Thumb1] The recently added tADCS and tSBCS pseudo-instructions were missing ↵Artyom Skrobov2017-04-211-1/+2
| | | | | | | | | | | | | | `Uses = [CPSR]` Summary: Thanks to Oliver Stannard for helping catch this. Reviewers: olista01, efriedma Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D31815 llvm-svn: 300951
* Revert r300932 and r300930.Akira Hatanaka2017-04-216-150/+7
| | | | | | | | | It seems that r300930 was creating an infinite loop in dag-combine when compling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c llvm-svn: 300940
* [AArch64] Use suffix ULL to shift a 64-bit value.Akira Hatanaka2017-04-211-1/+1
| | | | llvm-svn: 300932
* [AArch64] Improve code generation for logical instructions takingAkira Hatanaka2017-04-216-7/+150
| | | | | | | | | | | | | | | | | | | | immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300913, which broke bots because I didn't fix a call to ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of TargetLoweringOpt and TargetLowering. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300930
* X86RegisterInfo: eliminateFrameIndex: Avoid code duplication; NFCMatthias Braun2017-04-203-35/+23
| | | | | | | | | | | | | | X86RegisterInfo::eliminateFrameIndex() and X86FrameLowering::getFrameIndexReference() both had logic to compute the base register. This consolidates the code. Also use MachineInstr::isReturn instead of manually enumerating tail call instructions (return instructions were not included in the previous list because they never reference frame indexes). Differential Revision: https://reviews.llvm.org/D32206 llvm-svn: 300923
* X86RegisterInfo: eliminateFrameIndex: Force SP for AfterFPPop; NFCMatthias Braun2017-04-201-3/+4
| | | | | | | | | | | | | AfterFPPop is used for tailcall/tailjump instructions. We shouldn't ever have frame-pointer/base-pointer relative addressing for those. After all the frame/base pointer should already be restored to their previous values at the return. Make this fact explicit in preparation for an upcoming refactoring. Differential Revision: https://reviews.llvm.org/D32205 llvm-svn: 300922
* Revert "[AArch64] Improve code generation for logical instructions taking"Akira Hatanaka2017-04-205-149/+6
| | | | | | | | This reverts r300913. This broke bots. llvm-svn: 300916
* [AArch64] Improve code generation for logical instructions takingAkira Hatanaka2017-04-205-6/+149
| | | | | | | | | | | | | | | | immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300913
* AArch64: lower "fence singlethread" to a pure compiler barrier.Tim Northover2017-04-204-0/+14
| | | | | | | | Single-threaded fences aren't required to provide any synchronization with other processing elements so there's no need for a DMB. They should still be a barrier for compiler optimizations though. llvm-svn: 300905
* ARM: lower "fence singlethread" to a pure compiler barrier.Tim Northover2017-04-202-1/+12
| | | | | | | | Single-threaded fences aren't required to provide any synchronization with other processing elements so there's no need for a DMB. They should still be a barrier for compiler optimizations though. llvm-svn: 300904
* [AArch64] Whitespace/ordering fixes for Falkor machine description. NFC.Chad Rosier2017-04-201-2/+4
| | | | llvm-svn: 300893
* [AArch64] Refine Falkor machine description for pre/post-inc and stores.Chad Rosier2017-04-201-5/+5
| | | | llvm-svn: 300892
* ARM: handle post-indexed NEON ops where the offset isn't the access width.Tim Northover2017-04-202-14/+24
| | | | | | | | | | | Before, we assumed that any ConstantInt offset was precisely the access width, so we could use the "[rN]!" form. ISelLowering only ever created that kind, but further simplification during combining could lead to unexpected constants and incorrect codegen. Should fix PR32658. llvm-svn: 300878
* [AArch64] Improve scheduling of logical operations on Falkor.Chad Rosier2017-04-201-0/+6
| | | | llvm-svn: 300871
* [Thumb-1] Fix corner cases for compressed jump tablesWeiming Zhao2017-04-201-0/+9
| | | | | | | | | | | | | | | | | | | | | Summary: When synthesized TBB/TBH is expanded, we need to avoid the case of: BaseReg is redefined after the load of branching target. E.g.: %R2 = tLEApcrelJT <jt#1> %R1 = tLDRr %R1, %R2 ==> %R2 = tLEApcrelJT <jt#1> %R2 = tLDRspi %SP, 12 %R2 = tLDRspi %SP, 12 tBR_JTr %R1 tTBB_JT %R2, %R1 ` Reviewers: jmolloy Reviewed By: jmolloy Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D32250 llvm-svn: 300870
* Fix use-after-frees on memory allocated in a Recycler.Benjamin Kramer2017-04-204-11/+11
| | | | | | | | This will become asan errors once the patch lands that poisons the memory after free. The x86 change is a hack, but I don't see how to solve this properly at the moment. llvm-svn: 300867
* [WebAssembly] Add known failures for wasm object file backendSam Clegg2017-04-201-0/+28
| | | | | | | | Subscribers: jfb, dschuff Differential Revision: https://reviews.llvm.org/D32300 llvm-svn: 300859
* [APInt] Rename getSignBit to getSignMaskCraig Topper2017-04-201-16/+16
| | | | | | | | getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask. Differential Revision: https://reviews.llvm.org/D32108 llvm-svn: 300856
* [mips][msa] Mask vectors holding shift amountsPetar Jovanovic2017-04-202-6/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | Masked vectors which hold shift amounts when creating the following nodes: ISD::SHL, ISD::SRL or ISD::SRA. Instructions that use said nodes, which have had their arguments altered are sll, srl, sra, bneg, bclr and bset. For said instructions, the shift amount or the bit position that is specified in the corresponding vector elements will be interpreted as the shift amount/bit position modulo the size of the element in bits. The problem lies in compiling with -O2 enabled, where the instructions for formats .w and .d are not generated, but are instead optimized away. In this case, having shift amounts that are either negative or greater than the element bit size results in generation of incorrect results when constant folding. We remedy this by masking the operands for the nodes mentioned above before actually creating them, so that the final result is correct before placed into the constant pool. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D31331 llvm-svn: 300839
* [ARM] Fix handling of mapping symbols when changing sectionsJohn Brawn2017-04-201-1/+1
| | | | | | | | | | | ChangeSection incorrectly registers LastEMSInfo as belonging to the previous section, not the current section. This happens to work when changing sections using .section, as the previous section is set to the current section before the call to ChangeSection, but not when using .popsection. Differential Revision: https://reviews.llvm.org/D32225 llvm-svn: 300831
* [AArch64] Fix handling of zero immediate in fmov instructionsJohn Brawn2017-04-202-19/+14
| | | | | | | | | | | Currently fmov #0 with a vector destination is handle incorrectly and results in fmov #-1.9375 being emitted but should instead give an error. This is due to the way we cope with fmov #0 with a scalar destination being an alias of fmov zr, so fix this by actually doing it through an alias. Differential Revision: https://reviews.llvm.org/D31949 llvm-svn: 300830
* [AArch64] Fix handling of integer fp immediatesJohn Brawn2017-04-201-22/+13
| | | | | | | | When an integer is used as an fp immediate we're failing to check the return value of getFP64Imm, so invalid values are silently permitted. Fix this by merging together the integer and real handling. llvm-svn: 300828
* [ARM] Rename HW div feature to HW div Thumb. NFCI.Diana Picus2017-04-208-36/+38
| | | | | | | | | | | | | | | | The hardware div feature refers only to Thumb, but because of its name it is tempting to use it to check for hardware division in general, which may cause problems in ARM mode. See https://reviews.llvm.org/D32005. This patch adds "Thumb" to its name, to make its scope clear. One notable place where I haven't made the change is in the feature flag (used with -mattr), which is still hwdiv. Changing it would also require changes in a lot of tests, including clang tests, and it doesn't seem like it's worth the effort. Differential Revision: https://reviews.llvm.org/D32160 llvm-svn: 300827
* Revert earlier change. ds permute operations affect lgkm counter. Kannan Narayanan2017-04-191-2/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D32254 llvm-svn: 300791
* X86FrameLowering: Fix getFrameIndexReference() for 'fixed' objectsMatthias Braun2017-04-192-6/+8
| | | | | | | | | | | Debug information is calculated with getFrameIndexReference() which was missing some logic for the fixed object cases (= parameters on the stack). rdar://24557797 Differential Revision: https://reviews.llvm.org/D32204 llvm-svn: 300781
* ARMFrameLowering: Reserve emergency spill slot for large argumentsMatthias Braun2017-04-191-8/+35
| | | | | | | | | | | | | | | | Re-commit after revert in r300668. Changed getMaxFPOffset() to a more conservative heuristic instead of trying to be clever and missing for some exotic calling conventions. We need to reserve an emergency spill slot in cases with large argument types that could overflow immediate offsets for FP relative address calculations. rdar://31317893 Differential Revision: https://reviews.llvm.org/D31643 llvm-svn: 300761
* AMDGPU: Custom lower illegal small select typesMatt Arsenault2017-04-191-0/+29
| | | | | | | Promote them to i32 vectors to avoid unpacking and re-packing the vectors. llvm-svn: 300754
* [ARM] Remove redundant computeKnownBits helper.Eli Friedman2017-04-191-29/+14
| | | | | | | | | | | | Move the BFI logic to computeKnownBitsForTargetNode, and delete the redundant CMOV logic. This is intended as a cleanup, but it's probably possible to construct a case where moving the BFI logic allows more combines. Differential Revision: https://reviews.llvm.org/D31795 llvm-svn: 300752
* [GISEL]: Move getConstantVReg to UtilsAditya Nandakumar2017-04-191-0/+1
| | | | | | NFCI llvm-svn: 300751
* [ARM] Use TableGen patterns to select vtbl. NFC.Eli Friedman2017-04-193-92/+59
| | | | | | Differential Revision: https://reviews.llvm.org/D32103 llvm-svn: 300749
* PR32710: Disable using PMADDWD for unsigned short.Dehao Chen2017-04-191-1/+1
| | | | | | | | | | | | | | Summary: PMADDWD can only handle signed short. Reviewers: mkuper, wmi Reviewed By: mkuper Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D32236 llvm-svn: 300737
* AMDGPU: Don't emit amd_kernel_code_t for callable functionsMatt Arsenault2017-04-191-1/+14
| | | | | | | | | | | | This is inserted directly in the text section. The relocation for the function ends up resolving to the beginning of the amd_kernel_code_t header rather than the actual function entry point. Also skip some of the comments for initialization that only makes sense for kernels. llvm-svn: 300736
* ARM: TLS calling convention doesn't preserve r9 or r12 on Darwin.Tim Northover2017-04-191-3/+3
| | | | llvm-svn: 300726
* AMDGPU: Don't align callable functions to 256Matt Arsenault2017-04-191-1/+3
| | | | llvm-svn: 300720
* AMDGPU: Change DivergenceAnalysis for function argumentsMatt Arsenault2017-04-191-9/+16
| | | | | | Stop assuming all functions are kernels. llvm-svn: 300719
* [Hexagon] Generate proper offset in opt-addr-modeKrzysztof Parzyszek2017-04-192-11/+8
| | | | | | | | | Also, make a few changes to allow using the pass in .mir testcases. Among other things, change the abbreviation from opt-amode to amode-opt, because otherwise lit would expand the "opt" part to the full path to the opt binary. llvm-svn: 300707
* [Hexagon] Remove RDefMap, use Liveness:getNearestAliasedRef insteadKrzysztof Parzyszek2017-04-191-48/+5
| | | | llvm-svn: 300706
* [RDF] Switch NodeList to SmallVector from std::vectorKrzysztof Parzyszek2017-04-191-1/+2
| | | | | | | The list has a single element 75+% of the time, reservation of 4 elements is sufficient in 95% of cases. llvm-svn: 300705
* [RDF] Use faster version of findBlockKrzysztof Parzyszek2017-04-191-1/+1
| | | | llvm-svn: 300704
* [RDF] Cache register units for reg masks instead of recalculating themKrzysztof Parzyszek2017-04-192-31/+29
| | | | llvm-svn: 300702
* [Hexagon] Cache reached blocks in bit tracker instead of scanning listKrzysztof Parzyszek2017-04-192-10/+10
| | | | llvm-svn: 300701
* [GlobalIsel][X86] support G_TRUNC selection.Igor Breger2017-04-192-0/+62
| | | | | | | | | | | | | | | | Summary: [GlobalIsel][X86] support G_TRUNC selection. Add regbank-select and legalizer tests. Currently legalization of trunc i64 on 32bit platform not supported. Reviewers: ab, zvi, rovka Reviewed By: zvi Subscribers: dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D32115 llvm-svn: 300678
* Revert "ARMFrameLowering: Reserve emergency spill slot for large arguments"Renato Golin2017-04-191-41/+8
| | | | | | This reverts commit r300639, as it broke self-hosting on ARM. PR32709. llvm-svn: 300668
* [ARM] GlobalISel: Add support for G_MULDiana Picus2017-04-193-1/+12
| | | | | | | | Support G_MUL, very similar to G_ADD and G_SUB. The only difference is in the instruction selector, where we have to select either MUL or MULv5 depending on the target. llvm-svn: 300665
* [GlobalISel] Support vector-of-pointers in LLTKristof Beyls2017-04-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes PR32471. As comment 10 on that bug report highlights (https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a few different defendable design tradeoffs that could be made, including not representing pointers at all in LLT. I decided to go for representing vector-of-pointer as a concept in LLT, while keeping the size of the LLT type 64 bits (this is an increase from 48 bits before). My rationale for keeping pointers explicit is that on some targets probably it's very handy to have the distinction between pointer and non-pointer (e.g. 68K has a different register bank for pointers IIRC). If we keep a scalar pointer, it probably is easiest to also have a vector-of-pointers to keep LLT relatively conceptually clean and orthogonal, while we don't have a very strong reason to break that orthogonality. Once we gain more experience on the use of LLT, we can of course reconsider this direction. Rejecting vector-of-pointer types in the IRTranslator is also an option to avoid the crash reported in PR32471, but that is only a very short-term solution; also needs quite a bit of code tweaks in places, and is probably fragile. Therefore I didn't consider this the best option. llvm-svn: 300664
* ARM: Use methods to access data stored with frame instructionsSerge Pavlov2017-04-193-8/+27
| | | | | | | | | | | In r300196 several methods were added to TarfetInstrInfo to access data stored with call frame setup/destroy instructions. This change replaces calls to getOperand with calls to such special methods in ARM target. Differential Revision: https://reviews.llvm.org/D32127 llvm-svn: 300655
OpenPOWER on IntegriCloud