path: root/llvm/lib/Target/AArch64
Commit message | Author | Age | Files | Lines
...
* [AArch64] Add fallback in FastISel fp16 conversions | I-Jui (Ray) Sung | 2017-06-09 | 1 | -1/+5
  Summary:
  - Fix assertion failures on F16 to/from int types in FastISel by falling back to regular ISel
  - Add a testcase of various conversion cases with FastISel (-O0)
  Reviewers: kristof.beyls, jmolloy, SjoerdMeijer
  Reviewed By: SjoerdMeijer
  Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D33734
  llvm-svn: 305127
* Move Object format code to lib/BinaryFormat. | Zachary Turner | 2017-06-07 | 5 | -5/+5
  This creates a new library called BinaryFormat that has all of the headers from llvm/Support containing structure and layout definitions for various types of binary formats like dwarf, coff, elf, etc., as well as the code for identifying a file from its magic.
  Differential Revision: https://reviews.llvm.org/D33843
  llvm-svn: 304864
* Sort the remaining #include lines in include/... and lib/.... | Chandler Carruth | 2017-06-06 | 12 | -18/+18
  I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.

  I've reverted a number of files where the results of sorting includes aren't healthy: either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch.

  This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.

  Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).
  llvm-svn: 304787
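The ordering clang-format applies can be approximated as a sort on (category, name). The sketch below is a simplification, assuming the categories we believe clang-format's LLVM style uses: the file's main header first, then other quoted local headers, then `llvm/`, `llvm-c/`, `clang/` and `clang-c/` headers, then angle-bracket system headers. The helpers `includeCategory` and `sortIncludes` are hypothetical names, and real clang-format also handles include blocks, `gtest/` headers and other details omitted here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Approximate include category; lower values sort first. `inc` is the text
// after #include, e.g. "\"llvm/ADT/APInt.h\"" or "<vector>".
int includeCategory(const std::string &inc, const std::string &mainHeader) {
  auto startsWith = [&inc](const char *prefix) { return inc.rfind(prefix, 0) == 0; };
  if (inc == mainHeader)
    return 0;  // the file's own header
  if (startsWith("\"llvm/") || startsWith("\"llvm-c/") ||
      startsWith("\"clang/") || startsWith("\"clang-c/"))
    return 2;  // LLVM project headers
  if (startsWith("<"))
    return 3;  // system headers
  return 1;    // other local headers
}

// Sorts by category, then lexicographically within a category.
void sortIncludes(std::vector<std::string> &incs, const std::string &mainHeader) {
  std::sort(incs.begin(), incs.end(),
            [&mainHeader](const std::string &a, const std::string &b) {
              int ca = includeCategory(a, mainHeader);
              int cb = includeCategory(b, mainHeader);
              return ca != cb ? ca < cb : a < b;
            });
}
```

Because the result depends only on the set of lines, re-running the sort is a no-op, which is what makes a purely mechanical patch like this one safe to redo after a merge conflict.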
* [llvm] Remove double semicolons | Mandeep Singh Grang | 2017-06-06 | 1 | -1/+1
  Reviewers: craig.topper, arsenm, mehdi_amini
  Reviewed By: mehdi_amini
  Subscribers: mehdi_amini, wdng, nhaehnle, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33924
  llvm-svn: 304767
* [AArch64][Falkor] Model immediate forwarding. | Geoff Berry | 2017-06-02 | 1 | -13/+28
  llvm-svn: 304552
* [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2017-06-01 | 1 | -2/+5
  llvm-svn: 304495
* [AArch64] Enable FeatureFuseAES on Cortex-A53. | Florian Hahn | 2017-05-31 | 1 | -0/+1
  It improves performance on Cortex-A53.
  llvm-svn: 304307
* [AArch64] Enable FeatureFuseAES on Cortex-A73. | Florian Hahn | 2017-05-31 | 1 | -0/+1
  It improves performance on Cortex-A73.
  llvm-svn: 304304
* Add latency info for Exynos interleaved Load/Store instructions. | Abderrazek Zaafrani | 2017-05-31 | 1 | -5/+335
  llvm-svn: 304259
* TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFC | Matthias Braun | 2017-05-30 | 1 | -3/+3
  TargetPassConfig is not useful for targets that do not use the CodeGen library, so we may just as well store a pointer to an LLVMTargetMachine instead of just to a TargetMachine.
  While at it, also change the constructor to take a reference instead of a pointer as the TM must not be nullptr.
  llvm-svn: 304247
* [SelectionDAG] Set ISD::FPOWI to Expand by default | Craig Topper | 2017-05-30 | 1 | -3/+0
  Summary:
  Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie".
  This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default.
  Reviewers: spatel, RKSimon, efriedma
  Reviewed By: RKSimon
  Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33530
  llvm-svn: 304215
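For context on what an expanded FPOWI computes: as far as we know, expanding it ends in a call to a runtime helper (for example compiler-rt's `__powidf2` for `double`). The sketch below shows those semantics, raising `x` to a signed integer power by repeated squaring. `powiSketch` is a hypothetical name and this is an illustration, not the compiler-rt source.

```cpp
// x raised to the integer power n by repeated squaring.
double powiSketch(double x, int n) {
  const bool reciprocal = n < 0;
  double result = 1.0;
  while (true) {
    if (n & 1)     // low bit set (also true for odd negative n in two's complement)
      result *= x;
    n /= 2;        // truncates toward zero, so negative n also reaches 0
    if (n == 0)
      break;
    x *= x;
  }
  return reciprocal ? 1.0 / result : result;
}
```

The loop performs O(log |n|) multiplications, which is why a helper call is a reasonable default for targets with no native instruction.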
* Fix PR33031: correct the estimate of maximum offset for instructions spilling/filling the stack. | Kristof Beyls | 2017-05-30 | 1 | -5/+30
  llvm-svn: 304196
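Background for the offset estimate: the range an AArch64 spill or fill can encode depends on the addressing form. The helpers below are hypothetical (not LLVM API) and do not reproduce the PR33031 fix; they only restate the standard A64 immediate ranges for the unsigned scaled form (LDR/STR), the unscaled form (LDUR/STUR) and the paired form (LDP/STP).

```cpp
#include <cstdint>

// LDR/STR (unsigned offset): 12-bit unsigned immediate scaled by the access
// size, so the byte offset must be a non-negative multiple of `size` up to 4095 * size.
bool fitsScaledUImm12(int64_t offset, int64_t size) {
  return offset >= 0 && offset % size == 0 && offset / size <= 4095;
}

// LDUR/STUR: 9-bit signed, unscaled byte offset in [-256, 255].
bool fitsUnscaledSImm9(int64_t offset) {
  return offset >= -256 && offset <= 255;
}

// LDP/STP: 7-bit signed immediate scaled by the size of one register,
// so the byte offset ranges over [-64 * size, 63 * size].
bool fitsPairedSImm7(int64_t offset, int64_t size) {
  return offset % size == 0 && offset / size >= -64 && offset / size <= 63;
}
```

For a 16-byte Q-register spill, for instance, the scaled form reaches 65520 bytes while the paired form stops at 1008, so an estimate that assumes the wrong form can be off by a large factor.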
* [AArch64][Falkor] Combine sched details files into one. NFC. | Geoff Berry | 2017-05-28 | 2 | -514/+503
  llvm-svn: 304109
* [AArch64][Falkor] Fix some sched details. | Geoff Berry | 2017-05-28 | 4 | -294/+461
  - Remove all uses of base sched model entries and set them all to Unsupported so all the opcodes are described in AArch64SchedFalkorDetails.td.
  - Remove entries for unsupported half-float opcodes.
  - Remove entries for unsupported LSE extension opcodes.
  - Add entry for MOVbaseTLS (and set Sched in base td file entry to WriteSys) and a few other pseudo ops.
  - Fix a few FP load/store with reg offset entries to use the LSLfast predicates.
  - Add Q size BIF/BIT/BSL entries.
  - Fix swapped Q/D sized CLS/CLZ/CNT/RBIT entries.
  - Fix pre/post increment address register latency (this operand is always dest 0).
  - Fix swapped FCVTHD/FCVTHS/FCVTDH/FCVTDS entries.
  - Fix XYZ resource over usage on LD[1-4] opcodes.
  llvm-svn: 304108
* AArch64/PEI: Do not add reserved regs to liveins | Matthias Braun | 2017-05-27 | 1 | -2/+5
  We do not track liveness for reserved registers. It is unnecessary to add them to block livein lists.
  llvm-svn: 304059
* [AArch64][GlobalISel] Add the Localizer pass for the O0 pipeline | Quentin Colombet | 2017-05-27 | 1 | -1/+9
  This should fix most of the issue we have right now with constants being spilled all over the place.
  llvm-svn: 304052
* AArch64: Fix cmpxchg O0 expansion | Matthias Braun | 2017-05-26 | 1 | -58/+61
  - Rewrite livein calculation to use the computeLiveIns() helper function. This is slightly less efficient but easier to reason about and doesn't unnecessarily add pristine and reserved registers[1].
  - Zero the status register at the beginning of the loop to make sure it has a defined value.
  - Remove kill flags of values that need to stay alive throughout the loop.
  [1] An upcoming commit of mine will tighten the MachineVerifier to catch these.
  llvm-svn: 304048
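The loop being fixed is a load-exclusive/store-exclusive retry loop: the store-exclusive writes a status register that is non-zero when the store failed, and the loop retries. Portable C++ exposes the same shape through `std::atomic::compare_exchange_weak`, which may fail spuriously. The sketch below (the name `strongCas` is hypothetical) builds a strong compare-and-swap from it; it is an analogy for the structure of the loop, not a model of the -O0 expansion itself.

```cpp
#include <atomic>

// Strong compare-and-swap built from compare_exchange_weak, which (like a
// store-exclusive) may fail even when the value matched. Returns true if
// `desired` was stored; on a genuine mismatch, `expected` receives the
// value actually observed, mirroring std::atomic's interface.
template <typename T>
bool strongCas(std::atomic<T> &obj, T &expected, T desired) {
  const T want = expected;
  while (true) {
    T observed = want;  // reset every iteration, so the compared value is always defined
    if (obj.compare_exchange_weak(observed, desired))
      return true;
    if (observed != want) {
      expected = observed;
      return false;
    }
    // Spurious failure: retry, as the expanded loop does on a non-zero status.
  }
}
```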
* LivePhysRegs: Rework constructor + documentation; NFC | Matthias Braun | 2017-05-26 | 2 | -4/+4
  - Take reference instead of pointer to a TRI that cannot be nullptr.
  - Improve documentation comments.
  llvm-svn: 304038
* Fix signedness of constant. NFC. | Nirav Dave | 2017-05-26 | 1 | -5/+5
  llvm-svn: 303980
* [AArch64]: add 'a' inline asm operand modifier. | Manoj Gupta | 2017-05-25 | 1 | -1/+4
  Summary:
  This is used in the Linux kernel, and effectively just means "print an address". This brings back r193593.
  Reviewed by: Renato Golin
  Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls
  Subscribers: aemerson, javed.absar, llvm-commits, eraman
  Differential Revision: https://reviews.llvm.org/D33558
  llvm-svn: 303901
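As far as we know, the Linux kernel's arm64 `prefetch()` helper uses this modifier together with the "p" (address) constraint, roughly as below. The wrapper name `prefetchForRead` and the non-AArch64 fallback are our additions; the fallback assumes a GCC- or Clang-compatible compiler that provides `__builtin_prefetch`.

```cpp
// Prefetch the cache line containing `ptr` for reading.
inline void prefetchForRead(const void *ptr) {
#if defined(__aarch64__)
  // "%a0" prints operand 0 as a memory address (something like "[x0]").
  asm volatile("prfm pldl1keep, %a0\n" : : "p"(ptr));
#else
  __builtin_prefetch(ptr, /*rw=*/0);
#endif
}
```

Without the modifier, the operand would be printed as a plain register name, which PRFM does not accept as an address.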
* [AArch64] Prevent nested ADDs from address calc in splitStoreSplat. NFC | Nirav Dave | 2017-05-24 | 1 | -2/+12
  In preparation for late-stage store merging.
  llvm-svn: 303800
* Revert r291254: [AArch64] Reduce vector insert/extract cost for Falkor | Matthew Simpson | 2017-05-24 | 1 | -1/+0
  The default vector insert/extract cost is more profitable on Falkor than the reduced cost.
  llvm-svn: 303771
* [AArch64][Falkor] Refine sched details for LSLfast/ASRfast. | Geoff Berry | 2017-05-23 | 4 | -40/+189
  llvm-svn: 303682
* [AArch64][Falkor] Fix sched details for FMOV of WZR/XZR. | Geoff Berry | 2017-05-23 | 2 | -6/+8
  llvm-svn: 303680
* [AArch64] Make instruction fusion more aggressive. | Florian Hahn | 2017-05-23 | 2 | -1/+14
  Summary:
  This patch makes instruction fusion more aggressive by
  - adding artificial edges between the successors of FirstSU and SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps.
  - updating PostGenericScheduler::tryCandidate to keep clusters together, similar to GenericScheduler::tryCandidate.
  This change increases the number of AES instruction pairs generated on Cortex-A57 and Cortex-A72. This doesn't change code at all in most benchmarks or general code, but we've seen improvement on kernels using AESE/AESMC and AESD/AESIMC.
  Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB
  Reviewed By: evandro
  Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33230
  llvm-svn: 303618
* [AArch64] Fix PR33100. | Akira Hatanaka | 2017-05-23 | 1 | -7/+10
  This commit fixes a bug introduced in r301019 where optimizeLogicalImm would replace a logical node's immediate operand that was CSE'd and was also an operand of another node.
  This commit fixes the bug by replacing the logical node instead of its immediate operand.
  rdar://problem/32295276
  llvm-svn: 303607
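Background for the bug above: as we understand r301019, optimizeLogicalImm uses demanded bits to rewrite the immediate of an AND/ORR/EOR so that it fits the instruction's immediate field. AArch64 logical instructions accept only "bitmask immediates": an element of 2, 4, 8, 16, 32 or 64 bits, replicated across the register, whose bits form a single rotated run of ones (so the value is neither all zeros nor all ones). The checker below is our own sketch of that rule for 64-bit values, not LLVM's implementation; `isBitmaskImm64` is a hypothetical name.

```cpp
#include <cstdint>

// Returns true if `value` can be encoded as an AArch64 64-bit bitmask immediate.
bool isBitmaskImm64(uint64_t value) {
  if (value == 0 || value == ~uint64_t(0))
    return false;  // all-zeros and all-ones are not encodable

  // Find the smallest element size (64, 32, ..., 2) whose pattern repeats.
  unsigned size = 64;
  while (size > 2) {
    unsigned half = size / 2;
    uint64_t halfMask = (uint64_t(1) << half) - 1;
    if ((value & halfMask) != ((value >> half) & halfMask))
      break;
    size = half;
  }

  uint64_t mask = size == 64 ? ~uint64_t(0) : (uint64_t(1) << size) - 1;
  uint64_t elt = value & mask;

  // The element must be a rotated run of ones: some right-rotation of it
  // within `size` bits has the form 0...01...1.
  for (unsigned r = 0; r < size; ++r) {
    uint64_t rot = r == 0 ? elt : ((elt >> r) | (elt << (size - r))) & mask;
    if ((rot & (rot + 1)) == 0)
      return true;
  }
  return false;
}
```

Because the rewritten immediate is a fresh value, mutating a constant node that other nodes share (after CSE) changes those users too, which is why replacing the logical node is the safer fix.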
* [globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to per-function predicates. | Daniel Sanders | 2017-05-19 | 4 | -15/+13
  Summary:
  This causes them to be re-computed more often than necessary but resolves objections that were raised post-commit on r301750.
  Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls
  Reviewed By: qcolombet
  Subscribers: igorb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32861
  llvm-svn: 303418
* [LegacyPassManager] Remove TargetMachine constructors | Francis Visoiu Mistrih | 2017-05-18 | 1 | -2/+2
  This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency.
  The patterns replaced here are:
  - Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`.
  - Passes not handling a null TargetMachine use `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`.
  - MachineFunctionPasses now use MF.getTarget().
  - Remove all the TargetMachine constructors.
  - Remove INITIALIZE_TM_PASS.
  This fixes a crash when running `llc -start-before prologepilog`.
  PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults.
  Related to PR30324.
  Differential Revision: https://reviews.llvm.org/D33222
  llvm-svn: 303360
* BitVector: add iterators for set bits | Francis Visoiu Mistrih | 2017-05-17 | 1 | -2/+1
  Differential revision: https://reviews.llvm.org/D32060
  llvm-svn: 303227
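Iterating over set bits means jumping from one set bit to the next instead of testing every index. As far as we know, LLVM exposes this as a `set_bits()` range on `BitVector`. The sketch below uses a hypothetical `TinyBitVector` (not LLVM's class) that skips all-zero words in one step each.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, minimal bit vector used only to illustrate set-bit iteration.
class TinyBitVector {
public:
  explicit TinyBitVector(std::size_t n) : words_((n + 63) / 64, 0), size_(n) {}

  void set(std::size_t i) { words_[i / 64] |= uint64_t(1) << (i % 64); }

  // Index of the first set bit at or after `from`, or -1 if there is none.
  long findNext(std::size_t from) const {
    if (from >= size_)
      return -1;
    std::size_t w = from / 64;
    uint64_t bits = words_[w] & (~uint64_t(0) << (from % 64));  // drop bits below `from`
    while (bits == 0) {  // skip whole zero words
      if (++w == words_.size())
        return -1;
      bits = words_[w];
    }
    unsigned tz = 0;  // count trailing zeros of the non-zero word
    while (((bits >> tz) & 1) == 0)
      ++tz;
    return static_cast<long>(w * 64 + tz);
  }

  // Forward iterator over the indices of set bits.
  class SetBitIterator {
  public:
    SetBitIterator(const TinyBitVector *bv, long cur) : bv_(bv), cur_(cur) {}
    long operator*() const { return cur_; }
    SetBitIterator &operator++() {
      cur_ = bv_->findNext(static_cast<std::size_t>(cur_) + 1);
      return *this;
    }
    bool operator!=(const SetBitIterator &other) const { return cur_ != other.cur_; }

  private:
    const TinyBitVector *bv_;
    long cur_;
  };

  struct SetBitRange {
    const TinyBitVector *bv;
    SetBitIterator begin() const { return SetBitIterator(bv, bv->findNext(0)); }
    SetBitIterator end() const { return SetBitIterator(bv, -1); }
  };

  SetBitRange setBits() const { return SetBitRange{this}; }

private:
  std::vector<uint64_t> words_;
  std::size_t size_;
};
```

Usage: `for (long i : bv.setBits())` visits only the set indices, in increasing order, which replaces the older `find_first()`/`find_next()` loop pattern with a range-based for.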
* Re-commit r302678, fixing PR33053. | Amara Emerson | 2017-05-16 | 4 | -269/+101
  The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering.
  llvm-svn: 303211
* Fix an improperly placed curly bracket. NFC. | Chad Rosier | 2017-05-16 | 1 | -1/+1
  llvm-svn: 303165
* AArch64: use linker-private symbols for globals in MachO. | Tim Northover | 2017-05-15 | 2 | -0/+11
  We don't use section-relative relocations on AArch64, so all symbols must be at least visible to the linker (i.e. properly global or l_whatever, but not L_whatever).
  llvm-svn: 303118
* [SLP] Enable 64-bit wide vectorization on AArch64 | Adam Nemet | 2017-05-15 | 3 | -0/+19
  ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.

  *** Performance Analysis
  This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite.
  The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4.

  ** SPEC
  - CFP2006/482.sphinx3 -3.34%
    A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482.
  - CFP2000/177.mesa -3.34%
  - CINT2000/256.bzip2 +6.97%
    My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement.

  ** LLVM testsuite
  - SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
    There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise.
  - MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
    The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop.
  - MultiSource/Applications/lambda-0.1.3/lambda -2.68%
  - MultiSource/Benchmarks/Bullet/bullet -2.18%
    This is using 3D vectors.
  - SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
    Noise, binary is unchanged.
  - MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
    There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise.
  - MultiSource/Applications/aha/aha +1.63%
  - MultiSource/Applications/JM/lencod/lencod +1.41%
  - SingleSource/Benchmarks/Misc/richards_benchmark +1.15%

  Differential Revision: https://reviews.llvm.org/D31965
  llvm-svn: 303116
* Revert r302678 "[AArch64] Enable use of reduction intrinsics." | Hans Wennborg | 2017-05-15 | 4 | -99/+269
  This caused PR33053.
  Original commit message:
  > The new experimental reduction intrinsics can now be used, so I'm enabling this
  > for AArch64. We will need this for SVE anyway, so it makes sense to do this for
  > NEON reductions as well.
  >
  > The existing code to match shufflevector patterns are replaced with a direct
  > lowering of the reductions to AArch64-specific nodes. Tests updated with the
  > new, simpler, representation.
  >
  > Differential Revision: https://reviews.llvm.org/D32247
  llvm-svn: 303115
* AArch64: diagnose unrecognized features in .cpu directive. | Tim Northover | 2017-05-15 | 1 | -2/+17
  We were silently ignoring any features we couldn't match up, which led to errors in an inline asm block missing the conventional "\n\t".
  llvm-svn: 303108
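Why a missing "\n\t" matters: adjacent string literals in an inline asm block concatenate, so `".cpu generic+crc" "mov x0, x1"` becomes `.cpu generic+crcmov x0, x1`, and the feature list now names `crcmov`. The sketch below shows the kind of check that turns this into a diagnostic. The feature table, the `no` prefix handling and the function name `unknownCpuFeatures` are illustrative assumptions, not the assembler's actual code.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical feature table for illustration; the assembler has its own list.
const std::set<std::string> kKnownFeatures = {"crc", "crypto", "fp", "simd", "ras", "lse"};

// Splits "cpu+feat+nofeat..." on '+' and returns every feature whose name
// (after stripping an optional "no" prefix) is not in the table.
std::vector<std::string> unknownCpuFeatures(const std::string &directive) {
  std::vector<std::string> unknown;
  std::size_t pos = directive.find('+');
  while (pos != std::string::npos) {
    std::size_t next = directive.find('+', pos + 1);
    std::size_t len = next == std::string::npos ? std::string::npos : next - pos - 1;
    std::string feature = directive.substr(pos + 1, len);
    std::string name = feature.compare(0, 2, "no") == 0 ? feature.substr(2) : feature;
    if (kKnownFeatures.count(name) == 0)
      unknown.push_back(feature);  // a real assembler would emit a diagnostic here
    pos = next;
  }
  return unknown;
}
```

Reporting the unknown name, rather than silently dropping it, points the user at the concatenated token and hence at the missing separator.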
* [AArch64][Falkor] Fix sched details for FMOV | Geoff Berry | 2017-05-15 | 2 | -4/+6
  llvm-svn: 303099
* [AArch64] Enable FeatureFuseAES on Cortex-A72. | Florian Hahn | 2017-05-15 | 1 | -0/+1
  This patch enables fusing dependent AESE/AESMC and AESD/AESIMC instruction pairs on Cortex-A72, as recommended in the Software Optimization Guide, section 4.10.
  llvm-svn: 303073
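The pairing rule can be sketched as a predicate over two adjacent instructions. The `Op` and `Instr` types are hypothetical stand-ins for machine instructions, and the register check reflects the "dependent pairs" wording above; the optimization guide may impose further conditions that this sketch omits.

```cpp
// Hypothetical stand-ins for machine instructions, for illustration only.
enum class Op { AESE, AESMC, AESD, AESIMC, Other };

struct Instr {
  Op op;
  int dst;  // destination register
  int src;  // source register that may carry the dependency
};

// True if `second`, scheduled right after `first`, forms a fusible pair:
// AESE followed by AESMC, or AESD followed by AESIMC, where the second
// instruction consumes the first one's result.
bool isFusibleAESPair(const Instr &first, const Instr &second) {
  bool opcodesPair = (first.op == Op::AESE && second.op == Op::AESMC) ||
                     (first.op == Op::AESD && second.op == Op::AESIMC);
  return opcodesPair && second.src == first.dst;
}
```

A scheduler mutation would use a check like this to keep such pairs back to back, so the core can issue them as one fused operation.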
* [AArch64][Falkor] Refine modeling of multiply accumulate forwarding. | Geoff Berry | 2017-05-12 | 2 | -44/+61
  llvm-svn: 302933
* [AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD. | Chad Rosier | 2017-05-11 | 1 | -0/+28
  Differential Revision: http://reviews.llvm.org/D33101.
  llvm-svn: 302822
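The identity behind the fold, written with scalar C++ stand-ins for the three instructions (the names are ours): FNMUL computes -(n*m) and FNMADD computes -(n*m) - a, so FSUB(FNMUL(n, m), a) equals FNMADD(n, m, a). Because FNMADD rounds once where FNMUL plus FSUB round twice, results can differ in the last bit, so we would expect such a combine to be gated on the floating-point contraction or fast-math settings; the commit message does not spell this out.

```cpp
#include <cmath>

// Scalar models of the AArch64 instructions (names are illustrative).
double fnmulModel(double n, double m) { return -(n * m); }  // FNMUL: -(n*m)
double fsubModel(double a, double b) { return a - b; }      // FSUB: a - b

// FNMADD: -(n*m) - a, computed with a single rounding via fused multiply-add.
double fnmaddModel(double n, double m, double a) { return -std::fma(n, m, a); }
```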
* [AArch64][RegisterBankInfo] Change the default mapping of fp stores. | Quentin Colombet | 2017-05-10 | 1 | -0/+11
  For stores, check if the stored value is defined by a floating point instruction and if yes, we return a default mapping with FPR instead of GPR.
  llvm-svn: 302679
* [AArch64] Enable use of reduction intrinsics. | Amara Emerson | 2017-05-10 | 4 | -269/+99
  The new experimental reduction intrinsics can now be used, so I'm enabling this for AArch64. We will need this for SVE anyway, so it makes sense to do this for NEON reductions as well.
  The existing code to match shufflevector patterns is replaced with a direct lowering of the reductions to AArch64-specific nodes. Tests updated with the new, simpler, representation.
  Differential Revision: https://reviews.llvm.org/D32247
  llvm-svn: 302678
* [AArch64] Fix a comment to match the code. NFC. | Martin Storsjo | 2017-05-10 | 1 | -4/+6
  For the ELF case, the default/preferred form is the generic one, not the short one as used for Apple - fix the comment to say so. Currently it is a copy-paste typo.
  Make the comments on the darwin default a bit more verbose.
  Use enum names instead of literal 0/1 to further increase readability and reduce fragility.
  Differential Revision: https://reviews.llvm.org/D32963
  llvm-svn: 302634
* Add a late IR expansion pass for the experimental reduction intrinsics. | Amara Emerson | 2017-05-10 | 1 | -0/+4
  This pass uses a new target hook to decide whether or not to expand a particular intrinsic to the shufflevector sequence.
  Differential Revision: https://reviews.llvm.org/D32245
  llvm-svn: 302631
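For a target that asks for expansion, the pass rewrites a reduction intrinsic into shuffles and vector operations. As we understand it, the sequence is a log2(N)-step tree: each step shuffles the upper half of the live lanes down and combines it with the lower half. Below is a scalar C++ model of that shape for an add reduction; the function name `reduceAdd` is ours.

```cpp
#include <array>
#include <cstddef>

// Tree reduction over N lanes (N a power of two): each step adds the upper
// half of the live lanes onto the lower half, like one shuffle + add.
template <std::size_t N>
int reduceAdd(std::array<int, N> v) {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
  for (std::size_t width = N / 2; width >= 1; width /= 2)
    for (std::size_t i = 0; i < width; ++i)
      v[i] += v[i + width];
  return v[0];
}
```

For 8 lanes this takes three steps rather than seven sequential adds, which is why the shuffle form is a reasonable generic fallback when no single reduction instruction exists.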
* [AArch64] Consider widening instructions in cost calculations | Matthew Simpson | 2017-05-09 | 3 | -6/+106
  The AArch64 instruction set has a few "widening" instructions (e.g., uaddl, saddl, uaddw, etc.) that take one or more doubleword operands and produce quadword results. The operands are automatically sign- or zero-extended as appropriate. However, in LLVM IR, these extends are explicit. This patch updates TTI to consider these widening instructions as single operations whose cost is attached to the arithmetic instruction. It marks extends that are part of a widening operation "free" and applies a sub-target specified overhead (zero by default) to the arithmetic instructions.
  Differential Revision: https://reviews.llvm.org/D32706
  llvm-svn: 302582
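What a widening instruction folds together: `UADDL Vd.8H, Vn.8B, Vm.8B` zero-extends each 8-bit lane to 16 bits and adds, which in IR is an `add` of two `zext <8 x i8> ... to <8 x i16>` values. The scalar model below (the function name `uaddl` is ours) makes the implicit extension explicit; under this patch, the two extends are what the cost model treats as free.

```cpp
#include <array>
#include <cstdint>

// Scalar model of UADDL Vd.8H, Vn.8B, Vm.8B: each 8-bit lane is
// zero-extended to 16 bits before the add, so the sum cannot wrap.
std::array<uint16_t, 8> uaddl(const std::array<uint8_t, 8> &n,
                              const std::array<uint8_t, 8> &m) {
  std::array<uint16_t, 8> d{};
  for (int i = 0; i < 8; ++i)
    d[i] = static_cast<uint16_t>(uint16_t(n[i]) + uint16_t(m[i]));
  return d;
}
```

With 200 + 100 in every lane, each result lane is 300, a value an 8-bit add would have wrapped.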
* Suppress all uses of LLVM_END_WITH_NULL. NFC. | Serge Guelton | 2017-05-09 | 1 | -1/+1
  Use variadic templates instead of relying on <cstdarg> + sentinel. This enforces better type checking and makes code more readable.
  Differential Revision: https://reviews.llvm.org/D32541
  llvm-svn: 302571
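LLVM_END_WITH_NULL marked functions whose variable argument list must end in a null sentinel. The sketch below contrasts the two styles on a hypothetical string-collecting function; it is not one of the LLVM functions the commit changed.

```cpp
#include <cstdarg>
#include <string>
#include <vector>

// Old style: a C variadic function whose argument list must end with a null
// sentinel. Argument types are not checked, and forgetting the sentinel is
// undefined behavior.
std::vector<std::string> collectOld(const char *first, ...) {
  std::vector<std::string> out;
  va_list ap;
  va_start(ap, first);
  for (const char *s = first; s != nullptr; s = va_arg(ap, const char *))
    out.push_back(s);
  va_end(ap);
  return out;
}

// New style: a variadic template. Each argument must convert to std::string
// at compile time, and no sentinel is needed.
template <typename... Ts>
std::vector<std::string> collectNew(const Ts &...args) {
  return std::vector<std::string>{std::string(args)...};
}
```

Passing an `int` to `collectNew` fails to compile, while the same mistake with `collectOld` compiles and misbehaves at run time.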
* Add extra operand to CALLSEQ_START to keep frame part set up previously | Serge Pavlov | 2017-05-09 | 4 | -8/+8
  Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors.

  This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments.

  The changes made by the patch are:
  - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform.
  - Access to frame properties is implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously.
  - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments.
  - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it.

  The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests that failed with machine verifier enabled and are listed in PR27481.
  Differential Revision: https://reviews.llvm.org/D32394
  llvm-svn: 302527
* [AArch64][RegisterBankInfo] Change the default mapping of fp loads. | Quentin Colombet | 2017-05-08 | 1 | -0/+14
  This fixes PR32550, in a way that does not imply running the greedy mode at O0.
  The fix consists in checking if a load is used by any floating point instruction and if yes, we return a default mapping with FPR instead of GPR.
  llvm-svn: 302453
* [AArch64][RegisterBankInfo] Fix mapping cost for GPR. | Quentin Colombet | 2017-05-08 | 1 | -1/+1
  In r292478, we changed the order of the enum that is referenced by PMI_FirstXXX. This had the side effect of changing the cost of the mapping of all the loads, instead of just the FPR ones.
  Reinstate the higher cost for all but GPR loads.
  Note: This did not have any externally visible effects:
  - For Fast mode, the cost would have been higher, but we don't care because we don't try to use alternative mappings.
  - For Greedy mode, the higher cost of the GPR loads would have triggered the use of the supposedly alternative mapping, which would in fact be the same GPR mapping but with a lower cost.
  llvm-svn: 302452
* [AARCH64][NEON] Add support for ISD::ABS lowering | Simon Pilgrim | 2017-05-08 | 2 | -40/+22
  Update int_aarch64_neon_abs intrinsic to use the ISD::ABS opcode directly
  Differential Revision: https://reviews.llvm.org/D32940
  llvm-svn: 302415
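ISD::ABS is integer absolute value. NEON has an ABS instruction, so the intrinsic can map onto the node directly. For targets without one, we believe the generic expansion is the classic branch-free sequence shown below: an arithmetic shift, an add and an xor. The sketch (the name `absSketch` is ours) works on `int32_t` and, like the hardware instruction, wraps INT32_MIN to itself instead of overflowing.

```cpp
#include <cstdint>

// Branch-free absolute value: y = x >> 31 is 0 for non-negative x and all
// ones for negative x, so (x + y) ^ y yields x or -x. The add and xor are
// done in uint32_t so INT32_MIN wraps to 0x80000000 instead of overflowing.
uint32_t absSketch(int32_t x) {
  uint32_t y = static_cast<uint32_t>(x >> 31);  // arithmetic shift on mainstream compilers
  return (static_cast<uint32_t>(x) + y) ^ y;
}
```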
* [RegisterBankInfo] Uniquely allocate instruction mapping. | Quentin Colombet | 2017-05-05 | 2 | -33/+34
  This is a step toward having statically allocated instruction mapping. We are going to tablegen them eventually, so let us reflect that in the API.
  NFC.
  llvm-svn: 302316