bcm5719-llvm/llvm/test/Analysis/CostModel, branch meklort-10.0.1

Project Ortega BCM5719 LLVM
https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1
Updated: 2020-01-06T13:17:02+00:00

[CostModel][X86] Add missing scalar i64->f32 uitofp costs
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Updated: 2020-01-06T13:17:02+00:00
Published: 2020-01-06T13:16:43+00:00
urn:sha1:5d986a68a59c9bed7060e87840e61390d8247c1d

Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351
Author: Fangrui Song <maskray@google.com>
Updated: 2019-12-24T23:57:33+00:00
Published: 2019-12-24T23:52:21+00:00
urn:sha1:502a77f125f43ffde57af34d3fd1b900248a91cd

[AMDGPU] Implemented fma cost analysis
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Updated: 2019-12-19T07:54:20+00:00
Published: 2019-12-18T21:29:21+00:00
urn:sha1:58578f705663a9f31b906a341f0a61ce51f7dcb2

Differential Revision: https://reviews.llvm.org/D71676

[AMDGPU] Fixed cost model for packed 16 bit ops
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Updated: 2019-12-17T23:14:17+00:00
Published: 2019-12-17T19:16:06+00:00
urn:sha1:b8ac5894a115987fcc7e871049ec31a8eba66741

Differential Revision: https://reviews.llvm.org/D71622

[ARM] Teach the Arm cost model that a Shift can be folded into other instructions
Author: David Green <david.green@arm.com>
Updated: 2019-12-09T10:24:33+00:00
Published: 2019-12-08T15:33:24+00:00
urn:sha1:be7a1070700e591732b254e29f2dd703325fb52a

This attempts to teach the cost model in Arm that code such as:
    %s = shl i32 %a, 3
    %a = and i32 %s, %b
Can under Arm or Thumb2 become:
    and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instructions that the shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp.

Differential Revision: https://reviews.llvm.org/D70966

[ARM] Additional tests and minor formatting. NFC
Author: David Green <david.green@arm.com>
Updated: 2019-12-09T10:24:33+00:00
Published: 2019-12-08T15:26:32+00:00
urn:sha1:f008b5b8ce724d60f0f0eeafceee0119c42022d4

This adds some extra cost model tests for shifts, and makes some minor adjustments to some Neon code to make it clear what it applies to. Both NFC.

[x86] add cost model special-case for insert/extract from element 0
Author: Sanjay Patel <spatel@rotateright.com>
Updated: 2019-12-06T18:50:25+00:00
Published: 2019-12-06T18:29:31+00:00
urn:sha1:7ff0fcb53f6e71bc22d37494fdfa68bbf2d3709b

This is a follow-up to D70607 where we made any extract element on SLM more costly than default. But that is pessimistic for extract from element 0 because that corresponds to x86 movd/movq instructions. These generally have >1 cycle latency, but they are probably implemented as single uop instructions.

Note that no vectorization tests are affected by this change. Also, no targets besides SLM are affected because those are falling through to the default cost of 1 anyway. But this will become visible/important if we add more specializations via cost tables.

Differential Revision: https://reviews.llvm.org/D71023

[PowerPC] Separate Features that are known to be Power9 specific from Future CPU
Author: Stefan Pintilie <stefanp@ca.ibm.com>
Updated: 2019-11-27T21:40:13+00:00
Published: 2019-11-27T21:38:05+00:00
urn:sha1:8e84c9ae99846c91c4e9828f1945c200d26d2fb9

The Power 9 CPU has some features that are unlikely to be passed on to future versions of the CPU. This patch separates these out so that the future CPU does not inherit them.

Differential Revision: https://reviews.llvm.org/D70466

[x86] make SLM extract vector element more expensive than default
Author: Sanjay Patel <spatel@rotateright.com>
Updated: 2019-11-27T19:08:56+00:00
Published: 2019-11-27T18:33:11+00:00
urn:sha1:5c166f1d1969e9c1e5b72aa672add429b9c22b53

I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here:
https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc

The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont.

This is a small step towards the larger motivation discussed in PR43605:
https://bugs.llvm.org/show_bug.cgi?id=43605

Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets.

Differential Revision: https://reviews.llvm.org/D70607

AMDGPU: Split test functions to avoid dependency on subtarget
Author: Matt Arsenault <Matthew.Arsenault@amd.com>
Updated: 2019-11-19T05:42:13+00:00
Published: 2019-11-18T06:54:31+00:00
urn:sha1:b337bce8710f2a7ab8ce9f84c80cfbce1032963c

Prepare this test for moving the denormal setting out of the subtarget features.
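
Illustration for the [ARM] shift-folding entry above (D70966): the rule it describes, a shift with a single user of a foldable kind being treated as free, can be sketched as a toy model. This is not the LLVM implementation and does not use any LLVM API. The `Instr` class, the `shift_cost` function, the lowercase opcode strings, and the default cost of 1 are all hypothetical names and values chosen for the sketch. Only the list of foldable user kinds (Add, Sub, And, Or, Xor, ICmp) and the single-user restriction come from the commit message.

```python
from dataclasses import dataclass, field
from typing import List

# User kinds the commit message names as able to absorb a shifted operand.
FOLDABLE_USERS = {"add", "sub", "and", "or", "xor", "icmp"}


@dataclass
class Instr:
    """Hypothetical stand-in for an IR instruction: an opcode plus its users."""
    opcode: str
    users: List["Instr"] = field(default_factory=list)


def shift_cost(shift: Instr, base_cost: int = 1) -> int:
    """Toy cost of a shift instruction.

    Returns 0 when the shift has exactly one user and that user is a kind
    the shift can be folded into; otherwise returns base_cost (an assumed
    default, not a value taken from LLVM).
    """
    if len(shift.users) == 1 and shift.users[0].opcode in FOLDABLE_USERS:
        return 0
    return base_cost


# A shl feeding a single "and" is modelled as free; one feeding a "mul" is not.
folded = Instr("shl", users=[Instr("and")])
unfolded = Instr("shl", users=[Instr("mul")])
print(shift_cost(folded), shift_cost(unfolded))  # prints "0 1"
```

The sketch inspects the users of the shift rather than lowering the cost of the "and", mirroring the commit's stated reason for passing the actual instruction into the cost query.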