summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [WebAssembly] Lower vselectThomas Lively2018-11-011-0/+9
| | | | | | | | | | Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53630 llvm-svn: 345797
* [WebAssembly] Process p2align operands for SIMD loads and storesThomas Lively2018-10-311-0/+12
| | | | | | | | | | Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53886 llvm-svn: 345795
* [WebAssembly] Handle vector IMPLICIT_DEFs.Thomas Lively2018-10-311-0/+5
| | | | | | | | | | | | | | Summary: Also reduce the test case for implicit defs and test it with all register classes. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53855 llvm-svn: 345794
* [COFF, ARM64] Implement Intrinsic.sponentry for AArch64Mandeep Singh Grang2018-10-313-0/+29
| | | | | | | | | | | | | | Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma Reviewed By: efriedma Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53673 llvm-svn: 345791
* [AArch64] Sort switch cases (NFC)Evandro Menezes2018-10-311-20/+23
| | | | llvm-svn: 345786
* Revert r345165 "[X86] Bring back the MOV64r0 pseudo instruction"Craig Topper2018-10-315-41/+43
| | | | | | Google is reporting regressions on some benchmarks. llvm-svn: 345785
* [ARM] Add missing pseudo-instruction for Thumb1 RSBS.Eli Friedman2018-10-312-0/+7
| | | | | | | | | Shows up rarely for 64-bit arithmetic, more frequently for the compare patterns added in r325323. Differential Revision: https://reviews.llvm.org/D53848 llvm-svn: 345782
* Check shouldReduceLoadWidth from SimplifySetCCStanislav Mekhanoshin2018-10-311-0/+12
| | | | | | | | | | | | SimplifySetCC could shrink a load without checking for profitability or legality of such shink with a target. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword as scalar engine does not support it. Differential Revision: https://reviews.llvm.org/D53846 llvm-svn: 345778
* [AMDGPU] Remove FeatureVGPRSpillingScott Linder2018-10-316-48/+8
| | | | | | | | | | | This feature is only relevant to shaders, and is no longer used. When disabled, lowering of reserved registers for shaders causes a compiler crash. Remove the feature and add a test for compilation of shaders at OptNone. Differential Revision: https://reviews.llvm.org/D53829 llvm-svn: 345763
* [Hexagon] Make sure not to use GP-relative addressing with PICKrzysztof Parzyszek2018-10-313-4/+10
| | | | | | | Make sure that -relocation-model=pic prevents use of GP-relative addressing modes. llvm-svn: 345731
* AMDGPU: Rewrite SILowerI1Copies to always stay on SALUNicolai Haehnle2018-10-316-189/+749
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow. Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs. This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization". There are still some relevant cases where code quality could be improved, in particular: - We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow. - The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform. Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D53496 llvm-svn: 345719
* AMDGPU: Remove PHI loop condition optimizationNicolai Haehnle2018-10-315-138/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The optimization to early break out of loops if all threads are dead was never fully implemented. But the PHI node analyzing is actually causing a number of problems, so remove all the extra code for it. (This does actually regress code quality in a few places because it ends up relying more heavily on phi's of i1, which we don't do a great job with. However, since it fixes real bugs in the wild, we should take this change. I have some prototype changes to improve i1 lowering in general -- not just for control flow -- which should help recover the code quality, I just need to make those changes fit for general consumption. -- Nicolai) Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1 Patch-by: Christian König <christian.koenig@amd.com> Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53359 llvm-svn: 345718
* [tblgen][PredicateExpander] Add the ability to describe more complex ↵Andrea Di Biagio2018-10-312-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | constraints on instruction operands. Before this patch, class PredicateExpander only knew how to expand simple predicates that performed checks on instruction operands. In particular, the new scheduling predicate syntax was not rich enough to express checks like this one: Foo(MI->getOperand(0).getImm()) == ExpectedVal; Here, the immediate operand value at index zero is passed in input to function Foo, and ExpectedVal is compared against the value returned by function Foo. While this predicate pattern doesn't show up in any X86 model, it shows up in other upstream targets. So, being able to support those predicates is fundamental if we want to be able to modernize all the scheduling models upstream. With this patch, we allow users to specify if a register/immediate operand value needs to be passed in input to a function as part of the predicate check. Now, register/immediate operand checks all derive from base class CheckOperandBase. This patch also changes where TIIPredicate definitions are expanded by the instructon info emitter. Before, definitions were expanded in class XXXGenInstrInfo (where XXX is a target name). With the introduction of this new syntax, we may want to have TIIPredicates expanded directly in XXXInstrInfo. That is because functions used by the new operand predicates may only exist in the derived class (i.e. XXXInstrInfo). This patch is a non functional change for the existing scheduling models. In future, we will be able to use this richer syntax to better describe complex scheduling predicates, and expose them to llvm-mca. Differential Revision: https://reviews.llvm.org/D53880 llvm-svn: 345714
* [AMDGPU] support image load/store a16Neil Henning2018-10-311-2/+4
| | | | | | | | | | | | Our a16 support was only enabled for sample/gather and buffer load/store, but not for image load/store operations (which take an i16 as the pixel index rather than a half). Fix our isel lowering and add test cases to prove it out. Differential Revision: https://reviews.llvm.org/D53750 llvm-svn: 345710
* [LV] Support vectorization of interleave-groups that require an epilog underDorit Nuzman2018-10-3112-33/+63
| | | | | | | | | | | | | | | | | | | | | | optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705
* [ARM64] [Windows] Exception handling support in frame loweringSanjin Sijaric2018-10-316-52/+398
| | | | | | | | | | Emit pseudo instructions indicating unwind codes corresponding to each instruction inside the prologue/epilogue. These are used by the MCLayer to populate the .xdata section. Differential Revision: https://reviews.llvm.org/D50288 llvm-svn: 345701
* [AArch64] Mark condition flags and x16/x17 as clobbered when calling __chkstkMartin Storsjo2018-10-311-0/+6
| | | | | | | | This is similar to SVN r311061 for ARM. Differential Revision: https://reviews.llvm.org/D53878 llvm-svn: 345698
* Revert r345542: AMDGPU: Enable code object v3 by defaultKonstantin Zhuravlyov2018-10-301-30/+15
| | | | | | It breaks mesa. llvm-svn: 345662
* [COFF, ARM64] Make sure to forward arguments from vararg to musttail varargMandeep Singh Grang2018-10-302-0/+27
| | | | | | | | | | | | | | | | | | Summary: Thunk functions in Windows are varag functions that call a musttail function to pass the arguments after the fixup is done. We need to make sure that we forward the arguments from the caller vararg to the callee vararg function. This is the same mechanism that is used for Windows on X86. Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53843 llvm-svn: 345641
* [AArch64] [Windows] SEH opcodes should be scheduling boundaries.Eli Friedman2018-10-302-0/+41
| | | | | | | | | | | | | | Prevents the post-RA scheduler from modifying the prologue sequences emitting by frame lowering. This is roughly similar to what we do for other targets: TargetInstrInfo::isSchedulingBoundary checks isPosition(), which checks for CFI_INSTRUCTION. isSEHInstruction is taken from D50288; it'll land with whatever patch lands first. Differential Revision: https://reviews.llvm.org/D53851 llvm-svn: 345634
* [AArch64] Create proper memoperand for multi-vector storesDavid Greene2018-10-301-1/+1
| | | | | | | | | | | | Re-apply r345315 with testcase fixes. Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is. Differential Revision: https://reviews.llvm.org/D52816 llvm-svn: 345631
* [X86] In lowerVectorShuffleAsBroadcast, make peeking through CONCAT_VECTORS ↵Craig Topper2018-10-301-1/+2
| | | | | | | | | | | | work correctly if we already walked through a bitcast that changed the element size. The CONCAT_VECTORS case was using the original mask element count to determine how to adjust the broadcast index. But if we looked through a bitcast the original mask size doesn't tell us anything about the concat_vectors. This patch switchs to using the concat_vectors input element count directly instead. Differential Revision: https://reviews.llvm.org/D53823 llvm-svn: 345626
* [SystemZ] Simplify LRV/STRV ISD nodesUlrich Weigand2018-10-304-47/+40
| | | | | | | | | | | The LRV and STRV nodes carry an extra operand to indicate the type of the memory access. This is redundant, since the nodes are actually of class MemIntrinsicNode and therefore hold that same information already as MemoryVT. NFC intended. llvm-svn: 345618
* [SystemZ] Improve isFoldableLoad() for Sub, SDiv and UDiv.Jonas Paulsson2018-10-301-0/+5
| | | | | | | | | | Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a load. This patch adds a check for this. Review: Ulrich Weigand https://reviews.llvm.org/D53791 llvm-svn: 345596
* [X86] Re-enable the machine verifier after fixing more testsFrancis Visoiu Mistrih2018-10-301-4/+0
| | | | | | Was disabled again in r345528. Hopefully this the bots. llvm-svn: 345593
* [X86][BMI1] X86DAGToDAGISel: select BEXTR from x & (-1 >> (32 - y)) patternRoman Lebedev2018-10-302-58/+40
| | | | | | | | | | | | | | | | | | | | | Summary: The final pattern. There is no test changes: * We are looking for the pattern with one-use of it's mask, * If the mask is one-use, D48768 will unfold it into pattern d. * Thus, the tests have extra-use on the mask. * Thus, only the BMI2 BZHI can be tested, and it already worked. * So there is no BMI1 test coverage, we just assume it works since it uses the same codepath. Reviewers: craig.topper, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53575 llvm-svn: 345584
* [AArch64] Add support for UDF instructionDiogo N. Sampaio2018-10-302-10/+29
| | | | | | | | | | | | | | | | Summary: Add support for AArch64 UDF instruction. UDF - Permanently Undefined generates an Undefined Instruction exception (ESR_ELx.EC = 0b000000). Reviewers: DavidSpickett, javed.absar, t.p.northover Reviewed By: javed.absar Subscribers: nhaehnle, kristof.beyls Differential Revision: https://reviews.llvm.org/D53319 llvm-svn: 345581
* [SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodesSimon Pilgrim2018-10-301-16/+18
| | | | | | | | | | Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy. This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle. Differential Revision: https://reviews.llvm.org/D53760 llvm-svn: 345578
* [LegalizeTypes] Teach PromoteIntRes_BITCAST to better handle a bitcast with ↵Craig Topper2018-10-301-8/+2
| | | | | | | | | | | | | | | | vector output type and a vector input type that needs to be widened Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86. Reviewers: efriedma, RKSimon Reviewed By: efriedma Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53229 llvm-svn: 345567
* [X86] Cleanup the code in LowerFABSorFNEG and LowerFCOPYSIGN a little. NFCCraig Topper2018-10-301-30/+20
| | | | | | Use SelectionDAG::EVTToAPFloatSemantics. Make the LogicVT calculation in LowerFABSorFNEG similar to LowerFCOPYSIGN. Use APInt::getSignedMaxValue instead of ~APInt::getSignMask. llvm-svn: 345565
* [X86] Stop changing f128 fand/for/fxor to v2i64.Craig Topper2018-10-302-21/+38
| | | | | | The additional patterns don't cost us much and it seems better than changing element widths. llvm-svn: 345564
* AMDGPU: Remove custom BUILD_VECTOR combineMatt Arsenault2018-10-302-46/+0
| | | | | | | This was looping in a testcase and removing it now slightly improves a test. llvm-svn: 345560
* AMDGPU: Use scavengeRegisterBackwardsMatt Arsenault2018-10-301-2/+3
| | | | llvm-svn: 345559
* Remove unneeded friend declarations that clang-cl warns onReid Kleckner2018-10-292-4/+0
| | | | llvm-svn: 345549
* AMDGPU: Enable code object v3 by defaultKonstantin Zhuravlyov2018-10-291-15/+30
| | | | | | Differential Revision: https://reviews.llvm.org/D53525 llvm-svn: 345542
* [X86] Set isMachineVerifierClean() back to false (PR27481)Simon Pilgrim2018-10-291-0/+4
| | | | | | Put back the isMachineVerifierClean() override removed at rL345513 to fix Windows ThinLTO tests llvm-svn: 345528
* [WebAssembly] Lower away condition truncations for scalar selectsThomas Lively2018-10-292-0/+14
| | | | | | | | | | Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53676 llvm-svn: 345521
* [X86][SSE] getFauxShuffleMask - Fix shuffle mask adjustment for multiple ↵Simon Pilgrim2018-10-291-4/+3
| | | | | | | | inserted subvectors Part of the issue discovered in PR39483, although its not fully exposed until I reapply rL345395 (by reverting rL345451) llvm-svn: 345520
* [X86] Add AES to KNL CPUs to match clang.Craig Topper2018-10-291-0/+1
| | | | | | I believe this was lost from KNL when AES was pushed from Westmere to Skylake recently. KNL used to inherit from IVB. llvm-svn: 345519
* [AMDGPU] Fixed return value causing warning and regressionStanislav Mekhanoshin2018-10-291-1/+1
| | | | llvm-svn: 345518
* [AArch64] Rename FP16FML instruction format (NFC)Bryan Chan2018-10-292-72/+78
| | | | | | | Rename SIMDThreeSameMult (etc.) to SIMDThreeSameVectorFML (etc.) to follow usual naming convention, and add some comments in the .td files. llvm-svn: 345515
* [AMDGPU] Match v_swap_b32Stanislav Mekhanoshin2018-10-292-0/+175
| | | | | | Differential Revision: https://reviews.llvm.org/D52677 llvm-svn: 345514
* [X86] Enable the MachineVerifier by defaultFrancis Visoiu Mistrih2018-10-291-4/+0
| | | | | | | | | | | | | | | The machine verifier was disabled for x86 by default. There are now only 9 tests failing, compared to what previously was between 20 and 30. This is a good opportunity to file bugs for all the remaining issues, then explicitly disable the failing tests and enabling the machine verifier by default. This allows us to avoid adding new tests that break the verifier. PR27481 llvm-svn: 345513
* [AArch64] Return address signing B key supportLuke Cheeseman2018-10-291-3/+20
| | | | | | | | | | | - Add support to generate AUTIBSP, PACIBSP, RETAB instructions for return address signing - The key used to sign the function is controlled by the function attribute "sign-return-address-key" Differential Revision: https://reviews.llvm.org/D51427 llvm-svn: 345511
* [X86] Force floating point values in constant pool decoding to print in ↵Craig Topper2018-10-291-1/+2
| | | | | | | | | | scientific notation so they can't be confused with integers. When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction. Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that. llvm-svn: 345488
* [X86] Recognize constant splats in LowerFCOPYSIGN.Craig Topper2018-10-281-1/+1
| | | | llvm-svn: 345484
* [VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support.Simon Pilgrim2018-10-281-8/+20
| | | | | | | | Add vector support to TargetLowering::expandFP_TO_UINT. This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this. llvm-svn: 345473
* AMD BdVer2 (Piledriver) Initial Scheduler modelRoman Lebedev2018-10-273-2/+1293
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: # Overview This is somewhat partial. * Latencies are good {F7371125} * All of these remaining inconsistencies //appear// to be noise/noisy/flaky. * NumMicroOps are somewhat good {F7371158} * Most of the remaining inconsistencies are from `Ld` / `Ld_ReadAfterLd` classes * Actual unit occupation (pipes, `ResourceCycles`) are undiscovered lands, i did not really look there. They are basically verbatum copy from `btver2` * Many `InstRW`. And there are still inconsistencies left... To be noted: I think this is the first new schedule profile produced with the new next-gen tools like llvm-exegesis! # Benchmark I realize that isn't what was suggested, but i'll start with some "internal" public real-world benchmark i understand - [[ https://github.com/darktable-org/rawspeed | RawSpeed raw image decoding library ]]. Diff (the exact clang from trunk without/with this patch): ``` Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_mean -0.0607 -0.0604 234 219 233 219 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_median -0.0630 -0.0626 233 219 233 219 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_stddev +0.2581 +0.2587 1 2 1 2 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_mean -0.0770 -0.0767 144 133 144 133 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_median -0.0767 -0.0763 144 133 144 133 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_stddev -0.4170 -0.4156 1 0 1 0 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_mean -0.0271 -0.0270 463 450 463 450 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_median -0.0093 -0.0093 453 449 453 449 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_stddev -0.7280 -0.7280 13 4 13 4 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_pvalue 0.0004 0.0004 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_mean -0.0065 -0.0065 569 565 569 565 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_median -0.0077 -0.0077 569 564 569 564 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_stddev +1.0077 +1.0068 2 5 2 5 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_pvalue 0.0220 0.0199 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_mean +0.0006 +0.0007 312 312 312 312 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_median +0.0031 +0.0032 311 312 311 312 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_stddev -0.7069 -0.7072 4 1 4 1 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_pvalue 0.0004 0.0004 U Test, Repetitions: 25 vs 25 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_mean -0.0015 -0.0015 141 141 141 141 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_median -0.0010 -0.0011 141 141 141 141 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_stddev -0.1486 -0.1456 0 0 0 0 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_pvalue 0.6139 0.8766 U Test, Repetitions: 25 vs 25 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_mean -0.0008 -0.0005 60 60 60 60 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_median -0.0006 -0.0002 60 60 60 60 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_stddev -0.1467 -0.1390 0 0 0 0 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_pvalue 0.0137 0.0137 U Test, Repetitions: 25 vs 25 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_mean +0.0002 +0.0002 275 275 275 275 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_median -0.0015 -0.0014 275 275 275 275 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_stddev +3.3687 +3.3587 0 2 0 2 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_pvalue 0.4041 0.3933 U Test, Repetitions: 25 vs 25 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_mean +0.0004 +0.0004 67 67 67 67 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_median -0.0000 -0.0000 67 67 67 67 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_stddev +0.1947 +0.1995 0 0 0 0 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_pvalue 0.0074 0.0001 U Test, Repetitions: 25 vs 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_mean -0.0092 +0.0074 547 542 25 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_median -0.0054 +0.0115 544 541 25 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_stddev -0.4086 -0.3486 8 5 0 0 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_pvalue 0.3320 0.0000 U Test, Repetitions: 25 vs 25 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_mean +0.0015 +0.0204 218 218 12 12 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_median +0.0001 +0.0203 218 218 12 12 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_stddev +0.2259 +0.2023 1 1 0 0 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_pvalue 0.0000 0.0001 U Test, Repetitions: 25 vs 25 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_mean -0.0209 -0.0179 96 94 90 88 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_median -0.0182 -0.0155 95 93 90 88 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_stddev -0.6164 -0.2703 2 1 2 1 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_mean -0.0098 -0.0098 176 175 176 175 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_median -0.0126 -0.0126 176 174 176 174 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_stddev +6.9789 +6.9157 0 2 0 2 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_mean -0.0237 -0.0238 474 463 474 463 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_median -0.0267 -0.0267 473 461 473 461 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_stddev +0.7179 +0.7178 3 5 3 5 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_pvalue 0.6837 0.6554 U Test, Repetitions: 25 vs 25 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_mean -0.0014 -0.0013 1375 1373 1375 1373 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_median +0.0018 +0.0019 1371 1374 1371 1374 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_stddev -0.7457 -0.7382 11 3 10 3 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_mean -0.0080 -0.0289 22 22 10 10 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_median -0.0070 -0.0287 22 22 10 10 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_stddev +1.0977 +0.6614 0 0 0 0 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_mean +0.0132 +0.0967 35 36 10 11 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_median +0.0132 +0.0956 35 36 10 11 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_stddev -0.0407 -0.1695 0 0 0 0 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_mean +0.0331 +0.1307 13 13 6 6 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_median +0.0430 +0.1373 12 13 6 6 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_stddev -0.9006 -0.8847 1 0 0 0 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_pvalue 0.0016 0.0010 U Test, Repetitions: 25 vs 25 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_mean -0.0023 -0.0024 395 394 395 394 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_median -0.0029 -0.0030 395 394 395 393 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_stddev -0.0275 -0.0375 1 1 1 1 Phase One/P65/CF027310.IIQ/threads:8/real_time_pvalue 0.0232 0.0000 U Test, Repetitions: 25 vs 25 Phase One/P65/CF027310.IIQ/threads:8/real_time_mean -0.0047 +0.0039 114 113 28 28 Phase One/P65/CF027310.IIQ/threads:8/real_time_median -0.0050 +0.0037 114 113 28 28 Phase One/P65/CF027310.IIQ/threads:8/real_time_stddev -0.0599 -0.2683 1 1 0 0 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_mean +0.0206 +0.0207 405 414 405 414 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_median +0.0204 +0.0205 405 414 405 414 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_stddev +0.2155 +0.2212 1 1 1 1 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_mean -0.0109 -0.0108 147 145 147 145 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_median -0.0104 -0.0103 147 145 147 145 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_stddev -0.4919 -0.4800 0 0 0 0 Samsung/NX3000/_3184416.SRW/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX3000/_3184416.SRW/threads:8/real_time_mean -0.0149 -0.0147 220 217 220 217 Samsung/NX3000/_3184416.SRW/threads:8/real_time_median -0.0173 -0.0169 221 217 220 217 Samsung/NX3000/_3184416.SRW/threads:8/real_time_stddev +1.0337 +1.0341 1 3 1 3 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_pvalue 0.0001 0.0001 U Test, Repetitions: 25 vs 25 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_mean -0.0019 -0.0019 194 193 194 193 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_median -0.0021 -0.0021 194 193 194 193 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_stddev -0.4441 -0.4282 0 0 0 0 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_pvalue 0.0000 0.4263 U Test, Repetitions: 25 vs 25 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_mean +0.0258 -0.0006 81 83 19 19 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_median +0.0235 -0.0011 81 82 19 19 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_stddev +0.1634 +0.1070 1 1 0 0 ``` {F7443905} If we look at the `_mean`s, the time column, the biggest win is `-7.7%` (`Canon/EOS 5D Mark II/10.canon.sraw2.cr2`), and the biggest loose is `+3.3%` (`Panasonic/DC-GH5S/P1022085.RW2`); Overall: mean `-0.7436%`, median `-0.23%`, `cbrt(sum(time^3))` = `-8.73%` Looks good so far i'd say. llvm-exegesis details: {F7371117} {F7371125} {F7371128} {F7371144} {F7371158} Reviewers: craig.topper, RKSimon, andreadb, courbet, avt77, spatel, GGanesh Reviewed By: andreadb Subscribers: javed.absar, gbedwell, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52779 llvm-svn: 345463
* [X86][SSE] LowerVSELECT - pull out repeated getOperand(). NFCI.Simon Pilgrim2018-10-271-11/+13
| | | | llvm-svn: 345458
* Revert rL345395: [X86][SSE] Move 2-input limit up from getFauxShuffleMask to ↵Simon Pilgrim2018-10-271-2/+6
| | | | | | | | | | resolveTargetShuffleInputs Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed. ........ Its been reported that this is causing out of trunk failures. llvm-svn: 345451
OpenPOWER on IntegriCloud