bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[DAGCombine] Add undef shuffle elt support to partitionShuffleOfConcats	Simon Pilgrim	2019-02-25	2	-54/+40
\| \| \| \| \| \| \| \|	Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold Differential Revision: https://reviews.llvm.org/D58585 llvm-svn: 354793
*	[clangd] Drop documentation in static index if symbols are not indexed for ↵	Haojian Wu	2019-02-25	5	-8/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	completion. Summary: This is a further optimization of r350803, we drop docs in static index for symbols not being indexed for completion, while keeping the docs in dynamic index (we rely on dynamic index to get docs for class members). Reviewers: ilya-biryukov Reviewed By: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Differential Revision: https://reviews.llvm.org/D56539 llvm-svn: 354792
*	[ARM] Add some more missing T1 opcodes for the peephole optimisier	David Green	2019-02-25	4	-44/+456
\| \| \| \| \| \| \| \| \| \| \|	This adds a few extra Thumb1 opcodes to improve the peephole opimisers ability to remove redundant cmp instructions. tADC and tSBC require a small fixup to prevent MOVS being moved past the instruction, giving the wrong flags. Differential Revision: https://reviews.llvm.org/D58281 llvm-svn: 354791
*	[Vectorizer] Add vectorization support for fixed smul/umul intrinsics	Simon Pilgrim	2019-02-25	4	-1001/+773
\| \| \| \| \| \| \| \|	This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix its the third argument. Differential Revision: https://reviews.llvm.org/D58616 llvm-svn: 354790
*	[AArch64] Add support for Cortex-A76 and Cortex-A76AE	Luke Cheeseman	2019-02-25	4	-0/+57
\| \| \| \| \| \| \| \| \| \|	- Add LLVM backend support for Cortex-A76 and Cortex-A76AE - Documentation can be found at https://developer.arm.com/products/processors/cortex-a/cortex-a76 Differential Revision: https://reviews.llvm.org/D57764 llvm-svn: 354789
*	[AArch64] Add support for Cortex-A76 and Cortex-A76AE	Luke Cheeseman	2019-02-25	15	-3/+81
\| \| \| \| \| \| \| \|	- Add LLVM backend support for Cortex-A76 and Cortex-A76AE - Documentation can be found at https://developer.arm.com/products/processors/cortex-a/cortex-a76 llvm-svn: 354788
*	[llvm-objcopy] Add --add-symbol	Eugene Leviant	2019-02-25	7	-2/+182
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D58234 llvm-svn: 354787
*	Moved clangd docs to a separate directory in preparation to restructure them ↵	Dmitri Gribenko	2019-02-25	3	-180/+183
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	into multiple files Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, kadircet, jdoerfert, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58607 llvm-svn: 354786
*	Fixed typos in tests: s/CHEKC/CHECK/	Dmitri Gribenko	2019-02-25	6	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: ilya-biryukov Subscribers: nemanjai, javed.absar, jsji, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D58611 llvm-svn: 354785
*	[TTI] Add generic cost model for smul/umul overflow intrinsics	Simon Pilgrim	2019-02-25	2	-36/+408
\| \| \| \| \| \|	Based off smul/umul fixed costs and the implementation in TargetLowering::expandMULO. llvm-svn: 354784
*	[SLPVectorizer][X86] Add fixed smul/umul tests	Simon Pilgrim	2019-02-25	1	-0/+2007
\| \| \| \| \| \|	Baseline tests - fixed mul intrinsics aren't flagged as vectorizable yet llvm-svn: 354783
*	[llvm-objdump] Add `Version References` dumper	Xing GUO	2019-02-25	5	-1/+171
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add symbol version dumper for [#30241](https://bugs.llvm.org/show_bug.cgi?id=30241) Reviewers: jhenderson, MaskRay, kristina, emaste, grimar Reviewed By: jhenderson, grimar Subscribers: grimar, rupprecht, jakehehrlich, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54697 llvm-svn: 354782
*	Fixed typos in tests: s/CEHCK/CHECK/	Dmitri Gribenko	2019-02-25	10	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: ilya-biryukov Subscribers: sanjoy, sdardis, javed.absar, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58608 llvm-svn: 354781
*	[clang-tidy] misc-string-integer-assignment: ignore toupper/tolower	Clement Courbet	2019-02-25	2	-4/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Tis represents ~20% of false positives. See PR27723. Reviewers: xazax.hun, alexfh Subscribers: rnkovacs, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58604 llvm-svn: 354780
*	Updated the documentation build instructions for the current CMake build system	Dmitri Gribenko	2019-02-25	1	-8/+7
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: ilya-biryukov Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58603 llvm-svn: 354779
*	Fixed grammar in index.rst	Dmitri Gribenko	2019-02-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Subscribers: arphaman, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58601 llvm-svn: 354778
*	Removed an unhelpful comment in index.rst	Dmitri Gribenko	2019-02-25	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: ilya-biryukov Subscribers: arphaman, jdoerfert, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58602 llvm-svn: 354777
*	Test commit (remove a blank space)	Ganesh Gopalasubramanian	2019-02-25	1	-1/+1
\| \| \| \| \|	Change-Id: I69175571d3b1defeb85e96fdd87db5c3ccadcb63 llvm-svn: 354775
*	[TTI] Add generic cost model for fixed point smul/umul	Simon Pilgrim	2019-02-25	2	-36/+402
\| \| \| \| \| \| \| \|	Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline. Differential Revision: https://reviews.llvm.org/D57925 llvm-svn: 354774
*	[SYCL] Add clang front-end option to enable SYCL device compilation flow.	Alexey Bader	2019-02-25	5	-1/+22
\| \| \| \| \| \|	Patch by Mariya Podchishchaeva <mariya.podchishchaeva@intel.com> llvm-svn: 354773
*	[mips] Reduce number of tools invocations in the test. NFC	Simon Atanasyan	2019-02-25	1	-74/+41
\| \| \| \|	llvm-svn: 354772
*	[X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483)	Simon Pilgrim	2019-02-25	3	-71/+56
\| \| \| \| \| \| \| \| \| \|	Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results. Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB. Differential Revision: https://reviews.llvm.org/D58597 llvm-svn: 354771
*	[yaml2obj]Re-allow dynamic sections to have raw content	James Henderson	2019-02-25	4	-4/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recently, support was added to yaml2obj to allow dynamic sections to have a list of entries, to make it easier to write tests with dynamic sections. However, this change also removed the ability to provide custom contents to the dynamic section, making it hard to test malformed contents (e.g. because the section is not a valid size to contain an array of entries). This change reinstates this. An error is emitted if raw content and dynamic entries are both specified. Reviewed by: grimar, ruiu Differential Review: https://reviews.llvm.org/D58543 llvm-svn: 354770
*	[ELF][ARM] Accept and ignore -p and -no-pipleline-knowledge	Peter Smith	2019-02-25	2	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	The linux kernel uses an old flag -p/-no-pipeline-knowledge that is accepted by bfd and gold but ignored by modern versions of them. The original option is very old and is pre-ABI, it sometimes comes up in code-bases that had support for pre ABI toolchains. The Linux kernel uses it in 3 places in the ARM specific section. Differential Revision: https://reviews.llvm.org/D58540 llvm-svn: 354769
*	[ARM] Make fullfp16 instructions not conditionalisable.	Simon Tatham	2019-02-25	10	-10/+297
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	More or less all the instructions defined in the v8.2a full-fp16 extension are defined as UNPREDICTABLE if you put them in an IT block (Thumb) or use with any condition other than AL (ARM). LLVM didn't know that, and was happy to conditionalise them. In order to force these instructions to count as not predicable, I had to make a small Tablegen change. The code generation back end mostly decides if an instruction was predicable by looking for something it can identify as a predicate operand; there's an isPredicable bit flag that overrides that check in the positive direction, but nothing that overrides it in the negative direction. (I considered the alternative approach of actually removing the predicate operand from those instructions, but thought that it would be more painful overall for instructions differing only in data type to have different shapes of operand list. This way, the only code that has to notice the difference is the if-converter.) So I've added an isUnpredicable bit alongside isPredicable, and set that bit on the right subset of FP16 instructions, and also on the VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be unpredicable for all data types. I've included a couple of representative regression tests, both of which previously caused an fp16 instruction to be conditionalised in ARM state and (with -arm-no-restrict-it) to be put in an IT block in Thumb. Reviewers: SjoerdMeijer, t.p.northover, efriedma Reviewed By: efriedma Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57823 llvm-svn: 354768
*	[llvm-exegesis] Split Epsilon param into two (PR40787)	Roman Lebedev	2019-02-25	8	-24/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the llvm values What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon is still small, so there is a better chance that noisy measurements from the same opcode will go into different clusters) By splitting it into two params it is now possible. This is nearly-free performance-wise: Old: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% ) 12 context-switches # 31.735 M/sec ( +- 27.38% ) 0 cpu-migrations # 0.000 K/sec 4745 page-faults # 12183.732 M/sec ( +- 0.54% ) 1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%) 185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%) 392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%) 1839236666 instructions # 1.18 insn per cycle # 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%) 407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%) 10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%) 0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% ) 262 context-switches # 38.546 M/sec ( +- 23.06% ) 0 cpu-migrations # 0.065 M/sec ( +- 76.03% ) 13287 page-faults # 1953.206 M/sec ( +- 0.32% ) 27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%) 1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%) 16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%) 17611143370 instructions # 0.65 insn per cycle # 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%) 3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%) 116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%) 6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%) ``` New: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs): 400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% ) 12 context-switches # 29.429 M/sec ( +- 25.95% ) 0 cpu-migrations # 0.100 M/sec ( +-100.00% ) 4714 page-faults # 11796.496 M/sec ( +- 0.55% ) 1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%) 199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%) 402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%) 1847783963 instructions # 1.15 insn per cycle # 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%) 407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%) 10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%) 0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% ) lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs): 6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% ) 217 context-switches # 31.236 M/sec ( +- 36.16% ) 1 cpu-migrations # 0.096 M/sec ( +- 50.00% ) 13258 page-faults # 1908.389 M/sec ( +- 0.34% ) 27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%) 1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%) 16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%) 17755545931 instructions # 0.64 insn per cycle # 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%) 3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%) 117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%) 6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% ) ``` I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise i'd say. Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 \| PR40787 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58476 llvm-svn: 354767
*	Finish revert of r354706	Pavel Labath	2019-02-25	1	-2/+0
\| \| \| \| \| \|	The revert in r354711 wasn't complete. Finish the job. llvm-svn: 354766
*	[clangd] Add thread priority lowering for MacOS as well	Kadir Cetinkaya	2019-02-25	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: ilya-biryukov Subscribers: ioeric, MaskRay, jkorous, arphaman, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58492 llvm-svn: 354765
*	[XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"	Roman Lebedev	2019-02-25	1	-48/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert Abstractions are great. Readable code is great. JSON support library is a good idea. However unfortunately, there is an internal detail that one needs to be aware of in `llvm::json::Object` - it uses `llvm::DenseMap`. So for every `llvm::json::Object`, even if you only store a single `int` entry there, you pay the whole price of `llvm::DenseMap`. Unfortunately, it matters for `llvm-xray`. I was trying to analyse the `llvm-exegesis` analysis mode performance, and for that i wanted to view the LLVM X-Ray log visualization in Chrome trace viewer. And the `llvm-xray convert` is sluggish, and sometimes even ended up being killed by OOM. `xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis` (compiled with ` -fxray-instruction-threshold=128`) analysis mode over `-benchmarks-file` with 10099 points (one full latency measurement set), with normal runtime of 0.387s. Timings: Old: (copied from D58580) ``` $ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs): 21346.24 msec task-clock # 1.000 CPUs utilized ( +- 0.28% ) 314 context-switches # 14.701 M/sec ( +- 59.13% ) 1 cpu-migrations # 0.037 M/sec ( +-100.00% ) 2181354 page-faults # 102191.251 M/sec ( +- 0.02% ) 85477442102 cycles # 4004415.019 GHz ( +- 0.28% ) (83.33%) 14526427066 stalled-cycles-frontend # 16.99% frontend cycles idle ( +- 0.70% ) (83.33%) 32371533721 stalled-cycles-backend # 37.87% backend cycles idle ( +- 0.27% ) (33.34%) 67896890228 instructions # 0.79 insn per cycle # 0.48 stalled cycles per insn ( +- 0.03% ) (50.00%) 14592654840 branches # 683631198.653 M/sec ( +- 0.02% ) (66.67%) 212207534 branch-misses # 1.45% of all branches ( +- 0.94% ) (83.34%) 21.3502 +- 0.0585 seconds time elapsed ( +- 0.27% ) ``` New: ``` $ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs): 7178.38 msec task-clock # 1.000 CPUs utilized ( +- 0.26% ) 182 context-switches # 25.402 M/sec ( +- 28.84% ) 0 cpu-migrations # 0.046 M/sec ( +- 70.71% ) 33701 page-faults # 4694.994 M/sec ( +- 0.88% ) 28761053971 cycles # 4006833.933 GHz ( +- 0.26% ) (83.32%) 2028297997 stalled-cycles-frontend # 7.05% frontend cycles idle ( +- 1.61% ) (83.32%) 10773154901 stalled-cycles-backend # 37.46% backend cycles idle ( +- 0.38% ) (33.36%) 36199132874 instructions # 1.26 insn per cycle # 0.30 stalled cycles per insn ( +- 0.03% ) (50.02%) 6434504227 branches # 896420204.421 M/sec ( +- 0.03% ) (66.68%) 73355176 branch-misses # 1.14% of all branches ( +- 1.46% ) (83.33%) 7.1807 +- 0.0190 seconds time elapsed ( +- 0.26% ) ``` So using `llvm::json` nearly triples run-time on that test case. (+3x is times, not percent.) Memory: Old: ``` total runtime: 39.88s. bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s) calls to allocation functions: 33267816 (834135/s) temporary memory allocations: 5832298 (146235/s) peak heap memory consumption: 9.21GB peak RSS (including heaptrack overhead): 147.98GB total memory leaked: 1.09MB ``` New: ``` total runtime: 17.42s. bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s) calls to allocation functions: 21382982 (1227284/s) temporary memory allocations: 232858 (13364/s) peak heap memory consumption: 350.69MB peak RSS (including heaptrack overhead): 2.55GB total memory leaked: 79.95KB ``` Diff: ``` total runtime: -22.46s. bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s) calls to allocation functions: -11884834 (529155/s) temporary memory allocations: -5599440 (249307/s) peak heap memory consumption: -8.86GB peak RSS (including heaptrack overhead): 0B total memory leaked: -1.01MB ``` So using `llvm::json` increases peak memory consumption on this testcase ~+27x. And total allocation count +15x. Both of these numbers are times, not percent. And note that memory usage is clearly unbound with `llvm::json`, it directly depends on the length of the log, so peak memory consumption is always increasing. This isn't so with the dumb code, there is no accumulating memory consumption, peak memory consumption is fixed. Naturally, that means it will handle much larger logs without OOM'ing. Readability is good, but the price is simply unacceptable here. Too bad none of this analysis was done as part of the development/review D50129 itself. Reviewers: dberris, kpw, sammccall Reviewed By: dberris Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58584 llvm-svn: 354764
*	[SelectionDAG] Add a OPC_CheckChild2CondCode to SelectionDAGISel to remove a ↵	Craig Topper	2019-02-25	6	-3/+55
\| \| \| \| \| \| \| \| \| \|	MoveChild and MoveParent pair. OPC_CheckCondCode is always used as operand 2 of a setcc. And its always surrounded by a MoveChild2 and a MoveParent. By having a dedicated opcode for this case we can reduce the number of bytes needed for this pattern from 4 bytes to 2. This saves ~3000 bytes in the X86 table. llvm-svn: 354763
*	[PowerPC] [PowerPC] Enhance the fast selection of fptoi & fptrunc ↵	Kang Zhang	2019-02-25	3	-10/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instruction and clean up related asserts Summary: Fast selection of llvm fptoi & fptrunc instructions is not handled well about VSX instruction support. We'd use VSX float convert integer instruction instead of non-vsx float convert integer instruction if the operand register class is VSSRC or VSFRC because i32 and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is openeded. For float trunc instruction, we do this silimar work like float convert integer instruction to try to use VSX instruction. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D58430 llvm-svn: 354762
*	[clangd] Enhance macro hover to see full definition	Marc-Andre Laperle	2019-02-24	2	-8/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Signed-off-by: Marc-Andre Laperle <malaperle@gmail.com> Reviewers: simark, ilya-biryukov, sammccall, ioeric, hokein Reviewed By: ilya-biryukov Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D55250 llvm-svn: 354761
*	[InstCombine] Add tests for PR40846; NFC	Nikita Popov	2019-02-24	1	-0/+123
\| \| \| \| \| \|	The icmps are the same as the overflow result of the intrinsic. llvm-svn: 354760
*	[InstCombine] Move with.overflow tests to separate file; NFC	Nikita Popov	2019-02-24	2	-303/+347
\| \| \| \| \| \| \| \| \|	And regenerate checks. I had to rename some variables, because update_test_checks can't deal with the same variable names used in lower and upper case. I've also dropped the result type aliases, as just using the type directly gives a cleaner result. llvm-svn: 354759
*	[X86] Add PR40483 test cases	Simon Pilgrim	2019-02-24	2	-0/+187
\| \| \| \| \| \|	Demonstrate failure to merge ISD::ADD(x,y)/X86ISD::ADD(x,y) + ISD::SUB(x,y)/X86ISD::SUB(x,y) equivalent ops llvm-svn: 354758
*	[X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637)	Simon Pilgrim	2019-02-24	5	-50/+40
\| \| \| \| \| \|	Its proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step. llvm-svn: 354757
*	[X86] Fix tls variable lowering issue with large code model	Craig Topper	2019-02-24	2	-5/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The problem here is the lowering for tls variable. Below is the DAG for the code. SelectionDAG has 11 nodes: t0: ch = EntryToken t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64 t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10] t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64 t12: i64 = add t8, t11 t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64 t6: ch = CopyToReg t0, Register:i32 %0, t4 And when mcmodel is large, below instruction can NOT be folded. t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10] t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64 So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0" When llvm start to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails. The patch is to fold the load and X86ISD::WrapperRIP. Fixes PR26906 Patch by LuoYuanke Reviewers: craig.topper, rnk, annita.zhang, wxiao3 Reviewed By: rnk Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58336 llvm-svn: 354756
*	[X86][SSE] Use pblendw for v4i32/v2i64 during isel.	Craig Topper	2019-02-24	10	-182/+142
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was a integer. So use pblendw instead. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58574 llvm-svn: 354755
*	[X86] Correct some ADC/SBB with immediate scheduler data for Broadwell and ↵	Craig Topper	2019-02-24	6	-61/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Skylake. Summary: The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate. The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate. Reviewers: RKSimon, andreadb Reviewed By: RKSimon Subscribers: gbedwell, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58581 llvm-svn: 354754
*	[LegalizeTypes][AArch64][X86] Make type legalization of vector ↵	Craig Topper	2019-02-24	11	-362/+296
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents Summary: When promoting the over flow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it. We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put a the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used. Reviewers: spatel, RKSimon, nikic Reviewed By: nikic Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58567 llvm-svn: 354753
*	Fix accidentally used hard tabs. NFC	Kristina Brooks	2019-02-24	1	-1/+1
\| \| \| \| \| \| \|	Big sorry. This undoes the indentation mess I made in r354751. llvm-svn: 354752
*	Wrap code for builtin_assume_aligned at 80 col.NFC	Kristina Brooks	2019-02-24	1	-1/+2
\| \| \| \| \| \| \|	Minor style fix to avoid going over 80 cols in handling of case for Builtin::BI__builtin_assume_aligned. NFC. llvm-svn: 354751
*	[InstCombine] add test for icmp+add fold; NFC	Sanjay Patel	2019-02-24	1	-0/+29
\| \| \| \|	llvm-svn: 354750
*	[X86][AVX] Rename lowerShuffleByMerging128BitLanes to ↵	Simon Pilgrim	2019-02-24	1	-10/+11
\| \| \| \| \| \| \| \|	lowerShuffleAsLanePermuteAndRepeatedMask. NFC. Name better matches the other similar 'lane permute' and 'repeated mask' functions we have. llvm-svn: 354749
*	[InstCombine] canonicalize add/sub with bool	Sanjay Patel	2019-02-24	6	-19/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	add A, sext(B) --> sub A, zext(B) We have to choose 1 of these forms, so I'm opting for the zext because that's easier for value tracking. The backend should be prepared for this change after: D57401 rL353433 This is also a preliminary step towards reducing the amount of bit hackery that we do in IR to optimize icmp/select. That should be waiting to happen at a later optimization stage. The seeming regression in the fuzzer test was discussed in: D58359 We were only managing that fold in instcombine by luck, and other passes should be able to deal with that better anyway. llvm-svn: 354748
*	[InstCombine] regenerate checks; NFC	Sanjay Patel	2019-02-24	1	-6/+6
\| \| \| \|	llvm-svn: 354747
*	[CGP] add special-cases to form unsigned add with overflow (PR40486)	Sanjay Patel	2019-02-24	5	-64/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's likely a missed IR canonicalization for at least 1 of these patterns. Otherwise, we wouldn't have needed the pattern-matching enhancement in D57516. Note that -- unlike usubo added with D57789 -- the TLI hook for this transform defaults to 'on'. So if there's any perf fallout from this, targets should look at how they're lowering the uaddo node in SDAG and/or override that hook. The x86 diffs suggest that there's some missing pattern-matching for forming inc/dec. This should fix the remaining known problems in: https://bugs.llvm.org/show_bug.cgi?id=40486 https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354746
*	Fix "enumeral and non-enumeral type in conditional expression" gcc7 warning. ↵	Simon Pilgrim	2019-02-24	1	-1/+2
\| \| \| \| \| \|	NFCI. llvm-svn: 354745
*	[WebAssembly] Rename a variable in CFGStackify (NFC)	Heejin Ahn	2019-02-24	1	-7/+7
\| \| \| \|	llvm-svn: 354744
*	[WebAssembly] Merge two identical switch case routines into one (NFC)	Heejin Ahn	2019-02-24	1	-6/+0
\| \| \| \|	llvm-svn: 354743