Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skiboot 5.4.0-rc3 #696

Merged
1 commit merged into from
Nov 2, 2016
Merged

Conversation

ghost
Copy link

@ghost ghost commented Oct 17, 2016

Signed-off-by: Stewart Smith stewart@linux.vnet.ibm.com


This change is Reviewable

@ghost ghost added the enhancement label Oct 17, 2016
@ghost ghost added this to the v1.13 milestone Oct 17, 2016
@ghost ghost self-assigned this Oct 17, 2016
@ghost
Copy link
Author

ghost commented Oct 17, 2016

retest this please

@ghost
Copy link
Author

ghost commented Oct 17, 2016

Can someone who has any idea how to read HBRT logs and decipher what on earth could have gone wrong please look at this failure?

That it' takes 300kb of log to say nothing is a bug in itself of course...

@ghost
Copy link
Author

ghost commented Oct 17, 2016

@jazurin maybe you possess the magic decoder ring?

@jazurin
Copy link
Member

jazurin commented Oct 17, 2016

@cjcain or @wilbryan could you please take a look at this failure? CI failed twice while trying to run opal-prd occ reset . You can see the the captured syslog file here: http://gfw160.aus.stglabs.ibm.com:8088/jenkins/job/OP_HW_test_multi_p8/1969/MACHINE=HABANERO/artifact/op-ci-automation/hab201-ffdc/syslog.log
Thank you.

@dcrowell77
Copy link
Contributor

A couple things I see right away:

Oct 17 02:34:53 habpart201 opal-prd: HBRT: HTMGT:bldErrLog(mod: 0x03, rc: 0x2A02, data: 0x000C4654 DF8EC7C3 0000044A 00000000, sev: 0x00, fw:n
I think this means that the OCC reported an error. Chris/Wil can help decode it.

Oct 17 02:34:53 habpart201 opal-prd: HBRT: PNOR:RtPnor::readFromDevice> Uncorrectable ECC error : chip=0,offset=0x0
This isn't ever good.

@cjcain
Copy link
Contributor

cjcain commented Oct 18, 2016

The 2A02 error is an informational error that gets generated based on the reset. I believe these are always generated on the reset. Is this log what is triggering the CI failure? No clue about the Uncorrectable ECC error.
I was trying to look at the CI test, but unable to access /gsa/ausgsa/projects/h/hostboot/optools/htmgt_occ_tests/

@jazurin
Copy link
Member

jazurin commented Oct 18, 2016

The test timed out while running opal-prd occ reset:

/tmp/HTMGT_OCC_TESTS.sh
Launching Test: OCC Hardware Test
Running Script (1/5): sudo opal-prd occ disable
Running Script (2/5): sudo opal-prd occ enable
Running Script (3/5): sudo opal-prd process-error
Running Script (4/5): sudo opal-prd occ reset
echo $?
echo $?
Timed out. Could not get expected command return code.

The collected syslog shows an "Uncorrectable ECC error" while reading from PNOR.

Oct 17 02:34:53 habpart201 opal-prd: HBRT: HTMGT:bldErrLog(mod: 0x03, rc: 0x2A02, data: 0x000C4654 DF8EC7C3 0000044A 00000000, sev: 0x00, fw:n
Oct 17 02:34:53 habpart201 opal-prd: HBRT: ERRL:>>ErrlManager::ErrlManager constructor.
Oct 17 02:34:53 habpart201 opal-prd: HBRT: ERRL:iv_hiddenErrorLogsEnable = 0x0
Oct 17 02:34:53 habpart201 opal-prd: HBRT: ERRL:>>setupPnorInfo
Oct 17 02:34:53 habpart201 opal-prd: HBRT: PNOR:>>RtPnor::getSectionInfo
Oct 17 02:34:53 habpart201 opal-prd: HBRT: PNOR:>>RtPnor::readFromDevice: i_offset=0x0, i_procId=0 sec=11 size=0x20000 ecc=1
Oct 17 02:34:53 habpart201 kernel: [   54.441265] mtd mtd0: powernv_flash_async_op(op=0, offset=0x8000, len=147456)
Oct 17 02:34:53 habpart201 opal-prd: HBRT: PNOR:RtPnor::readFromDevice: removing ECC...
Oct 17 02:34:53 habpart201 opal-prd: HBRT: PNOR:RtPnor::readFromDevice> Uncorrectable ECC error : chip=0,offset=0x0

@dcrowell77 does HBRT try to save the OCC informational error logs to PNOR ? Any recent changes that went in HBRT that would affect PNOR functionality recently? Thanks.

@dcrowell77
Copy link
Contributor

Yes, we are trying to save the error log to the PNOR.

@cjcain
Copy link
Contributor

cjcain commented Oct 18, 2016

As part of the reset, we communicate with the OCC and notify them that they will be reset and they have some FFDC data that we collect (to be used to debug why a reset may have been done). This is all a normal part of the OCC reset.
In this test it looks like this hang appeared during this FFDC error log creation and so we probably need someone from Host Boot to investigate the hang/failure.

@ghost
Copy link
Author

ghost commented Oct 18, 2016

retest this please

@ghost
Copy link
Author

ghost commented Oct 18, 2016

(seeing if this is intermittent enough to only fail 2/3 times)

@bofferdn
Copy link
Contributor

I fired up skiboot 5.4.0-rc1 on the firestone version of secureboot tree w/
BR2_SKIBOOT_CUSTOM_VERSION=y
BR2_SKIBOOT_CUSTOM_VERSION_VALUE="skiboot-5.4.0-rc1"

.. with secure mode enabled; it failed with a CAPP verification error ...
[ 4.331421584,3] ROM: romcode_verify failed (rc=1, hw_params.log=0xffffffffffff8420)
[ 4.331424146,0] STB: sb_verify failed: resource CAPP131306, eyecatcher 0x43415050554c4944
[ 4.331426595,0] STB: Secure mode enforced, aborting.
[ 4.331427742,0] Aborting!

Is the secure driver supposed to be signing that? As far as I know that's not being done.

@ghost
Copy link
Author

ghost commented Oct 20, 2016

retest this please

@ghost
Copy link
Author

ghost commented Oct 20, 2016

@bofferdn (as sent in email, but reproducing here for completeness):

Your CAPP partition is likely incorrect for STB. Due to bugs in ffs
though, it's basically impossible to correctly construct a CAPP
partition.

The CAPP partition contains "sub partitions", that is, we have a table
of contents of each of the CAPP microcode files for each cihp revision
and pack them into one partition on flash, looking up the index as
needed.

The way that code worked is that you'd read the first block of the CAPP
partition, find the offset and length needed and then read that
content. We could also have multiple chip revisions point to the same
CAPP ucode, saving space.

With a secure boot container, it only signs the whole partition. Which
is something we never read. So, with secure/trusted boot, we need to
read the entire partition.

Which is where the problems start...

FFS (that giant bit of unmaintained code with zero tests and eight
new inventions that look oddly like the wheel, just to write a tiny data
structure) is buggy and incorrectly puts actualSize = partitionSize in
the header, which means that we'd measure the whole CAPP partition,
including unused space. This is obviously wrong and could lead to
different measurements for the same data.

But it's worse than that, because if you reflash a CAPP partition
without a STB header using anything but pflash, when measuring the whole
partition you get ECC errors!

Yes, for some reason we have ECC on the CAPP partition but not on the
partitions holding skiboot and petitboot (yes, you can fail to boot to a
bootloader, but at least your CAPI cards could be in CAPP mode!)

As you can measure something without a STB header (of course), if
you're not enforcing signature verification and just measuring the boot.

On top of that, it seems that the AMI BMC may incorporate this code that
does that incorrectly when you flash with a HPM image.

Luckily, nobody inside IBM development does that as nobody knows how to
build a HPM image, so for all intents and purposes, the "official" way
of updating a machine has never been used by any developer, ever.

So, we use pflash. pflash does correctly set the actualSize header, so
we measure the correct thing... Except that code doesn't exist to load a
subpartition when you have a STB container, so you'd break CAPI if you
did that.

The work-around is to break CAPI by putting crap in the CAPP partition:
namely, something that doesn't look like a subpartition header. This
should bail out and "gracefully" fail to load the CAPP ucode in any
situation (thus not being able to use CAPI, but being able to enforce
trusted boot).

And this is why it hasn't mattered previously, HOSTBOOT just read the
whole PAYLOAD partition for skiboot. It also explains why a fresh PNOR
image from op-build boots SLOWER than one that has exactly the same
content but the PAYLOAD and BOOTFLASH partition have been rewritten by
pflash (although this is mitigated by an xz compressed PAYLOAD).

But, IT GETS WORSE.

So there's a current bug where we'll measure the BOOTKERNEL partition
incorrectly too, because, well, FFS actualSize, isn't.

So the measurement for BOOTKERNEL will be constant, if you have two
identical machines and upgrade firmware in a different order but end up
in the same place, you'll likely have DIFFERENT measurements
(depending on how you update the firmware, and using different methods
to end up with the same result may end up with different measurements
too).

So, we need to go and rework the code around loading every bit of thing
off flash to ignore FFS because it's just complete and total lies, and
then in the case of a STB header, we can probably assume that the size
in it is correct (although having looked at the signtool, I wouldn't be
overly confident there either), and in the event of there not being a
container header, we need to parse the content of the partition and
compute the size. For BOOTKERNEL, this means parsing the ELF header and
computing the size of the BOOTKERNEL from data in the ELF header. Urgh.

Why should we work out the size rather than defaulting to reading the
whole thing? Because loading things from flash is THE single biggest
contributor to boot time we have. If loading from flash was
instantaneous we'd instantly cut around 15 seconds from boot.

To workaround that issue, write crap to your CAPP partition if enforcing signature verification, that way we'll fail to load the CAPP ucode and gracefully continue booting.

@williamspatrick
Copy link
Contributor

The CAPP partition contains "sub partitions", that is, we have a table
of contents of each of the CAPP microcode files for each cihp revision
and pack them into one partition on flash, looking up the index as
needed.

The way that code worked is that you'd read the first block of the CAPP
partition, find the offset and length needed and then read that
content. We could also have multiple chip revisions point to the same
CAPP ucode, saving space.

With a secure boot container, it only signs the whole partition. Which
is something we never read. So, with secure/trusted boot, we need to
read the entire partition.

That is entirely an implementation choice. Hostboot has the exact same problem with the partitions that contain scan ring data. There is a sub-partition for each processor revision. They can sign at a sub-partition level so that the lab can update each sub-partition independently.

@williamspatrick
Copy link
Contributor

FFS (that giant bit of unmaintained code with zero tests and eight
new inventions that look oddly like the wheel, just to write a tiny data
structure)

Unmaintained? This code came from the FSP driver team and they are the ones that invented ffs. The ffs code is maintained by them internally to resolve issues related to FSP. We might have some issue with the back-and-forth flow of this code.

 is buggy and incorrectly puts actualSize = partitionSize in
the header, which means that we'd measure the whole CAPP partition,
including unused space. This is obviously wrong and could lead to
different measurements for the same data.

"buggy"? This was done on purpose.

The FSP code that uses ffs native, for their own NOR device, actually uses the 'actualSize' field I believe. But in that case they are the sole owner of their chip. For the PNOR chip we decided very early in the FSP/Hostboot program that we would not use that field and instead always set them equal. Since there are two owners of the PNOR chip, you have to make sure they agree on how that field is managed. We decided to save the pain of having FSP and Hostboot, BMC and Hostboot, and pflash all understand this field and ensure it was updated properly. It is too easy for lab users to manually pflash a partition and not update the ffs header.

There is also the huge problem of the golden side... In systems that have a golden side, we are not suppose to ever touch the golden side. It could, in fact, be a locked NOR module. But, the golden side contains information pointing to partitions on the non-golden side for "sideless" data.

In addition you have the primary and backup TOC.

So you'd also have to modify pflash (and all the BMC implementations) to modify 4 TOCs whenever a user modifies a partition. And hopefully your hardware implementation wasn't a lockable flash for the Golden side.

And, what do we do for partitions that contain modifiable data? Are we going to have the Hostboot code update the TOC entry every time it adds a new error log or guard record? This is not only dangerous but also likely to contribute to premature flash wear-out.

There is a lot more to this problem then simply "fixing a tool".

@williamspatrick
Copy link
Contributor

And this is why it hasn't mattered previously, HOSTBOOT just read the
whole PAYLOAD partition for skiboot. It also explains why a fresh PNOR
image from op-build boots SLOWER than one that has exactly the same
content but the PAYLOAD and BOOTFLASH partition have been rewritten by
pflash (although this is mitigated by an xz compressed PAYLOAD).

A key I would like to point out here is that by using xz compression, Hostboot only loads exactly as much as needed because the xz stream has a EOF marker. Since we use virtual memory operations to load PNOR pages on-demand we never look at the whole PAYLOAD partition. Maybe your answer is to make most partitions, that can be xz compressed, xz compressed.

@ghost
Copy link
Author

ghost commented Oct 25, 2016

Patrick Williams notifications@github.com writes:

FFS (that giant bit of unmaintained code with zero tests and eight
new inventions that look oddly like the wheel, just to write a tiny data
structure)

Unmaintained? This code came from the FSP driver team and they are
the ones that invented ffs. The ffs code is maintained by them
internally to resolve issues related to FSP. We might have some issue
with the back-and-forth flow of this code.

I've been waiting on review for 10 months on these:

open-power/ffs#10
open-power/ffs#9
open-power/ffs#8

and the last commit was 18 months ago.

If we're going to rely on code, we should ensure someone is looking
after it. If it's just sitting away internally and never getting out,
then that's a big problem.

 is buggy and incorrectly puts actualSize = partitionSize in
the header, which means that we'd measure the whole CAPP partition,
including unused space. This is obviously wrong and could lead to
different measurements for the same data.

"buggy"? This was done on purpose.

The FSP code that uses ffs native, for their own NOR device, actually
uses the 'actualSize' field I believe. But in that case they are the
sole owner of their chip. For the PNOR chip we decided very early in
the FSP/Hostboot program that we would not use that field and
instead always set them equal. Since there are two owners of the PNOR
chip, you have to make sure they agree on how that field is managed.
We decided to save the pain of having FSP and Hostboot, BMC and
Hostboot, and pflash all understand this field and ensure it was
updated properly. It is too easy for lab users to manually pflash a
partition and not update the ffs header.

It would help if this was documented anywhere, as pflash has been doing
something different for a long time (forever?) and there's no spec to
point to to say it's wrong.

We have something that has ended up being ABI that is undocumented,
parsed/modified by at least four completely different bits of code
(BMC/FSP, hostboot, skiboot/libflash, ffs) and with no public spec or
maintenance.

There is also the huge problem of the golden side... In systems that
have a golden side, we are not suppose to ever touch the golden side.
It could, in fact, be a locked NOR module. But, the golden side
contains information pointing to partitions on the non-golden side for
"sideless" data.

Although those pointers are to whole-partition structures so it isn't
really an issue, right?

Even then though, errors in that sideless data can cause the golden side
not to boot, so in practice, the golden side is of limited use.

In addition you have the primary and backup TOC.

So you'd also have to modify pflash (and all the BMC implementations) to modify 4 TOCs whenever a user modifies a partition. And hopefully your hardware implementation wasn't a lockable flash for the Golden side.

And, what do we do for partitions that contain modifiable data? Are
we going to have the Hostboot code update the TOC entry every time it
adds a new error log or guard record? This is not only dangerous but
also likely to contribute to premature flash wear-out.

The current design of "file" size = whole partition is the elegant
solution here. Indeed, in those cases, actualSize!=totalSize should be
a sign that things are corrupt and it should erase the partition and try again.

There is a lot more to this problem then simply "fixing a tool".

Stewart Smith
OPAL Architect, IBM.

@ghost
Copy link
Author

ghost commented Oct 25, 2016

Patrick Williams notifications@github.com writes:

The CAPP partition contains "sub partitions", that is, we have a table
of contents of each of the CAPP microcode files for each cihp revision
and pack them into one partition on flash, looking up the index as
needed.

The way that code worked is that you'd read the first block of the CAPP
partition, find the offset and length needed and then read that
content. We could also have multiple chip revisions point to the same
CAPP ucode, saving space.

With a secure boot container, it only signs the whole partition. Which
is something we never read. So, with secure/trusted boot, we need to
read the entire partition.

That is entirely an implementation choice. Hostboot has the exact
same problem with the partitions that contain scan ring data. There
is a sub-partition for each processor revision. They can sign at a
sub-partition level so that the lab can update each sub-partition
independently.

This has the issue where if by loading an alternate subpartition could
expose a vulnerability, and the subpartition list isn't protected, you
have a downgrade attack.

Stewart Smith
OPAL Architect, IBM.

@ghost
Copy link
Author

ghost commented Oct 25, 2016

Patrick Williams notifications@github.com writes:

And this is why it hasn't mattered previously, HOSTBOOT just read the
whole PAYLOAD partition for skiboot. It also explains why a fresh PNOR
image from op-build boots SLOWER than one that has exactly the same
content but the PAYLOAD and BOOTFLASH partition have been rewritten by
pflash (although this is mitigated by an xz compressed PAYLOAD).

A key I would like to point out here is that by using xz compression,
Hostboot only loads exactly as much as needed because the xz stream
has a EOF marker. Since we use virtual memory operations to load PNOR
pages on-demand we never look at the whole PAYLOAD partition. Maybe
your answer is to make most partitions, that can be xz compressed, xz
compressed.

Yeah, the XZ stream having the EOF marker and Hostboot's on demand
paging of data from flash work really well.

My RFC hostboot patch (I need to go and make that not RFC but actually
request-for-merge) to do that for HBRT does end up saving a bunch of
boot time because of that,
open-power/hostboot#66

I've been thinking of extending/modifying the interface so that it's
easier and consistent to stream an xz compressed stream off flash.

Stewart Smith
OPAL Architect, IBM.

@ghost
Copy link
Author

ghost commented Oct 25, 2016

Wow... mail got delayed by days.

I wish IBM could get email remotely correct.

@ghost
Copy link
Author

ghost commented Oct 25, 2016

Anyway, skiboot-5.4.0-rc2 (to be released later today) will address the partition size issue.

The remaining issue is this OCC test (where is that test by the way, I don't recognize something from op-test-framework?).

Has anyone been able to further decode what's going on? If it's an error talking to PNOR, looking at the skiboot log could be enlightening too.

@ghost ghost force-pushed the skiboot-5.4.0-rc1-bump branch from 9c59d4b to d7c7315 Compare October 26, 2016 06:12
@ghost ghost changed the title Skiboot 5.4.0-rc1 Skiboot 5.4.0-rc2 Oct 26, 2016
@ghost
Copy link
Author

ghost commented Oct 26, 2016

retest this please

@ghost
Copy link
Author

ghost commented Oct 27, 2016

Oct 26 12:19:21 habpart201 kernel: [    0.000000] Linux version 3.16.0-41.57-openpower12-generic (root@ubuntu-le) (gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6) ) #openpower12 SMP Thu Jun 18 16:07:23 AEST 2015 (Ubuntu 3.16.0-41.57-openpower12-generic 3.16.7-ckt11)
....
[    0.965056] /home/jk/linux/ubuntu-utopic/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)

@jk-ozlabs is this some ancient kernel you built way back when?

@jazurin what OS is this machine running? It looks like something ancient. Remotely recent opal-prd even?

@jazurin
Copy link
Member

jazurin commented Oct 27, 2016

@stewart-ibm that habanero system is running ubuntu 14.10

Linux habpart201 3.16.0-41.57-openpower12-generic #openpower12 SMP Thu Jun 18 16:07:23 AEST 2015 ppc64le ppc64le ppc64le GNU/Linux
tyan@habpart201:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.10
Release:        14.10
Codename:       utopic

Looks like @pridhiviraj was able to reproduce the same problem see: #709

@shenki
Copy link
Member

shenki commented Oct 27, 2016

Ubuntu 14.10 reached End of Life on Thursday, July 23rd 2015. It has not received security updates nor bugfixes for the past 15 months.

I suggest that machine is upgraded to the latest LTS (Ubuntu 16.04).

@ghost
Copy link
Author

ghost commented Oct 27, 2016

Joel Stanley notifications@github.com writes:

Ubuntu 14.10 reached End of Life on Thursday, July 23rd 2015. It has not received security updates nor bugfixes for the past 15 months.

I suggest that machine is upgraded to the latest LTS (Ubuntu 16.04).

There's that too. Although we shouldn't regress functionality with old
kernels though (unless we intend to, which we shouldn't).

Stewart Smith
OPAL Architect, IBM.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
@ghost ghost force-pushed the skiboot-5.4.0-rc1-bump branch from d7c7315 to 306303a Compare November 2, 2016 08:45
@ghost ghost changed the title Skiboot 5.4.0-rc2 Skiboot 5.4.0-rc3 Nov 2, 2016
@ghost ghost merged commit 5de7666 into open-power:master-next Nov 2, 2016
shenki added a commit to shenki/skeleton that referenced this pull request Nov 3, 2016
ffs_index was used to ensure we updated the ffs header with the actual
size. However, the ffs_index was hardcoded to -1 nd never updated, so
this cade was never executed.

Secondly, recent discussion[1] on the open-power bug tracker suggests that
this was never something that should be done.

[1] open-power/op-build#696 (comment)

Change-Id: I302b48213561c4d4490927fa0953c65a52d82c11
Signed-off-by: Joel Stanley <joel@jms.id.au>
williamspatrick pushed a commit to openbmc/skeleton that referenced this pull request Nov 4, 2016
ffs_index was used to ensure we updated the ffs header with the actual
size. However, the ffs_index was hardcoded to -1 nd never updated, so
this cade was never executed.

Secondly, recent discussion[1] on the open-power bug tracker suggests that
this was never something that should be done.

[1] open-power/op-build#696 (comment)

Change-Id: I302b48213561c4d4490927fa0953c65a52d82c11
Signed-off-by: Joel Stanley <joel@jms.id.au>
@ghost ghost deleted the skiboot-5.4.0-rc1-bump branch June 9, 2017 05:28
mabaiocchi pushed a commit to mabaiocchi/op-build that referenced this pull request Oct 13, 2020
IlyaSmirnov91 pushed a commit that referenced this pull request May 5, 2021
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants