Post a Comment On: /dev/dump

"Update on SATA Expanders"

Blogger Ray Van Dolson said...

Garrett, great info... obviously a lot of vendors sell drives that internally are SATA but have SAS interconnects (Dell for example). We have many such systems in production in our data centers with no (apparent) issues (note that these systems aren't typically running Solaris-based OSes).

I presume the problem could exist there as well, though, since the issue pops up whenever we must convert between the SATA and SAS protocols, regardless of whether the drive or the backplane is doing the conversion? I assume the issue wouldn't be limited to LSI-based SAS controllers either?

Our issue manifested itself mostly with the SSDs we were using as ZIL, even though we had 22 other 1TB SATA drives on the same expander. I am guessing the extra-high IOPS the ZIL SSDs saw triggered the problem there. We potentially could have seen the same on the 1TB SATA drives had the right workload conditions been met.

Thanks again.

December 8, 2010 at 1:39 PM

Blogger Garrett D'Amore said...

I think yes, the very high IOPS is what makes this problem so tragic. Given a more reasonable workload, you'd probably only notice some resets, and maybe a modest degradation in performance that would self-correct.

As far as LSI vs others? I'm not sure -- I've not investigated fully.

I do think we were being too free with the resets, and I think having multiple devices sitting behind an expander has a lot to do with the penalties involved.

I'm hoping to provide some better long term answers here soon.

December 8, 2010 at 7:36 PM

Blogger Craig said...

Garrett, another possibility to explore is indirectly related to the high IOPS of SATA SSDs behind the expanders: the point-to-point channel nature of comms to SATA drives, combined with a high-IOPS workload to those drives, may conspire to 'hog' the bus … what other devices and the expander firmware will do in this situation is suspect … routinely overreaching and issuing a bus reset rather than a more targeted target reset is certainly a possibility.

It would be interesting to deploy a consistent config with an alternate OS if we could monitor at the protocol layer; there are certainly a few companies over here that have the necessary equipment to do protocol tracing at the right level to identify this sort of operation.

Richard and I had a few more ideas last week, will drop you a line privately ...

December 11, 2010 at 2:52 PM

Blogger Ravi said...

The problem is that SAS is a connection-oriented protocol and the expander simply forwards primitives/frames to the disks. Maybe some day we will see more powerful expanders that terminate SAS connections and handle error recovery without confusing the HBA. As a SAS topology gets large, simply forwarding connections is not enough. SAS flow control (such as RRDY) has a typical timeout of 1ms, so lengthy cables and daisy-chained expanders would create more problems.

January 6, 2011 at 9:58 PM

Blogger nadav said...

Hello Garrett,
Is this (sd.conf) something that can currently be used in NexentaStor 3.0.4 to fix SATA->SAS interposer problems with SATA SSDs connected to SAS backplanes (for ZIL and L2ARC)?
If yes, how?

Thanks,

March 7, 2011 at 8:40 AM

Blogger Garrett D'Amore said...

Not entirely. You really want the fixed drive firmware for the complete fix. I still don't like SATA drives on expanders, to be fully honest.

March 7, 2011 at 9:21 AM

Blogger Donald said...

How can you tell if you are running into a reset storm? What log would show you this information?

I've got SAS-only expanders with SAS disks throughout, except for our SSDs, which are X25-Es on AAMUXes in the first disk shelf. I've been running into a lot of oddball problems recently and would love to track down any potential problems.

April 19, 2011 at 9:49 AM
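
One rough way to approach the question above is to count reset-related lines per minute in the system log and look for bursts. The following is only a minimal sketch under stated assumptions: the log is syslog-style with a leading "Mon DD HH:MM:SS" timestamp (on Solaris-derived systems this is typically /var/adm/messages), the relevant driver messages contain the word "reset" (the exact message text varies by driver and is not given in this thread), and the threshold of 10 per minute is an arbitrary illustration. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: syslog-style lines beginning with "Mon DD HH:MM:SS ".
# The capture group keeps everything up to the minute.
TIMESTAMP = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}):\d{2}\s")


def resets_per_minute(lines, keyword="reset"):
    """Count lines mentioning `keyword` (case-insensitive), bucketed by minute."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and keyword in line.lower():
            counts[match.group(1)] += 1
    return counts


def storm_minutes(counts, threshold=10):
    """Return the minutes whose count meets the (arbitrary) threshold.

    Results are sorted as strings, which is not chronological across months.
    """
    return sorted(minute for minute, n in counts.items() if n >= threshold)
```

To try it, pass an open log file to `resets_per_minute` and feed the result to `storm_minutes`. A handful of isolated resets is probably not a storm; sustained bursts in consecutive minutes are the pattern to look for.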

Blogger IstarUSA said...

A SAS controller can take both SATA and SAS drives. Some higher-end SAS controllers support SAS expanders.

July 24, 2012 at 10:02 PM