tag:blogger.com,1999:blog-178640652008-08-08T10:43:03.397-04:00Chips and BSGeorge Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comBlogger81125tag:blogger.com,1999:blog-17864065.post-72373003359273112242008-07-27T12:51:00.009-04:002008-07-27T14:02:07.142-04:00Reuse - the difference between hardware and softwareI wanted to share a perspective I heard while in the Bay Area last week. We've made some of these points, but I liked the way it was wrapped.<br /><br />The context was in discussing issues with hardware design. The engineer's viewpoint was that a fundamental issue was design reuse, or the adequate lack thereof in hardware design.<br /><br />He made a comparison to the software space, where most software applications are largely built on top of pre-existing libraries. In the software space you stand on the shoulders of those before you. These software libraries aren't application specific, but are general libraries of all sorts of useful building blocks and capabilities. It wouldn't be atypical for a GUI software program to consist of only a fraction of new code, with the bulk built using libraries. And, in the software space, as libraries grow, reuse grows commensurately.<br /><br />In contrast, reuse in the hardware space consists of large, application specific blocks, e.g. PCIe, processor, etc. You just don't see a lot of general purpose libraries that can be easily used in different situations to build your designs. And, when you do, typically they're very brittle -- it's difficult to interface to them, to change them and to leverage them for specific needs.<br /><br />I think there are two fundamental reasons for this:<br /><br />1. There are no standard interface conventions (analogous to calling conventions in the software space) -- this means that it's very hard to use hardware IP blocks. Each instantiation requires a custom, manually-intensive, situation-dependent implementation.<br /><br />2. 
Shared resource conflicts must be manually handled. Hardware is parallel -- and the biggest complication is avoiding (scheduling around) race conditions where two operations might try to access the same resource (such as a register, or an interface such as enqueue and dequeue on a FIFO).<br /><br />These two items prevent IP from being reused as a black box. The semantics of RTL (I include SystemC, SystemVerilog, Verilog, VHDL) prevent this from ever happening -- just like assembly language prevents the type of libraries that you see at a higher level in languages like C/Java/...<br /><br />Atomic transactions address both these problems -- and enable general purpose hardware libraries, including those that look very much like what you might expect to only see in the software space.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-85147322321790515232008-07-21T10:18:00.007-04:002008-07-21T14:25:15.147-04:00Deming and dependence on mass inspectionI got into a discussion last night with someone about the parallels between auto manufacturing (that is, U.S. auto manufacturing of the 70s) and chip design -- about how there is a focus on improving verification, without considering the impact that design has had on this. This person mentioned a slide that he used to have with a quote by W. Edwards Deming on it -- I've used the same quotes from Deming, I am guessing:<br /><blockquote>Inspection with the aim of finding the bad ones and throwing them out is too late, ineffective, costly. In the first place, you can't find the bad ones, not all of them. Second, it costs too much. Quality comes not from inspection but from improvement of the process.<br /><br />The old way: Inspect bad quality out.<br /><br />The new way: Build good quality in.</blockquote><br />People have said that design is not the long pole in the tent. 
I presume that they mean that verification or software development is the long pole. I believe we have a design problem. We need to find ways to build good quality in -- there is nothing like avoiding bugs in the first place. You can never find all the bugs if you depend on inspecting the bad quality out. Low level design forces a protracted specification/architecture phase where micro-architecture details are too often planned out and prematurely committed -- and this delays verification integration/bringup. A plethora of bugs causes too much time doing debug and not enough time writing and running tests during verification.<br /><br />There's no doubt that verification is killing us -- but it's important to diagnose the root cause of this. Continuing to add more verification engineers will not stem the root cause.<br /><br />The U.S. auto industry thought they could inspect for quality in the 70s -- it took the Japanese to prove them wrong.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-43454063785617350932008-07-12T15:31:00.005-04:002008-07-12T15:54:50.584-04:00How the MEMOCODE design contests were wonRecently, there was a <a href="http://lambda-the-ultimate.org/node/2881">nice plug on the programming language website Lambda the Ultimate</a> for both this year and last year's MEMOCODE codesign contest winners -- which involved hardware acceleration of a cryptosorter design and a matrix multiplication, respectively. From this Lambda the Ultimate page, you can also get links to download writeups about the designs as well as copies of the designs themselves.<br /><br />Both winning teams were powered by Bluespec. The 2008 team competed against eight other designs and beat second place by 11X. 
The 2007 team had only one competitor, which it beat by 5X.<br /><br />I found <a href="http://people.csail.mit.edu/mdk/papers/memocode_2008_cryptosorter.pdf">the writeup</a> describing this year's winning design to be a lot of fun. It's a quick read describing what they accomplished in only three weeks -- truly amazing. Aside from the complexity of the design and alternatives they explored, I found it most interesting that they skipped system simulation and went right into FPGA.<br /><br />FPGAs will revolutionize embedded development, modeling and verification (let alone hardware design) when the two sources of drag caused by low-level design get squashed:<br /><ul><li> Bugs - complex designs today spend far too much time in simulation while bugs get wrung out until the quality is to the point that FPGAs can be safely used</li><li> Changes - design changes, such as for micro-architectural exploration, feature enhancements, or even (what should be) small refinements, take much too long. Consequently, designs must be overly carefully planned to get them right first time -- which takes a lot of time planning architectural and micro-architectural details and too often low-level details are fixed far too early in the process</li></ul>George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-60397840685051459352008-06-23T21:02:00.003-04:002008-06-23T22:08:25.649-04:00New blog on scalable atomicity for reconfigurable computingI just got a heads up from an engineer on his new blog called <a href="http://atomicrules.blogspot.com/">Scalable Atomicity</a>. I'm not entirely sure what he's up to (outside of leveraging atomic transactions as a competitive advantage). 
Reading between the lines, it looks like his focus will be around enabling a new way (that emphasizes time-to-solution, scale and correctness) to do multicore solutions and architectures in the FPGA world.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-86272484047119885532008-06-16T16:10:00.007-04:002008-06-16T17:26:32.682-04:00Are C to RTL solutions proprietary?The C/C++/SystemC to RTL solutions continually emphasize to the press and analysts (as well as to prospective customers) that they are based on standards. When I saw John Cooley seeming to repeat this position in his <a href="http://www.deepchip.com/gadfly/gad060608.html">Cheesy DAC</a> list, I wanted to respond. I sent <a href="http://www.deepchip.com/wiretap/080606.html">an email to John outlining the 5 questions that I'd ask any one of these vendors</a> before buying into this "we're a standard" position.<br /><br />In addition to these five questions, I've since thought of a sixth:<br /><br />6. How much training and/or applications support is required to learn how to properly structure and write code for efficient synthesis results?<br /><br />I'm not claiming that we are more "standard" than these solutions are -- but I am claiming that, with only one exception, there is no fundamental difference in how proprietary these solutions are.<br /><br />There are a lot of potentially terrific benefits to going with a standard, e.g.:<br />* Training<br />* Portability and reusability of designs<br /><br />I don't think these apply to the C/C++/SystemC synthesis solutions. I think there's only one real "standards"-related benefit to the C/C++/SystemC to RTL technologies: you can functionally simulate the designs with standard compilers and, in the case of SystemC, free simulators. But this has nothing to do with the tools or with the "hardware" aspects of a design: cycle-accurate simulation and synthesis. 
None of these solutions have a level of standardization that will significantly reduce tool-specific training or deliver design portability across vendors -- which I believe are the main reason you go with standards. Without this, then these solutions are proprietary.<br /><br />And, once you get past this smoke screen, the interesting questions are around the benefits, applicability and quality of results.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-73698491230168259692008-06-12T13:05:00.002-04:002008-06-12T13:34:35.422-04:00Will "behavioral synthesis" ever be successful?A colleague just showed me an <a href="http://www.scdsource.com/experts.php?id=201">interview by SCDSource with Louise Trevillyan, research staff member in the design automation department at IBM's T.J. Watson research center</a>. Louise was this year's winner of the Marie R. Pistilli Women in EDA Achievement Award.<br /><br />Here's <a href="http://www.scdsource.com/experts.php?id=201&amp;page=1">the money quote</a> in response to the question as to whether behavioral synthesis will ever be successful in the future:<br /><blockquote style="font-style: italic;"><br />I would never say never, but I don't foresee it happening. There's always been a prediction that we'll have so much real estate we can use, and technology will move so fast, and time to market will get so important, that you might be able to accept suboptimal solutions in the name of getting a design out the door. If that ever happens, behavioral synthesis will become more acceptable. But remember, that day was predicted 20 years ago and it hasn't come yet.</blockquote>Automatic parallelization is all but impossible for general purpose applications -- and only effective for those sub-blocks that can be described with tightly nested for-loops. 
The software space has abandoned automatic parallelization for parallel software design -- in favor of explicit parallel designs. That's not to say that they don't still have a huge challenge, but at least there's an acknowledgment that it isn't a panacea.<br /><br />The EDA industry hasn't yet come to the same conclusion. I think it's just a matter of time.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-33318476280376480512008-06-06T14:01:00.009-04:002008-06-06T14:37:01.847-04:00Our 2008 DAC Giveaway - I want one!We're giving two things away at our booth this year. For free, we're giving away an electronic Sudoku game -- some of you may have gotten this from us before. But the one I'm excited about is free, but we have to draw names. We've got something like eight of them -- and, I'd love to get one but don't think it would be quite right if I drew my own name.<br /><br />Every year, we try to pick something that's unique, useful for more than a few minutes, and, hopefully, desirable. In the past we've given away R/C cars, light-up super balls, fans with LEDs on them that display a Bluespec saying, and electronic Sudoku. I've wanted to do this for a couple of years: give away what may be the lightest indoor R/C airplane on the market -- it weighs just 3.6 grams. And, because it is somewhat fragile, it comes in its own carrying case (which is pretty cool) -- it looks like a padded camera or rifle case. Here are some pictures which link to the website for videos and more information. Some day I'll get one of these puppies for myself -- next week over a handful of you will get one. 
Stop by the booth and drop your business card in our box (or fill out an entry).<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.microflight.com/Online-Catalog/Ready-to-Fly-Airplanes/Carbon-Butterfly-Livingroom-Flyer"><img style="cursor: pointer; width: 240px; height: 160px;" src="http://www.microflight.com/core/media/media.nl?id=359&amp;c=638206&amp;h=2d3302a2fa5258497e55" alt="" border="0" /></a><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.microflight.com/Online-Catalog/Ready-to-Fly-Airplanes/Carbon-Butterfly-Livingroom-Flyer"><img style="cursor: pointer; width: 210px; height: 141px;" src="http://www.microflight.com/core/media/media.nl?id=361&amp;c=638206&amp;h=744a07035518a69ed3cb" alt="" border="0" /></a><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.microflight.com/Online-Catalog/Ready-to-Fly-Airplanes/Carbon-Butterfly-Livingroom-Flyer"><img style="cursor: pointer; width: 215px; height: 143px;" src="http://www.microflight.com/core/media/media.nl?id=360&amp;c=638206&amp;h=ee6056f112ad25658e86" alt="" border="0" /></a><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.microflight.com/Online-Catalog/Ready-to-Fly-Airplanes/Carbon-Butterfly-Livingroom-Flyer"><img style="cursor: pointer; width: 227px; height: 152px;" src="http://bp0.blogger.com/_ttfD8SnSpdk/SEl9XkzbqiI/AAAAAAAAAUQ/Ef7akahPlmA/s320/CarbonButterfly.jpg" alt="" id="BLOGGER_PHOTO_ID_5208832288000551458" border="0" /></a>George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-47901546843883872242008-06-06T13:47:00.003-04:002008-06-06T14:01:50.990-04:00DAC: Bloggers Meeting Wed Night at DACI'll be at DAC next week -- and I am planning to attend the Birds-of-a-Feather session on blogging in our industry next Wed night at 6P in room 201B at the Anaheim Convention Center. 
I've gotten to know John from John's Semi Blog a bit through blogging -- which has been fun. Turns out we share alma maters and degrees, though not identical years. I'm looking forward to meeting some of the others that I've run into -- most of whom are far more diligent than I am.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-12815892796075781692008-04-23T09:25:00.003-04:002008-04-23T10:42:19.220-04:00EDP Conference in MontereyI didn't attend <a href="http://www.eda.org/edps/">Electronic Design Process 2008</a> (EDP) in Monterey recently. The keynote was entitled <a href="http://www.eda.org/edps/slides/par-prog-doing-it-right-epc-hardcopy.pdf">Parallel Computing: Can We PLEASE Do It Right This Time?</a> by<strong> <span style="font-weight: normal;font-family:arial;" >Timothy G. Mattson of Intel</span></strong>. The talk focused on software languages for multi-core hardware platforms.<br /><br />I think there is a key point made in his presentation that people in the hardware design and EDA communities will find interesting. And, this is coming from someone talking about software development -- a community particularly interested in using traditional, sequential C/C++ software. Timothy's conclusions are not a surprise to anyone looking at how automatic parallelization tools have historically performed -- in a long history starting in the parallel computing market (Cray/Thinking Machines/KSR/...). 
He concludes:<br /><br />* Automatic parallelization will not solve the software community's problems (that is, be a solution for developing parallel software for multi-cores)<br /><br />* And, so: <span style="font-weight: bold; font-style: italic;">"Our only hope is to get programmers to write parallel </span><br /><span style="font-weight: bold; font-style: italic;">software 'by hand'."</span><br /><br />I completely agree with this viewpoint -- and I think it equally applies to hardware design (actually, MUCH more so). We've seen a lot of "high-level" hardware design tools focused around automatically parallelizing C/C++/SystemC. And, there's no doubt that a certain class of smaller IP blocks (loop-and-array style code) can be effectively and efficiently synthesized automatically from these languages. But, these tools require a lot of tailoring to make efficient hardware -- and can't effectively address the bigger problems:<br /><br />* Software development -- these tools do nothing for software -- isn't that the biggest chip design challenge? Don't we need models and implementations earlier and that run faster for software? It seems to me that behavioral synthesis tools are like trees in the forest for addressing this fundamental problem. Aside from there being many vendors hyping algorithms, since when did they become the long pole in the tent?<br />* Algorithmic subsystem performance -- Just staying specific to the "algorithm" subsystem, these tools can't easily comprehend, express or tailor memory and switch subsystem performance, which can often be the dominating architectural consideration for power, performance, area, ... And, impedance matching these blocks to memory and switch subsystems or impedance matching multiple of these smaller blocks can be a bear.<br />* Loop-and-array algorithms are a small part of designs. 
Looking at a system, these tools do nothing for the rest of the system: DMA controllers, memory controllers, processors, controllers, communications IP, etc.<br /><br />Hardware designers need to develop the parallel aspects of their design by hand. That doesn't mean it needs to be at the RTL level. Simplifying concurrency is the only way to improve this process -- automatic parallelization just isn't a general solution (so C/C++/SystemC is not a path). This is where atomic transactions come in. (I know a lot less about how atomic transactions will and can work in the software space. That said, I have some thoughts about what Timothy says about the value of transactional memory in this keynote. I am in CA and my day is starting so I have to run).<br /><span style="font-family:Arial;"><strong><span style="font-weight: normal;font-size:100%;" ></span></strong></span>George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-13260640722180326762008-04-14T15:36:00.006-04:002008-04-14T15:49:25.916-04:00Macbeth and RTLI was speaking with <span class="blsp-spelling-error" id="SPELLING_ERROR_0">Nikhil</span>, our <span class="blsp-spelling-error" id="SPELLING_ERROR_1">CTO</span>, about the issue people often run into with <span class="blsp-spelling-error" id="SPELLING_ERROR_2">RTL</span> where one gets committed inexorably to a particular direction. When it becomes clear that there might have been a better approach, you don't have the flexibility to switch directions. You are too mired in your current approach -- and no longer have time to approach it freshly. 
<span class="blsp-spelling-error" id="SPELLING_ERROR_4">Nikhil</span> immediately thought of the following quote from Shakespeare's Macbeth, in Act III, Scene IV, which provides a terrific description of this situation:<br /><br /><blockquote>"I am in blood<br /><span class="blsp-spelling-error" id="SPELLING_ERROR_5">Stepp'd</span> in so far, that, should I wade no more,<br />Returning were as tedious as go o'er."<br /><br /></blockquote>Macbeth is in too deep -- he's already committed murder. To go back is just as tedious as just to trudge forward.<br /><br />When you get to this point with RTL, the only choice is typically just to continue. You get one shot -- it better be aimed in the right direction.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-76508062304863249462008-04-12T20:18:00.014-04:002008-04-16T11:17:58.412-04:00Algorithmic MyopiaI was reading the <a href="http://www.ece.cmu.edu/%7Ejhoe/distribution/mc07contest/MEMOCode2007DesignContest_Final.ppt">summary presentation</a> from last year's <a href="http://www.ece.cmu.edu/%7Ejhoe/distribution/mc07contest/">(2007's) MemoCODE codesign contest</a> winners and thought it highlighted an important point about algorithmic design. (Last year, the winning team beat second place by 5X. This year, <a href="http://chipsandbs.blogspot.com/2008/04/unfair-advantage.html"><span style="text-decoration: underline;">which was recently announced, </span></a>the winning team beat second place by 11X).<br /><br />The problem last year was a Blocked Matrix-Matrix Multiplication. It was designed to have both a software and a hardware portion. The core of the hardware piece is a classic "algorithmic" design. 
Tightly nested for-loops -- the type of solution that one *might* target with traditional behavioral synthesis (algorithmic synthesis) technology:<br /><br />void mmmBlocked(Number* A, Number* B,<br />Number *C, int N, int NB) {<br />int j, i, k;<br />for (j = 0; j < N; j += NB)<br />for (i = 0; i < N; i += NB)<br /> for (k = 0; k < N; k += NB)<br /> mmmKernel(&amp;(A[i*N+k]),<br /> &amp;(B[k*N+j]),<br /> &amp;(C[i*N+j]), N, NB);<br />}<br /><br />As is typical for these types of problems, however, the real magic lies in the system considerations -- not the core loop-and-array architecture. If you just focused on the portion that algorithm synthesis solutions can address, you would have missed the key architectural consideration -- which was memory bandwidth.<br /><br />The challenge was not only to recognize that this was the core architectural consideration, but also to focus the innovation and exploration around the memory subsystem -- AND ensure that the algorithmic piece was tightly coupled and scheduled to work with this subsystem. The first place team delivered 5X the performance of the second place team because they could rapidly explore tradeoffs in this area -- and their environment encompassed not just the "algorithm" but the system as well. (I put quotes around "algorithm" because this term is often mis-used -- and I'm mis-using it a bit here. An algorithm is not just the functional description of the problem (as it might be expressed in C), but also the cost-model for how that function is solved. A C function is not "the algorithm", but one "algorithm" for solving a problem -- it never expresses a particular hardware algorithm for solving the problem.)<br /><br />Algorithmic synthesis solutions may be able to automatically produce different pipeline micro-architectures for loop-and-array hardware pieces, but the total algorithm isn't just this piece -- it's the entire system. 
This is a lesson that has been learned repeatedly in history -- a good example is IBM's computer systems, which didn't always have the fastest processor, but didn't have to. Their architects focused on the entire system -- including the memory and disk subsystem -- which delivered mainframes that outperformed their competition.<br /><br />Algorithmic design that focuses solely on the loop-and-array areas fails to take into account inter-loop-and-array-block interactions -- and interactions with the rest of the system. You need both the yin and the yang -- and they should be tightly intertwined so that you can easily optimize and match both. System, memory and inter-block interactions are often more important than pipeline choices in sub-blocks. They are not separate -- to treat them so is to be short-sighted.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-29760496847821957642008-04-09T23:37:00.005-04:002008-04-09T23:51:10.658-04:00Unfair advantageThe results of the <a href="http://rijndael.ece.vt.edu/memocontest08/everybodywins/">2008 MemoCODE Codesign contest</a> were just released (yesterday, I believe). There were 8 teams with a total of 9 entries -- only one team and entry used Bluespec. No doubt a very strong team -- but the results are stunning. This is the difference between thinking and fluidly expressing architecture <span style="font-weight: bold; font-style: italic;">and</span> doing RTL. 
In the hands of a very strong team, you get an especially powerful, mutually-reinforcing combination.<br /><br /><table align="center" border="1" cellpadding="5" cellspacing="0"><thead><tr><th scope="col">Team ID</th> <th scope="col">Normalized<br />Speedup</th> <th scope="col">Platform</th> <th scope="col">Design Languages</th> </tr> </thead> <tbody> <tr align="left"> <td> team kermin </td> <td align="right"> 1102.4 </td> <td> XUP </td> <td> Bluespec </td> </tr> <tr align="left"> <td> team brian </td> <td align="right"> 100.2 </td> <td> XUP </td> <td> C </td> </tr> <tr align="left"> <td> team marco </td> <td align="right"> 85.4 </td> <td> XUP </td> <td> C + HDL </td> </tr> <tr align="left"> <td> team uljana </td> <td align="right"> 49.8 </td> <td> XUP </td> <td> C + HDL </td> </tr> <tr align="left"> <td> team sunita (1) </td> <td align="right"> 41.1 </td> <td> XUP </td> <td> C + HDL </td> </tr> <tr align="left"> <td> team vijay </td> <td align="right"> 33.0 </td> <td> XUP </td> <td> C + HDL </td> </tr> <tr align="left"> <td> team rob </td> <td align="right"> 23.5 </td> <td> XUP </td> <td> C + HDL </td> </tr> <tr align="left"> <td> team eric </td> <td align="right"> 12.8 </td> <td> XC2VP100 Amirix </td> <td> C + HDL </td> </tr> <tr align="left"> <td> team sunita (2) </td> <td align="right"> 11.0 </td> <td> XUP </td> <td> C + Impulse C </td></tr></tbody></table>George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-53305388883869513112008-04-01T00:01:00.003-04:002008-04-01T00:00:54.386-04:00Bluespec Acknowledges DOE and IAEA Probes into its Atomic TransactionsApparently when you innovate, people take notice. Unfortunately, this time, it's both the Feds and U.N. showing a bit too much interest in our technology. We're expecting to clear up their misunderstandings quickly, but until then, Bluespec felt compelled to explain the situation and our perspective in a press release today. 
<a href="http://www.bluespec.com/news/Feds-Investigate-Atomic-Transactions.htm">Please read the complete press release here.</a>George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-70897819262991858602008-03-11T22:12:00.003-04:002008-03-11T22:25:58.858-04:00Atomic Transactions in ProcessorsSun has announced its <a href="http://www.eetimes.com/news/latest/showArticle.jhtml?articleID=206100676">new processor called the Rock</a>. It's the first multi-core processor that uses atomic transactions (the first of many I anticipate) -- specifically, it implements transactional memory, which is used to support the implementation of atomic transactions.<br /><br />Atomic transactions are the highest-level way to specify complex concurrent behavior. Atomic transactions attack the root issue behind what makes hardware and concurrent software so:<br /><ul><li>Error-prone</li><li>Brittle</li><li>Complex</li><li>Costly to develop and verify</li></ul><p>That core issue is managing concurrent accesses to shared resources. This is THE issue in programming multi-core processors -- how do you manage shared memory in a coordinated way? Atomic transactions make it tractable.</p>George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-13458318484867778162008-02-13T12:51:00.002-05:002008-02-13T12:59:48.281-05:00MIT Open Source Hardware Designs (OSHD) and SlashdotMIT recently launched <a href="http://csg.csail.mit.edu/oshd/index.html">their website </a>providing some pretty sophisticated, free, open source hardware designs. At this point, they've got three items, though one is a superset of one of the others:<br /><br />1. HD-quality H.264 video decoder - there are a couple of interesting things about this design aside from meeting the performance and quality requirements of HD-quality H.264:<br /> a) The design is about 10,000 lines of code. 
The C/C++ UNSYNTHESIZABLE reference code for this function is 20K+ lines of code -- this shows how succinct and elegant Bluespec is for datapath-intensive designs<br /> b) This design illustrates how you can use Bluespec to parameterize on structure. A single design can generate lots of different micro-architectures -- and all the control logic adapts to the new micro-architecture<br /><br />2. OFDM transceiver - this one's very cool because it supports both WiFi and WiMax from a single design (and, MIT's working on supporting WUSB from the same source as well) -- here, it highlights the power of Bluespec to parameterize on differences in functionality<br /><br />This Sunday, someone <a href="http://hardware.slashdot.org/article.pl?sid=08/02/10/194218">posted a link to this site on Slashdot</a>. Fortunately (unfortunately? :>) ), the link to Bluespec was indirect, so our website didn't get Slashdotted (brought down with the traffic load).<br /><br />Have fun!George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-62145377751945705942008-01-11T16:54:00.000-05:002008-01-11T16:56:16.050-05:00IIT Bombay course taught this week by ArvindProf Arvind is teaching <a href="http://www.iitb.ac.in/~cep/brochures/2008/shojaei-bro-08.pdf">a short course on Bluespec at IIT Bombay</a> this week. Hopefully when he gets back, MIT will start rolling out their open source hardware designs...George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-74679669916257049672007-12-18T10:41:00.000-05:002007-12-18T11:00:45.655-05:00HW State of the Art.... Bluespec!Mikko Terho is VP and Nokia Fellow at Nokia. According to his bio:<br /><br /><em>Mikko Terho heads Nokia's Intelligent Connectivity Group which focuses on the development of the innovations and prototypes for pervasive communication devices with novel internet services. 
He also advices as Nokia fellow other Nokia R&amp;D Groups in the area of system design, software architecture and component selection.</em><br /><br />He recently presented at edaForum 07 in Munich. Unfortunately, I don't have access to his slides, which go into detail about results using Bluespec for hardware/software architecture tradeoffs. The title of this presentation was <a href="http://www.edacentrum.de/edaforum/edaforum07/vortragende/terho.html">"Are EDA tools for Systems Architecture, ASIC Design or Agile Software Development"</a>.<br /><br />But, I did find a shorter but recent slide set (mostly a subset) <a href="http://www.cs.tut.fi/tapahtumat/mobiili07/terho_materiaali.ppt">here.</a> He first shows the current virtual HW/SW world in <a href="http://www.cs.tut.fi/tapahtumat/mobiili07/terho_materiaali.ppt#310,15,Slide">slide fifteen</a>. In this slide, he shows that RTL/SystemC generation from algorithms and C is "still flaky". Then he goes to the slide I'm quite fond of, <a href="http://www.cs.tut.fi/tapahtumat/mobiili07/terho_materiaali.ppt#312,20,HW">slide twenty</a>. This slide is called "HW state of the art". Using Bluespec, the "concept/algorithm" is fully synthesizable into SystemC models for software development and HW for chip implementation.<br /><br />The funny thing is we just discovered these presentations. Nice surprise!<br /><br />(On a side, but related, note, MIT/Nokia Research will soon be open sourcing some of the beautiful designs they've done with Bluespec. Currently it looks like H.264 and OFDM for WiFi/WiMax/... will be included in the mix... 
Stay tuned!)George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-67836474013194262162007-12-17T16:21:00.000-05:002007-12-17T16:39:21.992-05:00Faster Chips Are Leaving Programmers in Their DustThe NYTimes has this article today about the industry, and Microsoft in particular, needing to develop new languages for parallel architectures. It notes that one of the issues is that there are tasks that cannot be split across processors. This problem is analogous to the one we see in the hardware domain where many hardware designs cannot be automatically parallelized -- this disconnect has made traditional behavioral synthesis a disappointment for many who have tried it. It is okay for simple, nested for-loops, but it quickly breaks down for more complex designs, especially ones with control logic intertwined.<br /><br />In the software domain, they're moving to more explicitly parallel languages -- we advocate the same in the hardware domain. The challenge is picking something that significantly raises the level of abstraction and is still synthesizable with high quality results -- it is just this need that makes SystemC pretty good for modeling, but not great for hardware design.<br /><br />Another interesting observation by Microsoft's Craig Mundie is that hardware designs are less likely to be homogeneous matrices of identical processing elements than heterogeneous, optimized-per-task hardware processing elements.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-82319192127982986162007-12-17T12:59:00.000-05:002007-12-17T13:10:16.663-05:00Microsoft's F# promoting functional languagesThere was an <a href="http://www.eweek.com/article2/0,1759,2212215,00.asp">article on Microsoft's F# language in eWeek</a> last month. 
Microsoft is apparently going to target those that care about writing software for concurrency and those in scientific/financial/academic/technical areas. It's a functional programming language -- and the money quote (from my standpoint) is:<br /><br /><blockquote>"It's clear that a bunch of the things in programming that will be becoming more important over time benefit from a more functional style of programming, like concurrency, distributed programming and so forth," Torgersen said.</blockquote><br />At Bluespec, we not only develop our tool using a functional programming language (Haskell), but we've embodied many functional programming language capabilities for our users to use as well. We're strong believers in these capabilities -- and have integrated them with atomic transactions for a completely new approach on the hardware side. As I posted before, Microsoft Research has been doing some really interesting things with functional languages and atomic transactions -- nice to see us sharing the same technologies with our brethren on the software side.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-66846470283750443522007-12-14T12:47:00.000-05:002007-12-14T13:01:55.319-05:00Alberto Sangiovanni-Vincentelli: The Theoretic Center of Computer ScienceNikhil came upon this "academic" paper, which seeks to identify the center of computer science from a conference and researcher standpoint. It's written in the form of a serious paper, but doesn't take itself very seriously. 
It's written by Michael Kuhn and Roger Wattenhofer of ETH Zurich -- it starts on page number 54 of the following link: <a href="http://people.csail.mit.edu/idish/sigactNews/DC-col28-Dec07.pdf" name="column 28">Column 28, SIGACT News Volume 38, Number 4, (Whole Number 145), December 2007</a>.<br /><br />In order to find the central 'actor' in computer science, they used an approach similar to that in the mathematics field with the <a href="http://www.oakland.edu/enp/">Erdos Number project</a>, by looking at author/co-authorship/references.<br /><br />And, who do they conclude is at the center of computer science? EDA's own <a href="http://www.eecs.berkeley.edu/~alberto/">Alberto Sangiovanni-Vincentelli</a>.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-1893683141499306322007-12-12T13:43:00.000-05:002007-12-12T14:03:25.588-05:00Age Determines Technology's ValueI have been a very delinquent blogger. I recently discovered (a little late to the game) "<a href="http://fakesteve.blogspot.com/">The Secret Diary of Steve Jobs</a>" (which I find hilarious BTW) -- and, while I know that blog is more a job than a side effort, its multiple entries per day have thoroughly shamed me to get back on the horse.<br /><br />I've saved up a bunch of items that I obviously haven't gotten to... For some reason I ran across the following article entitled <a href="http://www.cioinsight.com/article2/0,1540,2222167,00.asp">Age Determines Technology's Value</a>. It talks about how people in the workforce communicate completely differently from the latest generation graduating -- and how we need to be more open minded.<br /><br />I saw parallels to our challenges getting engineers, who are fairly set in existing approaches, to take a fresh, serious look at a new, completely different approach to modeling, verification and implementation of hardware. The problem is different -- but the challenges are similar. 
Change is hard.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-72641377866902922522007-09-12T10:41:00.000-04:002007-09-12T10:51:30.365-04:00The full series of articles on sequential to parallel programmingA series of articles is now up by Prof Arvind and Rishiyur Nikhil, our CTO, based on their book on "Implicit Parallel Programming in pH". I had referenced the first article of this series in my last post -- this first article is a great explanation of why it's hard to take a sequential description (a la C/C++) and make it (efficiently) parallel for most hardware and software designs. Here are the links to all the parts on embedded.com:<br /><br /><a href="http://www.embedded.com/showArticle.jhtml?articleID=201500267">Part 1</a><br /><a href="http://www.embedded.com/showArticle.jhtml?articleID=201801070">Part 2</a><br /><a href="http://www.embedded.com/showArticle.jhtml?articleID=201802337">Part 3</a><br /><a href="http://www.embedded.com/showArticle.jhtml?articleID=201803783">Part 4</a><br /><a href="http://www.embedded.com/showArticle.jhtml?articleID=201804960">Part 5</a>George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-68080242213315864982007-08-18T08:13:00.000-04:002007-08-18T08:26:47.055-04:00Article outlining issues with parallelizing sequential programs<a href="http://www.embedded.com/design/multicore/201500267">This article on embedded.com</a> is a great introduction as to why sequential programs are not typically a good starting point if the intent is to parallelize them, either as a software program (let's say on multiple cores) or into hardware. This article was written a few years ago as part of a book on parallel programming by <a href="http://www.bluespec.com/">Bluespec</a>'s CTO Rishiyur S. 
Nikhil and one of our founders Arvind of MIT.<br /><br />A lot of people wish that C/C++, given its wide use, could be a specification language for parallel programs or even hardware. A nice thought -- unfortunately, the challenge of automatically identifying concurrency in a sequential software function is intractable for all but a small class of applications (and even with these, algorithms must often be written a particular way for the tools to sift out the concurrency).<br /><br />This article is a great primer to understand why this is the case. I understand that embedded.com will be publishing more excerpts of this book.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-56798281976517370962007-08-16T20:28:00.000-04:002007-08-16T20:42:34.659-04:00Simon Peyton-Jones's talks at OSCON 2007Simon Peyton-Jones gave a few talks that I'm dying to watch when I get a little time. A few of my co-workers (who know him) have suggested he is among the best lecturers they have seen. He gave three talks. There were two that were recorded and available for internet viewing. One provides an overview of Haskell and the other provides an overview of transactional memory for concurrent (parallel) programming. From the Real World Haskell website comes <a href="http://www.realworldhaskell.org/blog/2007/08/07/wow-oscon-video-viewing-statistics/">the following summary </a>of the viewership of his lectures now available on the web:<br /><br /><blockquote>Simon’s Haskell language talks are the most popular of the OSCON videos, and have been viewed over 50% more times than the next ten most popular videos combined.<br /></blockquote><p>It looks like my co-workers are probably right. 
Here's a <a href="http://www.realworldhaskell.org/blog/2007/08/03/a-brief-haskell-at-oscon-trip-report/">page with a link to his talks on Haskell and transactional memory for concurrent programming</a>.</p>George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.comtag:blogger.com,1999:blog-17864065.post-88450416961192720752007-07-25T19:29:00.000-04:002007-07-26T11:00:08.623-04:00Software Transactional Memory - a couple references<a href="http://chipsandbs.blogspot.com/2007/07/microsoft-future-of-parallel.html">My last post</a> talked about the emphasis on atomic transactions and functional programming for next generation programming languages for multi-core, parallel architectures. Here is an interesting paper talking about software transactional memory (STM) from Simon Peyton-Jones of Microsoft's <a href="http://research.microsoft.com/Users/simonpj/">webpage</a>:<br /><br /><a href="http://research.microsoft.com/Users/simonpj/papers/stm/stm.pdf">Composable Memory Transactions</a> (about Software Transactional Memory (STM))<br /><br />There are some additional papers on transactional memory programming on Simon's webpage <a href="http://research.microsoft.com/~simonpj/papers/stm/index.htm">here</a>.<br /><br />Wikipedia has a summary, plus some links <a href="http://en.wikipedia.org/wiki/Software_transactional_memory">here</a>.George Harperhttp://www.blogger.com/profile/12782319843580094075noreply@blogger.com