public class ben:harrell

June 8, 2008

SSIS Custom Source Component for EBCDIC

Well, it’s finally done!  Patrick and I finished version 1 of the EBCDIC Source Component which aids tremendously in the importing of EBCDIC (mainframe, IBM, old school stuff) data into SQL Server Integration Services.  We think this component will allow a number of shops to focus on getting the data right in their ETL solution within Integration Services rather than beating their head against an older data format that doesn’t always play well with others.

This component, named Lysine, works like most other Sources in SSIS so it should be easy to get started.  Currently, the component has the following features:

  • Several EBCDIC code pages supported
  • Intuitive Layout UI for rapid  development
  • Quick Preview to show you if your layout is correct
  • All major column types supported Redefines, Occurs, Occurs Depending, Packed (Comp-3), Zoned
  • Single Pass conversion for scalable performance
  • Export/Import of layout for team development

Please come check out the demo, browse the User Guide, try some Samples and let us know what you think!

September 18, 2007

SSIS Custom Component – ProvideComponentProperties vs ReInitializeMetadata

Filed under: .NET, Custom Source Component, database, Integration Servicees, SQL Server 2005, SSIS, Technology — benjamin harrell @ 9:47 am

 

I am currently working on a custom source component in SSIS that converts EBCDIC data to ASCII inline and one of the challenges I face is creating dynamic outputs and output columns based on the layout process of the source component.  Normally, when you want to additional outputs on your component you create them in by override ProvideComponentProperties like this:

public override void ProvideComponentProperties()
{
    // add the outputComponentMetaData.UsesDispositions = true;
   IDTSOutput90 output =     ComponentMetaData.OutputCollection.New();
   output.Name = “My New Output”;
   output.ExternalMetadataColumnCollection.IsUsed = true;

This works really well if all of your output information is available at design time (in the SSIS ui) but what happens if your dynamic outputs are determined at runtime?  ProvideComponentProperties is only called one time, when the component is added to designer surface.  In order to dynamically add outputs at a later point you must use ReInitializeMetaData which is called whenever Validate returns VS_NEEDSNEWMETADATA.

public override void ReinitializeMetaData()
{
   // add the output
   ComponentMetaData.UsesDispositions = true;
   IDTSOutput90 output = ComponentMetaData.OutputCollection.New();
   output.Name = “My New Output”;
   output.ExternalMetadataColumnCollection.IsUsed = true;
 }
 

Note that I have not shown the additional work of adding columns to either of these scenarios, you will need to add that code yourself. 

July 14, 2007

SSIS Row Limits and DefaultBufferMaxRows (Part 2)

Filed under: .NET, Custom Source Component, database, Integration Servicees, SSIS — benjamin harrell @ 10:44 am

I believe the mystery is solved and I’m sad to say that (as usual) it was something silly but crucial.  In a custom source component you must create a new PipelineBuffer in order to write your rows out.  SSIS has provided a way for us to let the engine know when we are done adding rows with a simple method “SetEndOfRowset”.  The name says it all and even the MSDN documentation is clear that you must call this.  In addition, your error logs if you don’t call this method will say something like:

The PrimeOutput method on <your component> returned success, but did not report an end of the rowset. 

 You would think that this error message would be enough to warn any developer that they were missing a key line of code but sadly that wasn’t the case.  In my case I had actually allocated 2 buffers, 1 for data rows and 1 for error rows.  I set a breakpoint on my SetEndOfRowset call and watched it execute so I just knew this wasn’t my problem.  But I forgot the second buffer!  So just a note to all you brave souls commanding bits on the SSIS battlefield.  Call SetEndOfRowset for EACH buffer you allocate. Good Luck! 

July 10, 2007

SSIS Row Limits and DefaultBufferMaxRows

Filed under: C#, database, errors, Integration Servicees, SQL Server 2005, SSIS, Technology — benjamin harrell @ 9:51 pm

I had an odd issue during some testing I was doing on a custom source component I’m writing for EBCDIC data conversion.  I first assumed that surely my code was at fault but the test just seemed to simple to be “my code”.  So here’s the scenario,  I have a file with 100,000 bytes of the number ‘1’ in EBCDIC.  I created a simple package to import the file as 1 byte rows into a Dataset Destination.  I was surprised when I executed the package and found that only 65,536 rows were processed.  Now if you have been around software for more than a week you probably recognize this magic number and unfortunately it triggers all sorts of theories about what could possibly be going wrong.  Well, I’m still trying to investigate the “why” and I’ll update this post and welcome comments if anyone has input but for now the solution to this oddity is to increase the DefaultBufferMaxRows to greater than your number of records.  This seems like a horrible workaround for a situation that should happen quite often so the hunt continues…

Blog at WordPress.com.