Tuesday, January 29, 2008

LINQ to Excel Provider 2.5

This version of the provider, evolving from LINQ to Excel Provider 2.0, adds change tracking in order to allow a user to update, insert, and delete Microsoft Excel data from any .Net code. It follows the LINQ to SQL model of change tracking. To track updates in a class the class will need to implement the System.ComponentModel.INotifyPropertyChanged interface. On the property setters, a developer will write code to raise the PropertyChanged event when a new value is set in the property. The example shows how to write a helper method to ease raising this event. To add or delete records one would call either InsertOnSubmit or DeleteOnSubmit on the generic GetSheet method of the provider. When you would like to commit your changes to Excel, simply call SubmitChanges on the provider and all the inserts, updates, and deletes are committed to the Excel spreadsheet.

The C# source file is available here

A VB.Net version of the source is available here

An online code generator to build the LINQ to Excel class is available here


ExcelProvider provider = ExcelProvider.Create(@"c:\deploy\Book1.xls");
foreach (Person per in (from p in provider.GetSheet<Person>() where p.LastName == "Johnson" select p))
{
per.LastName = "Smith";
}
Person p = new Person();
p.Id = 10.0;
p.FirstName = "Alex";
p.LastName = "Zander";
p.BirthDate = new DateTime(1980, 4, 4);
provider.GetSheet<Person>() InsertOnSubmit(p);
provider.SubmitChanges();
Console.WriteLine("Done");


[ExcelSheet(Name="Sheet1")]
public class Person: System.ComponentModel.INotifyPropertyChanged
{
private double id;
private string fName;
private string lName;
private DateTime bDate;


public event System.ComponentModel.PropertyChangedEventHandler PropertyChanged;


protected virtual void SendPropertyChanged(string propertyName)
{
System.ComponentModel.PropertyChangedEventHandler handler = PropertyChanged;
if (handler != null)
{
handler(this, new System.ComponentModel.PropertyChangedEventArgs(propertyName));
}
}

public Person()
{
id = 0;
}
[ExcelColumn(Name="ID",Storage="id")]
public double Id
{
get { return id; }
}

[ExcelColumn(Name = "FirstName", Storage = "fName")]
public string FirstName
{
get { return this.fName; }
set { fName = value;
SendPropertyChanged("FirstName");
}
}

[ExcelColumn(Name = "LastName", Storage = "lName")]
public string LastName
{
get { return this.lName; }
set { lName = value;
SendPropertyChanged("LastName");
}
}

[ExcelColumn(Name = "BirthDate", Storage = "bDate")]
public DateTime BirthDate
{
get { return this.bDate; }
set { bDate= value; }
SendPropertyChanged("BirthDate");
}
}
}

55 comments:

Chris said...

Fantastic! Many thanks, this has really helped me.

The spreadsheet I had to read had spaces in the column names, so I had to manually amend the autogenerated class, and also change the BuildSelect() slightly to:
builder.AppendFormat("[{0}]", col.GetSelectColumn() );

K Jacobson said...

Thanks, I am glad you found it useful. Also thanks for the info on the errors you found. I have made fixes to the code generator to handle Excel columns with spaces, and also applied your change to the Excel provider (builder.AppendFormat("[{0}]", col.GetSelectColumn() );
).

Andrew said...

Great Work. In playing around with this I found it necessary to make the following change to the foreach within the BuildWhereClause Method to the following. This was necessary in order to get it to play nice with empty cells.


PropertyManager pm = objState.GetProperty(col.GetProperty().Name);
if (builder.Length > 0)
{
builder.Append(" and ");
}

builder.AppendFormat("[{0}]", col.GetSelectColumn());

if (pm.OrginalValue == System.DBNull.Value)
builder.Append(" IS NULL");
else
{
builder.Append(" = ");
string paraNum = "@x" + cmd.Parameters.Count.ToString();
builder.Append(paraNum);
OleDbParameter para = new OleDbParameter(paraNum, pm.OrginalValue);
cmd.Parameters.Add(para);
}

K Jacobson said...

Thanks for the input Andrew! I applied your fix to the code to build the where clause

ricaforrica said...

Great post! I was really looking for somethig just like this!

Do you have perhaps a vb.net version (of the code and of the generator)?

Thanks!

K Jacobson said...

I currently do not have a vb.net version but I will work on creating one. It shouldn't take too much effort. Likewise I will work on getting the code generator to write to C# or VB.Net.

K Jacobson said...

I have posted a VB.Net version of the source and I have updated the online code generator to support VB.Net along with C#

Anonymous said...

Hello mate,
Your generator is great but it doesn't works on Opera browser and halfly works in FireFox browser. Check it out :]

K Jacobson said...

Thanks for the input! Did some rework of the page and have it working on IE and Firefox. I haven't had a chance yet to test with Opera

iaresean said...

Nice piece of code. Thing is I always get an exception when I have an empty column: System.DBNull cannot be converted to type String.

Could you possibly modify to allow for handling empty columns too?

:-)

K Jacobson said...

iaresean,

Thanks for the comment and for bringing the bug to my attention. I believe you were having the error when you where attempting to read data from the excel sheet. I added a fix to the VB and C# version to check for DBNull as the return type from the Excel sheet and, if so, convert to to a simple null value, or in VB set it to nothing

Stef said...

That's just what I've searched for.
You did a great job.
Thanks!

codputer said...

I needed it to read 2003/2007 formats, so I added this class. Used it to get the right connection string.

public static class ExcelConnectionString
{
internal static string GetConnectionString(string pFilePath)
{
string strConnectionString = "";
string strExcelExt = System.IO.Path.GetExtension(pFilePath);

if (strExcelExt == ".xls")
strConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties= ""Excel 8.0;HDR=YES;""";
else if (strExcelExt == ".xlsx")
strConnectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES""";
else
throw new ArgumentOutOfRangeException("Excel file extenstion is not known.");

return string.Format(strConnectionString, pFilePath);

}
}

K Jacobson said...

Thanks for the fix codputer!

Below is a VB version of it. Also I will update the source for both with this change in the next couple days.

Public Class ExcelConnectionString
Friend Shared Function GetConnectionString(ByVal pFilePath As String) As String
Dim strConnectionString As String = String.Empty
Dim strExcelExt As String = System.IO.Path.GetExtension(pFilePath)

If (strExcelExt = ".xls") Then
strConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties= ""Excel 8.0;HDR=YES;"""
ElseIf (strExcelExt = ".xlsx") Then
strConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES"""
Else
Throw New ArgumentOutOfRangeException("Excel file extenstion is not known.")
End If
Return String.Format(strConnectionString, pFilePath)
End Function
End Class

K Jacobson said...

The source for VB and C# has been updated with codputer's change.

dev@krudo.net said...

very very nice!!!
however I have a problem:
I get "No value given for one or more required parameters." - OleDb Exception, when I hit the row: "
using (OleDbDataReader reader = cmd.ExecuteReader())"
what went wrong here? mybe the Excel file?
thanks, krudo

K Jacobson said...

Without seeing your code, the first thing I would verify is the Excel Column Attribute name on the properties in your .Net class matches the column names in the Excel sheet. The code should also be able to handle spaces in the column names. Because the code the code building the Sql statement is placing brackets [] around the column names.

So check that first; and if you still have issues try posting your code so we can take a look at it.

Ahad L. Amdani said...

Codputer's change is great, except he forgot to include XLSM for macro-enabled workbooks.

You can update to this:
public class ExcelConnectionString
{
internal static string GetConnectionString(string pFilePath)
{
string strConnectionString = string.Empty;
string strExcelExt = System.IO.Path.GetExtension(pFilePath);

if (strExcelExt == ".xls")
strConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties= ""Excel 8.0;HDR=YES;""";
else if (strExcelExt == ".xlsx" || strExcelExt == ".xlsm")
strConnectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES""";
else
throw new ArgumentOutOfRangeException("Excel file extenstion is not known.");

return string.Format(strConnectionString, pFilePath);

}
}

I'm sure I've left out a format as well, I just updated it to resolve my particular problem; however, I ran into the error mentioned by Krudo. I generated a stack trace and this was the result:

Stack Trace:
at
System.Data.OleDb.OleDbCommand.ExecuteCommandTextForSingleResult(tagDBPARAMS dbParams, Object& executeResult)
at
System.Data.OleDb.OleDbCommand.ExecuteCommandText(Object& executeResult)
at
System.Data.OleDb.OleDbCommand.ExecuteCommand(CommandBehavior behavior, Object& executeResult)
at
System.Data.OleDb.OleDbCommand.ExecuteReaderInternal(Command Behavior behavior, String method)
at System.Data.OleDb.OleDbCommand.ExecuteReader()
at LinqToExcel.ExcelSheet`1.Load() in
C:\Users\amdanial.SCRUGGHOGGS\Documents\Visual Studio 2008\Projects\FortressReports\FortressReports\LinqToExcel.cs:line 208
at System.Linq.Enumerable.< WhereIterator >d__0`1.MoveNext()
at System.Linq.Buffer`1..ctor(IEnumerable`1 source)
at
System.Linq.OrderedEnumerable`1.< GetEnumerator >d__0.MoveNext()
at System.Linq.Enumerable.Count[TSource](IEnumerable`1 source)
at
FortressReports.FLEDataSetProducer.ReadDataSetFromExcel(ExhibitType LiabilityType, String officeLocation, String evaluationPeriod, String templateFileName) in
C:\Users\amdanial.SCRUGGHOGGS\Documents\Visual Studio 2008\Projects\FortressReports\FortressReports\FLEDataSetProducer.cs:line 153

I used your code generator to come up with my MainDataRow.cs class which defines the data types in my MainData table (very nice generator, btw, although it got a bit confused on handling special characters such as "?" or "/").

The code at line 153 of my FLEDataSetProducer.cs is in my ReadDataSetFromExcel() function and it's simply the following:

ExcelProvider provider = ExcelProvider.Create(templateFileName);
LinqToExcel.ExcelSheet< MainDataRow > MainDataTable = provider.GetSheet< MainDataRow >();

// FullData is of type IOrderedEnumerable< MainDataRow >
FullData = from row in MainDataTable
where row.Office.Contains(officeLocation) && row.ReservingPeriod.Contains(evaluationPeriod)
orderby row.Reserves descending
select row;

try
{

int RecordsCount = FullData.Count(); // this is line 153 where it seems to error out on
}
catch (Exception ex)
{
throw new Exception("Data Input Error: Issue loading Exhibit template!\n" + ex.Message + "\n\n Stack Trace:\n" + ex.StackTrace, ex);
}

Ahad L. Amdani said...

Further inspection of the stacktrace led me to inspecting the LinqToExcel.cs file at lines 208 and 162.

Under the LinqToExcel.ExcelSheet< T > public IEnumerator< T > GetEnumerator() function:
line 208 - Load();

Under the LinqToExcel.ExcelSheet< T > private void Load() function:
line 162 - using (OleDbDataReader reader = cmd.ExecuteReader())


This is the 3rd nested using statement. This suggests a possible issue with the GetEnumerator() function in conjunction with the aggregate Count() function.

Thanks,
Ahad L. Amdani

K Jacobson said...

Thanks Ahad.

I added the .xlsm support.

Were you using special characters in your columns names for your Excel sheet? Was that what was causing the execption.

Ahad L. Amdani said...

Well, it seems to be a PEBKAC error on my part. An issue of not having mapped all the column names properly as well as spelling issues, and not having the proper data types.

I'm going to look into this further and see if there is any issue with the provider class itself, though it doesn't look to be that way.

However, the only thing of concern is that there aren't too many exceptions thrown from you your provider class, and the ones thrown aren't as descriptive as they could be.

Good luck,
Ahad L. Amdani

Ahad L. Amdani said...

What's the efficiency of this method as compared to, say, having a dataset populated via an adapter using an ODBC or OLEDB connection to Excel? Performance-wise and memory-wise.

I was running a solution through the former method (so as to utilize LinqToSql) and switched over to using LinqToExcel, and I keep running into a System.OutOfMemory exception; watching the Task Manager I noticed I hit the 1.5GB hard cap quite quickly reading into memory an Excel file with around 7,000 rows x 20 columns.

K Jacobson said...

It is a going to be a performance hog on larger Excel sheets like you describe. The main reason is it attempts to load the entire Excel sheet into memory before performing the LINQ query. Once the Excel sheet is in memory we are just doing plain old LINQ over in memory collections.

If you are using an Excel sheet that large you would have better performance using DataSet adapters and use OLEDB or ODBC.

I never intended on folks using this on something that large. I was assuming a couple 1000 rows and a handful of columns.

This could be re-written to perform the LINQ query like LINQ to SQL does, but I just do not have the time commitment to do something like that and it is a complex task.

TomKob said...

Great code, helped me a lot.

I have a suggestion though. When I first used the class I kept getting conversion exceptions because the formatting in the spreadsheet did not precisely match my class' data type. I made it a lot more robust by replacing the line

fi.SetValue(item,val);

with

fi.SetValue(item, Convert.ChangeType(val,fi.FieldType));

In the function ExcelSheet.Load()

K Jacobson said...

TomKom,

Thanks! I have applied your recommended code changes to the C# and VB.Net code. I was just trying to find a function like that yesterday for another project that I am working. Thanks again.

FYI. I recently rebuilt my machine and I haven't put VB.Net Express on it yet so I "notepadded" the VB.Net changes. If it doesn't compile let my know.

juaran said...

can you add CreateSheet?

I use this

protected void CreateSheet (ExcelProvider provider, Type sheetType)
{
string sheet = ExcelMapReader.GetSheetName (sheetType);
string connectionString = ExcelConnectionString.GetConnectionString (provider.Filepath);
using (OleDbConnection conn = new OleDbConnection (connectionString))
{
conn.Open ();
using (OleDbCommand cmd = conn.CreateCommand ())
{
cmd.CommandText = String.Format (@"CREATE TABLE {0} (Dt date, St char(40), Cr currency)", sheet);
cmd.ExecuteNonQuery ();
}
}
}

Bob said...

Deletes don't work.

K Jacobson said...

It looks you get this error:

Deleting data in a linked table is not supported by this ISAM

Did a quick search online and seems like it's a common issue that delete (with a from clause) is not supported in Excel. I'll have to look at it some more.

If you can delete the entire sheet maybe you could read it all in memory, delete the sheet, delete the row in memory, and insert the rest in memory back to the Excel sheet.

javierjh said...

On the VB.net version, change tracking is not working be cause
the property is only added to the propertymanager if col.IsFieldStorage=true

K Jacobson said...

Thanks javierjh,

The code that adds the property to the propety manager needs to be outside the if else block in the VB.Net version like it is on the C# version.

prietopa said...

javierjh I´ve have the same problem, how do you resolve it?

Can you put the changes in vb code?
I´m dummy in .net

thanks
gracias

TomDom said...

Good work.
How do I have to proceed if the Excel file has more than one sheet ?

K Jacobson said...

prietopa

I forgot to clarify in my reply to javierjh that I applied the fix to the VB.Net source

K Jacobson said...

TomDom,

You would have to create a .Net class for each sheet.

Then when you use the provider you just supply the .Net class type to the GetSheet generic method and it will know which sheet to go to. It is reading the ExcelSheet attribute on the class to determine which sheet in the Excel file to read from.

Example
Sheet 1 begin a class as such

[ExcelSheet(Name="Sheet1")]
public class Sheet1Person: System.ComponentModel.INotifyPropertyChanged
{
//add properties
}

Sheet 2 begin a class as such

[ExcelSheet(Name="Sheet2")]
public class Sheet2Person: System.ComponentModel.INotifyPropertyChanged
{
//add properties
}

from p in provider.GetSheet<Sheet1Person>() where p.LastName == "Johnson" select p

from p in provider.GetSheet<Sheet2Person>() where p.LastName == "Johnson" select p

TomDom said...

Hi,

Thanks for this answer.

Actually I have already built my 6 classes and I'll try to use GetSheet.

I have to develop a software that request in SQL table with jointure from an Excel file.

Do you think that I can join class made with Linq To Excel with class made with Linq to SQL?

K Jacobson said...

The only way to join a LINQ to SQL query with a LINQ to Excel query is to convert the LINQ to SQL query to an in-memory collection and then perform another query joining that collection with the one from the LINQ to Excel query(which is an in-memory collection already)

This is done by executing a ToList() function on the LINQ to SQL query. What this does is run the query in SQL Server (minus the join) and return the results in a List collection. Once in that collecion you can join it with the LINQ to Excel query and like I said before perform another LINQ query on it.

TomDom said...

Thank you very much for your help.

chris said...

I was having some problems reading worksheets that contained columns with numeric and text data. For some reason, the text data was coming out null.

Then I found this surprise:
http://support.microsoft.com/kb/190195/EN-US/

Move the data rows with text data in those columns up to the top and all is well. Thanks Microsoft.

Thanks for your work on this software. It has been very helpful.

Sascha said...

This is great work, well done!

Only one problem.. I get a NullReferenceException at line 260 of the CS code because val is sometimes null.

My tables appear to have a bunch of "empty" rows at the end. Somehow the queries dont recognise where the table ends and keeps reading on for a few rows. Typicaly this happens when I deleted a few rows at the end of the table that used to be there.

Sascha said...

Chris..
I believe the problem with mixed format data can also be fixed without modifying the tables by adding IMEX=1
to the end of the connection string (i.e. it would end with ...HDR=YES;IMEX=1"""

At least that fixed the problem in my case.

MickeyVip said...

Hello!
First - excellent work!

I kept getting exceptions when trying to read data into decimal variable, while some excel fields had text data in them.

The solution was to alter the code, that was added by TomKom in Load() method, inside "if (col.IsFieldStorage())...":

changed
fi.SetValue(item, Convert.ChangeType(val, fi.FieldType));

into
if (val == null && fi.FieldType.IsValueType) { fi.SetValue(item, null); } else { fi.SetValue(item, Convert.ChangeType(val, fi.FieldType)); }


maybe additional changes needed in "else" part of the container "if (col.IsFieldStorage())..."

Thanks

ocirnErepuS said...

Very nice piece of code, really useful!

As MickeyVip I had to check the val!= null before converting type, and as Sasha suggested, I added the IMEX=1, part. Please not that in Sasha's post, he forgot to add the semicolon at after IMEX=1, like this:
IMEX=1;"""

Just wondering, what if I don't know the name of the sheet, but I instead just want to use the first one of the sheet collection? what should I put in the attribute of the class?

Thanks again for your work!

-E-

Isaac said...

This is a really cool apps.
I was using it, and found an issue with the GetConnectionString Class.
When try to use .xls files get an error that no installable ISAM. So to correct that a made the following change:
At the if statement to use .xls files, I change the connection string to use Excel 12.0 without XML properties and work fine. see code below.



public class ExcelConnectionString
{
internal static string GetConnectionString(string pFilePath)
{
string strConnectionString = string.Empty;
string strExcelExt = System.IO.Path.GetExtension(pFilePath);

if (strExcelExt == ".xls")
// strConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=Excel 8.0;HDR=YES";
//Ahad L. Amdani added support for .xslm for workbooks using macros
strConnectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0;HDR=YES""";
else if (strExcelExt == ".xlsx" || strExcelExt == ".xlsm")
strConnectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES""";
else
throw new ArgumentOutOfRangeException("Excel file extenstion is not known.");

return string.Format(strConnectionString, pFilePath);

}
}

softsysgrp said...

Nice job, it helped me solve a problem.

For the VB version of the code when building the Where clause, it did not work checking for DBNull. Instead I had to use Nothing as shown below.

builder.AppendFormat("[{0}]", col.GetSelectColumn())
'fix from Andrew 4/2/08 to handle empty cells
If (pm.OrginalValue Is Nothing) Then
builder.Append(" IS NULL")
Else

Prithiraj Sengupta said...

Really, Nice work done. http://solidcodingarchive.appspot.com/resources/LINQtoExcelCodeGen.html is very useful.

Thanks

Prithiraj

Pamorale said...

Thanks for sharing you fantastic work...

Anonymous said...

Thank you so much for sharing this piece of code.

NYMEZIDE said...

I rewrite
"if (pm.OrginalValue == System.DBNull.Value)"

to

"if (pm.OrginalValue == null)"

somehow my nullable cells don't Update. Code "builder.Append(" IS NULL");" never using. (

Now Update success. I dont know why both works.

ambodia said...

help!!
when i try to run LinqExcelProvider25
it show an error below.
can anyone help...

The Microsoft Jet database engine could not find the object 'Sheet1$'. Make sure the object exists and that you spell its name and the path name correctly.

K Jacobson said...

In the Excel spreadsheet it is trying to open, it is expecting an Excel sheet named Sheet1. I would guess the Excel spreadsheet you are opening that sheet names have been changed from their defaults of Sheet1, Sheet2, etc.

Anonymous said...

The last SendPropertyChanged should be inside the set {}

Anonymous said...

This code is incredible. Thank you!

pjaaar said...

Gr8 job. Little trick for ppl having trouble with ISAM driver and deletions:

http://pjaaar.blogspot.com/2012/01/k-jacobson-excel-provider-delete-from.html

Anonymous said...

Hi. Out of curiosity, is this class still the best way of linq'ing to excel? Seems to be - but wanted to check before starting to use it. Thanks.

Kaylee said...

Useful! Thanks for sharing this post.