Delimiter concluded in text in some cells

Multi tool use
Delimiter concluded in text in some cells
I am making a Software which hast to read much data from a CSV file. The CSV file uses ';' as the delimiter. Recently I found out that the text in some cells also concludes ';'. Every tuple from the file is an Object and so every cell in this row is an Attribute of this Object. I use the method Split in C#, which returns an Array of Strings. Because of the ';' in some cells I get an error: Array is out of bound. Are there some ways to get rid of this Error without removing the ';' from some cells?
Example for such a tuple:
Cell1;cell2;cell3;stillCell3;cell4;
Cell3;stillCell3 is one cell but the Split method does not know that.
Be very wary of reinventing the wheel - use nuget.org/packages/LumenWorksCsvReader
– MineR
Jul 3 at 4:43
Can you post a bit of the data? From your description, it sounds like there are just multiple ';' in a row which would be easy to deal with, but your example would be impossible to deal with.
– MineR
Jul 3 at 5:17
@MineR No, it's my companies data
– user10024569
Jul 3 at 5:19
Well there's no programmatic way of doing what you're asking. You've gotta manually clean your data.
– MineR
Jul 3 at 5:53
3 Answers
3
Your original data does not meet the requirements of the CSV file format. No parser can deal with issues that humans would scratch their head about.
You cannot parse ambiguous values. You have to either escape delimiters in field values or have to put values between double quotes.
You can escapes these values by embedding the field inside a set of double quotes. For more details you can refer this link
CSV File Format
------------EDIT 1 ---------------------
Consider this example. It is a comma separated csv
csv
Here you can see a column "OK,123;asw.wew" with ',' and';'
If we open it with notepad or text editor we can see that it is automatically added escape character.
You have to code in such a way that it will split on separator not inside the double quotes.
-------------------------- EDIT 2 --------------------------
You can take advantage of .net
class TextFieldParser
.To use it
.net
TextFieldParser
Microsoft.VisualBasic
Microsoft.VisualBasic.FileIO.TextFieldParser
CSV
Here is the sample code:
using (TextFieldParser parser = new TextFieldParser(@"c:test.csv"))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
//Processing row
string fields = parser.ReadFields();
foreach (string field in fields)
{
//TODO: Process field
}
}
}
Here are some more useful links:
This would be manually, right?
– user10024569
Jul 3 at 4:36
@user10024569 check the edit. Open your
.csv
in text editor and please share the data if possible– Jinto Jacob
Jul 3 at 4:51
.csv
The CSV file has about 1000 lines that I would have to change
– user10024569
Jul 3 at 4:54
no in my case when i edited the cell in excel it automatically included the double quotes. can you check your csv by open it in text editor like notepad
– Jinto Jacob
Jul 3 at 4:57
You can try TextFieldParser it has many options to read the fields.
using (TextFieldParser lcsvReader = new TextFieldParser(new MemoryStream(<FILE_BYTES>), Encoding.Default))
{
lcsvReader.Delimiters = new string[2] { ",", "t" };
lcsvReader.HasFieldsEnclosedInQuotes = true;
lcsvReader.TrimWhiteSpace = true;
while (!lcsvReader.EndOfData)
{
string fields = lcsvReader.ReadFields();
//fields -- Actual field in CSV
}
lcsvReader.Close();
}
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
It would be awesome if you could provide a Minimal, Complete, and Verifiable example.
– mjwills
Jul 3 at 4:31