Using PHP to convert XML to CSV but with a twist

I'm trying to convert some XML files to CSV using PHP's SimpleXML class. However, I can't get the result I want, because one parent can have several child elements with the same name. My current XML file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<club>
<name>Green Riders</name>
<membership>Free</membership>
<boardMember>
<name>James F.</name>
<position>CEO</position>
</boardMember>
<boardMember>
<name>Helen D.</name>
<position>Associate Director</position>
</boardMember>
</club>
<club>
<name>Broken Dice</name>
<membership>Paid</membership>
<boardMember>
<name>Patrick B.</name>
<position>CEO</position>
</boardMember>
</club>
</root>
The CSV output I was hoping to achieve is as such:
club,name,membership,boardMember>Name,boardMember>position
Green Riders,Free,James F.,CEO
Green Riders,Free,Helen D., Associate Director
Broken Dice,Paid,Patrick B., CEO
Is there any way to achieve this without hard-coding the element names into the script (i.e. make it work on any generic XML file)?
I'm really hoping this is possible, given that I'll have more than 25 XML variants, so it would be really inefficient to write a dedicated script for each.
Thanks!
It occurs to me that this is the kind of task that XSLT was created for, but I don't know enough to actually offer an answer using it.
– IMSoP
Jul 3 at 12:23
2 Answers
Since each child node's data needs to become a row in the CSV, together with its parent's data, you can first capture and store the parent (club) data, then traverse the children and print each child's data preceded by the parent's data.
Please check the following code:
$xml = simplexml_load_file("your_xml_file.xml") or die("Error: Cannot create object");
$csv_delimeter = ",";
$csv_new_line = "\n";
foreach ($xml->children() as $n) {
    // Capture the club-level data once; it is repeated on every row.
    $club_data = array();
    $club_data[] = (string) $n->name;
    $club_data[] = (string) $n->membership;
    if (isset($n->boardMember)) {
        foreach ($n->boardMember as $boardMember) {
            // Start from the club data, then append this board member's fields.
            $boardMember_data = $club_data;
            $boardMember_data[] = (string) $boardMember->name;
            $boardMember_data[] = (string) $boardMember->position;
            echo implode($csv_delimeter, $boardMember_data).$csv_new_line;
        }
    }
    else {
        // A club without board members still gets a row of its own.
        echo implode($csv_delimeter, $club_data).$csv_new_line;
    }
}
After testing with the example xml data, it generated the following type of output:
Green Riders,Free,James F.,CEO
Green Riders,Free,Helen D.,Associate Director
Broken Dice,Paid,Patrick B.,CEO
You can set different values based on your scenario for:
$csv_delimeter = ",";
$csv_new_line = "\n";
There are no strict rules for CSV output: the delimiter can be ",", ";" or "|", and the new line can also be "\r\n".
The code prints the CSV rows one by one on the fly. If you want to save the CSV data to a file, a better approach than writing rows one by one is to build the entire array first and write it once (disk access is costly), unless the XML data is large. You will find plenty of simple PHP array-to-CSV function examples on the net.
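The "build the array, then write it once" idea could look roughly like the sketch below. The helper names clubsToRows() and writeCsv() are made up for illustration, and the element names are still hard-coded to the XML from the question. It uses PHP's built-in fputcsv(), which also takes care of quoting fields that contain spaces or the delimiter.

```php
<?php
// Sketch only: clubsToRows() and writeCsv() are hypothetical helper names.

/** Build one row per boardMember, prefixed with the parent club's fields. */
function clubsToRows(SimpleXMLElement $root): array
{
    $rows = [];
    foreach ($root->club as $club) {
        $clubData = [(string) $club->name, (string) $club->membership];
        if (count($club->boardMember) === 0) {
            // A club without board members still produces one (shorter) row.
            $rows[] = $clubData;
            continue;
        }
        foreach ($club->boardMember as $member) {
            $rows[] = array_merge(
                $clubData,
                [(string) $member->name, (string) $member->position]
            );
        }
    }
    return $rows;
}

/** Write all rows to $path in one pass; fputcsv() quotes fields as needed. */
function writeCsv(string $path, array $rows, string $delimiter = ','): void
{
    $handle = fopen($path, 'w');
    foreach ($rows as $row) {
        fputcsv($handle, $row, $delimiter, '"', '\\');
    }
    fclose($handle);
}
```

A call such as writeCsv('clubs.csv', clubsToRows(simplexml_load_file('your_xml_file.xml'))) would then produce the file; passing ';' or '|' as the third argument switches the delimiter.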
While this may solve the problem, it's best to include an explanation of your general approach along with the problem. This helps people with similar problems adapt the solution to their needs.
– IMSoP
Jul 2 at 9:00
Thanks @IMSoP ! I have elaborated the answer. Feel free to add if I missed anything :)
– Tanvir Ahmed
Jul 2 at 18:33
It is not really possible. XML is a nested structure, and the information about how to flatten it into rows and columns is missing from the file itself. You can define some default mapping for XML structures, but that gets really complex really fast. So it is far easier (and less time-consuming) to define the mapping by hand.
function readXMLAsRecords(string $xml, array $map) {
    // load the xml
    $document = new DOMDocument();
    $document->loadXML($xml);
    $xpath = new DOMXPath($document);
    // iterate the elements defining the rows
    foreach ($xpath->evaluate($map['row']) as $row) {
        $line = [];
        // get the field values from the current $row
        foreach ($map['columns'] as $name => $expression) {
            $line[$name] = $xpath->evaluate($expression, $row);
        }
        // return a line
        yield $line;
    }
}
With DOMXpath::evaluate(), Xpath expressions can return strings. So we need one expression that returns the boardMember nodes and a list of expressions for the fields.
$map = [
'row' => '/root/club/boardMember',
'columns' => [
'club_name' => 'string(parent::club/name)',
'club_membership' => 'string(parent::club/membership)',
'board_member_name' => 'string(name)',
'board_member_position' => 'string(position)'
]
];
readXMLAsRecords() returns a generator, so you can use foreach on it:
// $xml holds the XML document as a string here
$csv = fopen('php://stdout', 'w');
fputcsv($csv, array_keys($map['columns']));
foreach (readXMLAsRecords($xml, $map) as $record) {
    fputcsv($csv, $record);
}
Output:
club_name,club_membership,board_member_name,board_member_position
"Green Riders",Free,"James F.",CEO
"Green Riders",Free,"Helen D.","Associate Director"
"Broken Dice",Paid,"Patrick B.",CEO
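Since the question mentions more than 25 XML variants, one way to reuse the function above is to keep one map per variant and pick the right one at run time. The sketch below is only an illustration: the $maps array, the pickMap() helper, the "library" variant, and the choice to key maps by the root element name are all assumptions, not part of the original answer. readXMLAsRecords() is repeated so that the block runs on its own.

```php
<?php
// Sketch only: $maps, pickMap() and the "library" variant are made-up examples.

function readXMLAsRecords(string $xml, array $map): Generator
{
    $document = new DOMDocument();
    $document->loadXML($xml);
    $xpath = new DOMXPath($document);
    foreach ($xpath->evaluate($map['row']) as $row) {
        $line = [];
        foreach ($map['columns'] as $name => $expression) {
            $line[$name] = $xpath->evaluate($expression, $row);
        }
        yield $line;
    }
}

// One map per XML variant, keyed (by assumption) by the root element name.
$maps = [
    'root' => [
        'row' => '/root/club/boardMember',
        'columns' => [
            'club_name' => 'string(parent::club/name)',
            'club_membership' => 'string(parent::club/membership)',
            'board_member_name' => 'string(name)',
            'board_member_position' => 'string(position)',
        ],
    ],
    'library' => [
        'row' => '/library/shelf/book',
        'columns' => [
            'shelf_label' => 'string(parent::shelf/@label)',
            'book_title' => 'string(title)',
        ],
    ],
];

/** Look up the map for a document by the name of its root element. */
function pickMap(string $xml, array $maps): array
{
    $document = new DOMDocument();
    $document->loadXML($xml);
    $rootName = $document->documentElement->localName;
    if (!isset($maps[$rootName])) {
        throw new InvalidArgumentException("No map defined for <$rootName>");
    }
    return $maps[$rootName];
}
```

Each new variant then costs one array entry rather than one script. If several variants share the same root element name, the file name or a command-line argument could serve as the key instead.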
It took me a moment to realise that the key to this is that the "row" expression targets a single board member, and the other expressions climb upwards to get other data. Since my comment was never answered, this is a reasonable interpretation of the question, but if there are more complex combinations involved, a more complex algorithm would be needed. I definitely agree that writing a tool which accepts definitions is better than trying to make a zero-config algorithm that will cope with all structure, though.
– IMSoP
Jul 3 at 12:22
Is there a guarantee that there will only be one name which is repeated, or might you need to generate all possible combinations of several columns? e.g. should <example><foo>1</foo><foo>2</foo><bar>a</bar><bar>b</bar><bar>c</bar></example> generate a CSV with foo,bar columns and 6 rows for 1,a, 1,b, 1,c, 2,a, 2,b, and 2,c?
– IMSoP
Jul 2 at 9:05
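The six-row reading described in this comment amounts to a Cartesian product over the repeated child names. A minimal sketch of that interpretation, assuming only direct children with plain text content, is shown below; groupChildrenByName() and cartesianRows() are hypothetical names, and nothing in the thread confirms that this is the behaviour the asker wanted.

```php
<?php
// Sketch only: groupChildrenByName() and cartesianRows() are hypothetical names.

/** Group the text of an element's direct children by tag name. */
function groupChildrenByName(SimpleXMLElement $element): array
{
    $groups = [];
    foreach ($element->children() as $child) {
        $groups[$child->getName()][] = (string) $child;
    }
    return $groups;
}

/** Expand ['foo' => [..], 'bar' => [..]] into every combination of values. */
function cartesianRows(array $groups): array
{
    $rows = [[]];
    foreach ($groups as $name => $values) {
        $next = [];
        foreach ($rows as $row) {
            foreach ($values as $value) {
                // Extend each partial row with one value for this column.
                $next[] = $row + [$name => $value];
            }
        }
        $rows = $next;
    }
    return $rows;
}
```

For the <example> element above, cartesianRows(groupChildrenByName($example)) yields six rows in the order listed in the comment. The row count is the product of the group sizes, so this grows quickly once more than a couple of names repeat.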