Blog Archives
XML (part 1, basics), file IO
The last few days were tough. Writing posts requires a lot of time. Explaining some basic know-how will give me some recreation today.
I will concentrate on the C# parts. There will be no detailed explanation of XML itself, which is well documented across the internet.
XML stands for eXtensible Markup Language. It is a versatile and flexible text format for structured data. It is easy to read, learn and understand. The downside is the number of bytes it needs. You can use XML to transport and/or store data, but it quickly becomes quite inefficient when a lot of datasets are involved.
XML is a well-established industry standard, so there is no way around it: you have to know at least something about it.
Here is an XML file that we will use in the loading example:
<?xml version="1.0" encoding="UTF-8"?>
<!-- XML EXAMPLE -->
<Walmart>
  <food>
    <name>Banana</name>
    <price>1.99</price>
    <description>Mexican delicious</description>
  </food>
  <food>
    <name>Rice</name>
    <price>0.79</price>
    <description>the best you can get</description>
  </food>
  <food>
    <name>Cornflakes</name>
    <price>3.85</price>
    <description>buy some milk</description>
  </food>
  <food>
    <name>Milk</name>
    <price>1.43</price>
    <description>from happy cows</description>
  </food>
  <electronic>
    <name>Kindle fire</name>
    <price>100</price>
    <description>Amazon loves you</description>
    <somethingElse>the perfect Xmas gift for your kids</somethingElse>
  </electronic>
  <food>
    <name>baked beans</name>
    <price>1.35</price>
    <description>very British</description>
  </food>
</Walmart>
Paste the XML structure into a text editor and save it as an XML file (“Walmart.xml”) onto your desktop.
The following class will be used to store datasets in memory.
public class WmItem {
  public readonly string name;
  public readonly double price;
  public readonly string description;

  public WmItem(string xName, double xPrice, string xDescription) {
    name = xName;
    price = xPrice;
    description = xDescription;
  } // constructor

  public override string ToString() {
    return name.PadRight(12) + price.ToString("#,##0.00").PadLeft(8) + " " + description;
  } //
} // class
Loading an XML file is easy: XDocument.Load() does it in one line. Processing the content takes a bit more code. The centerpiece generally is the parsing, which here is a LINQ query over the elements.
There is a field called “somethingElse” in the XML file. It does not cause any trouble. The code simply disregards it. I added it to demonstrate the “eXtensible” in “eXtensible Markup Language”.
// using System; using System.Globalization; using System.Linq; using System.Xml.Linq;
public static void LoadXml() {
  string lDesktopPath = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory) + @"\";
  string lFile = lDesktopPath + "Walmart.xml";
  XDocument lXDocument = XDocument.Load(lFile);

  // food (using WmItem)
  WmItem[] lFood = (from lData in lXDocument.Descendants("Walmart").Descendants("food")
                    select new WmItem(
                      lData.Element("name").Value,
                      // invariant culture: "1.99" must not depend on the regional settings of the PC
                      double.Parse(lData.Element("price").Value, CultureInfo.InvariantCulture),
                      lData.Element("description").Value)
                   ).ToArray();
  foreach (WmItem lItem in lFood) Console.WriteLine(lItem);
  Console.WriteLine();

  // electronic (quick and dirty, using var)
  var lElectronic = from lData in lXDocument.Descendants("Walmart").Descendants("electronic")
                    select lData;
  foreach (var lItem in lElectronic) {
    Console.WriteLine(lItem);   // prints the whole element as XML
    Console.WriteLine();
    Console.WriteLine(lItem.Element("name").Value);
    Console.WriteLine(lItem.Element("price").Value);
    Console.WriteLine(lItem.Element("description").Value);
  }

  Console.ReadLine();
} //
example output:
Banana          1.99 Mexican delicious
Rice            0.79 the best you can get
Cornflakes      3.85 buy some milk
Milk            1.43 from happy cows
baked beans     1.35 very British

<electronic>
  <name>Kindle fire</name>
  <price>100</price>
  <description>Amazon loves you</description>
  <somethingElse>the perfect Xmas gift for your kids</somethingElse>
</electronic>

Kindle fire
100
Amazon loves you
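The loader above throws an exception as soon as an element is missing. Here is a minimal sketch of a more tolerant variant, assuming you want default values instead of exceptions. The class XmlParseSketch, the method ReadFood() and the defaults "(unnamed)" and 0.0 are made up for this illustration. It relies on the explicit conversions of XElement: the cast to string or double? yields null for a missing element, and the number conversion follows the XML rules rather than the regional settings of the PC.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class XmlParseSketch
{
    // Hypothetical helper: returns one "name=price" string per <food> element.
    public static string[] ReadFood(string xXml)
    {
        XDocument lDoc = XDocument.Parse(xXml);
        return (from lData in lDoc.Descendants("food")
                // (string) gives null for a missing element instead of throwing
                let lName = (string)lData.Element("name") ?? "(unnamed)"
                // (double?) gives null for a missing element, 0.0 is an assumed default
                let lPrice = (double?)lData.Element("price") ?? 0.0
                select lName + "=" + lPrice.ToString("0.00", CultureInfo.InvariantCulture)
               ).ToArray();
    }

    public static void Main()
    {
        string lXml = "<Walmart>"
                    + "<food><name>Banana</name><price>1.99</price></food>"
                    + "<food><name>Rice</name></food>"   // price is missing on purpose
                    + "</Walmart>";
        foreach (string s in ReadFood(lXml)) Console.WriteLine(s);
        // prints "Banana=1.99" and "Rice=0.00"
    }
}
```

Whether a silent default is better than an exception depends on your data. For prices I would rather log the broken dataset than sell rice for free.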
Saving XML is also straightforward. I added a comment and attributes for demonstration purposes. The program generates the XML file “Genesis.xml” and saves it on your desktop.
public static void SaveXml() {
  string lDesktopPath = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory) + @"\";
  string lFile = lDesktopPath + "Genesis.xml";

  WmItem[] lMetals = {
    new WmItem("Lead", 1.0, "here we go"),
    new WmItem("Silver", 2.0, "cutlery"),
    new WmItem("Gold", 3.0, "wife's best friend"),
    new WmItem("Platinum", 4.0, "posh")
  };

  XDocument lXDocument = new XDocument();
  lXDocument.Declaration = new XDeclaration("1.0", "utf-8", "yes");
  lXDocument.Add(new XComment("copyfight by Bastian M.K. Ohta"));

  XElement lLME = new XElement("London_Metal_Exchange",
    new XAttribute("attribute1", "buy here"),
    new XAttribute("AreYouSure", "yes"));
  lXDocument.Add(lLME);

  foreach (WmItem lMetal in lMetals) {
    XElement lGroup = new XElement("metal");
    lGroup.Add(new XElement("name", lMetal.name));
    lGroup.Add(new XElement("price", lMetal.price));
    lGroup.Add(new XElement("description", lMetal.description));
    lLME.Add(lGroup);
  }

  lXDocument.Save(lFile);
  //Console.ReadLine();
} //
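Attributes are written above, but never read. The following sketch shows one way to build an element in a single expression and read an attribute back. The class XmlBuildSketch, the method BuildMetal() and the attribute "unit" are hypothetical names that I made up for this illustration.

```csharp
using System;
using System.Xml.Linq;

public static class XmlBuildSketch
{
    // Hypothetical helper: builds one <metal> element with a made-up "unit" attribute.
    public static XElement BuildMetal(string xName, double xPrice)
    {
        return new XElement("metal",
            new XAttribute("unit", "oz"),
            new XElement("name", xName),
            new XElement("price", xPrice));
    }

    public static void Main()
    {
        XElement lMetal = BuildMetal("Gold", 3.0);
        Console.WriteLine(lMetal);                            // the element as XML text
        Console.WriteLine((string)lMetal.Attribute("unit"));  // reads the attribute
        Console.WriteLine((string)lMetal.Element("name"));    // reads a child element
    }
}
```

The nested constructor calls mirror the shape of the resulting XML, which some people find easier to read than a series of Add() calls.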
Async and await (advanced, .Net 4.5, C# 5)
The devil is in the details. It all looks easy, but follow each step carefully today.
Windows pauses threads that are waiting for I/O operations to complete (e.g. internet or file access). Such a blocked thread cannot be used for other jobs in the meantime, so new threads need to be created. You could use tasks to solve this specific problem. The program would start an asynchronous task to deal with an I/O operation. When that task finishes, it would trigger a follow-up procedure via a continuation task. It requires some work to cover all code paths, but it can be done.
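The task-plus-continuation approach described above could look roughly like this. It is only a sketch: SlowOperation() is a made-up stand-in that sleeps instead of doing real I/O, and the class and method names are hypothetical. Note the extra branch for the error path, which is part of the "work to cover all code paths".

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ContinuationSketch
{
    // Hypothetical stand-in for a slow I/O operation.
    public static Task<int> SlowOperation()
    {
        return Task.Run(() => { Thread.Sleep(500); return 42; });
    }

    public static Task<string> StartWithContinuation()
    {
        // the follow-up procedure runs once SlowOperation() has finished
        return SlowOperation().ContinueWith(t =>
        {
            if (t.IsFaulted) return "failed: " + t.Exception.InnerException.Message;
            return "result " + t.Result;
        });
    }

    public static void Main()
    {
        Task<string> lTask = StartWithContinuation();
        Console.WriteLine("main thread is free to do other work");
        Console.WriteLine(lTask.Result);   // blocks here only to keep the demo alive
        // prints "main thread is free to do other work", then "result 42"
    }
}
```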
C# 5 introduced new keywords to make your life easier. You can use async to mark methods for asynchronous operations. Such a method starts synchronously and splits up as soon as the program arrives at an await on a task that has not finished yet: control returns to the caller, and the rest of the method runs later as a continuation.
The Print() method below prints the time, the step number and the thread ID. This information is useful to understand the program cycle.
// using System; using System.Threading; using System.Threading.Tasks;
private static void Print(int xStep) {
  Console.WriteLine(DateTime.Now.ToString("HH:mm:ss") + " step " + xStep + " , thread " + Thread.CurrentThread.ManagedThreadId);
} //

static async void AsyncCalls1() {
  Print(1);
  int i = await Task.Run<int>(() => {
    Print(2);
    Thread.Sleep(5000);
    Print(3);
    return 0;
  });
  Print(4); // same thread as in step 3
  Console.ReadLine();
  // return void
} //
example output:
19:09:36 step 1 , thread 9
19:09:36 step 2 , thread 10
19:09:41 step 3 , thread 10
19:09:41 step 4 , thread 10
The above code is a warm-up for us. The method AsyncCalls1() returns void. I emphasize this seemingly insignificant fact here. If it returned a Task instead, a caller that wants to await the result would have to be marked async as well, because the compiler only accepts await inside async methods. But then the same applies to the calling method that called the calling method. It would be an endless game until you arrive at Main(). And there you would not know what to do, because C# 5 does not allow async on Main() (this only became possible in C# 7.1). Novices can get quite frustrated by such a minuscule glitch. The price of async void is that the caller can neither await the method nor catch its exceptions, so keep it for top-level methods like this one.
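What is the program doing? It starts a new task, which runs on another thread from the thread pool. The original thread returns to the caller at the await; there is no follow-up on it. Now check this out: when the created task ends, the program continues on the same thread that the task was using, much like a Task.ContinueWith() continuation. No switch back to the original thread takes place. This holds for a console application, which has no synchronization context. In a WinForms or WPF application the code after await would resume on the UI thread.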
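Coming back to the Main() problem: if you do want a return value at the top level in C# 5, one common way out is to block on the task exactly once in Main(). The following is a sketch under that assumption. ComputeAsync() and the class name are hypothetical, and the sleep stands in for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncMainSketch
{
    // Hypothetical async method that returns a value.
    public static async Task<int> ComputeAsync()
    {
        int lValue = await Task.Run(() => { Thread.Sleep(500); return 7; });
        return lValue * 2;
    }

    public static void Main()
    {
        // Main() is not async, so it waits for the task once at the top level
        int lResult = ComputeAsync().GetAwaiter().GetResult();
        Console.WriteLine(lResult);   // prints 14
    }
}
```

Blocking like this is fine in a console program. I would not copy it into a UI event handler, where waiting on the UI thread can lead to a deadlock.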
static async void AsyncCalls2() {
  Print(1);
  Task<int> task = AsyncCalls3();
  Print(4);
  int x = await task;
  Print(7); // same thread as in step 6
  Console.ReadLine();
  // return void
} //

static async Task<int> AsyncCalls3() {
  Print(2);
  int i = await Task.Run<int>(() => {
    Print(3);
    Thread.Sleep(5000);
    Print(5);
    return 0;
  });
  Print(6); // same thread as in step 5
  return i; // returning an INTEGER !!! (the declared return type is Task<int>)
} //
example output:
19:10:16 step 1 , thread 9
19:10:16 step 2 , thread 9
19:10:16 step 3 , thread 10
19:10:16 step 4 , thread 9
19:10:21 step 5 , thread 10
19:10:21 step 6 , thread 10
19:10:21 step 7 , thread 10
Method AsyncCalls3() has a return value, which is a Task<int>. The task that is started inside this method returns an integer. Task.Run() does return a Task<int>, as its definition says, and yet the variable i is a plain int. It is the await that makes the difference: it waits for the task and unwraps its result, the integer value 0. await has been implemented to shorten code, and this is what it does. The code is more legible.
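Method AsyncCalls2() calls AsyncCalls3() and immediately receives a Task<int>, not an integer. This is why step 4 is printed while the inner task is still sleeping. The integer only arrives with await task. The opposite happens inside AsyncCalls3(): return i hands back a plain int although the declared return type is Task<int>. This is caused by the async keyword, which makes the compiler wrap the value in a task.
AsyncCalls2() itself returns void, for the same reason as AsyncCalls1(). It can nevertheless await the result of AsyncCalls3(), because AsyncCalls2() itself uses the async keyword in its method definition.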
Check the program sequence. I numbered the steps to make it easy to follow. Then analyse the thread switching. There is a switch between steps 2 and 3, where Task.Run() moves the work to a thread-pool thread, but none between 5, 6 and 7. This is the same behavior as in the first example code.
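The wrapping and unwrapping of the integer is easier to see in a stripped-down sketch without the step numbers. GetNumberAsync(), CallerAsync() and the class name are hypothetical, and Task.Delay() stands in for real work.

```csharp
using System;
using System.Threading.Tasks;

public static class UnwrapSketch
{
    // Declared as Task<int>, but the body returns a plain int:
    // the async keyword lets the compiler wrap the value in a task.
    public static async Task<int> GetNumberAsync()
    {
        await Task.Delay(200);   // stands in for real work
        return 5;
    }

    public static async Task<string> CallerAsync()
    {
        Task<int> lTask = GetNumberAsync();   // comes back at once with a task
        int lValue = await lTask;             // await unwraps the int
        return "value " + lValue;
    }

    public static void Main()
    {
        Console.WriteLine(CallerAsync().Result);   // prints "value 5"
    }
}
```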
public static async void AsyncCalls4() {
  Print(1);
  string s = await AsyncCalls5();
  Print(4);
  Console.ReadLine();
  // return void
} //

// using System.Net.Http;
public static async Task<string> AsyncCalls5() {
  using (HttpClient lClient = new HttpClient()) {
    Print(2);
    string lResult = await lClient.GetStringAsync("http://www.microsoft.com");
    Print(3);
    return lResult;
  }
} //
example output:
19:11:47 step 1 , thread 10
19:11:47 step 2 , thread 10
19:11:48 step 3 , thread 14
19:11:48 step 4 , thread 14
When searching for async and await on the web you will find the emphasis on I/O. Most example programs concentrate on this and don’t explain what is roughly going on inside the async I/O method itself. Basically, .NET async I/O methods deal with tasks, and the code after await is hooked up in a way similar to Task.ContinueWith(). This is why I concentrated on different examples that can be used in any context (even though they are not very meaningful). The internet download example is more or less a classical one. You can use await on many I/O methods. Keep in mind that AsyncCalls4() returns void, and that you cannot await AsyncCalls5() directly in the Main() method, because in C# 5 you would have to add async to it.
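File access works the same way as the download. Here is a small sketch using StreamReader.ReadToEndAsync(), assuming a temporary file is acceptable for the demo. ReadAllAsync() and the class name are hypothetical.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class FileAsyncSketch
{
    // Hypothetical helper: reads a whole text file without blocking the calling thread.
    public static async Task<string> ReadAllAsync(string xPath)
    {
        using (StreamReader lReader = new StreamReader(xPath))
        {
            return await lReader.ReadToEndAsync();
        }
    }

    public static void Main()
    {
        string lPath = Path.GetTempFileName();      // temporary file for the demo
        File.WriteAllText(lPath, "hello async");
        Console.WriteLine(ReadAllAsync(lPath).Result);   // prints "hello async"
        File.Delete(lPath);
    }
}
```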