NSScanner Tutorial: Parsing Data in Mac OS X

In these days of big data, data is stored in a multitude of formats, which poses a challenge to anyone trying to consolidate and make sense of it. If you’re lucky, the data will be in an organized, hierarchical format such as JSON, XML or CSV. If you’re not so lucky, the data is more freeform and unstructured and you may have to struggle with endless if/else cases or regular expressions.

You can also use automated parsers such as NSScanner to analyze string data in any form, from natural written languages to computer programming languages. In this NSScanner tutorial, you’ll learn about the parser included in Cocoa and how to use its powerful methods to extract information and manipulate strings in really neat ways. You’ll use what you learn to build an OS X application that works like Apple Mail’s interface, as shown below:

NSScanner final

Although you’ll be building an OS X application in this tutorial, NSScanner is available on both OS X and iOS. By the end of this tutorial, you’ll be ready to parse text on either platform. Let’s get started!

Getting Started

Download the starter project, extract the contents of the ZIP file and open NSScannerTutorial.xcodeproj in Xcode.

You’ll find three folders named MasterViewController, Custom Cell and comp.sys.mac.hardware. In the MasterViewController folder you’ll find a simple xib file with a TableView on the left that contains a custom cell with a bunch of labels, and a TextView on the right-hand side.

MasterViewController.m contains a pre-made structure that sets up the delegate/data source for a Table View. The Custom Cell folder contains PostCellView.h and PostCellView.m which form a subclass of NSTableCellView. The cell has all the properties that you need to set each individual data item.

As for the data to parse: the comp.sys.mac.hardware folder contains 49 data files for you to parse with your app; take a minute and browse through the data to see how it’s structured.

Note: The starter project uses table views to present the data; if you’re not familiar with table views in OS X, the How to Make a Simple Mac App on OS X tutorial provides some great background on the subject. You’ll find that NSTableView in OS X is quite similar to UITableView in iOS apps.

Build and run the project to see it in action; you’ll see the following appear:

NSScanner starter

The basic framework is there: on the left hand side, the table view currently has placeholder labels with the prefix [Field]Value. These labels will be replaced with parsed data.

Understanding the Structure of the Data

Before going straight into parsing, it’s important to understand what you’re trying to parse.

Below is a sample file of the 49 files you have to parse; you’ll be parsing the items outlined in red below:

sample structure

The set of parsed items includes the From, Subject, Date, Organization, Lines, and Message fields. Out of the six fields, you’ll do something extra special with the “From” and “Message” fields, as follows:

“From” Field
For the “From” field, you’ll split the email and the name. This is trickier than it looks, as the name may come before the email, or vice versa. The “From” field might not even have a name or email, or it might have one but not the other.

“Message” segment
For the message segment, you’ll see if a message contains anything cost related. You’ll search the message for prices such as $1000 or $1.00, as well as particular keywords in the message.

The keywords you’ll search for are: apple, macs, software, keyboard, printer, video, monitor, laser, scanner, disks, cost, price, floppy, card and phone.

Other Fields
For the other fields, you’ll simply separate the field from its value.

The values of the fields are delimited by colons. Also note that the data’s field text segment is separated from the message text segment by a blank line, that is, two consecutive newline characters.

First off, you’ll need two classes to parse and hold the data to be displayed.

Creating the Object to Hold the Data

Navigate to File\New\File… (or simply press Command+N). Select Mac OS > Cocoa and then Objective-C class and click Next. Set the class name to MacHardwarePost and the subclass to NSObject. Click Next and then Create.

Open MacHardwarePost.h and add the following properties and method prototype between @interface and @end:

//The extracted field values are stored in these properties.
@property (nonatomic, strong) NSString *fileName;
@property (nonatomic, strong) NSString *fromPerson;
@property (nonatomic, strong) NSString *email;
@property (nonatomic, strong) NSString *subject;
@property (nonatomic, strong) NSString *date;
@property (nonatomic, strong) NSString *organization;
@property (nonatomic, strong) NSString *message;
@property int lines; 
 
//Does this post have any money related information? E.g. $25, $50, $2000 etc.
@property (nonatomic, strong) NSString *costSearch;
 
//Contains a set of distinct keywords.
@property (nonatomic, strong) NSMutableSet *setOfKeywords;
 
- (NSString *) printKeywords;

printKeywords returns an instance of NSString that places all keywords in one single string separated by commas. Think of this like Java’s toString method.

Open MacHardwarePost.m and add the following code between @implementation and @end:

- (id)init {
  if (self = [super init]) {
    _setOfKeywords = [[NSMutableSet alloc] init]; //1
  }
  return self;
}

init creates the set that will hold the keywords. In line 1 above, _setOfKeywords, which is an instance of NSMutableSet, tracks all keywords found. You’re using NSMutableSet over NSMutableArray because a set discards duplicates automatically, and there’s no need to store the same keyword twice.

Still working in the same file, add the following code segment right after init:

- (NSString *) printKeywords
{
    NSMutableString *result = [[NSMutableString alloc] init]; //1
 
    NSUInteger i = 0; //2
    NSUInteger numberOfKeywords = [self.setOfKeywords count]; //3
 
    if (numberOfKeywords == 0) return @"No keywords found."; //4
 
    for (NSString *keyword in self.setOfKeywords) //5
    {
        //6
        [result appendFormat:(i != numberOfKeywords - 1) ? @" %@," : @" %@", keyword]; 
        i++;  //7
    }
    return result; 
}

Here’s what’s going on in the code above:

  1. Initialize an instance of NSMutableString named result, which is used to append the keywords together.
  2. Initialize the counter to 0.
  3. Obtain the number of keywords in the set.
  4. Check to see if the list is empty. If so, simply return a message.
  5. Loop over all keywords in self.setOfKeywords.
  6. Check if the counter i is equal to the last index in the list. If it is not, append a comma after the keyword; otherwise, don’t add a comma after the last word.
  7. Increment the counter to keep track of where you are in the list.
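If you’d rather not track a counter by hand, the same comma-separated string can be built with componentsJoinedByString:. The sketch below is only an illustration: JoinKeywords is a hypothetical helper that is not part of the starter project, and sorting the keywords is an extra assumption made here so the output order is stable.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the starter project): joins a set of
// keywords into one comma-separated string.
static NSString *JoinKeywords(NSSet *keywords) {
    if ([keywords count] == 0) {
        return @"No keywords found.";
    }
    // A set has no defined order, so sort first to get repeatable output.
    NSArray *sorted = [[keywords allObjects] sortedArrayUsingSelector:@selector(compare:)];
    return [sorted componentsJoinedByString:@", "];
}
```

Calling JoinKeywords with a set containing "printer" and "apple" gives you "apple, printer".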

You have finished implementing the MacHardwarePost object which will store the data you extract from the files. Now, on to creating the parser!

Creating the Data Parser

Navigate to File\New\File… (or simply press Command+N). Select Mac OS > Cocoa and then Objective-C class and click Next. Set the class name to MacHardwareDataParser and the subclass to NSObject. Click Next and then Create.

Open MacHardwareDataParser.h and add the following import before the @interface declaration:

#import "MacHardwarePost.h"

Next, add the following method prototypes between @interface and @end:

- (void)constructSelectorDictionary;
- (MacHardwarePost *)parseRawDataWithData:(NSData *)rawData;
- (id)initWithKeywords:(NSArray *)listOfKeywords fileName:(NSString *)fileName;

Now open MacHardwareDataParser.m and add the following code just before @implementation:

@interface MacHardwareDataParser ()
 
//Object that contains the fully extracted information.
@property (nonatomic, strong) MacHardwarePost *macHardwarePost; //1
 
//Stores selector methods that may be called by the parser.
@property (nonatomic, strong) NSDictionary *selectorDict; //2
 
//Contains the list of keywords to search
@property (nonatomic, strong) NSArray *listOfKeywords; //3
 
//Keeps track of the current file we are extracting information from
@property (nonatomic, strong) NSString *fileName; //4
 
@end

The properties between @interface and @end aren’t exposed to the caller of this class; they’re meant for private and/or internal methods and properties for the use of MacHardwareDataParser alone.

  1. The property macHardwarePost is where all the extracted field’s information will be stored. This property will be returned to the client using our parser once the parsing is complete.
  2. selectorDict is an NSDictionary whose keys are the fields you’re parsing and whose values are the names of the selector methods, stored as strings. It’s really important to have different methods for different tasks rather than doing everything in one method. Each selector method will be explained later on; check out this StackOverflow post for more information on selectors.
  3. listOfKeywords stores the list of keywords you will use to search the message portion for matching keywords.
  4. fileName stores the data file you are currently parsing. It’s generally a good idea to store the file name mainly for debugging purposes. If there is some error with the data you have just parsed, you can easily pinpoint and examine the file to see what the issue is.

Initializing your Parser

Open MacHardwareDataParser.m and add the following code between @implementation and @end:

#pragma mark - Initialization Phase
 
- (id)initWithKeywords:(NSArray *)listOfKeywords fileName:(NSString *)fileName {
    if ( self = [super init] )
    {
        [self constructSelectorDictionary];
        self.listOfKeywords = listOfKeywords;
        self.fileName = fileName;
    }
    return self;
}
 
// build scanner selectors
- (void)constructSelectorDictionary {
    self.selectorDict = @{
        @"From" : @"extractFromWithString:",
        @"Subject" : @"extractSubjectWithString:",
        @"Date" : @"extractDateWithString:",
        @"Organization" : @"extractOrganizationWithString:",
        @"Lines" : @"extractNumberOfLinesWithString:",
        @"Message" : @"extractMessageWithString:"
        };
}

initWithKeywords:fileName: is an object initializer; when you create a MacHardwareDataParser object, you will pass in a listOfKeywords to be searched when parsing the message. You also need to pass in the filename that you are extracting data from to keep track of what you are parsing.

Invoking constructSelectorDictionary creates an instance of NSDictionary initialized with six key/value pairs. Whenever the parser sees one of these keys, it looks up the selector name and invokes the corresponding method. For example, if you find the field “Subject”, the corresponding method extractSubjectWithString: will be called to extract the “Subject” field’s information.
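To see the lookup-and-dispatch idea on its own, here is a minimal sketch. FieldRecorder and dispatchField:value: are hypothetical names made up for this illustration, and the respondsToSelector: guard is an extra safety check that the tutorial’s parser does not perform.

```objc
#import <Foundation/Foundation.h>

// Hypothetical demo class; the names are illustrative only.
@interface FieldRecorder : NSObject
@property (nonatomic, copy) NSString *subject;
- (void)extractSubjectWithString:(NSString *)value;
- (BOOL)dispatchField:(NSString *)field value:(NSString *)value;
@end

@implementation FieldRecorder

- (void)extractSubjectWithString:(NSString *)value {
    self.subject = value;
}

// Looks up the selector name for a field and invokes it; returns NO for unknown fields.
- (BOOL)dispatchField:(NSString *)field value:(NSString *)value {
    NSDictionary *selectorNames = @{ @"Subject" : @"extractSubjectWithString:" };
    NSString *name = selectorNames[field];
    if (name == nil) {
        return NO;
    }
    SEL selector = NSSelectorFromString(name);
    // Assumption: guard against a typo in the dictionary before calling.
    if (![self respondsToSelector:selector]) {
        return NO;
    }
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Warc-performSelector-leaks"
    [self performSelector:selector withObject:value];
#pragma clang diagnostic pop
    return YES;
}

@end
```

Because the dictionary stores plain strings, adding support for a new field only means adding one entry and one method.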

Still working in the same file, add the following code after constructSelectorDictionary and before @end:

#pragma mark - Build Object Phase
 
// construct MacHardwarePost, and return object.
- (MacHardwarePost *)parseRawDataWithData:(NSData *)rawData // 1
{
    if (rawData == nil) return nil; // 2
 
    //Extracted information from raw data placed in MacHardwarePost fields.
    self.macHardwarePost = [[MacHardwarePost alloc] init]; // 3
 
    //Set the fileName within a MacHardwarePost object
    //to keep track of which file we extracted information from.
    self.macHardwarePost.fileName = self.fileName; // 4
 
    //Contains every field and message
    NSString *rawString = [[NSString alloc] initWithData:rawData encoding:NSUTF8StringEncoding]; // 5
 
    //Split Sections, so we deal with only fields, and then messages
    NSArray *stringSections = [rawString componentsSeparatedByString:@"\n\n"]; // 6
 
    if (stringSections == nil) // 7
    {
        return nil;
    }
 
    //Don't consider data that doesn't have a message, so stringSections must hold more than one element.
    if ([stringSections count] > 1) // 8
    {
        //Only need to extract the fields. (Located in the 0 index)
        NSString *rawFieldString = stringSections[0]; // 9
 
        //place extracted fields into macHardwarePost properties.
        [self extractFieldsWithString:rawFieldString]; // 10
 
        //Place contiguous message blocks back together in one string.
        NSString *message = [self combineContiguousMessagesWithArray:stringSections
          withRange:NSMakeRange(1, [stringSections count])]; // 11
 
        //Set macHardwarePost message field.
        [self extractMessageWithString:message]; // 12
 
        //Analyze the message for $money money, every amount searched we will record all the amounts
        // concatenate a string of $ e.g. $25, $60, $1250 in one whole string
        // Empty string if no amount of money was talked about.
        [self extractCostRelatedInformationWithMessage: message]; // 13
 
        //We are going to loop through the message string and look for the "keywords".
        [self extractKeywordsWithMessage: message]; // 14
    }
    return self.macHardwarePost; // 15
}

Taking each numbered comment in turn, you’ll find the following:

  1. parseRawDataWithData takes an instance of NSData as a parameter that contains your data. Once it has parsed all the fields and the message body, the method returns a MacHardwarePost object in line 15.
  2. Check to see if the data is nil before you begin parsing.
  3. Create a new MacHardwarePost object and initialize it as empty. You’ll set all the properties’ values once you start extracting information.
  4. Set the filename you’re working on for reference.
  5. Convert the NSData object into a raw string format.
  6. Separate the fields text segment from the message's text segment. The array could have a size larger than 2 since messages may also contain blank lines. componentsSeparatedByString: will split the message into segments wherever two newlines appear in a row; the diagram further down shows an example of this.
  7. Safety check to see if the array was actually created.
  8. Check to see if the array has more than one element. This lets you know there will be two or more components that include the fields and message sections.
  9. Store all the field text segments in rawFieldString.
  10. Pass rawFieldString into extractFieldsWithString to extract all the relevant fields and set properties appropriately in the MacHardwarePost object.
  11. Since you split the messages into multiple segments, you must combine the segments back together to parse cost related information and keywords.
  12. Pass the combined message into extractMessageWithString: to be set in the MacHardwarePost object.
  13. extractCostRelatedInformationWithMessage extracts and finds cost-related information.
  14. extractKeywordsWithMessage finds the keywords in the message.

Below is an example of how componentsSeparatedByString splits up the text segments:

splitting components

parseRawDataWithData is the first line of attack, to break up the incoming data into manageable chunks. This gives a clear outline of how the data is structured, and how it can be parsed step by step.
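As a point of comparison, you could also split only at the first blank line, which keeps the message body in one piece. This is a sketch of an alternative and not what the tutorial’s parser does; SplitHeaderAndBody is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns @[header, body], split at the first blank
// line, or nil when the input has no blank line at all.
static NSArray *SplitHeaderAndBody(NSString *raw) {
    if (raw == nil) {
        return nil;
    }
    NSRange blankLine = [raw rangeOfString:@"\n\n"];
    if (blankLine.location == NSNotFound) {
        return nil;
    }
    NSString *header = [raw substringToIndex:blankLine.location];
    // NSMaxRange gives the index just past the two newline characters.
    NSString *body = [raw substringFromIndex:NSMaxRange(blankLine)];
    return @[ header, body ];
}
```

With this approach any blank lines inside the message survive untouched, at the cost of one extra range calculation.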

Next you’ll see how the individual fields and messages are parsed — this is where the fun begins! :]

Parsing the Individual Fields

Consider, if you will, the following sample field text segment:
Fields segments

Here is where NSScanner comes in. You know that each field and its value is separated by the delimiter :. The image below gives a visual representation of how each section is split up:

FieldStructure

An NSScanner object interprets and converts the characters of an NSString object into number and string values. You assign the scanner’s string on creating it, and the scanner progresses through the characters of that string from beginning to end as you request items.
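Here is a small, self-contained sketch of that idea: a scanner walks through a string such as "Lines: 42", pulling out a string and then a number. ScanNameAndNumber is a hypothetical helper written for this illustration only.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: scans text shaped like "Lines: 42" and returns
// @{ @"name": ..., @"value": ... }, or nil if the text has a different shape.
static NSDictionary *ScanNameAndNumber(NSString *text) {
    NSScanner *scanner = [NSScanner scannerWithString:text];
    NSString *name = nil;
    int value = 0;

    // Each call advances the scanner only when it succeeds.
    BOOL ok = [scanner scanUpToString:@":" intoString:&name]  // reads "Lines"
           && [scanner scanString:@":" intoString:NULL]       // consumes ":"
           && [scanner scanInt:&value];                       // skips the space, reads the number
    return ok ? @{ @"name" : name, @"value" : @(value) } : nil;
}
```

Note that the scanner skips whitespace and newlines by default before each scan, which is why the space after the colon needs no special handling here.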

Open MacHardwareDataParser.m and add the following code just after parseRawDataWithData and before @end:

/*
 * extractFieldsWithString, extracts the necessary fields for a data set,
 * and places them in the mac hardware post object.
 */
- (void) extractFieldsWithString: (NSString *)rawString
{
    NSScanner *scanner = [NSScanner scannerWithString:rawString]; // 1
 
    //Delimiters
    NSCharacterSet *newLine = [NSCharacterSet newlineCharacterSet]; // 2
 
    NSString *currentLine = nil; // 3
    NSString *field = nil; // 4
    NSString *fieldInformation = nil; // 5
 
    [scanner setCharactersToBeSkipped:[NSCharacterSet characterSetWithCharactersInString:@":"]]; // 6
 
    while (![scanner isAtEnd]) // 7
    {
        //Obtain the field
        if([scanner scanUpToString:@":" intoString:&currentLine]) { // 8
 
            //The newline that ended the previous line is still at the front, because
            //setCharactersToBeSkipped: replaced the default whitespace/newline skip set with ":".
            field = [currentLine stringByTrimmingCharactersInSet: newLine]; // 9
        }
 
        //Obtain the value.
        if([scanner scanUpToCharactersFromSet:newLine intoString:&currentLine]) // 10
        {
            fieldInformation = currentLine; // 11
        }
 
        BOOL containsField = (self.selectorDict[field] != nil) ? YES : NO; // 12
 
        //Only parse the fields that are defined in the selectorDict.
        if (containsField) 
        {
          #pragma clang diagnostic push
          #pragma clang diagnostic ignored "-Warc-performSelector-leaks"
            [self performSelector:NSSelectorFromString(self.selectorDict[field])
              withObject:fieldInformation]; // 13
          #pragma clang diagnostic pop
 
        }
    }
}

Here is a comment-by-comment tour of the above code:

  1. scannerWithString: initializes the scanner with a given string and returns an NSScanner object.
  2. Create a newline "\n" NSCharacterSet object. This is used when you read each field/value pair one at a time.
  3. currentLine stores the current field/value pair string.
  4. Initialize field to be used to retrieve selector methods from selectorDict.
  5. Initialize fieldInformation, which holds the field’s value; it is passed as the argument to the selector method that extracts it.
  6. setCharactersToBeSkipped:, provided by NSScanner, defines the set of characters to be ignored when scanning for a value representation. Recall that a field and its value are separated by a colon ":"; the colon is ignored when extracting the value, so the returned string will not include it. Note that this replaces the scanner’s default skip set of whitespace and newlines.
  7. Loop while you haven’t exhausted all significant characters in the string.
  8. Scan up to the colon, which grabs the field segment like so:

     scanToColon

  9. After obtaining the field segment, invoke stringByTrimmingCharactersInSet: to remove the newline at the front of the string, which is left over from the end of the previous line. Later on you’ll need to retrieve the selector using the field as a key to the dictionary selectorDict.
  10. Scan up to the newline character to grab the field’s information like so:

     scantonewline

  11. Store the data in fieldInformation. Because whitespace is no longer skipped, the value keeps the space that follows the colon.
  12. Check to see whether the field exists in selectorDict.
  13. If the field is in selectorDict, execute the method by invoking performSelector:withObject:. This line is wrapped in pragma directives simply to silence a compiler warning: the selector is built from a string, so it is unknown at compile time.
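If you want to experiment with field/value scanning outside the project, the sketch below collects every header field into a dictionary. It is a variation on the approach above and rests on a few assumptions: ParseHeaderFields is a hypothetical helper, it scans one line at a time, and it keeps the scanner’s default skip set so no leading newline or space ends up in the results.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turns "Name: value" lines into a dictionary.
// Lines without a colon are ignored.
static NSDictionary *ParseHeaderFields(NSString *header) {
    NSMutableDictionary *fields = [NSMutableDictionary dictionary];
    NSCharacterSet *newline = [NSCharacterSet newlineCharacterSet];

    for (NSString *line in [header componentsSeparatedByCharactersInSet:newline]) {
        NSScanner *scanner = [NSScanner scannerWithString:line];
        NSString *name = nil;
        NSString *value = nil;

        // Read the field name, then step over the colon that ends it.
        if ([scanner scanUpToString:@":" intoString:&name] &&
            [scanner scanString:@":" intoString:NULL]) {
            // The default skip set drops the space after the colon; the rest
            // of the line (which may contain more colons) is the value.
            [scanner scanUpToCharactersFromSet:newline intoString:&value];
            fields[name] = (value != nil) ? value : @"";
        }
    }
    return fields;
}
```

Scanning line by line means a value such as a date with colons in it stays intact, since only the first colon on each line is treated as the delimiter.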

Creating the Selector Methods

Recall that your selector dictionary is constructed as follows:

        @"From" : @"extractFromWithString:",
        @"Subject" : @"extractSubjectWithString:",
        @"Date" : @"extractDateWithString:",
        @"Organization" : @"extractOrganizationWithString:",
        @"Lines" : @"extractNumberOfLinesWithString:",
        @"Message" : @"extractMessageWithString:"

Now that you have the field and the field’s information, you also have the corresponding method executing automatically to perform the data extraction. You now need to implement the six methods that will be called to extract each field’s value.

Open MacHardwareDataParser.m and add the following code after extractFieldsWithString and before @end:

//Extracts the subject field's value, and update post object.
- (void)extractSubjectWithString: (NSString *)rawString
{
    self.macHardwarePost.subject = rawString;
}
 
//Place date string into date property.
- (void)extractDateWithString: (NSString *)rawString
{
    self.macHardwarePost.date = rawString;
}
 
//Place the organization field value into organization property.
- (void)extractOrganizationWithString: (NSString *)rawString
{
    self.macHardwarePost.organization = rawString;
}
 
 
//Teaches you how to extract an entire message.
- (void)extractMessageWithString: (NSString *)rawString
{
    self.macHardwarePost.message = rawString;
 
}

The methods above simply place the field information you extracted into the MacHardwarePost object.

Still working in the same file, add the following code immediately after extractMessageWithString::

//Teaches you how to extract a number.
- (void)extractNumberOfLinesWithString:(NSString *)rawString
{
    int numberOfLines = 0; //stays 0 if scanInt: finds no number
    NSScanner *scanner = [[NSScanner alloc] initWithString:rawString];
 
    //scans the string for an int value.
    [scanner scanInt:&numberOfLines];
    self.macHardwarePost.lines = numberOfLines;
}

For extractNumberOfLinesWithString:, you initialize an NSScanner with the string that contains the number of lines. You then invoke scanInt:, which scans for an int value from a decimal representation and returns the value found by reference.

Note: NSScanner has various other methods you can explore at your leisure:

  • scanDecimal:
  • scanFloat:
  • scanHexDouble:
  • scanHexFloat:
  • scanHexInt:
  • scanHexLongLong:
  • scanInteger:
  • scanInt:
  • scanLongLong:
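A quick sketch of a few of these in action follows. ScanSampleNumbers is a hypothetical function and the sample strings are made up for illustration; scanDouble: is not in the list above but belongs to the same family of NSScanner methods.

```objc
#import <Foundation/Foundation.h>

// Hypothetical demo: scans three made-up strings and returns the numbers
// found as @[hex, price, count].
static NSArray *ScanSampleNumbers(void) {
    unsigned int hexValue = 0;
    double price = 0.0;
    NSInteger count = 0;

    // scanHexInt: accepts an optional "0x" prefix.
    [[NSScanner scannerWithString:@"0x1F"] scanHexInt:&hexValue];
    // Scanning stops at the first character that cannot be part of the number.
    [[NSScanner scannerWithString:@"19.95 USD"] scanDouble:&price];
    // Leading whitespace is skipped by default; a minus sign is accepted.
    [[NSScanner scannerWithString:@"  -12 items"] scanInteger:&count];

    return @[ @(hexValue), @(price), @(count) ];
}
```

Each of these methods returns a BOOL telling you whether a number was found, so in real code it is worth checking that result before trusting the output variable.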

Okay folks, brace yourselves: you’re getting deep into the guts of NSScanner and regular expressions. The first bit to parse is the “From” field.

Here you can combine your regular expression skills from the NSRegularExpression Tutorial on this site with your mad NSScanner skills. Regular expressions are a great way to establish string-splitting patterns.

Still working in the same file, add the following code after extractNumberOfLinesWithString: and before @end:

- (void)extractFromWithString: (NSString *)rawString
{
 
    //An advantage of regular expressions could be used here.
    //http://www.raywenderlich.com/30288/
    //Based on the cases stated, we need to establish some form of pattern in order to split the strings up.
 
    NSString *someRegexp = @".*[\\s]*\\({1}(.*)"; //1
    // ROGOSCHP@MAX.CC.Uregina.CA (Are we having Fun yet ???)
    // oelt0002@student.tc.umn.edu (Bret Oeltjen)
    // (iisi owner)
    // mbuntan@staff.tc.umn.edu ()
    // barry.davis@hal9k.ann-arbor.mi.us (Barry Davis)
 
 
    NSString *someRegexp2 = @".*[\\s]*<{1}(.*)"; //2
    // "Jonathan L. Hutchison" <jh6r+@andrew.cmu.edu>
    // <BR4416A@auvm.american.edu>
    // Thomas Kephart <kephart@snowhite.eeap.cwru.edu>
    // Alexander Samuel McDiarmid <am2o+@andrew.cmu.edu>
 
    // Special case:
    // Mel_Shear@maccomw.uucp
    // vng@iscs.nus.sg
 
    NSPredicate *fromPatternTest1 = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", someRegexp]; //3
    NSPredicate *fromPatternTest2 = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", someRegexp2]; 
 
    //Run through the patterns
 
    //Format: Email (Name)
    if ([fromPatternTest1 evaluateWithObject: rawString]) //4
    {
        [self extractFromParenthesesWithString: rawString];
    }
    //Format: Name <Email> || <Email>
    else if ([fromPatternTest2 evaluateWithObject: rawString]) //5
    {
        [self extractFromArrowWithString: rawString]; 
    }
    //Format: Email
    else
    {
        [self extractFromEmailWithString: rawString]; //6
    }
}

After examining the 49 data sets, you end up with three cases to consider:

  • The first case: Email (Name)
  • The second case: Name <Email>, or just <Email>
  • The third case: Email with no Name

Here’s a step-by-step explanation of the above code:

  1. The first regular expression finds a pattern matching the first case. It checks for zero or more occurrences of any character, followed by zero or more whitespace characters, followed by one open parenthesis "(" and finally zero or more occurrences of any character.
  2. The second regular expression finds a pattern matching the second case. It checks for zero or more occurrences of any character, followed by zero or more whitespace characters, followed by one occurrence of an open angle bracket "<" and finally zero or more occurrences of any character.
  3. Create an NSPredicate object that defines logical conditions used to constrain a search. The MATCHES operator uses regular expression matching and must match the entire string. You can read more about NSPredicate in the official Apple documentation.
  4. First you check if the field’s information is of the pattern Email (Name). If true, then pass it into extractFromParenthesesWithString: which extracts the Email and the Name.
  5. If the first pattern doesn’t match, check for Name <Email> or just <Email> without the Name. If you find a match, pass it into extractFromArrowWithString: which extracts the Email and/or Name.
  6. Finally, if neither of the first two patterns matched, this is the case where you only have an email. In this case, pass the string into extractFromEmailWithString:.

Note: Check out debuggex; it’s a really cool way to check if a string matches the regular expression you created!
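To try the three-way classification in isolation, here is a minimal sketch. FromFormat and ClassifyFromField are hypothetical names, and the two patterns are simplified stand-ins for the tutorial’s regular expressions that should accept the same strings.

```objc
#import <Foundation/Foundation.h>

// Hypothetical enum describing the three "From" layouts.
typedef NS_ENUM(NSInteger, FromFormat) {
    FromFormatParentheses,   // Email (Name)
    FromFormatAngleBrackets, // Name <Email> or <Email>
    FromFormatEmailOnly      // Email
};

// Classifies a "From" value using NSPredicate's MATCHES operator.
static FromFormat ClassifyFromField(NSString *value) {
    // Simplified patterns (an assumption): anything, the bracket, anything.
    NSPredicate *parentheses =
        [NSPredicate predicateWithFormat:@"SELF MATCHES %@", @".*\\(.*"];
    NSPredicate *angleBrackets =
        [NSPredicate predicateWithFormat:@"SELF MATCHES %@", @".*<.*"];

    if ([parentheses evaluateWithObject:value]) {
        return FromFormatParentheses;
    }
    if ([angleBrackets evaluateWithObject:value]) {
        return FromFormatAngleBrackets;
    }
    return FromFormatEmailOnly;
}
```

The order of the checks matters: a string is tested for parentheses first, exactly as in the method above.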

Still working in the same file, add the following code after extractFromWithString and before @end:

#pragma mark - extractFromWithString helper methods
 
//Extract the email, when the pattern is Format: email (No name specified)
- (void)extractFromEmailWithString:(NSString *)rawString {
    self.macHardwarePost.email = rawString;
    self.macHardwarePost.fromPerson = @"unknown";
 
}

extractFromEmailWithString: handles the special case where you don’t match on pattern 1 or pattern 2; this is the case that only has the email but no name. In this case you just set the MacHardwarePost object’s email and set the name of the person to “unknown”.

Add the following code after extractFromEmailWithString and before @end:

//Extract the name of the person and email, when the pattern is Format: Name <Email>
- (void)extractFromArrowWithString:(NSString *)rawString {
    NSScanner *scanner = [NSScanner scannerWithString:rawString]; //1
 
    NSString *resultString = nil; //2
 
    [scanner setCharactersToBeSkipped:[NSCharacterSet characterSetWithCharactersInString:@"<>"]]; //3
 
    while (![scanner isAtEnd]) //4
    {
        [scanner scanUpToString:@"<" intoString:&resultString]; //5
        self.macHardwarePost.fromPerson = resultString; //6
        [scanner scanUpToString:@">" intoString:&resultString]; //7
        self.macHardwarePost.email = resultString; //8
    }
}

Here is a step-by-step explanation of the code above:

  1. Create an instance of NSScanner that scans the given string with the pattern Name <Email>.
  2. Initialize resultString; the extracted name and email will be placed in this string.
  3. Set "<" and ">" to be ignored when scanning for a value representation.
  4. Loop through the scanner until you reach the end.
  5. Scan up to the first "<". This cuts off everything following, leaving only the Name. The diagram below illustrates this in detail:

     from_example1

  6. Set the fromPerson field in MacHardwarePost.
  7. Scan up to ">" which will give you the email. Since you ignored "<" and ">" in line 3, neither bracket ends up in the result, like so:

     from_example2

  8. Set the email field of MacHardwarePost.
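The following standalone sketch shows one way to split a Name <Email> string while also tidying the name. It differs from the method above in ways that are assumptions of this example: SplitNameAndEmail is a hypothetical helper, it consumes the brackets explicitly with scanString: instead of skipping them, and it trims spaces and double quotes from the name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns @{ @"name": ..., @"email": ... } for input
// shaped like  Name <Email>  or  <Email>, and nil for anything else.
static NSDictionary *SplitNameAndEmail(NSString *from) {
    NSScanner *scanner = [NSScanner scannerWithString:from];
    [scanner setCharactersToBeSkipped:nil]; // skip nothing implicitly

    NSString *name = nil;
    NSString *email = nil;

    // The name is optional, so a failed scan here is fine.
    [scanner scanUpToString:@"<" intoString:&name];
    if (![scanner scanString:@"<" intoString:NULL]) {
        return nil; // no opening bracket at all
    }
    if (![scanner scanUpToString:@">" intoString:&email]) {
        return nil; // nothing between the brackets
    }

    // Trim surrounding spaces and double quotes from the name.
    NSMutableCharacterSet *trimSet = [NSMutableCharacterSet whitespaceCharacterSet];
    [trimSet addCharactersInString:@"\""];
    NSString *cleanName = [name stringByTrimmingCharactersInSet:trimSet];
    if ([cleanName length] == 0) {
        cleanName = @"unknown";
    }
    return @{ @"name" : cleanName, @"email" : email };
}
```

Falling back to "unknown" for a missing name mirrors what extractFromEmailWithString: does for the email-only case.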

You’re not done yet! Add the following code after extractFromArrowWithString: and before @end:

//Extract the name of the person and email, when the pattern is Format: Email (Name)
- (void)extractFromParenthesesWithString:(NSString *)rawString
{
    NSScanner *scanner = [NSScanner scannerWithString:rawString]; 
 
    NSString *resultString = nil;
 
    [scanner setCharactersToBeSkipped:[NSCharacterSet characterSetWithCharactersInString:@"()"]];
 
    while (![scanner isAtEnd])
    {
        [scanner scanUpToString:@"(" intoString:&resultString];
        self.macHardwarePost.email = resultString;
        [scanner scanUpToString:@")" intoString:&resultString];
        self.macHardwarePost.fromPerson = resultString;
    }
}

This is essentially the same as extractFromArrowWithString, except this method deals with parentheses.

Add the following code after extractFromParenthesesWithString and before @end:

#pragma mark- Utilities
 
- (NSString *)combineContiguousMessagesWithArray:(NSArray *)array withRange:(NSRange)range
{
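    NSMutableString *resultString = [[NSMutableString alloc] init]; //1
 
    //The caller passes the array count as range.length, so it acts as the end index here.
    for (NSUInteger i = range.location; i < range.length && i < [array count]; i++) //2
    {
        //componentsSeparatedByString: removed the blank lines, so put one back
        //between segments to keep words in adjacent paragraphs apart.
        if (i > range.location) [resultString appendString:@"\n\n"];
        [resultString appendString: array[i] ]; //3
    }
 
    return [NSString stringWithString:resultString]; //4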
}

Think back to the diagram showing how to split the text:

splitting components

You had to split the text segment with field-related information from the message segment with portions of the messages — now you need to recombine the message portion into one instead of multiple segments.

Here is a step-by-step explanation of the above code:

  1. You first create a new NSMutableString so you can edit the string whenever you try to combine a portion of text.
  2. Given the range of the message portion, start from index 1 (index 0 is the field portion) and loop to array length - 1. You’ll loop through each index containing the message portion.
  3. Get the current index’s message portion and append it to the end of the resultString.
  4. Finally, return the combined text.
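For comparison, Foundation can do the same recombination without an explicit loop. The sketch below assumes a hypothetical helper named JoinMessageSections and joins the segments with the blank line that the split removed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: joins every section after index 0 (the fields)
// back into one message string.
static NSString *JoinMessageSections(NSArray *sections) {
    if ([sections count] < 2) {
        return @""; // no message portion present
    }
    // Location 1 skips the fields; the length covers the remaining elements.
    NSRange messageRange = NSMakeRange(1, [sections count] - 1);
    return [[sections subarrayWithRange:messageRange] componentsJoinedByString:@"\n\n"];
}
```

Here the NSRange is used in its conventional sense, a start index plus a count of elements, which is what subarrayWithRange: expects.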

Now that you have the message portion in one place, you can start parsing the message for some useful information.

Parsing the Message

Your keyword search strategy is to look at every word and check whether it appears in your list of keywords. If so, add it to the MacHardwarePost setOfKeywords, which stores all keywords found in this message.

Add the following code to the end of MacHardwareDataParser.m, just before @end:

//Extract keywords from the message.
- (void)extractKeywordsWithMessage:(NSString *)rawString {
    NSScanner *scanner = [NSScanner scannerWithString:rawString]; //1
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet]; //2 (newlines separate words too)
 
    NSString *keyword = @""; //3
 
    while (![scanner isAtEnd]) //4
    {
        [scanner scanUpToCharactersFromSet:whitespace intoString:&keyword]; //5
        NSString *lowercaseKeyword = [keyword lowercaseString]; //6
 
        if([self.listOfKeywords containsObject: lowercaseKeyword]) //7
        {
            [self.macHardwarePost.setOfKeywords addObject:lowercaseKeyword]; //8
        }
    }
}

The above code is fairly straightforward:

  1. scannerWithString: initializes the scanner with a given string (in this case your message) and returns an instance of NSScanner.
  2. Next you create an NSCharacterSet for whitespace; this lets you scan up to the next set of characters separated by whitespace, as shown below:

     NSScanner_whitespaces

  3. Initialize the keyword string; this is used to store the found keyword.
  4. Loop until you’re at the end of the text.
  5. Scan up to a whitespace and store the result in the keyword string.
  6. Convert the keyword into lowercase as you want to ignore capital letters.
  7. Check if the keyword exists in listOfKeywords.
  8. If the keyword exists, add the keyword to MacHardwarePost‘s setOfKeywords.
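One limitation of splitting on whitespace is that a word followed by punctuation, such as "printer," will not match. The sketch below shows one possible variation, not the tutorial’s method: FindKeywords is a hypothetical helper that tells the scanner to skip everything that is not a letter, so only runs of letters are returned as words.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the subset of keywords that occur in message,
// ignoring case and any punctuation around the words.
static NSSet *FindKeywords(NSString *message, NSArray *keywords) {
    NSCharacterSet *letters = [NSCharacterSet letterCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:message];
    // Skip every non-letter (spaces, newlines, punctuation, digits).
    [scanner setCharactersToBeSkipped:[letters invertedSet]];

    NSSet *wanted = [NSSet setWithArray:keywords];
    NSMutableSet *found = [NSMutableSet set];
    NSString *word = nil;

    // scanCharactersFromSet: returns NO once no letters remain, ending the loop.
    while ([scanner scanCharactersFromSet:letters intoString:&word]) {
        NSString *lowercaseWord = [word lowercaseString];
        if ([wanted containsObject:lowercaseWord]) {
            [found addObject:lowercaseWord];
        }
    }
    return found;
}
```

Converting the keyword array to an NSSet up front also makes each lookup cheaper than a linear search through the array.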

Extracting Cost-Related Information

To search for cost related information, use NSScanner to search each word separated by a whitespace. This is similar to the keywords strategy, but instead you’re now searching for an occurrence of a dollar sign ($).

Add the following method code to the end of MacHardwareDataParser.m, just before @end:

// Extract amount of cost if the message contains "$" symbol.
- (void)extractCostRelatedInformationWithMessage:(NSString *)rawString
{
    NSScanner *scanner = [NSScanner scannerWithString:rawString]; //1
    NSMutableString *costResultString = [[NSMutableString alloc] init]; //2
 
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet]; //3
    NSCharacterSet *dollarCharacter = [NSCharacterSet characterSetWithCharactersInString:@"$"]; //4
 
    NSString *dollarFound = nil;
    float dollarCost = 0.0f;
 
    while (![scanner isAtEnd]) //5
    {
        //Have the scanner find the first $ token
        if (![scanner scanUpToCharactersFromSet:dollarCharacter intoString:nil]) //6
        {
            [scanner scanUpToCharactersFromSet:whitespace intoString:&dollarFound]; //7
 
            NSScanner *integerScanner = [NSScanner scannerWithString:dollarFound]; //8
            [integerScanner scanString: @"$" intoString:nil]; //9
            dollarCost = 0.0f; //Reset so a failed scan can't reuse the previous amount
            [integerScanner scanFloat: &dollarCost]; //10
 
            if (dollarCost != 0.0f) //11
            {
                [costResultString appendFormat:@"$%.2f ", dollarCost];
            }
        }
    }
    self.macHardwarePost.costSearch = costResultString; //12
}

Here’s what’s going on in the code above:

  1. Again, scannerWithString: initializes the scanner with a given string — in this case your message — and returns an instance of NSScanner.
  2. Create an NSMutableString so you can append all the cost-related information into a single string.
  3. Create a whitespace NSCharacterSet so you can jump to the next word after analyzing the previous one.
  4. Create a dollarCharacter NSCharacterSet so you can scan up to a string that starts with a $ symbol.
  5. Continue to loop until you reach the end of the message.
  6. scanUpToCharactersFromSet:intoString: scans forward until it finds a $ symbol. It returns NO when it scans no characters, which happens when the scanner is already sitting on a $, so the negated check tells you that a $ comes next.
  7. Once you find a $ symbol, scan up to the next whitespace to give you the cost-related portion.
  8. Create a separate NSScanner to scan the cost-related string.
  9. The NSScanner scans past the $ symbol, leaving you with only the amount.
  10. NSScanner scans the cost-related string for a float value; if you find one, store it in dollarCost.
  11. scanFloat: returns NO when it doesn’t find a number, so dollarCost won’t hold a new amount in that case; check dollarCost to see if you actually found a valid, non-zero amount. If so, append it to the costResultString.
  12. Once the scanner reaches the end, set the MacHardwarePost costSearch field to the cost-related information extracted from the message.
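
As a point of comparison, here is a minimal sketch of the same idea that relies on the BOOL return value of scanFloat: instead of inspecting the scanned number. DollarAmounts is a hypothetical helper, not part of the tutorial project, and it returns the amounts as NSNumber objects rather than building a display string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the tutorial project): collects every
// "$<number>" amount found in `text`.
NSArray<NSNumber *> *DollarAmounts(NSString *text)
{
    NSMutableArray<NSNumber *> *amounts = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString:text];

    while (![scanner isAtEnd])
    {
        // Move ahead to the next "$"; the return value doesn't matter here.
        [scanner scanUpToString:@"$" intoString:NULL];

        // Consume the "$" itself. If there is none, the end was reached.
        if (![scanner scanString:@"$" intoString:NULL])
        {
            break;
        }

        float amount = 0.0f;

        // scanFloat: reports success through its return value, so there is
        // no need to guess from the value whether a number was found.
        if ([scanner scanFloat:&amount])
        {
            [amounts addObject:@(amount)];
        }
    }
    return amounts;
}
```

Calling DollarAmounts(@"Paid $250 for it, was $99.5 new") should give you an array holding 250 and 99.5. Like the method above, this sketch stops at a thousands separator, so "$1,200" is read as 1.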

There you have it — your parser is finally complete. Time to put your parser to good use and start extracting information from the 49 data files.

Connecting Your Parser with the 49 Data Files

The last things to do are to run all 49 files through your parser to create the MacHardwarePost objects, pass these objects into your masterViewController, and set up the delegate and data source so the table view can display the results.

Open AppDelegate.h and replace the code between @interface and @end with the following code:

@property (assign) IBOutlet NSWindow *window;
 
//Stores a reference to the data set's file path. E.g. /Users/userName/Documents/comp.sys.mac.hardware
@property (nonatomic, strong) NSString *dataSetFilePath;
 
//Stores an array of all data file names. E.g. 50419, 50420, 50421, ...
@property (nonatomic, strong) NSArray *listOfDataFileNames;
 
@property (nonatomic, strong) NSMutableArray *listOfPost;
 
@property (nonatomic, strong) IBOutlet MasterViewController *masterViewController;

Here’s an explanation of what these properties will be used for:

  • dataSetFilePath stores the path to the 49 data files so you can easily obtain each individual file to be parsed.
  • listOfDataFileNames stores all 49 data file names in an array; each file name will be appended to dataSetFilePath to get an individual file.
  • listOfPost stores all MacHardwarePost objects once you’re done parsing.
  • masterViewController contains the TableView and TextView for your app.

Open AppDelegate.m and add the following code just before @implementation:

#import "MacHardwareDataParser.h"
#import "MacHardwarePost.h"

You’ll need to include these imports to reference those classes in the next bit.

Still in the same file, replace applicationDidFinishLaunching with the following code:

- (void)applicationDidFinishLaunching:(NSNotification *)aNotification {
    NSError *error = nil; 
 
    //Obtain the file path to the resource folder.
    NSString *folderPath = [[NSBundle mainBundle] resourcePath]; //1
 
    //Get all the fileNames from the resource folder.
    self.listOfDataFileNames = [[NSFileManager defaultManager] contentsOfDirectoryAtPath:folderPath error:&error]; //2
 
    //Keywords we are passing into the scanner to check if a message contains one or more of these words.
    NSArray *keywords = @[ @"apple", @"macs", @"software", @"keyboard", @"printers",
                           @"printer", @"video", @"monitor", @"laser",
                           @"scanner", @"disks", @"cost", @"price",
                           @"floppy", @"card", @"phone" ]; //3
 
    self.listOfPost = [[NSMutableArray alloc] init]; //4
 
    //Loops through the list of data files, and starts scanning and parsing them and converts them
    //to MacHardwarePost objects.
    for (NSString *fileName in self.listOfDataFileNames) //5
    {
        //ignore system files, fileName we are interested in are numbers.
        if ([fileName intValue] == 0) continue; //6
 
        NSString *path = [folderPath stringByAppendingPathComponent:fileName]; //7
        NSData *data = [NSData dataWithContentsOfFile: path]; //8
 
        MacHardwareDataParser *parser = [[MacHardwareDataParser alloc]
          initWithKeywords:keywords fileName:fileName]; //9
 
        MacHardwarePost *post = [parser parseRawDataWithData:data];//10
 
        if (post != nil)
        {
            [self.listOfPost addObject:post];//11
        }
    }
 
    //Create the masterViewController
    self.masterViewController = [[MasterViewController alloc]
      initWithNibName:@"MasterViewController" bundle:nil]; //12
 
    self.masterViewController.listOfMacHardwarePost = self.listOfPost;//13
 
    //Add the view controller to the Window's content view
    [self.window.contentView addSubview:self.masterViewController.view];
    self.masterViewController.view.frame = ((NSView*)self.window.contentView).bounds;
}

Taking each numbered comment in turn, you’ll find the following:

  1. First obtain the resource folder’s path, which is where all 49 files are located.
  2. Get all the file names within the resource folder by calling contentsOfDirectoryAtPath:error:, which returns an array of names that can refer to either files or directories.
  3. Initialize an instance of NSArray named keywords and set it up with all the keywords to search for in each message.
  4. Initialize an instance of NSMutableArray called listOfPost to store all the MacHardwarePost objects.
  5. Loop through the list of files within the resource directory.
  6. Check each fileName to see if it’s an integer, as all 49 of the data file names are integers. intValue returns 0 for a name that doesn’t start with a number, so in that case skip ahead to the next file.
  7. Append the file name to the resource path to obtain the full path to the file.
  8. Get the individual data file in the form of an NSData object using the data file path.
  9. Create an instance of MacHardwareDataParser and pass in the keywords to search for and the fileName of the data file to parse.
  10. Pass the data file into your parser’s parseRawDataWithData:, which extracts all the important data. Once complete, the method returns a MacHardwarePost object ready to use.
  11. Add the object to the list of MacHardwarePost objects if the parsing was successful.
  12. Once you’re done parsing all 49 data files, create the masterViewController.
  13. Pass the list of MacHardwarePost objects into your masterViewController which will be used later to set your data source.
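
Steps 5 through 7 above are easy to try in isolation. The following is a minimal sketch, assuming a hypothetical helper named DataFilePaths that is not part of the tutorial project; it filters a list of names the same way the loop does and builds each full path.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the tutorial project): keeps only the
// numerically named entries of `fileNames` and returns their full paths
// inside `folderPath`.
NSArray<NSString *> *DataFilePaths(NSString *folderPath, NSArray<NSString *> *fileNames)
{
    NSMutableArray<NSString *> *paths = [NSMutableArray array];

    for (NSString *fileName in fileNames)
    {
        // intValue returns 0 when the name doesn't start with a number,
        // which filters out entries such as .DS_Store or Info.plist.
        if ([fileName intValue] == 0) continue;

        // stringByAppendingPathComponent: adds the "/" separator for you.
        [paths addObject:[folderPath stringByAppendingPathComponent:fileName]];
    }
    return paths;
}
```

Keep in mind that intValue only looks at the leading digits, so a name such as 123abc would also pass this check. That’s fine for this data set, but you may want a stricter test for other folders.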

At this point, you’ve parsed all 49 data files and passed the MacHardwarePost objects to your masterViewController — it’s finally time to display the results of all your hard effort! :]

Setting up the Table View Delegate and DataSource

Open up MasterViewController.m and add the following import before @implementation:

#import "MacHardwarePost.h"

Find numberOfRowsInTableView: and replace the implementation with the following:

- (NSInteger)numberOfRowsInTableView:(NSTableView *)tableView {
  return [self.listOfMacHardwarePost count];
}

numberOfRowsInTableView: is part of the table view’s data source protocol; it tells the table view how many rows to display. Unlike UITableView on iOS, NSTableView has no sections, so the number of rows is simply the number of data files you parsed successfully.

Next, find tableView:viewForTableColumn:row:. Replace the comment that says //TODO: Set up cell view with the code below:

PostCellView *cellView = [tableView makeViewWithIdentifier:tableColumn.identifier owner:self]; //1
 
if ( [tableColumn.identifier isEqualToString:@"PostColumn"] ) //2
{
  MacHardwarePost *post = [self.listOfMacHardwarePost objectAtIndex:row]; //3
 
  NSString *unknown = @"Unknown"; //4
  NSString *costRelated = @"NO"; //5
 
  cellView.from.stringValue = (post.fromPerson == nil) ? unknown : post.fromPerson; //6
  cellView.subject.stringValue = (post.subject == nil) ? unknown : post.subject; 
  cellView.email.stringValue = (post.email == nil) ? unknown : post.email; 
  cellView.costRelated.stringValue = (post.costSearch.length == 0) ? costRelated : post.costSearch;
  cellView.organization.stringValue = (post.organization == nil) ? unknown : post.organization; 
  cellView.date.stringValue = (post.date == nil) ? unknown : post.date; 
  cellView.lines.stringValue = [NSString stringWithFormat:@"%d", post.lines]; 
  cellView.keywords.stringValue = [post printKeywords]; 
}

tableView:viewForTableColumn:row: is part of the table view’s delegate protocol, NSTableViewDelegate; this is where you set up every individual cell. There’s a custom cell named PostCellView for your use which contains labels such as from, subject, email, costRelated, organization, date, lines, and keywords for you to set.

Here’s a detailed look at the code above:

  1. Ask the table view for a PostCellView; makeViewWithIdentifier:owner: either reuses an existing view or creates a new one.
  2. Check that the tableColumn is indeed PostColumn.
  3. Get an individual MacHardwarePost object that you parsed based on the current row.
  4. Set an NSString variable unknown to use in case a property within MacHardwarePost turns out to be nil.
  5. Set an NSString variable costRelated and initialize it to “NO”; it’s shown when costSearch is empty.
  6. Use the ternary operator to check if the MacHardwarePost field is nil. If so, set the label to “Unknown”; otherwise set the label to what you received from MacHardwarePost.
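
If the repeated ternary checks start to feel noisy, one option is to move the fallback into a small function. This is just a sketch: DisplayString is a hypothetical helper that isn’t part of the tutorial project, and it assumes you want an empty string treated the same as nil.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the tutorial project): returns `value`
// when it has content, otherwise `fallback`.
NSString *DisplayString(NSString *value, NSString *fallback)
{
    // Messaging nil returns 0, so this covers both nil and empty strings.
    return (value.length > 0) ? value : fallback;
}
```

With a helper like this, a line such as the one for the subject label could read cellView.subject.stringValue = DisplayString(post.subject, unknown);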

Lastly, set up the connection between the table view and the text view.

In MasterViewController.m, replace the starter implementation of tableViewSelectionDidChange: with the following:

- (void)tableViewSelectionDidChange:(NSNotification *)aNotification {
  NSInteger selectedRow = [self.tableView selectedRow];  //1
  if( selectedRow >= 0 && [self.listOfMacHardwarePost count] > selectedRow )  //2
  {
    MacHardwarePost *post = [self.listOfMacHardwarePost objectAtIndex:selectedRow];  //3
 
    self.messageTextView.string = post.message;  //4
 
  }
}

tableViewSelectionDidChange: informs the delegate that the table view’s selection has changed. This method executes whenever the user selects a different row.

Here are the details of the above code:

  1. Get the currently selected row.
  2. Check if selectedRow is in-bounds.
  3. Get the MacHardwarePost corresponding to the selected row.
  4. Get the message of the object, and set the message on the text view.

Build and run your project; you’ll see all the parsed fields in the table view. Select a cell on the left and you’ll see the corresponding message on the right.

These data files grow up so fast! :] They were just raw data when you found them, and after you groomed them a little with your parser, they look all grown up now. Aww.

Where to Go from Here?

Here’s the source code for the finished project: NSScannerTutorialFinal.zip

There is so much more you can do with the data you have parsed. You could write a formatter that converts all your MacHardwarePost objects into JSON, XML, CSV or any other format you can think of! With your new-found flexibility to represent data in different forms, you can share your data across different platforms.
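
As a starting point for the JSON idea, here is a minimal sketch built on NSJSONSerialization. JSONStringFromPosts is a hypothetical helper, and it assumes you have already turned each MacHardwarePost into an NSDictionary of strings and numbers; the key names you choose for that dictionary are up to you.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the tutorial project): serializes an
// array of dictionaries, one per post, into a JSON string.
// Returns nil when the input can't be represented as JSON; `error` is only
// filled in when the serialization itself fails.
NSString *JSONStringFromPosts(NSArray<NSDictionary<NSString *, id> *> *posts, NSError **error)
{
    // dataWithJSONObject:options:error: raises an exception for objects that
    // aren't valid JSON, so check before calling it.
    if (![NSJSONSerialization isValidJSONObject:posts])
    {
        return nil;
    }

    NSData *data = [NSJSONSerialization dataWithJSONObject:posts
                                                   options:NSJSONWritingPrettyPrinted
                                                     error:error];
    if (data == nil)
    {
        return nil;
    }
    return [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
}
```

For example, you might pass in an array such as @[ @{ @"subject": @"Re: LaserWriter", @"lines": @12 } ] and write the resulting string to a file.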

Using NSScanner is a great way to quickly manipulate and search for different strings. I hope this new skill gives you the power to parse all that meaningful data in your own apps!

If you’re really interested in the study of computer languages and how they are implemented, take a class in comparative languages. Your course will likely cover formal languages and BNF grammars – all important concepts in the design and implementation of parsers.

For more information on NSScanner and other parsing theory, check out the following resources:

If you have any questions or comments, please join the discussion below and share them!

NSScanner Tutorial: Parsing Data in Mac OS X is a post from: Ray Wenderlich

