Using Multiple Sort Orders for analyzing Arabic/Hebrew

Vowels can sometimes get in the way of linguistic discovery! In this lesson, we set the FieldWorks sort order to ignore vowels to facilitate discovery of consonantal roots in Arabic.

Create a sort order that ignores vowels.

wpid350-media_1387567553451.png

Select the appropriate writing system to modify.

wpid351-media_1387567589605.png

Open the Writing System Properties dialog, Sorting tab.

wpid352-media_1387567687414.png

Read Help topic as a reference for customizing sort orders.

wpid353-media_1387567751930.png

Read Collation in FieldWorks … how to ignore characters.

wpid354-media_1387567841061.png

List the characters that you want to ignore.

wpid355-media_1387567900974.png
  1. Type &[last tertiary ignorable] followed by a space, an equals sign, another space, and the character you want to ignore. Repeat the equals sign and character as needed.
  2. Example: &[last tertiary ignorable] = a = ə = i = ɪ = o = e = u = æ = ɛ (Note that there must be a space on each side of each equals sign.) You may find it easier to create this in Notepad and then copy and paste it into this dialog box.
  3. Click OK.
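
If the list of characters to ignore is long, the rule can also be assembled by a short script instead of typing it by hand. The following is a minimal sketch in Python; the function name build_ignore_rule is made up for this illustration and is not part of FieldWorks. It only produces the text to paste into the Sorting tab.

```python
def build_ignore_rule(chars):
    """Return a sort rule that lists every character in `chars` as ignorable.

    Hypothetical helper: it only builds the text shown in the example above.
    """
    if not chars:
        raise ValueError("need at least one character to ignore")
    # One space on each side of every equals sign, as the dialog requires.
    return "&[last tertiary ignorable] = " + " = ".join(chars)


vowels = ["a", "ə", "i", "ɪ", "o", "e", "u", "æ", "ɛ"]
rule = build_ignore_rule(vowels)
print(rule)
```

Copy the printed line and paste it into the dialog box in the same way as text prepared in Notepad.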

Note how the Wordforms now sort without regard to vowels.

wpid357-media_1387568435064.png

From Texts / Word Analyses, if you sort by Form, you will note that vowels are now ignored. In this example, we see the possibility of a drb stem with some notion of hit.

Note how Lexical Entries also sort with no regard to vowels.

media_1387569415542.png

In this example, we can clearly see the presence of a ktb root.

Filters can take advantage of ignoring vowels.

wpid359-media_1387568717266.png
  1. From the Lexicon Edit tool,
  2. click the filter drop down and
  3. select Filter for…

Type the consonants that you want to search for.

wpid360-media_1387568745257.png

FLEx ignores vowels and displays all entries with ktb.

wpid361-media_1387568839156.png

The same principle applies when searching the Wordforms.

wpid356-media_1387568271939.png

A filter can be applied to any field that uses the writing system whose sort order we changed.
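
To see why sorting and filtering behave this way, it can help to imitate the effect outside FLEx. The sketch below is only an approximation, not how FieldWorks does collation internally: it removes the ignored vowels to get a consonant skeleton and then sorts and filters on that skeleton. The word forms are illustrative transliterations, not data from the screenshots, and the names skeleton and filter_entries are made up for this example.

```python
# Characters treated as ignorable, matching the example sort rule in this lesson.
IGNORED = set("aəiɪoeuæɛ")


def skeleton(form):
    """Return the form with all ignored characters removed."""
    return "".join(ch for ch in form if ch not in IGNORED)


def filter_entries(entries, pattern):
    """Keep the entries whose skeleton contains the skeleton of the pattern."""
    key = skeleton(pattern)
    return [entry for entry in entries if key in skeleton(entry)]


# Illustrative forms only (assumed for this sketch).
entries = ["kitab", "kataba", "maktub", "daraba", "katib"]

by_consonants = sorted(entries, key=skeleton)  # vowels play no part in the order
matches = filter_entries(entries, "ktb")       # every form built on k-t-b

print(by_consonants)
print(matches)
```

Forms that share a consonant skeleton end up next to each other in the sorted list, which is what makes a root such as ktb easy to spot.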

To save time, store the custom sort order in a Notebook record.

media_1387569910441.png

To keep miscellaneous system level notes with the database, use the Notebook. First create a new Notebook record type called “System level notes”.

  1. Select Lists
  2. Select Notebook Record Types
  3. Click Insert.
  4. Provide a Name of “System level notes” and a description such as: “Used for storing reference information including the abbreviations I use, custom sort orders, etc. Create a separate record for each note type.”

Add a Notebook record with the custom sort order.

media_1387570151887.png
  1. Select Notebook.
  2. Insert a new record.
  3. Give the record a Title of “_Sort order ignoring vowels” and
  4. give it a record type of “System level notes” (see above step).
  5. Copy and paste the text you typed in the writing system sorting tab into the Description of this Notebook Record.

Adding new types of lexical relations

How to add new types of lexical relations to the existing set of default types included with FieldWorks.

This tutorial is in video form (6 min: 20s) at: http://www.screencast.com/t/jAGe95Sy

For more information, refer to this video: http://tutorials.canil.ca/FieldWorksMovies/22-add%20lexical%20relations.mp4

Add from an existing sense of an entry.

wpid310-media_1387479019908.png

Suppose that we are on this entry, “to sell”, and we want to add the Actor of this Action, and establish a lexical relation.
Click on the circled blue triangle button beside Lexical Relations.

Select Create New Lexical Relation…

wpid311-media_1387479058784.png

Note that currently there is not an option for “Insert Actor (to this Action)”. Thus we need to create a new lexical relation type.
Select Create New Lexical Relation…

The following dialog appears.

wpid312-media_1387479193357.png
  1. If you do not want to see this dialog again, click Do not show this again.
  2. Click OK.

Learn about Reference set types

wpid313-media_1387479422426.png
  1. By default, the lexical relation will be created with a Reference set type set to “Sense Collection”.
  2. To learn more about Reference set types, click the circled blue triangle button.
  3. Select the Help … menu item.

Select About reference set types from the Help topic.

wpid314-media_1387479492771.png

Read the contents of the above circled link to find out which Reference set type you need.

Select the appropriate Reference set type.

wpid315-media_1387479535968.png

Lexical relations are usually between two or more senses of lexical entries. Thus, use one of the top 5 reference set types. (The others can be used for cross references between entries, or between entries or senses.)
For our example, we have a pair that has two different relation names – action of the actor, or actor of the action.

  1. Select Sense Pair – 2 relation names.

Fill in the information for the lexical relation.

wpid316-media_1387479794303.png
  1. Fill in the Name for one direction of the pair and
  2. give it a meaningful abbreviation (Nact … the noun of the act/action/activity).
  3. Fill in the Reverse Name and
  4. give it a meaningful abbreviation (Vact … the verb of the act/action/activity).
  5. Optionally, fill in a description to give more information. Sometimes this helps you come up with a better name and abbreviation.

Note that you are in the Lexical Relations list

wpid317-media_1387481353331.png

Return to the lexical entry to now add the relation to "merchant".

wpid318-media_1387481484763.png
  1. Click the Back button or
  2. Return to the Lexicon Edit view.

Select the newly created Actor Relation.

wpid319-media_1387481656707.png

Note that there are now two new options to choose from.

  1. Select Insert Actor Relation (to this Action).

Add Reference to a pre-existing sense of an entry or create a new entry if needed.

wpid320-media_1387481833156.png
  1. Fill in the Find box with the entry that you want to link to.
  2. Note that there are not any pre-existing entries to link to.
  3. Create … a new entry.
wpid321-media_1387481987175.png

wpid322-media_1387482029288.png

View the other entry using CTRL-click

wpid323-media_1387482805918.png

wpid324-media_1387482843816.png

What if the entry already exists?

wpid325-media_1387483065206.png

wpid326-media_1387483086365.png

wpid327-media_1387483357473.png
  1. Start filling in the entry to link to. FLEx will automatically display matching entries below.
  2. Click on the appropriate matching entry. (If the appropriate entry does not exist, then click on Create … otherwise do the following).
  3. Select the appropriate sense. Note that if the correct sense has not yet been created, you will need to manually do that first.
  4. Click on the Add button.
  5. Add, in this case, means Add Reference (Action), as shown at the top of the dialog box.
wpid328-media_1387483606027.png

Using the Word List Concordance

The Word List Concordance provides the user with the means to see all occurrences of a particular word or morpheme. Interlinearization can be done in the Word List concordance tool directly while all the occurrences of a word form are displayed, helping the user make generalizations and specifications as necessary.

Open the Word List Concordance tool and peruse the data.

media_1361932886633.png
  1. After opening FieldWorks, click on the Texts & Words view button.
  2. Then click on the Word List Concordance tool.
  3. Sort the Wordforms pane by the Number in Corpus. Click once on the column heading for an ascending sort. Click one more time on the column heading for a descending sort (largest numbers first).
  4. Scroll through the data to get a sense of the words you have in your data corpus. Select a word that has a relatively large number of occurrences in the corpus.
  5. All of the occurrences appear in the top right pane. Click on any of these occurrences.
  6. The bottom right pane displays the particular occurrence with any interlinearization displaying. You can interlinearize directly in the tool just as you can from Interlinear Texts.

Use filtering to produce a concordance of a particular morpheme

You may want to produce a concordance on part of a word, rather than a full word. For example, you might be interested in dog, dogs, dogged because they all have the root morpheme dog. If the texts are interlinearized (and the interlinearizations are more than just proposals by FieldWorks, but have also been human verified), we can use the Concordance tool to do a concordance directly on a particular Lexical Entry or even a Sense of a particular entry. But often, we want to concord on a morpheme as a help to interlinearization; thus the interlinearization is NOT yet done. In the Word List Concordance tool, we can filter on Form to find all words that contain a morpheme as shown below.

Select Filter for …

media_1361933864418.png
  1. Select the Form column and click on the combo box selection.
  2. Choose Filter for …

Fill in your filter criteria.

media_1361933950781.png
  1. In this case, I am interested in Tagalog words that contain the string of characters gusto.
  2. This string can occur anywhere in the word.
  3. If I were looking for a suffix such as -s or -ed, I could choose At end instead of Anywhere.
  4. For more complicated search patterns, you can use Regular Expressions.
  5. Click the Help button for assistance in using Regular Expressions or any of the other dialog functions.
  6. Click OK.
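
The difference between Anywhere and At end can be pictured with regular expressions. The following Python sketch is an illustration only: it uses Python's re module, whose syntax may differ in details from the regular expressions accepted by the FLEx filter dialog, and the word list and the function name filter_forms are assumptions made for the example.

```python
import re


def filter_forms(forms, text, where="anywhere"):
    """Return the forms that contain `text` anywhere or at the end."""
    literal = re.escape(text)  # treat the typed text as plain characters
    if where == "anywhere":
        pattern = literal
    elif where == "at_end":
        pattern = literal + r"\Z"  # \Z anchors the match to the end of the form
    else:
        raise ValueError("where must be 'anywhere' or 'at_end'")
    return [form for form in forms if re.search(pattern, form)]


# Illustrative word list (assumed, not taken from the class corpus).
forms = ["gusto", "gustong", "pagkagusto", "bahay", "dogs", "dogged"]

print(filter_forms(forms, "gusto"))          # string anywhere in the word
print(filter_forms(forms, "s", "at_end"))    # suffix -s
print(filter_forms(forms, "ed", "at_end"))   # suffix -ed
```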

Peruse the filter results

media_1361934002089.png
  1. Note that when a filter is applied, the 2nd row is highlighted in yellow.
  2. The resulting rows from the filter are displayed. In this case, we see all of the words that have gusto in them. Click on any row to see the occurrences of the word.
  3. The contextual occurrences are displayed in the upper right. Click on a particular row to see its full context and interlinearization in pane 4.

Turn off a filter condition

media_1361934553564.png

You can easily turn off a filter condition by selecting the yellow cell in the column and selecting Show All.

Using PyLexTools - pyConc

PyLexTools provides an easy way to generate concordances of morphemes and words, and also has a function to produce collocational concordances. LexTools was originally created by Doug Rintoul as a DOS program. He has recently created a Windows interface to LexTools using Python, a scripting language, and this is the version we will use in our class.

Py stands for Python.
Conc stands for Concordance.
Colo stands for Collocation.

In this lesson, we will learn how to use pyConc in the CanIL Lab.

In the lab: Open H:\Class\L587

wpid188-media_1361935237872.png

Open pyLexTools and double-click on pyConc

wpid189-media_1361935324557.png

PyConc is launched.

wpid190-media_1361935425646.png
  1. After double-clicking on pyConc, you should see something like the above screen shot. There will be a window called Python. (Python is a scripting language that is interpreted, rather than compiled. The Python window you see here indicates that the Python interpreter is running.)
  2. There will be another window called Conc.
  3. I have already created Projects for each of you in folders on the H drive. Click on Open Project and follow the instructions below.

Open pyLex project

1. pyLex may ask you if you want to save the current project. Answer No (unless it’s yours and you want to save changes).
2. pyLex will present a dialog for you to open a project. Navigate to the H:\Class\L587 directory and open your folder (e.g. MartinPyLex) and open the *.lt file that appears in the dialog.

View the preferences that have been preset for you.

wpid191-media_1361935916067.png

Click on the Set Preferences button.

Colo/Conc Preferences dialog

wpid192-media_1361936048551.png

These settings were already done for you. They are explained below in case you need to make changes to them.

  1. The Primary Collating Sequence instructs pyLex how to sort concordances. Note how upper and lowercase have = between them. Groupings of upper and lowercase pairs are separated by spaces.
  2. Secondary collating sequence is for characters that might be used in the data such as stress, hyphens, etc. but should be sorted after the primary sort.
  3. The Ignore sequence is often helpful if we have indicated stress or tone and do not want it used in the sorting.
  4. The font is set to Arial. All data will appear in this font. For languages and writing systems that we normally use in Field Methods, this should be set to Doulos SIL or equivalent to handle IPA.
  5. The source text looks like that below. Here we specify the fields used in the text.
  6. The texts are collected from H:\Class\L587\TagalogToolbox. There are many of them!
  7. Click OK or Cancel.
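
The idea behind the Primary Collating Sequence and the Ignore sequence (items 1 and 3 above) can be sketched in Python. This is not pyLex code. It assumes only what is described above: groups are separated by spaces, the members of a group are joined by = and sort together, and ignored characters are skipped. The sequence, the words and the name make_sort_key are made up for the illustration, and the Secondary collating sequence is not modeled.

```python
def make_sort_key(primary, ignore=""):
    """Build a sort key function from a collating sequence such as 'A=a B=b'."""
    rank = {}
    for position, group in enumerate(primary.split()):
        # Characters joined by '=' share the same position in the order.
        for ch in group.split("="):
            rank[ch] = position
    ignored = set(ignore)
    unknown = len(rank)  # characters missing from the sequence sort last

    def sort_key(word):
        # Works one character at a time, so digraphs are not handled here.
        return [rank.get(ch, unknown) for ch in word if ch not in ignored]

    return sort_key


# Hypothetical sequence in which k sorts before d, with stress marks ignored.
key = make_sort_key("A=a B=b K=k D=d", ignore="ˈ")
words = ["daka", "Kaba", "baka", "ˈaba"]  # made-up forms for the sketch
ordered = sorted(words, key=key)
print(ordered)
```

Because K and k share one position and the stress mark is skipped, the order depends only on the letters listed in the sequence.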

Source text example of one sentence:
\ref TGL15701 002
\tx Dahan dahan siyang lumakad sa bahay nang kanyang
\mo dahan dahan siya na lakad um sa bahay nang kanya na
\ge slow slow 3SN LK walk AF/NFu FMK house FMK 3SO LK

\tx tiya.
\mo tiya
\ge aunt

\tr He walked slowly to his aunt’s house.
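
To make the field structure of the sentence above concrete, here is a minimal Python sketch that splits such a record into marker and value pairs. It is not part of pyLexTools. The function names are made up, and the sketch assumes that every field starts on its own line with a backslash marker followed by a space.

```python
SAMPLE = r"""\ref TGL15701 002
\tx Dahan dahan siyang lumakad sa bahay nang kanyang
\mo dahan dahan siya na lakad um sa bahay nang kanya na
\ge slow slow 3SN LK walk AF/NFu FMK house FMK 3SO LK

\tx tiya.
\mo tiya
\ge aunt

\tr He walked slowly to his aunt’s house.
"""


def parse_record(text):
    """Return a list of (marker, value) pairs in the order they appear."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines between interlinear blocks
        marker, _, value = line[1:].partition(" ")
        fields.append((marker, value.strip()))
    return fields


def join_field(fields, marker):
    """Join all values of one marker, e.g. a \\tx line wrapped over two blocks."""
    return " ".join(value for name, value in fields if name == marker)


fields = parse_record(SAMPLE)
print(join_field(fields, "tx"))
print(join_field(fields, "tr"))
```

Joining the two \tx lines gives back the full sentence, which shows that one sentence may be wrapped over several interlinear blocks.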

The Conc Interface

wpid193-media_1361936551579.png
  1. Because the texts are interlinearized, you can create a concordance based on each morpheme or each word. Switch between the two choices and observe what happens in pane 2.
  2. This displays the number of occurrences of the morpheme or words.
  3. In the generated concordance you can set the item of interest off using characters such as square brackets.
  4. As shown in the data sample above, in our generated concordance we want to see the source text line (tx), the interlinear break down (mo and glosses ge) and the free translation (tr).
  5. If you are interested in a certain morpheme but want to exclude words of non-interest to you, you can select ExcludeList and create a list of words to keep out of the concordance.
  6. You can sort by collating sequence (alphabetical order, much like a Bible concordance is done) or by frequency.
  7. If you want to print out the concordance, you may want to limit the number of occurrences for high frequency words or morphemes to 100 or fewer rather than showing the thousands of occurrences.

Generate a Concordance

wpid194-media_1361937091186.png
  1. Select Word or Morpheme for the list to be displayed.
  2. Leave both of these selected. They will be displayed once we create the concordance.
  3. Select a morpheme or word to concord on.
  4. Press Concord. PyLexTools will go through all of those texts and generate a concordance in no time!

Reading the summary results

wpid195-media_1361937339689.png
  1. Select the Summary tab.
  2. PyLexTools has found 371 instances of siya.
  3. In 7 different words.
  4. Each word is listed with the number of occurrences.

Reading the Result tab

wpid196-media_1361937471698.png
  1. Select the Result tab.
  2. This is a concordance of the morpheme siya.
  3. \lw = L? word (maybe LexWord)
  4. \wc = Word count … in this case only 1 occurrence of this word. In #6 below, it occurs twice.
  5. The actual sentence, interlinearized that contains the morpheme. Note how the square brackets help us see what we are concording on. Pretty cool.
  6. A reference of another word containing siya.
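
The way a concordance sets off the item of interest with square brackets can be sketched in a few lines of Python. This is an illustration of the idea only, not the algorithm pyConc uses. The first record reuses the morpheme line from the sample sentence earlier in this lesson; the second record and the function name concord are made up.

```python
def concord(records, morpheme, left="[", right="]"):
    """Return (reference, marked line) for every record containing the morpheme."""
    hits = []
    for ref, mo_line in records:
        tokens = mo_line.split()
        if morpheme not in tokens:
            continue
        # Wrap each exact match so it stands out in the output line.
        marked = " ".join(
            left + token + right if token == morpheme else token
            for token in tokens
        )
        hits.append((ref, marked))
    return hits


records = [
    ("TGL15701 002", "dahan dahan siya na lakad um sa bahay nang kanya na tiya"),
    ("EXAMPLE 001", "bahay nang tiya"),  # hypothetical record without siya
]

for ref, line in concord(records, "siya"):
    print(ref, line)
```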

Saving the results

wpid197-media_1361937878590.png

You can select text right in the results windows and copy and paste to Word or your favorite text processor.
You can also close the wxTabbedViewer. In your project folder, you will see the actual files generated by pyConc. ConcResults can be renamed (e.g. Conc_siya.txt).
More to come on how to use pyColo!

Exporting FieldWorks FLEx Discourse charts for printing

Export a FLEx discourse chart to Microsoft Word XML. Remove non-breaking spaces so that the chart can be fit to various paper sizes. Print the Word document.

Export Discourse Chart

wpid-media_12686870885131.png

Select Word XML

wpid-media_12686886417101.png

Select Word XML and click the Export … button. Choose a location to Save the file to.
Note: FLEx may appear unresponsive once you choose to Save the file. Depending on the length of your text, the number of columns, etc., this may take a few minutes to finish exporting.

Open the Word XML file in Microsoft Word

wpid-media_12686889876041.png

1. In the Open file dialog, be sure to select All Files (*.*) (normally this is set to look for .doc, .docx files only).
2. Select the exported Word XML file.

Find and replace non-breaking spaces with normal spaces.

wpid-media_12686895553941.png

By default, the discourse analysis tool exports the chart with non-breaking spaces. This makes reformatting the chart to different paper sizes very difficult. Line wrapping within cells cannot occur with non-breaking spaces so columns remain very wide.

In this step we replace the non-breaking spaces with normal spaces so that we can reformat the chart.

1. In Word 2003, select Edit … Replace …
2. In Word 2007-2010, select the Home tab and the Replace button.
Or in either Word 2003 or Word 2007-2010, press CTRL-H.

Find and replace non-breaking spaces with normal spaces continued …

wpid-media_12686945378801.png

1. Type ^s in the Find what: field. ^s is a special Microsoft Word code that represents a non-breaking space.
2. Put a space in the Replace with: field.
3. Click Replace All.
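
For anyone who prefers to do the replacement outside Word, the same substitution can be sketched in Python. This is an alternative to the steps above, not part of the exported workflow. It assumes that the exported file is UTF-8 text and that the non-breaking spaces are stored as the literal character U+00A0 rather than as an XML entity such as &#160;. Check both on a copy of your file first. The function names are made up for this example.

```python
from pathlib import Path

NBSP = "\u00a0"  # the character that Word's ^s code stands for


def replace_nbsp(text):
    """Return the text with normal spaces, plus the number of replacements."""
    return text.replace(NBSP, " "), text.count(NBSP)


def fix_file(source, target, encoding="utf-8"):
    """Write a copy of `source` to `target` with non-breaking spaces replaced."""
    text = Path(source).read_text(encoding=encoding)
    fixed, count = replace_nbsp(text)
    Path(target).write_text(fixed, encoding=encoding)
    return count


fixed, count = replace_nbsp("Dahan\u00a0dahan\u00a0siyang")
print(fixed, count)
```

Writing to a separate target file keeps the original export untouched in case the assumptions do not hold.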

wpid-media_12686954845071.png

Depending on the length of your text, you will probably have hundreds to thousands of replacements reported.

Save your work.

Resizing and printing the discourse chart in Word 2003

Select a printer.

media_1268696510199.png

Select File … Print.

Select a printer continued …

media_1268696218151.png

1. Select the printer that you want to use (different paper sizes of CanIL printers are listed below). If the printer you need is not listed, please talk to a lab administrator.
2. Click on Properties.

You may want to select a printer that can handle legal (8.5" x 14") or ledger (11" x 17") sized paper.
CanIL printers:
- Lab Printer: Brother HL-5250DN supports letter and legal size with the fold out accessory tray.
- Photocopier: Sharp MX-PS supports letter, legal and ledger with colour capability. Requires a lab administrator code for colour printing.

Select appropriate paper size and orientation.

media_1268696582356.png

1. Select paper size. (Note: Ledger paper is more expensive).
2. Select orientation.
3. If printing to a color printer, select the Color tab and turn color on. (Note: Color printing is more expensive).
4. Click OK.

DO NOT CLICK the OK button on the next screen. See below.

Close the printer dialog box.

media_1268696974783.png

In the Print dialog click on the Close button.

Configure page setup in Microsoft Word

media_1268697447266.png

1. Make margins smaller (0.3 to 0.5 inches) to make maximum use of paper.
2. Select the appropriate orientation (this will very likely be Landscape).

Configure page setup in Microsoft Word continued …

media_1268697597680.png

1. Click on the Paper tab.
2. Select the appropriate size of paper (letter, legal or ledger).
3. Click OK.

Resize the chart to use the full size of the paper.

media_1268698494361.png

1. Click anywhere inside the chart.
2. Select Table … Autofit … AutoFit to Contents

The columns should automatically readjust to fit inside the specified width of the page.

Preview before printing. Print.

media_1268698600115.png

Select File … Print Preview. Make sure that the pages look the way you want them to before you hit print (these discourse charts can end up being many pages!).
If everything looks good, then select Print.

See a lab administrator for a printing code if you are printing in color.

Welcome to tutorials.canil.ca

Welcome to tutorials.canil.ca. This site has been created primarily for the students and faculty of the Canada Institute of Linguistics (www.canil.ca) but we also hope others will be able to make good use of the materials here. The tutorials are helps for using Unicode IPA fonts and keyboards, SIL FieldWorks, SIL Phonology Assistant, Audacity and Microsoft Word for the purpose of linguistic research and class assignments.