Let's say you have 10 files (which contain various tables) with hundreds of names of people. You are interested only in the names that the files have in common. To get a report on the names that occur in more than one file, select all the files that contain the tables and then use this function.
The active file is used as the "pivot file", so choosing it matters. Every other file is compared with this file, so a name that is not in the pivot file will not be reported. It is admittedly difficult to choose the pivot file when you don't know which names you are looking for; as a rule, choose the file that contains the most names.
The final report contains all names found in more than one of the selected files. The names are sorted by rank: names that occur in the most files are displayed at the top of the report.
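The pivot-based comparison described above can be sketched as follows. This is a minimal illustration, not the program's own code: it assumes exact whole-name matching and that each file has already been reduced to a set of names, and the function `common_names`, the sample data, and the "largest file as pivot" heuristic are hypothetical. Note that "SMITH", which occurs in two files but not in the pivot file, is never reported.

```python
def common_names(files: dict[str, set[str]], pivot: str) -> list[tuple[str, int]]:
    """Report the pivot file's names that occur in more than one file, most widespread first."""
    others = [names for file_name, names in files.items() if file_name != pivot]
    report = []
    for name in files[pivot]:
        # Only names present in the pivot file are ever considered.
        count = 1 + sum(name in names for names in others)
        if count > 1:
            report.append((name, count))
    # Highest file count first; ties are broken alphabetically for a stable report.
    report.sort(key=lambda item: (-item[1], item[0]))
    return report


# Hypothetical sample data: each file already reduced to a set of names.
files = {
    "A.txt": {"DOE", "BROWN", "YU", "PAUL"},
    "B.txt": {"DOE", "YU", "SMITH"},
    "C.txt": {"DOE", "SMITH"},
}
# Heuristic from the text: use the file with the most names as the pivot.
pivot = max(files, key=lambda f: len(files[f]))
print(pivot, common_names(files, pivot))  # A.txt [('DOE', 3), ('YU', 2)]
```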
Each file is split into paragraphs. From each paragraph, words are extracted according to the options, and a multiple filter is built from them. Each filter from the pivot file is then matched against every filter from all the other files.
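The paragraph-by-paragraph matching can be pictured with the sketch below. It is only an approximation under stated assumptions: one paragraph per line, words taken as runs of letters, upper-cased, with words shorter than 3 letters dropped (the real minimum length is not documented), and only exact word matches counted. The names `paragraph_filters` and `match_filters` are hypothetical.

```python
import re


def paragraph_filters(text: str, min_len: int = 3) -> list[frozenset[str]]:
    """Build one word filter per paragraph (assumed here: one paragraph per line)."""
    filters = []
    for paragraph in text.splitlines():
        # Assumed word rule: runs of letters, upper-cased, shorter words dropped.
        words = {w.upper() for w in re.findall(r"[A-Za-z]+", paragraph) if len(w) >= min_len}
        if words:
            filters.append(frozenset(words))
    return filters


def match_filters(pivot_filters, other_filters, minimum_matches: int = 1):
    """Match every pivot filter against every filter of another file (exact words only)."""
    results = []
    for source in pivot_filters:
        for against in other_filters:
            common = source & against
            if len(common) >= minimum_matches:
                results.append((source, against, common))
    return results


a_txt = "Jonh Michael Doe\nJoanna Faith Brown\nHelen Yu"
b_txt = "John Doe\nMichelle Faith vonBrown\nHelen Yu"
for source, against, common in match_filters(paragraph_filters(a_txt), paragraph_filters(b_txt)):
    print(sorted(source), "->", sorted(against), ":", sorted(common))
```

With exact matching alone, "JONH" and "JOHN" or "BROWN" and "VONBROWN" are never paired; closing that gap is the job of reverse matching and the "Misspelling" option.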
Let's say you have the name "vonBrown" in the pivot file, and "Brown" in another file. Because "vonBrown" is a superset of "Brown", it cannot normally be found in "Brown". Reverse matching (searching for "Brown" in "vonBrown") ensures that the match is also found.
Reverse matching also ensures that the rank used is the highest one: the maximum of the direct and reverse priorities.
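The idea of combining direct and reverse matching can be sketched like this. The actual rank formula is not described here, so the score below (the total length of the words that were found) is a hypothetical stand-in; only the structure, running the same test with the roles swapped and keeping the maximum, follows the text.

```python
def direct_score(source: set[str], against: set[str]) -> int:
    """Hypothetical score: total length of source words found inside some against word."""
    return sum(len(s) for s in source if any(s in a for a in against))


def rank(source: set[str], against: set[str]) -> int:
    """Take the higher of the direct and the reverse (roles swapped) scores."""
    direct = direct_score(source, against)
    reverse = direct_score(against, source)
    return max(direct, reverse)


pivot_filter = {"MICHELLE", "FAITH", "VONBROWN"}
other_filter = {"JOANNA", "FAITH", "BROWN"}
# Direct: only FAITH is found (5). Reverse: FAITH and BROWN (inside VONBROWN) are found (10).
print(direct_score(pivot_filter, other_filter), rank(pivot_filter, other_filter))  # 5 10
```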
Let's say that the first file (the pivot file), named "A.txt", contains:
| Num | First name | Middle name | Last name | Gender | Address | ID |
| 001 | Jonh | Michael | Doe | Male | Jupiter | 742-5361 |
| 002 | Adrian | Michael | Paul | Male | Saturn | 326-5361 |
| 003 | Joanna | Faith | Brown | Female | Venus | 259-5361 |
| 004 | Helen | - | Yu | Female | Pluto | 810-5361 |
Let's say that the second file, named "B.txt", contains:
| Num | First name | Middle name | Last name | Gender | Address | ID |
| 001 | John | - | Doe | Male | Jupiter | 742-5361 |
| 002 | Adrian | - | Gainer | Male | Saturn | 590-5361 |
| 003 | Michelle | Faith | vonBrown | Female | Venus | 259-3314 |
| 004 | Helen | - | Yu | Female | Pluto | 810-5361 |
Let's say you want to see which names occur in both files. Use the following options:
Check "Use delimiter" and set the delimiter to "|".
Set "Begin extract" to 2. If the tables did not have a "|" before the record number, you would set it to 1.
Set "End extract" to 5 (or 4 if there is no "|" before the record number).
Set "Whole match" to "Right". This ensures that "Brown" and "vonBrown" are detected as a match.
If you want "John" and "Jonh" to be detected as a match, set "Misspelling" to 2. It must be 2, not 0 or 1, because "Jonh" differs from "John" in 2 letters ("n" and "h"), counted by position rather than by which letters are present. However, this setting also generates many false matches that are of no interest to you, such as "John" and "vonBrown" or "John" and "Doe". Also note that "Helen Yu" cannot be used as such, but only as "Helen", because "Yu" is too short to be used.
Set "Minimum matches" to the value you want; normally 1.
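Two of these options can be illustrated with a small sketch. Both rules are assumptions inferred from the examples, not documented behavior: "Begin extract" and "End extract" are read as 1-based delimiter positions (which explains why a leading "|" shifts both values by one), and "Misspelling" is read as the number of letters allowed to differ at aligned positions while one word is searched inside another. With a value of 2, this rule also reproduces false pairs from the sample report, such as "DOE" found in "JOHN". The "Whole match" option is not modeled. The names `extract_fields` and `fuzzy_find` are hypothetical.

```python
def extract_fields(line: str, delimiter: str = "|", begin: int = 2, end: int = 5) -> list[str]:
    """Keep the fields between the begin-th and end-th delimiter (1-based, assumed meaning)."""
    parts = line.split(delimiter)  # parts[k] is the text after the k-th delimiter
    fields = (p.strip() for p in parts[begin:end])
    # Assumption: placeholders such as "-" contain no letters and are dropped.
    return [f for f in fields if any(c.isalpha() for c in f)]


def fuzzy_find(word: str, target: str, misspelling: int) -> bool:
    """True if `word` occurs in `target` with at most `misspelling` letters differing
    at aligned positions (assumed rule, consistent with the sample report)."""
    # If `word` is longer than `target`, the range is empty and the result is False.
    for start in range(len(target) - len(word) + 1):
        window = target[start:start + len(word)]
        if sum(a != b for a, b in zip(word, window)) <= misspelling:
            return True
    return False


row = "| 001 | Jonh | Michael | Doe | Male | Jupiter | 742-5361 |"
print(extract_fields(row))  # ['Jonh', 'Michael', 'Doe']
print(fuzzy_find("JOHN", "JONH", 1), fuzzy_find("JOHN", "JONH", 2))  # False True
```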
With "Misspelling" set to 2 and "Minimum matches" set to 2, the report would contain:
Source file: A.txt
Match count: 3
Elapsed time (seconds): 0
Match index: 1
Source filter: "JONH; MICHAEL; DOE"
Match count: 2
Rank: 1090
Partial match index: 1
Against file: "B.txt"
Against filter: "JOHN; DOE"
Matched filter: "JONH -> JOHN; JONH <- DOE; DOE -> JOHN; DOE"
Rank: 670
Partial match index: 2
Against file: "B.txt"
Rank: 420
Match index: 2
Source filter: "FIRST; NAME; MIDDLE; LAST"
Match count: 1
Rank: 980
Partial match index: 1
Against file: "B.txt"
Against filter: "FIRST; NAME; MIDDLE; LAST"
Rank: 980
Match index: 3
Source filter: "JOANNA; FAITH; BROWN"
Match count: 2
Rank: 675
Partial match index: 1
Against file: "B.txt"
Against filter: "JOHN; DOE"
Matched filter: "JOANNA <- JOHN; JOANNA <- DOE; BROWN <- JOHN"
Rank: 375
Partial match index: 2
Against file: "B.txt"
Matched filter: "FAITH; BROWN -> VONBROWN"
Rank: 300
As you can see, "John Doe" is detected (together with an uninteresting partial match against "vonBrown"). The second match is not important because it refers to the table headers. The third match detects "Faith Brown". "Helen Yu" is not detected, because "Yu" is too short. So "John Doe" and "Faith Brown" are the names common to both files.
The report would be much easier to read with "Misspelling" set to 0, but then "John Doe" would not be detected.
The processing time for two files, each with one thousand paragraphs, is a few tens of seconds on a 1 GHz processor. The time grows linearly: each additional file of one thousand paragraphs adds roughly that same amount again, so the time doubles with one new file, triples with two new files, quadruples with three new files, and so on. The same happens for every additional one thousand names added to any of the files.
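The scaling rule above amounts to simple arithmetic, sketched below. The base time of 30 seconds is a hypothetical value standing in for "a few tens of seconds", and `estimated_time` is an illustrative helper, not part of the program.

```python
def estimated_time(base_seconds: float, n_files: int, extra_thousand_names: int = 0) -> float:
    """Linear estimate: one base unit per non-pivot file, plus one per extra thousand names."""
    if n_files < 2:
        raise ValueError("at least two files are needed for a comparison")
    # The pivot file is compared with each of the other n_files - 1 files.
    return base_seconds * (n_files - 1 + extra_thousand_names)


for n in (2, 3, 4):
    print(n, "files:", estimated_time(30, n), "s")  # 30, 60 and 90 seconds
```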