- Core functionality
- Script Recorder and Automation Elements
- XPath Generation
- XPath Syntax
- XPath Consumption
- New START Methodology
Core functionality
XPath Generation and Consumption
Automation Library
The Script Recorder captures the location of a specific UI element and generates an address for it. UI elements within a desktop are captured via Microsoft’s UI Automation Library, an accessibility framework that retrieves metadata about buttons, windows, controls, and other elements. These elements are organized into a hierarchy, starting at the desktop and extending to the selected control. To inspect these hierarchies independently of the Script Recorder, you can use FlaUI to explore and view the different elements and their properties in detail. This is important because our XPath is based on these properties. Below is an example of an element hierarchy, starting at the desktop.
Script Recorder and Automation Elements
Each UI Automation element in a chain of elements contains a list of properties that help distinguish between multiple elements.
Script Recorder is built on Microsoft's UI Automation Library, which consists of a hierarchy of elements. Each element contains metadata, and the XPath serves as a summary of this metadata. Below is an example of an element and its properties.
For our purposes, the Script Recorder focuses on the following properties (click the links for more information):
- Localized Control Type: An identifier for a UI Automation element, based on a set of standard types.
  - See the link for more information on standard types.
- ClassName: A separate identifier that depends on the UI Automation provider’s implementation, which makes it more developer-dependent. The key difference is that it may or may not follow standards. See the example above to distinguish between 'Name' and 'ClassName'.
  - It is sometimes empty, sometimes a long GUID, and sometimes specific text.
- AutomationId: Distinguishes a UI Automation element from its siblings, but not from ancestors or non-sibling elements.
  - This is the most reliable property for telling siblings apart, but it can be empty or dynamically generated, resulting in randomized IDs (such as a sequence of numbers) that change with each execution of the target application. In these cases, it may be necessary to use wildcards so that the script remains executable. For more information, see Wildcards for Class Names, Names, Titles, and Automation IDs.
- Position: An element’s position in UI Automation’s FindAll(…) methods refers to its relative order among its siblings. It primarily serves as a tie-breaker when two identical elements exist in the tree hierarchy. In other words, when several elements match, the one at the recorded position is selected.
- Name: A more specific identifier, closer to what you see on the screen. See the example above to distinguish between 'Name' and 'ClassName'.
  - This is the most specific identifier, ranging from a specific Excel cell to the title of a chat in Teams. Below is an example showing the correlation between a name and its content in an application.
Ultimately, the Script Recorder takes these element properties (think of them as the fields of an object) and translates them into a text representation, which is the XPath. It compresses the object into a shorter text string, keeping only the most important properties and leaving out the rest.
XPath Generation
Script Recorder uses the standard Microsoft UI Automation Library to generate a concise summary of the UI element hierarchy (XPath), which the Engine then consumes. This library queries metadata associated with all the elements available on the desktop and the operating system’s UI. The Script Recorder takes the following functionality from the UI Automation Library:
- Query Microsoft’s UI Automation Library for any element on the desktop (see FindAll and FindFromPoint, among others).
  - In other words, it identifies the target elements.
- Collect the relevant metadata for individual elements (see the section above about relevant metadata).
- Build a metadata hierarchy, starting from the target element and extending up to the topmost ancestor, which is the desktop.
  - It first finds the target element based on the cursor location (FindFromPoint). This is always the deepest element in the hierarchy.
  - It then travels iteratively to each ancestor, collecting its metadata until it reaches the root.
- Compress this hierarchy into a short XPath, which the Engine consumes.
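The hierarchy-building step described above can be sketched as follows. This is a minimal illustration in Python, not the Script Recorder's actual implementation: the `Element` class, its field names, and the sample values are hypothetical stand-ins for UI Automation elements, and the sketch assumes each element exposes a reference to its parent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a UI Automation element (not a real API)."""
    control_type: str
    class_name: str = ""
    name: str = ""
    automation_id: str = ""
    position: int = 0
    parent: Optional["Element"] = None


def collect_hierarchy(target: Element) -> List[Element]:
    """Walk from the target up to the root, then return the chain root-first."""
    chain: List[Element] = []
    node: Optional[Element] = target
    while node is not None:
        chain.append(node)   # collect this element's metadata
        node = node.parent   # move one ancestor up
    chain.reverse()          # desktop first, target element last
    return chain


# Made-up sample tree: desktop -> window -> edit control.
desktop = Element("Pane", name="Desktop")
window = Element("Window", class_name="Notepad", name="Untitled - Notepad", parent=desktop)
edit = Element("Edit", class_name="Edit", automation_id="15", parent=window)

chain = collect_hierarchy(edit)
print([e.control_type for e in chain])  # ['Pane', 'Window', 'Edit']
```

The walk starts at the deepest element because that is the one found under the cursor. Reversing the chain at the end gives the desktop-to-target order that the XPath uses.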
The screenshot below shows the full XPath and its processing behind the scenes.
The following screenshot shows an XPath example in Script Recorder. Each node (or element) in the hierarchy is delimited by a slash, and the items inside the brackets represent the metadata associated with each element. This example includes three different elements in the hierarchy.
The library also plays an auxiliary role: it draws the highlight rectangles around elements as they are identified.
The Engine already made use of a different implementation of the XPath. This version is what existed in the Application X-Ray. Below is an example of an X-Ray XPath and its short XPath.
The X-Ray XPath is shorter and tries not to display the entire hierarchy from the element to the desktop. Script Recorder attempts to capture this hierarchy, as it is often necessary to replay a sequence of actions before being able to see some controls. To learn more, see Using the AutomationID Property.
Additionally, there is no guarantee that a given element has globally unique properties, and there may be scenarios where two distant, non-sibling elements are completely identical. For this reason, it is necessary to have “breadcrumbs” that point towards the correct element. This also applies to the AutomationID, which is unique among siblings only. To learn more, see Using the AutomationID Property.
XPath Syntax
The Script Recorder generates a single XPath section (one per element in the hierarchy) in the following format:
{TagGoesHere} : {ClassNameGoesHere} [{NameGoesHere}][AutomationId : {AutomationIdGoesHere}][Position: {PositionGoesHere}]
All of these individual properties have already been covered in the Script Recorder and Automation Elements section. However, the concept of a 'Tag' has not been discussed yet. A Tag is typically related to the element's ControlType. Instead of using the full ControlType name, a shorter version is used. Here's an example of how a ControlTypeID is converted to a Tag.
Here’s a list of the possible control types that can be used as tags:
CONTROL_TYPE(Button),
CONTROL_TYPE(Calendar),
CONTROL_TYPE(Checkbox),
CONTROL_TYPE(Combobox),
CONTROL_TYPE(Edit),
CONTROL_TYPE(Hyperlink),
CONTROL_TYPE(Image),
CONTROL_TYPE(ListItem),
CONTROL_TYPE(List),
CONTROL_TYPE(Menu),
CONTROL_TYPE(MenuBar),
CONTROL_TYPE(MenuItem),
CONTROL_TYPE(ProgressBar),
CONTROL_TYPE(RadioButton),
CONTROL_TYPE(ScrollBar),
CONTROL_TYPE(Slider),
CONTROL_TYPE(Spinner),
CONTROL_TYPE(StatusBar),
CONTROL_TYPE(Tab),
CONTROL_TYPE(TabItem),
CONTROL_TYPE(Text),
CONTROL_TYPE(ToolBar),
CONTROL_TYPE(ToolTip),
CONTROL_TYPE(Tree),
CONTROL_TYPE(TreeItem),
CONTROL_TYPE(Custom),
CONTROL_TYPE(Group),
CONTROL_TYPE(Thumb),
CONTROL_TYPE(DataGrid),
CONTROL_TYPE(DataItem),
CONTROL_TYPE(Document),
CONTROL_TYPE(SplitButton),
CONTROL_TYPE(Window),
CONTROL_TYPE(Pane),
CONTROL_TYPE(Header),
CONTROL_TYPE(HeaderItem),
CONTROL_TYPE(Table),
CONTROL_TYPE(TitleBar),
CONTROL_TYPE(Separator),
CONTROL_TYPE(SemanticZoom),
CONTROL_TYPE(AppBar)
These control types correspond to the control types defined in the UI Automation Control Types Overview. It's worth remembering that this syntax represents a single section in a string of sections. The rationale for this mapping, as opposed to using the ControlType directly, is described in the documentation of the IUIAutomationElement::get_CurrentControlType method and relates to the way ControlTypes can change depending on the environment. The accepted ControlType IDs are documented in the Control Type Identifiers.
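To make the section format concrete, here is a minimal Python sketch that renders element properties into sections and joins them with slashes. It assumes that the curly-brace items in the format above are placeholders replaced by the actual values, and that the spacing follows the template exactly. The function names and the sample values are hypothetical, not part of the product.

```python
from typing import Dict, List


def format_section(tag: str, class_name: str, name: str,
                   automation_id: str, position: int) -> str:
    """Render one element as a single XPath section, following the template."""
    return (f"{tag} : {class_name} [{name}]"
            f"[AutomationId : {automation_id}][Position: {position}]")


def format_xpath(elements: List[Dict]) -> str:
    """Join the sections of a hierarchy (root first) with slashes."""
    return "/".join(format_section(**element) for element in elements)


# Made-up sample values for a two-element hierarchy.
hierarchy = [
    {"tag": "Window", "class_name": "Notepad", "name": "Untitled - Notepad",
     "automation_id": "", "position": 0},
    {"tag": "Button", "class_name": "Button", "name": "OK",
     "automation_id": "btnOk", "position": 1},
]

print(format_section("Button", "Button", "OK", "btnOk", 1))
# Button : Button [OK][AutomationId : btnOk][Position: 1]
print(format_xpath(hierarchy))
```

An XPath produced by the Script Recorder may differ in whitespace or in how empty properties are written, so treat the output only as an illustration of the structure.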
XPath Wildcarding
As of release 5.12, the Script Recorder supports wildcards in all parts of the XPath string:
- Class Name can contain wildcards: a class name with * mixed into it still matches. For example, Cl*ssN*me is a valid wildcarded version of “ClassName”.
- Name / Title can contain wildcards.
- AutomationID can contain wildcards.
Taking the example from above, the following is valid:
{T*e} : {Cl*ssN*meGoesHere} [{N*meGoesH*r*}][AutomationId : {Autom*IdGoesHere}][Position: {PositionGoesHere}]
Wildcards are intended for scenarios where certain XPath properties change between script executions. Consider a Name property that changes every time you run the application. In this case, you can replace the parts of the Name that keep changing with a wildcard. For example, suppose you display a ListView whose ListViewItems are re-sorted every time the application runs. Instead of letting the XPath find ListViewItem-1, you let it find ListViewItem-*. Cleaning up the dynamic parts of the script in this way makes it more robust.
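The matching behavior described above can be approximated with a short Python sketch. It assumes that * stands for any run of characters (including none), that the rest of the pattern is compared literally, and that the comparison is case-sensitive. How the Engine implements wildcards internally is not documented here, and the function name `wildcard_match` is hypothetical.

```python
import re


def wildcard_match(pattern: str, value: str) -> bool:
    """Return True if value matches pattern, where '*' matches any run of characters."""
    # Escape the literal parts so only '*' acts as a wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


print(wildcard_match("Cl*ssN*me", "ClassName"))            # True
print(wildcard_match("ListViewItem-*", "ListViewItem-1"))  # True
print(wildcard_match("ListViewItem-*", "TreeItem-1"))      # False
```

Escaping the literal parts prevents characters such as dots or brackets in a class name from being read as pattern syntax.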
XPath Consumption
Login Enterprise as a whole retains compatibility with both the old X-Ray XPath and the Script Recorder XPath. The only difference is that Script Recorder scripts use different methods than legacy Login Enterprise scripts, so X-Ray and Script Recorder can be used independently of each other.
In addition, there are a few logical changes between the old methods and the new methods.
Old Methods | New Methods | Description |
FindControl, FindControlWithXpath | FindControlWithXPathName | Find individual elements and controls |
Start | StartApplication | Launch an application with certain pre-conditions |
FindWindow | FindWindowByClassAndName | Find a window according to given pre-conditions |
The Engine replays new XPath scripts by following the full hierarchy in an XPath string until it reaches the target element. Remember, the XPath is like a set of breadcrumbs that lead to the target element. Here’s an example of an XPath to the Open dialog in Notepad, split into 4 sections.
The Engine follows the breadcrumbs, starting from the desktop. At each step, it lists the direct children of the current node and compares the metadata from the XPath section with the metadata of each child. The Engine prioritizes, in order:
- Children with a matching AutomationId, as this ID is guaranteed to be unique among siblings.
- Children with a matching Name and Class Name.
The Engine will take the selected Element and then fetch its children, all the way until we get the target element (i.e. section 4 from the XPath above). There are a few fallback mechanisms as it is often the case that UI hierarchies are complicated and messy:
- What happens if there are multiple matches? The Engine breaks the tie based on the Position of the element in the results of the UI Automation Library’s FindAll method. Recall that the XPath stores a position as one of its parameters, and this is the same position the element has when calling FindAll. To learn more, see Script Recorder and Automation Elements.
- What happens if the Engine cannot find an element? This often happens with dynamic behavior, where a certain interaction has to take place before an element becomes visible. As a fallback, the Engine dumps all descendants (that is, children of children) of the root element (not the desktop, but at most the target application’s root). It then searches through this list of possible candidates and collects all the descendants that match.
  - What happens if many descendants match in this case? The Engine attempts the position tie-breaker described in the first point. However, it will most likely keep all viable candidates and traverse multiple trees concurrently to find the expected children. Eventually, only the expected tree should remain, since it is unlikely for two trees to have the same XPath structure with the same ancestors. Trees naturally drop out as soon as differences emerge.
    - If two or more trees are exactly identical, the search is considered failed and deemed unreliable. An error is thrown in the Script Recorder.
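The core of the replay logic (AutomationId first, then Name and Class Name, then the position tie-breaker) can be sketched as below. This is a simplified Python illustration under stated assumptions, not the Engine's code: `Node`, `Section`, `pick_child`, and `resolve` are hypothetical names, Position is assumed to be the index among all siblings, and the descendant-dump fallback with concurrent tree traversal is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a live UI element with its children."""
    name: str = ""
    class_name: str = ""
    automation_id: str = ""
    children: List["Node"] = field(default_factory=list)


@dataclass
class Section:
    """Hypothetical parsed form of one XPath section."""
    name: str = ""
    class_name: str = ""
    automation_id: str = ""
    position: int = 0


def pick_child(parent: Node, section: Section) -> Optional[Node]:
    """Pick the direct child of parent that best matches one XPath section."""
    indexed = list(enumerate(parent.children))
    # First preference: a single child with a matching AutomationId.
    if section.automation_id:
        by_id = [c for _, c in indexed if c.automation_id == section.automation_id]
        if len(by_id) == 1:
            return by_id[0]
    # Second preference: matching Name and Class Name.
    matches = [(i, c) for i, c in indexed
               if c.name == section.name and c.class_name == section.class_name]
    if len(matches) == 1:
        return matches[0][1]
    # Tie-breaker: among several matches, take the one at the recorded position.
    for index, child in matches:
        if index == section.position:
            return child
    return None  # not found, or still ambiguous


def resolve(root: Node, sections: List[Section]) -> Optional[Node]:
    """Follow the breadcrumbs section by section, starting at root."""
    node: Optional[Node] = root
    for section in sections:
        node = pick_child(node, section)
        if node is None:
            return None
    return node


# Made-up tree with two identical OK buttons under one window.
ok_first = Node(name="OK", class_name="Button")
ok_second = Node(name="OK", class_name="Button")
window = Node(name="Untitled - Notepad", class_name="Notepad",
              automation_id="main", children=[ok_first, ok_second])
desktop = Node(name="Desktop", children=[window])

path = [Section(automation_id="main"),
        Section(name="OK", class_name="Button", position=1)]
print(resolve(desktop, path) is ok_second)  # True
```

In the sample tree the two buttons are indistinguishable by Name and Class Name, so only the recorded position selects the second one.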
New START methodology
Starting with Windows 11, it became apparent that various default applications have a completely different process and window structure, which makes it difficult for the Engine or the Recorder to identify them. Examples are Calculator and Notepad, especially when they are launched through environment variables, that is, as Calc.exe or Notepad.exe. Many of these default applications start a process that launches its windows under a separate, unrelated process. For example, Calc.exe launches a CalculatorApp.exe process, which closes after it creates its windows under the ApplicationFrameHost process. This process is unrelated to the Calculator app, and multiple instances of the calculator can live under the same process.
The implication of this architecture for the Script Recorder is that, if the recorder latches onto this temporary “launcher” process, it reports that the target window no longer exists, because the process terminates after it creates the windows it needs. To overcome this limitation, the end user assists the recorder by clicking on the target window before performing recording operations.
After doing so, recording can take place as normal. As a rule of thumb, you can verify that the application has been identified by checking that the Task list records the intended actions. When the script is generated, it includes a StartApplication command so that the Engine can identify the target window.
For example:
StartApplication(mainWindowClass: "Window:ApplicationFrameWindow", mainWindowTitle: "Calculator");
This StartApplication command is meant to replace any pre-existing START commands already in the script.
Handling multi-process and multi-window scenarios
Script Recorder manages scenarios involving multiple processes or windows.
Before Login Enterprise 5.14, the Script Recorder would struggle with scenarios in which an Application triggered the launch of a separate process or process window. For example, this includes Applications that start with a separate login window before launching the main Application, as well as scenarios where Applications launch an updater before starting the main window.
To mitigate this, the Script Recorder now keeps track of processes started from the target window, including their child processes, creating a clear chain of process hierarchy. To achieve this, it periodically polls the process list to identify any processes that have the target Application's process ID as a parent.
- Note 1: To establish this chain of process hierarchy, the Script Recorder relies on performance counters.
- Note 2: Since we use a polling mechanism, it may take some time for the Script Recorder to detect the desired window. During recording, you might notice a slight delay before the new window begins to blink as you transition between windows.
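The process-chain tracking described above can be illustrated with a small Python sketch. It assumes that each poll yields a snapshot of (process ID, parent process ID) pairs. How the Script Recorder actually obtains that list (it relies on performance counters) is outside this sketch, and the function name `descendants` and the sample PIDs are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def descendants(target_pid: int, snapshot: Iterable[Tuple[int, int]]) -> Set[int]:
    """Return the PIDs of all processes descended from target_pid.

    snapshot holds (pid, parent_pid) pairs from one poll of the process list.
    """
    # Index the snapshot by parent so children can be looked up directly.
    children: Dict[int, List[int]] = {}
    for pid, parent_pid in snapshot:
        children.setdefault(parent_pid, []).append(pid)

    found: Set[int] = set()
    stack = [target_pid]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            # Skip PIDs already seen so odd parent links cannot loop forever.
            if child not in found and child != target_pid:
                found.add(child)
                stack.append(child)
    return found


# Made-up snapshot: 100 is the target, 200 its child, 300 its grandchild.
snapshot = [(100, 1), (200, 100), (300, 200), (400, 1)]
print(sorted(descendants(100, snapshot)))  # [200, 300]
```

Running this on every poll and comparing the result with the previous one would reveal newly started child processes, which is consistent with the short detection delay mentioned in Note 2.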