Intended audience: IDAPython developers who enjoy the occasional headache, leaky abstraction enthusiasts, or simply the curious.
IDAPython wraps C++ types, and the lifecycle of a C++ object (in particular, a member of a larger object) is not necessarily the same as that of the Python wrapper object that wraps it.
One of our users reported that IDA crashed when running an IDAPython script of theirs. The user came up with a very simple way to reproduce the issue (thank you!), showing that this had to do with accessing the `parents` member of a `ctree_visitor_t` instance.
Here is (an even more simplified version of) the script the user sent us:
    from ida_hexrays import *

    my_parents = None

    class my_visitor_t(ctree_visitor_t):
        def __init__(self, func):
            # CV_PARENTS asks the visitor to maintain the 'parents' vector
            ctree_visitor_t.__init__(self, CV_PARENTS)

        def visit_expr(self, i):
            global my_parents
            if self.parents is not None:
                my_parents = self.parents
            return 0

    def my_cb(event, *args):
        if event == hxe_print_func:
            f = args[0]  # the function being printed
            my_visitor_t(f).apply_to(f.body, None)
            import gc
            gc.collect()
            my_parents.front() # will crash
        return 0

    install_hexrays_callback(my_cb)
Note: I threw a `gc.collect()` in there to make crashes more likely.
The script above is provided in its entirety for the sake of completeness, but really the important lines are only the following:
    def visit_expr(self, i):
        global my_parents
        if self.parents is not None:
            my_parents = self.parents
    (...)
    my_visitor_t(f).apply_to(f.body, None)
    my_parents.front() # will crash
Details, details, details
Since this issue is non-trivial, I’ll try and provide a step-by-step explanation, hopefully as clear as can be, by annotating the important lines of code mentioned above:
my_visitor_t(f).apply_to(f.body, None)
This line creates a `my_visitor_t` instance. That is a subclass of the `ctree_visitor_t` type, which means it eventually extends a C++ object of type `ctree_visitor_t`.
When the underlying C++ `ctree_visitor_t` object is created, its member named `parents` (a `ctree_items_t` vector) is initialized. For the sake of the example, let’s say the C++ `ctree_visitor_t` instance is located at memory `0x1000` and the `parents` member is placed at memory `0x100C`.
The line then calls `ctree_visitor_t::apply_to`. Thanks to SWIG “magic”, C++ virtual method calls will be properly redirected and our `my_visitor_t.visit_expr` method will be called for each `cexpr_t` in the tree, as expected.
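To make the redirection concrete, here is a toy, pure-Python model of it. It does not use the real Hex-Rays API: `ToyVisitor`, `RecordingVisitor` and the tuple-based tree are hypothetical stand-ins, and the real dispatch crosses the C++/Python boundary through SWIG rather than ordinary Python inheritance. The sketch only shows the observable effect: the base class drives the traversal, and the subclass override is called for every node while `parents` is kept up to date.

```python
class ToyVisitor:
    """Toy stand-in for ctree_visitor_t (hypothetical, not the real API)."""

    def __init__(self):
        self.parents = []  # ancestors of the node currently being visited

    def visit_expr(self, name):
        return 0  # default: keep going

    def apply_to(self, node):
        name, children = node
        result = self.visit_expr(name)  # dispatches to the subclass override
        if result != 0:
            return result
        self.parents.append(name)
        for child in children:
            result = self.apply_to(child)
            if result != 0:
                break
        self.parents.pop()
        return result


class RecordingVisitor(ToyVisitor):
    def __init__(self):
        super().__init__()
        self.seen = []

    def visit_expr(self, name):
        # Record each node together with a snapshot of its ancestors.
        self.seen.append((name, list(self.parents)))
        return 0


tree = ("call", [("arg0", []), ("arg1", [("cast", [])])])
visitor = RecordingVisitor()
visitor.apply_to(tree)
for name, ancestors in visitor.seen:
    print(name, ancestors)
```

Note that the override stores `list(self.parents)`, a snapshot, rather than `self.parents` itself; that distinction is what the rest of this post is about.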
if self.parents is not None:
This line accesses `self.parents`. This will create a Python wrapper object. The key here is to understand that it’s a wrapper object which is backed by the real C++ `ctree_items_t` vector.
For example, any access to the object returned by `self.parents` will in fact translate to an access into the C++ `ctree_items_t` vector, so if one were to write, e.g., `self.parents.size()` (or even `len(self.parents)`), it’s actually the real underlying C++ `size()` method that will end up being called.
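The wrapper/backing-object relationship can be sketched in plain Python. Everything below is a hypothetical model (`BackingVector`, `VectorWrapper` and `Visitor` are invented names, and SWIG's actual wrappers hold a raw C++ pointer rather than a Python reference); it only illustrates that the wrapper stores no items of its own, and that each attribute access hands out a fresh wrapper.

```python
class BackingVector:
    """Stand-in for the real C++ ctree_items_t vector (hypothetical model)."""

    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)


class VectorWrapper:
    """Stand-in for the Python wrapper: it only forwards calls."""

    def __init__(self, backing):
        self._backing = backing

    def size(self):
        return self._backing.size()  # forwarded to the backing object

    def __len__(self):
        return self.size()


class Visitor:
    def __init__(self):
        self._parents = BackingVector(["expr_a", "expr_b"])

    @property
    def parents(self):
        # A new wrapper is created on every access.
        return VectorWrapper(self._parents)


visitor = Visitor()
first = visitor.parents
second = visitor.parents
print(first is second)            # two distinct wrapper objects
print(len(first), second.size())  # both answer from the same backing vector
```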
my_parents = self.parents
This is another access to `self.parents`, so another Python wrapper will be created (once again backed by the actual C++ `ctree_items_t` vector).
[Note: the fact that another wrapper is created is not a problem (in fact, since it went out of scope, the previous wrapper might already have been garbage collected!)]
Once again, for the sake of the example, let’s say the wrapping `PyObject` instance is placed in memory at `0xB000`.
That wrapper is then bound to the global variable `my_parents`, causing its Python refcount to increase to 2. Past that line, the refcount will drop back to 1 (again, because of scope logic), which means that the Python wrapper object will remain alive.
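The scope logic can be observed without IDA. The sketch below assumes CPython's reference counting and uses hypothetical names (`Wrapper`, `kept`, `visit`); a dictionary entry plays the role of the `my_parents` global. It uses weak references to check which objects survive once the function returns, instead of printing raw refcount values, which vary between Python versions.

```python
import gc
import weakref


class Wrapper:
    """Placeholder for any Python wrapper object (hypothetical)."""


kept = {}  # plays the role of the module-level 'my_parents' binding


def visit():
    temporary = Wrapper()      # referenced only by a local name
    wrapper = Wrapper()
    kept["parents"] = wrapper  # now referenced by the local AND by 'kept'
    return weakref.ref(temporary), weakref.ref(wrapper)


temporary_ref, kept_ref = visit()
gc.collect()

print(temporary_ref() is None)           # the unreferenced wrapper is gone
print(kept_ref() is kept["parents"])     # the bound wrapper is still alive
```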
[...apply_to() returns, and we are now back to the `my_cb` function...]
At this point, it’s likely `my_visitor_t(f)` has just been garbage collected since nobody keeps a reference to it.
- the `my_visitor_t` instance has been destroyed, which means
  - the underlying `ctree_visitor_t` C++ object located at memory `0x1000` has been deleted, which in turn means
    - the `parents` object, which was located at memory `0x100C`, is now invalid
my_parents.front() # will crash
We are now calling `front()` on the `my_parents` Python object. If you recall, that `my_parents` object is a Python wrapper object located in memory at `0xB000`. That wrapper object still has a refcount of (at least) 1, and is thus alive.
What is not quite alive anymore, however, is the actual C++ `ctree_items_t` vector, which was deleted as part of deleting the C++ `ctree_visitor_t` it belonged to.
In other words, we have a perfectly valid Python wrapper object that has a dangling pointer to a member of a freshly-deleted C++ object.
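The whole sequence can be replayed with a toy model of native memory. This is a simulation under stated assumptions, not how IDA or SWIG are implemented: `FakeHeap`, `FakeVisitor` and `ParentsWrapper` are hypothetical, the addresses are the illustrative ones used in this post, and destruction is triggered explicitly instead of by the garbage collector. A real dangling access reads freed memory and may crash; the toy raises a clean exception instead.

```python
class FakeHeap:
    """Toy model of native memory: address -> stored value."""

    def __init__(self):
        self._cells = {}

    def place(self, address, value):
        self._cells[address] = value

    def free(self, address):
        del self._cells[address]

    def read(self, address):
        if address not in self._cells:
            raise RuntimeError(f"dangling pointer: 0x{address:X} was freed")
        return self._cells[address]


heap = FakeHeap()


class ParentsWrapper:
    """Toy wrapper: it remembers an address, not the data."""

    def __init__(self, address):
        self.address = address

    def front(self):
        return heap.read(self.address)[0]


class FakeVisitor:
    ADDRESS = 0x1000
    PARENTS_ADDRESS = 0x100C

    def __init__(self):
        heap.place(self.ADDRESS, "ctree_visitor_t")
        heap.place(self.PARENTS_ADDRESS, ["item0", "item1"])

    @property
    def parents(self):
        return ParentsWrapper(self.PARENTS_ADDRESS)

    def destroy(self):
        # Deleting the owning object also destroys its members.
        heap.free(self.PARENTS_ADDRESS)
        heap.free(self.ADDRESS)


visitor = FakeVisitor()
my_parents = visitor.parents
first_item = my_parents.front()  # fine: the visitor is still alive
visitor.destroy()                # what happens once the visitor is collected

error_message = None
try:
    my_parents.front()           # the wrapper is alive, its target is not
except RuntimeError as exc:
    error_message = str(exc)

print(first_item)
print(error_message)
```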
The solution is, in terms of effort, rather simple: make a copy of the vector:
    - my_parents = self.parents
    + my_parents = ctree_items_t(self.parents)
Since this copy doesn’t belong to the C++ `ctree_visitor_t` object, it won’t be trashed when that object is deleted.
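The difference between borrowing and copying can be sketched in plain Python as well. The names below (`NativeVector`, `VectorRef`) are hypothetical, and the model assumes that constructing a vector wrapper from another one performs a copy into storage owned by the new wrapper, as the fix above implies.

```python
class NativeVector:
    """Toy stand-in for a C++ vector whose storage can be destroyed."""

    def __init__(self, items):
        self.items = list(items)
        self.alive = True

    def destroy(self):
        self.items = None
        self.alive = False


class VectorRef:
    """Toy wrapper: borrows a NativeVector, or copies another wrapper."""

    def __init__(self, source):
        if isinstance(source, VectorRef):
            # Copy construction: new storage owned by this wrapper.
            self._native = NativeVector(source._native.items)
        else:
            # Plain wrapping: the storage belongs to someone else.
            self._native = source

    def front(self):
        if not self._native.alive:
            raise RuntimeError("backing vector was destroyed")
        return self._native.items[0]


member = NativeVector(["item0", "item1"])  # owned by the visitor
borrowed = VectorRef(member)               # like: my_parents = self.parents
copied = VectorRef(borrowed)               # like: ctree_items_t(self.parents)

member.destroy()                           # the visitor has been deleted

borrowed_failed = False
try:
    borrowed.front()
except RuntimeError:
    borrowed_failed = True

print(copied.front())    # the copy is unaffected
print(borrowed_failed)   # the borrowed wrapper is dangling
```

The copy costs one vector duplication per visit, which is usually negligible compared with a crash.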