[uClinux-dev] fix malloc whole page for small allocations?
Steve deRosier
2013-04-12 14:34:19 UTC
Hi all,

Every time our application mallocs any small number of bytes (<8k),
the device seems to malloc an entire page (8k). Does anyone know why
this happens, and how I can fix it?

Full details:

This same application ran fine on a Linux 2.4.x (uClinux) platform,
but due to various reasons we've had to upgrade the platform to 3.3.0
(still uClinux). Coldfire mcf5235, 4 MB flash, 32 MB RAM. We're using
uClibc.

At the time we start the application, there is ~20 MB free, it runs
for 30+ seconds and then the OOM killer kills it. On the 2.4.x
platform, it has a high-water-mark of about 2 MB, and runs in the
steady-state at about 1.5 MB.

After much investigation (eventually culminating in a debug version of
malloc printing each allocation request, then sleeping for 2 seconds,
where I cat /proc/meminfo), we finally noticed that each allocation,
typically 10-40 bytes, reduces meminfo's MemFree by 8k.
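
A minimal sketch of that kind of measurement, assuming a Linux-style /proc/meminfo with a "MemFree:" line. The helper names read_memfree_kb and traced_malloc are hypothetical and are not the debug malloc actually used here. MemFree is system-wide, so other processes can move the number between the two reads.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Return the MemFree value from /proc/meminfo in kB, or -1 on failure. */
long read_memfree_kb(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[128];
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* kb is only written when the line really is the MemFree line. */
        if (sscanf(line, "MemFree: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}

/* Hypothetical wrapper: allocate and log how MemFree moved around the call. */
void *traced_malloc(size_t size)
{
    long before, after;
    void *p;

    assert(size > 0); /* malloc(0) is implementation-defined, so skip it */
    before = read_memfree_kb();
    p = malloc(size);
    after = read_memfree_kb();
    fprintf(stderr, "malloc(%lu) = %p, MemFree %ld kB -> %ld kB\n",
            (unsigned long)size, p, before, after);
    return p;
}
```

Calling traced_malloc(16) in a loop and watching the logged delta is one way to see whether each small request costs a whole page.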

Other than the underlying kernel and wholesale upgrade of the uClinux
base, nothing has really changed on the system. We're still selecting
the same packages and configuration as near as possible.

Thanks,
- Steve
Lennart Sorensen
2013-04-12 15:00:27 UTC
Post by Steve deRosier
Hi all,
Every time our application mallocs any small number of bytes (<8k),
the device seems to malloc an entire page (8k). Does anyone know why
this happens, and how I can fix it?
Imagine the amount of fragmentation you would get if different
applications could all be given allocations from one page. That would
be a mess. Now the libc could be smart and maintain a heap for you that
it does small allocations from and doesn't give back when you do a free.
I think glibc does that, but I have never checked how uClibc does it,
given I have never done such allocations in uClinux code.

Certainly looking at the code, there is malloc_from_heap used by malloc,
so it sure looks like uClibc does the sensible thing. In that case small
allocations shouldn't be a problem, unless you allocate a bunch, keep
a few and free the rest, repeat a lot, causing enormous fragmentation
of the heap. If that is the case, perhaps your algorithm needs a major
rethink.
Post by Steve deRosier
This same application ran fine on a Linux 2.4.x (uClinux) platform,
but due to various reasons we've had to upgrade the platform to 3.3.0
(still uClinux). Coldfire mcf5235, 4 MB flash, 32 MB RAM. We're using
uClibc.
At the time we start the application, there is ~20 MB free, it runs
for 30+ seconds and then the OOM killer kills it. On the 2.4.x
platform, it has a high-water-mark of about 2 MB, and runs in the
steady-state at about 1.5 MB.
After much investigation (eventually culminating in a debug version of
malloc printing each allocation request, then sleeping for 2 seconds,
where I cat /proc/meminfo), we finally noticed that each allocation,
typically 10-40 bytes, reduces meminfo's MemFree by 8k.
Other than the underlying kernel and wholesale upgrade of the uClinux
base, nothing has really changed on the system. We're still selecting
the same packages and configuration as near as possible.
So it used to work and now it doesn't?

Might be worth listing the version of uClibc and other bits you are using,
in addition to the kernel version.
--
Len Sorensen
Steve deRosier
2013-04-12 15:42:08 UTC
On Fri, Apr 12, 2013 at 8:00 AM, Lennart Sorensen
Post by Lennart Sorensen
Post by Steve deRosier
Hi all,
Every time our application mallocs any small number of bytes (<8k),
the device seems to malloc an entire page (8k). Does anyone know why
this happens, and how I can fix it?
Imagine the amount of fragmentation you would get if different
applications could all be given allocations from one page. That would
be a mess. Now the libc could be smart and maintain a heap for you that
Well, yes, I do actually understand memory allocation in general and I
do get that it would be an issue for the kernel to give different
applications allocations from a single page. That's not what I'm
concerned about.

It's that every single allocation, no matter how many bytes, results in
a full page being allocated by the kernel. By my way of thinking, the
C library should be managing the heap and only getting a new page when
it needs one, not a single page each time.

Very importantly, it all worked as expected on the earlier uClinux
platform but isn't working correctly now.
Post by Lennart Sorensen
so it sure looks like uClibc does the sensible thing. In that case small
allocations shouldn't be a problem, unless you allocate a bunch, keep
a few and free the rest, repeat a lot, causing enormous fragmentation
of the heap. If that is the case, perhaps your algorithm needs a major
rethink.
No, there shouldn't be lots of fragmentation in the way the algo
works. It does a bunch of allocations near the beginning of the
program to load up various working data sets. It does do some frees,
but in general the allocations and frees related to those are in
chunks and ordered correctly to avoid fragmentation. Profiling it on a
normal desktop Linux shows pretty good behavior with less than 1% heap
fragmentation. I happen to agree that the program in question needs a
major rethink, but that's neither germane to the question nor within
the scope of my contract with the client.
Post by Lennart Sorensen
Post by Steve deRosier
This same application ran fine on a Linux 2.4.x (uClinux) platform,
but due to various reasons we've had to upgrade the platform to 3.3.0
(still uClinux). Coldfire mcf5235, 4 MB flash, 32 MB RAM. We're using
uClibc.
So it used to work and now it doesn't?
Yup.
Post by Lennart Sorensen
Might be worth listing the version of uClibc and other bits you are using,
in addition to the kernel version.
Linux 3.3.0 on a Coldfire 5235 platform. uClinux was from the
20120401 distribution. From digging into the uClibc directory, the
Changelog states: "0.9.27 12 January 2005". Which matches the old
version. So, either there's been an upgrade of uClibc and the
changelog is abandoned, or uClinux hasn't updated that package in a
little while. Since I can't confirm that our uClibc config is the same
between the old and new platforms, I'll assume our config of uClibc is
different.

Thanks,
- Steve
Anthony Best
2013-04-12 16:27:00 UTC
Hi,

I am with the client Steve is working for.
Post by Lennart Sorensen
Might be worth listing the version of uClibc and other bits you are using,
in addition to the kernel version.
The version of uClinux we are currently running is from the tar ball uClinux-dist-20070130.tar.gz

Linux version 2.4.32-uc0
Using power of 2 allocator

Previously we were using a 2003 distribution of uClinux on the M5282 with the same code.
--
Anthony Best
Steve deRosier
2013-04-16 15:40:26 UTC
Hi guys. Looks like we figured out what was going on.

uClibc has two different malloc implementations for our platform. The
simple malloc, the default, which we were using, simply goes out and
grabs a chunk of RAM via an mmap operation whenever you malloc anything.
The standard malloc manages the heap and only mmaps new RAM when it
runs out.
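
To make the difference concrete, here is a rough sketch of the two strategies. It is not uClibc's code: simple_alloc, arena_alloc and ARENA_BYTES are hypothetical names, the arena never grows or frees, and anonymous mmap as found on a desktop Linux is assumed.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Strategy 1 (hypothetical): one kernel mapping per request, however small. */
void *simple_alloc(size_t size)
{
    void *p;

    assert(size > 0); /* mmap rejects zero-length mappings */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Strategy 2 (hypothetical): map one chunk, then carve requests out of it. */
#define ARENA_BYTES ((size_t)64 * 1024)
static char *arena_base;
static size_t arena_used;

void *arena_alloc(size_t size)
{
    size_t aligned = (size + 7) & ~(size_t)7; /* keep 8-byte alignment */
    void *out;

    if (arena_base == NULL) {
        arena_base = simple_alloc(ARENA_BYTES);
        if (arena_base == NULL)
            return NULL;
    }
    if (aligned > ARENA_BYTES - arena_used)
        return NULL; /* sketch only: a real heap would map more memory here */
    out = arena_base + arena_used;
    arena_used += aligned;
    return out;
}
```

Two 16-byte requests through simple_alloc land in two separate mappings, while the same two requests through arena_alloc sit 16 bytes apart inside a single mapping.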

Seems the old 2.4.3x version of Linux we were using managed mmap
allocations via power-of-two buckets, giving the smallest bucket for
each request. I don't know if that was standard back then for our
platform or if that was something that they added for the product.
The upgraded Linux 3.3 allocator simply gives an 8k page on each mmap
request.
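
As a rough illustration of the two rounding policies, assuming a power-of-two bucket scheme with a hypothetical minimum bucket size (the real minimum in the old kernel is not stated here) and an 8k page:

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power-of-two bucket that holds `request`.
 * min_bucket is a hypothetical parameter, assumed to be a power of two. */
size_t bucket_size(size_t request, size_t min_bucket)
{
    size_t bucket = min_bucket;

    assert(min_bucket > 0 && request <= ((size_t)-1) / 2);
    while (bucket < request)
        bucket *= 2;
    return bucket;
}

/* Size actually consumed when every request is rounded up to whole pages. */
size_t page_rounded_size(size_t request, size_t page_size)
{
    assert(page_size > 0);
    return (request + page_size - 1) / page_size * page_size;
}
```

With a 16-byte minimum bucket, a 40-byte request costs 64 bytes under the bucket policy, while page_rounded_size(40, 8192) gives 8192.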

So, with the change to 8k page allocation combined with the simple
malloc from uClibc, no matter how little RAM we requested, we would
allocate a full 8k page for it. A 16-byte request resulted in an
8192-byte allocation. Hence, OOM in about 3 seconds.
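
A back-of-the-envelope check of that budget, taking the roughly 20 MB free at startup as exactly 20 MiB (an assumption) and using a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* How many requests fit in free_bytes when each one consumes cost_bytes. */
size_t requests_until_exhausted(size_t free_bytes, size_t cost_bytes)
{
    assert(cost_bytes > 0);
    return free_bytes / cost_bytes;
}
```

requests_until_exhausted(20 * 1024 * 1024, 8192) is 2560, so about 2560 small allocations are enough to use up the free memory, even though at 16 bytes each they hold only 40 KB of actual data.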

The fix was to change uClibc to use the full version of malloc. Now
it allocates properly and only consumes new 8k pages when it needs
them. I don't know why I had trouble getting that to build, but after a
few config file tweaks and some make cleans, I got it to build.

It was easy to figure out how to solve the problem once I discovered the
actual underlying behavior, but the initial problem had me scratching
my head for a week or so before I could understand where the problem
even was.

Thanks for your help. I thought the list would appreciate the fix
info in case someone else searches for an answer.

Thanks,
- Steve
