Page 8 of 9 < 1 2 3 4 5 6 7 8 9 >
#63515 - 21/07/10 01:21 AM Re: Arm Akimbo SDLMAME [Re: R. Belmont]
Flandry
Member

Registered: 01/12/09
Posts: 24
Originally Posted By: R. Belmont
There is no dynarec for rtype. Probably just that the startup code of the game incurs more reschedules due to testing timers and such.


Would that kind of MAME activity cause a lot of time to be spent in libgcc?

Quote:
I gather from ym2151 being expensive that the N900 doesn't have hardware floating point?


It does, and has both NEON and VFP units, but it seems that gcc doesn't support them well. I've just discovered this post on FP optimizations on the Pandora hardware (~N900) and am trying out some of them. Previously I was just specifying -mcpu=cortex-a8 and -mfpu=neon.
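One thing worth ruling out here, offered as an assumption to check rather than a diagnosis: with GCC on ARM, `-mfpu=neon` only takes effect when the float ABI is `softfp` or `hard`. If the toolchain defaults to `-mfloat-abi=soft`, float and double arithmetic is compiled into libgcc calls no matter what `-mfpu` says. The sketch below assumes GCC or Clang and their predefined ARM macros (`__SOFTFP__`, `__ARM_PCS_VFP`, `__ARM_NEON__`); the helper names `soft_float_build`, `neon_enabled` and `fp_config` are hypothetical. It reports which mode a translation unit was actually built in:

```c
#include <assert.h>

/* 1 if float/double arithmetic is compiled into libgcc calls (-mfloat-abi=soft). */
int soft_float_build(void)
{
#if defined(__arm__) && defined(__SOFTFP__)
    return 1;
#else
    return 0;
#endif
}

/* 1 if the compiler is allowed to emit NEON instructions. */
int neon_enabled(void)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

/* Human-readable description of the float ABI this file was compiled with. */
const char *fp_config(void)
{
    /* NEON code generation needs an FPU-capable float ABI, so a soft-float
     * build should never also report NEON; if it does, the macros were misread. */
    assert(!(soft_float_build() && neon_enabled()));
#if !defined(__arm__)
    return "not a 32-bit ARM target";
#elif defined(__SOFTFP__)
    return "soft: FP done by libgcc calls, -mfpu ignored";
#elif defined(__ARM_PCS_VFP)
    return "hard: VFP instructions, FP arguments in VFP registers";
#else
    return "softfp: VFP instructions, FP arguments in core registers";
#endif
}
```

Compiling this file with exactly the same flags as the MAME build, and printing `fp_config()`, shows whether VFP/NEON code generation is really enabled or whether everything is still going through libgcc.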

After reading that page, I think it may be worthwhile to try adding ARM ASM helper functions after all, but it's daunting with no ARM assembly experience.


Edited by Flandry (21/07/10 01:22 AM)

#63516 - 21/07/10 01:27 AM Re: Arm Akimbo SDLMAME [Re: Flandry]
R. Belmont
Senior Member

Registered: 17/03/01
Posts: 12492
Loc: USA
Sorry, I misread your post, I thought you had fingered some reschedule-related function and you didn't. I don't know what's spending time in libgcc without seeing better data, but on most platforms that means either 64-bit integer or floating point being emulated in software. MAME does make extensive use of both, which generally is only an issue on ARM targets nowadays. You may be better off sticking with MAME4ALL on that target.
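To make the two cases concrete, here is a sketch (hypothetical function names, not MAME code) of operations that typically turn into libgcc calls on a 32-bit ARM target such as the N900's Cortex-A8: 64-bit division, because ARMv7-A cores of that generation have no hardware divide instruction, and double-precision arithmetic when the build is soft-float. Compiling a file like this with `-S` and searching the assembly for `__aeabi_` calls is a quick way to see which helpers a given set of flags pulls in.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit unsigned division. Cortex-A8 has no integer divide instruction,
 * so on 32-bit ARM GCC emits a call to the EABI helper __aeabi_uldivmod. */
uint64_t cycles_to_ticks(uint64_t cycles, uint64_t divisor)
{
    assert(divisor != 0);
    return cycles / divisor;
}

/* Double-precision multiply-add. Under -mfloat-abi=soft this becomes calls
 * to __aeabi_dmul and __aeabi_dadd; under softfp or hard it is VFP code. */
double scale_sample(double sample, double gain, double offset)
{
    return sample * gain + offset;
}
```

If the profile shows helpers like `__aeabi_dadd` or `__aeabi_dmul`, the build is doing software floating point; if it is mostly `__aeabi_uldivmod` and friends, the cost is 64-bit integer division.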

#63517 - 21/07/10 04:23 AM Re: Arm Akimbo SDLMAME [Re: R. Belmont]
Flandry
Member

Registered: 01/12/09
Posts: 24
Thanks.

It's true that MAME4ALL performs great. My (possibly perverse) goal is to get modern MAME running well on ARM, and I think it can do a lot better with some optimizations.

To move forward, I need a bit of guidance, if you please. You mention software emulation of FP. I'm still using osd/miniwork.c (NOASM) and am trying to see where optimizations might be made. Assuming a minimal core work function is the starting point, where should I be looking?

#63518 - 21/07/10 04:39 AM Re: Arm Akimbo SDLMAME [Re: Flandry]
R. Belmont
Senior Member

Registered: 17/03/01
Posts: 12492
Loc: USA
I'd love to see modern MAME running well, and I agree it can, but I don't have anything near the data I need. Valgrind's Callgrind profiler can give you a complete call graph for hotspots; that's pretty much necessary to understand why libgcc is eating all the CPU time.

#63521 - 21/07/10 03:41 PM Re: Arm Akimbo SDLMAME [Re: R. Belmont]
mangamuscle
Senior Member

Registered: 17/02/03
Posts: 160
Loc: Mexico
Just my 2¢, but wouldn't a machine like the new Toshiba AC100 be more suitable hardware for porting MAME to ARM? I hear it is going to be priced around US$500, so it is not too expensive.

#63522 - 21/07/10 04:20 PM Re: Arm Akimbo SDLMAME [Re: mangamuscle]
R. Belmont
Senior Member

Registered: 17/03/01
Posts: 12492
Loc: USA
Sure, but it should be possible to get at least the classics to work well on N900/BeagleBoard/Pandora level hardware. It may simply require waiting for a stable version of Clang 2.0. Apple's slides from WWDC indicate that Clang's generated code averages 2 to 5 times faster for ARM targets (it's much less flashy for x86/x64, proving once again that GCC for non-x86 targets can get pretty dire).


Edited by R. Belmont (21/07/10 04:22 PM)

#63584 - 25/07/10 11:54 AM Re: Arm Akimbo SDLMAME [Re: R. Belmont]
ldesnogu
Member

Registered: 23/07/06
Posts: 89
2 to 5 times faster?!? Your bullshit detector should have warned you :)

I've tested recent versions of gcc for ARM and they are pretty good. I'd be surprised to see Clang beat them, except in one respect: IIRC Clang can use NEON floating-point instructions instead of the standard VFP ones. That would give a boost on Cortex-A8 (but not on Cortex-A9 parts such as Tegra 2), at the cost of IEEE-754 compliance. So twice as fast on carefully chosen small FP loops, yes; on real programs even 10% would be nice. Even armcc isn't 10% faster than gcc...
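One visible part of that lost IEEE-754 compliance is denormal handling: on ARMv7, NEON floating-point is single-precision only and always runs in flush-to-zero mode, so results that IEEE arithmetic would keep as tiny subnormal values become exactly zero. A small illustration using only standard C99 `<float.h>` and `<math.h>`; the helper names are hypothetical:

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Half of FLT_MIN (the smallest normal float) is a subnormal value.
 * IEEE-754 arithmetic, such as VFP in its default mode, keeps it;
 * ARMv7 NEON always flushes subnormals to zero, so there it would be 0. */
float half_of_smallest_normal(void)
{
    volatile float x = FLT_MIN;  /* volatile: stop the compiler folding this at compile time */
    return x / 2.0f;
}

/* 1 if x is subnormal (denormal), 0 otherwise. */
int is_subnormal(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}
```

Code that never depends on values this small loses nothing from flush-to-zero, which is why the NEON speedup can be worth it for some workloads and unacceptable for others.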

Back to Flandry's issue: I don't think the Nokia SDK would rely on FP emulation. We'd need real profiling of MAME to see what's happening...

#63585 - 25/07/10 12:46 PM Re: Arm Akimbo SDLMAME [Re: ldesnogu]
Vas Crabb
Senior Member

Registered: 08/02/04
Posts: 1257
Loc: Melbourne, Australia
I'd believe it - SunPRO gets 2 to 3 times the performance of GCC on SPARC because GCC's codegen is atrocious.

#63587 - 25/07/10 02:04 PM Re: Arm Akimbo SDLMAME [Re: R. Belmont]
couriersud
Senior Member

Registered: 19/02/07
Posts: 394
Originally Posted By: R. Belmont
Sure, but it should be possible to get at least the classics to work well on N900/BeagleBoard/Pandora level hardware.

Some of the classics use discrete sound emulation, which is 100% floating point. It does not, however, rely on IEEE compliance.
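As an illustration of why this kind of code tolerates non-IEEE arithmetic: a discrete stage is typically a small difference equation evaluated once per output sample, and a decaying output that flushes to 0.0 instead of a denormal is far below anything audible. The sketch below is a generic one-pole RC low-pass filter, not MAME's discrete sound API; the type and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical one-pole RC low-pass stage. */
typedef struct {
    float alpha;  /* fraction of the remaining gap closed per sample */
    float state;  /* current output: voltage across the capacitor */
} rc_lowpass;

void rc_lowpass_init(rc_lowpass *f, float r_ohms, float c_farads, float sample_rate)
{
    assert(r_ohms > 0.0f && c_farads > 0.0f && sample_rate > 0.0f);
    float dt = 1.0f / sample_rate;   /* seconds per output sample */
    float rc = r_ohms * c_farads;    /* time constant in seconds */
    f->alpha = dt / (rc + dt);
    f->state = 0.0f;
}

/* Advance one sample: move the output a fraction alpha toward the input. */
float rc_lowpass_step(rc_lowpass *f, float input)
{
    f->state += f->alpha * (input - f->state);
    return f->state;
}
```

For example, with R = 1 kΩ, C = 1 µF and a 48 kHz output rate, the time constant is 1 ms and each step moves the output about 2% of the way toward the input. Precision at the denormal level never matters for a result like this.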

#63588 - 25/07/10 03:40 PM Re: Arm Akimbo SDLMAME [Re: Vas Crabb]
ldesnogu
Member

Registered: 23/07/06
Posts: 89
Originally Posted By: Vas Crabb
I'd believe it - SunPRO gets 2 to 3 times the performance of GCC on SPARC because GCC's codegen is atrocious.

I'm sorry, but I've never found armcc (ARM Ltd's own compiler) that much faster than gcc, except for some *very* specific things (e.g., detecting widening multiplications). So I won't believe it... until proven wrong :)
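For reference, a widening multiplication is a 32x32->64-bit product. In C it is usually written by widening the operands rather than the result (`(int64_t)(a * b)` would overflow in 32 bits first), which gives the compiler the chance to emit a single SMULL on ARM instead of a full 64x64 multiply sequence; whether a given compiler spots the pattern is the kind of difference described above. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* 32x32->64 signed multiply: cast the operands, not the product, so the
 * full 64-bit result is kept and ARM compilers can map it to SMULL. */
int64_t widen_mul_s32(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Typical fixed-point use: multiply two 16.16 values, keep a 16.16 result.
 * Right-shifting a negative int64_t is arithmetic on GCC/Clang but
 * implementation-defined in ISO C. */
int32_t fixed_mul_16_16(int32_t a, int32_t b)
{
    return (int32_t)(widen_mul_s32(a, b) >> 16);
}
```

In 16.16 fixed point, 1.5 (0x18000) times 2.0 (0x20000) gives 3.0 (0x30000).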
