PendulumTech

May 1, 2010

Blogging Against Disablism Day 2010: Accessibility & Ubuntu

Filed under: Uncategorized — by Prydera @ 13:29

Today is Blogging Against Disablism Day and I urge everyone (especially if you’ve never heard of it before) to check out the blog posts being written.

I never planned on getting involved with accessibility on Ubuntu. I’m a wheelchair user, which means fighting for accessibility is almost a daily occurrence for me, whether it’s shops with steps, a lack of dropped curbs, or just the stares and questions of “what’s wrong with you”. Although my impairment is directly involved in why I started contributing to the Ubuntu community, that had nothing to do with Ubuntu accessibility. Instead, it was because even on those days when I can’t sit up, I can generally still get onto my laptop, and being involved with the community gives me mental stimulation and a way to do something when I can’t manage anything else.

However, when it came down to what I needed for accessibility on Ubuntu, I realised that I have no choice. I periodically can’t type. On MacOS (the other OS I use regularly), this means that I use a combination of MacSpeech Dictate voice recognition software and a program called Dasher, which is a mouse-controlled text input method (not an onscreen keyboard, but much more fluid and faster to use). When I started asking around for similar things on Ubuntu, I got good news and bad news.

The bad news was that no one seemed to have a working voice recognition set-up that didn’t involve Dragon NaturallySpeaking under Wine. I don’t want to pay for yet another bit of voice recognition software, and I don’t like running things under Wine if I can help it. There are a couple of open source voice recognition programs out there, but no one I’ve found has been able to tell me that they have a working set-up that would be usable, especially for someone not very technical.

The good news was that Dasher is open source, has a Linux port, and is in the Ubuntu universe repository. However, it came with some caveats. Some of the documentation in the program is just wrong. For example, while this is not in the official documentation for Dasher yet, the only way to use it to input directly into other programs on Ubuntu is to run it from the command line. And even doing that, I find it crashes after a few minutes. The other option is to launch it through the Applications menu; however, then you can only input into its own text window and have to copy & paste into whatever program you want the text in. I’m hopeful that this will change, as I filed a bug about it with Dasher and it sounded like other people were supportive of creating a GUI menu option for inputting directly into other programs.
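
(For anyone wanting to try the command-line route: on my set-up the direct-entry behaviour comes from starting Dasher from a terminal in its “direct” application style, something like dasher --appstyle=direct. Treat the exact option name as my best recollection rather than gospel; it may differ between versions, so check dasher --help on your own install first.)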

So in my experience, Ubuntu as an operating system, and open source software in general, has some things it needs to improve in terms of accessibility. This is why I’ve decided to put energy into getting the Accessibility Team going again. With multiple people working together to identify the main problems, and either finding solutions ourselves or advocating for others to implement them, it should be easier to get Ubuntu to where it should be: an operating system accessible to all.

All that said about the software, the Ubuntu community is one of the most accessible communities (to me) that I have worked with. I’ve already mentioned that one of the reasons I got involved was because it’s a community I can work with from bed. I’ve also found people in the community to be very accommodating. If I can’t manage to get something done because I’ve had a rough week and can’t handle typing, or the energy it takes to think, others always seem happy to step up. And my experience working with Marianna and Jorge in arranging my attendance at UDS-M in a week’s time has been fabulous. Not once has it been suggested that I’m asking for something that’s too hard or beyond reasonable. I’ve never had so little hassle with accessibility arrangements ever – including at school and at work. It’s the community that keeps me convinced that accessibility is worth fighting for within Ubuntu, rather than looking somewhere else.

We have lots of work to do to become a truly accessible operating system; however, I know we can get there.

18 Comments »

  1. I know it’s a strong word… but I absolutely hate having to cope with that “what’s wrong with you”…

    Keeping in mind that I have next to no experience in application development/programming, is there anything I can do to assist you or further your goals?

    Comment by Ewan Ha — May 1, 2010 @ 18:21

  2. It is great that you are involved with Ubuntu. I think the future lies in open source software, for the simple fact that there are so many people who cannot pay for the suite of products they need to compensate for their disabilities. I work a lot with developing countries, and open source is always the way to go. Unfortunately, in certain instances there isn’t much to recommend.

    Comment by Tom Babinszki — May 1, 2010 @ 18:24

  3. Dasher + festival will give you something that will speak – but speech recognition is significantly harder.
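
    (If you want to check festival on its own first, something like echo "hello world" | festival --tts should speak from the command line. That’s from memory, so double-check against festival’s docs.)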

    I would have suggested Matthew Garrett – but I think he’s moved on to other things.

    Looks like they’ve got Dasher working well with eye tracking, though 🙂 The project comes from one of the departments at Cambridge University – it might be worth contacting them.

    Googling for Dasher gives good results; the project is the first hit.

    Comment by Andy Cater — May 1, 2010 @ 19:03

  4. Hey! I, too, am a MacSpeech Dictate user. When asked about voice recognition et al., I usually say something like: although it is effective, it is a compromise — i.e. built around the OS and on top of the OS. I am not really a tech-oriented person, so I don’t know whether or not it is yet possible to use an OS that is natively designed for voice input. Do you think this is something that the open source community might be able to take on?

    Comment by wheelchairdancer — May 1, 2010 @ 19:06

  5. Hi,

    I read your post (whilst I should have been revising for my exams!) and found it really thought-provoking and inspiring. Thanks for opening my eyes a little more to the concept of disablism. It has helped me realise more fully the need for universally accessible software.

    Best regards,
    Andrew

    Comment by Andy — May 1, 2010 @ 19:14

  6. I have MS and I am working on a post saying something similar!
    Here are some links to my attempts to bring the speech-to-text community and the Linux developer community together (that is the problem– Linux developers develop what they use, and there are not a lot of Linux developers dependent on speech recognition software, because there is not a really viable speech-to-text solution for Linux, because there are not a lot of Linux developers dependent on speech recognition software, because… you get the idea). Also a link to the petition, in case you don’t yet have it.

    Anyway, links:
    Here’s the ubuntu forums discussion:
    http://ubuntuforums.org/showthread.php?t=1064210&page=3

    This one actually has some interim solutions that may help people who have trouble typing become developers, which is what we need:
    http://www.knowbrainer.com/PubForum/index.cfm?page=viewForumTopic&topicId=9731&listFull#50AEA86C-0C91-ACF7-11478496430thread49544

    And here’s the petition to get Dragon for Linux:
    http://www.petitiononline.com/dns4lin/petition.html

    Comment by culturegeek — May 2, 2010 @ 17:12

  7. First the bad news: there is no viable alternative to NaturallySpeaking. You will need something on the order of $30 or $40 million and about 10 years of effort to catch up with NaturallySpeaking where it is today. That is, assuming you don’t get trapped by the patent minefield.

    And the news gets worse: there is no competition, because there isn’t a large enough market to support any competitors. Most of the high-end knowledge about speech recognition exists within one company, not universities. Research is aimed at IVR (speaker-independent, small-vocabulary applications using poor-quality cell phone audio channels).

    Short of bribing Nuance, there is no way to get the level of support we need. They have no social conscience, because that takes money away from the bottom line.

    Alternative input devices just don’t work for me. I can process letters at a rate of about one per 15 seconds. It is faster to type and wait for the painkillers to take effect.

    So, fundamentally, it’s time to “suck it up”, work with what we have, and build around the crap from Nuance.

    You may wonder why I take such a harsh stance, harsh to the ideals of FOSS. The answer is simple. I’m in my 50s and I need to work now. I don’t have 10 years to wait. I can’t afford to live on handouts. I need functional speech recognition *now* that I can adapt to my needs *now*. Not 10 years from now, *now*.

    Handicap accessibility always trumps ideology.

    Comment by esj — May 2, 2010 @ 21:09

  8. Ok, I will admit that a lot of this was waaaay over my head (i.e. I had to look up what Ubuntu was before I kept reading), but the experience still seems relevant to me. There are so many unexpected places that those of us with disabilities have to figure out our own ways of making accessible, and I’m glad you knew enough about your program to be able to figure it out.

    Comment by NTE — May 3, 2010 @ 11:32

  9. Thanks for this post, PendulumTech (and culturegeek, thanks for the petition). My partner and I both have RSI– his is chronic and not responding to treatment– and have been dismayed by how limited the speech recognition software out there is, especially for those who don’t use Windows. He works on development within a non-Windows OS, and coding has been really frustrating (and is currently only possible because he works with a partner). If he had needed the accommodations when he was entering the field that he needs now, I’m not sure the company would have hired him.

    Comment by RachelB — May 3, 2010 @ 13:09

  10. I really hope that that was not in response to what I posted, because I said:
    1. That I know speech to text is a heavy lift
    2. the orphan drug effect, several times, in great detail.
    3. That I am not interested in hearing from anybody who is not interested in working on a short term and/or long term solution that works for people like me, who _can’t_ type a letter in 15 seconds.
    4. That an open source alternative to Nuance is absolutely not going to happen until and unless we find a short-term solution to make Linux accessible *now*,
    5. That therefore we need to look into hybrid solutions.
    6. That the most promising options in that area are
    A. hotkeys and
    B. Running Dragon in a virtual machine
    (1). That the most useful thing that a Dragon user can do at this point is to supply useful information (about known issues, the speech-to-noise ratios that you get with Dragon, etc.) to those who are working on that; I am working on putting that information together.
    (2) that most of what I’ve found is about wine, but a few other Windows emulators looked more promising.
    (3) that I am very interested in whether or not anybody has tried to run MacSpeech in a Mac emulator– it strikes me as more promising, given that the OS X kernel was, last I heard (about the time the whole thing was rolled out), a cousin of sorts to the Linux kernel. I never heard back about that from anyone.
    7. That accessibility trumps ideology for me all the time, because I can’t type a letter in 15 seconds, no matter how much I want to, and painkillers “just don’t work” for spasticity or tremors or loss of feeling or loss of proprioception– I am using Dragon to post this, which means I’m using a Windows box. I do not like Windows. I am not using it because I’m putting ideology above accessibility, I am using it because I need accessibility software.
    8. That Dragon and Windows is not a solution. Dragon has crashed more times than I can count while I made this post. Dragon is irreducibly big, and Windows is inexcusably bloated. While many of these problems are known issues with Dragon, even more are known issues with Windows.

    P.S. Regardless of whether or not you were responding to my post (without ever reading any of the information that I linked), your “suck it up” comment was way out of line, especially for a blogging _against_ disablism discussion.
    If you don’t care what happens in 10 years, that is your right, but I probably won’t be able to type in 10 years, and painkillers will not help. Excuse me for being so precious and impractical as to try to improve my prospects for having any kind of access to computers at that point.

    Comment by culturegeek — May 8, 2010 @ 5:41

  11. @culturegeek

    If you are crashing as often as you say, the problem is not NaturallySpeaking. The problem is not necessarily Windows. I typically run anywhere from 5 to 7 days on Windows 7 before I need to reboot or restart NaturallySpeaking. Your system instability leads me to wonder if you are running Windows 98 or Windows 2000. If you’re running XP, then your OS image is heavily corrupted. Run Windows 7 with NaturallySpeaking 10.1 and you will have a significantly better experience in terms of stability. USB microphones generally raise accuracy over sound card input. Also, test your memory; frequently, random crashes are a memory problem.

    If someone were to give us $1 million and say “build what you need”, I would look at what we have for a starting point, and that is NaturallySpeaking on two platforms. A natural reflex might be to see if we could bribe Nuance to produce a Linux version. If they did, it would probably be one of the worst things to happen to us, because we would think the problem is solved. It’s not.

    Speech recognition is not the problem holding us back.

    Windows is not the problem holding us back.

    There are many problems holding us back, and none of them are technology related. They are all people problems. First is that we have non-disabled people trying to design/specify/build accessibility software, and then they don’t live with it. I would love to duct-tape one of those developers to a chair and force them to use their HCI for a week or two.

    heh.

    In addition to this failure of imagination, we also have a failure to support imagination. We have no way to prototype potential interfaces and let good ideas show through. For example, I’ve developed a series of interfaces dealing with the programming problem, as well as a framework for moderately complicated but common tasks. How do I prototype them? How do I even animate the images I have in my mind so others can see them when I talk about the stages you go through in navigating a file hierarchy by voice? How do I show that manipulating code in a decision table is an easier way to create code by voice than traditional forms? I know I’m not the only bright person coming up with some of these ideas, but it’s damn near impossible to prototype.

    So back to the million-dollar gift. How would I approach the problem? I would not waste any time on the speech recognition platform because it’s a solved problem. I would spend all of that money, and the time it can buy, on making a cross-platform UI for speech. My goal would be to put speech recognition on a relatively fast netbook and then couple that recognition engine on the netbook to any other platform I had to use. I don’t care whether it’s a virtual machine or a physical machine. I should be able to put a client on that workspace machine, connected to my speech recognition machine, and start dictating away, with the results ending up on my workspace machine.
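
    To make the shape of that concrete, here is a toy sketch of the receiving client on the workspace machine. This is my own illustration, not an existing tool; the port number and the use of xdotool to replay keystrokes are assumptions for the sake of the example.

        # Toy sketch: receive recognized text from the speech machine
        # over TCP and replay it as keystrokes in the focused window.
        import socket
        import subprocess

        PORT = 8750  # hypothetical port, not any real tool's default

        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server.bind(("0.0.0.0", PORT))
        server.listen(1)
        conn, addr = server.accept()
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break  # the speech machine disconnected
            # Type the dictated text into whatever application has focus.
            subprocess.call(["xdotool", "type", chunk.decode("utf-8")])

    The hard part, of course, is everything this sketch ignores: correction, Select-and-Say, and not dropping characters.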

    This is not as difficult as you might think. It’s more difficult than dictating to a terminal session, but it’s not horrible. It only requires people with hands who are willing to listen to people who are disabled but know the problem space.

    It only takes money. We could probably get it done for somewhere in the $50,000-$100,000 range. And when I say get it done, I mean a fully complete program that could be used by someone who isn’t computer literate but can click an install button. We want to make it that simple because, if you are disabled, you have a limited amount of physical resources to get your work done and we don’t want to burn any more than we actually have to.

    But only after people can easily work on multiple platforms from one speech recognition engine would I ever consider seeking funding and research to replace NaturallySpeaking, because by then we would have made it possible for disabled people to work and have a more independent life. Nothing else matters. Absolutely nothing else matters but helping people have a bit of dignity in being able to make their own way through the world.

    Which I guess is a long way of saying “suck it up and get the right work done to help others, not the work you want to do, which doesn’t move the solution forward”

    Comment by esj — May 8, 2010 @ 10:21

  12. @esj: “suck it up” and deal with the fact that your pet project is not the only work that needs to be done. Access to speech recognition software is a real accessibility issue, even though esj does not personally use speech recognition software.

    Your post is a textbook derailing attempt and I will not pay any more attention to you because that would be positive reinforcement for your attention-seeking behavior.

    Get your own thread if you want to discuss your pet project. This is not it and, as far as I am concerned, you have forfeited your right to be acknowledged by me until such time as you apologize to me and to everybody else who can’t type a letter in 15 seconds, because saying that we don’t count because our needs are not identical to yours is absolutely inexcusable.

    ——————————–
    ——————————–

    @ those actually interested in doing something constructive towards making Linux accessible to people who depend on speech recognition software, I linked to two threads discussing this, and I recommend reading those first.

    I am not going to repeat my full explanation of why my first priority is coming up with a solution in the near term (if you’re wondering, my area of expertise is, as my alias kind of indicates, anthropology of science, including computer science), but the information is available in the threads I linked to. I think it’s pretty interesting, but the why isn’t really relevant to my reason for being here.

    I am here to talk about how. You can read what I’ve found out so far about what has been done, and what seems to be the best course of action towards making Linux accessible to people who need speech recognition software as quickly as possible.

    I am very much interested in people’s thoughts on the hotkey idea, which strikes me as something that can help get Linux at least partially accessible to people who use speech recognition software as soon as humanly possible. This will help bring more people who use speech recognition software into the mix, and you can see my explanation of how that would help existing efforts to run Dragon in a virtual machine.

    I think that this is feasible, because only one emulator seems to have been tried, and those efforts will probably be very much helped by the information I’m currently in the process of compiling about known issues in Dragon, and about the performance of Dragon across various Windows systems.

    I am also very much interested in anybody who’s tried to run Dragon in anything other than wine, not only because I’m interested in whether or not that might be a better solution, but also because I want to give them the same information that I’m giving the people that are working on running Dragon in wine.

    I also wonder if MacSpeech in a Mac emulator might work better, because I can’t find any information about anybody trying it, and it seems to me like it should be easier to do. I am especially interested in seeing what PendulumTech has found about this.

    Comment by culturegeek — May 8, 2010 @ 18:01

  13. @culturegeek, if you need an apology, then I apologize. It was not an attempt to derail or to seek attention for my particular project, but instead to point out that the techniques you have been proposing and seeking information on have failed not once but dozens of times since 1990. I would be extremely surprised if you got a different result than any of the other failures.

    Hotkeys have been discredited as a useful tool because they don’t solve the problem you think they solve. They are potentially useful only if you need to activate a single command without arguments (i.e. a keystroke macro bound to a voice command). Unfortunately, to do most work you would need a keyboard the size of your desk and would have to memorize all the keys and what they do. When that failure mode became abundantly clear, most of the people working with disability access walked away from that solution. If that failure isn’t enough, I’m sure I can point out at least another half dozen failures that others before me have encountered.

    If you can keep your hotkeys down to three or four, then you might be able to make it useful, especially if you can go through and change any application that uses hotkeys to use ones that won’t collide. Like I said, not really useful.

    As for speech recognition in virtual machines: I was one of the pioneers experimenting with this, and I found that the most common failure is that the audio channel support is inadequate. USB and direct audio device emulation fail in different ways. The USB failure is the appearance of gaps or repeats in your audio. The audio card emulation audio wasn’t consistent in amplitude and also suffered from gaps. The audio gaps are probably due to some form of context switch overhead in the host machine.

    VMware was the best for audio. It still sucked because of some of the problems mentioned above. Initially recognition would seem better and faster, but within a small number of weeks of use the audio channel problems would cause degradation in your speech model, forcing retraining with new models every couple of months.

    VirtualBox audio has always been horrible, and I’ve never been successful using it for speech recognition.

    If you flip the model around and run speech recognition on the host and target the guest, at best you get intermittent character insertion into an application. Characters are delivered at too rapid a rate, and the guest OS will drop characters at random. VMware is the best for this; VirtualBox fails miserably.

    You can use VNC to access the console on both platforms, and you still have the same character-dropping problem, though not as severe. The downside to using VNC is that it takes so much CPU time that you see rapid degradation of recognition accuracy and of the speech model.

    NX (the proprietary one, not the open one) seems to work the best for cross-platform speech recognition, except that you have no Select-and-Say, Natural Text is flaky as hell, and you still see character and word dropouts at a rate much greater than what you see with direct dictation on the same machine.

    Based on what I know from sources I can’t discuss, MacSpeech is a year-old version of NaturallySpeaking. It does not have as good accuracy and still suffers from the same problems with Natural Text. Running a Mac emulator might work, but remember what I said above about the audio channel problems. That’s not going to change with a different OS. You need to fix the virtual machine, or use some form of network audio transport and a synthetic sound card on the guest machine.

    I think the age problem with MacSpeech will get better, because Nuance just purchased MacSpeech and brought the product back in house. Just like they did with DragonDictate for the Macintosh, they use an external group to do the port and then bring the product back inside. If history is any guide, they will kill the product after incorporating some of its improvements into the main product line.

    I read your links and I am truly despondent that people haven’t learned. Seriously, most of the things people talk about now for handicap accessibility are what people like myself, Jeff Del Papa, and others were doing back in ’94 with DragonDictate. I swear to God, people don’t learn a damn thing from the past.

    You want to do something good for the speech recognition community? Fix vr-mode for Emacs. Put speech recognition hooks into at-spi, gtk+, and wxwindows. That’s where the current state of the art is, not hotkeys or puttering around with virtual machines. The simple stuff is of limited utility. In contrast, people have created very sophisticated macros using toolkits residing on top of natlink/natpython, and that is a good toolbase to work from.

    I apologize for saying things harshly, but I don’t apologize for being hard on you. Bad ideas deserve some discussion, but they deserve more to be recognized for what they are and discarded. I have made most of the mistakes you’ve talked about. I know others who’ve made a bunch of new mistakes that you haven’t thought of yet. Learn from our scar tissue and you will be able to contribute something useful, something new, to the disabled community.

    I will gladly give anyone big chunks of time to talk about these problems, go over which ideas have failed and how new ideas can work or not work, and help them do more than I can.

    Comment by esj — May 8, 2010 @ 19:25

  14. I appreciate that you are willing to apologize, but I have looked at the past, and that’s how I came to the conclusion that the Dragon in an emulator project was abandoned in error. I base this on what I know about Dragon, and what I know about people.

    The virtual machine thing is a lot more doable than commonly believed.
    In a lot of cases, I saw people throwing up their hands in despair because they just can’t fix a seeming compatibility issue that they don’t know is a known issue with Dragon. It’s incredibly frustrating to get Dragon up and running in wine if you don’t know what you have to do to get it up and running in Windows, or what routinely happens anyway, or how to address those things. Any project would be incredibly frustrating, even seem hopeless, without such a large body of key information.

    —————————–
    People problem:
    ——————————
    This is the “people problem” you’ve talked about, and it’s also the kind of people problem that is my area of expertise. There are blind spots– culture has to create blind spots, or we would not be able to parse all the information our sensoria throw at us. Read Harold Conklin if you want to know how much. Well, Linux geeks are like any other H. sapiens: living in a culturally-mediated reality.

    My job as an anthropologist is to figure out what people might be missing due to those filters. When I say that a partial solution for speech recognition matters, I am basing it on different knowledge than you have. I wouldn’t tell you how to design a virtual sound card, so let me explain why this matters.

    Reasoning is same conclusion you came to abt. a “people problem,” with people trying to develop software who don’t use it.
    The reason I’m so into getting imperfect access ASAP is that there is a self-perpetuating orphan drug effect in developer community.
    Goes like this.
    There is no speech-to-text sol’n for Linux.
    So there are very few developers with typing issues (proportionally even fewer people with those issues than in the general population).
    So there are very few people to work on the problem who use speech-to-text.
    So no speech-to-text sol’n for Linux.
    So very few developers with typing issues…
    Etc.
    Q.E.D. We need to do whatever we can to open up development to people with typing issues & every little bit helps.

    Because the more people use Linux who use speech-to-text the more people there are for the Linux speech-to-text project.

    I do salute those trying to get Dragon to work in an emulator, but good intentions don’t magically create knowledge about what is normal for Dragon. For that, there need to be people in the conversation who use Dragon.

    That is why accessibility comes first!

    Even those of us who are not currently doing programming (and I am not– I abandoned that for anthro years ago) can provide a lot of information that you don’t have.

    ——————–
    Technical problems
    ——————-

    The hotkey thing was a bit of shorthand to start with, but in no particular order:

    Hotkey collision: very familiar—get that with Dragon on any Windows box I’ve had Dragon on, from XP to Windows 7.

    What I’m proposing isn’t simple hotkeys—that’s a bit of a shorthand. See KnowBrainer discussion for more detailed explanation of what’s there.

    Even that isn’t a total solution– they are a potentially useful partial solution, partly because if you’re writing code, you are inputting a finite number of strings. Combined with some of the finite-number-of-commands projects under development now, they might help, b/c no giant keyboard is needed. Radio alphabet makes char input very feasible and rapid– anyone who uses speech recognition software can rattle it off at an extremely rapid rate (faster than most ppl, even ham enthusiasts, can parse auditorially).

    The idea of hotkeys is they may make it a little more accessible for people who can’t type, and in a way that is most useful for writing code.

    It is about using a finite number of auditory inputs (for the finite commands the voice-activated software supports) to tell the machine to paste a block of text (e.g. an if structure). Mouse grid commands are a PITA, but will enable navigation if needed.
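
    Dusting off my one old scripting class, here is a back-of-envelope sketch of the finite-input idea. Every name and template in it is made up for illustration; it is not working software.

        # Sketch: a small fixed vocabulary of radio-alphabet words for
        # single characters, plus named templates for code structures.
        RADIO = {"alpha": "a", "bravo": "b", "charlie": "c",
                 "delta": "d", "echo": "e", "foxtrot": "f"}  # ...and so on

        TEMPLATES = {
            "if structure": "if ():\n    pass\n",
            "for loop": "for item in items:\n    pass\n",
        }

        def text_for(spoken_word):
            # Map one recognized command word to the text to paste.
            if spoken_word in TEMPLATES:
                return TEMPLATES[spoken_word]
            return RADIO.get(spoken_word, "")

    The point is just that the vocabulary is finite, so the voice software only ever has to distinguish a few dozen words.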

    ———
    USB (what’s normal):

    I know USB is consistently preferred to the mic-in jack for speech recognition.
    Do not even bother with built-in mic arrays.

    “appearance of gaps or repeats in your audio… The audio gaps are probably due to some form of context switch overhead in the host machine”.
    Gaps are old hat. Did you try tweaking buffer amount?
    It loses words if buffer gets full. Sometimes even disables keyboard for a second or two.

    Repeats: haven’t noticed them, may be same buffer issue, but another idea occurs to me: under what circumstances did you observe those?
    Having done literal transcription of real speech (interviews, conversations, etc.), we don’t realize how much repetition there is, in our own speech, or in speech we hear. There’s a lot to this but basically, it has to do with how we parse speech, which is part of how culture allows us to filter out irrelevant stuff so the world will make sense to us. Read a literal transcript of an ethnographic interview or a conversation (not a news transcript—they filter it so it looks normal, and a singer isn’t a good choice either—they don’t do it as much)—it looks really weird.

    —————–
    “ The audio card emulation audio”
    —————–
    See Knowbrainer forums for info on this in Windows.
    They sometimes seem a bit too into selling hardware, but everyone advocates a separate sound card, including my boyfriend, who is the hardware guy in the relationship (toe drop+carpet=intractable static problem). Perhaps there is something useful in the discussion there.
    Don’t know about virtual sound cards, but that is a solvable problem, and KnowBrainer will have info on directions.

    ———-
    “wasn’t consistent in amplitude”
    ———–
    Known issue with Dragon. “sound level is too high” or “sound level is too low” error messages are very common.

    ——————–
    “VirtualBox audio has always been horrible, and I’ve never been successful using it for speech recognition.”
    —————
    Can’t speak to that, but sounds solvable. The info above tells U where to look for info on direction. Info below (and some above) tell U something about what’s horrible and what isn’t. You need a baseline on the audio quality marks first.

    —————
    “people don’t learn from the past”
    ———————

    I looked at the past and I have good reason to think the project is feasible.

    The known issues above are just some of the ones I saw screwing up attempts to make Dragon work in wine (I really don’t know if wine is the best option– I would diversify). Here’s other stuff not specifically mentioned:

    Speech-to-noise ratios around 20, 15, or even worse are commonplace, even testing established profiles.
    Sometimes it’s a problem, and I know how to tell if I have to reboot before Dragon wants to run. Most of the time, it’s not– it’s just that, the way Dragon calculates it, it looks really bad. I saw a lot about people trying to fix “low” speech-to-noise ratios that may, in fact, be perfectly normal.

    Accuracy takes a while, and if you make a profile, that is the profile for that voice and that microphone. Change either and you will get terrible accuracy.

    The COM error that is mentioned in wine is a known issue with Dragon. It happens in XP, Vista, Windows 7, and Windows 7 64.

    The dictation box thing: not everything works with Dragon even in Windows. Dragon has known compatibility issues with Word– look at what you have to do to get Dragon to work in Word.

    —————————————–

    The speech recognition problem is not solved. One of my machines is running Dragon 10.1 on Win 7. Still unstable. Still slow. Still not a solution.

    Nuance getting an unchallenged monopoly is emphatically not a good thing at all. Nothing is going to get improved, because Nuance has no reason to improve it. Before, they may have been ahead of MacSpeech, but they had to work to stay ahead of MacSpeech. Now they own MacSpeech, so speech recognition isn’t getting any better.

    I am not a programmer. I took a scripting class years ago. The last time I had a Linux box was almost a decade ago. You aren’t going to find a lot of people who can’t type spending our lives doing programming.

    If you are working on coming up with speech recognition for Emacs, I can tell you what’s normal. I can dig up my old scripting skills, but that’s not where I’ll be most useful. If you are an attorney and you want to do something about the housing crisis, you don’t go build a house.

    Comment by culturegeek — May 10, 2010 @ 14:02

  15. Lost previous post– hunt & pecking, glad it’s cold: appreciate apology.

    Do still disagree, tho. Project was abandoned in error but cd. not be done with the info you had.

    “People problem” my area of expertise, and I am convinced that it is important– that’s why i say must arrange some access first, before all else, because then you’ll have the people. Self-perpetuating orphan drug effect.

    People problem more impactful than you thought. You should believe me because
    A. this is my area of expertise. If I had to fix a Linux box, I’d take your advice on it. You have a people problem. I agree on that.

    B. Almost everything you said is at least partly a known issue with Dragon, and a few have known fixes.

    It is hard enough to set up a program in wine even when you know how it works in Windows, but you don’t have anyone who does. That is your people problem.

    I looked at past first. Past is trying to fix known Dragon issues as compatibility issues.

    ———————–

    Advisability of add’l sound card, known.

    Volume too high/too low error messages, known.

    Missing speech, known. Known fix.

    Apparent repeated speech: complicated, so more later. May have fix.

    Even the trouble with Word compatibility: known compatibility issue w/Dragon & Word.

    not specifically mentioned here:
    COM error is a known issue too.

    Poor accuracy at first, known.

    Poor accuracy if you switch user or mic, known.

    Speech to noise ratios of 15-20 or even lower: commonplace, even with established profiles in every Win box seen from XP to 7.

    More later.

    Comment by culturegeek — May 10, 2010 @ 17:23

  16. “””I appreciate that you are willing to apologize, but I have looked at the past, and that’s how I came to the conclusion that the Dragon in an emulator project was abandoned in error. I base this on what I know about Dragon, and what I know about people.”””

    Cool, and we may have a terminology conflict.

    Emulator: simulates one machine on another. Usually used in cross-architecture environments, i.e. ARM on x86; see qemu.

    Virtual machine: an environment which looks like physical hardware but is mediated by a software/hardware layer, with an impenetrable boundary between guest and host environments.

    Wine: libraries enabling Windows programs to run in a Linux environment. Permeable boundaries: a Windows program could call Linux libraries if desired.

    “””The virtual machine thing is a lot more doable than commonly believed.”””

    Agreed. The first problem is to get developers to take our needs seriously and fix the USB audio problem. The second is to develop something analogous to what I’ve been proposing, so you can dictate into Windows in the virtual machine and have the output sent to applications in Linux. And please, please, please do not replicate the CF known as a2x; leave it in the graveyard of the past.

    I should explain that the reason I get so adamant about leaving behind things that have failed in the past is that, if they are resurrected, the perception is “problem solved” when it hasn’t been. We’ve only moved the playground into the middle of the freeway so we can use car lights for playing at night.

    “””In a lot of cases, I saw people throwing up their hands in despair because they just can’t fix a seeming compatibility issue that they don’t know is is a known issue with Dragon. It’s incredibly frustrating to get Dragon up and running in wine if you don’t know what you have to do to get it up and running in Windows, or what routinely happens anyway, or how to address those things. Any project would be incredibly frustrating, even seem hopeless, without such a large body of key information.
    “””

    One of the goals of the wine project (and they have succeeded for many applications) is to make a program no more difficult to install than an ordinary Windows application. Dragon is more obnoxious than most when it comes to installing, thanks to Nuance copy protection. There are bugs between Dragon and wine which need to be fixed. A lot of the training procedures used on Windows no longer work because of bugs between Dragon and wine. The only thing we can fix is wine; therefore, all the problems are wine problems (in a rather twisted way).

    “””
    This is the “people problem” you’ve talked about, and it’s also the kind of people problem that is my area of expertise.
    “””
    Good, that lets me focus on HCI issues and on explaining those issues to people who can write code.

    “””
    My job as an anthropologist is to figure out what people might be missing due to those filters. When I say that a partial solution for speech recognition matters, I am basing it on different knowledge than you have. I wouldn’t tell you how to design a virtual sound card, so let me explain why this matters.

    Reasoning is same conclusion you came to abt. a “people problem,” with people trying to develop software who don’t use it.
    The reason I’m so into getting imperfect access ASAP is that there is a self-perpetuating orphan drug effect in developer community.
    Goes like this.
    There is no speech-to-text sol’n for Linux.
    So there are very few developers with typing issues (proportionally even fewer people with those issues than in the general population).
    So there are very few people to work on the problem who use speech-to-text.
    So no speech-to-text sol’n for Linux.
    So very few developers with typing issues…
    Etc.
    Q.E.D. We need to do whatever we can to open up development to people with typing issues & every little bit helps.
    “””
    With you right up till the last sentence. We are in significant agreement. In many ways, what you propose has already been surpassed, thanks to tools like natPython, dragonfly, and others. But simply using them as verbal macro tools has gone as far as it’s going to go. We need to start moving to significantly different HCI models if we really want to make a significant difference and not just reinvent what we’ve had for 10 years.

    “””Because the more people use Linux who use speech-to-text the more people there are for the Linux speech-to-text project
    “””
    Do you mean text-to-speech here?

    “””
    That is why accessibility comes first!
    “””

    Again, agreement on principles.

    “””
    What I’m proposing isn’t simple hotkeys—that’s a bit of a shorthand. See KnowBrainer discussion for more detailed explanation of what’s there.
    “””
    I will apologize for this, but I find web forums so painful to use that I will avoid them at all costs. I thought I left that horrible environment behind when bulletin board systems were supplanted by the Internet in the 1980s. They really are painful and difficult for me to use. If you would be so kind as to give me a URL, I will take a look.

    To elaborate a little bit, web forums are what I consider an HCI disaster on many levels, ranging from the physical to the cognitive and, last, to being speech-recognition hostile.

    “””
    Even that isn’t a total solution– they are a potentially useful partial solution, partly because if you’re writing code, you are inputting a finite number of strings. Combined with some of the finite-number-of-commands projects under development now, they might help, b/c no giant keyboard is needed. Radio alphabet makes char input very feasible and rapid– anyone who uses speech recognition software can rattle it off at an extremely rapid rate (faster than most ppl, even ham enthusiasts, can parse auditorially).
    “””

    This is where my years of experience, both as a developer and as an HCI person, take an entirely different tack. When you are writing code, you’re not generating a finite number of strings. Symbol variation and arrangement make programming by voice a very difficult problem. The primary problem with shorthanding names is that you have to know what scope you are in so you can use the right symbol with the right type signature for the task in a given context. It’s an ugly problem because you must couple into your editing environment.

    As for spelling out symbols with alpha-bravo characters, that is a sure path to destroying your voice. There’s no way to edit strings by voice, and there’s no way to correct misrecognition by voice. If I gave you 1000 lines of code and you had to spell every symbol with alpha-bravo sequences, you would have no voice left, and being without hands is bad enough. Being without hands and voice is even worse.

    Doing anything that causes a person to damage or lose their voice is a crime for which there is no punishment severe enough.

    No joke.

    “””The idea of hotkeys is they may make it a little more accessible for people who can’t type, and in a way that is most useful for writing code.

    “””
    If you only have one hotkey, can you do anything useful with it? If you had four hotkeys, what could you do about writing code? (I suggest giving this a comment of its own because it’s quite a rich topic.)

    “””
    It is about using a finite number of auditory inputs (for the finite commands the voice-activated software supports) to tell the machine to paste a block of text (e.g. an if structure). Mouse grid commands are a PITA, but will enable navigation if needed.
    “””

    If it were possible to do this, don’t you think a group of 15 or 20 very smart software developers would have done it already?

    We’ve tried. We were using this technique in 1998. It only takes you so far, and that is not far enough. The best solution so far was voicecoder. It’s a rather rich environment for creating Python and Java code. It’s not a good environment for editing code (i.e. 90% of what you do). It screws up your voice models so you can’t dictate comments effectively. But other than that, it’s pretty good. I’m very impressed with the user interface David Fox created.

    “””
    “appearance of gaps or repeats in your audio… The audio gaps are probably due to some form of context switch overhead in the host machine”.
    Gaps are old hat. Did you try tweaking buffer amount?
    It loses words if buffer gets full. Sometimes even disables keyboard for a second or two.
    “””

    Interesting. Google showed nothing, and neither did my queries to a NaturallySpeaking reseller. Any references for Windows 7?

    The gaps I was speaking about happen not only with a host but more commonly with a guest virtual machine. Maybe some buffer tweaking would help, but I haven’t seen anything like that for Windows 7.

    “””
    Repeats: haven’t noticed them, may be same buffer issue, but another idea occurs to me: under what circumstances did you observe those?
    Having done literal transcription of real speech (interviews, conversations, etc.), we don’t realize how much repetition there is, in our own speech, or in speech we hear. There’s a lot to this but basically, it has to do with how we parse speech, which is part of how culture allows us to filter out irrelevant stuff so the world will make sense to us. Read a literal transcript of an ethnographic interview or a conversation (not a news transcript—they filter it so it looks normal, and a singer isn’t a good choice either—they don’t do it as much)—it looks really weird.
    “””

    It’s quite interesting. The repeats I was talking about tend to be more character repeats, character doubles.

    “””
    —————–
    “ The audio card emulation audio”
    —————–
    See Knowbrainer forums for info on this in Windows.
    They sometimes seem a bit too into selling hardware, but everyone advocates a separate sound card, including my boyfriend, who is the hardware guy in the relationship (toe drop+carpet=intractable static problem). Perhaps there is something useful in the discussion there.
    Don’t know about virtual sound cards, but that is a solvable problem, and KnowBrainer will have info on directions.
    “””

    Again, I don’t do forums, so a URL would be appreciated. Also, again, I was talking about USB sound in a guest machine context. The problem is solvable. If you have the money, I’m sure VMware and Oracle will be glad to solve it. I’ve filed and re-filed the bug over the past year and it keeps falling off the list. Funny thing about that.
    ———-
    “wasn’t consistent in amplitude”

    “””
    Known issue with Dragon. “sound level is too high” or “sound level is too low” error messages are very common.
    “””

    In the virtual machine context, the audio amplitude varies from the virtual machine’s USB sound device. It’s not a hardware problem, and it’s not a Windows driver problem. It’s a virtual machine problem. See: time and money.

    “””
    Can’t speak to that, but sounds solvable. The info above tells U where to look for info on direction. Info below (and some above) tell U something about what’s horrible and what isn’t. You need a baseline on the audio quality marks first.
    “””

    I’ve filed bugs, included sound files of the audio distortion, tracked bugs until they fell off the list and then got them back on, and have come to the conclusion that only time and money will fix this.

    “””
    I looked at the past and I have good reason to think the project is feasible.

    The known issues above are just some of the ones I saw screwing up attempts to make Dragon work in wine (I really don’t know if wine is the best option– I would diversify). Here’s other stuff not specifically mentioned:

    “””

    Having lived the past, I obviously have a different opinion. 🙂

    I would suggest going at this in an entirely different way.

    Put Linux to the side. I’m not saying give up; I’m saying move it out of your life until you solve other problems first.

    Make a version of Windows 7 (a legitimate copy, purchased from a real retailer) work on your desktop. Make NaturallySpeaking 10.1 work in a stable environment. It’s not hard to do; I’ve done it, and I can keep my machine up for days at a time. Yes, I have some problems with recognition, but they’re more in the line of improper verb tenses. It just runs, and it runs best in the dictation box or DragonPad.

    Don’t use Microsoft Word. It’s notorious for reducing stability and accuracy. OpenOffice or the online Buzzword (Acrobat.com) are the two best editors for use with speech recognition.

    I’m beginning to wonder whether every Microsoft application has similar negative effects on a speech recognition environment.

    Learn dragonfly or unimacro, and build your tools using one of these two packages. Believe me, this is the minimum-energy path to getting speech-recognition-driven tools working. I would show you the scar tissue if I could.
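
    To give a flavor of what building on dragonfly looks like, here is a minimal macro module. The spoken phrases are my own examples, and this is a sketch rather than a polished tool; dragonfly grammar modules conventionally load through NatLink’s macro directory.

        # Minimal dragonfly grammar: two spoken commands, one typing a
        # code template and one sending a keystroke.
        from dragonfly import Grammar, MappingRule, Text, Key

        class QuickRule(MappingRule):
            mapping = {
                # Saying "paste if block" types a skeleton if statement.
                "paste if block": Text("if ():\n    pass\n"),
                # Saying "save file" presses control-s.
                "save file": Key("c-s"),
            }

        grammar = Grammar("quick commands")
        grammar.add_rule(QuickRule())
        grammar.load()

    Once something like that works, the real power is in rules with spoken parameters, but even a file this small replaces a pile of hand-rolled keystroke macros.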

    Once you have your Windows environment stable, then start looking at how to translate the output from your speech recognition environment into something that can be used remotely. Remember now: no a2x clones, because you will set back accessibility by at least 10 to 15 years. It is a geek solution about as appropriate as “speaking the keyboard”.

    Comment by esj — May 13, 2010 @ 17:17

  17. The best USB microphones are made by Sennheiser, and Creative makes great USB microphones too.

    Comment by Amber Phillips — October 6, 2010 @ 13:03

  18. A4Tech and Genius both make cheap but great-performing USB microphones.

    Comment by Round Mirror — October 20, 2010 @ 17:16

