Lining up to detect text reuse

People say that being copied is one of the greatest honours, but in the world of journalism it is hard to spot. Researchers say they can now identify which articles are really just rewritten press releases and wire reports. Using a combination of genetic and linguistic analysis, computer-based comparisons will identify plagiarists, and help press agencies track how much their services are used.

“In an age where text is passed around freely, it is important to track originality,” states Dr Rob Gaizauskas from the University of Sheffield. “Examiners and teachers like to know if their students have lifted essays from textbooks; disappointed authors would welcome the chance to prove that the scripts of films are actually taken from the ones they had sent in years before.”

In the context of press journalism, agencies like the Press Association (PA) supply thousands of stories a week to publishers as sources for stories. “Press agencies have a very strong commercial interest in knowing whether or not any given story X that appears in a newspaper is derived directly from a story Y that they put out,” Dr Gaizauskas explains. “If an agency could know this they would be able to plan and charge for their services in a more rational way. They could also focus their journalists’ efforts to cover what the papers are likely to use.”

Leading a team from the Department of Computer Science, Dr Gaizauskas has employed a variety of techniques to compare newspaper articles with PA wire stories. In one approach, the researchers take ‘tiles’ – sequences of words cut from one text – and overlay them onto matching sequences in the comparison text. The amount of unmatched text and the average size of the tiles give clues as to the likelihood of derivation.
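
The tiling idea can be sketched in a few lines of Python. This is a minimal illustration only, assuming a simplified greedy tiling over whitespace-separated words; the function name `greedy_tiles`, the `min_len` cut-off and the two sample sentences are hypothetical and are not taken from the Sheffield system.

```python
def greedy_tiles(source, candidate, min_len=2):
    """Greedily cover `candidate` with word sequences ('tiles') found in `source`.

    Simplified sketch: repeatedly take the longest common run of words that
    is still unmarked in both texts, until no run of at least `min_len`
    words remains.
    """
    src = source.lower().split()
    cand = candidate.lower().split()
    src_used = [False] * len(src)
    cand_used = [False] * len(cand)
    tiles = []

    while True:
        best = (0, 0, 0)  # (length, start in src, start in cand)
        for i in range(len(src)):
            for j in range(len(cand)):
                k = 0
                while (i + k < len(src) and j + k < len(cand)
                       and src[i + k] == cand[j + k]
                       and not src_used[i + k] and not cand_used[j + k]):
                    k += 1
                if k > best[0]:
                    best = (k, i, j)
        length, i, j = best
        if length < min_len:
            break
        # Mark the matched words so they cannot be reused by a later tile.
        for k in range(length):
            src_used[i + k] = True
            cand_used[j + k] = True
        tiles.append(cand[j:j + length])

    covered = sum(cand_used)
    return {
        "tiles": tiles,
        "unmatched_fraction": 1 - covered / len(cand) if cand else 0.0,
        "mean_tile_length": covered / len(tiles) if tiles else 0.0,
    }


# Invented example texts (assumption, for illustration only).
wire = "the minister said the new road would open in may"
paper = "the new road would open in may the minister said yesterday"
result = greedy_tiles(wire, paper)
print(result)
```

A low unmatched fraction together with long tiles would point towards derivation; where to draw the line is a separate question that this sketch does not attempt to answer.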

Another technique involves ‘lining up’ the two texts using a method originally intended to align words in translated passages with those in the source language copy. The number of aligned sentences and the degree of overlap between aligned sentences again indicate potential reuse.
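
The two measures mentioned here, the number of aligned sentences and their degree of overlap, can be illustrated with a rough sketch. It pairs each sentence with its best match by simple word overlap, which is a stand-in and not the translation-alignment method the researchers used; the function names, the 0.5 threshold and the sample texts are all assumptions.

```python
import re


def sentences(text):
    """Split text into sentences on ., ! or ? (a deliberately crude rule)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def overlap(a, b):
    """Fraction of the distinct words in sentence `a` that also occur in `b`."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a) if words_a else 0.0


def align(source, candidate, threshold=0.5):
    """Pair each candidate sentence with its best-overlapping source sentence.

    Returns (candidate_sentence, source_sentence, score) for every pair whose
    score reaches `threshold`; sentences below it are treated as unaligned.
    """
    src_sents = sentences(source)
    pairs = []
    for cand in sentences(candidate):
        scored = [(overlap(cand, src), src) for src in src_sents]
        if not scored:
            continue
        score, best = max(scored, key=lambda item: item[0])
        if score >= threshold:
            pairs.append((cand, best, score))
    return pairs


# Invented example texts (assumption, for illustration only).
wire = "Police closed the bridge on Monday. Repairs will take six weeks."
paper = "The bridge was closed by police on Monday. Drivers face long delays."
aligned = align(wire, paper)
n_candidate = len(sentences(paper))
print(f"{len(aligned)} of {n_candidate} sentences aligned")
for cand, src, score in aligned:
    print(f"{score:.2f}  {cand!r} <- {src!r}")
```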

Analysing newspaper stories, Dr Gaizauskas distinguishes which stories are wholly derived, partially derived or not at all derived from PA material. Best results to date show that wholly or partially derived stories can be distinguished from non-derived texts with greater than 90% accuracy; the three-way classification is more than 70% accurate.

“To improve these results we will require more sophisticated modelling of journalists’ rewriting techniques,” says Dr Gaizauskas. “For example, we need to take into consideration the use of newspaper house style and vocabulary, and allow more for distractors such as quoted speech, which may occur identically in independently written texts.

“Our work will have commercial significance for our collaborator, the PA, and help it to monitor the penetration of its stories in the press. In the long term, however, this could be developed by publishers to spot plagiarism. With a bit more understanding, the same techniques may even help newspapers write their stories. It could show how original materials could be automatically rewritten to fit house styles and other criteria.” Perhaps journalists will leave the rewrites to computers while they go out and find a scoop.

Computer experts open the way to using the web for confidential patient records

Computer experts at Salford University have successfully demonstrated that existing security software can be integrated into systems to enable the Internet to be used to transfer highly confidential patient records between hospitals and GPs’ surgeries in a user-friendly way. The system allows GPs to use standard web browsers to access the data while maintaining the degree of security required for such sensitive information. The researchers are also developing new ways to use such information to help GPs to give their patients a better understanding of their medical condition.

‘Hospitals maintain databases of information on patients with chronic conditions such as diabetes or heart disease, but GPs generally do not have very ready access to this data,’ says Dr Andrew Young, one of the researchers on the project, which is being funded by the Engineering and Physical Sciences Research Council.

Periodically hospitals may print out these records and send them to the family doctor, but GPs are keen to have better access. Furthermore, if GPs want to add something to the records they must send the information by letter to the hospital where it gets typed in, which is not ideal.

While it would be possible to create a direct computer link between the GP and the hospital, this is expensive, and for such a system to be universally applicable the same equipment of the same specification would be needed in every hospital.

The advantage of using the Internet is that the infrastructure already exists. However, the big challenge is to make certain that confidential information transferred over the Internet remains secure.

The Salford team has successfully launched a pilot scheme with the Hope Hospital in Eccles and ten local GPs. A web server has been set up in the hospital, linked to the hospital’s database of patients’ records. A ‘firewall’ has been installed – a dedicated computer that stands between the hospital’s network and the Internet. Software in this computer vets everything going in and out of the system to ensure that any request for information comes from an authorised source.

‘The software installed across the system gives strong authentication and encryption,’ says Dr Young. ‘The data that flows between the surgery and the hospital is encrypted, so that the fact that the Internet is an untrusted network will not compromise the security of the data. We have had to provide mechanisms to allow the hospital to be 100 per cent certain that requests for information are from valid GPs and that the GPs themselves have access only to the records of patients under their care.’
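
The two checks Dr Young describes, that a request really comes from a valid GP and that the GP sees only their own patients, can be sketched as follows. The article does not name the security software used, so this is an assumption-laden illustration: it uses an HMAC tag from Python’s standard library for authentication, omits encryption altogether, and the identifiers `GP_KEYS`, `PATIENT_GP`, `sign` and `authorise` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secrets issued to each GP surgery (illustrative only).
GP_KEYS = {"gp-017": b"example-secret-key"}

# Hypothetical mapping of patients to the GP responsible for their care.
PATIENT_GP = {"patient-42": "gp-017", "patient-99": "gp-023"}


def sign(gp_id, patient_id, key):
    """Return an HMAC-SHA256 tag binding the GP identity to the request."""
    message = f"{gp_id}:{patient_id}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def authorise(gp_id, patient_id, tag):
    """Accept a request only if the tag is valid AND the patient is the GP's."""
    key = GP_KEYS.get(gp_id)
    if key is None:
        return False  # unknown sender
    expected = sign(gp_id, patient_id, key)
    if not hmac.compare_digest(expected, tag):
        return False  # request was not produced with this GP's key
    return PATIENT_GP.get(patient_id) == gp_id  # own patients only


good_tag = sign("gp-017", "patient-42", GP_KEYS["gp-017"])
print(authorise("gp-017", "patient-42", good_tag))   # valid GP, own patient

other_tag = sign("gp-017", "patient-99", GP_KEYS["gp-017"])
print(authorise("gp-017", "patient-99", other_tag))  # valid GP, someone else's patient

print(authorise("gp-017", "patient-42", "0" * 64))   # forged tag
```

A real deployment would also encrypt the records in transit, as the article describes; that part is left out here because it depends on software the article does not specify.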

The researchers are also keen to enable doctors to use the information contained in patients’ records to help patients themselves better understand their condition.

‘One of the things the project wanted to study was the “human factors” of making medical data available to patients – how to give them the information they need in a form they can understand,’ says Dr Young. ‘Initially, we wanted patients to access the information themselves but it seemed few patients wanted to do that. So we have ended up with a three-way consultation with a GP, patient and computer. The GP uses the computer but the user-interface is aimed at the patient. What we want to find out is whether health-related information that doctors give to a patient, based on the patient’s own medical condition, has more impact on the patient than simple general advice; and will this in turn help the patient make worthwhile changes to their lifestyle?’

A lot of work has been done by the project team to design a variety of user interfaces and test the way that patients respond to them. ‘Graphs are fine for t

Scottish universities give software longer and healthier lives

Sustained funding by the Engineering and Physical Sciences Research Council has assisted researchers at St Andrews and Glasgow Universities to create a new way of building and enhancing computer programs. This can improve efficiency and quality throughout the software industry, including the latest developments on the Internet.

Teams led by Professor Malcolm Atkinson of Glasgow’s Computing Science Department and Professor Ronald Morrison at St Andrews’ School of Mathematical and Computational Sciences have been collaborating for over 15 years to improve the design, construction, maintenance and operation of what they call Persistent Application Systems (PASs). These use software that must evolve over long periods to sustain the applications they support.

‘Most major businesses and public services now depend on their computer software. It is therefore vital that these systems are able to adapt to changing requirements, while continuing to operate reliably and efficiently,’ says Professor Atkinson.

Professor Morrison adds: ‘Developing a major software system is intrinsically complex because it has to control a multitude of tasks in minute detail. This has been made even more difficult by fundamental inconsistencies in the methods used for different aspects, such as programming and database management.’

The Glasgow and St Andrews teams have engineered techniques for a streamlined PAS development environment that overcomes many of these inconsistencies. A key feature of this is known as ‘orthogonality’, a set of design principles which allow programs to handle any type of data, irrespective of how long it has to persist.
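
As a loose analogy for handling data the same way however long it has to persist, the sketch below uses Python’s standard `shelve` module, which stores any picklable object behind one interface. It is not PS-algol, Napier 88 or the teams’ PAS environment, and the helper names `remember` and `recall` are hypothetical.

```python
import os
import shelve
import tempfile


def remember(path, name, value):
    """Store `value` under `name`; the same call works for any picklable type."""
    with shelve.open(path) as store:
        store[name] = value


def recall(path, name):
    """Fetch a value that may have been stored by an earlier program run."""
    with shelve.open(path) as store:
        return store[name]


# A throwaway location for the store (assumption, for illustration only).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "store")

# An integer, a list and a nested structure are all saved in the same way.
remember(path, "count", 3)
remember(path, "route", ["Glasgow", "St Andrews"])
remember(path, "site", {"name": "depot", "grid": (56.34, -2.79)})

restored = recall(path, "site")
print(recall(path, "count"), recall(path, "route"), restored)
```

The point of the analogy is that the calling code never says how a value is written to disk or what type it has; full orthogonal persistence goes further by removing the explicit store and fetch steps as well.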

An important new long-term EPSRC research project involves the Glasgow team and Sun Microsystems Laboratories in the USA working together to bring orthogonal persistence to Java, Sun’s popular Internet programming system. As part of this project, Cambridge-based Laser-Scan Ltd will examine the value of applying these persistent Java capabilities to its geographical information systems software.

EPSRC funding is also enabling Professor Morrison’s group to investigate the use of orthogonal persistence on the World Wide Web. This builds on Napier 88, their pioneering PAS environment, and is exploring the novel concept of ‘hyper-programming’, which facilitates linking new programs to existing data.

The first commercial product based on persistent programming, IT vendor ICL’s ProcessWise Integrator process manager, employs a re-engineered version of the universities’ PS-algol environment, one of their earliest PAS innovations. Professors Atkinson and Morrison have also played prominent roles in projects sponsored by the EU that have greatly assisted their research into orthogonal persistence.

Detector makes sweeping improvements

When you take the bus to work, or drive to the shops, you expect the road to be clean, not cluttered with litter or scattered with stones. Street sweeping is something that’s easy to take for granted, but there’s more to the job than a quick flick of a broom. Efficient cleaning relies on vehicle drivers selecting and controlling brushes according to the debris to be cleared.

While operators concentrate on following the kerb safely, they have little time for fine-tuning the sweeping process. So researchers from the University of Surrey have decided to give street sweeping a scientific fillip. By identifying the rubbish ahead, a computer can automatically choose the best brush or stroke for maximum efficiency.

Graham Parker and his team from the School of Mechanical and Materials Engineering have closely investigated the sweeping action of rotary brushes found on sweeping vehicles. He categorises two forms of brush. ‘Cutters’ are stiff when forced down onto the surface. They are best used for compacted material like sand on the road. The tines on ‘flicking’ brushes, by contrast, are better designed to throw loose debris into the path of the vacuum hose underneath the sweeper truck.

“Although the operation of the road sweeping vehicle is straightforward,” says Professor Parker, “the choice of which brush to fit and its angle to the ground clearly affects the overall brushing performance when the debris types vary. A sensor system that could determine the most appropriate brush would make a better and more efficient machine.”

The team’s debris detector comprises a digital camera and a laser. Image processing of the camera image allows computers to pinpoint debris and calculate its size and shape.

“Image processing usually requires a lot of computing time, but in a sweeper vehicle you need real-time calculations,” explains Professor Parker. “By using the laser we can build up a 3-D profile of the road without intense processing, just identifying bright pixels in a relatively dark background. When, and only when, more than this information is needed to analyse a scene, more computer-intensive processing techniques are used.”
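
The cheapness of picking out bright pixels can be seen in a small sketch. It assumes a greyscale frame held as a list of rows, a laser line that shows up as the brightest pixel in each column, and a stripe that shifts up the image in proportion to the height of whatever it falls on; the function names, the threshold and the `mm_per_pixel` scale are hypothetical and do not describe the Surrey team’s detector.

```python
def stripe_rows(image, threshold=200):
    """For each column, return the row of the brightest pixel, or None.

    `image` is a list of rows of grey levels (0-255). Only pixels at or above
    `threshold` count as part of the laser line.
    """
    rows = []
    for col in range(len(image[0])):
        column = [image[row][col] for row in range(len(image))]
        peak = max(range(len(column)), key=lambda r: column[r])
        rows.append(peak if column[peak] >= threshold else None)
    return rows


def height_profile(rows, baseline_row, mm_per_pixel=1.0):
    """Convert stripe displacement from the flat-road row into heights."""
    return [None if r is None else (baseline_row - r) * mm_per_pixel
            for r in rows]


# Tiny synthetic frame: the stripe sits on row 3, lifted to row 1 by an object.
frame = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 250, 250, 10, 10],
    [10, 10, 10, 10, 10, 10],
    [250, 250, 10, 10, 250, 250],
]
rows = stripe_rows(frame)
profile = height_profile(rows, baseline_row=3, mm_per_pixel=2.0)
print(rows)
print(profile)
```

Each column needs only one pass to find its peak, which is why this step is so much lighter than general image analysis.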

Tests so far show that this laser striping system can locate and calculate the shape of most types of litter, from large objects that would damage the machine to small items like bolts and nuts. For debris such as sand, gravel and leaves, which spreads over a wide area, extra image processing is important: sand and leaves require quite different brushing styles.

Once the boundaries of the debris have been identified by laser striping, the computer then compares the pixel intensity over the surface. The varying intensity provides an indication of surface roughness, and therefore a guide to the type of debris. “We found that our smooth lab floor was easily distinguished from wood chip, for instance. This method should be able to distinguish gravel from leaves from sand.”
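
One simple way to turn varying intensity into a roughness figure is to take the spread of grey levels inside the debris boundary. The sketch below uses the standard deviation for this; that choice of measure, the function names and the pixel values are assumptions made for illustration, not the team’s published method.

```python
from statistics import pstdev


def roughness(pixels):
    """Population standard deviation of grey levels inside a debris region."""
    return pstdev(pixels)


def rank_by_roughness(regions):
    """Order region names from smoothest to roughest."""
    return sorted(regions, key=lambda name: roughness(regions[name]))


# Invented grey levels for illustration; real values would come from the camera.
regions = {
    "smooth floor": [120, 121, 119, 120, 122, 120],
    "wood chip": [60, 180, 95, 210, 40, 150],
}
ranking = rank_by_roughness(regions)
print(ranking)
```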

“We now hope to integrate our understanding of brushing with our sensor and predictive analysis,” concludes Professor Parker. “Hopefully we can improve street cleaning and make life easier for the vehicle operators.”

Wireless advances boost mobile multimedia communications

A wide range of innovations being made at Southampton University’s Department of Electronics and Computer Science is enabling wireless telephone networks to be extended to carry high-quality video, speech, handwriting and graphics for advanced multimedia applications.

The work, by Southampton’s Mobile Multimedia Communications Team (MMCT) in Professor Raymond Steele’s Communications Group, efficiently exploits available wireless ‘bandwidths’ to deliver robust, high-quality multimedia communications to anyone, at any location. The bandwidth of a communications link determines the number of digital ‘bits’ of information it can carry each second. Speech conversations of acceptable quality can be encoded at rates as low as 5600 bits per second (bps), but moving-image video typically needs significantly higher transmission speeds.
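
To put the 5600 bps figure in perspective, the arithmetic can be worked through in a few lines. The speech rate comes from the text; the 64,000 bps link is a hypothetical value chosen only to make the sum concrete, and the function names are illustrative.

```python
SPEECH_BPS = 5600  # speech rate quoted in the text


def bits_needed(rate_bps, seconds):
    """Number of bits sent at a constant rate over a given duration."""
    return rate_bps * seconds


def streams_supported(link_bps, stream_bps):
    """Whole number of constant-rate streams that fit into a link."""
    return link_bps // stream_bps


one_minute = bits_needed(SPEECH_BPS, 60)
print(one_minute, "bits =", one_minute // 8, "bytes per minute of speech")

# Hypothetical 64,000 bps link, chosen only to make the arithmetic concrete.
print(streams_supported(64_000, SPEECH_BPS), "speech streams fit")
```

This ignores signalling and error-protection overheads, which a real wireless link would need to carry as well.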

Image: How the Digital Assistant may look

Multimedia communications can be handled reliably and efficiently using optical fibres, cables and other high-speed ‘broadband’ networks, which are relatively free from external interference. Users of wireless communications, however, move through a variety of environments, such as a tunnel, which are hostile to the propagation of radio waves. Maintaining consistent mobile reception quality therefore requires complex signal processing.

The MMCT has developed a range of bandwidth-efficient transmission methods for delivering multimedia services, including new encoder/decoder (codec) techniques. These translate video, speech and other signals into ‘compressed’ digital codes that need fewer bps during transmission, with the signal retranslated into the original format by the decoder. Codecs are implemented as micro-chips in users’ devices.

The MMCT’s innovations, many of which were made on EPSRC-funded projects, are targeted towards optimising the overall performance of wireless networks by creating systems that automatically adapt and reconfigure themselves to maintain optimum performance levels, for instance through programmable codecs. This ensures the ever-changing demands of mobile users are met in the most efficient way.

Such overall system optimisation is the team’s prime contribution to making feasible the ubiquitous use of ‘multimode’ devices which combine the capabilities of a mobile phone, hand-held ‘palmtop’ computer, radio receiver, videophone and electronic handwriting tablet. Much industrial interest has been shown in this work.

The MMCT has also been collaborating with academic/industrial consortia on two EU-funded efforts: the FIRST programme, which is developing a reconfigurable multimedia terminal, and the MEDIAN project, which is creating high-speed local wireless networks within buildings.

Limbering up for quality animation

The Titanic rears high above the water. Hanging from the balustrade, terrified passengers look down into the icy waters as their fingers slip. Fortunately for the actors, this is where the computer graphics kick in, simulating the lifelike movements of limbs as bodies begin to drop into the sea.

The standard methods of computer modelling require human subjects to wear reflective patches. Cameras film the person and image analysis software tracks the movement of the patches. The computer builds up a picture of how the person’s joints move. Originally the technique was applied to clinical diagnosis, but the entertainment industry regularly uses the same technology. It can simulate people moving realistically in crowd scenes, stunt shots – or falling from sinking ships.
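
Once the patches have been located in 3D, working out how a joint moves reduces to geometry. The sketch below computes the angle at a joint from three marker positions using the dot product; the marker names and coordinates are hypothetical, and locating the patches in the camera images is assumed to have been done already.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at marker `b` between segments b->a and b->c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding just outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Hypothetical marker positions (shoulder, elbow, wrist) in two frames.
frames = [
    ((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),   # arm bent
    ((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (0.0, -1.0, 0.0)),  # arm straight
]
angles = [joint_angle(shoulder, elbow, wrist) for shoulder, elbow, wrist in frames]
print([round(angle, 1) for angle in angles])
```

Repeating the calculation frame by frame gives the joint’s angle over time, which is the kind of motion data an animator can then replay on a digital character.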

Fission yeast makes chips

Since the late 1980s scientists have known that some yeast species can produce a semiconductor material used in advanced lasers and microchips. When cultured on cadmium salts Schizosaccharomyces pombe (S. pombe) produces cadmium sulphide in the form of peptide-coated crystals. Researchers have now developed an easy way to extract these cell deposits with a high degree of purity.

Tiny crystals of cadmium sulphide, about 1-2 nanometres in diameter, have highly specific electronic properties. When grown in a culture of cadmium salts, the yeast S. pombe consistently makes stable cadmium sulphide crystals of just the right size – 1.8nm. But scientists have struggled to extract the material from the cells. “The problem is that the yeast also secretes some cadmium sulphide into the culture medium,” explains Paul Williams of

Getting fashions to fit by computer

Every dedicated shopaholic knows the problem. Once you’ve tracked down that elusive garment that actually suits you, it’s not available in your size. But help may be at hand from an unexpected quarter – the world of mechanical engineering.

“Designing clothes patterns is still a craft industry,” says Dr Jim McCartney, an engineer in the School of Mechanical and Manufacturing Engineering at Queen’s University of Belfast. “We want to automate the process, basing it on real body data.”

Most clothes begin life as a designer’s sketch. This is converted into a flat pattern used to cut the cloth and assemble the garment. Unfortunately, the final product is not always what the designer intended, especially in terms of fit. It’s this process, from sketch to pattern to finished article, which McCartney wants to improve.

McCartney has designed a prototype computer system that produces patterns from 3D images of new designs. “We produced a 3D computer model of a kind of mannequin used by clothes designers. Then we designed bodices using computer tools. We chose bodices because most of the potential is in the female market.”

Chaos to calm mobile madness

Engineers at Staffordshire University are to throw the phone system into chaos – for the sake of getting a better service. Researchers from the School of Engineering and Advanced Technology are to apply chaos theory – the branch of mathematics used to explain chaotic systems – in an attempt to unravel the cordless and confused world of mobile phones. The group is investigating ways to help telephone networks to cope with increasing numbers of calls without expanding the communications infrastructure.

Chaos theory is a branch of mathematics that attempts to explain why seemingly simple systems, such as weather and economics, are still so unpredictable even though we have a lot of information and they seem to follow straightforward rules. Experts in chaos try to find patterns and laws where non-experts see only – chaos.
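
A standard textbook example shows how a straightforward rule can still defeat prediction. The sketch below iterates the logistic map, a one-line formula, from two starting values that differ by one part in a million; it is a generic illustration of chaotic behaviour and has no connection to the Staffordshire team’s telecommunications models.

```python
def logistic_orbit(x0, r=4.0, steps=50):
    """Iterate x -> r * x * (1 - x), a standard textbook chaotic map."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1 - orbit[-1]))
    return orbit


a = logistic_orbit(0.200000)
b = logistic_orbit(0.200001)  # differs by one part in a million

gaps = [abs(x - y) for x, y in zip(a, b)]
print(f"gap at step 0: {gaps[0]:.6f}")
print(f"largest gap over 50 steps: {max(gaps):.3f}")
```

The two runs follow exactly the same rule, yet within a few dozen steps they bear no resemblance to each other, which is why tiny measurement errors make long-range forecasts of such systems unreliable.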

A team of researchers, led by Professor Rolando Carrasco, will harness the complex theory to improve the performance of existing mobile telecommunications systems rather than expanding the networks.

“The increased use of mobile telephones, ISDN lines, satellite communications, cable networks and other digital communications systems is starting to put an immense strain on existing networks,” says Professor Carrasco. “It is not that these networks are laid on another, but they speak different digital ‘languages’. The

Spotting your spending patterns

Like it or not, the people who look after our money know a lot about us. Banks and credit companies accumulate vast quantities of data that they hope to use for both monitoring and marketing purposes. Researchers in the Department of Mathematics at Imperial College, London, have developed a method that could help financial services companies trawl through their data more effectively to identify potential ‘problem customers’.

“Data mining is a technology for examining large databases in the hope of answering specific questions or of revealing unknown or ill-defined patterns,” says Professor David Hand. “These two aspects of data mining are used for very different purposes. In the first case you might want to determine which customers would respond well to a marketing campaign based on a large number of responses to other campaigns. In contrast, pattern detection will reveal customers that are behaving in an anomalous fashion.”
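
The second aspect, spotting customers who behave anomalously, can be illustrated with a very basic test that flags values far from the average. This is a plain z-score check offered as a sketch of the general idea; it is not the algorithm developed at Imperial College, and the account labels, figures and threshold are invented.

```python
from statistics import mean, pstdev


def anomalous(values, threshold=2.0):
    """Return keys whose value lies more than `threshold` standard
    deviations from the mean of all values (a plain z-score test)."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        return []  # every account behaves identically
    return [key for key, v in values.items()
            if abs(v - mu) / sigma > threshold]


# Invented monthly spending figures for eight hypothetical accounts.
spend = {
    "A": 210, "B": 195, "C": 205, "D": 220,
    "E": 190, "F": 200, "G": 215, "H": 900,
}
flagged = anomalous(spend)
print(flagged)
```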

Professor Hand and his colleagues have now developed a new pattern detection algorithm that helps to identify customers who may mismanage their credit accounts in the future – even though their existing credit record is impeccable.

“Our algorithm identifies groups of accounts which are unexpectedly being used in a similar way,” explains Professor Hand. “