Freelance Client Project

Automated Address Collection

Powered by Spring Batch, Rundeck, and Google Maps API


This solution leverages backend automation and scheduled orchestration to process business addresses and generate static maps at scale.

Project Overview

A client-focused tool that automates the collection and enrichment of business address data using modern backend technologies and scheduled job orchestration.

Technologies Used

  • Spring Batch: Batch processing of business data
  • Rundeck: Job scheduling and management
  • MySQL: Persistent storage of addresses
  • Google Maps API: Address geocoding and location info
  • Amazon S3: Storage for generated map images
  • Web Scraper: Pulls data from Yelp and YellowPages

Project Sources

  • Yelp business listings
  • YellowPages search results
  • Google Maps API lookups

Purpose

This tool streamlines business address discovery and enrichment for marketing, analysis, or integration with client workflows, demonstrating scalable backend automation.

Automated Business Address Processing with Spring Batch

This solution uses Spring Batch to orchestrate the enrichment and transformation of business addresses into downloadable map assets. Below are the building blocks and code examples that make this system scalable and maintainable.

1. Job Configuration

Purpose: Defines the orchestration and flow of steps.

@Bean
public Job addressJob() {
    return jobBuilderFactory.get("addressJob")
        .start(partitionedStep())
        .next(finalizeStep())
        .build();
}

2. Smart Partitioning for Scalable Processing

RangePartitioner dynamically divides work based on the job quota and last processed ID, ensuring even distribution:

// Ceiling division so the whole quota is covered even when it does not divide evenly
int chunkSize = quota / gridSize + (quota % gridSize > 0 ? 1 : 0);
Map<String, ExecutionContext> partitions = new HashMap<>();
for (int i = 0; i < gridSize; i++) {
  ExecutionContext context = new ExecutionContext(); // one context per partition
  int fromId = lastProcessedKey + (i * chunkSize) + 1;
  int toId = Math.min(lastProcessedKey + ((i + 1) * chunkSize), lastProcessedKey + quota);
  context.putInt("fromId", fromId);
  context.putInt("toId", toId);
  partitions.put("partition" + i, context);
}

This enables parallelized worker steps to fetch and process business entries within well-defined ID ranges.
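The range arithmetic above can be exercised in isolation. Below is a minimal, framework-free sketch of the same math; the class and method names are illustrative, not taken from the project:

```java
import java.util.ArrayList;
import java.util.List;

public class RangePartitionMath {

    /** One [fromId, toId] range handed to a worker step. */
    public record IdRange(int fromId, int toId) {}

    /**
     * Splits a quota of records following lastProcessedKey into up to
     * gridSize contiguous ID ranges, using ceiling division so the whole
     * quota is covered even when it does not divide evenly.
     */
    public static List<IdRange> partition(int lastProcessedKey, int quota, int gridSize) {
        int chunkSize = quota / gridSize + (quota % gridSize > 0 ? 1 : 0);
        List<IdRange> ranges = new ArrayList<>();
        for (int i = 0; i < gridSize; i++) {
            int fromId = lastProcessedKey + (i * chunkSize) + 1;
            int toId = Math.min(lastProcessedKey + ((i + 1) * chunkSize), lastProcessedKey + quota);
            if (fromId > toId) break; // quota exhausted before all partitions were used
            ranges.add(new IdRange(fromId, toId));
        }
        return ranges;
    }
}
```

For example, with lastProcessedKey=500, quota=1000, and gridSize=4, the workers receive the ranges 501-750, 751-1000, 1001-1250, and 1251-1500.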

3. ItemReader

Purpose: Reads business records by ID range.

JdbcPagingItemReader<StaticMapStore> reader = new JdbcPagingItemReader<>();
reader.setDataSource(dataSource);
reader.setPageSize(pageSize);
reader.setQueryProvider(...);

4. Geolocation Transformation with Static Maps

In StaticMapItemProcessor, a latitude and longitude pair is translated into a ready-to-download image request:

GoogleInfo meta = new GoogleMetadataBuilder()
  .setAccessKey(accessKey)
  .setGoogleServiceEndpoint(serviceEndpoint)
  .setPlaceholder(Map.of("{longitude}", store.getLng(), "{latitude}", store.getLat()))
  .execute();

ImageInfo imageInfo = new ImageInfo();
imageInfo.setImageName(prefix + store.getStoreId() + suffix);

item.addMetadata("GoogleInfo", meta);
item.addMetadata("ImageInfo", imageInfo);

This prepares the necessary metadata to later fetch and label the map image for each store.
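The internals of GoogleMetadataBuilder are not shown, but the placeholder substitution it performs can be sketched with plain string replacement. The template, helper names, and key parameter below are illustrative assumptions:

```java
import java.util.Map;

public class StaticMapRequest {

    // URL template mirroring the Static Map endpoint used by the job
    static final String TEMPLATE =
        "https://maps.googleapis.com/maps/api/staticmap"
        + "?center={latitude},{longitude}&zoom=15&size=400x125&key={key}";

    /** Replaces each {placeholder} token in the template with its value. */
    public static String buildUrl(String template, Map<String, String> placeholders) {
        String url = template;
        for (Map.Entry<String, String> e : placeholders.entrySet()) {
            url = url.replace(e.getKey(), e.getValue());
        }
        return url;
    }

    /** Derives the image name the writer later uploads, e.g. "map_42.png". */
    public static String imageName(String prefix, long storeId, String suffix) {
        return prefix + storeId + suffix;
    }
}
```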

5. Task-Based Map Download and Upload

StaticMapWriter delegates work to reusable task components, decoupling infrastructure concerns:

googleServiceFactory.getTask("GoogleDownloadTask")
  .execute(googleInfo, imageInfo);

googleServiceFactory.getTask("S3UploadTask")
  .execute(s3Info, imageInfo);

Each task encapsulates a clear responsibility: downloading from Google, uploading to S3 — making the workflow modular and testable.
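The factory lookup shown above can be modeled as a map-backed registry. The interface and registry below are illustrative sketches, not the project's actual types:

```java
import java.util.HashMap;
import java.util.Map;

public class TaskRegistry {

    /** A unit of work with a single responsibility, e.g. download or upload. */
    public interface Task {
        String execute(String input);
    }

    private final Map<String, Task> tasks = new HashMap<>();

    public void register(String name, Task task) {
        tasks.put(name, task);
    }

    /** Looks up a task by name, as in googleServiceFactory.getTask("..."). */
    public Task getTask(String name) {
        Task task = tasks.get(name);
        if (task == null) {
            throw new IllegalArgumentException("Unknown task: " + name);
        }
        return task;
    }
}
```

Registering tasks by name lets the writer stay ignorant of concrete download and upload implementations, which is what makes each task independently testable.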

6. Job Parameters

Purpose: Makes job reusable with dynamic input.

--job.name=addressJob lastProcessedKey=500 quota=1000
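Parameters in the key=value style above can be parsed with a few lines of stdlib code. This is a simplified sketch, not the project's actual launcher:

```java
import java.util.HashMap;
import java.util.Map;

public class JobParams {

    /**
     * Parses command-line arguments of the form key=value (the style the
     * job is launched with) into a parameter map; other tokens are ignored.
     */
    public static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq > 0) {
                params.put(arg.substring(0, eq), arg.substring(eq + 1));
            }
        }
        return params;
    }
}
```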

7. Skip/Retry Handling

Purpose: Ensures recoverability from partial failures.

<chunk skip-limit="50">
  <skippable-exception-classes>
    <include class="SkippableException"/>
  </skippable-exception-classes>
</chunk>
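Spring Batch enforces the skip limit itself; the semantics can be illustrated with a small framework-free counter. The class name is illustrative, and IllegalStateException stands in for the project's SkippableException:

```java
public class SkipPolicySketch {

    private final int skipLimit;
    private int skipCount = 0;

    public SkipPolicySketch(int skipLimit) {
        this.skipLimit = skipLimit;
    }

    /**
     * Returns true if the failure may be skipped: the exception type is
     * skippable and the running skip count is still under the limit.
     */
    public boolean shouldSkip(Throwable t) {
        boolean skippable = t instanceof IllegalStateException; // stand-in for SkippableException
        if (skippable && skipCount < skipLimit) {
            skipCount++;
            return true;
        }
        return false;
    }

    public int getSkipCount() {
        return skipCount;
    }
}
```

Once the limit is exceeded, or a non-skippable exception is thrown, the step fails and the job can be restarted from the last committed chunk.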

8. Rundeck Integration

Purpose: Automates and schedules job execution.

java -jar batch-job.jar --job.name=addressJob quota=500

9. Job Listener

Purpose: Tracks success, failure, and job metrics.

@Override
public ExitStatus afterStep(StepExecution stepExecution) {
  log.info("Success: " + jobState.getSuccess());
  return stepExecution.getExitStatus();
}

Code Index

Explore key components of the github.com/khimu/api repository through the sample code snippets below.

Batch Jobs

  • ScrapperJob.java
    // Job definition and step configuration
    @Bean
    public Job scrapperJob() {
        return jobBuilderFactory.get("scrapperJob")
            .start(scrapperStep())
            .build();
    }
  • MainJob.java
    // Main job orchestrating multiple steps
    @Bean
    public Job mainJob() {
        return jobBuilderFactory.get("mainJob")
            .start(stepOne())
            .next(stepTwo())
            .build();
    }

Task Components

  • S3UploadTask.java
    // Uploading file to S3
    public void uploadToS3(String filePath) {
        s3Client.putObject(bucketName, keyName, new File(filePath));
    }
  • GoogleGeocodeTask.java
    // Making a request to Google Geocode API
    String url = "https://maps.googleapis.com/maps/api/geocode/json?address=" + address + "&key=" + apiKey;
    HttpResponse response = httpClient.execute(new HttpGet(url));

Readers and Writers

  • StoreKeywordItemReader.java
    // Reading store keywords from database
    public StoreKeyword read() {
        return jdbcTemplate.queryForObject("SELECT * FROM store_keywords WHERE ...", new StoreKeywordRowMapper());
    }
  • StaticMapWriter.java
    // Writing static map images to S3
    public void write(List<? extends StaticMap> items) {
        for (StaticMap map : items) {
            s3UploadTask.upload(map.getImagePath());
        }
    }

Partitioners

  • RangePartitioner.java
    // Partitioning data range for parallel processing
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> result = new HashMap<>();
        int range = endId - startId + 1;
        int chunkSize = range / gridSize;
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("startId", startId + (i * chunkSize));
            context.putInt("endId", (i == gridSize - 1) ? endId : startId + ((i + 1) * chunkSize) - 1);
            result.put("partition" + i, context);
        }
        return result;
    }


Writers

  • CustomKeywordItemWriter.java
    // Writing custom keywords to database
    public void write(List<? extends CustomKeyword> items) {
        for (CustomKeyword keyword : items) {
            jdbcTemplate.update("INSERT INTO custom_keywords ...", keyword.getValue());
        }
    }

Processors

  • StaticMapItemProcessor.java
    // Processing store data into static map requests
    public StaticMap process(Store store) {
        StaticMap map = new StaticMap();
        map.setCoordinates(store.getLatitude(), store.getLongitude());
        return map;
    }


Google API Endpoints Used

These APIs enable the transformation and visualization of location data within the tool.

📍 Geocode API

https://maps.googleapis.com/maps/api/geocode/json?address={address}

Converts an address into geographic coordinates.
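Address values must be URL-encoded before being substituted into the {address} placeholder. A minimal stdlib sketch (the helper class and key parameter are illustrative):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GeocodeUrl {

    static final String ENDPOINT =
        "https://maps.googleapis.com/maps/api/geocode/json?address=";

    /** Builds a geocode request URL with the address safely URL-encoded. */
    public static String forAddress(String address, String apiKey) {
        return ENDPOINT
            + URLEncoder.encode(address, StandardCharsets.UTF_8)
            + "&key=" + apiKey;
    }
}
```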

🔍 Places API

https://maps.googleapis.com/maps/api/place/nearbysearch/json?location={latitude},{longitude}&radius={radius}

Finds nearby businesses based on location and radius.

🗺️ Static Map API

https://maps.googleapis.com/maps/api/staticmap?center={latitude},{longitude}&zoom=15&size=400x125

Generates map images for preview and storage.

Next Steps

Potential improvements include integrating more data sources, enhancing map visualizations, and optimizing performance and error handling.